* [PATCH v4 00/19] KVM GICv3 emulation
From: Andre Przywara @ 2014-11-14 10:07 UTC
  To: linux-arm-kernel

This is version 4 of the GICv3 guest emulation series.

Like the previous series, this is based on v3.18-rc2. I deliberately
did not rebase it, to keep it diff-able with v3 of the patch series.

I addressed most of the comments from Christoffer's last review round
(many thanks!). There were numerous changes, though most of them are
reworks and style changes. To make them easier to spot, I updated the
kvm-gicv3/v3 branch in the repo mentioned below to carry all the delta
patches. Those patches are only for reference, to see what has changed
between v3 and v4. For review and all other purposes please use the v4
branch.

For a changelog summary, see below; each patch now also carries its
own changelog.
Patches 02, 04, 06, 07, 09, 10 and 11 have no code changes compared to
their previous counterparts.
The v3 patch 06 is gone, patch 14 has changed dramatically, and a new
patch (16/19) has been added.

I quickly tested this version with a GICv3 capable fast model in all
endianness modes (LE guest on LE host, BE on LE, LE on BE, BE on BE).
Both a GICv2 and a GICv3 guest were booted in all four combinations.
Contrary to what the v3 commit message said, cross-endianness already
worked fine with the previous patch series.

A git repo hosting all these patches lives in the kvm-gicv3/v4 branch
of:
http://www.linux-arm.org/git?p=linux-ap.git
git://linux-arm.org/linux-ap.git
-----

GICv3 is the ARM generic interrupt controller designed to overcome
some limits of the prevalent GICv2. Most notably, it lifts the 8-CPU
limit. Although Marc introduced support for hosts to use a GICv3 with
Linux 3.17, the CPU limitation still applies to KVM guests, since the
current code only emulates a GICv2.
Also, since GICv2 backward compatibility is optional in GICv3, some
systems won't be able to run GICv2 guests.

This patch series provides code to emulate a GICv3 distributor and
redistributor for any KVM guest. It requires a GICv3 in the host to
work. With these patches, guests can run efficiently on any GICv3
host. The emulation has the following features and limitations:
- Affinity routing (support for up to 255 VCPUs, more possible)
- System register access to the CPU interface (as opposed to MMIO)
- No ITS
- No priority support (same as the GICv2 emulation)
- No save / restore support so far (will be added soon)
- Only Group1 interrupts supported

The first patches refactor the current VGIC code to make room for a
different VGIC model, which is then dropped in with patch 15.
The remaining patches connect the new model to the kernel backend and
the userland-facing code.

The series goes on top of v3.18-rc2.
The necessary patches for kvmtool to enable the guest's GICv3 have
been posted here before [1]; an updated version will follow soon.

There was some testing on the fast model, with I/O and interrupt
affinity shuffling in a Linux guest with a varying number of VCPUs, as
well as some testing on a Juno board (GICv2 only, to spot regressions).

Please review and test.
I would also be grateful if people could test for GICv2 regressions
(that is, on a GICv2 host with current kvmtool/qemu), as there is
quite a bit of refactoring on that front.

Much of the code was inspired by MarcZ; kudos to him also for doing
the rather painful rebase on top of v3.17-rc1.

Cheers,
Andre.

[1] https://lists.cs.columbia.edu/pipermail/kvmarm/2014-June/010086.html

Changes v3 ... v4:
* bug-fix in handling GICv3 redistributor CFG register
* move set/get_lr from gic_vm_ops back to vgic_ops (get rid of v3 06/19)
* get rid of init_emul() entirely
* rework guest GIC model initialization
* use non-atomic bit-set and bit-clear functions
* split up handle_mmio_misc* into multiple functions
* refine handling of some reserved registers
* use symbolic names for ICC_SGI1R_EL1 register fields (new patch 16/19)
* move private parameter from MMIO accessors to struct kvm_mmio_exit
* add documentation for the new GICv3 guest device
* add lots of comments
* some renaming of identifiers
* minor changes in style and code flow of various functions

Changes v2 ... v3:
* rebase to v3.18-rc2
* adapt to new kvm_register_device() function
* split up vm_ops patch and the GICv2 split-off patch to ease review
* various smaller changes due to Christoffer's review
* fix compilation for arm
* remove support for trapping SGI sysreg accesses on arm hosts

Changes v1 ... v2:
* rebase to v3.17-rc1, caused quite some changes to the init code
* new 9/15 patch to make 10/15 smaller
* fix wrongly ordered cp15 register trap entry (MarcZ)
* fix SGI broadcast (thanks to wanghaibin for spotting)
* fix broken bailout path in kvm_vgic_create (wanghaibin)
* check return value of init_emulation_ops() (wanghaibin)
* fix return value check in vgic_[sg]et_attr()
* add header inclusion guards
* remove double definition of VCPU_NOT_ALLOCATED
* some code move-around
* whitespace fixes

Andre Przywara (19):
  arm/arm64: KVM: rework MPIDR assignment and add accessors
  arm/arm64: KVM: pass down user space provided GIC type into vGIC code
  arm/arm64: KVM: refactor vgic_handle_mmio() function
  arm/arm64: KVM: wrap 64 bit MMIO accesses with two 32 bit ones
  arm/arm64: KVM: introduce per-VM ops
  arm/arm64: KVM: move kvm_register_device_ops() into vGIC probing
  arm/arm64: KVM: dont rely on a valid GICH base address
  arm/arm64: KVM: make the maximum number of vCPUs a per-VM value
  arm/arm64: KVM: make the value of ICC_SRE_EL1 a per-VM variable
  arm/arm64: KVM: refactor MMIO accessors
  arm/arm64: KVM: refactor/wrap vgic_set/get_attr()
  arm/arm64: KVM: add vgic.h header file
  arm/arm64: KVM: split GICv2 specific emulation code from vgic.c
  arm/arm64: KVM: add opaque private pointer to MMIO data
  arm/arm64: KVM: add virtual GICv3 distributor emulation
  arm64: GICv3: introduce symbolic names for GICv3 ICC_SGI1R_EL1 fields
  arm64: KVM: add SGI generation register emulation
  arm/arm64: KVM: enable kernel side of GICv3 emulation
  arm/arm64: KVM: allow userland to request a virtual GICv3

 Documentation/virtual/kvm/devices/arm-vgic.txt |   21 +-
 arch/arm/include/asm/kvm_emulate.h             |    5 +-
 arch/arm/include/asm/kvm_host.h                |    3 +
 arch/arm/include/asm/kvm_mmio.h                |    1 +
 arch/arm/kvm/Makefile                          |    1 +
 arch/arm/kvm/arm.c                             |   23 +-
 arch/arm/kvm/psci.c                            |   17 +-
 arch/arm64/include/asm/kvm_emulate.h           |    5 +-
 arch/arm64/include/asm/kvm_host.h              |    5 +
 arch/arm64/include/asm/kvm_mmio.h              |    1 +
 arch/arm64/include/uapi/asm/kvm.h              |    7 +
 arch/arm64/kernel/asm-offsets.c                |    1 +
 arch/arm64/kvm/Makefile                        |    2 +
 arch/arm64/kvm/sys_regs.c                      |   37 +-
 arch/arm64/kvm/vgic-v3-switch.S                |   14 +-
 drivers/irqchip/irq-gic-v3.c                   |   14 +-
 include/kvm/arm_vgic.h                         |   34 +-
 include/linux/irqchip/arm-gic-v3.h             |   44 +
 include/linux/kvm_host.h                       |    2 +
 include/uapi/linux/kvm.h                       |    2 +
 virt/kvm/arm/vgic-v2-emul.c                    |  805 ++++++++++++++++++
 virt/kvm/arm/vgic-v2.c                         |    3 +
 virt/kvm/arm/vgic-v3-emul.c                    | 1020 +++++++++++++++++++++++
 virt/kvm/arm/vgic-v3.c                         |   89 +-
 virt/kvm/arm/vgic.c                            | 1065 ++++++------------------
 virt/kvm/arm/vgic.h                            |  122 +++
 26 files changed, 2469 insertions(+), 874 deletions(-)
 create mode 100644 virt/kvm/arm/vgic-v2-emul.c
 create mode 100644 virt/kvm/arm/vgic-v3-emul.c
 create mode 100644 virt/kvm/arm/vgic.h

-- 
1.7.9.5


* [PATCH v4 01/19] arm/arm64: KVM: rework MPIDR assignment and add accessors
From: Andre Przywara @ 2014-11-14 10:07 UTC
  To: linux-arm-kernel

The virtual MPIDR registers (containing topology information) for the
guest are currently mapped linearly to the vcpu_id. Improve this
mapping for arm64 by spreading the vcpu_id over three affinity levels,
so as not to artificially limit the number of vCPUs.
To support this, rename kvm_vcpu_get_mpidr() to kvm_vcpu_get_mpidr_aff()
and make it mask off the non-affinity bits of the MPIDR register.
Also add an accessor, kvm_mpidr_to_vcpu(), to look up the vCPU with a
given MPIDR, and use it in the PSCI emulation.
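The three-level mapping that reset_mpidr() implements in the diff below can be illustrated with a small standalone C sketch (not part of the patch). It assumes the arm64 MPIDR layout (Aff0 at bit 0, Aff1 at bit 8, Aff2 at bit 16), re-derives MPIDR_LEVEL_SHIFT() and MPIDR_HWID_BITMASK locally for user space, and uses the hypothetical helper names vcpu_id_to_mpidr() and mpidr_to_vcpu_id(). Aff0 only receives the low four bits of the vcpu_id because the GICv3 ICC_SGI1R_EL1 register addresses SGI targets through a 16-bit target list, one bit per Aff0 value.

```c
#include <assert.h>
#include <stdint.h>

/* arm64 layout: Aff0 = bits [7:0], Aff1 = [15:8], Aff2 = [23:16], Aff3 = [39:32] */
#define MPIDR_LEVEL_SHIFT(level)	((((1u << (level)) >> 1)) << 3)
#define MPIDR_HWID_BITMASK		0xff00ffffffULL

/* Hypothetical helper mirroring reset_mpidr(): spread vcpu_id over Aff0..Aff2. */
static uint64_t vcpu_id_to_mpidr(uint32_t vcpu_id)
{
	uint64_t mpidr;

	assert(vcpu_id < (1u << 20));	/* only 4 + 8 + 8 bits are used */
	mpidr  = (uint64_t)(vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
	mpidr |= (uint64_t)((vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
	mpidr |= (uint64_t)((vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
	return (1ULL << 31) | mpidr;	/* bit 31 is RES1 in MPIDR_EL1 */
}

/* Hypothetical inverse: recover the vcpu_id from the affinity fields. */
static uint32_t mpidr_to_vcpu_id(uint64_t mpidr)
{
	mpidr &= MPIDR_HWID_BITMASK;	/* drop non-affinity bits such as bit 31 */
	return (uint32_t)(((mpidr >> MPIDR_LEVEL_SHIFT(0)) & 0x0f) |
			  (((mpidr >> MPIDR_LEVEL_SHIFT(1)) & 0xff) << 4) |
			  (((mpidr >> MPIDR_LEVEL_SHIFT(2)) & 0xff) << 12));
}
```

With this layout vCPU 17 ends up with Aff1 = 1 and Aff0 = 1, so each group of 16 consecutive vCPUs shares one Aff1 value and fits in a single ICC_SGI1R_EL1 target list.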

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
Changelog v3...v4:
- rename kvm_vcpu_get_mpidr() to kvm_vcpu_get_mpidr_aff()
- simplify kvm_mpidr_to_vcpu()
- fixup comment

 arch/arm/include/asm/kvm_emulate.h   |    5 +++--
 arch/arm/include/asm/kvm_host.h      |    2 ++
 arch/arm/kvm/arm.c                   |   13 +++++++++++++
 arch/arm/kvm/psci.c                  |   17 +++++------------
 arch/arm64/include/asm/kvm_emulate.h |    5 +++--
 arch/arm64/include/asm/kvm_host.h    |    2 ++
 arch/arm64/kvm/sys_regs.c            |   11 +++++++++--
 7 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index b9db269..3ae88ac 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -23,6 +23,7 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmio.h>
 #include <asm/kvm_arm.h>
+#include <asm/cputype.h>
 
 unsigned long *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num);
 unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu);
@@ -162,9 +163,9 @@ static inline u32 kvm_vcpu_hvc_get_imm(struct kvm_vcpu *vcpu)
 	return kvm_vcpu_get_hsr(vcpu) & HSR_HVC_IMM_MASK;
 }
 
-static inline unsigned long kvm_vcpu_get_mpidr(struct kvm_vcpu *vcpu)
+static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
 {
-	return vcpu->arch.cp15[c0_MPIDR];
+	return vcpu->arch.cp15[c0_MPIDR] & MPIDR_HWID_BITMASK;
 }
 
 static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 53036e2..b443dfe 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -236,6 +236,8 @@ static inline void vgic_arch_setup(const struct vgic_params *vgic)
 int kvm_perf_init(void);
 int kvm_perf_teardown(void);
 
+struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
+
 static inline void kvm_arch_hardware_disable(void) {}
 static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 9e193c8..c2a5c69 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -977,6 +977,19 @@ static void check_kvm_target_cpu(void *ret)
 	*(int *)ret = kvm_target_cpu();
 }
 
+struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
+{
+	struct kvm_vcpu *vcpu;
+	int i;
+
+	mpidr &= MPIDR_HWID_BITMASK;
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (mpidr == kvm_vcpu_get_mpidr_aff(vcpu))
+			return vcpu;
+	}
+	return NULL;
+}
+
 /**
  * Initialize Hyp-mode and memory mappings on all CPUs.
  */
diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
index 09cf377..84121b2 100644
--- a/arch/arm/kvm/psci.c
+++ b/arch/arm/kvm/psci.c
@@ -21,6 +21,7 @@
 #include <asm/cputype.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_psci.h>
+#include <asm/kvm_host.h>
 
 /*
  * This is an implementation of the Power State Coordination Interface
@@ -65,25 +66,17 @@ static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
 static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 {
 	struct kvm *kvm = source_vcpu->kvm;
-	struct kvm_vcpu *vcpu = NULL, *tmp;
+	struct kvm_vcpu *vcpu = NULL;
 	wait_queue_head_t *wq;
 	unsigned long cpu_id;
 	unsigned long context_id;
-	unsigned long mpidr;
 	phys_addr_t target_pc;
-	int i;
 
-	cpu_id = *vcpu_reg(source_vcpu, 1);
+	cpu_id = *vcpu_reg(source_vcpu, 1) & MPIDR_HWID_BITMASK;
 	if (vcpu_mode_is_32bit(source_vcpu))
 		cpu_id &= ~((u32) 0);
 
-	kvm_for_each_vcpu(i, tmp, kvm) {
-		mpidr = kvm_vcpu_get_mpidr(tmp);
-		if ((mpidr & MPIDR_HWID_BITMASK) == (cpu_id & MPIDR_HWID_BITMASK)) {
-			vcpu = tmp;
-			break;
-		}
-	}
+	vcpu = kvm_mpidr_to_vcpu(kvm, cpu_id);
 
 	/*
 	 * Make sure the caller requested a valid CPU and that the CPU is
@@ -154,7 +147,7 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
 	 * then ON else OFF
 	 */
 	kvm_for_each_vcpu(i, tmp, kvm) {
-		mpidr = kvm_vcpu_get_mpidr(tmp);
+		mpidr = kvm_vcpu_get_mpidr_aff(tmp);
 		if (((mpidr & target_affinity_mask) == target_affinity) &&
 		    !tmp->arch.pause) {
 			return PSCI_0_2_AFFINITY_LEVEL_ON;
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 5674a55..d4daaa5 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -27,6 +27,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmio.h>
 #include <asm/ptrace.h>
+#include <asm/cputype.h>
 
 unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num);
 unsigned long *vcpu_spsr32(const struct kvm_vcpu *vcpu);
@@ -182,9 +183,9 @@ static inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vcpu)
 	return kvm_vcpu_get_hsr(vcpu) & ESR_EL2_FSC_TYPE;
 }
 
-static inline unsigned long kvm_vcpu_get_mpidr(struct kvm_vcpu *vcpu)
+static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
 {
-	return vcpu_sys_reg(vcpu, MPIDR_EL1);
+	return vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
 }
 
 static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2012c4b..286bb61 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -207,6 +207,8 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 int kvm_perf_init(void);
 int kvm_perf_teardown(void);
 
+struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
+
 static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
 				       phys_addr_t pgd_ptr,
 				       unsigned long hyp_stack_ptr,
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 4cc3b71..fd3ffc3 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -252,10 +252,17 @@ static void reset_amair_el1(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 
 static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 {
+	u64 mpidr;
+
 	/*
-	 * Simply map the vcpu_id into the Aff0 field of the MPIDR.
+	 * Map the vcpu_id into the first three Aff fields of the MPIDR.
+	 * We limit the number of VCPUs in Aff0 due to a limitation in the
+	 * ICC_SGIxR registers of the GICv3.
 	 */
-	vcpu_sys_reg(vcpu, MPIDR_EL1) = (1UL << 31) | (vcpu->vcpu_id & 0xff);
+	mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
+	mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
+	mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
+	vcpu_sys_reg(vcpu, MPIDR_EL1) = (1ULL << 31) | mpidr;
 }
 
 /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
-- 
1.7.9.5


* [PATCH v4 02/19] arm/arm64: KVM: pass down user space provided GIC type into vGIC code
From: Andre Przywara @ 2014-11-14 10:07 UTC
  To: linux-arm-kernel

With the introduction of a second emulated GIC model, we need to let
userspace specify which GIC model to use for each VM. Pass the
userspace-provided value down into the vGIC code and store it there,
so that the models can be told apart later.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
---
Changelog v3...v4:
- added Acked-by

 arch/arm/kvm/arm.c     |    2 +-
 include/kvm/arm_vgic.h |    7 +++++--
 virt/kvm/arm/vgic.c    |    5 +++--
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index c2a5c69..8817fbd 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -753,7 +753,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
 	switch (ioctl) {
 	case KVM_CREATE_IRQCHIP: {
 		if (vgic_present)
-			return kvm_vgic_create(kvm);
+			return kvm_vgic_create(kvm, KVM_DEV_TYPE_ARM_VGIC_V2);
 		else
 			return -ENXIO;
 	}
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 206dcc3..dde5a00 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -140,6 +140,9 @@ struct vgic_dist {
 	bool			in_kernel;
 	bool			ready;
 
+	/* vGIC model the kernel emulates for the guest (GICv2 or GICv3) */
+	u32			vgic_model;
+
 	int			nr_cpus;
 	int			nr_irqs;
 
@@ -275,7 +278,7 @@ struct kvm_exit_mmio;
 int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write);
 int kvm_vgic_hyp_init(void);
 int kvm_vgic_init(struct kvm *kvm);
-int kvm_vgic_create(struct kvm *kvm);
+int kvm_vgic_create(struct kvm *kvm, u32 type);
 void kvm_vgic_destroy(struct kvm *kvm);
 void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu);
 void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu);
@@ -326,7 +329,7 @@ static inline int kvm_vgic_init(struct kvm *kvm)
 	return 0;
 }
 
-static inline int kvm_vgic_create(struct kvm *kvm)
+static inline int kvm_vgic_create(struct kvm *kvm, u32 type)
 {
 	return 0;
 }
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 3aaca49..2403d72 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1931,7 +1931,7 @@ out:
 	return ret;
 }
 
-int kvm_vgic_create(struct kvm *kvm)
+int kvm_vgic_create(struct kvm *kvm, u32 type)
 {
 	int i, vcpu_lock_idx = -1, ret = 0;
 	struct kvm_vcpu *vcpu;
@@ -1963,6 +1963,7 @@ int kvm_vgic_create(struct kvm *kvm)
 
 	spin_lock_init(&kvm->arch.vgic.lock);
 	kvm->arch.vgic.in_kernel = true;
+	kvm->arch.vgic.vgic_model = type;
 	kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
 	kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
 	kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
@@ -2388,7 +2389,7 @@ static void vgic_destroy(struct kvm_device *dev)
 
 static int vgic_create(struct kvm_device *dev, u32 type)
 {
-	return kvm_vgic_create(dev->kvm);
+	return kvm_vgic_create(dev->kvm, type);
 }
 
 static struct kvm_device_ops kvm_arm_vgic_v2_ops = {
-- 
1.7.9.5


* [PATCH v4 03/19] arm/arm64: KVM: refactor vgic_handle_mmio() function
From: Andre Przywara @ 2014-11-14 10:07 UTC
  To: linux-arm-kernel

Currently we only need to deal with one MMIO region for the GIC
emulation, but this will soon need to be extended. Refactor the
existing code so that different ranges can be added easily without
code duplication.
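A standalone sketch of the two-step lookup that vgic_handle_mmio_range() performs (first a region-relative offset, then an offset relative to the matching register block). struct mmio_range here is a simplified stand-in for the kernel structure, dispatch_mmio() and example_dist_ranges are hypothetical names, and the table is an illustrative excerpt of a GICv2 distributor map; the containment check follows the is_in_range() helper added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct mmio_range (hypothetical). */
struct mmio_range {
	uint64_t base;		/* offset of this register block within the region */
	unsigned long len;	/* block size in bytes; 0 terminates a table */
	const char *name;
};

/* Illustrative excerpt of a GICv2 distributor register map. */
static const struct mmio_range example_dist_ranges[] = {
	{ .base = 0x000, .len = 12,   .name = "CTLR/TYPER/IIDR" },
	{ .base = 0x100, .len = 0x80, .name = "ISENABLER" },
	{ 0 }	/* len == 0 terminates the table */
};

static bool is_in_range(uint64_t addr, unsigned long len,
			uint64_t baseaddr, unsigned long size)
{
	return (addr >= baseaddr) && (addr + len <= baseaddr + size);
}

/*
 * Find the register block that fully contains an access of @len bytes at
 * @phys_addr in a region of @region_size bytes mapped at @mmio_base. On
 * success, store the offset relative to that block in *@reg_offset.
 */
static const struct mmio_range *
dispatch_mmio(const struct mmio_range *ranges, uint64_t mmio_base,
	      unsigned long region_size, uint64_t phys_addr,
	      unsigned long len, uint64_t *reg_offset)
{
	const struct mmio_range *r;
	uint64_t offset;

	assert(reg_offset != NULL);

	/* Reject accesses that fall (partly) outside the whole region. */
	if (!is_in_range(phys_addr, len, mmio_base, region_size))
		return NULL;

	offset = phys_addr - mmio_base;			/* region-relative */
	for (r = ranges; r->len; r++) {
		if (is_in_range(offset, len, r->base, r->len)) {
			*reg_offset = offset - r->base;	/* block-relative */
			return r;
		}
	}
	return NULL;
}
```

Adding another region then only means passing a different table and base address, which is the kind of reuse the refactoring aims for.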

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
---
Changelog v3...v4:
- simplify is_in_range()
- added Reviewed-by:

 virt/kvm/arm/vgic.c |   75 ++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 54 insertions(+), 21 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 2403d72..5eee3de 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1032,37 +1032,28 @@ static bool vgic_validate_access(const struct vgic_dist *dist,
 	return true;
 }
 
-/**
- * vgic_handle_mmio - handle an in-kernel MMIO access
+/*
+ * vgic_handle_mmio_range - handle an in-kernel MMIO access
  * @vcpu:	pointer to the vcpu performing the access
  * @run:	pointer to the kvm_run structure
  * @mmio:	pointer to the data describing the access
+ * @ranges:	pointer to the register defining structure
+ * @mmio_base:	base address for this mapping
  *
- * returns true if the MMIO access has been performed in kernel space,
- * and false if it needs to be emulated in user space.
+ * returns true if the MMIO access could be performed
  */
-bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
-		      struct kvm_exit_mmio *mmio)
+static bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run *run,
+			    struct kvm_exit_mmio *mmio,
+			    const struct mmio_range *ranges,
+			    unsigned long mmio_base)
 {
 	const struct mmio_range *range;
 	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
-	unsigned long base = dist->vgic_dist_base;
 	bool updated_state;
 	unsigned long offset;
 
-	if (!irqchip_in_kernel(vcpu->kvm) ||
-	    mmio->phys_addr < base ||
-	    (mmio->phys_addr + mmio->len) > (base + KVM_VGIC_V2_DIST_SIZE))
-		return false;
-
-	/* We don't support ldrd / strd or ldm / stm to the emulated vgic */
-	if (mmio->len > 4) {
-		kvm_inject_dabt(vcpu, mmio->phys_addr);
-		return true;
-	}
-
-	offset = mmio->phys_addr - base;
-	range = find_matching_range(vgic_dist_ranges, mmio, offset);
+	offset = mmio->phys_addr - mmio_base;
+	range = find_matching_range(ranges, mmio, offset);
 	if (unlikely(!range || !range->handle_mmio)) {
 		pr_warn("Unhandled access %d %08llx %d\n",
 			mmio->is_write, mmio->phys_addr, mmio->len);
@@ -1070,7 +1061,7 @@ bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	}
 
 	spin_lock(&vcpu->kvm->arch.vgic.lock);
-	offset = mmio->phys_addr - range->base - base;
+	offset -= range->base;
 	if (vgic_validate_access(dist, range, offset)) {
 		updated_state = range->handle_mmio(vcpu, mmio, offset);
 	} else {
@@ -1088,6 +1079,48 @@ bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	return true;
 }
 
+static inline bool is_in_range(phys_addr_t addr, unsigned long len,
+			       phys_addr_t baseaddr, unsigned long size)
+{
+	return (addr >= baseaddr) && (addr + len <= baseaddr + size);
+}
+
+static bool vgic_v2_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
+				struct kvm_exit_mmio *mmio)
+{
+	unsigned long base = vcpu->kvm->arch.vgic.vgic_dist_base;
+
+	if (!is_in_range(mmio->phys_addr, mmio->len, base,
+			 KVM_VGIC_V2_DIST_SIZE))
+		return false;
+
+	/* GICv2 does not support accesses wider than 32 bits */
+	if (mmio->len > 4) {
+		kvm_inject_dabt(vcpu, mmio->phys_addr);
+		return true;
+	}
+
+	return vgic_handle_mmio_range(vcpu, run, mmio, vgic_dist_ranges, base);
+}
+
+/**
+ * vgic_handle_mmio - handle an in-kernel MMIO access for the GIC emulation
+ * @vcpu:      pointer to the vcpu performing the access
+ * @run:       pointer to the kvm_run structure
+ * @mmio:      pointer to the data describing the access
+ *
+ * returns true if the MMIO access has been performed in kernel space,
+ * and false if it needs to be emulated in user space.
+ */
+bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
+		      struct kvm_exit_mmio *mmio)
+{
+	if (!irqchip_in_kernel(vcpu->kvm))
+		return false;
+
+	return vgic_v2_handle_mmio(vcpu, run, mmio);
+}
+
 static u8 *vgic_get_sgi_sources(struct vgic_dist *dist, int vcpu_id, int sgi)
 {
 	return dist->irq_sgi_sources + vcpu_id * VGIC_NR_SGIS + sgi;
-- 
1.7.9.5


* [PATCH v4 04/19] arm/arm64: KVM: wrap 64 bit MMIO accesses with two 32 bit ones
From: Andre Przywara @ 2014-11-14 10:07 UTC
  To: linux-arm-kernel

Some GICv3 registers can and will be accessed as 64-bit registers.
Currently the register handling code can only deal with 32-bit
accesses, so split each 64-bit access into two consecutive 32-bit
handler calls and merge the results.
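The split-and-merge scheme can be sketched in user space as follows. struct mmio_access, handler32_t, toy_handler(), toy_regs and call_split_handler() are hypothetical stand-ins for struct kvm_exit_mmio, the vGIC range handlers and call_range_handler(). As in the patch, the data buffer is treated as little-endian, so bytes 0-3 always belong to the register at offset and bytes 4-7 to the one at offset + 4, regardless of host endianness; the toy handler decodes bytes explicitly so the sketch behaves the same on LE and BE hosts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct kvm_exit_mmio. */
struct mmio_access {
	bool is_write;
	unsigned int len;	/* 4 or 8 bytes */
	uint8_t data[8];	/* access data in little-endian byte order */
};

/* A handler that only understands 32-bit accesses, like the vGIC handlers. */
typedef bool (*handler32_t)(struct mmio_access *acc, unsigned long offset);

static uint32_t toy_regs[4];	/* toy register file for the demo handler */

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

/* Demo handler: returns true ("state updated") only for writes. */
static bool toy_handler(struct mmio_access *acc, unsigned long offset)
{
	assert(offset / 4 < 4);
	if (acc->is_write)
		toy_regs[offset / 4] = get_le32(acc->data);
	else
		put_le32(acc->data, toy_regs[offset / 4]);
	return acc->is_write;
}

/* Split an 8-byte access into two 4-byte handler calls, upper word first. */
static bool call_split_handler(handler32_t handle, struct mmio_access *acc,
			       unsigned long offset)
{
	struct mmio_access half = { .is_write = acc->is_write, .len = 4 };
	bool ret;

	assert(acc->len == 4 || acc->len == 8);
	if (acc->len == 4)
		return handle(acc, offset);

	/* Upper word: bytes 4..7 belong to the register at offset + 4. */
	if (acc->is_write)
		memcpy(half.data, acc->data + 4, 4);
	ret = handle(&half, offset + 4);
	if (!acc->is_write)
		memcpy(acc->data + 4, half.data, 4);

	/* Lower word: bytes 0..3 belong to the register at offset. */
	if (acc->is_write)
		memcpy(half.data, acc->data, 4);
	ret |= handle(&half, offset);
	if (!acc->is_write)
		memcpy(acc->data, half.data, 4);

	return ret;
}
```

Writing the bytes 11 22 33 44 55 66 77 88 at offset 0 leaves 0x44332211 in the first register and 0x88776655 in the second, and reading 8 bytes back reproduces the original buffer.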

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
Changelog v3...v4:
- add comment explaining little endian handling

 virt/kvm/arm/vgic.c |   51 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 5eee3de..dba51e4 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1033,6 +1033,51 @@ static bool vgic_validate_access(const struct vgic_dist *dist,
 }
 
 /*
+ * Call the respective handler function for the given range.
+ * We split up any 64 bit accesses into two consecutive 32 bit
+ * handler calls and merge the result afterwards.
+ * We do this in a little endian fashion regardless of the host's
+ * or guest's endianness, because the GIC is always LE and the rest of
+ * the code (vgic_reg_access) also puts it in a LE fashion already.
+ */
+static bool call_range_handler(struct kvm_vcpu *vcpu,
+			       struct kvm_exit_mmio *mmio,
+			       unsigned long offset,
+			       const struct mmio_range *range)
+{
+	u32 *data32 = (void *)mmio->data;
+	struct kvm_exit_mmio mmio32;
+	bool ret;
+
+	if (likely(mmio->len <= 4))
+		return range->handle_mmio(vcpu, mmio, offset);
+
+	/*
+	 * Any access bigger than 4 bytes (that we currently handle in KVM)
+	 * is actually 8 bytes long, caused by a 64-bit access
+	 */
+
+	mmio32.len = 4;
+	mmio32.is_write = mmio->is_write;
+
+	mmio32.phys_addr = mmio->phys_addr + 4;
+	if (mmio->is_write)
+		*(u32 *)mmio32.data = data32[1];
+	ret = range->handle_mmio(vcpu, &mmio32, offset + 4);
+	if (!mmio->is_write)
+		data32[1] = *(u32 *)mmio32.data;
+
+	mmio32.phys_addr = mmio->phys_addr;
+	if (mmio->is_write)
+		*(u32 *)mmio32.data = data32[0];
+	ret |= range->handle_mmio(vcpu, &mmio32, offset);
+	if (!mmio->is_write)
+		data32[0] = *(u32 *)mmio32.data;
+
+	return ret;
+}
+
+/*
  * vgic_handle_mmio_range - handle an in-kernel MMIO access
  * @vcpu:	pointer to the vcpu performing the access
  * @run:	pointer to the kvm_run structure
@@ -1063,10 +1108,10 @@ static bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	spin_lock(&vcpu->kvm->arch.vgic.lock);
 	offset -= range->base;
 	if (vgic_validate_access(dist, range, offset)) {
-		updated_state = range->handle_mmio(vcpu, mmio, offset);
+		updated_state = call_range_handler(vcpu, mmio, offset, range);
 	} else {
-		vgic_reg_access(mmio, NULL, offset,
-				ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
+		if (!mmio->is_write)
+			memset(mmio->data, 0, mmio->len);
 		updated_state = false;
 	}
 	spin_unlock(&vcpu->kvm->arch.vgic.lock);
-- 
1.7.9.5


* [PATCH v4 05/19] arm/arm64: KVM: introduce per-VM ops
From: Andre Przywara @ 2014-11-14 10:07 UTC
  To: linux-arm-kernel

Currently only one virtual GIC model is supported, so all guests use
the same emulation code. With the addition of a second model,
different guests may use different vGIC models, so some functions
have to become per-VM.
Introduce a vgic_vm_ops struct holding function pointers for the
functions that differ between models, and provide the code to
initialize them.
Also split up the kvm_vgic_init() function to move the VGIC model
specific functionality into a separate function, which will later be
different for a GICv3 model.
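The per-VM dispatch pattern can be illustrated with a heavily simplified standalone C sketch. All names (struct vm, struct vm_ops, select_vgic_model(), the MODEL_* constants) are hypothetical stand-ins loosely modeled on struct vgic_vm_ops and vgic_handle_mmio(); the frame sizes (4 KiB GICv2 distributor, 64 KiB GICv3 distributor) and vCPU limits are illustrative values, not code taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum vgic_model { MODEL_GICV2, MODEL_GICV3 };	/* hypothetical stand-ins */

struct vm;

/* Trimmed-down analogue of struct vgic_vm_ops. */
struct vm_ops {
	bool (*handle_mmio)(struct vm *vm, unsigned long offset);
	int  (*vgic_init)(struct vm *vm);
};

struct vm {
	enum vgic_model model;
	struct vm_ops ops;		/* filled in once per VM */
	int max_vcpus;
	unsigned long last_offset;	/* demo bookkeeping only */
};

static bool v2_handle_mmio(struct vm *vm, unsigned long offset)
{
	vm->last_offset = offset;
	return offset < 0x1000;		/* GICv2 distributor: 4 KiB frame */
}

static bool v3_handle_mmio(struct vm *vm, unsigned long offset)
{
	vm->last_offset = offset;
	return offset < 0x10000;	/* GICv3 distributor: 64 KiB frame */
}

static int v2_init(struct vm *vm) { vm->max_vcpus = 8;   return 0; }
static int v3_init(struct vm *vm) { vm->max_vcpus = 255; return 0; }

/* Pick the ops for the model userspace asked for, then run its init. */
static int select_vgic_model(struct vm *vm, enum vgic_model model)
{
	vm->model = model;
	switch (model) {
	case MODEL_GICV2:
		vm->ops.handle_mmio = v2_handle_mmio;
		vm->ops.vgic_init = v2_init;
		break;
	case MODEL_GICV3:
		vm->ops.handle_mmio = v3_handle_mmio;
		vm->ops.vgic_init = v3_init;
		break;
	default:
		return -1;	/* unknown model */
	}
	return vm->ops.vgic_init(vm);
}

/* Generic entry point: callers don't need to know which model is in use. */
static bool handle_mmio(struct vm *vm, unsigned long offset)
{
	assert(vm->ops.handle_mmio != NULL);	/* model must be selected first */
	return vm->ops.handle_mmio(vm, offset);
}
```

In this sketch the ops are chosen once, when the model is selected, so the MMIO path costs a single indirect call rather than a model check on every access.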

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
Changelog v3...v4:
- add accessor functions for vm_ops members
- introduce init_vgic_model() to differentiate between guest GIC models
- simplify vgic_v2_init_emulation() 
- help debugging by hinting on handle_mmio codeflow in comment

 include/kvm/arm_vgic.h |   10 +++++
 virt/kvm/arm/vgic.c    |  114 ++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 102 insertions(+), 22 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index dde5a00..bfb660a 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -134,6 +134,14 @@ struct vgic_params {
 	void __iomem	*vctrl_base;
 };
 
+struct vgic_vm_ops {
+	bool	(*handle_mmio)(struct kvm_vcpu *, struct kvm_run *,
+			       struct kvm_exit_mmio *);
+	bool	(*queue_sgi)(struct kvm_vcpu *vcpu, int irq);
+	void	(*add_sgi_source)(struct kvm_vcpu *vcpu, int irq, int source);
+	int	(*vgic_init)(struct kvm *kvm, const struct vgic_params *params);
+};
+
 struct vgic_dist {
 #ifdef CONFIG_KVM_ARM_VGIC
 	spinlock_t		lock;
@@ -215,6 +223,8 @@ struct vgic_dist {
 
 	/* Bitmap indicating which CPU has something pending */
 	unsigned long		*irq_pending_on_cpu;
+
+	struct vgic_vm_ops	vm_ops;
 #endif
 };
 
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index dba51e4..963b84e 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -105,6 +105,21 @@ static void vgic_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
 static const struct vgic_ops *vgic_ops;
 static const struct vgic_params *vgic;
 
+static void add_sgi_source(struct kvm_vcpu *vcpu, int irq, int source)
+{
+	vcpu->kvm->arch.vgic.vm_ops.add_sgi_source(vcpu, irq, source);
+}
+
+static bool queue_sgi(struct kvm_vcpu *vcpu, int irq)
+{
+	return vcpu->kvm->arch.vgic.vm_ops.queue_sgi(vcpu, irq);
+}
+
+static int vgic_init(struct kvm *kvm, const struct vgic_params *params)
+{
+	return kvm->arch.vgic.vm_ops.vgic_init(kvm, params);
+}
+
 /*
  * struct vgic_bitmap contains a bitmap made of unsigned longs, but
  * extracts u32s out of them.
@@ -761,6 +776,13 @@ static bool handle_mmio_sgi_reg(struct kvm_vcpu *vcpu,
 	return false;
 }
 
+static void vgic_v2_add_sgi_source(struct kvm_vcpu *vcpu, int irq, int source)
+{
+	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+
+	*vgic_get_sgi_sources(dist, vcpu->vcpu_id, irq) |= 1 << source;
+}
+
 /**
  * vgic_unqueue_irqs - move pending IRQs from LRs to the distributor
  * @vgic_cpu: Pointer to the vgic_cpu struct holding the LRs
@@ -775,9 +797,7 @@ static bool handle_mmio_sgi_reg(struct kvm_vcpu *vcpu,
  */
 static void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
 {
-	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
-	int vcpu_id = vcpu->vcpu_id;
 	int i;
 
 	for_each_set_bit(i, vgic_cpu->lr_used, vgic_cpu->nr_lr) {
@@ -804,7 +824,7 @@ static void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
 		 */
 		vgic_dist_irq_set_pending(vcpu, lr.irq);
 		if (lr.irq < VGIC_NR_SGIS)
-			*vgic_get_sgi_sources(dist, vcpu_id, lr.irq) |= 1 << lr.source;
+			add_sgi_source(vcpu, lr.irq, lr.source);
 		lr.state &= ~LR_STATE_PENDING;
 		vgic_set_lr(vcpu, i, lr);
 
@@ -1156,6 +1176,7 @@ static bool vgic_v2_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
  *
  * returns true if the MMIO access has been performed in kernel space,
  * and false if it needs to be emulated in user space.
+ * Calls the actual handling routine for the selected VGIC model.
  */
 bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		      struct kvm_exit_mmio *mmio)
@@ -1163,7 +1184,12 @@ bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	if (!irqchip_in_kernel(vcpu->kvm))
 		return false;
 
-	return vgic_v2_handle_mmio(vcpu, run, mmio);
+	/*
+	 * This will currently call either vgic_v2_handle_mmio() or
+	 * vgic_v3_handle_mmio(), which in turn will call
+	 * vgic_handle_mmio_range() defined above.
+	 */
+	return vcpu->kvm->arch.vgic.vm_ops.handle_mmio(vcpu, run, mmio);
 }
 
 static u8 *vgic_get_sgi_sources(struct vgic_dist *dist, int vcpu_id, int sgi)
@@ -1415,7 +1441,7 @@ static bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 sgi_source_id, int irq)
 	return true;
 }
 
-static bool vgic_queue_sgi(struct kvm_vcpu *vcpu, int irq)
+static bool vgic_v2_queue_sgi(struct kvm_vcpu *vcpu, int irq)
 {
 	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 	unsigned long sources;
@@ -1490,7 +1516,7 @@ static void __kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
 
 	/* SGIs */
 	for_each_set_bit(i, vgic_cpu->pending_percpu, VGIC_NR_SGIS) {
-		if (!vgic_queue_sgi(vcpu, i))
+		if (!queue_sgi(vcpu, i))
 			overflow = 1;
 	}
 
@@ -1945,9 +1971,6 @@ static int vgic_init_maps(struct kvm *kvm)
 		}
 	}
 
-	for (i = VGIC_NR_PRIVATE_IRQS; i < dist->nr_irqs; i += 4)
-		vgic_set_target_reg(kvm, 0, i);
-
 out:
 	if (ret)
 		kvm_vgic_destroy(kvm);
@@ -1955,6 +1978,32 @@ out:
 	return ret;
 }
 
+static int vgic_v2_init(struct kvm *kvm, const struct vgic_params *params)
+{
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	int ret, i;
+
+	if (IS_VGIC_ADDR_UNDEF(dist->vgic_dist_base) ||
+	    IS_VGIC_ADDR_UNDEF(dist->vgic_cpu_base)) {
+		kvm_err("Need to set vgic distributor addresses first\n");
+		return -ENXIO;
+	}
+
+	ret = kvm_phys_addr_ioremap(kvm, dist->vgic_cpu_base,
+				    params->vcpu_base,
+				    KVM_VGIC_V2_CPU_SIZE, true);
+	if (ret) {
+		kvm_err("Unable to remap VGIC CPU to VCPU\n");
+		return ret;
+	}
+
+	/* Initialize the target VCPUs for each IRQ to VCPU 0 */
+	for (i = VGIC_NR_PRIVATE_IRQS; i < dist->nr_irqs; i += 4)
+		vgic_set_target_reg(kvm, 0, i);
+
+	return 0;
+}
+
 /**
  * kvm_vgic_init - Initialize global VGIC state before running any VCPUs
  * @kvm: pointer to the kvm struct
@@ -1977,26 +2026,15 @@ int kvm_vgic_init(struct kvm *kvm)
 	if (vgic_initialized(kvm))
 		goto out;
 
-	if (IS_VGIC_ADDR_UNDEF(kvm->arch.vgic.vgic_dist_base) ||
-	    IS_VGIC_ADDR_UNDEF(kvm->arch.vgic.vgic_cpu_base)) {
-		kvm_err("Need to set vgic cpu and dist addresses first\n");
-		ret = -ENXIO;
-		goto out;
-	}
-
 	ret = vgic_init_maps(kvm);
 	if (ret) {
 		kvm_err("Unable to allocate maps\n");
 		goto out;
 	}
 
-	ret = kvm_phys_addr_ioremap(kvm, kvm->arch.vgic.vgic_cpu_base,
-				    vgic->vcpu_base, KVM_VGIC_V2_CPU_SIZE,
-				    true);
-	if (ret) {
-		kvm_err("Unable to remap VGIC CPU to VCPU\n");
+	ret = vgic_init(kvm, vgic);
+	if (ret)
 		goto out;
-	}
 
 	kvm_for_each_vcpu(i, vcpu, kvm)
 		kvm_vgic_vcpu_init(vcpu);
@@ -2009,6 +2047,34 @@ out:
 	return ret;
 }
 
+static int vgic_v2_init_emulation(struct kvm *kvm)
+{
+	struct vgic_dist *dist = &kvm->arch.vgic;
+
+	dist->vm_ops.handle_mmio = vgic_v2_handle_mmio;
+	dist->vm_ops.queue_sgi = vgic_v2_queue_sgi;
+	dist->vm_ops.add_sgi_source = vgic_v2_add_sgi_source;
+	dist->vm_ops.vgic_init = vgic_v2_init;
+
+	return 0;
+}
+
+static int init_vgic_model(struct kvm *kvm, int type)
+{
+	int ret;
+
+	switch (type) {
+	case KVM_DEV_TYPE_ARM_VGIC_V2:
+		ret = vgic_v2_init_emulation(kvm);
+		break;
+	default:
+		ret = -ENODEV;
+		break;
+	}
+
+	return ret;
+}
+
 int kvm_vgic_create(struct kvm *kvm, u32 type)
 {
 	int i, vcpu_lock_idx = -1, ret = 0;
@@ -2039,6 +2105,10 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
 		}
 	}
 
+	ret = init_vgic_model(kvm, type);
+	if (ret)
+		goto out_unlock;
+
 	spin_lock_init(&kvm->arch.vgic.lock);
 	kvm->arch.vgic.in_kernel = true;
 	kvm->arch.vgic.vgic_model = type;
-- 
1.7.9.5


* [PATCH v4 06/19] arm/arm64: KVM: move kvm_register_device_ops() into vGIC probing
  2014-11-14 10:07 [PATCH v4 00/19] KVM GICv3 emulation Andre Przywara
                   ` (4 preceding siblings ...)
  2014-11-14 10:07 ` [PATCH v4 05/19] arm/arm64: KVM: introduce per-VM ops Andre Przywara
@ 2014-11-14 10:07 ` Andre Przywara
  2014-11-18 10:43   ` Eric Auger
  2014-11-14 10:07 ` [PATCH v4 07/19] arm/arm64: KVM: dont rely on a valid GICH base address Andre Przywara
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 80+ messages in thread
From: Andre Przywara @ 2014-11-14 10:07 UTC (permalink / raw)
  To: linux-arm-kernel

Currently we unconditionally register the GICv2 emulation device
during the host's KVM initialization. With GICv3 support, a host may
be able to offer only the v2 emulation, only the v3 emulation, or
both, so we move the registration into the GIC probing functions,
where we will later know which combination is valid.
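To illustrate the idea (this is not the kernel code), here is a minimal userspace sketch in which each host-GIC probe registers only the emulation it can back. All names (mock_*, MOCK_DEV_*) are hypothetical stand-ins for kvm_register_device_ops() and the KVM_DEV_TYPE_* values. The assumption that the GICv3 probe's vcpu resource check guards a GICv2-compatible CPU interface is also labeled as such.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical device type IDs; the kernel uses KVM_DEV_TYPE_* values. */
enum mock_dev_type { MOCK_DEV_VGIC_V2, MOCK_DEV_VGIC_V3, MOCK_DEV_MAX };

struct mock_dev_ops {
	const char *name;
};

static const struct mock_dev_ops mock_vgic_v2_ops = { .name = "kvm-arm-vgic" };

/* Device types userspace may instantiate; NULL means "not offered". */
static const struct mock_dev_ops *mock_dev_table[MOCK_DEV_MAX];

static int mock_register_device_ops(const struct mock_dev_ops *ops,
				    enum mock_dev_type type)
{
	if (type >= MOCK_DEV_MAX)
		return -ENOSPC;
	if (mock_dev_table[type])
		return -EEXIST;		/* already registered */
	mock_dev_table[type] = ops;
	return 0;
}

/* GICv2 host: the GICv2 emulation is always available. */
static int mock_vgic_v2_probe(void)
{
	return mock_register_device_ops(&mock_vgic_v2_ops, MOCK_DEV_VGIC_V2);
}

/*
 * GICv3 host: assumed here to offer GICv2 emulation only when a
 * GICv2-compatible virtual CPU interface (the vcpu resource) exists.
 */
static int mock_vgic_v3_probe(int has_gicv2_cpuif)
{
	if (!has_gicv2_cpuif)
		return -ENXIO;
	return mock_register_device_ops(&mock_vgic_v2_ops, MOCK_DEV_VGIC_V2);
}

static int mock_device_supported(enum mock_dev_type type)
{
	return type < MOCK_DEV_MAX && mock_dev_table[type] != NULL;
}
```

Userspace can then only create the device types that some probe actually registered. In this mock a failed GICv3 probe leaves GICv2 emulation unavailable rather than advertising a device that cannot work.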

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
---
Changelog v3...v4:
- add Acked-by

 include/linux/kvm_host.h |    1 +
 virt/kvm/arm/vgic-v2.c   |    2 ++
 virt/kvm/arm/vgic-v3.c   |    1 +
 virt/kvm/arm/vgic.c      |    5 ++---
 4 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ea53b04..326ba7a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1084,6 +1084,7 @@ void kvm_unregister_device_ops(u32 type);
 
 extern struct kvm_device_ops kvm_mpic_ops;
 extern struct kvm_device_ops kvm_xics_ops;
+extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
 
 #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
 
diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
index 2935405..e1cd3cb 100644
--- a/virt/kvm/arm/vgic-v2.c
+++ b/virt/kvm/arm/vgic-v2.c
@@ -229,6 +229,8 @@ int vgic_v2_probe(struct device_node *vgic_node,
 		goto out_unmap;
 	}
 
+	kvm_register_device_ops(&kvm_arm_vgic_v2_ops, KVM_DEV_TYPE_ARM_VGIC_V2);
+
 	vgic->vcpu_base = vcpu_res.start;
 
 	kvm_info("%s@%llx IRQ%d\n", vgic_node->name,
diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
index 1c2c8ee..d14c75f 100644
--- a/virt/kvm/arm/vgic-v3.c
+++ b/virt/kvm/arm/vgic-v3.c
@@ -230,6 +230,7 @@ int vgic_v3_probe(struct device_node *vgic_node,
 		ret = -ENXIO;
 		goto out;
 	}
+	kvm_register_device_ops(&kvm_arm_vgic_v2_ops, KVM_DEV_TYPE_ARM_VGIC_V2);
 
 	vgic->vcpu_base = vcpu_res.start;
 	vgic->vctrl_base = NULL;
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 963b84e..e8003ca 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -2540,7 +2540,7 @@ static int vgic_create(struct kvm_device *dev, u32 type)
 	return kvm_vgic_create(dev->kvm, type);
 }
 
-static struct kvm_device_ops kvm_arm_vgic_v2_ops = {
+struct kvm_device_ops kvm_arm_vgic_v2_ops = {
 	.name = "kvm-arm-vgic",
 	.create = vgic_create,
 	.destroy = vgic_destroy,
@@ -2619,8 +2619,7 @@ int kvm_vgic_hyp_init(void)
 
 	on_each_cpu(vgic_init_maintenance_interrupt, NULL, 1);
 
-	return kvm_register_device_ops(&kvm_arm_vgic_v2_ops,
-				       KVM_DEV_TYPE_ARM_VGIC_V2);
+	return 0;
 
 out_free_irq:
 	free_percpu_irq(vgic->maint_irq, kvm_get_running_vcpus());
-- 
1.7.9.5


* [PATCH v4 07/19] arm/arm64: KVM: dont rely on a valid GICH base address
  2014-11-14 10:07 [PATCH v4 00/19] KVM GICv3 emulation Andre Przywara
                   ` (5 preceding siblings ...)
  2014-11-14 10:07 ` [PATCH v4 06/19] arm/arm64: KVM: move kvm_register_device_ops() into vGIC probing Andre Przywara
@ 2014-11-14 10:07 ` Andre Przywara
  2014-11-14 10:07 ` [PATCH v4 08/19] arm/arm64: KVM: make the maximum number of vCPUs a per-VM value Andre Przywara
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 80+ messages in thread
From: Andre Przywara @ 2014-11-14 10:07 UTC (permalink / raw)
  To: linux-arm-kernel

To check whether the vGIC was already initialized, we currently check
whether the GICH base address is non-NULL. Since with GICv3 we may
get along without this address, let's use the irqchip_in_kernel()
function to detect an already initialized vGIC.
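As a hedged illustration only, the following userspace sketch shows why testing an explicit "in kernel" flag is more robust than testing a pointer that a GICv3 host may legitimately leave NULL. The struct and function names are hypothetical simplifications of struct vgic_dist, irqchip_in_kernel() and kvm_vgic_create().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for the per-VM distributor state. */
struct mock_vgic_dist {
	bool in_kernel;		/* set once an in-kernel vGIC exists */
	void *vctrl_base;	/* GICH base; may stay NULL on a GICv3 host */
};

/* Same idea as irqchip_in_kernel(): test the flag, not a pointer. */
static bool mock_irqchip_in_kernel(const struct mock_vgic_dist *dist)
{
	return dist->in_kernel;
}

static int mock_vgic_create(struct mock_vgic_dist *dist)
{
	if (mock_irqchip_in_kernel(dist))
		return -EEXIST;	/* a vGIC was already created for this VM */
	dist->in_kernel = true;
	return 0;
}
```

With the old pointer test, a second create would slip through whenever the GICH base is NULL, which is exactly what vgic_v3_probe() sets (vgic->vctrl_base = NULL).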

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
---
Changelog v3...v4:
- add Acked-by

 virt/kvm/arm/vgic.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index e8003ca..4aa0b2f 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -2082,7 +2082,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
 
 	mutex_lock(&kvm->lock);
 
-	if (kvm->arch.vgic.vctrl_base) {
+	if (irqchip_in_kernel(kvm)) {
 		ret = -EEXIST;
 		goto out;
 	}
-- 
1.7.9.5


* [PATCH v4 08/19] arm/arm64: KVM: make the maximum number of vCPUs a per-VM value
  2014-11-14 10:07 [PATCH v4 00/19] KVM GICv3 emulation Andre Przywara
                   ` (6 preceding siblings ...)
  2014-11-14 10:07 ` [PATCH v4 07/19] arm/arm64: KVM: dont rely on a valid GICH base address Andre Przywara
@ 2014-11-14 10:07 ` Andre Przywara
  2014-11-23 13:21   ` Christoffer Dall
  2014-11-14 10:07 ` [PATCH v4 09/19] arm/arm64: KVM: make the value of ICC_SRE_EL1 a per-VM variable Andre Przywara
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 80+ messages in thread
From: Andre Przywara @ 2014-11-14 10:07 UTC (permalink / raw)
  To: linux-arm-kernel

Currently the maximum number of vCPUs supported is a global value
limited by the GIC model in use. GICv3 will lift this limit, but we
still need to observe it for guests using GICv2.
So the maximum number of vCPUs becomes a per-VM value, depending on
the GIC model the guest uses.
Store and check the value in struct kvm_arch, but keep it down to
8 for now.
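A minimal userspace sketch of the per-VM limit's life cycle follows. The VM starts with the host GIC's limit, choosing an emulated GICv2 narrows it to 8, and vGIC creation fails if more vCPUs already exist. Every name here is hypothetical, and MOCK_KVM_MAX_VCPUS is a placeholder, not the real KVM_MAX_VCPUS value.

```c
#include <errno.h>

#define MOCK_KVM_MAX_VCPUS 255	/* placeholder host-wide upper bound */

enum mock_gic_type { MOCK_GIC_V2, MOCK_GIC_V3 };

struct mock_kvm {
	int max_vcpus;
	int online_vcpus;
};

/* Limit imposed by the host GIC (cf. vgic->max_hw_vcpus in the patch). */
static int mock_host_max_vcpus(enum mock_gic_type host)
{
	return host == MOCK_GIC_V2 ? 8 : MOCK_KVM_MAX_VCPUS;
}

/* VM creation: start from the host's limit. */
static void mock_init_vm(struct mock_kvm *kvm, enum mock_gic_type host)
{
	kvm->max_vcpus = mock_host_max_vcpus(host);
	kvm->online_vcpus = 0;
}

/* vCPU creation: reject IDs beyond the current per-VM limit. */
static int mock_vcpu_create(struct mock_kvm *kvm, unsigned int id)
{
	if (id >= (unsigned int)kvm->max_vcpus)
		return -EINVAL;
	kvm->online_vcpus++;
	return 0;
}

/* vGIC creation: an emulated GICv2 narrows the limit to 8 ... */
static int mock_vgic_create(struct mock_kvm *kvm, enum mock_gic_type model)
{
	if (model == MOCK_GIC_V2)
		kvm->max_vcpus = 8;
	/* ... and fails if userspace already created more vCPUs than that. */
	if (kvm->max_vcpus < kvm->online_vcpus)
		return -EINVAL;
	return 0;
}
```

The ordering matters because userspace may create vCPUs before or after the vGIC device. The check at vGIC creation time catches the first case, and the per-VM limit catches the second.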

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
Changelog v3...v4:
- initialize max_vcpus with limit based on host GIC
- remove *_init_emul_* from VGIC backend
- refine VCPU limit on VGIC creation
- print warning when userland tries to create more VCPUs than supported

 arch/arm/include/asm/kvm_host.h   |    1 +
 arch/arm/kvm/arm.c                |    8 ++++++++
 arch/arm64/include/asm/kvm_host.h |    3 +++
 include/kvm/arm_vgic.h            |    2 ++
 virt/kvm/arm/vgic-v2.c            |    1 +
 virt/kvm/arm/vgic-v3.c            |    1 +
 virt/kvm/arm/vgic.c               |   22 ++++++++++++++++++++++
 7 files changed, 38 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index b443dfe..7969e6e 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -68,6 +68,7 @@ struct kvm_arch {
 
 	/* Interrupt controller */
 	struct vgic_dist	vgic;
+	int max_vcpus;
 };
 
 #define KVM_NR_MEM_OBJS     40
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 8817fbd..c3d0fbd 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -132,6 +132,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	/* Mark the initial VMID generation invalid */
 	kvm->arch.vmid_gen = 0;
 
+	/* The maximum number of VCPUs is limited by the host's GIC model */
+	kvm->arch.max_vcpus = kvm_vgic_get_max_vcpus();
+
 	return ret;
 out_free_stage2_pgd:
 	kvm_free_stage2_pgd(kvm);
@@ -213,6 +216,11 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 	int err;
 	struct kvm_vcpu *vcpu;
 
+	if (id >= kvm->arch.max_vcpus) {
+		err = -EINVAL;
+		goto out;
+	}
+
 	vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
 	if (!vcpu) {
 		err = -ENOMEM;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 286bb61..f9e130d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -59,6 +59,9 @@ struct kvm_arch {
 	/* VTTBR value associated with above pgd and vmid */
 	u64    vttbr;
 
+	/* The maximum number of vCPUs depends on the used GIC model */
+	int max_vcpus;
+
 	/* Interrupt controller */
 	struct vgic_dist	vgic;
 
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index bfb660a..09344ac 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -132,6 +132,7 @@ struct vgic_params {
 	unsigned int	maint_irq;
 	/* Virtual control interface base address */
 	void __iomem	*vctrl_base;
+	int		max_hw_vcpus;
 };
 
 struct vgic_vm_ops {
@@ -287,6 +288,7 @@ struct kvm_exit_mmio;
 #ifdef CONFIG_KVM_ARM_VGIC
 int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write);
 int kvm_vgic_hyp_init(void);
+int kvm_vgic_get_max_vcpus(void);
 int kvm_vgic_init(struct kvm *kvm);
 int kvm_vgic_create(struct kvm *kvm, u32 type);
 void kvm_vgic_destroy(struct kvm *kvm);
diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
index e1cd3cb..49fb288 100644
--- a/virt/kvm/arm/vgic-v2.c
+++ b/virt/kvm/arm/vgic-v2.c
@@ -237,6 +237,7 @@ int vgic_v2_probe(struct device_node *vgic_node,
 		 vctrl_res.start, vgic->maint_irq);
 
 	vgic->type = VGIC_V2;
+	vgic->max_hw_vcpus = 8;
 	*ops = &vgic_v2_ops;
 	*params = vgic;
 	goto out;
diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
index d14c75f..acd256c7 100644
--- a/virt/kvm/arm/vgic-v3.c
+++ b/virt/kvm/arm/vgic-v3.c
@@ -235,6 +235,7 @@ int vgic_v3_probe(struct device_node *vgic_node,
 	vgic->vcpu_base = vcpu_res.start;
 	vgic->vctrl_base = NULL;
 	vgic->type = VGIC_V3;
+	vgic->max_hw_vcpus = KVM_MAX_VCPUS;
 
 	kvm_info("%s@%llx IRQ%d\n", vgic_node->name,
 		 vcpu_res.start, vgic->maint_irq);
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 4aa0b2f..4c72c66 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1841,6 +1841,17 @@ static int vgic_vcpu_init_maps(struct kvm_vcpu *vcpu, int nr_irqs)
 }
 
 /**
+ * kvm_vgic_get_max_vcpus - Get the maximum number of VCPUs allowed by HW
+ *
+ * The host's GIC naturally limits the maximum amount of VCPUs a guest
+ * can use.
+ */
+int kvm_vgic_get_max_vcpus(void)
+{
+	return vgic->max_hw_vcpus;
+}
+
+/**
  * kvm_vgic_vcpu_init - Initialize per-vcpu VGIC state
  * @vcpu: pointer to the vcpu struct
  *
@@ -2056,6 +2067,8 @@ static int vgic_v2_init_emulation(struct kvm *kvm)
 	dist->vm_ops.add_sgi_source = vgic_v2_add_sgi_source;
 	dist->vm_ops.vgic_init = vgic_v2_init;
 
+	kvm->arch.max_vcpus = 8;
+
 	return 0;
 }
 
@@ -2072,6 +2085,15 @@ static int init_vgic_model(struct kvm *kvm, int type)
 		break;
 	}
 
+	if (ret)
+		return ret;
+
+	if (kvm->arch.max_vcpus < atomic_read(&kvm->online_vcpus)) {
+		pr_warn_ratelimited("VGIC model only supports up to %d vCPUs\n",
+			kvm->arch.max_vcpus);
+		ret = -EINVAL;
+	}
+
 	return ret;
 }
 
-- 
1.7.9.5


* [PATCH v4 09/19] arm/arm64: KVM: make the value of ICC_SRE_EL1 a per-VM variable
  2014-11-14 10:07 [PATCH v4 00/19] KVM GICv3 emulation Andre Przywara
                   ` (7 preceding siblings ...)
  2014-11-14 10:07 ` [PATCH v4 08/19] arm/arm64: KVM: make the maximum number of vCPUs a per-VM value Andre Przywara
@ 2014-11-14 10:07 ` Andre Przywara
  2014-11-14 10:07 ` [PATCH v4 10/19] arm/arm64: KVM: refactor MMIO accessors Andre Przywara
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 80+ messages in thread
From: Andre Przywara @ 2014-11-14 10:07 UTC (permalink / raw)
  To: linux-arm-kernel

ICC_SRE_EL1 is a system register allowing msr/mrs accesses to the
GIC CPU interface for EL1 (guests). Currently we force it to 0, but
for proper GICv3 support we have to allow guests to use it (depending
on their selected virtual GIC model).
So add ICC_SRE_EL1 to the list of saved/restored registers on a
world switch, but prevent the guest from actually changing it by only
restoring a fixed, once-initialized value.
This value depends on the GIC model userland has chosen for a guest.
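The "restored only, change ignored" policy can be modeled in plain C as a sketch. Two globals stand in for the physical registers, and only VMCR is copied back on exit, so a guest write to SRE never reaches the saved state. The names are hypothetical, and the real switch code is the assembly in vgic-v3-switch.S.

```c
#include <stdint.h>

/* Hypothetical per-vCPU GICv3 CPU interface state (cf. vgic_v3_cpu_if). */
struct mock_v3_cpu_if {
	uint32_t vgic_vmcr;	/* saved and restored */
	uint32_t vgic_sre;	/* restored only; guest changes are ignored */
};

/* Stand-ins for the physical registers while the guest runs. */
static uint32_t hw_icc_sre_el1;
static uint32_t hw_ich_vmcr_el2;

/* Entering the guest: program SRE first, then the remaining state. */
static void mock_restore(const struct mock_v3_cpu_if *cpu_if)
{
	hw_icc_sre_el1 = cpu_if->vgic_sre;
	/* the real code issues an isb here before writing other registers */
	hw_ich_vmcr_el2 = cpu_if->vgic_vmcr;
}

/* Leaving the guest: VMCR is saved back, SRE deliberately is not. */
static void mock_save(struct mock_v3_cpu_if *cpu_if)
{
	cpu_if->vgic_vmcr = hw_ich_vmcr_el2;
}
```

Because the saved copy of SRE is never updated from hardware, every world switch reinstalls the value chosen at vCPU initialization.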

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
---
Changelog v3...v4:
- initialize variable on declaration

 arch/arm64/kernel/asm-offsets.c |    1 +
 arch/arm64/kvm/vgic-v3-switch.S |   14 +++++++++-----
 include/kvm/arm_vgic.h          |    1 +
 virt/kvm/arm/vgic-v3.c          |    8 ++++++--
 4 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 9a9fce0..9d34486 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -140,6 +140,7 @@ int main(void)
   DEFINE(VGIC_V2_CPU_ELRSR,	offsetof(struct vgic_cpu, vgic_v2.vgic_elrsr));
   DEFINE(VGIC_V2_CPU_APR,	offsetof(struct vgic_cpu, vgic_v2.vgic_apr));
   DEFINE(VGIC_V2_CPU_LR,	offsetof(struct vgic_cpu, vgic_v2.vgic_lr));
+  DEFINE(VGIC_V3_CPU_SRE,	offsetof(struct vgic_cpu, vgic_v3.vgic_sre));
   DEFINE(VGIC_V3_CPU_HCR,	offsetof(struct vgic_cpu, vgic_v3.vgic_hcr));
   DEFINE(VGIC_V3_CPU_VMCR,	offsetof(struct vgic_cpu, vgic_v3.vgic_vmcr));
   DEFINE(VGIC_V3_CPU_MISR,	offsetof(struct vgic_cpu, vgic_v3.vgic_misr));
diff --git a/arch/arm64/kvm/vgic-v3-switch.S b/arch/arm64/kvm/vgic-v3-switch.S
index d160469..617a012 100644
--- a/arch/arm64/kvm/vgic-v3-switch.S
+++ b/arch/arm64/kvm/vgic-v3-switch.S
@@ -148,17 +148,18 @@
  * x0: Register pointing to VCPU struct
  */
 .macro	restore_vgic_v3_state
-	// Disable SRE_EL1 access. Necessary, otherwise
-	// ICH_VMCR_EL2.VFIQEn becomes one, and FIQ happens...
-	msr_s	ICC_SRE_EL1, xzr
-	isb
-
 	// Compute the address of struct vgic_cpu
 	add	x3, x0, #VCPU_VGIC_CPU
 
 	// Restore all interesting registers
 	ldr	w4, [x3, #VGIC_V3_CPU_HCR]
 	ldr	w5, [x3, #VGIC_V3_CPU_VMCR]
+	ldr	w25, [x3, #VGIC_V3_CPU_SRE]
+
+	msr_s	ICC_SRE_EL1, x25
+
+	// make sure SRE is valid before writing the other registers
+	isb
 
 	msr_s	ICH_HCR_EL2, x4
 	msr_s	ICH_VMCR_EL2, x5
@@ -244,9 +245,12 @@
 	dsb	sy
 
 	// Prevent the guest from touching the GIC system registers
+	// if SRE isn't enabled for GICv3 emulation
+	cbnz	x25, 1f
 	mrs_s	x5, ICC_SRE_EL2
 	and	x5, x5, #~ICC_SRE_EL2_ENABLE
 	msr_s	ICC_SRE_EL2, x5
+1:
 .endm
 
 ENTRY(__save_vgic_v3_state)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 09344ac..421833f 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -243,6 +243,7 @@ struct vgic_v3_cpu_if {
 #ifdef CONFIG_ARM_GIC_V3
 	u32		vgic_hcr;
 	u32		vgic_vmcr;
+	u32		vgic_sre;	/* Restored only, change ignored */
 	u32		vgic_misr;	/* Saved only */
 	u32		vgic_eisr;	/* Saved only */
 	u32		vgic_elrsr;	/* Saved only */
diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
index acd256c7..a04d208 100644
--- a/virt/kvm/arm/vgic-v3.c
+++ b/virt/kvm/arm/vgic-v3.c
@@ -145,15 +145,19 @@ static void vgic_v3_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcrp)
 
 static void vgic_v3_enable(struct kvm_vcpu *vcpu)
 {
+	struct vgic_v3_cpu_if *vgic_v3 = &vcpu->arch.vgic_cpu.vgic_v3;
+
 	/*
 	 * By forcing VMCR to zero, the GIC will restore the binary
 	 * points to their reset values. Anything else resets to zero
 	 * anyway.
 	 */
-	vcpu->arch.vgic_cpu.vgic_v3.vgic_vmcr = 0;
+	vgic_v3->vgic_vmcr = 0;
+
+	vgic_v3->vgic_sre = 0;
 
 	/* Get the show on the road... */
-	vcpu->arch.vgic_cpu.vgic_v3.vgic_hcr = ICH_HCR_EN;
+	vgic_v3->vgic_hcr = ICH_HCR_EN;
 }
 
 static const struct vgic_ops vgic_v3_ops = {
-- 
1.7.9.5


* [PATCH v4 10/19] arm/arm64: KVM: refactor MMIO accessors
  2014-11-14 10:07 [PATCH v4 00/19] KVM GICv3 emulation Andre Przywara
                   ` (8 preceding siblings ...)
  2014-11-14 10:07 ` [PATCH v4 09/19] arm/arm64: KVM: make the value of ICC_SRE_EL1 a per-VM variable Andre Przywara
@ 2014-11-14 10:07 ` Andre Przywara
  2014-11-14 10:07 ` [PATCH v4 11/19] arm/arm64: KVM: refactor/wrap vgic_set/get_attr() Andre Przywara
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 80+ messages in thread
From: Andre Przywara @ 2014-11-14 10:07 UTC (permalink / raw)
  To: linux-arm-kernel

The MMIO accessors for GICD_I[CS]ENABLER, GICD_I[CS]PENDR and
GICD_ICFGR behave very similarly for GICv2 and GICv3, although the way
the affected VCPU is determined differs.
Since we will need them to access the registers from three different
places in the future, we factor out a generic, backend-facing
implementation and use small wrappers in the current GICv2 emulation.
This will ease adding GICv3 accessors later.
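The refactoring pattern can be sketched in isolation. A generic handler takes the target vCPU ID and an access mode, and thin GICv2 wrappers pass in the accessing vCPU. This simplified userspace model handles writes to the first (banked) enable word only. The mock_* names and MOCK_WRITE_* flags are hypothetical, loosely modeled on ACCESS_WRITE_SETBIT/ACCESS_WRITE_CLEARBIT.

```c
#include <stdbool.h>
#include <stdint.h>

#define MOCK_NR_VCPUS 4

/* Hypothetical write modes, loosely modeled on ACCESS_WRITE_*. */
#define MOCK_WRITE_SETBIT   (1 << 0)
#define MOCK_WRITE_CLEARBIT (1 << 1)

/* One private enable word per vCPU (SGIs 0-15, PPIs 16-31). */
static uint32_t irq_enabled[MOCK_NR_VCPUS];

/*
 * Generic handler: the caller decides which vCPU is affected, so a
 * GICv2 banked access and a later GICv3 access can share this code.
 */
static bool mock_handle_enable_reg(int vcpu_id, uint32_t val, int access)
{
	uint32_t *reg = &irq_enabled[vcpu_id];

	if (access & MOCK_WRITE_SETBIT)
		*reg |= val;
	if (access & MOCK_WRITE_CLEARBIT) {
		*reg &= ~val;
		*reg |= 0xffff;		/* SGIs are forced to stay enabled */
	}
	return true;
}

/* GICv2-style wrappers: the affected vCPU is the accessing one. */
static bool mock_set_enable(int accessing_vcpu, uint32_t val)
{
	return mock_handle_enable_reg(accessing_vcpu, val, MOCK_WRITE_SETBIT);
}

static bool mock_clear_enable(int accessing_vcpu, uint32_t val)
{
	return mock_handle_enable_reg(accessing_vcpu, val, MOCK_WRITE_CLEARBIT);
}
```

A GICv3 wrapper would call the same generic function but derive vcpu_id from the redistributor being accessed rather than from the accessing vCPU.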

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
---
Changelog v3...v4:
- improve commit message
- add Reviewed-by

 virt/kvm/arm/vgic.c |  126 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 74 insertions(+), 52 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 4c72c66..d324054 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -491,64 +491,66 @@ static bool handle_mmio_raz_wi(struct kvm_vcpu *vcpu,
 	return false;
 }
 
-static bool handle_mmio_set_enable_reg(struct kvm_vcpu *vcpu,
-				       struct kvm_exit_mmio *mmio,
-				       phys_addr_t offset)
+static bool vgic_handle_enable_reg(struct kvm *kvm, struct kvm_exit_mmio *mmio,
+				   phys_addr_t offset, int vcpu_id, int access)
 {
-	u32 *reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_enabled,
-				       vcpu->vcpu_id, offset);
-	vgic_reg_access(mmio, reg, offset,
-			ACCESS_READ_VALUE | ACCESS_WRITE_SETBIT);
+	u32 *reg;
+	int mode = ACCESS_READ_VALUE | access;
+	struct kvm_vcpu *target_vcpu = kvm_get_vcpu(kvm, vcpu_id);
+
+	reg = vgic_bitmap_get_reg(&kvm->arch.vgic.irq_enabled, vcpu_id, offset);
+	vgic_reg_access(mmio, reg, offset, mode);
 	if (mmio->is_write) {
-		vgic_update_state(vcpu->kvm);
+		if (access & ACCESS_WRITE_CLEARBIT) {
+			if (offset < 4) /* Force SGI enabled */
+				*reg |= 0xffff;
+			vgic_retire_disabled_irqs(target_vcpu);
+		}
+		vgic_update_state(kvm);
 		return true;
 	}
 
 	return false;
 }
 
+static bool handle_mmio_set_enable_reg(struct kvm_vcpu *vcpu,
+				       struct kvm_exit_mmio *mmio,
+				       phys_addr_t offset)
+{
+	return vgic_handle_enable_reg(vcpu->kvm, mmio, offset,
+				      vcpu->vcpu_id, ACCESS_WRITE_SETBIT);
+}
+
 static bool handle_mmio_clear_enable_reg(struct kvm_vcpu *vcpu,
 					 struct kvm_exit_mmio *mmio,
 					 phys_addr_t offset)
 {
-	u32 *reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_enabled,
-				       vcpu->vcpu_id, offset);
-	vgic_reg_access(mmio, reg, offset,
-			ACCESS_READ_VALUE | ACCESS_WRITE_CLEARBIT);
-	if (mmio->is_write) {
-		if (offset < 4) /* Force SGI enabled */
-			*reg |= 0xffff;
-		vgic_retire_disabled_irqs(vcpu);
-		vgic_update_state(vcpu->kvm);
-		return true;
-	}
-
-	return false;
+	return vgic_handle_enable_reg(vcpu->kvm, mmio, offset,
+				      vcpu->vcpu_id, ACCESS_WRITE_CLEARBIT);
 }
 
-static bool handle_mmio_set_pending_reg(struct kvm_vcpu *vcpu,
+static bool vgic_handle_set_pending_reg(struct kvm *kvm,
 					struct kvm_exit_mmio *mmio,
-					phys_addr_t offset)
+					phys_addr_t offset, int vcpu_id)
 {
 	u32 *reg, orig;
 	u32 level_mask;
-	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+	int mode = ACCESS_READ_VALUE | ACCESS_WRITE_SETBIT;
+	struct vgic_dist *dist = &kvm->arch.vgic;
 
-	reg = vgic_bitmap_get_reg(&dist->irq_cfg, vcpu->vcpu_id, offset);
+	reg = vgic_bitmap_get_reg(&dist->irq_cfg, vcpu_id, offset);
 	level_mask = (~(*reg));
 
 	/* Mark both level and edge triggered irqs as pending */
-	reg = vgic_bitmap_get_reg(&dist->irq_pending, vcpu->vcpu_id, offset);
+	reg = vgic_bitmap_get_reg(&dist->irq_pending, vcpu_id, offset);
 	orig = *reg;
-	vgic_reg_access(mmio, reg, offset,
-			ACCESS_READ_VALUE | ACCESS_WRITE_SETBIT);
+	vgic_reg_access(mmio, reg, offset, mode);
 
 	if (mmio->is_write) {
 		/* Set the soft-pending flag only for level-triggered irqs */
 		reg = vgic_bitmap_get_reg(&dist->irq_soft_pend,
-					  vcpu->vcpu_id, offset);
-		vgic_reg_access(mmio, reg, offset,
-				ACCESS_READ_VALUE | ACCESS_WRITE_SETBIT);
+					  vcpu_id, offset);
+		vgic_reg_access(mmio, reg, offset, mode);
 		*reg &= level_mask;
 
 		/* Ignore writes to SGIs */
@@ -557,31 +559,30 @@ static bool handle_mmio_set_pending_reg(struct kvm_vcpu *vcpu,
 			*reg |= orig & 0xffff;
 		}
 
-		vgic_update_state(vcpu->kvm);
+		vgic_update_state(kvm);
 		return true;
 	}
 
 	return false;
 }
 
-static bool handle_mmio_clear_pending_reg(struct kvm_vcpu *vcpu,
+static bool vgic_handle_clear_pending_reg(struct kvm *kvm,
 					  struct kvm_exit_mmio *mmio,
-					  phys_addr_t offset)
+					  phys_addr_t offset, int vcpu_id)
 {
 	u32 *level_active;
 	u32 *reg, orig;
-	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+	int mode = ACCESS_READ_VALUE | ACCESS_WRITE_CLEARBIT;
+	struct vgic_dist *dist = &kvm->arch.vgic;
 
-	reg = vgic_bitmap_get_reg(&dist->irq_pending, vcpu->vcpu_id, offset);
+	reg = vgic_bitmap_get_reg(&dist->irq_pending, vcpu_id, offset);
 	orig = *reg;
-	vgic_reg_access(mmio, reg, offset,
-			ACCESS_READ_VALUE | ACCESS_WRITE_CLEARBIT);
+	vgic_reg_access(mmio, reg, offset, mode);
 	if (mmio->is_write) {
 		/* Re-set level triggered level-active interrupts */
 		level_active = vgic_bitmap_get_reg(&dist->irq_level,
-					  vcpu->vcpu_id, offset);
-		reg = vgic_bitmap_get_reg(&dist->irq_pending,
-					  vcpu->vcpu_id, offset);
+					  vcpu_id, offset);
+		reg = vgic_bitmap_get_reg(&dist->irq_pending, vcpu_id, offset);
 		*reg |= *level_active;
 
 		/* Ignore writes to SGIs */
@@ -592,17 +593,31 @@ static bool handle_mmio_clear_pending_reg(struct kvm_vcpu *vcpu,
 
 		/* Clear soft-pending flags */
 		reg = vgic_bitmap_get_reg(&dist->irq_soft_pend,
-					  vcpu->vcpu_id, offset);
-		vgic_reg_access(mmio, reg, offset,
-				ACCESS_READ_VALUE | ACCESS_WRITE_CLEARBIT);
+					  vcpu_id, offset);
+		vgic_reg_access(mmio, reg, offset, mode);
 
-		vgic_update_state(vcpu->kvm);
+		vgic_update_state(kvm);
 		return true;
 	}
-
 	return false;
 }
 
+static bool handle_mmio_set_pending_reg(struct kvm_vcpu *vcpu,
+					struct kvm_exit_mmio *mmio,
+					phys_addr_t offset)
+{
+	return vgic_handle_set_pending_reg(vcpu->kvm, mmio, offset,
+					   vcpu->vcpu_id);
+}
+
+static bool handle_mmio_clear_pending_reg(struct kvm_vcpu *vcpu,
+					  struct kvm_exit_mmio *mmio,
+					  phys_addr_t offset)
+{
+	return vgic_handle_clear_pending_reg(vcpu->kvm, mmio, offset,
+					     vcpu->vcpu_id);
+}
+
 static bool handle_mmio_priority_reg(struct kvm_vcpu *vcpu,
 				     struct kvm_exit_mmio *mmio,
 				     phys_addr_t offset)
@@ -725,14 +740,10 @@ static u16 vgic_cfg_compress(u32 val)
  * LSB is always 0. As such, we only keep the upper bit, and use the
  * two above functions to compress/expand the bits
  */
-static bool handle_mmio_cfg_reg(struct kvm_vcpu *vcpu,
-				struct kvm_exit_mmio *mmio, phys_addr_t offset)
+static bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio *mmio,
+				phys_addr_t offset)
 {
 	u32 val;
-	u32 *reg;
-
-	reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_cfg,
-				  vcpu->vcpu_id, offset >> 1);
 
 	if (offset & 4)
 		val = *reg >> 16;
@@ -761,6 +772,17 @@ static bool handle_mmio_cfg_reg(struct kvm_vcpu *vcpu,
 	return false;
 }
 
+static bool handle_mmio_cfg_reg(struct kvm_vcpu *vcpu,
+				struct kvm_exit_mmio *mmio, phys_addr_t offset)
+{
+	u32 *reg;
+
+	reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_cfg,
+				  vcpu->vcpu_id, offset >> 1);
+
+	return vgic_handle_cfg_reg(reg, mmio, offset);
+}
+
 static bool handle_mmio_sgi_reg(struct kvm_vcpu *vcpu,
 				struct kvm_exit_mmio *mmio, phys_addr_t offset)
 {
-- 
1.7.9.5


* [PATCH v4 11/19] arm/arm64: KVM: refactor/wrap vgic_set/get_attr()
  2014-11-14 10:07 [PATCH v4 00/19] KVM GICv3 emulation Andre Przywara
                   ` (9 preceding siblings ...)
  2014-11-14 10:07 ` [PATCH v4 10/19] arm/arm64: KVM: refactor MMIO accessors Andre Przywara
@ 2014-11-14 10:07 ` Andre Przywara
  2014-11-23 13:27   ` Christoffer Dall
  2014-11-14 10:07 ` [PATCH v4 12/19] arm/arm64: KVM: add vgic.h header file Andre Przywara
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 80+ messages in thread
From: Andre Przywara @ 2014-11-14 10:07 UTC (permalink / raw)
  To: linux-arm-kernel

vgic_set_attr() and vgic_get_attr() contain both code specific to
the emulated GIC and code for the generic, userland-facing part of
the GIC.
Split the guest GIC facing code off from the generic part to allow
easier splitting later.
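The dispatch convention used by the split can be sketched as follows. The common handler returns -ENXIO for any group it does not own, and the model-specific wrapper then tries its own groups. This is a simplified userspace model; the group names and mock_* functions are hypothetical stand-ins for KVM_DEV_ARM_VGIC_GRP_* and vgic_get_common_attr()/vgic_get_attr().

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical attribute groups; the real ones are KVM_DEV_ARM_VGIC_GRP_*. */
enum mock_grp { GRP_NR_IRQS, GRP_DIST_REGS, GRP_UNKNOWN };

struct mock_vgic {
	uint32_t nr_irqs;
	uint32_t dist_reg;
};

/* Userland-facing part shared by all emulated GIC models. */
static int mock_get_common_attr(const struct mock_vgic *v, enum mock_grp grp,
				uint32_t *out)
{
	switch (grp) {
	case GRP_NR_IRQS:
		*out = v->nr_irqs;
		return 0;
	default:
		return -ENXIO;	/* not ours: let the model-specific code try */
	}
}

/* GICv2-specific wrapper: try the common part first, then its own groups. */
static int mock_v2_get_attr(const struct mock_vgic *v, enum mock_grp grp,
			    uint32_t *out)
{
	int ret = mock_get_common_attr(v, grp, out);

	if (ret != -ENXIO)
		return ret;

	switch (grp) {
	case GRP_DIST_REGS:
		*out = v->dist_reg;
		return 0;
	default:
		return -ENXIO;
	}
}
```

Using -ENXIO as the "not handled" signal keeps the common code unaware of model-specific groups. A genuine -ENXIO from the common part simply falls through and ends up as -ENXIO again.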

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
Changelog v3...v4:
- (none)

 virt/kvm/arm/vgic.c |   78 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 54 insertions(+), 24 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index d324054..ea71cd0 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -2434,7 +2434,8 @@ out:
 	return ret;
 }
 
-static int vgic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
+static int vgic_set_common_attr(struct kvm_device *dev,
+				struct kvm_device_attr *attr)
 {
 	int r;
 
@@ -2450,17 +2451,6 @@ static int vgic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 		r = kvm_vgic_addr(dev->kvm, type, &addr, true);
 		return (r == -ENODEV) ? -ENXIO : r;
 	}
-
-	case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
-	case KVM_DEV_ARM_VGIC_GRP_CPU_REGS: {
-		u32 __user *uaddr = (u32 __user *)(long)attr->addr;
-		u32 reg;
-
-		if (get_user(reg, uaddr))
-			return -EFAULT;
-
-		return vgic_attr_regs_access(dev, attr, &reg, true);
-	}
 	case KVM_DEV_ARM_VGIC_GRP_NR_IRQS: {
 		u32 __user *uaddr = (u32 __user *)(long)attr->addr;
 		u32 val;
@@ -2497,7 +2487,33 @@ static int vgic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 	return -ENXIO;
 }
 
-static int vgic_get_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
+static int vgic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
+{
+	int ret;
+
+	ret = vgic_set_common_attr(dev, attr);
+	if (ret != -ENXIO)
+		return ret;
+
+	switch (attr->group) {
+	case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
+	case KVM_DEV_ARM_VGIC_GRP_CPU_REGS: {
+		u32 __user *uaddr = (u32 __user *)(long)attr->addr;
+		u32 reg;
+
+		if (get_user(reg, uaddr))
+			return -EFAULT;
+
+		return vgic_attr_regs_access(dev, attr, &reg, true);
+	}
+
+	}
+
+	return -ENXIO;
+}
+
+static int vgic_get_common_attr(struct kvm_device *dev,
+				struct kvm_device_attr *attr)
 {
 	int r = -ENXIO;
 
@@ -2515,27 +2531,41 @@ static int vgic_get_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 			return -EFAULT;
 		break;
 	}
+	case KVM_DEV_ARM_VGIC_GRP_NR_IRQS: {
+		u32 __user *uaddr = (u32 __user *)(long)attr->addr;
+
+		r = put_user(dev->kvm->arch.vgic.nr_irqs, uaddr);
+		break;
+	}
+
+	}
+
+	return r;
+}
+
+static int vgic_get_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
+{
+	int ret;
+
+	ret = vgic_get_common_attr(dev, attr);
+	if (ret != -ENXIO)
+		return ret;
 
+	switch (attr->group) {
 	case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
 	case KVM_DEV_ARM_VGIC_GRP_CPU_REGS: {
 		u32 __user *uaddr = (u32 __user *)(long)attr->addr;
 		u32 reg = 0;
 
-		r = vgic_attr_regs_access(dev, attr, &reg, false);
-		if (r)
-			return r;
-		r = put_user(reg, uaddr);
-		break;
-	}
-	case KVM_DEV_ARM_VGIC_GRP_NR_IRQS: {
-		u32 __user *uaddr = (u32 __user *)(long)attr->addr;
-		r = put_user(dev->kvm->arch.vgic.nr_irqs, uaddr);
-		break;
+		ret = vgic_attr_regs_access(dev, attr, &reg, false);
+		if (ret)
+			return ret;
+		return put_user(reg, uaddr);
 	}
 
 	}
 
-	return r;
+	return -ENXIO;
 }
 
 static int vgic_has_attr_regs(const struct mmio_range *ranges,
-- 
1.7.9.5
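After this patch, vgic_set_attr() and vgic_get_attr() are thin wrappers. Each one first calls a common handler and only looks at the GICv2-only register groups when that handler returns -ENXIO, which doubles as "not my group". Below is a minimal standalone sketch of that dispatch pattern. The toy_* names, the group enumeration and the returned values are hypothetical stand-ins, not the KVM UAPI.

```c
#include <errno.h>

/* Hypothetical attribute groups; NOT the real KVM_DEV_ARM_VGIC_GRP_* values. */
enum toy_group {
	TOY_GRP_ADDR,		/* handled by the common code */
	TOY_GRP_NR_IRQS,	/* handled by the common code */
	TOY_GRP_DIST_REGS,	/* handled only by the model-specific wrapper */
	TOY_GRP_UNKNOWN,	/* handled by nobody */
};

/* Common part: returns -ENXIO for any group it does not own. */
static int toy_common_attr(enum toy_group group, int *val)
{
	switch (group) {
	case TOY_GRP_ADDR:
		*val = 0x1000;
		return 0;
	case TOY_GRP_NR_IRQS:
		*val = 256;
		return 0;
	default:
		return -ENXIO;
	}
}

/* Model-specific wrapper: try the common part first, then fall back. */
static int toy_get_attr(enum toy_group group, int *val)
{
	int ret = toy_common_attr(group, val);

	if (ret != -ENXIO)
		return ret;	/* handled by the common code, success or error */

	switch (group) {
	case TOY_GRP_DIST_REGS:
		*val = 0x43b;
		return 0;
	default:
		return -ENXIO;
	}
}
```

With this split, another GIC model could reuse the common part and supply only its own fallback switch.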


* [PATCH v4 12/19] arm/arm64: KVM: add vgic.h header file
  2014-11-14 10:07 [PATCH v4 00/19] KVM GICv3 emulation Andre Przywara
  2014-11-14 10:07 ` [PATCH v4 11/19] arm/arm64: KVM: refactor/wrap vgic_set/get_attr() Andre Przywara
@ 2014-11-14 10:07 ` Andre Przywara
  2014-11-18 14:07   ` Eric Auger
  2014-11-23 13:29   ` Christoffer Dall
  2014-11-14 10:07 ` [PATCH v4 13/19] arm/arm64: KVM: split GICv2 specific emulation code from vgic.c Andre Przywara
  19 siblings, 2 replies; 80+ messages in thread
From: Andre Przywara @ 2014-11-14 10:07 UTC
  To: linux-arm-kernel

vgic.c is currently a mixture of generic vGIC emulation code and
functions specific to emulating a GICv2. To ease the later addition of
GICv3, we create a new header file, vgic.h, which holds constants
and prototypes of commonly used functions.
Some identifiers are renamed to avoid namespace clutter.
I removed the long-standing comment about using the kvm_io_bus API
to handle the GIC register ranges, as switching to it would no longer
be a win for us.
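The ACCESS_* constants that move into vgic.h encode a register access mode in three bits. Bit 0 selects read-as-value or read-as-zero, and bits [2:1] select how a write is applied. The sketch below models only full 32-bit accesses. The real vgic_reg_access() also handles byte and halfword offsets and the MMIO data buffer. The sketch_read()/sketch_write() helpers are hypothetical.

```c
#include <stdint.h>

/* Copied from the vgic.h hunk: bit 0 = read behaviour, bits [2:1] = write behaviour. */
#define ACCESS_READ_VALUE	(1 << 0)
#define ACCESS_READ_RAZ		(0 << 0)
#define ACCESS_READ_MASK(x)	((x) & (1 << 0))
#define ACCESS_WRITE_IGNORED	(0 << 1)
#define ACCESS_WRITE_SETBIT	(1 << 1)
#define ACCESS_WRITE_CLEARBIT	(2 << 1)
#define ACCESS_WRITE_VALUE	(3 << 1)
#define ACCESS_WRITE_MASK(x)	((x) & (3 << 1))

/* Hypothetical: value a guest read would return for a full-word access. */
static uint32_t sketch_read(uint32_t reg, int mode)
{
	return ACCESS_READ_MASK(mode) == ACCESS_READ_VALUE ? reg : 0;
}

/* Hypothetical: new register value after a full-word guest write of 'data'. */
static uint32_t sketch_write(uint32_t reg, uint32_t data, int mode)
{
	switch (ACCESS_WRITE_MASK(mode)) {
	case ACCESS_WRITE_SETBIT:	/* e.g. GICD_ISENABLERn */
		return reg | data;
	case ACCESS_WRITE_CLEARBIT:	/* e.g. GICD_ICENABLERn */
		return reg & ~data;
	case ACCESS_WRITE_VALUE:
		return data;
	case ACCESS_WRITE_IGNORED:
	default:
		return reg;
	}
}
```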

Signed-off-by: Andre Przywara <andre.przywara@arm.com>

-------
As the diff isn't always obvious here (and to aid possible future rebases),
here is a list of the high-level changes made to the code:
* moved definitions and prototypes from vgic.c to vgic.h:
  - VGIC_ADDR_UNDEF
  - ACCESS_{READ,WRITE}_*
  - vgic_update_state()
  - vgic_kick_vcpus()
  - vgic_get_vmcr()
  - vgic_set_vmcr()
  - struct mmio_range {} (renamed to struct kvm_mmio_range)
* removed static keyword and exported prototype in vgic.h:
  - vgic_bitmap_get_reg()
  - vgic_bitmap_set_irq_val()
  - vgic_bitmap_get_shared_map()
  - vgic_bytemap_get_reg()
  - vgic_dist_irq_set_pending()
  - vgic_dist_irq_clear_pending()
  - vgic_cpu_irq_clear()
  - vgic_reg_access()
  - handle_mmio_raz_wi()
  - vgic_handle_enable_reg()
  - vgic_handle_set_pending_reg()
  - vgic_handle_clear_pending_reg()
  - vgic_handle_cfg_reg()
  - vgic_unqueue_irqs()
  - find_matching_range() (renamed to vgic_find_range)
  - vgic_handle_mmio_range()
  - vgic_update_state()
  - vgic_get_vmcr()
  - vgic_set_vmcr()
  - vgic_queue_irq()
  - vgic_kick_vcpus()
  - vgic_init_maps()
  - vgic_has_attr_regs()
  - vgic_set_common_attr()
  - vgic_get_common_attr()
  - vgic_v2_init_emulation()
* moved functions to vgic.h (static inline):
  - mmio_data_read()
  - mmio_data_write()
  - is_in_range()
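With vgic_find_range() and is_in_range() in shared code, a second emulation can reuse the same register-table lookup. The tables are arrays of struct kvm_mmio_range terminated by an empty entry, and the walk stops at the first zero-length entry. The standalone sketch below assumes that the match condition in vgic_find_range() mirrors is_in_range(), since the hunk shows only its first half. The phys_addr_t typedef, the reduced struct, the id field and the table contents are illustrative assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;	/* userspace stand-in for the kernel type */

/* Reduced struct kvm_mmio_range: 'id' is a hypothetical tag in place of handle_mmio. */
struct sketch_mmio_range {
	phys_addr_t base;
	unsigned long len;
	int id;
};

/* Same body as the is_in_range() helper moved into vgic.h. */
static inline bool is_in_range(phys_addr_t addr, unsigned long len,
			       phys_addr_t baseaddr, unsigned long size)
{
	return (addr >= baseaddr) && (addr + len <= baseaddr + size);
}

/* Return the first range fully containing [offset, offset + access_len),
 * stopping at the zero-length sentinel entry; NULL if none matches. */
static const struct sketch_mmio_range *
sketch_find_range(const struct sketch_mmio_range *ranges,
		  unsigned long access_len, phys_addr_t offset)
{
	const struct sketch_mmio_range *r;

	for (r = ranges; r->len; r++)
		if (is_in_range(offset, access_len, r->base, r->len))
			return r;
	return NULL;
}

static const struct sketch_mmio_range sketch_dist_ranges[] = {
	{ .base = 0x000, .len = 12,  .id = 1 },	/* CTLR, TYPER, IIDR */
	{ .base = 0x100, .len = 128, .id = 2 },	/* hypothetical enable block */
	{ 0 }					/* sentinel, written {} in the kernel */
};
```

An access that straddles the end of a range, such as a 4-byte read at offset 0xa, matches nothing. vgic_handle_mmio_range() then reports it as unhandled.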
---
Changelog v3...v4:
- rename struct mmio_range to struct kvm_mmio_range
- rename find_matching_range() to vgic_find_range()
- remove vgic_create() and vgic_destroy() from header

 virt/kvm/arm/vgic.c |  150 +++++++++++++++++----------------------------------
 virt/kvm/arm/vgic.h |  119 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 169 insertions(+), 100 deletions(-)
 create mode 100644 virt/kvm/arm/vgic.h

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index ea71cd0..4fa58c9 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -75,32 +75,16 @@
  *   inactive as long as the external input line is held high.
  */
 
-#define VGIC_ADDR_UNDEF		(-1)
-#define IS_VGIC_ADDR_UNDEF(_x)  ((_x) == VGIC_ADDR_UNDEF)
+#include "vgic.h"
 
-#define PRODUCT_ID_KVM		0x4b	/* ASCII code K */
-#define IMPLEMENTER_ARM		0x43b
 #define GICC_ARCH_VERSION_V2	0x2
 
-#define ACCESS_READ_VALUE	(1 << 0)
-#define ACCESS_READ_RAZ		(0 << 0)
-#define ACCESS_READ_MASK(x)	((x) & (1 << 0))
-#define ACCESS_WRITE_IGNORED	(0 << 1)
-#define ACCESS_WRITE_SETBIT	(1 << 1)
-#define ACCESS_WRITE_CLEARBIT	(2 << 1)
-#define ACCESS_WRITE_VALUE	(3 << 1)
-#define ACCESS_WRITE_MASK(x)	((x) & (3 << 1))
-
 static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
 static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
-static void vgic_update_state(struct kvm *kvm);
-static void vgic_kick_vcpus(struct kvm *kvm);
 static u8 *vgic_get_sgi_sources(struct vgic_dist *dist, int vcpu_id, int sgi);
 static void vgic_dispatch_sgi(struct kvm_vcpu *vcpu, u32 reg);
 static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
 static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
-static void vgic_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
-static void vgic_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
 
 static const struct vgic_ops *vgic_ops;
 static const struct vgic_params *vgic;
@@ -174,8 +158,7 @@ static unsigned long *u64_to_bitmask(u64 *val)
 	return (unsigned long *)val;
 }
 
-static u32 *vgic_bitmap_get_reg(struct vgic_bitmap *x,
-				int cpuid, u32 offset)
+u32 *vgic_bitmap_get_reg(struct vgic_bitmap *x, int cpuid, u32 offset)
 {
 	offset >>= 2;
 	if (!offset)
@@ -193,8 +176,8 @@ static int vgic_bitmap_get_irq_val(struct vgic_bitmap *x,
 	return test_bit(irq - VGIC_NR_PRIVATE_IRQS, x->shared);
 }
 
-static void vgic_bitmap_set_irq_val(struct vgic_bitmap *x, int cpuid,
-				    int irq, int val)
+void vgic_bitmap_set_irq_val(struct vgic_bitmap *x, int cpuid,
+			     int irq, int val)
 {
 	unsigned long *reg;
 
@@ -216,7 +199,7 @@ static unsigned long *vgic_bitmap_get_cpu_map(struct vgic_bitmap *x, int cpuid)
 	return x->private + cpuid;
 }
 
-static unsigned long *vgic_bitmap_get_shared_map(struct vgic_bitmap *x)
+unsigned long *vgic_bitmap_get_shared_map(struct vgic_bitmap *x)
 {
 	return x->shared;
 }
@@ -243,7 +226,7 @@ static void vgic_free_bytemap(struct vgic_bytemap *b)
 	b->shared = NULL;
 }
 
-static u32 *vgic_bytemap_get_reg(struct vgic_bytemap *x, int cpuid, u32 offset)
+u32 *vgic_bytemap_get_reg(struct vgic_bytemap *x, int cpuid, u32 offset)
 {
 	u32 *reg;
 
@@ -340,14 +323,14 @@ static int vgic_dist_irq_is_pending(struct kvm_vcpu *vcpu, int irq)
 	return vgic_bitmap_get_irq_val(&dist->irq_pending, vcpu->vcpu_id, irq);
 }
 
-static void vgic_dist_irq_set_pending(struct kvm_vcpu *vcpu, int irq)
+void vgic_dist_irq_set_pending(struct kvm_vcpu *vcpu, int irq)
 {
 	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 
 	vgic_bitmap_set_irq_val(&dist->irq_pending, vcpu->vcpu_id, irq, 1);
 }
 
-static void vgic_dist_irq_clear_pending(struct kvm_vcpu *vcpu, int irq)
+void vgic_dist_irq_clear_pending(struct kvm_vcpu *vcpu, int irq)
 {
 	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 
@@ -363,7 +346,7 @@ static void vgic_cpu_irq_set(struct kvm_vcpu *vcpu, int irq)
 			vcpu->arch.vgic_cpu.pending_shared);
 }
 
-static void vgic_cpu_irq_clear(struct kvm_vcpu *vcpu, int irq)
+void vgic_cpu_irq_clear(struct kvm_vcpu *vcpu, int irq)
 {
 	if (irq < VGIC_NR_PRIVATE_IRQS)
 		clear_bit(irq, vcpu->arch.vgic_cpu.pending_percpu);
@@ -377,16 +360,6 @@ static bool vgic_can_sample_irq(struct kvm_vcpu *vcpu, int irq)
 	return vgic_irq_is_edge(vcpu, irq) || !vgic_irq_is_queued(vcpu, irq);
 }
 
-static u32 mmio_data_read(struct kvm_exit_mmio *mmio, u32 mask)
-{
-	return le32_to_cpu(*((u32 *)mmio->data)) & mask;
-}
-
-static void mmio_data_write(struct kvm_exit_mmio *mmio, u32 mask, u32 value)
-{
-	*((u32 *)mmio->data) = cpu_to_le32(value) & mask;
-}
-
 /**
  * vgic_reg_access - access vgic register
  * @mmio:   pointer to the data describing the mmio access
@@ -398,8 +371,8 @@ static void mmio_data_write(struct kvm_exit_mmio *mmio, u32 mask, u32 value)
  * modes defined for vgic register access
  * (read,raz,write-ignored,setbit,clearbit,write)
  */
-static void vgic_reg_access(struct kvm_exit_mmio *mmio, u32 *reg,
-			    phys_addr_t offset, int mode)
+void vgic_reg_access(struct kvm_exit_mmio *mmio, u32 *reg,
+		     phys_addr_t offset, int mode)
 {
 	int word_offset = (offset & 3) * 8;
 	u32 mask = (1UL << (mmio->len * 8)) - 1;
@@ -483,16 +456,16 @@ static bool handle_mmio_misc(struct kvm_vcpu *vcpu,
 	return false;
 }
 
-static bool handle_mmio_raz_wi(struct kvm_vcpu *vcpu,
-			       struct kvm_exit_mmio *mmio, phys_addr_t offset)
+bool handle_mmio_raz_wi(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio,
+			phys_addr_t offset)
 {
 	vgic_reg_access(mmio, NULL, offset,
 			ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
 	return false;
 }
 
-static bool vgic_handle_enable_reg(struct kvm *kvm, struct kvm_exit_mmio *mmio,
-				   phys_addr_t offset, int vcpu_id, int access)
+bool vgic_handle_enable_reg(struct kvm *kvm, struct kvm_exit_mmio *mmio,
+			    phys_addr_t offset, int vcpu_id, int access)
 {
 	u32 *reg;
 	int mode = ACCESS_READ_VALUE | access;
@@ -529,9 +502,9 @@ static bool handle_mmio_clear_enable_reg(struct kvm_vcpu *vcpu,
 				      vcpu->vcpu_id, ACCESS_WRITE_CLEARBIT);
 }
 
-static bool vgic_handle_set_pending_reg(struct kvm *kvm,
-					struct kvm_exit_mmio *mmio,
-					phys_addr_t offset, int vcpu_id)
+bool vgic_handle_set_pending_reg(struct kvm *kvm,
+				 struct kvm_exit_mmio *mmio,
+				 phys_addr_t offset, int vcpu_id)
 {
 	u32 *reg, orig;
 	u32 level_mask;
@@ -566,9 +539,9 @@ static bool vgic_handle_set_pending_reg(struct kvm *kvm,
 	return false;
 }
 
-static bool vgic_handle_clear_pending_reg(struct kvm *kvm,
-					  struct kvm_exit_mmio *mmio,
-					  phys_addr_t offset, int vcpu_id)
+bool vgic_handle_clear_pending_reg(struct kvm *kvm,
+				   struct kvm_exit_mmio *mmio,
+				   phys_addr_t offset, int vcpu_id)
 {
 	u32 *level_active;
 	u32 *reg, orig;
@@ -740,8 +713,8 @@ static u16 vgic_cfg_compress(u32 val)
  * LSB is always 0. As such, we only keep the upper bit, and use the
  * two above functions to compress/expand the bits
  */
-static bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio *mmio,
-				phys_addr_t offset)
+bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio *mmio,
+			 phys_addr_t offset)
 {
 	u32 val;
 
@@ -817,7 +790,7 @@ static void vgic_v2_add_sgi_source(struct kvm_vcpu *vcpu, int irq, int source)
  * to the distributor but the active state stays in the LRs, because we don't
  * track the active state on the distributor side.
  */
-static void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
+void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	int i;
@@ -942,21 +915,7 @@ static bool handle_mmio_sgi_clear(struct kvm_vcpu *vcpu,
 		return write_set_clear_sgi_pend_reg(vcpu, mmio, offset, false);
 }
 
-/*
- * I would have liked to use the kvm_bus_io_*() API instead, but it
- * cannot cope with banked registers (only the VM pointer is passed
- * around, and we need the vcpu). One of these days, someone please
- * fix it!
- */
-struct mmio_range {
-	phys_addr_t base;
-	unsigned long len;
-	int bits_per_irq;
-	bool (*handle_mmio)(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio,
-			    phys_addr_t offset);
-};
-
-static const struct mmio_range vgic_dist_ranges[] = {
+static const struct kvm_mmio_range vgic_dist_ranges[] = {
 	{
 		.base		= GIC_DIST_CTRL,
 		.len		= 12,
@@ -1041,12 +1000,12 @@ static const struct mmio_range vgic_dist_ranges[] = {
 	{}
 };
 
-static const
-struct mmio_range *find_matching_range(const struct mmio_range *ranges,
+const
+struct kvm_mmio_range *vgic_find_range(const struct kvm_mmio_range *ranges,
 				       struct kvm_exit_mmio *mmio,
 				       phys_addr_t offset)
 {
-	const struct mmio_range *r = ranges;
+	const struct kvm_mmio_range *r = ranges;
 
 	while (r->len) {
 		if (offset >= r->base &&
@@ -1059,7 +1018,7 @@ struct mmio_range *find_matching_range(const struct mmio_range *ranges,
 }
 
 static bool vgic_validate_access(const struct vgic_dist *dist,
-				 const struct mmio_range *range,
+				 const struct kvm_mmio_range *range,
 				 unsigned long offset)
 {
 	int irq;
@@ -1085,7 +1044,7 @@ static bool vgic_validate_access(const struct vgic_dist *dist,
 static bool call_range_handler(struct kvm_vcpu *vcpu,
 			       struct kvm_exit_mmio *mmio,
 			       unsigned long offset,
-			       const struct mmio_range *range)
+			       const struct kvm_mmio_range *range)
 {
 	u32 *data32 = (void *)mmio->data;
 	struct kvm_exit_mmio mmio32;
@@ -1129,18 +1088,18 @@ static bool call_range_handler(struct kvm_vcpu *vcpu,
  *
  * returns true if the MMIO access could be performed
  */
-static bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run *run,
+bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run *run,
 			    struct kvm_exit_mmio *mmio,
-			    const struct mmio_range *ranges,
+			    const struct kvm_mmio_range *ranges,
 			    unsigned long mmio_base)
 {
-	const struct mmio_range *range;
+	const struct kvm_mmio_range *range;
 	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 	bool updated_state;
 	unsigned long offset;
 
 	offset = mmio->phys_addr - mmio_base;
-	range = find_matching_range(ranges, mmio, offset);
+	range = vgic_find_range(ranges, mmio, offset);
 	if (unlikely(!range || !range->handle_mmio)) {
 		pr_warn("Unhandled access %d %08llx %d\n",
 			mmio->is_write, mmio->phys_addr, mmio->len);
@@ -1166,12 +1125,6 @@ static bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	return true;
 }
 
-static inline bool is_in_range(phys_addr_t addr, unsigned long len,
-			       phys_addr_t baseaddr, unsigned long size)
-{
-	return (addr >= baseaddr) && (addr + len <= baseaddr + size);
-}
-
 static bool vgic_v2_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
 				struct kvm_exit_mmio *mmio)
 {
@@ -1298,7 +1251,7 @@ static int compute_pending_for_cpu(struct kvm_vcpu *vcpu)
  * Update the interrupt state and determine which CPUs have pending
  * interrupts. Must be called with distributor lock held.
  */
-static void vgic_update_state(struct kvm *kvm)
+void vgic_update_state(struct kvm *kvm)
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
 	struct kvm_vcpu *vcpu;
@@ -1359,12 +1312,12 @@ static inline void vgic_disable_underflow(struct kvm_vcpu *vcpu)
 	vgic_ops->disable_underflow(vcpu);
 }
 
-static inline void vgic_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
+void vgic_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
 {
 	vgic_ops->get_vmcr(vcpu, vmcr);
 }
 
-static void vgic_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
+void vgic_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
 {
 	vgic_ops->set_vmcr(vcpu, vmcr);
 }
@@ -1414,7 +1367,7 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu)
  * Queue an interrupt to a CPU virtual interface. Return true on success,
  * or false if it wasn't possible to queue it.
  */
-static bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 sgi_source_id, int irq)
+bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 sgi_source_id, int irq)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
@@ -1700,7 +1653,7 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
 	return test_bit(vcpu->vcpu_id, dist->irq_pending_on_cpu);
 }
 
-static void vgic_kick_vcpus(struct kvm *kvm)
+void vgic_kick_vcpus(struct kvm *kvm)
 {
 	struct kvm_vcpu *vcpu;
 	int c;
@@ -1941,7 +1894,7 @@ void kvm_vgic_destroy(struct kvm *kvm)
  * Allocate and initialize the various data structures. Must be called
  * with kvm->lock held!
  */
-static int vgic_init_maps(struct kvm *kvm)
+int vgic_init_maps(struct kvm *kvm)
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
 	struct kvm_vcpu *vcpu;
@@ -2080,7 +2033,7 @@ out:
 	return ret;
 }
 
-static int vgic_v2_init_emulation(struct kvm *kvm)
+int vgic_v2_init_emulation(struct kvm *kvm)
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
 
@@ -2320,7 +2273,7 @@ static bool handle_cpu_mmio_ident(struct kvm_vcpu *vcpu,
  * CPU Interface Register accesses - these are not accessed by the VM, but by
  * user space for saving and restoring VGIC state.
  */
-static const struct mmio_range vgic_cpu_ranges[] = {
+static const struct kvm_mmio_range vgic_cpu_ranges[] = {
 	{
 		.base		= GIC_CPU_CTRL,
 		.len		= 12,
@@ -2347,7 +2300,7 @@ static int vgic_attr_regs_access(struct kvm_device *dev,
 				 struct kvm_device_attr *attr,
 				 u32 *reg, bool is_write)
 {
-	const struct mmio_range *r = NULL, *ranges;
+	const struct kvm_mmio_range *r = NULL, *ranges;
 	phys_addr_t offset;
 	int ret, cpuid, c;
 	struct kvm_vcpu *vcpu, *tmp_vcpu;
@@ -2388,7 +2341,7 @@ static int vgic_attr_regs_access(struct kvm_device *dev,
 	default:
 		BUG();
 	}
-	r = find_matching_range(ranges, &mmio, offset);
+	r = vgic_find_range(ranges, &mmio, offset);
 
 	if (unlikely(!r || !r->handle_mmio)) {
 		ret = -ENXIO;
@@ -2434,8 +2387,7 @@ out:
 	return ret;
 }
 
-static int vgic_set_common_attr(struct kvm_device *dev,
-				struct kvm_device_attr *attr)
+int vgic_set_common_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 {
 	int r;
 
@@ -2512,8 +2464,7 @@ static int vgic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 	return -ENXIO;
 }
 
-static int vgic_get_common_attr(struct kvm_device *dev,
-				struct kvm_device_attr *attr)
+int vgic_get_common_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 {
 	int r = -ENXIO;
 
@@ -2568,13 +2519,12 @@ static int vgic_get_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 	return -ENXIO;
 }
 
-static int vgic_has_attr_regs(const struct mmio_range *ranges,
-			      phys_addr_t offset)
+int vgic_has_attr_regs(const struct kvm_mmio_range *ranges, phys_addr_t offset)
 {
 	struct kvm_exit_mmio dev_attr_mmio;
 
 	dev_attr_mmio.len = 4;
-	if (find_matching_range(ranges, &dev_attr_mmio, offset))
+	if (vgic_find_range(ranges, &dev_attr_mmio, offset))
 		return 0;
 	else
 		return -ENXIO;
@@ -2604,12 +2554,12 @@ static int vgic_has_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 	return -ENXIO;
 }
 
-static void vgic_destroy(struct kvm_device *dev)
+void vgic_destroy(struct kvm_device *dev)
 {
 	kfree(dev);
 }
 
-static int vgic_create(struct kvm_device *dev, u32 type)
+int vgic_create(struct kvm_device *dev, u32 type)
 {
 	return kvm_vgic_create(dev->kvm, type);
 }
diff --git a/virt/kvm/arm/vgic.h b/virt/kvm/arm/vgic.h
new file mode 100644
index 0000000..ff3171a
--- /dev/null
+++ b/virt/kvm/arm/vgic.h
@@ -0,0 +1,119 @@
+/*
+ * Copyright (C) 2012-2014 ARM Ltd.
+ * Author: Marc Zyngier <marc.zyngier@arm.com>
+ *
+ * Derived from virt/kvm/arm/vgic.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __KVM_VGIC_H__
+#define __KVM_VGIC_H__
+
+#define VGIC_ADDR_UNDEF		(-1)
+#define IS_VGIC_ADDR_UNDEF(_x)  ((_x) == VGIC_ADDR_UNDEF)
+
+#define PRODUCT_ID_KVM		0x4b	/* ASCII code K */
+#define IMPLEMENTER_ARM		0x43b
+
+#define ACCESS_READ_VALUE	(1 << 0)
+#define ACCESS_READ_RAZ		(0 << 0)
+#define ACCESS_READ_MASK(x)	((x) & (1 << 0))
+#define ACCESS_WRITE_IGNORED	(0 << 1)
+#define ACCESS_WRITE_SETBIT	(1 << 1)
+#define ACCESS_WRITE_CLEARBIT	(2 << 1)
+#define ACCESS_WRITE_VALUE	(3 << 1)
+#define ACCESS_WRITE_MASK(x)	((x) & (3 << 1))
+
+unsigned long *vgic_bitmap_get_shared_map(struct vgic_bitmap *x);
+
+void vgic_update_state(struct kvm *kvm);
+int vgic_init_maps(struct kvm *kvm);
+
+u32 *vgic_bitmap_get_reg(struct vgic_bitmap *x, int cpuid, u32 offset);
+u32 *vgic_bytemap_get_reg(struct vgic_bytemap *x, int cpuid, u32 offset);
+
+void vgic_dist_irq_set_pending(struct kvm_vcpu *vcpu, int irq);
+void vgic_dist_irq_clear_pending(struct kvm_vcpu *vcpu, int irq);
+void vgic_cpu_irq_clear(struct kvm_vcpu *vcpu, int irq);
+void vgic_bitmap_set_irq_val(struct vgic_bitmap *x, int cpuid,
+			     int irq, int val);
+
+void vgic_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
+void vgic_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
+
+bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 sgi_source_id, int irq);
+void vgic_unqueue_irqs(struct kvm_vcpu *vcpu);
+
+void vgic_reg_access(struct kvm_exit_mmio *mmio, u32 *reg,
+		     phys_addr_t offset, int mode);
+bool handle_mmio_raz_wi(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio,
+			phys_addr_t offset);
+
+static inline
+u32 mmio_data_read(struct kvm_exit_mmio *mmio, u32 mask)
+{
+	return le32_to_cpu(*((u32 *)mmio->data)) & mask;
+}
+
+static inline
+void mmio_data_write(struct kvm_exit_mmio *mmio, u32 mask, u32 value)
+{
+	*((u32 *)mmio->data) = cpu_to_le32(value) & mask;
+}
+
+struct kvm_mmio_range {
+	phys_addr_t base;
+	unsigned long len;
+	int bits_per_irq;
+	bool (*handle_mmio)(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio,
+			    phys_addr_t offset);
+};
+
+static inline bool is_in_range(phys_addr_t addr, unsigned long len,
+			       phys_addr_t baseaddr, unsigned long size)
+{
+	return (addr >= baseaddr) && (addr + len <= baseaddr + size);
+}
+
+const
+struct kvm_mmio_range *vgic_find_range(const struct kvm_mmio_range *ranges,
+				       struct kvm_exit_mmio *mmio,
+				       phys_addr_t offset);
+
+bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run *run,
+			    struct kvm_exit_mmio *mmio,
+			    const struct kvm_mmio_range *ranges,
+			    unsigned long mmio_base);
+
+bool vgic_handle_enable_reg(struct kvm *kvm, struct kvm_exit_mmio *mmio,
+			    phys_addr_t offset, int vcpu_id, int access);
+
+bool vgic_handle_set_pending_reg(struct kvm *kvm, struct kvm_exit_mmio *mmio,
+				 phys_addr_t offset, int vcpu_id);
+
+bool vgic_handle_clear_pending_reg(struct kvm *kvm, struct kvm_exit_mmio *mmio,
+				   phys_addr_t offset, int vcpu_id);
+
+bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio *mmio,
+			 phys_addr_t offset);
+
+void vgic_kick_vcpus(struct kvm *kvm);
+
+int vgic_has_attr_regs(const struct kvm_mmio_range *ranges, phys_addr_t offset);
+int vgic_set_common_attr(struct kvm_device *dev, struct kvm_device_attr *attr);
+int vgic_get_common_attr(struct kvm_device *dev, struct kvm_device_attr *attr);
+
+int vgic_v2_init_emulation(struct kvm *kvm);
+
+#endif
-- 
1.7.9.5


* [PATCH v4 13/19] arm/arm64: KVM: split GICv2 specific emulation code from vgic.c
  2014-11-14 10:07 [PATCH v4 00/19] KVM GICv3 emulation Andre Przywara
  2014-11-14 10:07 ` [PATCH v4 12/19] arm/arm64: KVM: add vgic.h header file Andre Przywara
@ 2014-11-14 10:07 ` Andre Przywara
  2014-11-23 13:32   ` Christoffer Dall
  2014-11-14 10:07 ` [PATCH v4 14/19] arm/arm64: KVM: add opaque private pointer to MMIO data Andre Przywara
  19 siblings, 1 reply; 80+ messages in thread
From: Andre Przywara @ 2014-11-14 10:07 UTC
  To: linux-arm-kernel

vgic.c is currently a mixture of generic vGIC emulation code and
functions specific to emulating a GICv2. To ease the addition of
GICv3, split off the strictly v2-specific parts into a new file,
vgic-v2-emul.c.
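handle_mmio_misc() is a typical example of strictly v2-specific code. The GICD_TYPER value it synthesizes has only a 3-bit CPU number field, which reflects the GICv2 8-CPU limit. The sketch below reproduces the GICD_TYPER and GICD_IIDR computations from the moved code as plain functions. The sketch_* names are hypothetical.

```c
#include <stdint.h>

#define PRODUCT_ID_KVM		0x4b	/* ASCII code K, as in vgic.h */
#define IMPLEMENTER_ARM		0x43b	/* as in vgic.h */

/* GICD_TYPER as built by handle_mmio_misc(): CPUNumber (bits [7:5]) is the
 * vCPU count minus one, ITLinesNumber (bits [4:0]) is nr_irqs / 32 - 1. */
static uint32_t sketch_gicd_typer(unsigned int online_vcpus,
				  unsigned int nr_irqs)
{
	uint32_t reg;

	reg  = (online_vcpus - 1) << 5;
	reg |= (nr_irqs >> 5) - 1;
	return reg;
}

/* GICD_IIDR: product ID in the top byte, implementer in the low bits. */
static uint32_t sketch_gicd_iidr(void)
{
	return ((uint32_t)PRODUCT_ID_KVM << 24) | (IMPLEMENTER_ARM << 0);
}
```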

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>

-------
As the diff isn't always obvious here (and to aid possible future rebases),
here is a list of the high-level changes made to the code:
* added new file to respective arm/arm64 Makefiles
* moved GICv2 specific functions to vgic-v2-emul.c:
  - handle_mmio_misc()
  - handle_mmio_set_enable_reg()
  - handle_mmio_clear_enable_reg()
  - handle_mmio_set_pending_reg()
  - handle_mmio_clear_pending_reg()
  - handle_mmio_priority_reg()
  - vgic_get_target_reg()
  - vgic_set_target_reg()
  - handle_mmio_target_reg()
  - handle_mmio_cfg_reg()
  - handle_mmio_sgi_reg()
  - vgic_v2_unqueue_sgi()
  - read_set_clear_sgi_pend_reg()
  - write_set_clear_sgi_pend_reg()
  - handle_mmio_sgi_set()
  - handle_mmio_sgi_clear()
  - vgic_v2_handle_mmio()
  - vgic_get_sgi_sources()
  - vgic_dispatch_sgi()
  - vgic_v2_queue_sgi()
  - vgic_v2_init()
  - vgic_v2_init_emulation()
  - handle_cpu_mmio_misc()
  - handle_mmio_abpr()
  - handle_cpu_mmio_ident()
  - vgic_attr_regs_access()
  - vgic_create() (renamed to vgic_v2_create())
  - vgic_destroy() (renamed to vgic_v2_destroy())
  - vgic_has_attr() (renamed to vgic_v2_has_attr())
  - vgic_set_attr() (renamed to vgic_v2_set_attr())
  - vgic_get_attr() (renamed to vgic_v2_get_attr())
  - struct kvm_mmio_range vgic_dist_ranges[]
  - struct kvm_mmio_range vgic_cpu_ranges[]
  - struct kvm_device_ops kvm_arm_vgic_v2_ops {}
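Two of the moved helpers, vgic_get_target_reg() and vgic_set_target_reg(), translate between GICD_ITARGETSRn and a single target vCPU per SPI. Each byte of the register is a CPU mask. On write, only the lowest set bit is kept, and an empty byte falls back to CPU0. The sketch below works on a plain four-entry array instead of dist->irq_spi_cpu and skips the per-vCPU bitmap update. The sketch_* names are hypothetical.

```c
#include <stdint.h>
#include <strings.h>	/* ffs() */

#define GICD_ITARGETSR_SIZE	32
#define GICD_CPUTARGETS_BITS	8
#define GICD_IRQS_PER_ITARGETSR	(GICD_ITARGETSR_SIZE / GICD_CPUTARGETS_BITS)

/* Build one ITARGETSR word from the single target CPU stored per SPI. */
static uint32_t sketch_get_target_reg(const uint8_t cpu[GICD_IRQS_PER_ITARGETSR])
{
	uint32_t val = 0;
	int i;

	for (i = 0; i < GICD_IRQS_PER_ITARGETSR; i++)
		val |= 1U << (cpu[i] + i * GICD_CPUTARGETS_BITS);
	return val;
}

/* Decode a written word: lowest set bit per byte wins, empty byte means CPU0. */
static void sketch_set_target_reg(uint32_t val, uint8_t cpu[GICD_IRQS_PER_ITARGETSR])
{
	int i;

	for (i = 0; i < GICD_IRQS_PER_ITARGETSR; i++) {
		int target = ffs((val >> (i * GICD_CPUTARGETS_BITS)) & 0xffU);

		cpu[i] = target ? target - 1 : 0;
	}
}
```

A guest that writes a multi-CPU mask such as 0x06 therefore ends up targeting only CPU1. This is the "exactly one vcpu per IRQ" simplification described in the moved comment.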
---
Changelog v3...v4:
- (adapted the move to the changes in the previous patches)
- also move vgic_create() and vgic_destroy()
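read_set_clear_sgi_pend_reg() and write_set_clear_sgi_pend_reg(), also moved by this patch, treat GICD_SPENDSGIRn and GICD_CPENDSGIRn as four bytes. There is one byte per SGI, and each byte is the bitmap of source CPUs for that SGI. The sketch below works on a plain four-byte array instead of the distributor state. It assumes the clear path mirrors the set path, since the hunk is cut off there. The sketch_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack four per-SGI source bytes into one 32-bit register value. */
static uint32_t sketch_read_sgi_pend(const uint8_t src[4])
{
	uint32_t reg = 0;
	int i;

	for (i = 0; i < 4; i++)
		reg |= (uint32_t)src[i] << (8 * i);
	return reg;
}

/* Apply a write to the set (set == true) or clear register and return
 * true if any source bit actually changed (the "updated" flag). */
static bool sketch_write_sgi_pend(uint8_t src[4], uint32_t reg, bool set)
{
	bool updated = false;
	int i;

	for (i = 0; i < 4; i++) {
		uint8_t mask = reg >> (8 * i);

		if (set) {
			if ((src[i] & mask) != mask)
				updated = true;
			src[i] |= mask;
		} else {
			if (src[i] & mask)
				updated = true;
			src[i] &= ~mask;
		}
	}
	return updated;
}
```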

 arch/arm/kvm/Makefile       |    1 +
 arch/arm64/kvm/Makefile     |    1 +
 virt/kvm/arm/vgic-v2-emul.c |  805 +++++++++++++++++++++++++++++++++++++++++++
 virt/kvm/arm/vgic.c         |  762 +---------------------------------------
 4 files changed, 808 insertions(+), 761 deletions(-)
 create mode 100644 virt/kvm/arm/vgic-v2-emul.c

diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index f7057ed..443b8be 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -22,4 +22,5 @@ obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
 obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
 obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic.o
 obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v2.o
+obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v2-emul.o
 obj-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 32a0961..d957353 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -21,6 +21,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += guest.o reset.o sys_regs.o sys_regs_generic_v8.o
 
 kvm-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic.o
 kvm-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v2.o
+kvm-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v2-emul.o
 kvm-$(CONFIG_KVM_ARM_VGIC) += vgic-v2-switch.o
 kvm-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v3.o
 kvm-$(CONFIG_KVM_ARM_VGIC) += vgic-v3-switch.o
diff --git a/virt/kvm/arm/vgic-v2-emul.c b/virt/kvm/arm/vgic-v2-emul.c
new file mode 100644
index 0000000..e085ef7
--- /dev/null
+++ b/virt/kvm/arm/vgic-v2-emul.c
@@ -0,0 +1,805 @@
+/*
+ * Contains GICv2 specific emulation code, was in vgic.c before.
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ * Author: Marc Zyngier <marc.zyngier@arm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/cpu.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/uaccess.h>
+
+#include <linux/irqchip/arm-gic.h>
+
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_mmu.h>
+
+#include "vgic.h"
+
+#define GICC_ARCH_VERSION_V2		0x2
+
+static void vgic_dispatch_sgi(struct kvm_vcpu *vcpu, u32 reg);
+static u8 *vgic_get_sgi_sources(struct vgic_dist *dist, int vcpu_id, int sgi)
+{
+	return dist->irq_sgi_sources + vcpu_id * VGIC_NR_SGIS + sgi;
+}
+
+static bool handle_mmio_misc(struct kvm_vcpu *vcpu,
+			     struct kvm_exit_mmio *mmio, phys_addr_t offset)
+{
+	u32 reg;
+	u32 word_offset = offset & 3;
+
+	switch (offset & ~3) {
+	case 0:			/* GICD_CTLR */
+		reg = vcpu->kvm->arch.vgic.enabled;
+		vgic_reg_access(mmio, &reg, word_offset,
+				ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
+		if (mmio->is_write) {
+			vcpu->kvm->arch.vgic.enabled = reg & 1;
+			vgic_update_state(vcpu->kvm);
+			return true;
+		}
+		break;
+
+	case 4:			/* GICD_TYPER */
+		reg  = (atomic_read(&vcpu->kvm->online_vcpus) - 1) << 5;
+		reg |= (vcpu->kvm->arch.vgic.nr_irqs >> 5) - 1;
+		vgic_reg_access(mmio, &reg, word_offset,
+				ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
+		break;
+
+	case 8:			/* GICD_IIDR */
+		reg = (PRODUCT_ID_KVM << 24) | (IMPLEMENTER_ARM << 0);
+		vgic_reg_access(mmio, &reg, word_offset,
+				ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
+		break;
+	}
+
+	return false;
+}
+
+static bool handle_mmio_set_enable_reg(struct kvm_vcpu *vcpu,
+				       struct kvm_exit_mmio *mmio,
+				       phys_addr_t offset)
+{
+	return vgic_handle_enable_reg(vcpu->kvm, mmio, offset,
+				      vcpu->vcpu_id, ACCESS_WRITE_SETBIT);
+}
+
+static bool handle_mmio_clear_enable_reg(struct kvm_vcpu *vcpu,
+					 struct kvm_exit_mmio *mmio,
+					 phys_addr_t offset)
+{
+	return vgic_handle_enable_reg(vcpu->kvm, mmio, offset,
+				      vcpu->vcpu_id, ACCESS_WRITE_CLEARBIT);
+}
+
+static bool handle_mmio_set_pending_reg(struct kvm_vcpu *vcpu,
+					struct kvm_exit_mmio *mmio,
+					phys_addr_t offset)
+{
+	return vgic_handle_set_pending_reg(vcpu->kvm, mmio, offset,
+					   vcpu->vcpu_id);
+}
+
+static bool handle_mmio_clear_pending_reg(struct kvm_vcpu *vcpu,
+					  struct kvm_exit_mmio *mmio,
+					  phys_addr_t offset)
+{
+	return vgic_handle_clear_pending_reg(vcpu->kvm, mmio, offset,
+					     vcpu->vcpu_id);
+}
+
+static bool handle_mmio_priority_reg(struct kvm_vcpu *vcpu,
+				     struct kvm_exit_mmio *mmio,
+				     phys_addr_t offset)
+{
+	u32 *reg = vgic_bytemap_get_reg(&vcpu->kvm->arch.vgic.irq_priority,
+					vcpu->vcpu_id, offset);
+	vgic_reg_access(mmio, reg, offset,
+			ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
+	return false;
+}
+
+#define GICD_ITARGETSR_SIZE	32
+#define GICD_CPUTARGETS_BITS	8
+#define GICD_IRQS_PER_ITARGETSR	(GICD_ITARGETSR_SIZE / GICD_CPUTARGETS_BITS)
+static u32 vgic_get_target_reg(struct kvm *kvm, int irq)
+{
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	int i;
+	u32 val = 0;
+
+	irq -= VGIC_NR_PRIVATE_IRQS;
+
+	for (i = 0; i < GICD_IRQS_PER_ITARGETSR; i++)
+		val |= 1 << (dist->irq_spi_cpu[irq + i] + i * 8);
+
+	return val;
+}
+
+static void vgic_set_target_reg(struct kvm *kvm, u32 val, int irq)
+{
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	struct kvm_vcpu *vcpu;
+	int i, c;
+	unsigned long *bmap;
+	u32 target;
+
+	irq -= VGIC_NR_PRIVATE_IRQS;
+
+	/*
+	 * Pick the LSB in each byte. This ensures we target exactly
+	 * one vcpu per IRQ. If the byte is null, assume we target
+	 * CPU0.
+	 */
+	for (i = 0; i < GICD_IRQS_PER_ITARGETSR; i++) {
+		int shift = i * GICD_CPUTARGETS_BITS;
+
+		target = ffs((val >> shift) & 0xffU);
+		target = target ? (target - 1) : 0;
+		dist->irq_spi_cpu[irq + i] = target;
+		kvm_for_each_vcpu(c, vcpu, kvm) {
+			bmap = vgic_bitmap_get_shared_map(&dist->irq_spi_target[c]);
+			if (c == target)
+				set_bit(irq + i, bmap);
+			else
+				clear_bit(irq + i, bmap);
+		}
+	}
+}
+
+static bool handle_mmio_target_reg(struct kvm_vcpu *vcpu,
+				   struct kvm_exit_mmio *mmio,
+				   phys_addr_t offset)
+{
+	u32 reg;
+
+	/* We treat the banked interrupts targets as read-only */
+	if (offset < 32) {
+		u32 roreg;
+
+		roreg = 1 << vcpu->vcpu_id;
+		roreg |= roreg << 8;
+		roreg |= roreg << 16;
+
+		vgic_reg_access(mmio, &roreg, offset,
+				ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
+		return false;
+	}
+
+	reg = vgic_get_target_reg(vcpu->kvm, offset & ~3U);
+	vgic_reg_access(mmio, &reg, offset,
+			ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
+	if (mmio->is_write) {
+		vgic_set_target_reg(vcpu->kvm, reg, offset & ~3U);
+		vgic_update_state(vcpu->kvm);
+		return true;
+	}
+
+	return false;
+}
+
+static bool handle_mmio_cfg_reg(struct kvm_vcpu *vcpu,
+				struct kvm_exit_mmio *mmio, phys_addr_t offset)
+{
+	u32 *reg;
+
+	reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_cfg,
+				  vcpu->vcpu_id, offset >> 1);
+
+	return vgic_handle_cfg_reg(reg, mmio, offset);
+}
+
+static bool handle_mmio_sgi_reg(struct kvm_vcpu *vcpu,
+				struct kvm_exit_mmio *mmio, phys_addr_t offset)
+{
+	u32 reg;
+
+	vgic_reg_access(mmio, &reg, offset,
+			ACCESS_READ_RAZ | ACCESS_WRITE_VALUE);
+	if (mmio->is_write) {
+		vgic_dispatch_sgi(vcpu, reg);
+		vgic_update_state(vcpu->kvm);
+		return true;
+	}
+
+	return false;
+}
+
+/* Handle reads of GICD_CPENDSGIRn and GICD_SPENDSGIRn */
+static bool read_set_clear_sgi_pend_reg(struct kvm_vcpu *vcpu,
+					struct kvm_exit_mmio *mmio,
+					phys_addr_t offset)
+{
+	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+	int sgi;
+	int min_sgi = (offset & ~0x3);
+	int max_sgi = min_sgi + 3;
+	int vcpu_id = vcpu->vcpu_id;
+	u32 reg = 0;
+
+	/* Copy source SGIs from distributor side */
+	for (sgi = min_sgi; sgi <= max_sgi; sgi++) {
+		u8 sources = *vgic_get_sgi_sources(dist, vcpu_id, sgi);
+
+		reg |= ((u32)sources) << (8 * (sgi - min_sgi));
+	}
+
+	mmio_data_write(mmio, ~0, reg);
+	return false;
+}
+
+static bool write_set_clear_sgi_pend_reg(struct kvm_vcpu *vcpu,
+					 struct kvm_exit_mmio *mmio,
+					 phys_addr_t offset, bool set)
+{
+	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+	int sgi;
+	int min_sgi = (offset & ~0x3);
+	int max_sgi = min_sgi + 3;
+	int vcpu_id = vcpu->vcpu_id;
+	u32 reg;
+	bool updated = false;
+
+	reg = mmio_data_read(mmio, ~0);
+
+	/* Set or clear pending SGIs on the distributor */
+	for (sgi = min_sgi; sgi <= max_sgi; sgi++) {
+		u8 mask = reg >> (8 * (sgi - min_sgi));
+		u8 *src = vgic_get_sgi_sources(dist, vcpu_id, sgi);
+
+		if (set) {
+			if ((*src & mask) != mask)
+				updated = true;
+			*src |= mask;
+		} else {
+			if (*src & mask)
+				updated = true;
+			*src &= ~mask;
+		}
+	}
+
+	if (updated)
+		vgic_update_state(vcpu->kvm);
+
+	return updated;
+}
+
+static bool handle_mmio_sgi_set(struct kvm_vcpu *vcpu,
+				struct kvm_exit_mmio *mmio,
+				phys_addr_t offset)
+{
+	if (!mmio->is_write)
+		return read_set_clear_sgi_pend_reg(vcpu, mmio, offset);
+	else
+		return write_set_clear_sgi_pend_reg(vcpu, mmio, offset, true);
+}
+
+static bool handle_mmio_sgi_clear(struct kvm_vcpu *vcpu,
+				  struct kvm_exit_mmio *mmio,
+				  phys_addr_t offset)
+{
+	if (!mmio->is_write)
+		return read_set_clear_sgi_pend_reg(vcpu, mmio, offset);
+	else
+		return write_set_clear_sgi_pend_reg(vcpu, mmio, offset, false);
+}
+
+static const struct kvm_mmio_range vgic_dist_ranges[] = {
+	{
+		.base		= GIC_DIST_CTRL,
+		.len		= 12,
+		.bits_per_irq	= 0,
+		.handle_mmio	= handle_mmio_misc,
+	},
+	{
+		.base		= GIC_DIST_IGROUP,
+		.len		= VGIC_MAX_IRQS / 8,
+		.bits_per_irq	= 1,
+		.handle_mmio	= handle_mmio_raz_wi,
+	},
+	{
+		.base		= GIC_DIST_ENABLE_SET,
+		.len		= VGIC_MAX_IRQS / 8,
+		.bits_per_irq	= 1,
+		.handle_mmio	= handle_mmio_set_enable_reg,
+	},
+	{
+		.base		= GIC_DIST_ENABLE_CLEAR,
+		.len		= VGIC_MAX_IRQS / 8,
+		.bits_per_irq	= 1,
+		.handle_mmio	= handle_mmio_clear_enable_reg,
+	},
+	{
+		.base		= GIC_DIST_PENDING_SET,
+		.len		= VGIC_MAX_IRQS / 8,
+		.bits_per_irq	= 1,
+		.handle_mmio	= handle_mmio_set_pending_reg,
+	},
+	{
+		.base		= GIC_DIST_PENDING_CLEAR,
+		.len		= VGIC_MAX_IRQS / 8,
+		.bits_per_irq	= 1,
+		.handle_mmio	= handle_mmio_clear_pending_reg,
+	},
+	{
+		.base		= GIC_DIST_ACTIVE_SET,
+		.len		= VGIC_MAX_IRQS / 8,
+		.bits_per_irq	= 1,
+		.handle_mmio	= handle_mmio_raz_wi,
+	},
+	{
+		.base		= GIC_DIST_ACTIVE_CLEAR,
+		.len		= VGIC_MAX_IRQS / 8,
+		.bits_per_irq	= 1,
+		.handle_mmio	= handle_mmio_raz_wi,
+	},
+	{
+		.base		= GIC_DIST_PRI,
+		.len		= VGIC_MAX_IRQS,
+		.bits_per_irq	= 8,
+		.handle_mmio	= handle_mmio_priority_reg,
+	},
+	{
+		.base		= GIC_DIST_TARGET,
+		.len		= VGIC_MAX_IRQS,
+		.bits_per_irq	= 8,
+		.handle_mmio	= handle_mmio_target_reg,
+	},
+	{
+		.base		= GIC_DIST_CONFIG,
+		.len		= VGIC_MAX_IRQS / 4,
+		.bits_per_irq	= 2,
+		.handle_mmio	= handle_mmio_cfg_reg,
+	},
+	{
+		.base		= GIC_DIST_SOFTINT,
+		.len		= 4,
+		.handle_mmio	= handle_mmio_sgi_reg,
+	},
+	{
+		.base		= GIC_DIST_SGI_PENDING_CLEAR,
+		.len		= VGIC_NR_SGIS,
+		.handle_mmio	= handle_mmio_sgi_clear,
+	},
+	{
+		.base		= GIC_DIST_SGI_PENDING_SET,
+		.len		= VGIC_NR_SGIS,
+		.handle_mmio	= handle_mmio_sgi_set,
+	},
+	{}
+};
+
+static bool vgic_v2_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
+				struct kvm_exit_mmio *mmio)
+{
+	unsigned long base = vcpu->kvm->arch.vgic.vgic_dist_base;
+
+	if (!is_in_range(mmio->phys_addr, mmio->len, base,
+			 KVM_VGIC_V2_DIST_SIZE))
+		return false;
+
+	/* GICv2 does not support accesses wider than 32 bits */
+	if (mmio->len > 4) {
+		kvm_inject_dabt(vcpu, mmio->phys_addr);
+		return true;
+	}
+
+	return vgic_handle_mmio_range(vcpu, run, mmio, vgic_dist_ranges, base);
+}
+
+static void vgic_dispatch_sgi(struct kvm_vcpu *vcpu, u32 reg)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	int nrcpus = atomic_read(&kvm->online_vcpus);
+	u8 target_cpus;
+	int sgi, mode, c, vcpu_id;
+
+	vcpu_id = vcpu->vcpu_id;
+
+	sgi = reg & 0xf;
+	target_cpus = (reg >> 16) & 0xff;
+	mode = (reg >> 24) & 3;
+
+	switch (mode) {
+	case 0:
+		if (!target_cpus)
+			return;
+		break;
+
+	case 1:
+		target_cpus = ((1 << nrcpus) - 1) & ~(1 << vcpu_id) & 0xff;
+		break;
+
+	case 2:
+		target_cpus = 1 << vcpu_id;
+		break;
+	}
+
+	kvm_for_each_vcpu(c, vcpu, kvm) {
+		if (target_cpus & 1) {
+			/* Flag the SGI as pending */
+			vgic_dist_irq_set_pending(vcpu, sgi);
+			*vgic_get_sgi_sources(dist, c, sgi) |= 1 << vcpu_id;
+			kvm_debug("SGI%d from CPU%d to CPU%d\n",
+				  sgi, vcpu_id, c);
+		}
+
+		target_cpus >>= 1;
+	}
+}
+
+static bool vgic_v2_queue_sgi(struct kvm_vcpu *vcpu, int irq)
+{
+	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+	unsigned long sources;
+	int vcpu_id = vcpu->vcpu_id;
+	int c;
+
+	sources = *vgic_get_sgi_sources(dist, vcpu_id, irq);
+
+	for_each_set_bit(c, &sources, dist->nr_cpus) {
+		if (vgic_queue_irq(vcpu, c, irq))
+			clear_bit(c, &sources);
+	}
+
+	*vgic_get_sgi_sources(dist, vcpu_id, irq) = sources;
+
+	/*
+	 * If the sources bitmap has been cleared it means that we
+	 * could queue all the SGIs onto link registers (see the
+	 * clear_bit above), and therefore we are done with them in
+	 * our emulated gic and can get rid of them.
+	 */
+	if (!sources) {
+		vgic_dist_irq_clear_pending(vcpu, irq);
+		vgic_cpu_irq_clear(vcpu, irq);
+		return true;
+	}
+
+	return false;
+}
+
+static int vgic_v2_init(struct kvm *kvm, const struct vgic_params *params)
+{
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	int ret, i;
+
+	if (IS_VGIC_ADDR_UNDEF(dist->vgic_dist_base) ||
+	    IS_VGIC_ADDR_UNDEF(dist->vgic_cpu_base)) {
+		kvm_err("Need to set vgic distributor addresses first\n");
+		return -ENXIO;
+	}
+
+	ret = kvm_phys_addr_ioremap(kvm, dist->vgic_cpu_base,
+				    params->vcpu_base,
+				    KVM_VGIC_V2_CPU_SIZE, true);
+	if (ret) {
+		kvm_err("Unable to remap VGIC CPU to VCPU\n");
+		return ret;
+	}
+
+	/* Initialize the target VCPUs for each IRQ to VCPU 0 */
+	for (i = VGIC_NR_PRIVATE_IRQS; i < dist->nr_irqs; i += 4)
+		vgic_set_target_reg(kvm, 0, i);
+
+	return 0;
+}
+
+static void vgic_v2_add_sgi_source(struct kvm_vcpu *vcpu, int irq, int source)
+{
+	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+
+	*vgic_get_sgi_sources(dist, vcpu->vcpu_id, irq) |= 1 << source;
+}
+
+int vgic_v2_init_emulation(struct kvm *kvm)
+{
+	struct vgic_dist *dist = &kvm->arch.vgic;
+
+	dist->vm_ops.handle_mmio = vgic_v2_handle_mmio;
+	dist->vm_ops.queue_sgi = vgic_v2_queue_sgi;
+	dist->vm_ops.add_sgi_source = vgic_v2_add_sgi_source;
+	dist->vm_ops.vgic_init = vgic_v2_init;
+
+	kvm->arch.max_vcpus = 8;
+
+	return 0;
+}
+
+static bool handle_cpu_mmio_misc(struct kvm_vcpu *vcpu,
+				 struct kvm_exit_mmio *mmio, phys_addr_t offset)
+{
+	bool updated = false;
+	struct vgic_vmcr vmcr;
+	u32 *vmcr_field;
+	u32 reg;
+
+	vgic_get_vmcr(vcpu, &vmcr);
+
+	switch (offset & ~0x3) {
+	case GIC_CPU_CTRL:
+		vmcr_field = &vmcr.ctlr;
+		break;
+	case GIC_CPU_PRIMASK:
+		vmcr_field = &vmcr.pmr;
+		break;
+	case GIC_CPU_BINPOINT:
+		vmcr_field = &vmcr.bpr;
+		break;
+	case GIC_CPU_ALIAS_BINPOINT:
+		vmcr_field = &vmcr.abpr;
+		break;
+	default:
+		BUG();
+	}
+
+	if (!mmio->is_write) {
+		reg = *vmcr_field;
+		mmio_data_write(mmio, ~0, reg);
+	} else {
+		reg = mmio_data_read(mmio, ~0);
+		if (reg != *vmcr_field) {
+			*vmcr_field = reg;
+			vgic_set_vmcr(vcpu, &vmcr);
+			updated = true;
+		}
+	}
+	return updated;
+}
+
+static bool handle_mmio_abpr(struct kvm_vcpu *vcpu,
+			     struct kvm_exit_mmio *mmio, phys_addr_t offset)
+{
+	return handle_cpu_mmio_misc(vcpu, mmio, GIC_CPU_ALIAS_BINPOINT);
+}
+
+static bool handle_cpu_mmio_ident(struct kvm_vcpu *vcpu,
+				  struct kvm_exit_mmio *mmio,
+				  phys_addr_t offset)
+{
+	u32 reg;
+
+	if (mmio->is_write)
+		return false;
+
+	/* GICC_IIDR */
+	reg = (PRODUCT_ID_KVM << 20) |
+	      (GICC_ARCH_VERSION_V2 << 16) |
+	      (IMPLEMENTER_ARM << 0);
+	mmio_data_write(mmio, ~0, reg);
+	return false;
+}
+
+/*
+ * CPU Interface Register accesses - these are not accessed by the VM, but by
+ * user space for saving and restoring VGIC state.
+ */
+static const struct kvm_mmio_range vgic_cpu_ranges[] = {
+	{
+		.base		= GIC_CPU_CTRL,
+		.len		= 12,
+		.handle_mmio	= handle_cpu_mmio_misc,
+	},
+	{
+		.base		= GIC_CPU_ALIAS_BINPOINT,
+		.len		= 4,
+		.handle_mmio	= handle_mmio_abpr,
+	},
+	{
+		.base		= GIC_CPU_ACTIVEPRIO,
+		.len		= 16,
+		.handle_mmio	= handle_mmio_raz_wi,
+	},
+	{
+		.base		= GIC_CPU_IDENT,
+		.len		= 4,
+		.handle_mmio	= handle_cpu_mmio_ident,
+	},
+};
+
+static int vgic_attr_regs_access(struct kvm_device *dev,
+				 struct kvm_device_attr *attr,
+				 u32 *reg, bool is_write)
+{
+	const struct kvm_mmio_range *r = NULL, *ranges;
+	phys_addr_t offset;
+	int ret, cpuid, c;
+	struct kvm_vcpu *vcpu, *tmp_vcpu;
+	struct vgic_dist *vgic;
+	struct kvm_exit_mmio mmio;
+
+	offset = attr->attr & KVM_DEV_ARM_VGIC_OFFSET_MASK;
+	cpuid = (attr->attr & KVM_DEV_ARM_VGIC_CPUID_MASK) >>
+		KVM_DEV_ARM_VGIC_CPUID_SHIFT;
+
+	mutex_lock(&dev->kvm->lock);
+
+	ret = vgic_init_maps(dev->kvm);
+	if (ret)
+		goto out;
+
+	if (cpuid >= atomic_read(&dev->kvm->online_vcpus)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	vcpu = kvm_get_vcpu(dev->kvm, cpuid);
+	vgic = &dev->kvm->arch.vgic;
+
+	mmio.len = 4;
+	mmio.is_write = is_write;
+	if (is_write)
+		mmio_data_write(&mmio, ~0, *reg);
+	switch (attr->group) {
+	case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
+		mmio.phys_addr = vgic->vgic_dist_base + offset;
+		ranges = vgic_dist_ranges;
+		break;
+	case KVM_DEV_ARM_VGIC_GRP_CPU_REGS:
+		mmio.phys_addr = vgic->vgic_cpu_base + offset;
+		ranges = vgic_cpu_ranges;
+		break;
+	default:
+		BUG();
+	}
+	r = vgic_find_range(ranges, &mmio, offset);
+
+	if (unlikely(!r || !r->handle_mmio)) {
+		ret = -ENXIO;
+		goto out;
+	}
+
+
+	spin_lock(&vgic->lock);
+
+	/*
+	 * Ensure that no other VCPU is running by checking the vcpu->cpu
+	 * field.  If no other VCPUs are running we can safely access the VGIC
+	 * state, because even if another VCPU is run after this point, that
+	 * VCPU will not touch the vgic state, because it will block on
+	 * getting the vgic->lock in kvm_vgic_sync_hwstate().
+	 */
+	kvm_for_each_vcpu(c, tmp_vcpu, dev->kvm) {
+		if (unlikely(tmp_vcpu->cpu != -1)) {
+			ret = -EBUSY;
+			goto out_vgic_unlock;
+		}
+	}
+
+	/*
+	 * Move all pending IRQs from the LRs on all VCPUs so the pending
+	 * state can be properly represented in the register state accessible
+	 * through this API.
+	 */
+	kvm_for_each_vcpu(c, tmp_vcpu, dev->kvm)
+		vgic_unqueue_irqs(tmp_vcpu);
+
+	offset -= r->base;
+	r->handle_mmio(vcpu, &mmio, offset);
+
+	if (!is_write)
+		*reg = mmio_data_read(&mmio, ~0);
+
+	ret = 0;
+out_vgic_unlock:
+	spin_unlock(&vgic->lock);
+out:
+	mutex_unlock(&dev->kvm->lock);
+	return ret;
+}
+
+static int vgic_v2_create(struct kvm_device *dev, u32 type)
+{
+	return kvm_vgic_create(dev->kvm, type);
+}
+
+static void vgic_v2_destroy(struct kvm_device *dev)
+{
+	kfree(dev);
+}
+
+static int vgic_v2_set_attr(struct kvm_device *dev,
+			    struct kvm_device_attr *attr)
+{
+	int ret;
+
+	ret = vgic_set_common_attr(dev, attr);
+	if (ret != -ENXIO)
+		return ret;
+
+	switch (attr->group) {
+	case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
+	case KVM_DEV_ARM_VGIC_GRP_CPU_REGS: {
+		u32 __user *uaddr = (u32 __user *)(long)attr->addr;
+		u32 reg;
+
+		if (get_user(reg, uaddr))
+			return -EFAULT;
+
+		return vgic_attr_regs_access(dev, attr, &reg, true);
+	}
+
+	}
+
+	return -ENXIO;
+}
+
+static int vgic_v2_get_attr(struct kvm_device *dev,
+			    struct kvm_device_attr *attr)
+{
+	int ret;
+
+	ret = vgic_get_common_attr(dev, attr);
+	if (ret != -ENXIO)
+		return ret;
+
+	switch (attr->group) {
+	case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
+	case KVM_DEV_ARM_VGIC_GRP_CPU_REGS: {
+		u32 __user *uaddr = (u32 __user *)(long)attr->addr;
+		u32 reg = 0;
+
+		ret = vgic_attr_regs_access(dev, attr, &reg, false);
+		if (ret)
+			return ret;
+		return put_user(reg, uaddr);
+	}
+
+	}
+
+	return -ENXIO;
+}
+
+static int vgic_v2_has_attr(struct kvm_device *dev,
+			    struct kvm_device_attr *attr)
+{
+	phys_addr_t offset;
+
+	switch (attr->group) {
+	case KVM_DEV_ARM_VGIC_GRP_ADDR:
+		switch (attr->attr) {
+		case KVM_VGIC_V2_ADDR_TYPE_DIST:
+		case KVM_VGIC_V2_ADDR_TYPE_CPU:
+			return 0;
+		}
+		break;
+	case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
+		offset = attr->attr & KVM_DEV_ARM_VGIC_OFFSET_MASK;
+		return vgic_has_attr_regs(vgic_dist_ranges, offset);
+	case KVM_DEV_ARM_VGIC_GRP_CPU_REGS:
+		offset = attr->attr & KVM_DEV_ARM_VGIC_OFFSET_MASK;
+		return vgic_has_attr_regs(vgic_cpu_ranges, offset);
+	case KVM_DEV_ARM_VGIC_GRP_NR_IRQS:
+		return 0;
+	}
+	return -ENXIO;
+}
+
+struct kvm_device_ops kvm_arm_vgic_v2_ops = {
+	.name = "kvm-arm-vgic-v2",
+	.create = vgic_v2_create,
+	.destroy = vgic_v2_destroy,
+	.set_attr = vgic_v2_set_attr,
+	.get_attr = vgic_v2_get_attr,
+	.has_attr = vgic_v2_has_attr,
+};
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 4fa58c9..421745d 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -81,8 +81,6 @@
 
 static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
 static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
-static u8 *vgic_get_sgi_sources(struct vgic_dist *dist, int vcpu_id, int sgi);
-static void vgic_dispatch_sgi(struct kvm_vcpu *vcpu, u32 reg);
 static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
 static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
 
@@ -421,41 +419,6 @@ void vgic_reg_access(struct kvm_exit_mmio *mmio, u32 *reg,
 	}
 }
 
-static bool handle_mmio_misc(struct kvm_vcpu *vcpu,
-			     struct kvm_exit_mmio *mmio, phys_addr_t offset)
-{
-	u32 reg;
-	u32 word_offset = offset & 3;
-
-	switch (offset & ~3) {
-	case 0:			/* GICD_CTLR */
-		reg = vcpu->kvm->arch.vgic.enabled;
-		vgic_reg_access(mmio, &reg, word_offset,
-				ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
-		if (mmio->is_write) {
-			vcpu->kvm->arch.vgic.enabled = reg & 1;
-			vgic_update_state(vcpu->kvm);
-			return true;
-		}
-		break;
-
-	case 4:			/* GICD_TYPER */
-		reg  = (atomic_read(&vcpu->kvm->online_vcpus) - 1) << 5;
-		reg |= (vcpu->kvm->arch.vgic.nr_irqs >> 5) - 1;
-		vgic_reg_access(mmio, &reg, word_offset,
-				ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
-		break;
-
-	case 8:			/* GICD_IIDR */
-		reg = (PRODUCT_ID_KVM << 24) | (IMPLEMENTER_ARM << 0);
-		vgic_reg_access(mmio, &reg, word_offset,
-				ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
-		break;
-	}
-
-	return false;
-}
-
 bool handle_mmio_raz_wi(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio,
 			phys_addr_t offset)
 {
@@ -486,22 +449,6 @@ bool vgic_handle_enable_reg(struct kvm *kvm, struct kvm_exit_mmio *mmio,
 	return false;
 }
 
-static bool handle_mmio_set_enable_reg(struct kvm_vcpu *vcpu,
-				       struct kvm_exit_mmio *mmio,
-				       phys_addr_t offset)
-{
-	return vgic_handle_enable_reg(vcpu->kvm, mmio, offset,
-				      vcpu->vcpu_id, ACCESS_WRITE_SETBIT);
-}
-
-static bool handle_mmio_clear_enable_reg(struct kvm_vcpu *vcpu,
-					 struct kvm_exit_mmio *mmio,
-					 phys_addr_t offset)
-{
-	return vgic_handle_enable_reg(vcpu->kvm, mmio, offset,
-				      vcpu->vcpu_id, ACCESS_WRITE_CLEARBIT);
-}
-
 bool vgic_handle_set_pending_reg(struct kvm *kvm,
 				 struct kvm_exit_mmio *mmio,
 				 phys_addr_t offset, int vcpu_id)
@@ -575,109 +522,6 @@ bool vgic_handle_clear_pending_reg(struct kvm *kvm,
 	return false;
 }
 
-static bool handle_mmio_set_pending_reg(struct kvm_vcpu *vcpu,
-					struct kvm_exit_mmio *mmio,
-					phys_addr_t offset)
-{
-	return vgic_handle_set_pending_reg(vcpu->kvm, mmio, offset,
-					   vcpu->vcpu_id);
-}
-
-static bool handle_mmio_clear_pending_reg(struct kvm_vcpu *vcpu,
-					  struct kvm_exit_mmio *mmio,
-					  phys_addr_t offset)
-{
-	return vgic_handle_clear_pending_reg(vcpu->kvm, mmio, offset,
-					     vcpu->vcpu_id);
-}
-
-static bool handle_mmio_priority_reg(struct kvm_vcpu *vcpu,
-				     struct kvm_exit_mmio *mmio,
-				     phys_addr_t offset)
-{
-	u32 *reg = vgic_bytemap_get_reg(&vcpu->kvm->arch.vgic.irq_priority,
-					vcpu->vcpu_id, offset);
-	vgic_reg_access(mmio, reg, offset,
-			ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
-	return false;
-}
-
-#define GICD_ITARGETSR_SIZE	32
-#define GICD_CPUTARGETS_BITS	8
-#define GICD_IRQS_PER_ITARGETSR	(GICD_ITARGETSR_SIZE / GICD_CPUTARGETS_BITS)
-static u32 vgic_get_target_reg(struct kvm *kvm, int irq)
-{
-	struct vgic_dist *dist = &kvm->arch.vgic;
-	int i;
-	u32 val = 0;
-
-	irq -= VGIC_NR_PRIVATE_IRQS;
-
-	for (i = 0; i < GICD_IRQS_PER_ITARGETSR; i++)
-		val |= 1 << (dist->irq_spi_cpu[irq + i] + i * 8);
-
-	return val;
-}
-
-static void vgic_set_target_reg(struct kvm *kvm, u32 val, int irq)
-{
-	struct vgic_dist *dist = &kvm->arch.vgic;
-	struct kvm_vcpu *vcpu;
-	int i, c;
-	unsigned long *bmap;
-	u32 target;
-
-	irq -= VGIC_NR_PRIVATE_IRQS;
-
-	/*
-	 * Pick the LSB in each byte. This ensures we target exactly
-	 * one vcpu per IRQ. If the byte is null, assume we target
-	 * CPU0.
-	 */
-	for (i = 0; i < GICD_IRQS_PER_ITARGETSR; i++) {
-		int shift = i * GICD_CPUTARGETS_BITS;
-		target = ffs((val >> shift) & 0xffU);
-		target = target ? (target - 1) : 0;
-		dist->irq_spi_cpu[irq + i] = target;
-		kvm_for_each_vcpu(c, vcpu, kvm) {
-			bmap = vgic_bitmap_get_shared_map(&dist->irq_spi_target[c]);
-			if (c == target)
-				set_bit(irq + i, bmap);
-			else
-				clear_bit(irq + i, bmap);
-		}
-	}
-}
-
-static bool handle_mmio_target_reg(struct kvm_vcpu *vcpu,
-				   struct kvm_exit_mmio *mmio,
-				   phys_addr_t offset)
-{
-	u32 reg;
-
-	/* We treat the banked interrupts targets as read-only */
-	if (offset < 32) {
-		u32 roreg = 1 << vcpu->vcpu_id;
-		roreg |= roreg << 8;
-		roreg |= roreg << 16;
-
-		vgic_reg_access(mmio, &roreg, offset,
-				ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
-		return false;
-	}
-
-	reg = vgic_get_target_reg(vcpu->kvm, offset & ~3U);
-	vgic_reg_access(mmio, &reg, offset,
-			ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
-	if (mmio->is_write) {
-		vgic_set_target_reg(vcpu->kvm, reg, offset & ~3U);
-		vgic_update_state(vcpu->kvm);
-		return true;
-	}
-
-	return false;
-}
-
 static u32 vgic_cfg_expand(u16 val)
 {
 	u32 res = 0;
@@ -745,39 +589,6 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio *mmio,
 	return false;
 }
 
-static bool handle_mmio_cfg_reg(struct kvm_vcpu *vcpu,
-				struct kvm_exit_mmio *mmio, phys_addr_t offset)
-{
-	u32 *reg;
-
-	reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_cfg,
-				  vcpu->vcpu_id, offset >> 1);
-
-	return vgic_handle_cfg_reg(reg, mmio, offset);
-}
-
-static bool handle_mmio_sgi_reg(struct kvm_vcpu *vcpu,
-				struct kvm_exit_mmio *mmio, phys_addr_t offset)
-{
-	u32 reg;
-	vgic_reg_access(mmio, &reg, offset,
-			ACCESS_READ_RAZ | ACCESS_WRITE_VALUE);
-	if (mmio->is_write) {
-		vgic_dispatch_sgi(vcpu, reg);
-		vgic_update_state(vcpu->kvm);
-		return true;
-	}
-
-	return false;
-}
-
-static void vgic_v2_add_sgi_source(struct kvm_vcpu *vcpu, int irq, int source)
-{
-	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
-
-	*vgic_get_sgi_sources(dist, vcpu->vcpu_id, irq) |= 1 << source;
-}
-
 /**
  * vgic_unqueue_irqs - move pending IRQs from LRs to the distributor
  * @vgic_cpu: Pointer to the vgic_cpu struct holding the LRs
@@ -838,168 +649,6 @@ void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
 	}
 }
 
-/* Handle reads of GICD_CPENDSGIRn and GICD_SPENDSGIRn */
-static bool read_set_clear_sgi_pend_reg(struct kvm_vcpu *vcpu,
-					struct kvm_exit_mmio *mmio,
-					phys_addr_t offset)
-{
-	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
-	int sgi;
-	int min_sgi = (offset & ~0x3);
-	int max_sgi = min_sgi + 3;
-	int vcpu_id = vcpu->vcpu_id;
-	u32 reg = 0;
-
-	/* Copy source SGIs from distributor side */
-	for (sgi = min_sgi; sgi <= max_sgi; sgi++) {
-		int shift = 8 * (sgi - min_sgi);
-		reg |= ((u32)*vgic_get_sgi_sources(dist, vcpu_id, sgi)) << shift;
-	}
-
-	mmio_data_write(mmio, ~0, reg);
-	return false;
-}
-
-static bool write_set_clear_sgi_pend_reg(struct kvm_vcpu *vcpu,
-					 struct kvm_exit_mmio *mmio,
-					 phys_addr_t offset, bool set)
-{
-	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
-	int sgi;
-	int min_sgi = (offset & ~0x3);
-	int max_sgi = min_sgi + 3;
-	int vcpu_id = vcpu->vcpu_id;
-	u32 reg;
-	bool updated = false;
-
-	reg = mmio_data_read(mmio, ~0);
-
-	/* Clear pending SGIs on the distributor */
-	for (sgi = min_sgi; sgi <= max_sgi; sgi++) {
-		u8 mask = reg >> (8 * (sgi - min_sgi));
-		u8 *src = vgic_get_sgi_sources(dist, vcpu_id, sgi);
-		if (set) {
-			if ((*src & mask) != mask)
-				updated = true;
-			*src |= mask;
-		} else {
-			if (*src & mask)
-				updated = true;
-			*src &= ~mask;
-		}
-	}
-
-	if (updated)
-		vgic_update_state(vcpu->kvm);
-
-	return updated;
-}
-
-static bool handle_mmio_sgi_set(struct kvm_vcpu *vcpu,
-				struct kvm_exit_mmio *mmio,
-				phys_addr_t offset)
-{
-	if (!mmio->is_write)
-		return read_set_clear_sgi_pend_reg(vcpu, mmio, offset);
-	else
-		return write_set_clear_sgi_pend_reg(vcpu, mmio, offset, true);
-}
-
-static bool handle_mmio_sgi_clear(struct kvm_vcpu *vcpu,
-				  struct kvm_exit_mmio *mmio,
-				  phys_addr_t offset)
-{
-	if (!mmio->is_write)
-		return read_set_clear_sgi_pend_reg(vcpu, mmio, offset);
-	else
-		return write_set_clear_sgi_pend_reg(vcpu, mmio, offset, false);
-}
-
-static const struct kvm_mmio_range vgic_dist_ranges[] = {
-	{
-		.base		= GIC_DIST_CTRL,
-		.len		= 12,
-		.bits_per_irq	= 0,
-		.handle_mmio	= handle_mmio_misc,
-	},
-	{
-		.base		= GIC_DIST_IGROUP,
-		.len		= VGIC_MAX_IRQS / 8,
-		.bits_per_irq	= 1,
-		.handle_mmio	= handle_mmio_raz_wi,
-	},
-	{
-		.base		= GIC_DIST_ENABLE_SET,
-		.len		= VGIC_MAX_IRQS / 8,
-		.bits_per_irq	= 1,
-		.handle_mmio	= handle_mmio_set_enable_reg,
-	},
-	{
-		.base		= GIC_DIST_ENABLE_CLEAR,
-		.len		= VGIC_MAX_IRQS / 8,
-		.bits_per_irq	= 1,
-		.handle_mmio	= handle_mmio_clear_enable_reg,
-	},
-	{
-		.base		= GIC_DIST_PENDING_SET,
-		.len		= VGIC_MAX_IRQS / 8,
-		.bits_per_irq	= 1,
-		.handle_mmio	= handle_mmio_set_pending_reg,
-	},
-	{
-		.base		= GIC_DIST_PENDING_CLEAR,
-		.len		= VGIC_MAX_IRQS / 8,
-		.bits_per_irq	= 1,
-		.handle_mmio	= handle_mmio_clear_pending_reg,
-	},
-	{
-		.base		= GIC_DIST_ACTIVE_SET,
-		.len		= VGIC_MAX_IRQS / 8,
-		.bits_per_irq	= 1,
-		.handle_mmio	= handle_mmio_raz_wi,
-	},
-	{
-		.base		= GIC_DIST_ACTIVE_CLEAR,
-		.len		= VGIC_MAX_IRQS / 8,
-		.bits_per_irq	= 1,
-		.handle_mmio	= handle_mmio_raz_wi,
-	},
-	{
-		.base		= GIC_DIST_PRI,
-		.len		= VGIC_MAX_IRQS,
-		.bits_per_irq	= 8,
-		.handle_mmio	= handle_mmio_priority_reg,
-	},
-	{
-		.base		= GIC_DIST_TARGET,
-		.len		= VGIC_MAX_IRQS,
-		.bits_per_irq	= 8,
-		.handle_mmio	= handle_mmio_target_reg,
-	},
-	{
-		.base		= GIC_DIST_CONFIG,
-		.len		= VGIC_MAX_IRQS / 4,
-		.bits_per_irq	= 2,
-		.handle_mmio	= handle_mmio_cfg_reg,
-	},
-	{
-		.base		= GIC_DIST_SOFTINT,
-		.len		= 4,
-		.handle_mmio	= handle_mmio_sgi_reg,
-	},
-	{
-		.base		= GIC_DIST_SGI_PENDING_CLEAR,
-		.len		= VGIC_NR_SGIS,
-		.handle_mmio	= handle_mmio_sgi_clear,
-	},
-	{
-		.base		= GIC_DIST_SGI_PENDING_SET,
-		.len		= VGIC_NR_SGIS,
-		.handle_mmio	= handle_mmio_sgi_set,
-	},
-	{}
-};
-
 const
 struct kvm_mmio_range *vgic_find_range(const struct kvm_mmio_range *ranges,
 				       struct kvm_exit_mmio *mmio,
@@ -1125,24 +774,6 @@ bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	return true;
 }
 
-static bool vgic_v2_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
-				struct kvm_exit_mmio *mmio)
-{
-	unsigned long base = vcpu->kvm->arch.vgic.vgic_dist_base;
-
-	if (!is_in_range(mmio->phys_addr, mmio->len, base,
-			 KVM_VGIC_V2_DIST_SIZE))
-		return false;
-
-	/* GICv2 does not support accesses wider than 32 bits */
-	if (mmio->len > 4) {
-		kvm_inject_dabt(vcpu, mmio->phys_addr);
-		return true;
-	}
-
-	return vgic_handle_mmio_range(vcpu, run, mmio, vgic_dist_ranges, base);
-}
-
 /**
  * vgic_handle_mmio - handle an in-kernel MMIO access for the GIC emulation
  * @vcpu:      pointer to the vcpu performing the access
@@ -1167,52 +798,6 @@ bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	return vcpu->kvm->arch.vgic.vm_ops.handle_mmio(vcpu, run, mmio);
 }
 
-static u8 *vgic_get_sgi_sources(struct vgic_dist *dist, int vcpu_id, int sgi)
-{
-	return dist->irq_sgi_sources + vcpu_id * VGIC_NR_SGIS + sgi;
-}
-
-static void vgic_dispatch_sgi(struct kvm_vcpu *vcpu, u32 reg)
-{
-	struct kvm *kvm = vcpu->kvm;
-	struct vgic_dist *dist = &kvm->arch.vgic;
-	int nrcpus = atomic_read(&kvm->online_vcpus);
-	u8 target_cpus;
-	int sgi, mode, c, vcpu_id;
-
-	vcpu_id = vcpu->vcpu_id;
-
-	sgi = reg & 0xf;
-	target_cpus = (reg >> 16) & 0xff;
-	mode = (reg >> 24) & 3;
-
-	switch (mode) {
-	case 0:
-		if (!target_cpus)
-			return;
-		break;
-
-	case 1:
-		target_cpus = ((1 << nrcpus) - 1) & ~(1 << vcpu_id) & 0xff;
-		break;
-
-	case 2:
-		target_cpus = 1 << vcpu_id;
-		break;
-	}
-
-	kvm_for_each_vcpu(c, vcpu, kvm) {
-		if (target_cpus & 1) {
-			/* Flag the SGI as pending */
-			vgic_dist_irq_set_pending(vcpu, sgi);
-			*vgic_get_sgi_sources(dist, c, sgi) |= 1 << vcpu_id;
-			kvm_debug("SGI%d from CPU%d to CPU%d\n", sgi, vcpu_id, c);
-		}
-
-		target_cpus >>= 1;
-	}
-}
-
 static int vgic_nr_shared_irqs(struct vgic_dist *dist)
 {
 	return dist->nr_irqs - VGIC_NR_PRIVATE_IRQS;
@@ -1366,6 +951,7 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu)
 /*
  * Queue an interrupt to a CPU virtual interface. Return true on success,
  * or false if it wasn't possible to queue it.
+ * sgi_source_id must be zero for any non-SGI interrupts.
  */
 bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 sgi_source_id, int irq)
 {
@@ -1416,37 +1002,6 @@ bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 sgi_source_id, int irq)
 	return true;
 }
 
-static bool vgic_v2_queue_sgi(struct kvm_vcpu *vcpu, int irq)
-{
-	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
-	unsigned long sources;
-	int vcpu_id = vcpu->vcpu_id;
-	int c;
-
-	sources = *vgic_get_sgi_sources(dist, vcpu_id, irq);
-
-	for_each_set_bit(c, &sources, dist->nr_cpus) {
-		if (vgic_queue_irq(vcpu, c, irq))
-			clear_bit(c, &sources);
-	}
-
-	*vgic_get_sgi_sources(dist, vcpu_id, irq) = sources;
-
-	/*
-	 * If the sources bitmap has been cleared it means that we
-	 * could queue all the SGIs onto link registers (see the
-	 * clear_bit above), and therefore we are done with them in
-	 * our emulated gic and can get rid of them.
-	 */
-	if (!sources) {
-		vgic_dist_irq_clear_pending(vcpu, irq);
-		vgic_cpu_irq_clear(vcpu, irq);
-		return true;
-	}
-
-	return false;
-}
-
 static bool vgic_queue_hwirq(struct kvm_vcpu *vcpu, int irq)
 {
 	if (!vgic_can_sample_irq(vcpu, irq))
@@ -1964,32 +1519,6 @@ out:
 	return ret;
 }
 
-static int vgic_v2_init(struct kvm *kvm, const struct vgic_params *params)
-{
-	struct vgic_dist *dist = &kvm->arch.vgic;
-	int ret, i;
-
-	if (IS_VGIC_ADDR_UNDEF(dist->vgic_dist_base) ||
-	    IS_VGIC_ADDR_UNDEF(dist->vgic_cpu_base)) {
-		kvm_err("Need to set vgic distributor addresses first\n");
-		return -ENXIO;
-	}
-
-	ret = kvm_phys_addr_ioremap(kvm, dist->vgic_cpu_base,
-				    params->vcpu_base,
-				    KVM_VGIC_V2_CPU_SIZE, true);
-	if (ret) {
-		kvm_err("Unable to remap VGIC CPU to VCPU\n");
-		return ret;
-	}
-
-	/* Initialize the target VCPUs for each IRQ to VCPU 0 */
-	for (i = VGIC_NR_PRIVATE_IRQS; i < dist->nr_irqs; i += 4)
-		vgic_set_target_reg(kvm, 0, i);
-
-	return 0;
-}
-
 /**
  * kvm_vgic_init - Initialize global VGIC state before running any VCPUs
  * @kvm: pointer to the kvm struct
@@ -2033,20 +1562,6 @@ out:
 	return ret;
 }
 
-int vgic_v2_init_emulation(struct kvm *kvm)
-{
-	struct vgic_dist *dist = &kvm->arch.vgic;
-
-	dist->vm_ops.handle_mmio = vgic_v2_handle_mmio;
-	dist->vm_ops.queue_sgi = vgic_v2_queue_sgi;
-	dist->vm_ops.add_sgi_source = vgic_v2_add_sgi_source;
-	dist->vm_ops.vgic_init = vgic_v2_init;
-
-	kvm->arch.max_vcpus = 8;
-
-	return 0;
-}
-
 static int init_vgic_model(struct kvm *kvm, int type)
 {
 	int ret;
@@ -2205,188 +1720,6 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write)
 	return r;
 }
 
-static bool handle_cpu_mmio_misc(struct kvm_vcpu *vcpu,
-				 struct kvm_exit_mmio *mmio, phys_addr_t offset)
-{
-	bool updated = false;
-	struct vgic_vmcr vmcr;
-	u32 *vmcr_field;
-	u32 reg;
-
-	vgic_get_vmcr(vcpu, &vmcr);
-
-	switch (offset & ~0x3) {
-	case GIC_CPU_CTRL:
-		vmcr_field = &vmcr.ctlr;
-		break;
-	case GIC_CPU_PRIMASK:
-		vmcr_field = &vmcr.pmr;
-		break;
-	case GIC_CPU_BINPOINT:
-		vmcr_field = &vmcr.bpr;
-		break;
-	case GIC_CPU_ALIAS_BINPOINT:
-		vmcr_field = &vmcr.abpr;
-		break;
-	default:
-		BUG();
-	}
-
-	if (!mmio->is_write) {
-		reg = *vmcr_field;
-		mmio_data_write(mmio, ~0, reg);
-	} else {
-		reg = mmio_data_read(mmio, ~0);
-		if (reg != *vmcr_field) {
-			*vmcr_field = reg;
-			vgic_set_vmcr(vcpu, &vmcr);
-			updated = true;
-		}
-	}
-	return updated;
-}
-
-static bool handle_mmio_abpr(struct kvm_vcpu *vcpu,
-			     struct kvm_exit_mmio *mmio, phys_addr_t offset)
-{
-	return handle_cpu_mmio_misc(vcpu, mmio, GIC_CPU_ALIAS_BINPOINT);
-}
-
-static bool handle_cpu_mmio_ident(struct kvm_vcpu *vcpu,
-				  struct kvm_exit_mmio *mmio,
-				  phys_addr_t offset)
-{
-	u32 reg;
-
-	if (mmio->is_write)
-		return false;
-
-	/* GICC_IIDR */
-	reg = (PRODUCT_ID_KVM << 20) |
-	      (GICC_ARCH_VERSION_V2 << 16) |
-	      (IMPLEMENTER_ARM << 0);
-	mmio_data_write(mmio, ~0, reg);
-	return false;
-}
-
-/*
- * CPU Interface Register accesses - these are not accessed by the VM, but by
- * user space for saving and restoring VGIC state.
- */
-static const struct kvm_mmio_range vgic_cpu_ranges[] = {
-	{
-		.base		= GIC_CPU_CTRL,
-		.len		= 12,
-		.handle_mmio	= handle_cpu_mmio_misc,
-	},
-	{
-		.base		= GIC_CPU_ALIAS_BINPOINT,
-		.len		= 4,
-		.handle_mmio	= handle_mmio_abpr,
-	},
-	{
-		.base		= GIC_CPU_ACTIVEPRIO,
-		.len		= 16,
-		.handle_mmio	= handle_mmio_raz_wi,
-	},
-	{
-		.base		= GIC_CPU_IDENT,
-		.len		= 4,
-		.handle_mmio	= handle_cpu_mmio_ident,
-	},
-};
-
-static int vgic_attr_regs_access(struct kvm_device *dev,
-				 struct kvm_device_attr *attr,
-				 u32 *reg, bool is_write)
-{
-	const struct kvm_mmio_range *r = NULL, *ranges;
-	phys_addr_t offset;
-	int ret, cpuid, c;
-	struct kvm_vcpu *vcpu, *tmp_vcpu;
-	struct vgic_dist *vgic;
-	struct kvm_exit_mmio mmio;
-
-	offset = attr->attr & KVM_DEV_ARM_VGIC_OFFSET_MASK;
-	cpuid = (attr->attr & KVM_DEV_ARM_VGIC_CPUID_MASK) >>
-		KVM_DEV_ARM_VGIC_CPUID_SHIFT;
-
-	mutex_lock(&dev->kvm->lock);
-
-	ret = vgic_init_maps(dev->kvm);
-	if (ret)
-		goto out;
-
-	if (cpuid >= atomic_read(&dev->kvm->online_vcpus)) {
-		ret = -EINVAL;
-		goto out;
-	}
-
-	vcpu = kvm_get_vcpu(dev->kvm, cpuid);
-	vgic = &dev->kvm->arch.vgic;
-
-	mmio.len = 4;
-	mmio.is_write = is_write;
-	if (is_write)
-		mmio_data_write(&mmio, ~0, *reg);
-	switch (attr->group) {
-	case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
-		mmio.phys_addr = vgic->vgic_dist_base + offset;
-		ranges = vgic_dist_ranges;
-		break;
-	case KVM_DEV_ARM_VGIC_GRP_CPU_REGS:
-		mmio.phys_addr = vgic->vgic_cpu_base + offset;
-		ranges = vgic_cpu_ranges;
-		break;
-	default:
-		BUG();
-	}
-	r = vgic_find_range(ranges, &mmio, offset);
-
-	if (unlikely(!r || !r->handle_mmio)) {
-		ret = -ENXIO;
-		goto out;
-	}
-
-
-	spin_lock(&vgic->lock);
-
-	/*
-	 * Ensure that no other VCPU is running by checking the vcpu->cpu
-	 * field.  If no other VPCUs are running we can safely access the VGIC
-	 * state, because even if another VPU is run after this point, that
-	 * VCPU will not touch the vgic state, because it will block on
-	 * getting the vgic->lock in kvm_vgic_sync_hwstate().
-	 */
-	kvm_for_each_vcpu(c, tmp_vcpu, dev->kvm) {
-		if (unlikely(tmp_vcpu->cpu != -1)) {
-			ret = -EBUSY;
-			goto out_vgic_unlock;
-		}
-	}
-
-	/*
-	 * Move all pending IRQs from the LRs on all VCPUs so the pending
-	 * state can be properly represented in the register state accessible
-	 * through this API.
-	 */
-	kvm_for_each_vcpu(c, tmp_vcpu, dev->kvm)
-		vgic_unqueue_irqs(tmp_vcpu);
-
-	offset -= r->base;
-	r->handle_mmio(vcpu, &mmio, offset);
-
-	if (!is_write)
-		*reg = mmio_data_read(&mmio, ~0);
-
-	ret = 0;
-out_vgic_unlock:
-	spin_unlock(&vgic->lock);
-out:
-	mutex_unlock(&dev->kvm->lock);
-	return ret;
-}
-
 int vgic_set_common_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 {
 	int r;
@@ -2439,31 +1772,6 @@ int vgic_set_common_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 	return -ENXIO;
 }
 
-static int vgic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
-{
-	int ret;
-
-	ret = vgic_set_common_attr(dev, attr);
-	if (ret != -ENXIO)
-		return ret;
-
-	switch (attr->group) {
-	case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
-	case KVM_DEV_ARM_VGIC_GRP_CPU_REGS: {
-		u32 __user *uaddr = (u32 __user *)(long)attr->addr;
-		u32 reg;
-
-		if (get_user(reg, uaddr))
-			return -EFAULT;
-
-		return vgic_attr_regs_access(dev, attr, &reg, true);
-	}
-
-	}
-
-	return -ENXIO;
-}
-
 int vgic_get_common_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 {
 	int r = -ENXIO;
@@ -2494,31 +1802,6 @@ int vgic_get_common_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 	return r;
 }
 
-static int vgic_get_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
-{
-	int ret;
-
-	ret = vgic_get_common_attr(dev, attr);
-	if (ret != -ENXIO)
-		return ret;
-
-	switch (attr->group) {
-	case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
-	case KVM_DEV_ARM_VGIC_GRP_CPU_REGS: {
-		u32 __user *uaddr = (u32 __user *)(long)attr->addr;
-		u32 reg = 0;
-
-		ret = vgic_attr_regs_access(dev, attr, &reg, false);
-		if (ret)
-			return ret;
-		return put_user(reg, uaddr);
-	}
-
-	}
-
-	return -ENXIO;
-}
-
 int vgic_has_attr_regs(const struct kvm_mmio_range *ranges, phys_addr_t offset)
 {
 	struct kvm_exit_mmio dev_attr_mmio;
@@ -2530,49 +1813,6 @@ int vgic_has_attr_regs(const struct kvm_mmio_range *ranges, phys_addr_t offset)
 		return -ENXIO;
 }
 
-static int vgic_has_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
-{
-	phys_addr_t offset;
-
-	switch (attr->group) {
-	case KVM_DEV_ARM_VGIC_GRP_ADDR:
-		switch (attr->attr) {
-		case KVM_VGIC_V2_ADDR_TYPE_DIST:
-		case KVM_VGIC_V2_ADDR_TYPE_CPU:
-			return 0;
-		}
-		break;
-	case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
-		offset = attr->attr & KVM_DEV_ARM_VGIC_OFFSET_MASK;
-		return vgic_has_attr_regs(vgic_dist_ranges, offset);
-	case KVM_DEV_ARM_VGIC_GRP_CPU_REGS:
-		offset = attr->attr & KVM_DEV_ARM_VGIC_OFFSET_MASK;
-		return vgic_has_attr_regs(vgic_cpu_ranges, offset);
-	case KVM_DEV_ARM_VGIC_GRP_NR_IRQS:
-		return 0;
-	}
-	return -ENXIO;
-}
-
-void vgic_destroy(struct kvm_device *dev)
-{
-	kfree(dev);
-}
-
-int vgic_create(struct kvm_device *dev, u32 type)
-{
-	return kvm_vgic_create(dev->kvm, type);
-}
-
-struct kvm_device_ops kvm_arm_vgic_v2_ops = {
-	.name = "kvm-arm-vgic",
-	.create = vgic_create,
-	.destroy = vgic_destroy,
-	.set_attr = vgic_set_attr,
-	.get_attr = vgic_get_attr,
-	.has_attr = vgic_has_attr,
-};
-
 static void vgic_init_maintenance_interrupt(void *info)
 {
 	enable_percpu_irq(vgic->maint_irq, 0);
-- 
1.7.9.5
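For reference, here is a user-space sketch of the attribute layout that the removed vgic_attr_regs_access() decodes. It assumes the uapi layout of a vCPU index in bits [39:32] and a register offset in bits [31:0] of kvm_device_attr.attr. The VGIC_ATTR_* macros and both helpers are hypothetical local names, not the kernel's own definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the assumed uapi attribute layout (verify against asm/kvm.h). */
#define VGIC_ATTR_CPUID_SHIFT	32
#define VGIC_ATTR_CPUID_MASK	(0xffULL << VGIC_ATTR_CPUID_SHIFT)
#define VGIC_ATTR_OFFSET_MASK	0xffffffffULL

/* Build kvm_device_attr.attr for register 'offset' as seen by vCPU 'cpuid'. */
static uint64_t vgic_attr_encode(uint8_t cpuid, uint32_t offset)
{
	return ((uint64_t)cpuid << VGIC_ATTR_CPUID_SHIFT) | offset;
}

/* Split attr the same way vgic_attr_regs_access() does. */
static void vgic_attr_decode(uint64_t attr, int *cpuid, uint32_t *offset)
{
	/* Unlike the kernel code, this sketch rejects stray high bits. */
	assert((attr & ~(VGIC_ATTR_CPUID_MASK | VGIC_ATTR_OFFSET_MASK)) == 0);

	*offset = (uint32_t)(attr & VGIC_ATTR_OFFSET_MASK);
	*cpuid = (int)((attr & VGIC_ATTR_CPUID_MASK) >> VGIC_ATTR_CPUID_SHIFT);
}
```

For example, vgic_attr_encode(3, 0x100) addresses GICD_ISENABLER as seen by vCPU 3. The kernel side then rejects cpuid values at or above online_vcpus with -EINVAL.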


* [PATCH v4 14/19] arm/arm64: KVM: add opaque private pointer to MMIO data
  2014-11-14 10:07 [PATCH v4 00/19] KVM GICv3 emulation Andre Przywara
                   ` (12 preceding siblings ...)
  2014-11-14 10:07 ` [PATCH v4 13/19] arm/arm64: KVM: split GICv2 specific emulation code from vgic.c Andre Przywara
@ 2014-11-14 10:07 ` Andre Przywara
  2014-11-23 13:33   ` Christoffer Dall
  2014-11-14 10:07 ` [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation Andre Przywara
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 80+ messages in thread
From: Andre Przywara @ 2014-11-14 10:07 UTC
  To: linux-arm-kernel

On a GICv2 there is always only one (v)CPU involved: the one that
performs the access. On a GICv3, accesses to a CPU's redistributor are
memory-mapped but not banked, so the affected (v)CPU is determined by
the MMIO address region being accessed.
To allow passing the affected CPU into the accessors later, extend
struct kvm_exit_mmio with an opaque private pointer member.
The current GICv2 emulation simply ignores it.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
Changelog v3...v4:
- moving private parameter from each accessor into struct kvm_exit_mmio
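The following is a minimal sketch of how a caller could use the new member. It finds the redistributor frame that an address falls into and hands the owning vCPU to the handler via private. The struct names, route_redist_access() and redist_target_id() are hypothetical stand-ins. The 0x20000 frame size assumes the GIC_V3_REDIST_SIZE value (RD_base plus SGI_base, 64K each) that is introduced later in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct vcpu { int vcpu_id; };

struct exit_mmio {
	uint64_t phys_addr;
	void *private;	/* opaque per-access context, as added by this patch */
};

#define REDIST_SIZE 0x20000ULL	/* assumed: one RD_base + one SGI_base frame */

/*
 * Find the vCPU whose redistributor frame contains mmio->phys_addr and
 * stash it in mmio->private. Returns the offset into that frame, or -1
 * if the address lies outside the redistributor region.
 */
static int64_t route_redist_access(struct exit_mmio *mmio,
				   uint64_t redist_base,
				   struct vcpu *vcpus, size_t nr_vcpus)
{
	uint64_t rel, idx;

	if (mmio->phys_addr < redist_base)
		return -1;

	rel = mmio->phys_addr - redist_base;
	idx = rel / REDIST_SIZE;
	if (idx >= nr_vcpus)
		return -1;

	/* The handler never sees the address, only this pointer. */
	mmio->private = &vcpus[idx];
	return (int64_t)(rel % REDIST_SIZE);
}

/* A handler recovers the target vCPU, which may differ from the accessor. */
static int redist_target_id(const struct exit_mmio *mmio)
{
	const struct vcpu *target = mmio->private;

	assert(target != NULL);
	return target->vcpu_id;
}
```

The GICv2 code path has no redistributors, so it can leave the pointer unused.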

 arch/arm/include/asm/kvm_mmio.h   |    1 +
 arch/arm64/include/asm/kvm_mmio.h |    1 +
 virt/kvm/arm/vgic.c               |    1 +
 3 files changed, 3 insertions(+)

diff --git a/arch/arm/include/asm/kvm_mmio.h b/arch/arm/include/asm/kvm_mmio.h
index adcc0d7..3f83db2 100644
--- a/arch/arm/include/asm/kvm_mmio.h
+++ b/arch/arm/include/asm/kvm_mmio.h
@@ -37,6 +37,7 @@ struct kvm_exit_mmio {
 	u8		data[8];
 	u32		len;
 	bool		is_write;
+	void		*private;
 };
 
 static inline void kvm_prepare_mmio(struct kvm_run *run,
diff --git a/arch/arm64/include/asm/kvm_mmio.h b/arch/arm64/include/asm/kvm_mmio.h
index fc2f689..9f52beb 100644
--- a/arch/arm64/include/asm/kvm_mmio.h
+++ b/arch/arm64/include/asm/kvm_mmio.h
@@ -40,6 +40,7 @@ struct kvm_exit_mmio {
 	u8		data[8];
 	u32		len;
 	bool		is_write;
+	void		*private;
 };
 
 static inline void kvm_prepare_mmio(struct kvm_run *run,
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 421745d..335ffe0 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -709,6 +709,7 @@ static bool call_range_handler(struct kvm_vcpu *vcpu,
 
 	mmio32.len = 4;
 	mmio32.is_write = mmio->is_write;
+	mmio32.private = mmio->private;
 
 	mmio32.phys_addr = mmio->phys_addr + 4;
 	if (mmio->is_write)
-- 
1.7.9.5


* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-11-14 10:07 [PATCH v4 00/19] KVM GICv3 emulation Andre Przywara
                   ` (13 preceding siblings ...)
  2014-11-14 10:07 ` [PATCH v4 14/19] arm/arm64: KVM: add opaque private pointer to MMIO data Andre Przywara
@ 2014-11-14 10:07 ` Andre Przywara
  2014-11-14 11:07   ` Christoffer Dall
                     ` (2 more replies)
  2014-11-14 10:08 ` [PATCH v4 16/19] arm64: GICv3: introduce symbolic names for GICv3 ICC_SGI1R_EL1 fields Andre Przywara
                   ` (4 subsequent siblings)
  19 siblings, 3 replies; 80+ messages in thread
From: Andre Przywara @ 2014-11-14 10:07 UTC
  To: linux-arm-kernel

With everything separated and prepared, we implement a model of a
GICv3 distributor and redistributors by using the existing framework
to provide handler functions for each register group.

Currently we limit the emulation to a model enforcing a single
security state, with SRE==1 (forcing system register access) and
ARE==1 (allowing more than 8 VCPUs).

We share some of the functions provided for GICv2 emulation, but take
the different ways of addressing (v)CPUs into account.
Save and restore is currently not implemented.

Similar to the split-off of the GICv2 specific code, the new emulation
code goes into a new file (vgic-v3-emul.c).

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
Changelog v3...v4:
- remove ICC_SGI1R_EL1 register handling (moved into later patch)
- add definitions for single security state
- document emulation limitations in vgic-v3-emul.c header
- move CTLR, TYPER and IIDR handling into separate functions
- add RAO/WI handling for IGROUPRn registers
- remove unneeded offset masking on calling vgic_reg_access()
- rework handle_mmio_route_reg() to only handle SPIs
- refine IROUTERn register range
- use non-atomic bitops functions (__clear_bit() and __set_bit())
- rename vgic_dist_ranges[] to vgic_v3_dist_ranges[]
- add (RAZ/WI) implementation of GICD_STATUSR
- add (RAZ/WI) implementations of MBI registers
- adapt to new private passing (in struct kvm_exit_mmio instead of a parameter)
- fix vcpu_id calculation bug in handling of CFG registers
- always use hexadecimal numbers for .len member
- simplify vgic_v3_handle_mmio()
- add vgic_v3_create() and vgic_v3_destroy()
- swap vgic_v3_[sg]et_attr() code location
- add and improve comments
- (adaptions to changes from earlier patches)
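The IROUTERn handling stores target MPIDRs in the compressed form produced by compress_mpidr(). Below is a standalone round-trip sketch. It assumes the arm64 MPIDR_EL1 layout, with Aff0, Aff1 and Aff2 at bits 0, 8 and 16 and Aff3 at bit 32. The helper names mirror the patch, but the code is only an illustration, not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Bit position of affinity level 0..3 in MPIDR_EL1 (assumed arm64 layout). */
static unsigned int mpidr_level_shift(int level)
{
	assert(level >= 0 && level <= 3);
	return level == 3 ? 32 : (unsigned int)level * 8;
}

/* Pack Aff0..Aff3 into bytes 0..3 of a 32-bit word. */
static uint32_t compress_mpidr(uint64_t mpidr)
{
	uint32_t ret = 0;

	for (int level = 0; level < 4; level++)
		ret |= (uint32_t)((mpidr >> mpidr_level_shift(level)) & 0xff)
		       << (level * 8);
	return ret;
}

/* Inverse: spread the four bytes back to their MPIDR_EL1 positions. */
static uint64_t uncompress_mpidr(uint32_t value)
{
	uint64_t mpidr = 0;

	for (int level = 0; level < 4; level++)
		mpidr |= (uint64_t)((value >> (level * 8)) & 0xff)
			 << mpidr_level_shift(level);
	return mpidr;
}
```

Non-affinity bits in [31:24] of MPIDR_EL1 (such as MT and U) do not survive the compression. Only the affinity levels round-trip.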

 arch/arm64/kvm/Makefile            |    1 +
 include/kvm/arm_vgic.h             |    9 +-
 include/linux/irqchip/arm-gic-v3.h |   32 ++
 include/linux/kvm_host.h           |    1 +
 include/uapi/linux/kvm.h           |    2 +
 virt/kvm/arm/vgic-v3-emul.c        |  904 ++++++++++++++++++++++++++++++++++++
 virt/kvm/arm/vgic.c                |   11 +-
 virt/kvm/arm/vgic.h                |    3 +
 8 files changed, 960 insertions(+), 3 deletions(-)
 create mode 100644 virt/kvm/arm/vgic-v3-emul.c
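handle_mmio_typer() below derives GICD_TYPER from the number of configured IRQs. The following rough sketch shows that arithmetic, with nr_irqs assumed to be a multiple of 32:
- ITLinesNumber in bits [4:0] encodes 32*(N+1) supported INTIDs.
- IDbits in bits [23:19] holds the number of ID bits minus one.
- CPUNumber, LPIS and MBIS stay 0.

vgic_v3_typer() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define INTERRUPT_ID_BITS 10	/* as in the patch: 2^10 = 1024 INTIDs */

/* Hypothetical standalone version of the GICD_TYPER computation. */
static uint32_t vgic_v3_typer(int nr_irqs)
{
	int capped = nr_irqs < 1024 ? nr_irqs : 1024;
	uint32_t reg;

	/* This sketch insists on whole 32-interrupt blocks. */
	assert(capped >= 32 && capped % 32 == 0);

	/* ITLinesNumber [4:0]: INTIDs up to 32 * (N + 1) - 1 exist. */
	reg = (uint32_t)(capped >> 5) - 1;
	/* IDbits [23:19]: number of interrupt ID bits, minus one. */
	reg |= (uint32_t)(INTERRUPT_ID_BITS - 1) << 19;
	/* CPUNumber [7:5], MBIS [16] and LPIS [17] are left at 0. */
	return reg;
}
```

For example, 128 IRQs yield 0x480003: ITLinesNumber 3 and IDbits 9.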

diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index d957353..4e6e09e 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -24,5 +24,6 @@ kvm-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v2.o
 kvm-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v2-emul.o
 kvm-$(CONFIG_KVM_ARM_VGIC) += vgic-v2-switch.o
 kvm-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v3.o
+kvm-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v3-emul.o
 kvm-$(CONFIG_KVM_ARM_VGIC) += vgic-v3-switch.o
 kvm-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 421833f..c1ef5a9 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -160,7 +160,11 @@ struct vgic_dist {
 
 	/* Distributor and vcpu interface mapping in the guest */
 	phys_addr_t		vgic_dist_base;
-	phys_addr_t		vgic_cpu_base;
+	/* GICv2 and GICv3 use different mapped register blocks */
+	union {
+		phys_addr_t		vgic_cpu_base;
+		phys_addr_t		vgic_redist_base;
+	};
 
 	/* Distributor enabled */
 	u32			enabled;
@@ -222,6 +226,9 @@ struct vgic_dist {
 	 */
 	struct vgic_bitmap	*irq_spi_target;
 
+	/* Target MPIDR for each SPI (only needed for GICv3 IROUTERn) */
+	u32			*irq_spi_mpidr;
+
 	/* Bitmap indicating which CPU has something pending */
 	unsigned long		*irq_pending_on_cpu;
 
diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index 03a4ea3..726d898 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -33,6 +33,7 @@
 #define GICD_SETSPI_SR			0x0050
 #define GICD_CLRSPI_SR			0x0058
 #define GICD_SEIR			0x0068
+#define GICD_IGROUPR			0x0080
 #define GICD_ISENABLER			0x0100
 #define GICD_ICENABLER			0x0180
 #define GICD_ISPENDR			0x0200
@@ -41,14 +42,37 @@
 #define GICD_ICACTIVER			0x0380
 #define GICD_IPRIORITYR			0x0400
 #define GICD_ICFGR			0x0C00
+#define GICD_IGRPMODR			0x0D00
+#define GICD_NSACR			0x0E00
 #define GICD_IROUTER			0x6000
+#define GICD_IDREGS			0xFFD0
 #define GICD_PIDR2			0xFFE8
 
+/*
+ * These registers are actually from GICv2, but the spec requires them to
+ * be RES0 if ARE is 1 (which is always true for KVM's emulated GICv3).
+ */
+#define GICD_ITARGETSR			0x0800
+#define GICD_SGIR			0x0F00
+#define GICD_CPENDSGIR			0x0F10
+#define GICD_SPENDSGIR			0x0F20
+
 #define GICD_CTLR_RWP			(1U << 31)
+#define GICD_CTLR_DS			(1U << 6)
 #define GICD_CTLR_ARE_NS		(1U << 4)
 #define GICD_CTLR_ENABLE_G1A		(1U << 1)
 #define GICD_CTLR_ENABLE_G1		(1U << 0)
 
+/*
+ * In systems with a single security state (what we emulate in KVM)
+ * the meaning of the interrupt group enable bits is slightly different
+ */
+#define GICD_CTLR_ENABLE_SS_G1		(1U << 1)
+#define GICD_CTLR_ENABLE_SS_G0		(1U << 0)
+
+#define GICD_TYPER_LPIS			(1U << 17)
+#define GICD_TYPER_MBIS			(1U << 16)
+
 #define GICD_IROUTER_SPI_MODE_ONE	(0U << 31)
 #define GICD_IROUTER_SPI_MODE_ANY	(1U << 31)
 
@@ -56,6 +80,8 @@
 #define GIC_PIDR2_ARCH_GICv3		0x30
 #define GIC_PIDR2_ARCH_GICv4		0x40
 
+#define GIC_V3_DIST_SIZE		0x10000
+
 /*
  * Re-Distributor registers, offsets from RD_base
  */
@@ -74,6 +100,7 @@
 #define GICR_SYNCR			0x00C0
 #define GICR_MOVLPIR			0x0100
 #define GICR_MOVALLR			0x0110
+#define GICR_IDREGS			GICD_IDREGS
 #define GICR_PIDR2			GICD_PIDR2
 
 #define GICR_WAKER_ProcessorSleep	(1U << 1)
@@ -82,6 +109,7 @@
 /*
  * Re-Distributor registers, offsets from SGI_base
  */
+#define GICR_IGROUPR0			GICD_IGROUPR
 #define GICR_ISENABLER0			GICD_ISENABLER
 #define GICR_ICENABLER0			GICD_ICENABLER
 #define GICR_ISPENDR0			GICD_ISPENDR
@@ -90,10 +118,14 @@
 #define GICR_ICACTIVER0			GICD_ICACTIVER
 #define GICR_IPRIORITYR0		GICD_IPRIORITYR
 #define GICR_ICFGR0			GICD_ICFGR
+#define GICR_IGRPMODR0			GICD_IGRPMODR
+#define GICR_NSACR			GICD_NSACR
 
 #define GICR_TYPER_VLPIS		(1U << 1)
 #define GICR_TYPER_LAST			(1U << 4)
 
+#define GIC_V3_REDIST_SIZE		0x20000
+
 /*
  * CPU interface registers
  */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 326ba7a..4a7798e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1085,6 +1085,7 @@ void kvm_unregister_device_ops(u32 type);
 extern struct kvm_device_ops kvm_mpic_ops;
 extern struct kvm_device_ops kvm_xics_ops;
 extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
+extern struct kvm_device_ops kvm_arm_vgic_v3_ops;
 
 #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6076882..24cb129 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -960,6 +960,8 @@ enum kvm_device_type {
 #define KVM_DEV_TYPE_ARM_VGIC_V2	KVM_DEV_TYPE_ARM_VGIC_V2
 	KVM_DEV_TYPE_FLIC,
 #define KVM_DEV_TYPE_FLIC		KVM_DEV_TYPE_FLIC
+	KVM_DEV_TYPE_ARM_VGIC_V3,
+#define KVM_DEV_TYPE_ARM_VGIC_V3	KVM_DEV_TYPE_ARM_VGIC_V3
 	KVM_DEV_TYPE_MAX,
 };
 
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
new file mode 100644
index 0000000..97b5801
--- /dev/null
+++ b/virt/kvm/arm/vgic-v3-emul.c
@@ -0,0 +1,904 @@
+/*
+ * GICv3 distributor and redistributor emulation
+ *
+ * GICv3 emulation is currently only supported on a GICv3 host (because
+ * we rely on the hardware's CPU interface virtualization support), but
+ * works on hosts both with and without the optional GICv2 backwards
+ * compatibility features.
+ *
+ * Limitations of the emulation:
+ * (RAZ/WI: read as zero, writes ignored; RAO/WI: read as one, writes ignored)
+ * - We do not support LPIs (yet). TYPER.LPIS is reported as 0 and is RAZ/WI.
+ * - We do not support the message based interrupts (MBIs) triggered by
+ *   writes to the GICD_{SET,CLR}SPI_* registers. TYPER.MBIS is reported as 0.
+ * - We do not support the (optional) backwards compatibility feature.
+ *   GICD_CTLR.ARE resets to 1 and is RAO/WI. If the _host_ GIC supports
+ *   the compatibility feature, you can use a GICv2 in the guest, though.
+ * - We only support a single security state. GICD_CTLR.DS is 1 and is RAO/WI.
+ * - Priorities are not emulated (same as the GICv2 emulation). Linux
+ *   as a guest is fine with this, because it does not use priorities.
+ * - We only support Group1 interrupts. Again, Linux only uses those.
+ *
+ * Copyright (C) 2014 ARM Ltd.
+ * Author: Andre Przywara <andre.przywara@arm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/cpu.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/interrupt.h>
+
+#include <linux/irqchip/arm-gic-v3.h>
+#include <kvm/arm_vgic.h>
+
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_mmu.h>
+
+#include "vgic.h"
+
+static bool handle_mmio_rao_wi(struct kvm_vcpu *vcpu,
+			       struct kvm_exit_mmio *mmio, phys_addr_t offset)
+{
+	u32 reg = 0xffffffff;
+
+	vgic_reg_access(mmio, &reg, offset,
+			ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
+
+	return false;
+}
+
+static bool handle_mmio_ctlr(struct kvm_vcpu *vcpu,
+			     struct kvm_exit_mmio *mmio, phys_addr_t offset)
+{
+	u32 reg = 0;
+
+	/*
+	 * Force ARE and DS to 1, the guest cannot change this.
+	 * For the time being we only support Group1 interrupts.
+	 */
+	if (vcpu->kvm->arch.vgic.enabled)
+		reg = GICD_CTLR_ENABLE_SS_G1;
+	reg |= GICD_CTLR_ARE_NS | GICD_CTLR_DS;
+
+	vgic_reg_access(mmio, &reg, offset,
+			ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
+	if (mmio->is_write) {
+		if (reg & GICD_CTLR_ENABLE_SS_G0)
+			kvm_info("guest tried to enable unsupported Group0 interrupts\n");
+		vcpu->kvm->arch.vgic.enabled = !!(reg & GICD_CTLR_ENABLE_SS_G1);
+		vgic_update_state(vcpu->kvm);
+		return true;
+	}
+	return false;
+}
+
+/*
+ * As this implementation does not provide compatibility
+ * with GICv2 (ARE==1), we report zero CPUs in bits [5..7].
+ * Also LPIs and MBIs are not supported, so we set the respective bits to 0.
+ * Finally, we report at most 2**10=1024 interrupt IDs (to match 1024 SPIs).
+ */
+#define INTERRUPT_ID_BITS 10
+static bool handle_mmio_typer(struct kvm_vcpu *vcpu,
+			      struct kvm_exit_mmio *mmio, phys_addr_t offset)
+{
+	u32 reg;
+
+	/* we report at most 1024 IRQs via this interface */
+	reg = (min(vcpu->kvm->arch.vgic.nr_irqs, 1024) >> 5) - 1;
+
+	reg |= (INTERRUPT_ID_BITS - 1) << 19;
+
+	vgic_reg_access(mmio, &reg, offset,
+			ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
+
+	return false;
+}
+
+static bool handle_mmio_iidr(struct kvm_vcpu *vcpu,
+			     struct kvm_exit_mmio *mmio, phys_addr_t offset)
+{
+	u32 reg;
+
+	reg = (PRODUCT_ID_KVM << 24) | (IMPLEMENTER_ARM << 0);
+	vgic_reg_access(mmio, &reg, offset,
+			ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
+
+	return false;
+}
+
+static bool handle_mmio_set_enable_reg_dist(struct kvm_vcpu *vcpu,
+					    struct kvm_exit_mmio *mmio,
+					    phys_addr_t offset)
+{
+	if (likely(offset >= VGIC_NR_PRIVATE_IRQS / 8))
+		return vgic_handle_enable_reg(vcpu->kvm, mmio, offset,
+					      vcpu->vcpu_id,
+					      ACCESS_WRITE_SETBIT);
+
+	vgic_reg_access(mmio, NULL, offset,
+			ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
+	return false;
+}
+
+static bool handle_mmio_clear_enable_reg_dist(struct kvm_vcpu *vcpu,
+					      struct kvm_exit_mmio *mmio,
+					      phys_addr_t offset)
+{
+	if (likely(offset >= VGIC_NR_PRIVATE_IRQS / 8))
+		return vgic_handle_enable_reg(vcpu->kvm, mmio, offset,
+					      vcpu->vcpu_id,
+					      ACCESS_WRITE_CLEARBIT);
+
+	vgic_reg_access(mmio, NULL, offset,
+			ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
+	return false;
+}
+
+static bool handle_mmio_set_pending_reg_dist(struct kvm_vcpu *vcpu,
+					     struct kvm_exit_mmio *mmio,
+					     phys_addr_t offset)
+{
+	if (likely(offset >= VGIC_NR_PRIVATE_IRQS / 8))
+		return vgic_handle_set_pending_reg(vcpu->kvm, mmio, offset,
+						   vcpu->vcpu_id);
+
+	vgic_reg_access(mmio, NULL, offset,
+			ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
+	return false;
+}
+
+static bool handle_mmio_clear_pending_reg_dist(struct kvm_vcpu *vcpu,
+					       struct kvm_exit_mmio *mmio,
+					       phys_addr_t offset)
+{
+	if (likely(offset >= VGIC_NR_PRIVATE_IRQS / 8))
+		return vgic_handle_clear_pending_reg(vcpu->kvm, mmio, offset,
+						     vcpu->vcpu_id);
+
+	vgic_reg_access(mmio, NULL, offset,
+			ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
+	return false;
+}
+
+static bool handle_mmio_priority_reg_dist(struct kvm_vcpu *vcpu,
+					  struct kvm_exit_mmio *mmio,
+					  phys_addr_t offset)
+{
+	u32 *reg;
+
+	if (unlikely(offset < VGIC_NR_PRIVATE_IRQS)) {
+		vgic_reg_access(mmio, NULL, offset,
+				ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
+		return false;
+	}
+
+	reg = vgic_bytemap_get_reg(&vcpu->kvm->arch.vgic.irq_priority,
+				   vcpu->vcpu_id, offset);
+	vgic_reg_access(mmio, reg, offset,
+		ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
+	return false;
+}
+
+static bool handle_mmio_cfg_reg_dist(struct kvm_vcpu *vcpu,
+				     struct kvm_exit_mmio *mmio,
+				     phys_addr_t offset)
+{
+	u32 *reg;
+
+	if (unlikely(offset < VGIC_NR_PRIVATE_IRQS / 4)) {
+		vgic_reg_access(mmio, NULL, offset,
+				ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
+		return false;
+	}
+
+	reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_cfg,
+				  vcpu->vcpu_id, offset >> 1);
+
+	return vgic_handle_cfg_reg(reg, mmio, offset);
+}
+
+/*
+ * We use a compressed version of the MPIDR (Aff0-Aff3, 8 bits each, packed
+ * into one 32-bit word) when we store the target MPIDR written by the guest.
+ */
+static u32 compress_mpidr(unsigned long mpidr)
+{
+	u32 ret;
+
+	ret = MPIDR_AFFINITY_LEVEL(mpidr, 0);
+	ret |= MPIDR_AFFINITY_LEVEL(mpidr, 1) << 8;
+	ret |= MPIDR_AFFINITY_LEVEL(mpidr, 2) << 16;
+	ret |= MPIDR_AFFINITY_LEVEL(mpidr, 3) << 24;
+
+	return ret;
+}
+
+static unsigned long uncompress_mpidr(u32 value)
+{
+	unsigned long mpidr;
+
+	mpidr  = ((value >>  0) & 0xFF) << MPIDR_LEVEL_SHIFT(0);
+	mpidr |= ((value >>  8) & 0xFF) << MPIDR_LEVEL_SHIFT(1);
+	mpidr |= ((value >> 16) & 0xFF) << MPIDR_LEVEL_SHIFT(2);
+	mpidr |= (u64)((value >> 24) & 0xFF) << MPIDR_LEVEL_SHIFT(3);
+
+	return mpidr;
+}
+
+/*
+ * Look up the given MPIDR value to get the vcpu_id (if there is one)
+ * and store that in the irq_spi_cpu[] array.
+ * This limits the number of VCPUs to 255 for now; extending the data
+ * type (or storing kvm_vcpu pointers) should lift the limit.
+ * Store the original MPIDR value in an extra array to support read-as-written.
+ * Unallocated MPIDRs are translated to a special value and caught
+ * before any array accesses.
+ */
+static bool handle_mmio_route_reg(struct kvm_vcpu *vcpu,
+				  struct kvm_exit_mmio *mmio,
+				  phys_addr_t offset)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	int spi;
+	u32 reg;
+	int vcpu_id;
+	unsigned long *bmap, mpidr;
+
+	/*
+	 * The upper 32 bits of each 64 bit register are zero,
+	 * as we don't support Aff3.
+	 */
+	if ((offset & 4)) {
+		vgic_reg_access(mmio, NULL, offset,
+				ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
+		return false;
+	}
+
+	/* This region only covers SPIs, so no handling of private IRQs here. */
+	spi = offset / 8;
+
+	/* get the stored MPIDR for this IRQ */
+	mpidr = uncompress_mpidr(dist->irq_spi_mpidr[spi]);
+	mpidr &= MPIDR_HWID_BITMASK;
+	reg = mpidr;
+
+	vgic_reg_access(mmio, &reg, offset,
+			ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
+
+	if (!mmio->is_write)
+		return false;
+
+	/*
+	 * Now clear the currently assigned vCPU from the map, making room
+	 * for the new one to be written below.
+	 */
+	vcpu = kvm_mpidr_to_vcpu(kvm, mpidr);
+	if (likely(vcpu)) {
+		vcpu_id = vcpu->vcpu_id;
+		bmap = vgic_bitmap_get_shared_map(&dist->irq_spi_target[vcpu_id]);
+		__clear_bit(spi, bmap);
+	}
+
+	dist->irq_spi_mpidr[spi] = compress_mpidr(reg);
+	vcpu = kvm_mpidr_to_vcpu(kvm, reg & MPIDR_HWID_BITMASK);
+
+	/*
+	 * The spec says that non-existent MPIDR values should not be
+	 * forwarded to any existing (v)CPU, but should be able to become
+	 * pending anyway. We simply keep the irq_spi_target[] array empty, so
+	 * the interrupt will never be injected.
+	 * irq_spi_cpu[spi] gets the magic value VCPU_NOT_ALLOCATED in this case.
+	 */
+	if (likely(vcpu)) {
+		vcpu_id = vcpu->vcpu_id;
+		dist->irq_spi_cpu[spi] = vcpu_id;
+		bmap = vgic_bitmap_get_shared_map(&dist->irq_spi_target[vcpu_id]);
+		__set_bit(spi, bmap);
+	} else {
+		dist->irq_spi_cpu[spi] = VCPU_NOT_ALLOCATED;
+	}
+
+	vgic_update_state(kvm);
+
+	return true;
+}
+
+/*
+ * We should be careful about promising too much when a guest reads
+ * this register. Don't claim to be like any hardware implementation,
+ * but just report the GIC as version 3 - which is what a Linux guest
+ * would check.
+ */
+static bool handle_mmio_idregs(struct kvm_vcpu *vcpu,
+			       struct kvm_exit_mmio *mmio,
+			       phys_addr_t offset)
+{
+	u32 reg = 0;
+
+	switch (offset + GICD_IDREGS) {
+	case GICD_PIDR2:
+		reg = 0x3b;
+		break;
+	}
+
+	vgic_reg_access(mmio, &reg, offset,
+			ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
+
+	return false;
+}
+
+static const struct kvm_mmio_range vgic_v3_dist_ranges[] = {
+	{
+		.base           = GICD_CTLR,
+		.len            = 0x04,
+		.bits_per_irq   = 0,
+		.handle_mmio    = handle_mmio_ctlr,
+	},
+	{
+		.base           = GICD_TYPER,
+		.len            = 0x04,
+		.bits_per_irq   = 0,
+		.handle_mmio    = handle_mmio_typer,
+	},
+	{
+		.base           = GICD_IIDR,
+		.len            = 0x04,
+		.bits_per_irq   = 0,
+		.handle_mmio    = handle_mmio_iidr,
+	},
+	{
+		/* this register is optional, it is RAZ/WI if not implemented */
+		.base           = GICD_STATUSR,
+		.len            = 0x04,
+		.bits_per_irq   = 0,
+		.handle_mmio    = handle_mmio_raz_wi,
+	},
+	{
+		/* this write only register is WI when TYPER.MBIS=0 */
+		.base		= GICD_SETSPI_NSR,
+		.len		= 0x04,
+		.bits_per_irq	= 0,
+		.handle_mmio	= handle_mmio_raz_wi,
+	},
+	{
+		/* this write only register is WI when TYPER.MBIS=0 */
+		.base		= GICD_CLRSPI_NSR,
+		.len		= 0x04,
+		.bits_per_irq	= 0,
+		.handle_mmio	= handle_mmio_raz_wi,
+	},
+	{
+		/* this is RAZ/WI when DS=1 */
+		.base		= GICD_SETSPI_SR,
+		.len		= 0x04,
+		.bits_per_irq	= 0,
+		.handle_mmio	= handle_mmio_raz_wi,
+	},
+	{
+		/* this is RAZ/WI when DS=1 */
+		.base		= GICD_CLRSPI_SR,
+		.len		= 0x04,
+		.bits_per_irq	= 0,
+		.handle_mmio	= handle_mmio_raz_wi,
+	},
+	{
+		.base		= GICD_IGROUPR,
+		.len		= 0x80,
+		.bits_per_irq	= 1,
+		.handle_mmio	= handle_mmio_rao_wi,
+	},
+	{
+		.base		= GICD_ISENABLER,
+		.len		= 0x80,
+		.bits_per_irq	= 1,
+		.handle_mmio	= handle_mmio_set_enable_reg_dist,
+	},
+	{
+		.base		= GICD_ICENABLER,
+		.len		= 0x80,
+		.bits_per_irq	= 1,
+		.handle_mmio	= handle_mmio_clear_enable_reg_dist,
+	},
+	{
+		.base		= GICD_ISPENDR,
+		.len		= 0x80,
+		.bits_per_irq	= 1,
+		.handle_mmio	= handle_mmio_set_pending_reg_dist,
+	},
+	{
+		.base		= GICD_ICPENDR,
+		.len		= 0x80,
+		.bits_per_irq	= 1,
+		.handle_mmio	= handle_mmio_clear_pending_reg_dist,
+	},
+	{
+		.base		= GICD_ISACTIVER,
+		.len		= 0x80,
+		.bits_per_irq	= 1,
+		.handle_mmio	= handle_mmio_raz_wi,
+	},
+	{
+		.base		= GICD_ICACTIVER,
+		.len		= 0x80,
+		.bits_per_irq	= 1,
+		.handle_mmio	= handle_mmio_raz_wi,
+	},
+	{
+		.base		= GICD_IPRIORITYR,
+		.len		= 0x400,
+		.bits_per_irq	= 8,
+		.handle_mmio	= handle_mmio_priority_reg_dist,
+	},
+	{
+		/* ITARGETSRn is RES0 when ARE=1 */
+		.base		= GICD_ITARGETSR,
+		.len		= 0x400,
+		.bits_per_irq	= 8,
+		.handle_mmio	= handle_mmio_raz_wi,
+	},
+	{
+		.base		= GICD_ICFGR,
+		.len		= 0x100,
+		.bits_per_irq	= 2,
+		.handle_mmio	= handle_mmio_cfg_reg_dist,
+	},
+	{
+		/* this is RAZ/WI when DS=1 */
+		.base		= GICD_IGRPMODR,
+		.len		= 0x80,
+		.bits_per_irq	= 1,
+		.handle_mmio	= handle_mmio_raz_wi,
+	},
+	{
+		/* this is RAZ/WI when DS=1 */
+		.base		= GICD_NSACR,
+		.len		= 0x100,
+		.bits_per_irq	= 2,
+		.handle_mmio	= handle_mmio_raz_wi,
+	},
+	{
+		/* this is RAZ/WI when ARE=1 */
+		.base		= GICD_SGIR,
+		.len		= 0x04,
+		.handle_mmio	= handle_mmio_raz_wi,
+	},
+	{
+		/* this is RAZ/WI when ARE=1 */
+		.base		= GICD_CPENDSGIR,
+		.len		= 0x10,
+		.handle_mmio	= handle_mmio_raz_wi,
+	},
+	{
+		/* this is RAZ/WI when ARE=1 */
+		.base           = GICD_SPENDSGIR,
+		.len            = 0x10,
+		.handle_mmio    = handle_mmio_raz_wi,
+	},
+	{
+		.base		= GICD_IROUTER + 0x100,
+		.len		= 0x1edc,
+		.bits_per_irq	= 64,
+		.handle_mmio	= handle_mmio_route_reg,
+	},
+	{
+		.base           = GICD_IDREGS,
+		.len            = 0x30,
+		.bits_per_irq   = 0,
+		.handle_mmio    = handle_mmio_idregs,
+	},
+	{},
+};
+
+static bool handle_mmio_set_enable_reg_redist(struct kvm_vcpu *vcpu,
+					      struct kvm_exit_mmio *mmio,
+					      phys_addr_t offset)
+{
+	struct kvm_vcpu *redist_vcpu = mmio->private;
+
+	return vgic_handle_enable_reg(vcpu->kvm, mmio, offset,
+				      redist_vcpu->vcpu_id,
+				      ACCESS_WRITE_SETBIT);
+}
+
+static bool handle_mmio_clear_enable_reg_redist(struct kvm_vcpu *vcpu,
+						struct kvm_exit_mmio *mmio,
+						phys_addr_t offset)
+{
+	struct kvm_vcpu *redist_vcpu = mmio->private;
+
+	return vgic_handle_enable_reg(vcpu->kvm, mmio, offset,
+				      redist_vcpu->vcpu_id,
+				      ACCESS_WRITE_CLEARBIT);
+}
+
+static bool handle_mmio_set_pending_reg_redist(struct kvm_vcpu *vcpu,
+					       struct kvm_exit_mmio *mmio,
+					       phys_addr_t offset)
+{
+	struct kvm_vcpu *redist_vcpu = mmio->private;
+
+	return vgic_handle_set_pending_reg(vcpu->kvm, mmio, offset,
+					   redist_vcpu->vcpu_id);
+}
+
+static bool handle_mmio_clear_pending_reg_redist(struct kvm_vcpu *vcpu,
+						 struct kvm_exit_mmio *mmio,
+						 phys_addr_t offset)
+{
+	struct kvm_vcpu *redist_vcpu = mmio->private;
+
+	return vgic_handle_clear_pending_reg(vcpu->kvm, mmio, offset,
+					     redist_vcpu->vcpu_id);
+}
+
+static bool handle_mmio_priority_reg_redist(struct kvm_vcpu *vcpu,
+					    struct kvm_exit_mmio *mmio,
+					    phys_addr_t offset)
+{
+	struct kvm_vcpu *redist_vcpu = mmio->private;
+	u32 *reg;
+
+	reg = vgic_bytemap_get_reg(&vcpu->kvm->arch.vgic.irq_priority,
+				   redist_vcpu->vcpu_id, offset);
+	vgic_reg_access(mmio, reg, offset,
+			ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
+	return false;
+}
+
+static bool handle_mmio_cfg_reg_redist(struct kvm_vcpu *vcpu,
+				       struct kvm_exit_mmio *mmio,
+				       phys_addr_t offset)
+{
+	struct kvm_vcpu *redist_vcpu = mmio->private;
+
+	u32 *reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_cfg,
+				       redist_vcpu->vcpu_id, offset >> 1);
+
+	return vgic_handle_cfg_reg(reg, mmio, offset);
+}
+
+static const struct kvm_mmio_range vgic_redist_sgi_ranges[] = {
+	{
+		.base		= GICR_IGROUPR0,
+		.len		= 0x04,
+		.bits_per_irq	= 1,
+		.handle_mmio	= handle_mmio_raz_wi,
+	},
+	{
+		.base		= GICR_ISENABLER0,
+		.len		= 0x04,
+		.bits_per_irq	= 1,
+		.handle_mmio	= handle_mmio_set_enable_reg_redist,
+	},
+	{
+		.base		= GICR_ICENABLER0,
+		.len		= 0x04,
+		.bits_per_irq	= 1,
+		.handle_mmio	= handle_mmio_clear_enable_reg_redist,
+	},
+	{
+		.base		= GICR_ISPENDR0,
+		.len		= 0x04,
+		.bits_per_irq	= 1,
+		.handle_mmio	= handle_mmio_set_pending_reg_redist,
+	},
+	{
+		.base		= GICR_ICPENDR0,
+		.len		= 0x04,
+		.bits_per_irq	= 1,
+		.handle_mmio	= handle_mmio_clear_pending_reg_redist,
+	},
+	{
+		.base		= GICR_ISACTIVER0,
+		.len		= 0x04,
+		.bits_per_irq	= 1,
+		.handle_mmio	= handle_mmio_raz_wi,
+	},
+	{
+		.base		= GICR_ICACTIVER0,
+		.len		= 0x04,
+		.bits_per_irq	= 1,
+		.handle_mmio	= handle_mmio_raz_wi,
+	},
+	{
+		.base		= GICR_IPRIORITYR0,
+		.len		= 0x20,
+		.bits_per_irq	= 8,
+		.handle_mmio	= handle_mmio_priority_reg_redist,
+	},
+	{
+		.base		= GICR_ICFGR0,
+		.len		= 0x08,
+		.bits_per_irq	= 2,
+		.handle_mmio	= handle_mmio_cfg_reg_redist,
+	},
+	{
+		.base		= GICR_IGRPMODR0,
+		.len		= 0x04,
+		.bits_per_irq	= 1,
+		.handle_mmio	= handle_mmio_raz_wi,
+	},
+	{
+		.base		= GICR_NSACR,
+		.len		= 0x04,
+		.handle_mmio	= handle_mmio_raz_wi,
+	},
+	{},
+};
+
+static bool handle_mmio_ctlr_redist(struct kvm_vcpu *vcpu,
+				    struct kvm_exit_mmio *mmio,
+				    phys_addr_t offset)
+{
+	/* since we don't support LPIs, this register is zero for now */
+	vgic_reg_access(mmio, NULL, offset,
+			ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
+	return false;
+}
+
+static bool handle_mmio_typer_redist(struct kvm_vcpu *vcpu,
+				     struct kvm_exit_mmio *mmio,
+				     phys_addr_t offset)
+{
+	u32 reg;
+	u64 mpidr;
+	struct kvm_vcpu *redist_vcpu = mmio->private;
+	int target_vcpu_id = redist_vcpu->vcpu_id;
+
+	/* the upper 32 bits contain the affinity value */
+	if ((offset & ~3) == 4) {
+		mpidr = kvm_vcpu_get_mpidr_aff(redist_vcpu);
+		reg = compress_mpidr(mpidr);
+
+		vgic_reg_access(mmio, &reg, offset,
+				ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
+		return false;
+	}
+
+	reg = redist_vcpu->vcpu_id << 8;
+	if (target_vcpu_id == atomic_read(&vcpu->kvm->online_vcpus) - 1)
+		reg |= GICR_TYPER_LAST;
+	vgic_reg_access(mmio, &reg, offset,
+			ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
+	return false;
+}
+
+static const struct kvm_mmio_range vgic_redist_ranges[] = {
+	{
+		.base           = GICR_CTLR,
+		.len            = 0x04,
+		.bits_per_irq   = 0,
+		.handle_mmio    = handle_mmio_ctlr_redist,
+	},
+	{
+		.base           = GICR_TYPER,
+		.len            = 0x08,
+		.bits_per_irq   = 0,
+		.handle_mmio    = handle_mmio_typer_redist,
+	},
+	{
+		.base           = GICR_IIDR,
+		.len            = 0x04,
+		.bits_per_irq   = 0,
+		.handle_mmio    = handle_mmio_iidr,
+	},
+	{
+		.base           = GICR_WAKER,
+		.len            = 0x04,
+		.bits_per_irq   = 0,
+		.handle_mmio    = handle_mmio_raz_wi,
+	},
+	{
+		.base           = GICR_IDREGS,
+		.len            = 0x30,
+		.bits_per_irq   = 0,
+		.handle_mmio    = handle_mmio_idregs,
+	},
+	{},
+};
+
+/*
+ * This function splits accesses between the distributor and the two
+ * frames of each redistributor (RD_base and SGI_base). As each
+ * redistributor is accessible from any CPU, we have to determine the
+ * affected VCPU by taking the faulting address into account. We then pass
+ * this VCPU to the handler function via the private parameter.
+ */
+#define SGI_BASE_OFFSET SZ_64K
+static bool vgic_v3_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
+				struct kvm_exit_mmio *mmio)
+{
+	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+	unsigned long dbase = dist->vgic_dist_base;
+	unsigned long rdbase = dist->vgic_redist_base;
+	int nrcpus = atomic_read(&vcpu->kvm->online_vcpus);
+	int vcpu_id;
+	const struct kvm_mmio_range *mmio_range;
+
+	if (is_in_range(mmio->phys_addr, mmio->len, dbase, GIC_V3_DIST_SIZE)) {
+		return vgic_handle_mmio_range(vcpu, run, mmio,
+					      vgic_v3_dist_ranges, dbase);
+	}
+
+	if (!is_in_range(mmio->phys_addr, mmio->len, rdbase,
+	    GIC_V3_REDIST_SIZE * nrcpus))
+		return false;
+
+	vcpu_id = (mmio->phys_addr - rdbase) / GIC_V3_REDIST_SIZE;
+	rdbase += (vcpu_id * GIC_V3_REDIST_SIZE);
+	mmio->private = kvm_get_vcpu(vcpu->kvm, vcpu_id);
+
+	if (mmio->phys_addr >= rdbase + SGI_BASE_OFFSET) {
+		rdbase += SGI_BASE_OFFSET;
+		mmio_range = vgic_redist_sgi_ranges;
+	} else {
+		mmio_range = vgic_redist_ranges;
+	}
+	return vgic_handle_mmio_range(vcpu, run, mmio, mmio_range, rdbase);
+}
+
+static bool vgic_v3_queue_sgi(struct kvm_vcpu *vcpu, int irq)
+{
+	if (vgic_queue_irq(vcpu, 0, irq)) {
+		vgic_dist_irq_clear_pending(vcpu, irq);
+		vgic_cpu_irq_clear(vcpu, irq);
+		return true;
+	}
+
+	return false;
+}
+
+static int vgic_v3_init_maps(struct vgic_dist *dist)
+{
+	int nr_spis = dist->nr_irqs - VGIC_NR_PRIVATE_IRQS;
+
+	dist->irq_spi_mpidr = kcalloc(nr_spis, sizeof(dist->irq_spi_mpidr[0]),
+				      GFP_KERNEL);
+
+	if (!dist->irq_spi_mpidr)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int vgic_v3_init(struct kvm *kvm, const struct vgic_params *params)
+{
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	int ret, i;
+	u32 mpidr;
+
+	if (IS_VGIC_ADDR_UNDEF(dist->vgic_dist_base) ||
+	    IS_VGIC_ADDR_UNDEF(dist->vgic_redist_base)) {
+		kvm_err("Need to set vgic distributor addresses first\n");
+		return -ENXIO;
+	}
+
+	/*
+	 * FIXME: this should be moved to init_maps time, and may bite
+	 * us when adding save/restore. Add a per-emulation hook?
+	 */
+	ret = vgic_v3_init_maps(dist);
+	if (ret) {
+		kvm_err("Unable to allocate maps\n");
+		return ret;
+	}
+
+	/* Initialize the target VCPUs for each IRQ to VCPU 0 */
+	mpidr = compress_mpidr(kvm_vcpu_get_mpidr_aff(kvm_get_vcpu(kvm, 0)));
+	for (i = VGIC_NR_PRIVATE_IRQS; i < dist->nr_irqs; i++) {
+		dist->irq_spi_cpu[i - VGIC_NR_PRIVATE_IRQS] = 0;
+		dist->irq_spi_mpidr[i - VGIC_NR_PRIVATE_IRQS] = mpidr;
+		vgic_bitmap_set_irq_val(dist->irq_spi_target, 0, i, 1);
+	}
+
+	return 0;
+}
+
+/* GICv3 does not keep track of SGI sources anymore. */
+static void vgic_v3_add_sgi_source(struct kvm_vcpu *vcpu, int irq, int source)
+{
+}
+
+int vgic_v3_init_emulation(struct kvm *kvm)
+{
+	struct vgic_dist *dist = &kvm->arch.vgic;
+
+	dist->vm_ops.handle_mmio = vgic_v3_handle_mmio;
+	dist->vm_ops.queue_sgi = vgic_v3_queue_sgi;
+	dist->vm_ops.add_sgi_source = vgic_v3_add_sgi_source;
+	dist->vm_ops.vgic_init = vgic_v3_init;
+
+	kvm->arch.max_vcpus = KVM_MAX_VCPUS;
+
+	return 0;
+}
+
+static int vgic_v3_create(struct kvm_device *dev, u32 type)
+{
+	return kvm_vgic_create(dev->kvm, type);
+}
+
+static void vgic_v3_destroy(struct kvm_device *dev)
+{
+	kfree(dev);
+}
+
+static int vgic_v3_set_attr(struct kvm_device *dev,
+			    struct kvm_device_attr *attr)
+{
+	int ret;
+
+	ret = vgic_set_common_attr(dev, attr);
+	if (ret != -ENXIO)
+		return ret;
+
+	switch (attr->group) {
+	case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
+	case KVM_DEV_ARM_VGIC_GRP_CPU_REGS:
+		return -ENXIO;
+	}
+
+	return -ENXIO;
+}
+
+static int vgic_v3_get_attr(struct kvm_device *dev,
+			    struct kvm_device_attr *attr)
+{
+	int ret;
+
+	ret = vgic_get_common_attr(dev, attr);
+	if (ret != -ENXIO)
+		return ret;
+
+	switch (attr->group) {
+	case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
+	case KVM_DEV_ARM_VGIC_GRP_CPU_REGS:
+		return -ENXIO;
+	}
+
+	return -ENXIO;
+}
+
+static int vgic_v3_has_attr(struct kvm_device *dev,
+			    struct kvm_device_attr *attr)
+{
+	switch (attr->group) {
+	case KVM_DEV_ARM_VGIC_GRP_ADDR:
+		switch (attr->attr) {
+		case KVM_VGIC_V2_ADDR_TYPE_DIST:
+		case KVM_VGIC_V2_ADDR_TYPE_CPU:
+			return -ENXIO;
+		}
+		break;
+	case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
+	case KVM_DEV_ARM_VGIC_GRP_CPU_REGS:
+		return -ENXIO;
+	case KVM_DEV_ARM_VGIC_GRP_NR_IRQS:
+		return 0;
+	}
+	return -ENXIO;
+}
+
+struct kvm_device_ops kvm_arm_vgic_v3_ops = {
+	.name = "kvm-arm-vgic-v3",
+	.create = vgic_v3_create,
+	.destroy = vgic_v3_destroy,
+	.set_attr = vgic_v3_set_attr,
+	.get_attr = vgic_v3_get_attr,
+	.has_attr = vgic_v3_has_attr,
+};
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 335ffe0..b7de0f8 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1249,7 +1249,7 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
 	struct kvm_vcpu *vcpu;
 	int edge_triggered, level_triggered;
 	int enabled;
-	bool ret = true;
+	bool ret = true, can_inject = true;
 
 	spin_lock(&dist->lock);
 
@@ -1264,6 +1264,11 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
 
 	if (irq_num >= VGIC_NR_PRIVATE_IRQS) {
 		cpuid = dist->irq_spi_cpu[irq_num - VGIC_NR_PRIVATE_IRQS];
+		if (cpuid == VCPU_NOT_ALLOCATED) {
+			/* Pretend we use CPU0, and prevent injection */
+			cpuid = 0;
+			can_inject = false;
+		}
 		vcpu = kvm_get_vcpu(kvm, cpuid);
 	}
 
@@ -1285,7 +1290,7 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
 
 	enabled = vgic_irq_is_enabled(vcpu, irq_num);
 
-	if (!enabled) {
+	if (!enabled || !can_inject) {
 		ret = false;
 		goto out;
 	}
@@ -1438,6 +1443,7 @@ void kvm_vgic_destroy(struct kvm *kvm)
 	}
 	kfree(dist->irq_sgi_sources);
 	kfree(dist->irq_spi_cpu);
+	kfree(dist->irq_spi_mpidr);
 	kfree(dist->irq_spi_target);
 	kfree(dist->irq_pending_on_cpu);
 	dist->irq_sgi_sources = NULL;
@@ -1628,6 +1634,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
 	kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
 	kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
 	kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
+	kvm->arch.vgic.vgic_redist_base = VGIC_ADDR_UNDEF;
 
 out_unlock:
 	for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
diff --git a/virt/kvm/arm/vgic.h b/virt/kvm/arm/vgic.h
index ff3171a..b0c6b2f 100644
--- a/virt/kvm/arm/vgic.h
+++ b/virt/kvm/arm/vgic.h
@@ -35,6 +35,8 @@
 #define ACCESS_WRITE_VALUE	(3 << 1)
 #define ACCESS_WRITE_MASK(x)	((x) & (3 << 1))
 
+#define VCPU_NOT_ALLOCATED	((u8)-1)
+
 unsigned long *vgic_bitmap_get_shared_map(struct vgic_bitmap *x);
 
 void vgic_update_state(struct kvm *kvm);
@@ -115,5 +117,6 @@ int vgic_set_common_attr(struct kvm_device *dev, struct kvm_device_attr *attr);
 int vgic_get_common_attr(struct kvm_device *dev, struct kvm_device_attr *attr);
 
 int vgic_v2_init_emulation(struct kvm *kvm);
+int vgic_v3_init_emulation(struct kvm *kvm);
 
 #endif
-- 
1.7.9.5
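
The redistributor decoding in vgic_v3_handle_mmio() above can be modelled
in plain user-space C. This is only a sketch, not the kernel code. It
assumes that every redistributor occupies 128 KiB (a 64 KiB RD_base frame
followed by a 64 KiB SGI_base frame, matching KVM_VGIC_V3_REDIST_SIZE in
patch 19/19). The names decode_redist_access() and struct redist_access
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: one 128 KiB redistributor per VCPU, made of a 64 KiB
 * RD_base frame followed by a 64 KiB SGI_base frame. */
#define SZ_64K			0x10000ULL
#define REDIST_SIZE		(2 * SZ_64K)
#define SGI_BASE_OFFSET		SZ_64K

enum redist_frame { FRAME_RD, FRAME_SGI };

struct redist_access {
	int vcpu_id;		/* redistributor (and thus VCPU) accessed */
	enum redist_frame frame;
	uint64_t offset;	/* offset relative to the start of that frame */
};

/*
 * Return false if [addr, addr + len) is not fully inside the
 * redistributor region of nr_vcpus VCPUs starting at rdbase.
 */
static bool decode_redist_access(uint64_t addr, uint64_t len, uint64_t rdbase,
				 int nr_vcpus, struct redist_access *out)
{
	uint64_t region_end = rdbase + REDIST_SIZE * (uint64_t)nr_vcpus;
	uint64_t frame_base;

	assert(nr_vcpus > 0);
	if (addr < rdbase || addr + len > region_end)
		return false;

	/* Each VCPU owns one REDIST_SIZE slot, in VCPU ID order. */
	out->vcpu_id = (int)((addr - rdbase) / REDIST_SIZE);
	frame_base = rdbase + (uint64_t)out->vcpu_id * REDIST_SIZE;

	/* The second 64 KiB of the slot is the SGI_base frame. */
	if (addr >= frame_base + SGI_BASE_OFFSET) {
		out->frame = FRAME_SGI;
		frame_base += SGI_BASE_OFFSET;
	} else {
		out->frame = FRAME_RD;
	}
	out->offset = addr - frame_base;
	return true;
}
```

With four VCPUs and a redistributor base of 0x080a0000, an access to
0x080d0100 decodes to VCPU 1, SGI_base frame, offset 0x100, which is where
GICR_ISENABLER0 lives in that frame.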

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v4 16/19] arm64: GICv3: introduce symbolic names for GICv3 ICC_SGI1R_EL1 fields
  2014-11-14 10:07 [PATCH v4 00/19] KVM GICv3 emulation Andre Przywara
                   ` (14 preceding siblings ...)
  2014-11-14 10:07 ` [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation Andre Przywara
@ 2014-11-14 10:08 ` Andre Przywara
  2014-11-23 14:43   ` Christoffer Dall
  2014-11-14 10:08 ` [PATCH v4 17/19] arm64: KVM: add SGI generation register emulation Andre Przywara
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 80+ messages in thread
From: Andre Przywara @ 2014-11-14 10:08 UTC (permalink / raw)
  To: linux-arm-kernel

The gic_send_sgi() function used hardcoded bit shift values to
generate the ICC_SGI1R_EL1 register value.
Replace this with symbolic names to allow reusing them later.
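
To illustrate the field layout these names describe, the encoding done by
gic_send_sgi() can be reproduced in user space. This is a sketch only.
MPIDR_LEVEL_SHIFT() and MPIDR_AFFINITY_LEVEL() are redeclared locally, on
the assumption that they match the arm64 definitions. sgi1r_encode() is a
hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* ICC_SGI1R_EL1 field positions, as introduced by this patch. */
#define ICC_SGI1R_TARGET_LIST_SHIFT	0
#define ICC_SGI1R_AFFINITY_1_SHIFT	16
#define ICC_SGI1R_SGI_ID_SHIFT		24
#define ICC_SGI1R_AFFINITY_2_SHIFT	32
#define ICC_SGI1R_AFFINITY_3_SHIFT	48

/* Assumed arm64 MPIDR layout: Aff0-Aff2 in bits [23:0], Aff3 in [39:32]. */
#define MPIDR_LEVEL_SHIFT(level)	(((1 << (level)) >> 1) << 3)
#define MPIDR_AFFINITY_LEVEL(mpidr, level) \
	(((mpidr) >> MPIDR_LEVEL_SHIFT(level)) & 0xff)

/* Move one affinity level from its MPIDR position to its SGI1R position. */
#define MPIDR_TO_SGI_AFFINITY(cluster_id, level) \
	(MPIDR_AFFINITY_LEVEL(cluster_id, level) \
		<< ICC_SGI1R_AFFINITY_## level ##_SHIFT)

static uint64_t sgi1r_encode(uint64_t cluster_id, uint16_t tlist,
			     unsigned int irq)
{
	assert(irq < 16);	/* SGI INTIDs are 0-15 */

	return MPIDR_TO_SGI_AFFINITY(cluster_id, 3) |
	       MPIDR_TO_SGI_AFFINITY(cluster_id, 2) |
	       (uint64_t)irq << ICC_SGI1R_SGI_ID_SHIFT |
	       MPIDR_TO_SGI_AFFINITY(cluster_id, 1) |
	       (uint64_t)tlist << ICC_SGI1R_TARGET_LIST_SHIFT;
}
```

Take a cluster with Aff3=1, Aff2=2 and Aff1=3 (MPIDR 0x100020300), a
target list of 0x0005 and SGI 7. The encoded value is 0x0001000207030005.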

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
Changelog v3...v4:
- (new patch)

 drivers/irqchip/irq-gic-v3.c       |   14 +++++++++-----
 include/linux/irqchip/arm-gic-v3.h |   12 ++++++++++++
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index aa17ae8..e163749 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -467,15 +467,19 @@ out:
 	return tlist;
 }
 
+#define MPIDR_TO_SGI_AFFINITY(cluster_id, level) \
+	(MPIDR_AFFINITY_LEVEL(cluster_id, level) \
+		<< ICC_SGI1R_AFFINITY_## level ##_SHIFT)
+
 static void gic_send_sgi(u64 cluster_id, u16 tlist, unsigned int irq)
 {
 	u64 val;
 
-	val = (MPIDR_AFFINITY_LEVEL(cluster_id, 3) << 48	|
-	       MPIDR_AFFINITY_LEVEL(cluster_id, 2) << 32	|
-	       irq << 24			    		|
-	       MPIDR_AFFINITY_LEVEL(cluster_id, 1) << 16	|
-	       tlist);
+	val = (MPIDR_TO_SGI_AFFINITY(cluster_id, 3)	|
+	       MPIDR_TO_SGI_AFFINITY(cluster_id, 2)	|
+	       irq << ICC_SGI1R_SGI_ID_SHIFT		|
+	       MPIDR_TO_SGI_AFFINITY(cluster_id, 1)	|
+	       tlist << ICC_SGI1R_TARGET_LIST_SHIFT);
 
 	pr_debug("CPU%d: ICC_SGI1R_EL1 %llx\n", smp_processor_id(), val);
 	gic_write_sgi1r(val);
diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index 726d898..b7946c3 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -174,6 +174,18 @@
 #define ICC_SRE_EL2_SRE			(1 << 0)
 #define ICC_SRE_EL2_ENABLE		(1 << 3)
 
+#define ICC_SGI1R_TARGET_LIST_SHIFT	0
+#define ICC_SGI1R_TARGET_LIST_MASK	(0xffff << ICC_SGI1R_TARGET_LIST_SHIFT)
+#define ICC_SGI1R_AFFINITY_1_SHIFT	16
+#define ICC_SGI1R_AFFINITY_1_MASK	(0xff << ICC_SGI1R_AFFINITY_1_SHIFT)
+#define ICC_SGI1R_SGI_ID_SHIFT		24
+#define ICC_SGI1R_SGI_ID_MASK		(0xfULL << ICC_SGI1R_SGI_ID_SHIFT)
+#define ICC_SGI1R_AFFINITY_2_SHIFT	32
+#define ICC_SGI1R_AFFINITY_2_MASK	(0xffULL << ICC_SGI1R_AFFINITY_2_SHIFT)
+#define ICC_SGI1R_IRQ_ROUTING_MODE_BIT	40
+#define ICC_SGI1R_AFFINITY_3_SHIFT	48
+#define ICC_SGI1R_AFFINITY_3_MASK	(0xffULL << ICC_SGI1R_AFFINITY_3_SHIFT)
+
 /*
  * System register definitions
  */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v4 17/19] arm64: KVM: add SGI generation register emulation
  2014-11-14 10:07 [PATCH v4 00/19] KVM GICv3 emulation Andre Przywara
                   ` (15 preceding siblings ...)
  2014-11-14 10:08 ` [PATCH v4 16/19] arm64: GICv3: introduce symbolic names for GICv3 ICC_SGI1R_EL1 fields Andre Przywara
@ 2014-11-14 10:08 ` Andre Przywara
  2014-11-23 15:08   ` Christoffer Dall
  2014-11-14 10:08 ` [PATCH v4 18/19] arm/arm64: KVM: enable kernel side of GICv3 emulation Andre Przywara
                   ` (2 subsequent siblings)
  19 siblings, 1 reply; 80+ messages in thread
From: Andre Przywara @ 2014-11-14 10:08 UTC (permalink / raw)
  To: linux-arm-kernel

While a (virtual) inter-processor interrupt (SGI) on a GICv2 is
generated by writing to an MMIO register, GICv3 uses the system
register ICC_SGI1R_EL1 to trigger it.
Trap that register on ARM64 hosts and handle it in a new handler
function in the GICv3 emulation code.
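
The target selection done by vgic_v3_dispatch_sgi() and match_mpidr() can
be modelled in plain C. This is a hypothetical user-space sketch:
sgi_targets() and sgi_aff() are illustrative names, not kernel API. Instead
of marking SGIs pending, it returns a bitmask of the receiving VCPUs. It
assumes at most 32 VCPUs and the arm64 MPIDR layout (Aff1 in bits [15:8],
Aff2 in [23:16], Aff3 in [39:32]).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ICC_SGI1R_EL1 field positions, mirroring the ICC_SGI1R_* shifts. */
#define ICC_SGI1R_TARGET_LIST_MASK	0xffffULL
#define ICC_SGI1R_AFFINITY_1_SHIFT	16
#define ICC_SGI1R_AFFINITY_2_SHIFT	32
#define ICC_SGI1R_IRQ_ROUTING_MODE_BIT	40
#define ICC_SGI1R_AFFINITY_3_SHIFT	48

/* Aff0 field of an arm64 MPIDR. */
#define MPIDR_LEVEL_MASK		0xffULL

/* Move one 8-bit affinity field from its SGI1R position to its MPIDR one. */
static uint64_t sgi_aff(uint64_t reg, unsigned int sgi_shift,
			unsigned int mpidr_shift)
{
	return ((reg >> sgi_shift) & 0xff) << mpidr_shift;
}

/* Return the VCPUs receiving the SGI in 'reg' written by VCPU 'self'. */
static uint32_t sgi_targets(uint64_t reg, int self, const uint64_t *mpidr,
			    int nr_vcpus)
{
	bool broadcast = reg & (1ULL << ICC_SGI1R_IRQ_ROUTING_MODE_BIT);
	uint16_t target_cpus = (uint16_t)(reg & ICC_SGI1R_TARGET_LIST_MASK);
	uint64_t aff = sgi_aff(reg, ICC_SGI1R_AFFINITY_3_SHIFT, 32) |
		       sgi_aff(reg, ICC_SGI1R_AFFINITY_2_SHIFT, 16) |
		       sgi_aff(reg, ICC_SGI1R_AFFINITY_1_SHIFT, 8);
	uint32_t result = 0;

	assert(nr_vcpus >= 0 && nr_vcpus <= 32);
	for (int c = 0; c < nr_vcpus; c++) {
		uint64_t level0 = mpidr[c] & MPIDR_LEVEL_MASK;

		/* Exit early once every requested CPU has been handled. */
		if (!broadcast && target_cpus == 0)
			break;

		if (broadcast) {
			/* Routing mode 1: everyone except the sender. */
			if (c != self)
				result |= 1u << c;
			continue;
		}

		/* Aff3.Aff2.Aff1 must match and Aff0 must be in the list. */
		if ((mpidr[c] & ~MPIDR_LEVEL_MASK) != aff || level0 > 15 ||
		    !(target_cpus & (1u << level0)))
			continue;

		target_cpus &= (uint16_t)~(1u << level0);
		result |= 1u << c;
	}
	return result;
}
```

Clearing each matched bit from target_cpus mirrors the early-exit
optimisation in the patch. A unicast SGI usually stops after the first
match instead of walking every VCPU.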

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
Changelog v3...v4:
- moved addition of vgic_v3_dispatch_sgi() from earlier patch into here
- move MPIDR comparison into extra function
- use new ICC_SGI1R_ field names
- improve readability of vgic_v3_dispatch_sgi()
- add and refine comments

 arch/arm64/kvm/sys_regs.c   |   26 ++++++++++
 include/kvm/arm_vgic.h      |    1 +
 virt/kvm/arm/vgic-v3-emul.c |  113 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 140 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index fd3ffc3..e59369a 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -165,6 +165,27 @@ static bool access_sctlr(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+/*
+ * Trap handler for the GICv3 SGI generation system register.
+ * Forward the request to the VGIC emulation.
+ * The cp15_64 code makes sure this automatically works
+ * for both AArch64 and AArch32 accesses.
+ */
+static bool access_gic_sgi(struct kvm_vcpu *vcpu,
+			   const struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	u64 val;
+
+	if (!p->is_write)
+		return read_from_write_only(vcpu, p);
+
+	val = *vcpu_reg(vcpu, p->Rt);
+	vgic_v3_dispatch_sgi(vcpu, val);
+
+	return true;
+}
+
 static bool trap_raz_wi(struct kvm_vcpu *vcpu,
 			const struct sys_reg_params *p,
 			const struct sys_reg_desc *r)
@@ -431,6 +452,9 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	/* VBAR_EL1 */
 	{ Op0(0b11), Op1(0b000), CRn(0b1100), CRm(0b0000), Op2(0b000),
 	  NULL, reset_val, VBAR_EL1, 0 },
+	/* ICC_SGI1R_EL1 */
+	{ Op0(0b11), Op1(0b000), CRn(0b1100), CRm(0b1011), Op2(0b101),
+	  access_gic_sgi },
 	/* CONTEXTIDR_EL1 */
 	{ Op0(0b11), Op1(0b000), CRn(0b1101), CRm(0b0000), Op2(0b001),
 	  access_vm_reg, reset_val, CONTEXTIDR_EL1, 0 },
@@ -659,6 +683,8 @@ static const struct sys_reg_desc cp14_64_regs[] = {
  * register).
  */
 static const struct sys_reg_desc cp15_regs[] = {
+	{ Op1( 0), CRn( 0), CRm(12), Op2( 0), access_gic_sgi },
+
 	{ Op1( 0), CRn( 1), CRm( 0), Op2( 0), access_sctlr, NULL, c1_SCTLR },
 	{ Op1( 0), CRn( 2), CRm( 0), Op2( 0), access_vm_reg, NULL, c2_TTBR0 },
 	{ Op1( 0), CRn( 2), CRm( 0), Op2( 1), access_vm_reg, NULL, c2_TTBR1 },
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index c1ef5a9..357a935 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -305,6 +305,7 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu);
 void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu);
 int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
 			bool level);
+void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
 int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
 bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		      struct kvm_exit_mmio *mmio);
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
index 97b5801..58d7457 100644
--- a/virt/kvm/arm/vgic-v3-emul.c
+++ b/virt/kvm/arm/vgic-v3-emul.c
@@ -828,6 +828,119 @@ int vgic_v3_init_emulation(struct kvm *kvm)
 	return 0;
 }
 
+/*
+ * Compare a given affinity (level 1-3 and a level 0 mask, from the SGI
+ * generation register ICC_SGI1R_EL1) with a given VCPU.
+ * If the VCPU's MPIDR matches, return the level0 affinity, otherwise
+ * return -1.
+ */
+static int match_mpidr(u64 sgi_aff, u16 sgi_cpu_mask, struct kvm_vcpu *vcpu)
+{
+	unsigned long affinity;
+	int level0;
+
+	/*
+	 * Split the current VCPU's MPIDR into affinity level 0 and the
+	 * rest as this is what we have to compare against.
+	 */
+	affinity = kvm_vcpu_get_mpidr_aff(vcpu);
+	level0 = MPIDR_AFFINITY_LEVEL(affinity, 0);
+	affinity &= ~MPIDR_LEVEL_MASK;
+
+	/* bail out if the upper three levels don't match */
+	if (sgi_aff != affinity)
+		return -1;
+
+	/* Is this VCPU's bit set in the mask ? */
+	if (!(sgi_cpu_mask & BIT(level0)))
+		return -1;
+
+	return level0;
+}
+
+#define SGI_AFFINITY_LEVEL(reg, level) \
+	((((reg) & ICC_SGI1R_AFFINITY_## level ##_MASK) \
+	>> ICC_SGI1R_AFFINITY_## level ##_SHIFT) << MPIDR_LEVEL_SHIFT(level))
+
+/**
+ * vgic_v3_dispatch_sgi - handle SGI requests from VCPUs
+ * @vcpu: The VCPU requesting an SGI
+ * @reg: The value written into the ICC_SGI1R_EL1 register by that VCPU
+ *
+ * With GICv3 (and ARE=1) CPUs trigger SGIs by writing to an architectural
+ * system register. This will trap in sys_regs.c and call this function.
+ * The ICC_SGI1R_EL1 register contains the upper three affinity levels of the
+ * target processors as well as a bitmask of 16 Aff0 CPUs.
+ * If the interrupt routing mode bit is not set, we iterate over all VCPUs to
+ * check for matching ones. If this bit is set, we signal all VCPUs except
+ * the calling one.
+ */
+void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_vcpu *c_vcpu;
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	u16 target_cpus;
+	u64 mpidr;
+	int sgi, c, vcpu_id;
+	bool broadcast;
+	int updated = 0;
+
+	vcpu_id = vcpu->vcpu_id;
+
+	sgi = (reg & ICC_SGI1R_SGI_ID_MASK) >> ICC_SGI1R_SGI_ID_SHIFT;
+	broadcast = reg & BIT(ICC_SGI1R_IRQ_ROUTING_MODE_BIT);
+	target_cpus = (reg & ICC_SGI1R_TARGET_LIST_MASK) >> ICC_SGI1R_TARGET_LIST_SHIFT;
+	mpidr = SGI_AFFINITY_LEVEL(reg, 3);
+	mpidr |= SGI_AFFINITY_LEVEL(reg, 2);
+	mpidr |= SGI_AFFINITY_LEVEL(reg, 1);
+	mpidr &= ~MPIDR_LEVEL_MASK;
+
+	/*
+	 * We take the dist lock here, because we come from the sysregs
+	 * code path and not from the MMIO one (which already takes the lock).
+	 */
+	spin_lock(&dist->lock);
+
+	/*
+	 * We iterate over all VCPUs to find the MPIDRs matching the request.
+	 * If we have handled one CPU, we clear its bit to detect early
+	 * if we are already finished. This avoids iterating through all
+	 * VCPUs when most of the time we just signal a single VCPU.
+	 */
+	kvm_for_each_vcpu(c, c_vcpu, kvm) {
+
+		/* Exit early if we have dealt with all requested CPUs */
+		if (!broadcast && target_cpus == 0)
+			break;
+
+		/* Don't signal the calling VCPU */
+		if (broadcast && c == vcpu_id)
+			continue;
+
+		if (!broadcast) {
+			int level0;
+
+			level0 = match_mpidr(mpidr, target_cpus, c_vcpu);
+			if (level0 == -1)
+				continue;
+
+			/* remove this matching VCPU from the mask */
+			target_cpus &= ~BIT(level0);
+		}
+
+		/* Flag the SGI as pending */
+		vgic_dist_irq_set_pending(c_vcpu, sgi);
+		updated = 1;
+		kvm_debug("SGI%d from CPU%d to CPU%d\n", sgi, vcpu_id, c);
+	}
+	if (updated)
+		vgic_update_state(vcpu->kvm);
+	spin_unlock(&dist->lock);
+	if (updated)
+		vgic_kick_vcpus(vcpu->kvm);
+}
+
 static int vgic_v3_create(struct kvm_device *dev, u32 type)
 {
 	return kvm_vgic_create(dev->kvm, type);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v4 18/19] arm/arm64: KVM: enable kernel side of GICv3 emulation
  2014-11-14 10:07 [PATCH v4 00/19] KVM GICv3 emulation Andre Przywara
                   ` (16 preceding siblings ...)
  2014-11-14 10:08 ` [PATCH v4 17/19] arm64: KVM: add SGI generation register emulation Andre Przywara
@ 2014-11-14 10:08 ` Andre Przywara
  2014-11-24  9:09   ` Christoffer Dall
  2014-11-14 10:08 ` [PATCH v4 19/19] arm/arm64: KVM: allow userland to request a virtual GICv3 Andre Przywara
  2014-11-24  9:33 ` [PATCH v4 00/19] KVM GICv3 emulation Eric Auger
  19 siblings, 1 reply; 80+ messages in thread
From: Andre Przywara @ 2014-11-14 10:08 UTC (permalink / raw)
  To: linux-arm-kernel

With all the necessary GICv3 emulation code in place, we can now
connect the code to the GICv3 backend in the kernel.
List register (LR) handling differs depending on the emulated GIC
model, so handle both models in the existing LR accessors.
Also allow non-v2-compatible GICv3 implementations (which don't
provide MMIO regions for the virtual CPU interface in the DT), but
restrict those hosts to support GICv3 guests only.
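
The resulting list register layout can be sketched in user-space C. This
is illustrative only. The ICH_LR_* bit positions are assumed to match
arm-gic-v3.h: vINTID in bits [31:0], Group at bit 60, Pending at bit 62
and Active at bit 63. The GICv2-style source CPU field in bits [12:10]
follows GICH_LR_PHYSID_CPUID in the patch. lr_encode(), lr_decode_irq(),
lr_decode_source() and enum guest_model are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed ICH_LR<n>_EL2 bit positions. */
#define ICH_LR_VIRTUALID_MASK		((1ULL << 32) - 1)
#define ICH_LR_GROUP			(1ULL << 60)
#define ICH_LR_PENDING_BIT		(1ULL << 62)
#define ICH_LR_ACTIVE_BIT		(1ULL << 63)
/* GICv2-compatible encoding: 10-bit IRQ, source CPU in bits [12:10]. */
#define GICH_LR_VIRTUALID		0x3ffULL
#define GICH_LR_PHYSID_CPUID_SHIFT	10

enum guest_model { GUEST_GICV2, GUEST_GICV3 };

static uint64_t lr_encode(enum guest_model model, uint32_t irq,
			  unsigned int source, int pending, int active)
{
	uint64_t val = irq;

	if (model == GUEST_GICV3) {
		/* All guest IRQs are Group1; Group0 would be a FIQ. */
		val |= ICH_LR_GROUP;
	} else {
		assert(source < 8);	/* 3-bit source CPU field */
		val |= (uint64_t)source << GICH_LR_PHYSID_CPUID_SHIFT;
	}
	if (pending)
		val |= ICH_LR_PENDING_BIT;
	if (active)
		val |= ICH_LR_ACTIVE_BIT;
	return val;
}

static uint32_t lr_decode_irq(enum guest_model model, uint64_t val)
{
	if (model == GUEST_GICV3)
		return (uint32_t)(val & ICH_LR_VIRTUALID_MASK);
	return (uint32_t)(val & GICH_LR_VIRTUALID);
}

static unsigned int lr_decode_source(enum guest_model model, uint64_t val)
{
	/* Only GICv2 guests track an SGI source in the LR. */
	if (model == GUEST_GICV2 && lr_decode_irq(model, val) <= 15)
		return (val >> GICH_LR_PHYSID_CPUID_SHIFT) & 0x7;
	return 0;
}
```

The model captures an asymmetry between the two guest types. A GICv3
guest gets the Group1 bit and a 32-bit vINTID but no source CPU, because
GICv3 no longer keeps track of SGI sources.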

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
Changelog v3...v4:
- handle differences between GICv2-on-v3 and GICv3-on-v3 in existing functions
- remove init_*_emul() functions
- remove max_vcpus setting (done in earlier patches now)
- adapt to new vgic_v<n>_init_emulation behaviour

 virt/kvm/arm/vgic-v3.c |   83 ++++++++++++++++++++++++++++++++----------------
 virt/kvm/arm/vgic.c    |    5 +++
 2 files changed, 60 insertions(+), 28 deletions(-)

diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
index a04d208..4894c59 100644
--- a/virt/kvm/arm/vgic-v3.c
+++ b/virt/kvm/arm/vgic-v3.c
@@ -34,6 +34,7 @@
 #define GICH_LR_VIRTUALID		(0x3ffUL << 0)
 #define GICH_LR_PHYSID_CPUID_SHIFT	(10)
 #define GICH_LR_PHYSID_CPUID		(7UL << GICH_LR_PHYSID_CPUID_SHIFT)
+#define ICH_LR_VIRTUALID_MASK		(BIT_ULL(32) - 1)
 
 /*
  * LRs are stored in reverse order in memory. make sure we index them
@@ -48,12 +49,17 @@ static struct vgic_lr vgic_v3_get_lr(const struct kvm_vcpu *vcpu, int lr)
 	struct vgic_lr lr_desc;
 	u64 val = vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[LR_INDEX(lr)];
 
-	lr_desc.irq	= val & GICH_LR_VIRTUALID;
-	if (lr_desc.irq <= 15)
-		lr_desc.source	= (val >> GICH_LR_PHYSID_CPUID_SHIFT) & 0x7;
+	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
+		lr_desc.irq = val & ICH_LR_VIRTUALID_MASK;
 	else
-		lr_desc.source = 0;
-	lr_desc.state	= 0;
+		lr_desc.irq = val & GICH_LR_VIRTUALID;
+
+	lr_desc.source = 0;
+	if (lr_desc.irq <= 15 &&
+	    vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V2)
+		lr_desc.source	= (val >> GICH_LR_PHYSID_CPUID_SHIFT) & 0x7;
+
+	lr_desc.state   = 0;
 
 	if (val & ICH_LR_PENDING_BIT)
 		lr_desc.state |= LR_STATE_PENDING;
@@ -68,8 +74,20 @@ static struct vgic_lr vgic_v3_get_lr(const struct kvm_vcpu *vcpu, int lr)
 static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
 			   struct vgic_lr lr_desc)
 {
-	u64 lr_val = (((u32)lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT) |
-		      lr_desc.irq);
+	u64 lr_val;
+
+	lr_val = lr_desc.irq;
+
+	/*
+	 * currently all guest IRQs are Group1, as Group0 would result
+	 * in a FIQ in the guest, which it wouldn't expect.
+	 * Eventually we want to make this configurable, so we may revisit
+	 * this in the future.
+	 */
+	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
+		lr_val |= ICH_LR_GROUP;
+	else
+		lr_val |= (u32)lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT;
 
 	if (lr_desc.state & LR_STATE_PENDING)
 		lr_val |= ICH_LR_PENDING_BIT;
@@ -154,7 +172,14 @@ static void vgic_v3_enable(struct kvm_vcpu *vcpu)
 	 */
 	vgic_v3->vgic_vmcr = 0;
 
-	vgic_v3->vgic_sre = 0;
+	/*
+	 * Set the SRE_EL1 value depending on the configured
+	 * emulated vGIC model.
+	 */
+	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
+		vgic_v3->vgic_sre = ICC_SRE_EL1_SRE;
+	else
+		vgic_v3->vgic_sre = 0;
 
 	/* Get the show on the road... */
 	vgic_v3->vgic_hcr = ICH_HCR_EN;
@@ -215,28 +240,30 @@ int vgic_v3_probe(struct device_node *vgic_node,
 
 	gicv_idx += 3; /* Also skip GICD, GICC, GICH */
 	if (of_address_to_resource(vgic_node, gicv_idx, &vcpu_res)) {
-		kvm_err("Cannot obtain GICV region\n");
-		ret = -ENXIO;
-		goto out;
-	}
-
-	if (!PAGE_ALIGNED(vcpu_res.start)) {
-		kvm_err("GICV physical address 0x%llx not page aligned\n",
-			(unsigned long long)vcpu_res.start);
-		ret = -ENXIO;
-		goto out;
-	}
-
-	if (!PAGE_ALIGNED(resource_size(&vcpu_res))) {
-		kvm_err("GICV size 0x%llx not a multiple of page size 0x%lx\n",
-			(unsigned long long)resource_size(&vcpu_res),
-			PAGE_SIZE);
-		ret = -ENXIO;
-		goto out;
+		kvm_info("GICv3: GICv2 emulation not available\n");
+		vgic->vcpu_base = 0;
+	} else {
+		if (!PAGE_ALIGNED(vcpu_res.start)) {
+			kvm_err("GICV physical address 0x%llx not page aligned\n",
+				(unsigned long long)vcpu_res.start);
+			ret = -ENXIO;
+			goto out;
+		}
+
+		if (!PAGE_ALIGNED(resource_size(&vcpu_res))) {
+			kvm_err("GICV size 0x%llx not a multiple of page size 0x%lx\n",
+				(unsigned long long)resource_size(&vcpu_res),
+				PAGE_SIZE);
+			ret = -ENXIO;
+			goto out;
+		}
+
+		vgic->vcpu_base = vcpu_res.start;
+		kvm_register_device_ops(&kvm_arm_vgic_v2_ops,
+					KVM_DEV_TYPE_ARM_VGIC_V2);
 	}
-	kvm_register_device_ops(&kvm_arm_vgic_v2_ops, KVM_DEV_TYPE_ARM_VGIC_V2);
+	kvm_register_device_ops(&kvm_arm_vgic_v3_ops, KVM_DEV_TYPE_ARM_VGIC_V3);
 
-	vgic->vcpu_base = vcpu_res.start;
 	vgic->vctrl_base = NULL;
 	vgic->type = VGIC_V3;
 	vgic->max_hw_vcpus = KVM_MAX_VCPUS;
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index b7de0f8..1dbaeb5 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1577,6 +1577,11 @@ static int init_vgic_model(struct kvm *kvm, int type)
 	case KVM_DEV_TYPE_ARM_VGIC_V2:
 		ret = vgic_v2_init_emulation(kvm);
 		break;
+#ifdef CONFIG_ARM_GIC_V3
+	case KVM_DEV_TYPE_ARM_VGIC_V3:
+		ret = vgic_v3_init_emulation(kvm);
+		break;
+#endif
 	default:
 		ret = -ENODEV;
 		break;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v4 19/19] arm/arm64: KVM: allow userland to request a virtual GICv3
  2014-11-14 10:07 [PATCH v4 00/19] KVM GICv3 emulation Andre Przywara
                   ` (17 preceding siblings ...)
  2014-11-14 10:08 ` [PATCH v4 18/19] arm/arm64: KVM: enable kernel side of GICv3 emulation Andre Przywara
@ 2014-11-14 10:08 ` Andre Przywara
  2014-11-24  9:39   ` Christoffer Dall
  2014-11-24  9:33 ` [PATCH v4 00/19] KVM GICv3 emulation Eric Auger
  19 siblings, 1 reply; 80+ messages in thread
From: Andre Przywara @ 2014-11-14 10:08 UTC (permalink / raw)
  To: linux-arm-kernel

With all of the GICv3 code in place, we now allow userland to ask the
kernel to use a virtual GICv3 in the guest.
We also provide the necessary support for userland to set the memory
addresses of the virtual distributor and redistributors.
This requires some userland code to make use of that feature and
explicitly ask for a virtual GICv3.
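
The redistributor region grows with the VCPU count, by
KVM_VGIC_V3_REDIST_SIZE per VCPU. The distributor has the fixed size
KVM_VGIC_V3_DIST_SIZE. Userland choosing the two base addresses therefore
has to keep the regions apart. Below is a minimal sketch of such a check,
assuming the sizes from the uapi header in this patch. The helper names
are hypothetical, and the sketch does not model alignment or any other
checks the kernel may apply when the addresses are set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sizes as defined in the arm64 uapi header by this patch. */
#define SZ_64K				0x10000ULL
#define KVM_VGIC_V3_DIST_SIZE		SZ_64K
#define KVM_VGIC_V3_REDIST_SIZE		(2 * SZ_64K)

/* One redistributor per VCPU, laid out back to back. */
static uint64_t redist_region_size(unsigned int nr_vcpus)
{
	return KVM_VGIC_V3_REDIST_SIZE * nr_vcpus;
}

/* Do the half-open ranges [a, a + a_len) and [b, b + b_len) overlap? */
static bool ranges_overlap(uint64_t a, uint64_t a_len,
			   uint64_t b, uint64_t b_len)
{
	assert(a + a_len >= a && b + b_len >= b);	/* no wraparound */
	return a < b + b_len && b < a + a_len;
}

/* Hypothetical userland check before setting the DIST/REDIST addresses. */
static bool vgic_v3_layout_ok(uint64_t dist_base, uint64_t redist_base,
			      unsigned int nr_vcpus)
{
	if (nr_vcpus == 0)
		return false;
	return !ranges_overlap(dist_base, KVM_VGIC_V3_DIST_SIZE,
			       redist_base, redist_region_size(nr_vcpus));
}
```

For four VCPUs, the redistributor region spans 512 KiB. A redistributor
base of 0x08000000 therefore leaves room for a distributor at 0x08080000,
but not at 0x08040000.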

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
Changelog v3...v4:
- refine commit message
- add documentation of new GICv3 KVM device

 Documentation/virtual/kvm/devices/arm-vgic.txt |   21 +++++++++--
 arch/arm64/include/uapi/asm/kvm.h              |    7 ++++
 include/kvm/arm_vgic.h                         |    4 +--
 virt/kvm/arm/vgic-v3-emul.c                    |    3 ++
 virt/kvm/arm/vgic.c                            |   46 +++++++++++++++++-------
 5 files changed, 64 insertions(+), 17 deletions(-)

diff --git a/Documentation/virtual/kvm/devices/arm-vgic.txt b/Documentation/virtual/kvm/devices/arm-vgic.txt
index df8b0c7..67e4c3e 100644
--- a/Documentation/virtual/kvm/devices/arm-vgic.txt
+++ b/Documentation/virtual/kvm/devices/arm-vgic.txt
@@ -3,22 +3,37 @@ ARM Virtual Generic Interrupt Controller (VGIC)
 
 Device types supported:
   KVM_DEV_TYPE_ARM_VGIC_V2     ARM Generic Interrupt Controller v2.0
+  KVM_DEV_TYPE_ARM_VGIC_V3     ARM Generic Interrupt Controller v3.0
 
 Only one VGIC instance may be instantiated through either this API or the
 legacy KVM_CREATE_IRQCHIP api.  The created VGIC will act as the VM interrupt
 controller, requiring emulated user-space devices to inject interrupts to the
 VGIC instead of directly to CPUs.
+Creating a guest GICv3 device requires a host GICv3 as well.
+Host GICv3 implementations that also provide the GICv2-compatible virtual
+CPU interface (GICV) allow a guest GICv2 as well.
 
 Groups:
   KVM_DEV_ARM_VGIC_GRP_ADDR
   Attributes:
     KVM_VGIC_V2_ADDR_TYPE_DIST (rw, 64-bit)
       Base address in the guest physical address space of the GIC distributor
-      register mappings.
+      register mappings. Only valid if a guest GICv2 has been instantiated.
 
     KVM_VGIC_V2_ADDR_TYPE_CPU (rw, 64-bit)
       Base address in the guest physical address space of the GIC virtual cpu
-      interface register mappings.
+      interface register mappings. Only valid if a guest GICv2 has been
+      instantiated.
+
+    KVM_VGIC_V3_ADDR_TYPE_DIST (rw, 64-bit)
+      Base address in the guest physical address space of the GICv3 distributor
+      register mappings. Only valid if a guest GICv3 has been instantiated.
+
+    KVM_VGIC_V3_ADDR_TYPE_REDIST (rw, 64-bit)
+      Base address in the guest physical address space of the GICv3
+      redistributor register mappings. Only valid if a guest GICv3 has been
+      instantiated.
+
 
   KVM_DEV_ARM_VGIC_GRP_DIST_REGS
   Attributes:
@@ -36,6 +51,7 @@ Groups:
     the register.
   Limitations:
     - Priorities are not implemented, and registers are RAZ/WI
+    - Currently only implemented for GICv2.
   Errors:
     -ENODEV: Getting or setting this register is not yet supported
     -EBUSY: One or more VCPUs are running
@@ -68,6 +84,7 @@ Groups:
 
   Limitations:
     - Priorities are not implemented, and registers are RAZ/WI
+    - Currently only implemented for GICv2.
   Errors:
     -ENODEV: Getting or setting this register is not yet supported
     -EBUSY: One or more VCPUs are running
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 8e38878..2ed873a 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -78,6 +78,13 @@ struct kvm_regs {
 #define KVM_VGIC_V2_DIST_SIZE		0x1000
 #define KVM_VGIC_V2_CPU_SIZE		0x2000
 
+/* Supported VGICv3 address types  */
+#define KVM_VGIC_V3_ADDR_TYPE_DIST	2
+#define KVM_VGIC_V3_ADDR_TYPE_REDIST	3
+
+#define KVM_VGIC_V3_DIST_SIZE		SZ_64K
+#define KVM_VGIC_V3_REDIST_SIZE		(2 * SZ_64K)
+
 #define KVM_ARM_VCPU_POWER_OFF		0 /* CPU is started in OFF state */
 #define KVM_ARM_VCPU_EL1_32BIT		1 /* CPU running a 32bit VM */
 #define KVM_ARM_VCPU_PSCI_0_2		2 /* CPU uses PSCI v0.2 */
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 357a935..e452ef7 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -35,8 +35,8 @@
 #define VGIC_MAX_IRQS		1024
 
 /* Sanity checks... */
-#if (KVM_MAX_VCPUS > 8)
-#error	Invalid number of CPU interfaces
+#if (KVM_MAX_VCPUS > 255)
+#error Too many KVM VCPUs, the VGIC only supports up to 255 VCPUs for now
 #endif
 
 #if (VGIC_NR_IRQS_LEGACY & 31)
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
index 58d7457..85e1bd6 100644
--- a/virt/kvm/arm/vgic-v3-emul.c
+++ b/virt/kvm/arm/vgic-v3-emul.c
@@ -996,6 +996,9 @@ static int vgic_v3_has_attr(struct kvm_device *dev,
 		case KVM_VGIC_V2_ADDR_TYPE_DIST:
 		case KVM_VGIC_V2_ADDR_TYPE_CPU:
 			return -ENXIO;
+		case KVM_VGIC_V3_ADDR_TYPE_DIST:
+		case KVM_VGIC_V3_ADDR_TYPE_REDIST:
+			return 0;
 		}
 		break;
 	case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 1dbaeb5..1213da5 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1692,7 +1692,7 @@ static int vgic_ioaddr_assign(struct kvm *kvm, phys_addr_t *ioaddr,
 /**
  * kvm_vgic_addr - set or get vgic VM base addresses
  * @kvm:   pointer to the vm struct
- * @type:  the VGIC addr type, one of KVM_VGIC_V2_ADDR_TYPE_XXX
+ * @type:  the VGIC addr type, one of KVM_VGIC_V[23]_ADDR_TYPE_XXX
  * @addr:  pointer to address value
  * @write: if true set the address in the VM address space, if false read the
  *          address
@@ -1706,29 +1706,49 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write)
 {
 	int r = 0;
 	struct vgic_dist *vgic = &kvm->arch.vgic;
+	int type_needed;
+	phys_addr_t *addr_ptr, block_size;
 
 	mutex_lock(&kvm->lock);
 	switch (type) {
 	case KVM_VGIC_V2_ADDR_TYPE_DIST:
-		if (write) {
-			r = vgic_ioaddr_assign(kvm, &vgic->vgic_dist_base,
-					       *addr, KVM_VGIC_V2_DIST_SIZE);
-		} else {
-			*addr = vgic->vgic_dist_base;
-		}
+		type_needed = KVM_DEV_TYPE_ARM_VGIC_V2;
+		addr_ptr = &vgic->vgic_dist_base;
+		block_size = KVM_VGIC_V2_DIST_SIZE;
 		break;
 	case KVM_VGIC_V2_ADDR_TYPE_CPU:
-		if (write) {
-			r = vgic_ioaddr_assign(kvm, &vgic->vgic_cpu_base,
-					       *addr, KVM_VGIC_V2_CPU_SIZE);
-		} else {
-			*addr = vgic->vgic_cpu_base;
-		}
+		type_needed = KVM_DEV_TYPE_ARM_VGIC_V2;
+		addr_ptr = &vgic->vgic_cpu_base;
+		block_size = KVM_VGIC_V2_CPU_SIZE;
 		break;
+#ifdef CONFIG_ARM_GIC_V3
+	case KVM_VGIC_V3_ADDR_TYPE_DIST:
+		type_needed = KVM_DEV_TYPE_ARM_VGIC_V3;
+		addr_ptr = &vgic->vgic_dist_base;
+		block_size = KVM_VGIC_V3_DIST_SIZE;
+		break;
+	case KVM_VGIC_V3_ADDR_TYPE_REDIST:
+		type_needed = KVM_DEV_TYPE_ARM_VGIC_V3;
+		addr_ptr = &vgic->vgic_redist_base;
+		block_size = KVM_VGIC_V3_REDIST_SIZE;
+		break;
+#endif
 	default:
 		r = -ENODEV;
+		goto out;
+	}
+
+	if (vgic->vgic_model != type_needed) {
+		r = -ENODEV;
+		goto out;
 	}
 
+	if (write)
+		r = vgic_ioaddr_assign(kvm, addr_ptr, *addr, block_size);
+	else
+		*addr = *addr_ptr;
+
+out:
 	mutex_unlock(&kvm->lock);
 	return r;
 }
-- 
1.7.9.5
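The reworked kvm_vgic_addr() switch above maps each address type to two things: the GIC model it requires and the size of its MMIO block. Below is a minimal sketch of that mapping written as a lookup table. It is not the kernel code. The ADDR_TYPE numbers mirror the uapi headers. MODEL_V2/MODEL_V3, struct addr_desc and lookup_addr_type() are hypothetical names. The sketch also leaves out the CONFIG_ARM_GIC_V3 guard and the alignment and overlap checks that vgic_ioaddr_assign() performs.

```c
#include <errno.h>
#include <stdint.h>

/* ADDR_TYPE values as in the uapi headers; MODEL_* are hypothetical. */
enum addr_type { V2_DIST = 0, V2_CPU = 1, V3_DIST = 2, V3_REDIST = 3 };
enum vgic_model { MODEL_V2 = 1, MODEL_V3 = 2 };

#define SZ_64K 0x10000ULL

struct addr_desc {
	enum vgic_model model_needed;	/* model the VM must have been created with */
	uint64_t block_size;		/* size of the MMIO block for this type */
};

/*
 * Returns 0 and fills *block_size on success, or -ENODEV for an unknown
 * type or when the VM's model does not match the requested type.
 */
static int lookup_addr_type(unsigned long type, enum vgic_model vm_model,
			    uint64_t *block_size)
{
	static const struct addr_desc descs[] = {
		[V2_DIST]   = { MODEL_V2, 0x1000 },
		[V2_CPU]    = { MODEL_V2, 0x2000 },
		[V3_DIST]   = { MODEL_V3, SZ_64K },
		[V3_REDIST] = { MODEL_V3, 2 * SZ_64K },
	};

	if (type >= sizeof(descs) / sizeof(descs[0]))
		return -ENODEV;
	if (descs[type].model_needed != vm_model)
		return -ENODEV;
	*block_size = descs[type].block_size;
	return 0;
}
```

With this table, a GICv3 VM that asks for the redistributor type gets a 128K (2 * SZ_64K) block. A GICv2 VM that asks for any GICv3 type gets -ENODEV, which matches the model check in the patch.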

* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-11-14 10:07 ` [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation Andre Przywara
@ 2014-11-14 11:07   ` Christoffer Dall
  2014-11-17 13:58     ` Andre Przywara
  2014-11-18 15:57   ` Eric Auger
  2014-11-23 14:38   ` Christoffer Dall
  2 siblings, 1 reply; 80+ messages in thread
From: Christoffer Dall @ 2014-11-14 11:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Nov 14, 2014 at 10:07:59AM +0000, Andre Przywara wrote:
> With everything separated and prepared, we implement a model of a
> GICv3 distributor and redistributors by using the existing framework
> to provide handler functions for each register group.
> 
> Currently we limit the emulation to a model enforcing a single
> security state, with SRE==1 (forcing system register access) and
> ARE==1 (allowing more than 8 VCPUs).
> 
> We share some of the functions provided for GICv2 emulation, but take
> the different ways of addressing (v)CPUs into account.
> Save and restore is currently not implemented.
> 
> Similar to the split-off of the GICv2 specific code, the new emulation
> code goes into a new file (vgic-v3-emul.c).
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>

[...]
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 335ffe0..b7de0f8 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1249,7 +1249,7 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>  	struct kvm_vcpu *vcpu;
>  	int edge_triggered, level_triggered;
>  	int enabled;
> -	bool ret = true;
> +	bool ret = true, can_inject = true;
>  
>  	spin_lock(&dist->lock);
>  
> @@ -1264,6 +1264,11 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>  
>  	if (irq_num >= VGIC_NR_PRIVATE_IRQS) {
>  		cpuid = dist->irq_spi_cpu[irq_num - VGIC_NR_PRIVATE_IRQS];
> +		if (cpuid == VCPU_NOT_ALLOCATED) {
> +			/* Pretend we use CPU0, and prevent injection */
> +			cpuid = 0;
> +			can_inject = false;
> +		}
>  		vcpu = kvm_get_vcpu(kvm, cpuid);
>  	}
>  
> @@ -1285,7 +1290,7 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>  
>  	enabled = vgic_irq_is_enabled(vcpu, irq_num);
>  
> -	if (!enabled) {
> +	if (!enabled || !can_inject) {
>  		ret = false;
>  		goto out;
>  	}

As I wrote, I think this is wrong.  What happened here?

-Christoffer

* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-11-14 11:07   ` Christoffer Dall
@ 2014-11-17 13:58     ` Andre Przywara
  2014-11-17 23:46       ` Christoffer Dall
  0 siblings, 1 reply; 80+ messages in thread
From: Andre Przywara @ 2014-11-17 13:58 UTC (permalink / raw)
  To: linux-arm-kernel

Hej Christoffer,

maybe I just don't see the wood for the trees, so I will explain my view
on the things below. Please correct me / explain your view!

On 14/11/14 11:07, Christoffer Dall wrote:
> On Fri, Nov 14, 2014 at 10:07:59AM +0000, Andre Przywara wrote:
>> With everything separated and prepared, we implement a model of a
>> GICv3 distributor and redistributors by using the existing framework
>> to provide handler functions for each register group.
>>
>> Currently we limit the emulation to a model enforcing a single
>> security state, with SRE==1 (forcing system register access) and
>> ARE==1 (allowing more than 8 VCPUs).
>>
>> We share some of the functions provided for GICv2 emulation, but take
>> the different ways of addressing (v)CPUs into account.
>> Save and restore is currently not implemented.
>>
>> Similar to the split-off of the GICv2 specific code, the new emulation
>> code goes into a new file (vgic-v3-emul.c).
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> 
> [...]
>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>> index 335ffe0..b7de0f8 100644
>> --- a/virt/kvm/arm/vgic.c
>> +++ b/virt/kvm/arm/vgic.c
>> @@ -1249,7 +1249,7 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>>  	struct kvm_vcpu *vcpu;
>>  	int edge_triggered, level_triggered;
>>  	int enabled;
>> -	bool ret = true;
>> +	bool ret = true, can_inject = true;
>>  
>>  	spin_lock(&dist->lock);
>>  
>> @@ -1264,6 +1264,11 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>>  
>>  	if (irq_num >= VGIC_NR_PRIVATE_IRQS) {
>>  		cpuid = dist->irq_spi_cpu[irq_num - VGIC_NR_PRIVATE_IRQS];
>> +		if (cpuid == VCPU_NOT_ALLOCATED) {
>> +			/* Pretend we use CPU0, and prevent injection */
>> +			cpuid = 0;
>> +			can_inject = false;
>> +		}
>>  		vcpu = kvm_get_vcpu(kvm, cpuid);
>>  	}
>>  
>> @@ -1285,7 +1290,7 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>>  
>>  	enabled = vgic_irq_is_enabled(vcpu, irq_num);
>>  
>> -	if (!enabled) {
>> +	if (!enabled || !can_inject) {
>>  		ret = false;
>>  		goto out;
>>  	}
> 
> As I wrote, I think this is wrong.  What happened here?

can_inject is just a way of stopping "non-targeted" SPIs from being
_injected_. The spec says in "5.3.4. GICD_IROUTERn":

"If this register is programmed to forward the corresponding interrupt
to a specific processor (i.e. IRM is zero) and the affinity path does
not correspond to an implemented processor, then if the corresponding
interrupt becomes pending it will not be forwarded to any processor and
will remain pending."

So we can happily make these SPIs pending, but an illegal irq_spi_cpu[]
entry will just prevent it from being injected, right?

What am I missing?

Do you want a comment here explaining this?

Cheers,
Andre.
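The GICD_IROUTERn rule quoted above says an interrupt routed to a non-existent processor becomes pending but is forwarded nowhere. That rule fits in a few lines. This is a hedged sketch, not vgic_update_irq_pending() itself. struct spi_state, spi_set_pending() and the VCPU_NOT_ALLOCATED value are assumptions for illustration, and edge/level handling is omitted. The point is the ordering: the pending state is latched first, and only the decision to kick a VCPU depends on enablement and a valid target.

```c
#include <stdbool.h>

#define VCPU_NOT_ALLOCATED 0xff	/* hypothetical "no target" sentinel */

struct spi_state {
	bool pending;
	bool enabled;
	int target;	/* routed VCPU index, or VCPU_NOT_ALLOCATED */
};

/* Mark an SPI pending; return true only if a VCPU should be kicked. */
static bool spi_set_pending(struct spi_state *spi)
{
	spi->pending = true;		/* latched regardless of routing */

	if (!spi->enabled)
		return false;
	if (spi->target == VCPU_NOT_ALLOCATED)
		return false;		/* stays pending, forwarded to nobody */
	return true;
}
```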

* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-11-17 13:58     ` Andre Przywara
@ 2014-11-17 23:46       ` Christoffer Dall
  0 siblings, 0 replies; 80+ messages in thread
From: Christoffer Dall @ 2014-11-17 23:46 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Nov 17, 2014 at 01:58:05PM +0000, Andre Przywara wrote:
> Hej Christoffer,
> 
> maybe I just don't see the wood for the trees, so I will explain my view
> on the things below. Please correct me / explain your view!

I think this argument should have taken place as a reply to my e-mail on
the last version of this patch, but ok.

> 
> On 14/11/14 11:07, Christoffer Dall wrote:
> > On Fri, Nov 14, 2014 at 10:07:59AM +0000, Andre Przywara wrote:
> >> With everything separated and prepared, we implement a model of a
> >> GICv3 distributor and redistributors by using the existing framework
> >> to provide handler functions for each register group.
> >>
> >> Currently we limit the emulation to a model enforcing a single
> >> security state, with SRE==1 (forcing system register access) and
> >> ARE==1 (allowing more than 8 VCPUs).
> >>
> >> We share some of the functions provided for GICv2 emulation, but take
> >> the different ways of addressing (v)CPUs into account.
> >> Save and restore is currently not implemented.
> >>
> >> Similar to the split-off of the GICv2 specific code, the new emulation
> >> code goes into a new file (vgic-v3-emul.c).
> >>
> >> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> > 
> > [...]
> >> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> >> index 335ffe0..b7de0f8 100644
> >> --- a/virt/kvm/arm/vgic.c
> >> +++ b/virt/kvm/arm/vgic.c
> >> @@ -1249,7 +1249,7 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
> >>  	struct kvm_vcpu *vcpu;
> >>  	int edge_triggered, level_triggered;
> >>  	int enabled;
> >> -	bool ret = true;
> >> +	bool ret = true, can_inject = true;
> >>  
> >>  	spin_lock(&dist->lock);
> >>  
> >> @@ -1264,6 +1264,11 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
> >>  
> >>  	if (irq_num >= VGIC_NR_PRIVATE_IRQS) {
> >>  		cpuid = dist->irq_spi_cpu[irq_num - VGIC_NR_PRIVATE_IRQS];
> >> +		if (cpuid == VCPU_NOT_ALLOCATED) {
> >> +			/* Pretend we use CPU0, and prevent injection */
> >> +			cpuid = 0;
> >> +			can_inject = false;
> >> +		}
> >>  		vcpu = kvm_get_vcpu(kvm, cpuid);
> >>  	}
> >>  
> >> @@ -1285,7 +1290,7 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
> >>  
> >>  	enabled = vgic_irq_is_enabled(vcpu, irq_num);
> >>  
> >> -	if (!enabled) {
> >> +	if (!enabled || !can_inject) {
> >>  		ret = false;
> >>  		goto out;
> >>  	}
> > 
> > As I wrote, I think this is wrong.  What happened here?
> 
> can_inject is just a way of stopping "non-targeted" SPIs from being
> _injected_. The spec says in "5.3.4. GICD_IROUTERn":
> 
> "If this register is programmed to forward the corresponding interrupt
> to a specific processor (i.e. IRM is zero) and the affinity path does
> not correspond to an implemented processor, then if the corresponding
> interrupt becomes pending it will not be forwarded to any processor and
> will remain pending."
> 
> So we can happily make these SPIs pending, but an illegal irq_spi_cpu[]
> entry will just prevent it from being injected, right?
> 
> What am I missing?
> 
> Do you want a comment here explaining this?
> 
No, I missed something.  What I missed was that we consider
irq_spi_target in compute_pending_for_cpu(), so that this actually
ends up working.

We really shouldn't be taking this "precompute cpu pending
bitmaps instead of calling compute_pending_for_cpu()" shortcut here; at
least I'm beginning to have trouble following the flow.  That's for
another time and place though.

I'll review the rest of this version of the series when I get some cycles.

-Christoffer
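Here is why the unallocated target still works out. Below is a simplified sketch of a per-VCPU pending computation in the spirit of compute_pending_for_cpu(): an SPI is delivered to a VCPU only if it is pending, enabled and routed to that VCPU. The array-based target table, NR_SPIS and the VCPU_NOT_ALLOCATED value are hypothetical. The kernel tracks targets through irq_spi_target instead. An SPI whose target is the sentinel never matches a real VCPU, so it stays pending without being delivered.

```c
#include <stdint.h>

#define NR_SPIS 32			/* hypothetical, one 32-bit word of SPIs */
#define VCPU_NOT_ALLOCATED 0xff		/* hypothetical "no target" sentinel */

/*
 * Bit i of the result is set if SPI i is pending, enabled and routed to
 * vcpu_id. Callers only pass real VCPU indices, never the sentinel.
 */
static uint32_t spi_pending_for_cpu(uint32_t pending, uint32_t enabled,
				    const uint8_t target[NR_SPIS], int vcpu_id)
{
	uint32_t targeted = 0;

	for (int i = 0; i < NR_SPIS; i++)
		if (target[i] == vcpu_id)
			targeted |= 1u << i;

	return pending & enabled & targeted;
}
```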

* [PATCH v4 01/19] arm/arm64: KVM: rework MPIDR assignment and add accessors
  2014-11-14 10:07 ` [PATCH v4 01/19] arm/arm64: KVM: rework MPIDR assignment and add accessors Andre Przywara
@ 2014-11-18 10:35   ` Eric Auger
  2014-11-23  9:34   ` Christoffer Dall
  1 sibling, 0 replies; 80+ messages in thread
From: Eric Auger @ 2014-11-18 10:35 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/14/2014 11:07 AM, Andre Przywara wrote:
> The virtual MPIDR registers (containing topology information) for the
> guest are currently mapped linearly to the vcpu_id. Improve this
> mapping for arm64 by using three levels to not artificially limit the
> number of vCPUs.
> To help this, change and rename the kvm_vcpu_get_mpidr() function to
> mask off the non-affinity bits in the MPIDR register.
> Also add an accessor to later allow easier access to a vCPU with a
> given MPIDR. Use this new accessor in the PSCI emulation.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
> Changelog v3...v4:
> - rename kvm_vcpu_get_mpidr() to kvm_vcpu_get_mpidr_aff()
> - simplify kvm_mpidr_to_vcpu()
> - fixup comment
> 
>  arch/arm/include/asm/kvm_emulate.h   |    5 +++--
>  arch/arm/include/asm/kvm_host.h      |    2 ++
>  arch/arm/kvm/arm.c                   |   13 +++++++++++++
>  arch/arm/kvm/psci.c                  |   17 +++++------------
>  arch/arm64/include/asm/kvm_emulate.h |    5 +++--
>  arch/arm64/include/asm/kvm_host.h    |    2 ++
>  arch/arm64/kvm/sys_regs.c            |   11 +++++++++--
>  7 files changed, 37 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> index b9db269..3ae88ac 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -23,6 +23,7 @@
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_mmio.h>
>  #include <asm/kvm_arm.h>
> +#include <asm/cputype.h>
>  
>  unsigned long *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num);
>  unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu);
> @@ -162,9 +163,9 @@ static inline u32 kvm_vcpu_hvc_get_imm(struct kvm_vcpu *vcpu)
>  	return kvm_vcpu_get_hsr(vcpu) & HSR_HVC_IMM_MASK;
>  }
>  
> -static inline unsigned long kvm_vcpu_get_mpidr(struct kvm_vcpu *vcpu)
> +static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
>  {
> -	return vcpu->arch.cp15[c0_MPIDR];
> +	return vcpu->arch.cp15[c0_MPIDR] & MPIDR_HWID_BITMASK;
Hi Andre,

Can't we use a single name here, either aff or hwid?
>  }
>  
>  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 53036e2..b443dfe 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -236,6 +236,8 @@ static inline void vgic_arch_setup(const struct vgic_params *vgic)
>  int kvm_perf_init(void);
>  int kvm_perf_teardown(void);
>  
> +struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> +
>  static inline void kvm_arch_hardware_disable(void) {}
>  static inline void kvm_arch_hardware_unsetup(void) {}
>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 9e193c8..c2a5c69 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -977,6 +977,19 @@ static void check_kvm_target_cpu(void *ret)
>  	*(int *)ret = kvm_target_cpu();
>  }
>  
> +struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
> +{
> +	struct kvm_vcpu *vcpu;
> +	int i;
> +
> +	mpidr &= MPIDR_HWID_BITMASK;
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		if (mpidr == kvm_vcpu_get_mpidr_aff(vcpu))
> +			return vcpu;
> +	}
> +	return NULL;
> +}
> +
>  /**
>   * Initialize Hyp-mode and memory mappings on all CPUs.
>   */
> diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> index 09cf377..84121b2 100644
> --- a/arch/arm/kvm/psci.c
> +++ b/arch/arm/kvm/psci.c
> @@ -21,6 +21,7 @@
>  #include <asm/cputype.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_psci.h>
> +#include <asm/kvm_host.h>
>  
>  /*
>   * This is an implementation of the Power State Coordination Interface
> @@ -65,25 +66,17 @@ static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
>  static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
>  {
>  	struct kvm *kvm = source_vcpu->kvm;
> -	struct kvm_vcpu *vcpu = NULL, *tmp;
> +	struct kvm_vcpu *vcpu = NULL;
>  	wait_queue_head_t *wq;
>  	unsigned long cpu_id;
>  	unsigned long context_id;
> -	unsigned long mpidr;
>  	phys_addr_t target_pc;
> -	int i;
>  
> -	cpu_id = *vcpu_reg(source_vcpu, 1);
> +	cpu_id = *vcpu_reg(source_vcpu, 1) & MPIDR_HWID_BITMASK;
>  	if (vcpu_mode_is_32bit(source_vcpu))
>  		cpu_id &= ~((u32) 0);
>  
> -	kvm_for_each_vcpu(i, tmp, kvm) {
> -		mpidr = kvm_vcpu_get_mpidr(tmp);
> -		if ((mpidr & MPIDR_HWID_BITMASK) == (cpu_id & MPIDR_HWID_BITMASK)) {
> -			vcpu = tmp;
> -			break;
> -		}
> -	}
> +	vcpu = kvm_mpidr_to_vcpu(kvm, cpu_id);
>  
>  	/*
>  	 * Make sure the caller requested a valid CPU and that the CPU is
> @@ -154,7 +147,7 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
>  	 * then ON else OFF
>  	 */
>  	kvm_for_each_vcpu(i, tmp, kvm) {
> -		mpidr = kvm_vcpu_get_mpidr(tmp);
> +		mpidr = kvm_vcpu_get_mpidr_aff(tmp);
>  		if (((mpidr & target_affinity_mask) == target_affinity) &&
>  		    !tmp->arch.pause) {
>  			return PSCI_0_2_AFFINITY_LEVEL_ON;
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 5674a55..d4daaa5 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -27,6 +27,7 @@
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_mmio.h>
>  #include <asm/ptrace.h>
> +#include <asm/cputype.h>
>  
>  unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num);
>  unsigned long *vcpu_spsr32(const struct kvm_vcpu *vcpu);
> @@ -182,9 +183,9 @@ static inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vcpu)
>  	return kvm_vcpu_get_hsr(vcpu) & ESR_EL2_FSC_TYPE;
>  }
>  
> -static inline unsigned long kvm_vcpu_get_mpidr(struct kvm_vcpu *vcpu)
> +static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
>  {
> -	return vcpu_sys_reg(vcpu, MPIDR_EL1);
> +	return vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
>  }
>  
>  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 2012c4b..286bb61 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -207,6 +207,8 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>  int kvm_perf_init(void);
>  int kvm_perf_teardown(void);
>  
> +struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> +
>  static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
>  				       phys_addr_t pgd_ptr,
>  				       unsigned long hyp_stack_ptr,
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 4cc3b71..fd3ffc3 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -252,10 +252,17 @@ static void reset_amair_el1(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
>  
>  static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
>  {
> +	u64 mpidr;
> +
>  	/*
> -	 * Simply map the vcpu_id into the Aff0 field of the MPIDR.
> +	 * Map the vcpu_id into the first three Aff fields of the MPIDR.
> +	 * We limit the number of VCPUs in Aff0 due to a limitation in the
> +	 * ICC_SGIxR registers of the GICv3.
>  	 */
I had some difficulty understanding that comment. Is that limitation a
bug, or a limitation bound to be fixed later? I found in the doc that each
affinity level may have between 1 and 256 children, but an SGI target of
the form a.b.c.{target list} is limited to 16 processors only. To me it
would be clearer stated that way.

Best Regards

Eric

> -	vcpu_sys_reg(vcpu, MPIDR_EL1) = (1UL << 31) | (vcpu->vcpu_id & 0xff);
> +	mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
> +	mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
> +	mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
> +	vcpu_sys_reg(vcpu, MPIDR_EL1) = (1ULL << 31) | mpidr;
>  }
>  
>  /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
> 
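Below is a small user-space sketch of the MPIDR layout that reset_mpidr() produces. Aff0 is limited to 16 values because the ICC_SGI1R_EL1 target list can only address 16 processors within one Aff3.Aff2.Aff1 group. AFF_SHIFT(), MPIDR_RES1, vcpu_id_to_mpidr() and its inverse mpidr_to_vcpu_id() are illustrative names, not the kernel's MPIDR_LEVEL_SHIFT() and friends. Aff3 is left at zero, as in the patch.

```c
#include <stdint.h>

/* Aff0..Aff2 live at bits 0, 8 and 16 of MPIDR_EL1; Aff3 is unused here. */
#define AFF_SHIFT(level)	((level) * 8)
#define MPIDR_RES1		(1ULL << 31)	/* bit 31 of MPIDR_EL1 is RES1 */

static uint64_t vcpu_id_to_mpidr(uint32_t vcpu_id)
{
	uint64_t mpidr;

	mpidr  = (uint64_t)(vcpu_id & 0x0f) << AFF_SHIFT(0);	/* 16 per Aff0 */
	mpidr |= (uint64_t)((vcpu_id >> 4) & 0xff) << AFF_SHIFT(1);
	mpidr |= (uint64_t)((vcpu_id >> 12) & 0xff) << AFF_SHIFT(2);
	return MPIDR_RES1 | mpidr;
}

/* Inverse mapping, valid for vcpu_id values below 2^20. */
static uint32_t mpidr_to_vcpu_id(uint64_t mpidr)
{
	uint32_t aff0 = (mpidr >> AFF_SHIFT(0)) & 0xff;
	uint32_t aff1 = (mpidr >> AFF_SHIFT(1)) & 0xff;
	uint32_t aff2 = (mpidr >> AFF_SHIFT(2)) & 0xff;

	return aff0 | (aff1 << 4) | (aff2 << 12);
}
```

The inverse shows that every vcpu_id below 2^20 maps to a distinct MPIDR. kvm_mpidr_to_vcpu() relies on that uniqueness when it searches for a matching affinity value.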

* [PATCH v4 03/19] arm/arm64: KVM: refactor vgic_handle_mmio() function
  2014-11-14 10:07 ` [PATCH v4 03/19] arm/arm64: KVM: refactor vgic_handle_mmio() function Andre Przywara
@ 2014-11-18 10:35   ` Eric Auger
  0 siblings, 0 replies; 80+ messages in thread
From: Eric Auger @ 2014-11-18 10:35 UTC (permalink / raw)
  To: linux-arm-kernel

Andre,

On 11/14/2014 11:07 AM, Andre Przywara wrote:
> Currently we only need to deal with one MMIO region for the GIC
> emulation
Might be worth mentioning that it is the distributor one?
> , but we soon need to extend this. Refactor the existing
> code to allow easier addition of different ranges without code
> duplication.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
> Changelog v3...v4:
> - simplify is_in_range()
> - added Reviewed-by:
> 
>  virt/kvm/arm/vgic.c |   75 ++++++++++++++++++++++++++++++++++++---------------
>  1 file changed, 54 insertions(+), 21 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 2403d72..5eee3de 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1032,37 +1032,28 @@ static bool vgic_validate_access(const struct vgic_dist *dist,
>  	return true;
>  }
>  
> -/**
restore /**?
> - * vgic_handle_mmio - handle an in-kernel MMIO access
> +/*
> + * vgic_handle_mmio_range - handle an in-kernel MMIO access
>   * @vcpu:	pointer to the vcpu performing the access
>   * @run:	pointer to the kvm_run structure
>   * @mmio:	pointer to the data describing the access
> + * @ranges:	pointer to the register defining structure
array of MMIO ranges in a given region?
> + * @mmio_base:	base address for this mapping
base address of the region?
>   *
> - * returns true if the MMIO access has been performed in kernel space,
> - * and false if it needs to be emulated in user space.
> + * returns true if the MMIO access could be performed
>   */
> -bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
> -		      struct kvm_exit_mmio *mmio)
> +static bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run *run,
> +			    struct kvm_exit_mmio *mmio,
> +			    const struct mmio_range *ranges,
> +			    unsigned long mmio_base)
>  {
>  	const struct mmio_range *range;
>  	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> -	unsigned long base = dist->vgic_dist_base;
>  	bool updated_state;
>  	unsigned long offset;
>  
> -	if (!irqchip_in_kernel(vcpu->kvm) ||
> -	    mmio->phys_addr < base ||
> -	    (mmio->phys_addr + mmio->len) > (base + KVM_VGIC_V2_DIST_SIZE))
> -		return false;
> -
> -	/* We don't support ldrd / strd or ldm / stm to the emulated vgic */
> -	if (mmio->len > 4) {
> -		kvm_inject_dabt(vcpu, mmio->phys_addr);
> -		return true;
> -	}
> -
> -	offset = mmio->phys_addr - base;
> -	range = find_matching_range(vgic_dist_ranges, mmio, offset);
> +	offset = mmio->phys_addr - mmio_base;
> +	range = find_matching_range(ranges, mmio, offset);
>  	if (unlikely(!range || !range->handle_mmio)) {
>  		pr_warn("Unhandled access %d %08llx %d\n",
>  			mmio->is_write, mmio->phys_addr, mmio->len);
> @@ -1070,7 +1061,7 @@ bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
>  	}
>  
>  	spin_lock(&vcpu->kvm->arch.vgic.lock);
> -	offset = mmio->phys_addr - range->base - base;
> +	offset -= range->base;
I think I would prefer having two variables since the semantics change:
offset w.r.t. the region base vs. offset w.r.t. the range base, but that's a detail.
>  	if (vgic_validate_access(dist, range, offset)) {
>  		updated_state = range->handle_mmio(vcpu, mmio, offset);
>  	} else {
> @@ -1088,6 +1079,48 @@ bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
>  	return true;
>  }
>  
> +static inline bool is_in_range(phys_addr_t addr, unsigned long len,
> +			       phys_addr_t baseaddr, unsigned long size)
> +{
> +	return (addr >= baseaddr) && (addr + len <= baseaddr + size);
> +}
> +
> +static bool vgic_v2_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
> +				struct kvm_exit_mmio *mmio)
> +{
> +	unsigned long base = vcpu->kvm->arch.vgic.vgic_dist_base;
dist_base?

Best Regards

Eric
> +
> +	if (!is_in_range(mmio->phys_addr, mmio->len, base,
> +			 KVM_VGIC_V2_DIST_SIZE))
> +		return false;
> +
> +	/* GICv2 does not support accesses wider than 32 bits */
> +	if (mmio->len > 4) {
> +		kvm_inject_dabt(vcpu, mmio->phys_addr);
> +		return true;
> +	}
> +
> +	return vgic_handle_mmio_range(vcpu, run, mmio, vgic_dist_ranges, base);
> +}
> +
> +/**
> + * vgic_handle_mmio - handle an in-kernel MMIO access for the GIC emulation
> + * @vcpu:      pointer to the vcpu performing the access
> + * @run:       pointer to the kvm_run structure
> + * @mmio:      pointer to the data describing the access
> + *
> + * returns true if the MMIO access has been performed in kernel space,
> + * and false if it needs to be emulated in user space.
> + */
> +bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
> +		      struct kvm_exit_mmio *mmio)
> +{
> +	if (!irqchip_in_kernel(vcpu->kvm))
> +		return false;
> +
> +	return vgic_v2_handle_mmio(vcpu, run, mmio);
> +}
> +
>  static u8 *vgic_get_sgi_sources(struct vgic_dist *dist, int vcpu_id, int sgi)
>  {
>  	return dist->irq_sgi_sources + vcpu_id * VGIC_NR_SGIS + sgi;
> 
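The refactoring splits the check into two steps. First, is the access inside a region? Second, which register range inside that region handles it? The sketch below keeps the two offsets in separate variables, as suggested in the review. Here struct mmio_range only has base and len, and find_range() is a hypothetical simplification of find_matching_range(). Like vgic_handle_mmio_range(), it assumes the caller has already checked that the access lies within the region.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified register range descriptor. */
struct mmio_range {
	uint64_t base;	/* offset of the range from the region base */
	uint64_t len;
};

static bool is_in_range(uint64_t addr, uint64_t len,
			uint64_t base, uint64_t size)
{
	return addr >= base && addr + len <= base + size;
}

/*
 * Find the range covering [addr, addr + len). On success, store the
 * offset relative to that range in *range_offset; return NULL otherwise.
 */
static const struct mmio_range *find_range(const struct mmio_range *ranges,
					   size_t n, uint64_t region_base,
					   uint64_t addr, uint64_t len,
					   uint64_t *range_offset)
{
	uint64_t region_offset = addr - region_base;

	for (size_t i = 0; i < n; i++) {
		if (is_in_range(region_offset, len, ranges[i].base,
				ranges[i].len)) {
			*range_offset = region_offset - ranges[i].base;
			return &ranges[i];
		}
	}
	return NULL;
}
```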

* [PATCH v4 02/19] arm/arm64: KVM: pass down user space provided GIC type into vGIC code
  2014-11-14 10:07 ` [PATCH v4 02/19] arm/arm64: KVM: pass down user space provided GIC type into vGIC code Andre Przywara
@ 2014-11-18 10:36   ` Eric Auger
  0 siblings, 0 replies; 80+ messages in thread
From: Eric Auger @ 2014-11-18 10:36 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/14/2014 11:07 AM, Andre Przywara wrote:
> With the introduction of a second emulated GIC model we need to let
> userspace specify the GIC model to use for each VM. Pass the
> userspace provided value down into the vGIC code and store it there
> to differentiate later.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
> Changelog v3...v4:
> - added Acked-by
> 
>  arch/arm/kvm/arm.c     |    2 +-
>  include/kvm/arm_vgic.h |    7 +++++--
>  virt/kvm/arm/vgic.c    |    5 +++--
>  3 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index c2a5c69..8817fbd 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -753,7 +753,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
>  	switch (ioctl) {
>  	case KVM_CREATE_IRQCHIP: {
>  		if (vgic_present)
> -			return kvm_vgic_create(kvm);
> +			return kvm_vgic_create(kvm, KVM_DEV_TYPE_ARM_VGIC_V2);
>  		else
>  			return -ENXIO;
>  	}
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 206dcc3..dde5a00 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -140,6 +140,9 @@ struct vgic_dist {
>  	bool			in_kernel;
>  	bool			ready;
>  
> +	/* vGIC model the kernel emulates for the guest (GICv2 or GICv3) */
Just a small question related to GICv2m: will it be considered as
another model, or is it the same as GICv2 from a guest perspective?

Eric
> +	u32			vgic_model;
> +
>  	int			nr_cpus;
>  	int			nr_irqs;
>  
> @@ -275,7 +278,7 @@ struct kvm_exit_mmio;
>  int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write);
>  int kvm_vgic_hyp_init(void);
>  int kvm_vgic_init(struct kvm *kvm);
> -int kvm_vgic_create(struct kvm *kvm);
> +int kvm_vgic_create(struct kvm *kvm, u32 type);
>  void kvm_vgic_destroy(struct kvm *kvm);
>  void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu);
>  void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu);
> @@ -326,7 +329,7 @@ static inline int kvm_vgic_init(struct kvm *kvm)
>  	return 0;
>  }
>  
> -static inline int kvm_vgic_create(struct kvm *kvm)
> +static inline int kvm_vgic_create(struct kvm *kvm, u32 type)
>  {
>  	return 0;
>  }
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 3aaca49..2403d72 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1931,7 +1931,7 @@ out:
>  	return ret;
>  }
>  
> -int kvm_vgic_create(struct kvm *kvm)
> +int kvm_vgic_create(struct kvm *kvm, u32 type)
>  {
>  	int i, vcpu_lock_idx = -1, ret = 0;
>  	struct kvm_vcpu *vcpu;
> @@ -1963,6 +1963,7 @@ int kvm_vgic_create(struct kvm *kvm)
>  
>  	spin_lock_init(&kvm->arch.vgic.lock);
>  	kvm->arch.vgic.in_kernel = true;
> +	kvm->arch.vgic.vgic_model = type;
>  	kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
>  	kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
>  	kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
> @@ -2388,7 +2389,7 @@ static void vgic_destroy(struct kvm_device *dev)
>  
>  static int vgic_create(struct kvm_device *dev, u32 type)
>  {
> -	return kvm_vgic_create(dev->kvm);
> +	return kvm_vgic_create(dev->kvm, type);
>  }
>  
>  static struct kvm_device_ops kvm_arm_vgic_v2_ops = {
> 
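Storing the model in struct vgic_dist is what later lets the common code select model-specific behaviour. Below is a hedged sketch of that pattern. struct vgic_dist_sketch, the MODEL_* values and the handle_v2()/handle_v3() stubs are placeholders. Rejecting unknown types is an assumption of the sketch, not something this patch does.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model identifiers. */
enum vgic_model { MODEL_V2 = 1, MODEL_V3 = 2 };

struct vgic_dist_sketch {
	bool in_kernel;
	uint32_t vgic_model;	/* model chosen by userspace at creation */
};

/* Placeholder per-model handlers; return the model number they serve. */
static int handle_v2(void) { return 2; }
static int handle_v3(void) { return 3; }

static int vgic_create_sketch(struct vgic_dist_sketch *d, uint32_t type)
{
	if (d->in_kernel)
		return -EEXIST;		/* only one vGIC per VM */
	if (type != MODEL_V2 && type != MODEL_V3)
		return -ENODEV;		/* sketch-only validation */
	d->in_kernel = true;
	d->vgic_model = type;
	return 0;
}

/* Later code differentiates on the stored model. */
static int vgic_dispatch(const struct vgic_dist_sketch *d)
{
	return d->vgic_model == MODEL_V3 ? handle_v3() : handle_v2();
}
```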

* [PATCH v4 04/19] arm/arm64: KVM: wrap 64 bit MMIO accesses with two 32 bit ones
  2014-11-14 10:07 ` [PATCH v4 04/19] arm/arm64: KVM: wrap 64 bit MMIO accesses with two 32 bit ones Andre Przywara
@ 2014-11-18 10:36   ` Eric Auger
  2014-11-23  9:42   ` Christoffer Dall
  1 sibling, 0 replies; 80+ messages in thread
From: Eric Auger @ 2014-11-18 10:36 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/14/2014 11:07 AM, Andre Przywara wrote:
> Some GICv3 registers can and will be accessed as 64 bit registers.
> Currently the register handling code can only deal with 32 bit
> accesses, so we do two consecutive calls to cover this.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
> Changelog v3...v4:
> - add comment explaining little endian handling
> 
>  virt/kvm/arm/vgic.c |   51 ++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 48 insertions(+), 3 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 5eee3de..dba51e4 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1033,6 +1033,51 @@ static bool vgic_validate_access(const struct vgic_dist *dist,
>  }
>  
>  /*
> + * Call the respective handler function for the given range.
> + * We split up any 64 bit accesses into two consecutive 32 bit
> + * handler calls and merge the result afterwards.
> + * We do this in a little endian fashion regardless of the host's
> + * or guest's endianness, because the GIC is always LE and the rest of
> + * the code (vgic_reg_access) also puts it in a LE fashion already.*
It might be worth explaining the semantics of offset and range here,
since this time they relate to a single range within the region.

Eric
> + */
> +static bool call_range_handler(struct kvm_vcpu *vcpu,
> +			       struct kvm_exit_mmio *mmio,
> +			       unsigned long offset,
> +			       const struct mmio_range *range)
> +{
> +	u32 *data32 = (void *)mmio->data;
> +	struct kvm_exit_mmio mmio32;
> +	bool ret;
> +
> +	if (likely(mmio->len <= 4))
> +		return range->handle_mmio(vcpu, mmio, offset);
> +
> +	/*
> +	 * Any access bigger than 4 bytes (that we currently handle in KVM)
> +	 * is actually 8 bytes long, caused by a 64-bit access
> +	 */
> +
> +	mmio32.len = 4;
> +	mmio32.is_write = mmio->is_write;
> +
> +	mmio32.phys_addr = mmio->phys_addr + 4;
> +	if (mmio->is_write)
> +		*(u32 *)mmio32.data = data32[1];
> +	ret = range->handle_mmio(vcpu, &mmio32, offset + 4);
> +	if (!mmio->is_write)
> +		data32[1] = *(u32 *)mmio32.data;
> +
> +	mmio32.phys_addr = mmio->phys_addr;
> +	if (mmio->is_write)
> +		*(u32 *)mmio32.data = data32[0];
> +	ret |= range->handle_mmio(vcpu, &mmio32, offset);
> +	if (!mmio->is_write)
> +		data32[0] = *(u32 *)mmio32.data;
> +
> +	return ret;
> +}
> +
> +/*
>   * vgic_handle_mmio_range - handle an in-kernel MMIO access
>   * @vcpu:	pointer to the vcpu performing the access
>   * @run:	pointer to the kvm_run structure
> @@ -1063,10 +1108,10 @@ static bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run *run,
>  	spin_lock(&vcpu->kvm->arch.vgic.lock);
>  	offset -= range->base;
>  	if (vgic_validate_access(dist, range, offset)) {
> -		updated_state = range->handle_mmio(vcpu, mmio, offset);
> +		updated_state = call_range_handler(vcpu, mmio, offset, range);
>  	} else {
> -		vgic_reg_access(mmio, NULL, offset,
> -				ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
> +		if (!mmio->is_write)
> +			memset(mmio->data, 0, mmio->len);
>  		updated_state = false;
>  	}
>  	spin_unlock(&vcpu->kvm->arch.vgic.lock);
> 

* [PATCH v4 06/19] arm/arm64: KVM: move kvm_register_device_ops() into vGIC probing
  2014-11-14 10:07 ` [PATCH v4 06/19] arm/arm64: KVM: move kvm_register_device_ops() into vGIC probing Andre Przywara
@ 2014-11-18 10:43   ` Eric Auger
  2014-11-18 10:58     ` Eric Auger
  0 siblings, 1 reply; 80+ messages in thread
From: Eric Auger @ 2014-11-18 10:43 UTC
  To: linux-arm-kernel

On 11/14/2014 11:07 AM, Andre Przywara wrote:
> Currently we unconditionally register the GICv2 emulation device
> during the host's KVM initialization. Since with GICv3 support we
> may end up with only v2 or only v3 or both supported, we move the
> registration into the GIC probing function, where we will later know
> which combination is valid.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
> Changelog v3...v4:
> - add Acked-by
> 
>  include/linux/kvm_host.h |    1 +
>  virt/kvm/arm/vgic-v2.c   |    2 ++
>  virt/kvm/arm/vgic-v3.c   |    1 +
>  virt/kvm/arm/vgic.c      |    5 ++---
>  4 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index ea53b04..326ba7a 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1084,6 +1084,7 @@ void kvm_unregister_device_ops(u32 type);
>  
>  extern struct kvm_device_ops kvm_mpic_ops;
>  extern struct kvm_device_ops kvm_xics_ops;
> +extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
>  
>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>  
> diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
> index 2935405..e1cd3cb 100644
> --- a/virt/kvm/arm/vgic-v2.c
> +++ b/virt/kvm/arm/vgic-v2.c
> @@ -229,6 +229,8 @@ int vgic_v2_probe(struct device_node *vgic_node,
>  		goto out_unmap;
>  	}
>  
> +	kvm_register_device_ops(&kvm_arm_vgic_v2_ops, KVM_DEV_TYPE_ARM_VGIC_V2);
> +
>  	vgic->vcpu_base = vcpu_res.start;
>  
>  	kvm_info("%s@%llx IRQ%d\n", vgic_node->name,
> diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
> index 1c2c8ee..d14c75f 100644
> --- a/virt/kvm/arm/vgic-v3.c
> +++ b/virt/kvm/arm/vgic-v3.c
> @@ -230,6 +230,7 @@ int vgic_v3_probe(struct device_node *vgic_node,
>  		ret = -ENXIO;
>  		goto out;
>  	}
> +	kvm_register_device_ops(&kvm_arm_vgic_v2_ops, KVM_DEV_TYPE_ARM_VGIC_V2);
I did not find any unregistration. Am I wrong, or is it not relevant? If
confirmed, it might be added in kvm_arch_exit. I saw that the KVM-VFIO
device ops unregistration is done in kvm_exit/kvm_vfio_ops_exit.

Eric
>  
>  	vgic->vcpu_base = vcpu_res.start;
>  	vgic->vctrl_base = NULL;
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 963b84e..e8003ca 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -2540,7 +2540,7 @@ static int vgic_create(struct kvm_device *dev, u32 type)
>  	return kvm_vgic_create(dev->kvm, type);
>  }
>  
> -static struct kvm_device_ops kvm_arm_vgic_v2_ops = {
> +struct kvm_device_ops kvm_arm_vgic_v2_ops = {
>  	.name = "kvm-arm-vgic",
>  	.create = vgic_create,
>  	.destroy = vgic_destroy,
> @@ -2619,8 +2619,7 @@ int kvm_vgic_hyp_init(void)
>  
>  	on_each_cpu(vgic_init_maintenance_interrupt, NULL, 1);
>  
> -	return kvm_register_device_ops(&kvm_arm_vgic_v2_ops,
> -				       KVM_DEV_TYPE_ARM_VGIC_V2);
> +	return 0;
>  
>  out_free_irq:
>  	free_percpu_irq(vgic->maint_irq, kvm_get_running_vcpus());
> 

* [PATCH v4 06/19] arm/arm64: KVM: move kvm_register_device_ops() into vGIC probing
  2014-11-18 10:43   ` Eric Auger
@ 2014-11-18 10:58     ` Eric Auger
  2014-11-18 11:03       ` Andre Przywara
  0 siblings, 1 reply; 80+ messages in thread
From: Eric Auger @ 2014-11-18 10:58 UTC
  To: linux-arm-kernel

On 11/18/2014 11:43 AM, Eric Auger wrote:
> On 11/14/2014 11:07 AM, Andre Przywara wrote:
>> Currently we unconditionally register the GICv2 emulation device
>> during the host's KVM initialization. Since with GICv3 support we
>> may end up with only v2 or only v3 or both supported, we move the
>> registration into the GIC probing function, where we will later know
>> which combination is valid.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
>> ---
>> Changelog v3...v4:
>> - add Acked-by
>>
>>  include/linux/kvm_host.h |    1 +
>>  virt/kvm/arm/vgic-v2.c   |    2 ++
>>  virt/kvm/arm/vgic-v3.c   |    1 +
>>  virt/kvm/arm/vgic.c      |    5 ++---
>>  4 files changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index ea53b04..326ba7a 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -1084,6 +1084,7 @@ void kvm_unregister_device_ops(u32 type);
>>  
>>  extern struct kvm_device_ops kvm_mpic_ops;
>>  extern struct kvm_device_ops kvm_xics_ops;
>> +extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
>>  
>>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>>  
>> diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
>> index 2935405..e1cd3cb 100644
>> --- a/virt/kvm/arm/vgic-v2.c
>> +++ b/virt/kvm/arm/vgic-v2.c
>> @@ -229,6 +229,8 @@ int vgic_v2_probe(struct device_node *vgic_node,
>>  		goto out_unmap;
>>  	}
>>  
>> +	kvm_register_device_ops(&kvm_arm_vgic_v2_ops, KVM_DEV_TYPE_ARM_VGIC_V2);
>> +
>>  	vgic->vcpu_base = vcpu_res.start;
>>  
>>  	kvm_info("%s@%llx IRQ%d\n", vgic_node->name,
>> diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
>> index 1c2c8ee..d14c75f 100644
>> --- a/virt/kvm/arm/vgic-v3.c
>> +++ b/virt/kvm/arm/vgic-v3.c
>> @@ -230,6 +230,7 @@ int vgic_v3_probe(struct device_node *vgic_node,
>>  		ret = -ENXIO;
>>  		goto out;
>>  	}
>> +	kvm_register_device_ops(&kvm_arm_vgic_v2_ops, KVM_DEV_TYPE_ARM_VGIC_V2);
> I did not find any unregistration. Am I wrong, or is it not relevant? If
> confirmed, it might be added in kvm_arch_exit. I saw that the KVM-VFIO
> device ops unregistration is done in kvm_exit/kvm_vfio_ops_exit.

Well, forget that one. I guess it is not relevant for this device, which
is never going to be released.

BR

Eric

* [PATCH v4 06/19] arm/arm64: KVM: move kvm_register_device_ops() into vGIC probing
  2014-11-18 10:58     ` Eric Auger
@ 2014-11-18 11:03       ` Andre Przywara
  0 siblings, 0 replies; 80+ messages in thread
From: Andre Przywara @ 2014-11-18 11:03 UTC
  To: linux-arm-kernel

Hi Eric,

thanks for the review! Much appreciated.

On 18/11/14 10:58, Eric Auger wrote:
> On 11/18/2014 11:43 AM, Eric Auger wrote:
>> On 11/14/2014 11:07 AM, Andre Przywara wrote:
>>> Currently we unconditionally register the GICv2 emulation device
>>> during the host's KVM initialization. Since with GICv3 support we
>>> may end up with only v2 or only v3 or both supported, we move the
>>> registration into the GIC probing function, where we will later know
>>> which combination is valid.
>>>
>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>>> Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
>>> ---
>>> Changelog v3...v4:
>>> - add Acked-by
>>>
>>>  include/linux/kvm_host.h |    1 +
>>>  virt/kvm/arm/vgic-v2.c   |    2 ++
>>>  virt/kvm/arm/vgic-v3.c   |    1 +
>>>  virt/kvm/arm/vgic.c      |    5 ++---
>>>  4 files changed, 6 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>>> index ea53b04..326ba7a 100644
>>> --- a/include/linux/kvm_host.h
>>> +++ b/include/linux/kvm_host.h
>>> @@ -1084,6 +1084,7 @@ void kvm_unregister_device_ops(u32 type);
>>>  
>>>  extern struct kvm_device_ops kvm_mpic_ops;
>>>  extern struct kvm_device_ops kvm_xics_ops;
>>> +extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
>>>  
>>>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>>>  
>>> diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
>>> index 2935405..e1cd3cb 100644
>>> --- a/virt/kvm/arm/vgic-v2.c
>>> +++ b/virt/kvm/arm/vgic-v2.c
>>> @@ -229,6 +229,8 @@ int vgic_v2_probe(struct device_node *vgic_node,
>>>  		goto out_unmap;
>>>  	}
>>>  
>>> +	kvm_register_device_ops(&kvm_arm_vgic_v2_ops, KVM_DEV_TYPE_ARM_VGIC_V2);
>>> +
>>>  	vgic->vcpu_base = vcpu_res.start;
>>>  
>>>  	kvm_info("%s@%llx IRQ%d\n", vgic_node->name,
>>> diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
>>> index 1c2c8ee..d14c75f 100644
>>> --- a/virt/kvm/arm/vgic-v3.c
>>> +++ b/virt/kvm/arm/vgic-v3.c
>>> @@ -230,6 +230,7 @@ int vgic_v3_probe(struct device_node *vgic_node,
>>>  		ret = -ENXIO;
>>>  		goto out;
>>>  	}
>>> +	kvm_register_device_ops(&kvm_arm_vgic_v2_ops, KVM_DEV_TYPE_ARM_VGIC_V2);
>> I did not find any unregistration. Am I wrong, or is it not relevant? If
>> confirmed, it might be added in kvm_arch_exit. I saw that the KVM-VFIO
>> device ops unregistration is done in kvm_exit/kvm_vfio_ops_exit.
> 
> Well, forget that one. I guess it is not relevant for this device, which
> is never going to be released.

Yeah, I think that's due to the fact that we cannot remove KVM from the
kernel on ARM (no modprobe -r).

Cheers,
Andre.

* [PATCH v4 12/19] arm/arm64: KVM: add vgic.h header file
  2014-11-14 10:07 ` [PATCH v4 12/19] arm/arm64: KVM: add vgic.h header file Andre Przywara
@ 2014-11-18 14:07   ` Eric Auger
  2014-11-18 15:24     ` Andre Przywara
  2014-11-23 13:29   ` Christoffer Dall
  1 sibling, 1 reply; 80+ messages in thread
From: Eric Auger @ 2014-11-18 14:07 UTC
  To: linux-arm-kernel

On 11/14/2014 11:07 AM, Andre Przywara wrote:
> vgic.c is currently a mixture of generic vGIC emulation code and
> functions specific to emulating a GICv2. To ease the addition of
> GICv3 later, we create new header file vgic.h, which holds constants
> and prototypes of commonly used functions.
> Rename some identifiers to avoid name space clutter.
> I removed the long-standing comment about using the kvm_io_bus API
> to tackle the GIC register ranges, as it wouldn't be a win for us
> anymore.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> 
> -------
> As the diff isn't always obvious here (and to aid eventual rebases),
> here is a list of high-level changes done to the code:
> * moved definitions and prototypes from vgic.c to vgic.h:
>   - VGIC_ADDR_UNDEF
>   - ACCESS_{READ,WRITE}_*
>   - vgic_update_state()
>   - vgic_kick_vcpus()
>   - vgic_get_vmcr()
>   - vgic_set_vmcr()
>   - struct mmio_range {} (renamed to struct kvm_mmio_range)
> * removed static keyword and exported prototype in vgic.h:
>   - vgic_bitmap_get_reg()
>   - vgic_bitmap_set_irq_val()
>   - vgic_bitmap_get_shared_map()
>   - vgic_bytemap_get_reg()
>   - vgic_dist_irq_set()
>   - vgic_dist_irq_clear()
>   - vgic_cpu_irq_clear()
>   - vgic_reg_access()
>   - handle_mmio_raz_wi()
>   - vgic_handle_enable_reg()
>   - vgic_handle_pending_reg()
>   - vgic_handle_cfg_reg()
>   - vgic_unqueue_irqs()
>   - find_matching_range() (renamed to vgic_find_range)
>   - vgic_handle_mmio_range()
>   - vgic_update_state()
>   - vgic_get_vmcr()
>   - vgic_set_vmcr()
>   - vgic_queue_irq()
>   - vgic_kick_vcpus()
>   - vgic_init_maps()
>   - vgic_has_attr_regs()
>   - vgic_set_common_attr()
>   - vgic_get_common_attr()
> * moved functions to vgic.h (static inline):
>   - mmio_data_read()
>   - mmio_data_write()
>   - is_in_range()
> ---
> Changelog v3...v4:
> - rename struct mmio_range to struct kvm_mmio_range
> - rename find_matching_range() to vgic_find_range()
Hi Andre,

It might have helped to split this patch into two: first the renamings
(mmio_range, find_matching_range, ...) in preparation, then the move.

I would rather use kvm_*vgic*_mmio_range to emphasize that it is not KVM-wide.
> - remove vgic_create() and vgic_destroy() from header
> 
>  virt/kvm/arm/vgic.c |  150 +++++++++++++++++----------------------------------
>  virt/kvm/arm/vgic.h |  119 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 169 insertions(+), 100 deletions(-)
>  create mode 100644 virt/kvm/arm/vgic.h
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index ea71cd0..4fa58c9 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -75,32 +75,16 @@
>   *   inactive as long as the external input line is held high.
>   */
>  
> -#define VGIC_ADDR_UNDEF		(-1)
> -#define IS_VGIC_ADDR_UNDEF(_x)  ((_x) == VGIC_ADDR_UNDEF)
> +#include "vgic.h"
>  
> -#define PRODUCT_ID_KVM		0x4b	/* ASCII code K */
> -#define IMPLEMENTER_ARM		0x43b
>  #define GICC_ARCH_VERSION_V2	0x2
>  
> -#define ACCESS_READ_VALUE	(1 << 0)
> -#define ACCESS_READ_RAZ		(0 << 0)
> -#define ACCESS_READ_MASK(x)	((x) & (1 << 0))
> -#define ACCESS_WRITE_IGNORED	(0 << 1)
> -#define ACCESS_WRITE_SETBIT	(1 << 1)
> -#define ACCESS_WRITE_CLEARBIT	(2 << 1)
> -#define ACCESS_WRITE_VALUE	(3 << 1)
> -#define ACCESS_WRITE_MASK(x)	((x) & (3 << 1))
> -
>  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
>  static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
> -static void vgic_update_state(struct kvm *kvm);
> -static void vgic_kick_vcpus(struct kvm *kvm);
>  static u8 *vgic_get_sgi_sources(struct vgic_dist *dist, int vcpu_id, int sgi);
>  static void vgic_dispatch_sgi(struct kvm_vcpu *vcpu, u32 reg);
>  static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
>  static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
> -static void vgic_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
> -static void vgic_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
>  
>  static const struct vgic_ops *vgic_ops;
>  static const struct vgic_params *vgic;
> @@ -174,8 +158,7 @@ static unsigned long *u64_to_bitmask(u64 *val)
>  	return (unsigned long *)val;
>  }
>  
> -static u32 *vgic_bitmap_get_reg(struct vgic_bitmap *x,
> -				int cpuid, u32 offset)
> +u32 *vgic_bitmap_get_reg(struct vgic_bitmap *x, int cpuid, u32 offset)
>  {
>  	offset >>= 2;
>  	if (!offset)
> @@ -193,8 +176,8 @@ static int vgic_bitmap_get_irq_val(struct vgic_bitmap *x,
>  	return test_bit(irq - VGIC_NR_PRIVATE_IRQS, x->shared);
>  }
>  
> -static void vgic_bitmap_set_irq_val(struct vgic_bitmap *x, int cpuid,
> -				    int irq, int val)
> +void vgic_bitmap_set_irq_val(struct vgic_bitmap *x, int cpuid,
> +			     int irq, int val)
>  {
>  	unsigned long *reg;
>  
> @@ -216,7 +199,7 @@ static unsigned long *vgic_bitmap_get_cpu_map(struct vgic_bitmap *x, int cpuid)
>  	return x->private + cpuid;
>  }
>  
> -static unsigned long *vgic_bitmap_get_shared_map(struct vgic_bitmap *x)
> +unsigned long *vgic_bitmap_get_shared_map(struct vgic_bitmap *x)
>  {
>  	return x->shared;
>  }
> @@ -243,7 +226,7 @@ static void vgic_free_bytemap(struct vgic_bytemap *b)
>  	b->shared = NULL;
>  }
>  
> -static u32 *vgic_bytemap_get_reg(struct vgic_bytemap *x, int cpuid, u32 offset)
> +u32 *vgic_bytemap_get_reg(struct vgic_bytemap *x, int cpuid, u32 offset)
>  {
>  	u32 *reg;
>  
> @@ -340,14 +323,14 @@ static int vgic_dist_irq_is_pending(struct kvm_vcpu *vcpu, int irq)
>  	return vgic_bitmap_get_irq_val(&dist->irq_pending, vcpu->vcpu_id, irq);
>  }
>  
> -static void vgic_dist_irq_set_pending(struct kvm_vcpu *vcpu, int irq)
> +void vgic_dist_irq_set_pending(struct kvm_vcpu *vcpu, int irq)
>  {
>  	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>  
>  	vgic_bitmap_set_irq_val(&dist->irq_pending, vcpu->vcpu_id, irq, 1);
>  }
>  
> -static void vgic_dist_irq_clear_pending(struct kvm_vcpu *vcpu, int irq)
> +void vgic_dist_irq_clear_pending(struct kvm_vcpu *vcpu, int irq)
>  {
>  	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>  
> @@ -363,7 +346,7 @@ static void vgic_cpu_irq_set(struct kvm_vcpu *vcpu, int irq)
>  			vcpu->arch.vgic_cpu.pending_shared);
>  }
>  
> -static void vgic_cpu_irq_clear(struct kvm_vcpu *vcpu, int irq)
> +void vgic_cpu_irq_clear(struct kvm_vcpu *vcpu, int irq)
>  {
>  	if (irq < VGIC_NR_PRIVATE_IRQS)
>  		clear_bit(irq, vcpu->arch.vgic_cpu.pending_percpu);
> @@ -377,16 +360,6 @@ static bool vgic_can_sample_irq(struct kvm_vcpu *vcpu, int irq)
>  	return vgic_irq_is_edge(vcpu, irq) || !vgic_irq_is_queued(vcpu, irq);
>  }
>  
> -static u32 mmio_data_read(struct kvm_exit_mmio *mmio, u32 mask)
> -{
> -	return le32_to_cpu(*((u32 *)mmio->data)) & mask;
> -}
> -
> -static void mmio_data_write(struct kvm_exit_mmio *mmio, u32 mask, u32 value)
> -{
> -	*((u32 *)mmio->data) = cpu_to_le32(value) & mask;
> -}
> -
>  /**
>   * vgic_reg_access - access vgic register
>   * @mmio:   pointer to the data describing the mmio access
> @@ -398,8 +371,8 @@ static void mmio_data_write(struct kvm_exit_mmio *mmio, u32 mask, u32 value)
>   * modes defined for vgic register access
>   * (read,raz,write-ignored,setbit,clearbit,write)
>   */
> -static void vgic_reg_access(struct kvm_exit_mmio *mmio, u32 *reg,
> -			    phys_addr_t offset, int mode)
> +void vgic_reg_access(struct kvm_exit_mmio *mmio, u32 *reg,
> +		     phys_addr_t offset, int mode)
>  {
>  	int word_offset = (offset & 3) * 8;
>  	u32 mask = (1UL << (mmio->len * 8)) - 1;
> @@ -483,16 +456,16 @@ static bool handle_mmio_misc(struct kvm_vcpu *vcpu,
>  	return false;
>  }
>  
> -static bool handle_mmio_raz_wi(struct kvm_vcpu *vcpu,
> -			       struct kvm_exit_mmio *mmio, phys_addr_t offset)
> +bool handle_mmio_raz_wi(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio,
> +			phys_addr_t offset)
>  {
>  	vgic_reg_access(mmio, NULL, offset,
>  			ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
>  	return false;
>  }
>  
> -static bool vgic_handle_enable_reg(struct kvm *kvm, struct kvm_exit_mmio *mmio,
> -				   phys_addr_t offset, int vcpu_id, int access)
> +bool vgic_handle_enable_reg(struct kvm *kvm, struct kvm_exit_mmio *mmio,
> +			    phys_addr_t offset, int vcpu_id, int access)
>  {
>  	u32 *reg;
>  	int mode = ACCESS_READ_VALUE | access;
> @@ -529,9 +502,9 @@ static bool handle_mmio_clear_enable_reg(struct kvm_vcpu *vcpu,
>  				      vcpu->vcpu_id, ACCESS_WRITE_CLEARBIT);
>  }
>  
> -static bool vgic_handle_set_pending_reg(struct kvm *kvm,
> -					struct kvm_exit_mmio *mmio,
> -					phys_addr_t offset, int vcpu_id)
> +bool vgic_handle_set_pending_reg(struct kvm *kvm,
> +				 struct kvm_exit_mmio *mmio,
> +				 phys_addr_t offset, int vcpu_id)
>  {
>  	u32 *reg, orig;
>  	u32 level_mask;
> @@ -566,9 +539,9 @@ static bool vgic_handle_set_pending_reg(struct kvm *kvm,
>  	return false;
>  }
>  
> -static bool vgic_handle_clear_pending_reg(struct kvm *kvm,
> -					  struct kvm_exit_mmio *mmio,
> -					  phys_addr_t offset, int vcpu_id)
> +bool vgic_handle_clear_pending_reg(struct kvm *kvm,
> +				   struct kvm_exit_mmio *mmio,
> +				   phys_addr_t offset, int vcpu_id)
>  {
>  	u32 *level_active;
>  	u32 *reg, orig;
> @@ -740,8 +713,8 @@ static u16 vgic_cfg_compress(u32 val)
>   * LSB is always 0. As such, we only keep the upper bit, and use the
>   * two above functions to compress/expand the bits
>   */
> -static bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio *mmio,
> -				phys_addr_t offset)
> +bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio *mmio,
> +			 phys_addr_t offset)
>  {
>  	u32 val;
>  
> @@ -817,7 +790,7 @@ static void vgic_v2_add_sgi_source(struct kvm_vcpu *vcpu, int irq, int source)
>   * to the distributor but the active state stays in the LRs, because we don't
>   * track the active state on the distributor side.
>   */
> -static void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
> +void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
>  {
>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>  	int i;
> @@ -942,21 +915,7 @@ static bool handle_mmio_sgi_clear(struct kvm_vcpu *vcpu,
>  		return write_set_clear_sgi_pend_reg(vcpu, mmio, offset, false);
>  }
>  
> -/*
> - * I would have liked to use the kvm_bus_io_*() API instead, but it
> - * cannot cope with banked registers (only the VM pointer is passed
> - * around, and we need the vcpu). One of these days, someone please
> - * fix it!
> - */
> -struct mmio_range {
> -	phys_addr_t base;
> -	unsigned long len;
> -	int bits_per_irq;
> -	bool (*handle_mmio)(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio,
> -			    phys_addr_t offset);
> -};
> -
> -static const struct mmio_range vgic_dist_ranges[] = {
> +static const struct kvm_mmio_range vgic_dist_ranges[] = {
>  	{
>  		.base		= GIC_DIST_CTRL,
>  		.len		= 12,
> @@ -1041,12 +1000,12 @@ static const struct mmio_range vgic_dist_ranges[] = {
>  	{}
>  };
>  
> -static const
> -struct mmio_range *find_matching_range(const struct mmio_range *ranges,
> +const
> +struct kvm_mmio_range *vgic_find_range(const struct kvm_mmio_range *ranges,
>  				       struct kvm_exit_mmio *mmio,
>  				       phys_addr_t offset)
>  {
> -	const struct mmio_range *r = ranges;
> +	const struct kvm_mmio_range *r = ranges;
>  
>  	while (r->len) {
>  		if (offset >= r->base &&
> @@ -1059,7 +1018,7 @@ struct mmio_range *find_matching_range(const struct mmio_range *ranges,
>  }
>  
>  static bool vgic_validate_access(const struct vgic_dist *dist,
> -				 const struct mmio_range *range,
> +				 const struct kvm_mmio_range *range,
>  				 unsigned long offset)
>  {
>  	int irq;
> @@ -1085,7 +1044,7 @@ static bool vgic_validate_access(const struct vgic_dist *dist,
>  static bool call_range_handler(struct kvm_vcpu *vcpu,
>  			       struct kvm_exit_mmio *mmio,
>  			       unsigned long offset,
> -			       const struct mmio_range *range)
> +			       const struct kvm_mmio_range *range)
>  {
>  	u32 *data32 = (void *)mmio->data;
>  	struct kvm_exit_mmio mmio32;
> @@ -1129,18 +1088,18 @@ static bool call_range_handler(struct kvm_vcpu *vcpu,
>   *
>   * returns true if the MMIO access could be performed
>   */
> -static bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run *run,
> +bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run *run,
>  			    struct kvm_exit_mmio *mmio,
> -			    const struct mmio_range *ranges,
> +			    const struct kvm_mmio_range *ranges,
>  			    unsigned long mmio_base)
>  {
> -	const struct mmio_range *range;
> +	const struct kvm_mmio_range *range;
>  	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>  	bool updated_state;
>  	unsigned long offset;
>  
>  	offset = mmio->phys_addr - mmio_base;
> -	range = find_matching_range(ranges, mmio, offset);
> +	range = vgic_find_range(ranges, mmio, offset);
>  	if (unlikely(!range || !range->handle_mmio)) {
>  		pr_warn("Unhandled access %d %08llx %d\n",
>  			mmio->is_write, mmio->phys_addr, mmio->len);
> @@ -1166,12 +1125,6 @@ static bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run *run,
>  	return true;
>  }
>  
> -static inline bool is_in_range(phys_addr_t addr, unsigned long len,
> -			       phys_addr_t baseaddr, unsigned long size)
> -{
> -	return (addr >= baseaddr) && (addr + len <= baseaddr + size);
> -}
> -
>  static bool vgic_v2_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
>  				struct kvm_exit_mmio *mmio)
>  {
> @@ -1298,7 +1251,7 @@ static int compute_pending_for_cpu(struct kvm_vcpu *vcpu)
>   * Update the interrupt state and determine which CPUs have pending
>   * interrupts. Must be called with distributor lock held.
>   */
> -static void vgic_update_state(struct kvm *kvm)
> +void vgic_update_state(struct kvm *kvm)
>  {
>  	struct vgic_dist *dist = &kvm->arch.vgic;
>  	struct kvm_vcpu *vcpu;
> @@ -1359,12 +1312,12 @@ static inline void vgic_disable_underflow(struct kvm_vcpu *vcpu)
>  	vgic_ops->disable_underflow(vcpu);
>  }
>  
> -static inline void vgic_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
> +void vgic_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
>  {
>  	vgic_ops->get_vmcr(vcpu, vmcr);
>  }
>  
> -static void vgic_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
> +void vgic_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
>  {
>  	vgic_ops->set_vmcr(vcpu, vmcr);
>  }
> @@ -1414,7 +1367,7 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu)
>   * Queue an interrupt to a CPU virtual interface. Return true on success,
>   * or false if it wasn't possible to queue it.
>   */
> -static bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 sgi_source_id, int irq)
> +bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 sgi_source_id, int irq)
>  {
>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>  	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> @@ -1700,7 +1653,7 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
>  	return test_bit(vcpu->vcpu_id, dist->irq_pending_on_cpu);
>  }
>  
> -static void vgic_kick_vcpus(struct kvm *kvm)
> +void vgic_kick_vcpus(struct kvm *kvm)
>  {
>  	struct kvm_vcpu *vcpu;
>  	int c;
> @@ -1941,7 +1894,7 @@ void kvm_vgic_destroy(struct kvm *kvm)
>   * Allocate and initialize the various data structures. Must be called
>   * with kvm->lock held!
>   */
> -static int vgic_init_maps(struct kvm *kvm)
> +int vgic_init_maps(struct kvm *kvm)
>  {
>  	struct vgic_dist *dist = &kvm->arch.vgic;
>  	struct kvm_vcpu *vcpu;
> @@ -2080,7 +2033,7 @@ out:
>  	return ret;
>  }
>  
> -static int vgic_v2_init_emulation(struct kvm *kvm)
> +int vgic_v2_init_emulation(struct kvm *kvm)
>  {
>  	struct vgic_dist *dist = &kvm->arch.vgic;
>  
> @@ -2320,7 +2273,7 @@ static bool handle_cpu_mmio_ident(struct kvm_vcpu *vcpu,
>   * CPU Interface Register accesses - these are not accessed by the VM, but by
>   * user space for saving and restoring VGIC state.
>   */
> -static const struct mmio_range vgic_cpu_ranges[] = {
> +static const struct kvm_mmio_range vgic_cpu_ranges[] = {
>  	{
>  		.base		= GIC_CPU_CTRL,
>  		.len		= 12,
> @@ -2347,7 +2300,7 @@ static int vgic_attr_regs_access(struct kvm_device *dev,
>  				 struct kvm_device_attr *attr,
>  				 u32 *reg, bool is_write)
>  {
> -	const struct mmio_range *r = NULL, *ranges;
> +	const struct kvm_mmio_range *r = NULL, *ranges;
>  	phys_addr_t offset;
>  	int ret, cpuid, c;
>  	struct kvm_vcpu *vcpu, *tmp_vcpu;
> @@ -2388,7 +2341,7 @@ static int vgic_attr_regs_access(struct kvm_device *dev,
>  	default:
>  		BUG();
>  	}
> -	r = find_matching_range(ranges, &mmio, offset);
> +	r = vgic_find_range(ranges, &mmio, offset);
>  
>  	if (unlikely(!r || !r->handle_mmio)) {
>  		ret = -ENXIO;
> @@ -2434,8 +2387,7 @@ out:
>  	return ret;
>  }
>  
> -static int vgic_set_common_attr(struct kvm_device *dev,
> -				struct kvm_device_attr *attr)
> +int vgic_set_common_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
>  {
>  	int r;
>  
> @@ -2512,8 +2464,7 @@ static int vgic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
>  	return -ENXIO;
>  }
>  
> -static int vgic_get_common_attr(struct kvm_device *dev,
> -				struct kvm_device_attr *attr)
> +int vgic_get_common_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
>  {
>  	int r = -ENXIO;
>  
> @@ -2568,13 +2519,12 @@ static int vgic_get_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
>  	return -ENXIO;
>  }
>  
> -static int vgic_has_attr_regs(const struct mmio_range *ranges,
> -			      phys_addr_t offset)
> +int vgic_has_attr_regs(const struct kvm_mmio_range *ranges, phys_addr_t offset)
>  {
>  	struct kvm_exit_mmio dev_attr_mmio;
>  
>  	dev_attr_mmio.len = 4;
> -	if (find_matching_range(ranges, &dev_attr_mmio, offset))
> +	if (vgic_find_range(ranges, &dev_attr_mmio, offset))
>  		return 0;
>  	else
>  		return -ENXIO;
> @@ -2604,12 +2554,12 @@ static int vgic_has_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
>  	return -ENXIO;
>  }
>  
> -static void vgic_destroy(struct kvm_device *dev)
> +void vgic_destroy(struct kvm_device *dev)
>  {
>  	kfree(dev);
>  }
>  
> -static int vgic_create(struct kvm_device *dev, u32 type)
> +int vgic_create(struct kvm_device *dev, u32 type)
>  {
>  	return kvm_vgic_create(dev->kvm, type);
>  }
> diff --git a/virt/kvm/arm/vgic.h b/virt/kvm/arm/vgic.h
> new file mode 100644
> index 0000000..ff3171a
> --- /dev/null
> +++ b/virt/kvm/arm/vgic.h
> @@ -0,0 +1,119 @@
> +/*
> + * Copyright (C) 2012-2014 ARM Ltd.
> + * Author: Marc Zyngier <marc.zyngier@arm.com>
> + *
> + * Derived from virt/kvm/arm/vgic.c
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef __KVM_VGIC_H__
> +#define __KVM_VGIC_H__
> +
> +#define VGIC_ADDR_UNDEF		(-1)
> +#define IS_VGIC_ADDR_UNDEF(_x)  ((_x) == VGIC_ADDR_UNDEF)
> +
> +#define PRODUCT_ID_KVM		0x4b	/* ASCII code K */
> +#define IMPLEMENTER_ARM		0x43b
> +
> +#define ACCESS_READ_VALUE	(1 << 0)
> +#define ACCESS_READ_RAZ		(0 << 0)
> +#define ACCESS_READ_MASK(x)	((x) & (1 << 0))
> +#define ACCESS_WRITE_IGNORED	(0 << 1)
> +#define ACCESS_WRITE_SETBIT	(1 << 1)
> +#define ACCESS_WRITE_CLEARBIT	(2 << 1)
> +#define ACCESS_WRITE_VALUE	(3 << 1)
> +#define ACCESS_WRITE_MASK(x)	((x) & (3 << 1))
> +
> +unsigned long *vgic_bitmap_get_shared_map(struct vgic_bitmap *x);
> +
> +void vgic_update_state(struct kvm *kvm);
> +int vgic_init_maps(struct kvm *kvm);
> +
> +u32 *vgic_bitmap_get_reg(struct vgic_bitmap *x, int cpuid, u32 offset);
> +u32 *vgic_bytemap_get_reg(struct vgic_bytemap *x, int cpuid, u32 offset);
> +
> +void vgic_dist_irq_set_pending(struct kvm_vcpu *vcpu, int irq);
> +void vgic_dist_irq_clear_pending(struct kvm_vcpu *vcpu, int irq);
> +void vgic_cpu_irq_clear(struct kvm_vcpu *vcpu, int irq);
> +void vgic_bitmap_set_irq_val(struct vgic_bitmap *x, int cpuid,
> +			     int irq, int val);
> +
> +void vgic_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
> +void vgic_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
> +
> +bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 sgi_source_id, int irq);
> +void vgic_unqueue_irqs(struct kvm_vcpu *vcpu);
> +
> +void vgic_reg_access(struct kvm_exit_mmio *mmio, u32 *reg,
> +		     phys_addr_t offset, int mode);
> +bool handle_mmio_raz_wi(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio,
> +			phys_addr_t offset);
> +
> +static inline
> +u32 mmio_data_read(struct kvm_exit_mmio *mmio, u32 mask)
> +{
> +	return le32_to_cpu(*((u32 *)mmio->data)) & mask;
> +}
> +
> +static inline
> +void mmio_data_write(struct kvm_exit_mmio *mmio, u32 mask, u32 value)
> +{
> +	*((u32 *)mmio->data) = cpu_to_le32(value) & mask;
> +}
> +
> +struct kvm_mmio_range {
> +	phys_addr_t base;
> +	unsigned long len;
> +	int bits_per_irq;
> +	bool (*handle_mmio)(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio,
> +			    phys_addr_t offset);
> +};
> +
> +static inline bool is_in_range(phys_addr_t addr, unsigned long len,
> +			       phys_addr_t baseaddr, unsigned long size)
> +{
> +	return (addr >= baseaddr) && (addr + len <= baseaddr + size);
> +}
doesn't it deserve a renaming too?
Same for mmio_data_write/read? And the defines?

Best Regards

Eric
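The ACCESS_* defines above pack a read mode into bit 0 and a write mode into bits [2:1], so a handler passes one combined mode word. Below is a rough userspace sketch of how that word is interpreted. apply_write() and apply_read() are hypothetical helpers that model a full 32-bit access only. The real vgic_reg_access() also handles sub-word accesses and byte offsets, and it accepts a NULL register pointer for RAZ accesses.

```c
#include <stdint.h>

/* Copied from the vgic.h hunk above. */
#define ACCESS_READ_VALUE	(1 << 0)
#define ACCESS_READ_RAZ		(0 << 0)
#define ACCESS_READ_MASK(x)	((x) & (1 << 0))
#define ACCESS_WRITE_IGNORED	(0 << 1)
#define ACCESS_WRITE_SETBIT	(1 << 1)
#define ACCESS_WRITE_CLEARBIT	(2 << 1)
#define ACCESS_WRITE_VALUE	(3 << 1)
#define ACCESS_WRITE_MASK(x)	((x) & (3 << 1))

/* Hypothetical: new register value after a full-width guest write of val. */
static uint32_t apply_write(uint32_t reg, uint32_t val, int mode)
{
	switch (ACCESS_WRITE_MASK(mode)) {
	case ACCESS_WRITE_SETBIT:	/* ISENABLER-style: 1 bits set */
		return reg | val;
	case ACCESS_WRITE_CLEARBIT:	/* ICENABLER-style: 1 bits clear */
		return reg & ~val;
	case ACCESS_WRITE_VALUE:	/* plain store */
		return val;
	case ACCESS_WRITE_IGNORED:	/* WI: register unchanged */
	default:
		return reg;
	}
}

/* Hypothetical: value the guest observes on a full-width read. */
static uint32_t apply_read(uint32_t reg, int mode)
{
	return ACCESS_READ_MASK(mode) == ACCESS_READ_VALUE ? reg : 0;
}
```

For example, a handler registered with ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED behaves as RAZ/WI: apply_read() returns 0 and apply_write() leaves the register untouched.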
> +
> +const
> +struct kvm_mmio_range *vgic_find_range(const struct kvm_mmio_range *ranges,
> +				       struct kvm_exit_mmio *mmio,
> +				       phys_addr_t offset);
> +
> +bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run *run,
> +			    struct kvm_exit_mmio *mmio,
> +			    const struct kvm_mmio_range *ranges,
> +			    unsigned long mmio_base);
> +
> +bool vgic_handle_enable_reg(struct kvm *kvm, struct kvm_exit_mmio *mmio,
> +			    phys_addr_t offset, int vcpu_id, int access);
> +
> +bool vgic_handle_set_pending_reg(struct kvm *kvm, struct kvm_exit_mmio *mmio,
> +				 phys_addr_t offset, int vcpu_id);
> +
> +bool vgic_handle_clear_pending_reg(struct kvm *kvm, struct kvm_exit_mmio *mmio,
> +				   phys_addr_t offset, int vcpu_id);
> +
> +bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio *mmio,
> +			 phys_addr_t offset);
> +
> +void vgic_kick_vcpus(struct kvm *kvm);
> +
> +int vgic_has_attr_regs(const struct kvm_mmio_range *ranges, phys_addr_t offset);
> +int vgic_set_common_attr(struct kvm_device *dev, struct kvm_device_attr *attr);
> +int vgic_get_common_attr(struct kvm_device *dev, struct kvm_device_attr *attr);
> +
> +int vgic_v2_init_emulation(struct kvm *kvm);
> +
> +#endif
> 

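Handlers are located by matching the access offset against a table of struct kvm_mmio_range entries, using is_in_range() to check that the whole access fits in one range. The following sketch assumes that the table is terminated by an entry with .len == 0. demo_range and demo_find_range() are hypothetical reduced stand-ins with no handler pointer and no struct kvm_exit_mmio. The table offsets mirror GICD_CTLR, GICD_TYPER and GICD_ISENABLER.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;	/* userspace stand-in for the kernel type */

/* Same check as is_in_range() in vgic.h: [addr, addr+len) inside range. */
static inline bool is_in_range(phys_addr_t addr, unsigned long len,
			       phys_addr_t baseaddr, unsigned long size)
{
	return (addr >= baseaddr) && (addr + len <= baseaddr + size);
}

/* Hypothetical reduced version of struct kvm_mmio_range. */
struct demo_range {
	phys_addr_t base;
	unsigned long len;
	const char *name;
};

/*
 * Hypothetical lookup in the spirit of vgic_find_range(): return the
 * first range that fully contains the access, or NULL if none does.
 * Assumes the table ends with a zero-length entry.
 */
static const struct demo_range *demo_find_range(const struct demo_range *r,
						phys_addr_t offset,
						unsigned long access_len)
{
	for (; r->len; r++)
		if (is_in_range(offset, access_len, r->base, r->len))
			return r;
	return NULL;
}

static const struct demo_range demo_dist[] = {
	{ 0x0000, 0x04, "CTLR" },
	{ 0x0004, 0x04, "TYPER" },
	{ 0x0100, 0x80, "ISENABLER" },
	{ 0, 0, NULL },		/* terminator */
};
```

Because is_in_range() checks the end of the access as well as its start, a 4-byte access at offset 0x17e straddles the end of the ISENABLER range and matches nothing.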

* [PATCH v4 12/19] arm/arm64: KVM: add vgic.h header file
  2014-11-18 14:07   ` Eric Auger
@ 2014-11-18 15:24     ` Andre Przywara
  0 siblings, 0 replies; 80+ messages in thread
From: Andre Przywara @ 2014-11-18 15:24 UTC
  To: linux-arm-kernel

Hi Eric,

On 18/11/14 14:07, Eric Auger wrote:
> On 11/14/2014 11:07 AM, Andre Przywara wrote:
>> vgic.c is currently a mixture of generic vGIC emulation code and
>> functions specific to emulating a GICv2. To ease the addition of
> >> GICv3 later, we create a new header file vgic.h, which holds constants
>> and prototypes of commonly used functions.
>> Rename some identifiers to avoid name space clutter.
>> I removed the long-standing comment about using the kvm_io_bus API
>> to tackle the GIC register ranges, as it wouldn't be a win for us
>> anymore.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>>
>> -------
>> As the diff isn't always obvious here (and to aid eventual rebases),
>> here is a list of high-level changes done to the code:
>> * moved definitions and prototypes from vgic.c to vgic.h:
>>   - VGIC_ADDR_UNDEF
>>   - ACCESS_{READ,WRITE}_*
>>   - vgic_update_state()
>>   - vgic_kick_vcpus()
>>   - vgic_get_vmcr()
>>   - vgic_set_vmcr()
>>   - struct mmio_range {} (renamed to struct kvm_mmio_range)
>> * removed static keyword and exported prototype in vgic.h:
>>   - vgic_bitmap_get_reg()
>>   - vgic_bitmap_set_irq_val()
>>   - vgic_bitmap_get_shared_map()
>>   - vgic_bytemap_get_reg()
>>   - vgic_dist_irq_set()
>>   - vgic_dist_irq_clear()
>>   - vgic_cpu_irq_clear()
>>   - vgic_reg_access()
>>   - handle_mmio_raz_wi()
>>   - vgic_handle_enable_reg()
>>   - vgic_handle_pending_reg()
>>   - vgic_handle_cfg_reg()
>>   - vgic_unqueue_irqs()
>>   - find_matching_range() (renamed to vgic_find_range)
>>   - vgic_handle_mmio_range()
>>   - vgic_update_state()
>>   - vgic_get_vmcr()
>>   - vgic_set_vmcr()
>>   - vgic_queue_irq()
>>   - vgic_kick_vcpus()
>>   - vgic_init_maps()
>>   - vgic_has_attr_regs()
>>   - vgic_set_common_attr()
>>   - vgic_get_common_attr()
>> * moved functions to vgic.h (static inline):
>>   - mmio_data_read()
>>   - mmio_data_write()
>>   - is_in_range()
>> ---
>> Changelog v3...v4:
>> - rename struct mmio_range to struct kvm_mmio_range
>> - rename find_matching_range() to vgic_find_range()
> Hi Andre,
> 
> It might have helped here to split the patch into 2: first renamings (
> mmio_range, find_matching_range ...) in anticipation and then the move.

Possibly. It didn't make much sense before, because there was only one
rename, but I may take another look at this.

> I would rather use kvm_*vgic*_mmio_range to emphasize it is not kvm wide.

I think I shortened it because the longer name broke quite a few lines
(the type is mentioned in some functions' parameter lists).
But I could rename it to vgic_mmio_range to avoid the KVM-wide connotation.

>> - remove vgic_create() and vgic_destroy() from header
>>

[ ... ]

>> diff --git a/virt/kvm/arm/vgic.h b/virt/kvm/arm/vgic.h
>> new file mode 100644
>> index 0000000..ff3171a
>> --- /dev/null
>> +++ b/virt/kvm/arm/vgic.h
>> @@ -0,0 +1,119 @@
>> +/*
>> + * Copyright (C) 2012-2014 ARM Ltd.
>> + * Author: Marc Zyngier <marc.zyngier@arm.com>
>> + *
>> + * Derived from virt/kvm/arm/vgic.c
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef __KVM_VGIC_H__
>> +#define __KVM_VGIC_H__
>> +
>> +#define VGIC_ADDR_UNDEF              (-1)
>> +#define IS_VGIC_ADDR_UNDEF(_x)  ((_x) == VGIC_ADDR_UNDEF)
>> +
>> +#define PRODUCT_ID_KVM               0x4b    /* ASCII code K */
>> +#define IMPLEMENTER_ARM              0x43b
>> +
>> +#define ACCESS_READ_VALUE    (1 << 0)
>> +#define ACCESS_READ_RAZ              (0 << 0)
>> +#define ACCESS_READ_MASK(x)  ((x) & (1 << 0))
>> +#define ACCESS_WRITE_IGNORED (0 << 1)
>> +#define ACCESS_WRITE_SETBIT  (1 << 1)
>> +#define ACCESS_WRITE_CLEARBIT        (2 << 1)
>> +#define ACCESS_WRITE_VALUE   (3 << 1)
>> +#define ACCESS_WRITE_MASK(x) ((x) & (3 << 1))
>> +
>> +unsigned long *vgic_bitmap_get_shared_map(struct vgic_bitmap *x);
>> +
>> +void vgic_update_state(struct kvm *kvm);
>> +int vgic_init_maps(struct kvm *kvm);
>> +
>> +u32 *vgic_bitmap_get_reg(struct vgic_bitmap *x, int cpuid, u32 offset);
>> +u32 *vgic_bytemap_get_reg(struct vgic_bytemap *x, int cpuid, u32 offset);
>> +
>> +void vgic_dist_irq_set_pending(struct kvm_vcpu *vcpu, int irq);
>> +void vgic_dist_irq_clear_pending(struct kvm_vcpu *vcpu, int irq);
>> +void vgic_cpu_irq_clear(struct kvm_vcpu *vcpu, int irq);
>> +void vgic_bitmap_set_irq_val(struct vgic_bitmap *x, int cpuid,
>> +                          int irq, int val);
>> +
>> +void vgic_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
>> +void vgic_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
>> +
>> +bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 sgi_source_id, int irq);
>> +void vgic_unqueue_irqs(struct kvm_vcpu *vcpu);
>> +
>> +void vgic_reg_access(struct kvm_exit_mmio *mmio, u32 *reg,
>> +                  phys_addr_t offset, int mode);
>> +bool handle_mmio_raz_wi(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio,
>> +                     phys_addr_t offset);
>> +
>> +static inline
>> +u32 mmio_data_read(struct kvm_exit_mmio *mmio, u32 mask)
>> +{
>> +     return le32_to_cpu(*((u32 *)mmio->data)) & mask;
>> +}
>> +
>> +static inline
>> +void mmio_data_write(struct kvm_exit_mmio *mmio, u32 mask, u32 value)
>> +{
>> +     *((u32 *)mmio->data) = cpu_to_le32(value) & mask;
>> +}
>> +
>> +struct kvm_mmio_range {
>> +     phys_addr_t base;
>> +     unsigned long len;
>> +     int bits_per_irq;
>> +     bool (*handle_mmio)(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio,
>> +                         phys_addr_t offset);
>> +};
>> +
>> +static inline bool is_in_range(phys_addr_t addr, unsigned long len,
>> +                            phys_addr_t baseaddr, unsigned long size)
>> +{
>> +     return (addr >= baseaddr) && (addr + len <= baseaddr + size);
>> +}
> doesn't it deserve a renaming too?
> Same for mmio_data_write/read? And the defines?

Those are static inlines in the private header file virt/kvm/arm/vgic.h,
so they are not visible outside of the vGIC code.

Thanks for taking care!
André

> Best Regards
> 
> Eric
>> +
>> +const
>> +struct kvm_mmio_range *vgic_find_range(const struct kvm_mmio_range *ranges,
>> +                                    struct kvm_exit_mmio *mmio,
>> +                                    phys_addr_t offset);
>> +
>> +bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run *run,
>> +                         struct kvm_exit_mmio *mmio,
>> +                         const struct kvm_mmio_range *ranges,
>> +                         unsigned long mmio_base);
>> +
>> +bool vgic_handle_enable_reg(struct kvm *kvm, struct kvm_exit_mmio *mmio,
>> +                         phys_addr_t offset, int vcpu_id, int access);
>> +
>> +bool vgic_handle_set_pending_reg(struct kvm *kvm, struct kvm_exit_mmio *mmio,
>> +                              phys_addr_t offset, int vcpu_id);
>> +
>> +bool vgic_handle_clear_pending_reg(struct kvm *kvm, struct kvm_exit_mmio *mmio,
>> +                                phys_addr_t offset, int vcpu_id);
>> +
>> +bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio *mmio,
>> +                      phys_addr_t offset);
>> +
>> +void vgic_kick_vcpus(struct kvm *kvm);
>> +
>> +int vgic_has_attr_regs(const struct kvm_mmio_range *ranges, phys_addr_t offset);
>> +int vgic_set_common_attr(struct kvm_device *dev, struct kvm_device_attr *attr);
>> +int vgic_get_common_attr(struct kvm_device *dev, struct kvm_device_attr *attr);
>> +
>> +int vgic_v2_init_emulation(struct kvm *kvm);
>> +
>> +#endif
>>
> 
> 


* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-11-14 10:07 ` [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation Andre Przywara
  2014-11-14 11:07   ` Christoffer Dall
@ 2014-11-18 15:57   ` Eric Auger
  2014-11-23 14:38   ` Christoffer Dall
  2 siblings, 0 replies; 80+ messages in thread
From: Eric Auger @ 2014-11-18 15:57 UTC
  To: linux-arm-kernel

On 11/14/2014 11:07 AM, Andre Przywara wrote:
> With everything separated and prepared, we implement a model of a
> GICv3 distributor and redistributors by using the existing framework
> to provide handler functions for each register group.
> 
> Currently we limit the emulation to a model enforcing a single
> security state, with SRE==1 (forcing system register access) and
> ARE==1 (allowing more than 8 VCPUs).
> 
> We share some of the functions provided for GICv2 emulation, but take
> the different ways of addressing (v)CPUs into account.
> Save and restore is currently not implemented.
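GICv3 routes SPIs by affinity (an MPIDR value) rather than by a CPU bitmask, which is why vgic-v3-emul.c stores a compressed MPIDR per SPI and decodes the IROUTERn offsets itself. Here is a userspace sketch of compress_mpidr()/uncompress_mpidr() and of the offset decoding in handle_mmio_route_reg(). It assumes the arm64 MPIDR_EL1 layout (Aff0 to Aff2 in bits [23:0], Aff3 in bits [39:32]). The MPIDR_* macros are local re-definitions, and iroute_offset_to_spi() is a hypothetical helper.

```c
#include <stdint.h>

/* Local stand-ins for the arm64 macros (assumed MPIDR_EL1 layout). */
#define MPIDR_LEVEL_SHIFT(level)	((((1 << (level)) >> 1)) << 3)
#define MPIDR_AFFINITY_LEVEL(mpidr, level) \
	((uint32_t)(((mpidr) >> MPIDR_LEVEL_SHIFT(level)) & 0xff))

/* Pack Aff3.Aff2.Aff1.Aff0 into one 32-bit word, like compress_mpidr(). */
static uint32_t compress_mpidr(uint64_t mpidr)
{
	uint32_t ret;

	ret  = MPIDR_AFFINITY_LEVEL(mpidr, 0);
	ret |= MPIDR_AFFINITY_LEVEL(mpidr, 1) << 8;
	ret |= MPIDR_AFFINITY_LEVEL(mpidr, 2) << 16;
	ret |= MPIDR_AFFINITY_LEVEL(mpidr, 3) << 24;
	return ret;
}

/* Expand the packed word back into MPIDR_EL1 bit positions. */
static uint64_t uncompress_mpidr(uint32_t value)
{
	uint64_t mpidr;

	mpidr  = (uint64_t)((value >>  0) & 0xff) << MPIDR_LEVEL_SHIFT(0);
	mpidr |= (uint64_t)((value >>  8) & 0xff) << MPIDR_LEVEL_SHIFT(1);
	mpidr |= (uint64_t)((value >> 16) & 0xff) << MPIDR_LEVEL_SHIFT(2);
	mpidr |= (uint64_t)((value >> 24) & 0xff) << MPIDR_LEVEL_SHIFT(3);
	return mpidr;
}

/*
 * Hypothetical helper: decode an offset relative to GICD_IROUTER + 0x100
 * (the first SPI, INTID 32). Each IROUTERn is 8 bytes wide; the upper
 * 32-bit half holds Aff3, which the patch treats as RAZ/WI.
 */
static int iroute_offset_to_spi(uint32_t offset, int *upper_half)
{
	*upper_half = (offset & 4) != 0;
	return (int)(offset / 8);
}
```

Bits [31:24] of MPIDR_EL1 (for example the RES1 bit 31) are not affinity fields, so they do not survive compression: compress_mpidr(0x80000001) yields 0x1.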
> 
> Similar to the split-off of the GICv2 specific code, the new emulation
> code goes into a new file (vgic-v3-emul.c).
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
> Changelog v3...v4:
> - remove ICC_SGI1R_EL1 register handling (moved into later patch)
> - add definitions for single security state
> - document emulation limitations in vgic-v3-emul.c header
> - move CTLR, TYPER and IIDR handling into separate functions
> - add RAO/WI handling for IGROUPRn registers
> - remove unneeded offset masking on calling vgic_reg_access()
> - rework handle_mmio_route_reg() to only handle SPIs
> - refine IROUTERn register range
> - use non-atomic bitops functions (__clear_bit() and __set_bit())
> - rename vgic_dist_ranges[] to vgic_v3_dist_ranges[]
> - add (RAZ/WI) implementation of GICD_STATUSR
> - add (RAZ/WI) implementations of MBI registers
> - adapt to new private passing (in struct kvm_exit_mmio instead of a parameter)
> - fix vcpu_id calculation bug in handle CFG registers
> - always use hexadecimal numbers for .len member
> - simplify vgic_v3_handle_mmio()
> - add vgic_v3_create() and vgic_v3_destroy()
> - swap vgic_v3_[sg]et_attr() code location
> - add and improve comments
> - (adaptions to changes from earlier patches)
> 
>  arch/arm64/kvm/Makefile            |    1 +
>  include/kvm/arm_vgic.h             |    9 +-
>  include/linux/irqchip/arm-gic-v3.h |   32 ++
>  include/linux/kvm_host.h           |    1 +
>  include/uapi/linux/kvm.h           |    2 +
>  virt/kvm/arm/vgic-v3-emul.c        |  904 ++++++++++++++++++++++++++++++++++++
>  virt/kvm/arm/vgic.c                |   11 +-
>  virt/kvm/arm/vgic.h                |    3 +
>  8 files changed, 960 insertions(+), 3 deletions(-)
>  create mode 100644 virt/kvm/arm/vgic-v3-emul.c
> 
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index d957353..4e6e09e 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -24,5 +24,6 @@ kvm-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v2.o
>  kvm-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v2-emul.o
>  kvm-$(CONFIG_KVM_ARM_VGIC) += vgic-v2-switch.o
>  kvm-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v3.o
> +kvm-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v3-emul.o
>  kvm-$(CONFIG_KVM_ARM_VGIC) += vgic-v3-switch.o
>  kvm-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 421833f..c1ef5a9 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -160,7 +160,11 @@ struct vgic_dist {
>  
>  	/* Distributor and vcpu interface mapping in the guest */
>  	phys_addr_t		vgic_dist_base;
> -	phys_addr_t		vgic_cpu_base;
> +	/* GICv2 and GICv3 use different mapped register blocks */
> +	union {
> +		phys_addr_t		vgic_cpu_base;
> +		phys_addr_t		vgic_redist_base;
> +	};
>  
>  	/* Distributor enabled */
>  	u32			enabled;
> @@ -222,6 +226,9 @@ struct vgic_dist {
>  	 */
>  	struct vgic_bitmap	*irq_spi_target;
>  
> +	/* Target MPIDR for each IRQ (only needed for GICv3 IROUTERn) */
> +	u32			*irq_spi_mpidr;
> +
>  	/* Bitmap indicating which CPU has something pending */
>  	unsigned long		*irq_pending_on_cpu;
>  
> diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
> index 03a4ea3..726d898 100644
> --- a/include/linux/irqchip/arm-gic-v3.h
> +++ b/include/linux/irqchip/arm-gic-v3.h
> @@ -33,6 +33,7 @@
>  #define GICD_SETSPI_SR			0x0050
>  #define GICD_CLRSPI_SR			0x0058
>  #define GICD_SEIR			0x0068
> +#define GICD_IGROUPR			0x0080
>  #define GICD_ISENABLER			0x0100
>  #define GICD_ICENABLER			0x0180
>  #define GICD_ISPENDR			0x0200
> @@ -41,14 +42,37 @@
>  #define GICD_ICACTIVER			0x0380
>  #define GICD_IPRIORITYR			0x0400
>  #define GICD_ICFGR			0x0C00
> +#define GICD_IGRPMODR			0x0D00
> +#define GICD_NSACR			0x0E00
>  #define GICD_IROUTER			0x6000
> +#define GICD_IDREGS			0xFFD0
>  #define GICD_PIDR2			0xFFE8
>  
> +/*
> + * Those registers are actually from GICv2, but the spec demands that they
> + * are implemented as RES0 if ARE is 1 (which we do in KVM's emulated GICv3).
> + */
> +#define GICD_ITARGETSR			0x0800
> +#define GICD_SGIR			0x0F00
> +#define GICD_CPENDSGIR			0x0F10
> +#define GICD_SPENDSGIR			0x0F20
> +
>  #define GICD_CTLR_RWP			(1U << 31)
> +#define GICD_CTLR_DS			(1U << 6)
>  #define GICD_CTLR_ARE_NS		(1U << 4)
>  #define GICD_CTLR_ENABLE_G1A		(1U << 1)
>  #define GICD_CTLR_ENABLE_G1		(1U << 0)
>  
> +/*
> + * In systems with a single security state (what we emulate in KVM)
> + * the meaning of the interrupt group enable bits is slightly different
> + */
> +#define GICD_CTLR_ENABLE_SS_G1		(1U << 1)
> +#define GICD_CTLR_ENABLE_SS_G0		(1U << 0)
> +
> +#define GICD_TYPER_LPIS			(1U << 17)
> +#define GICD_TYPER_MBIS			(1U << 16)
> +
>  #define GICD_IROUTER_SPI_MODE_ONE	(0U << 31)
>  #define GICD_IROUTER_SPI_MODE_ANY	(1U << 31)
>  
> @@ -56,6 +80,8 @@
>  #define GIC_PIDR2_ARCH_GICv3		0x30
>  #define GIC_PIDR2_ARCH_GICv4		0x40
>  
> +#define GIC_V3_DIST_SIZE		0x10000
> +
>  /*
>   * Re-Distributor registers, offsets from RD_base
>   */
> @@ -74,6 +100,7 @@
>  #define GICR_SYNCR			0x00C0
>  #define GICR_MOVLPIR			0x0100
>  #define GICR_MOVALLR			0x0110
> +#define GICR_IDREGS			GICD_IDREGS
>  #define GICR_PIDR2			GICD_PIDR2
>  
>  #define GICR_WAKER_ProcessorSleep	(1U << 1)
> @@ -82,6 +109,7 @@
>  /*
>   * Re-Distributor registers, offsets from SGI_base
>   */
> +#define GICR_IGROUPR0			GICD_IGROUPR
>  #define GICR_ISENABLER0			GICD_ISENABLER
>  #define GICR_ICENABLER0			GICD_ICENABLER
>  #define GICR_ISPENDR0			GICD_ISPENDR
> @@ -90,10 +118,14 @@
>  #define GICR_ICACTIVER0			GICD_ICACTIVER
>  #define GICR_IPRIORITYR0		GICD_IPRIORITYR
>  #define GICR_ICFGR0			GICD_ICFGR
> +#define GICR_IGRPMODR0			GICD_IGRPMODR
> +#define GICR_NSACR			GICD_NSACR
>  
>  #define GICR_TYPER_VLPIS		(1U << 1)
>  #define GICR_TYPER_LAST			(1U << 4)
>  
> +#define GIC_V3_REDIST_SIZE		0x20000
> +
>  /*
>   * CPU interface registers
>   */
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 326ba7a..4a7798e 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1085,6 +1085,7 @@ void kvm_unregister_device_ops(u32 type);
>  extern struct kvm_device_ops kvm_mpic_ops;
>  extern struct kvm_device_ops kvm_xics_ops;
>  extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
> +extern struct kvm_device_ops kvm_arm_vgic_v3_ops;
>  
>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>  
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 6076882..24cb129 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -960,6 +960,8 @@ enum kvm_device_type {
>  #define KVM_DEV_TYPE_ARM_VGIC_V2	KVM_DEV_TYPE_ARM_VGIC_V2
>  	KVM_DEV_TYPE_FLIC,
>  #define KVM_DEV_TYPE_FLIC		KVM_DEV_TYPE_FLIC
> +	KVM_DEV_TYPE_ARM_VGIC_V3,
> +#define KVM_DEV_TYPE_ARM_VGIC_V3	KVM_DEV_TYPE_ARM_VGIC_V3
>  	KVM_DEV_TYPE_MAX,
>  };
Hi Andre,

Documentation/virtual/kvm/devices/arm-vgic.txt needs to be updated too.

I need some more time to study the rest and digest ;-)

--Eric

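When reviewing the distributor emulation in vgic-v3-emul.c, it can help to reproduce the GICD_CTLR and GICD_TYPER values handed to the guest in isolation. The following minimal userspace sketch copies the bit positions from the arm-gic-v3.h hunk. The TYPER layout (ITLinesNumber in bits [4:0], IDbits in bits [23:19]) is taken from the GICv3 architecture specification. The demo_* function names are hypothetical.

```c
#include <stdint.h>

/* Bit definitions from the arm-gic-v3.h hunk in this patch. */
#define GICD_CTLR_DS			(1U << 6)
#define GICD_CTLR_ARE_NS		(1U << 4)
#define GICD_CTLR_ENABLE_SS_G1		(1U << 1)

#define INTERRUPT_ID_BITS 10

/* GICD_CTLR as read by the guest, mirroring handle_mmio_ctlr(). */
static uint32_t demo_gicd_ctlr(int enabled)
{
	uint32_t reg = enabled ? GICD_CTLR_ENABLE_SS_G1 : 0;

	/* ARE and DS always read as one in this emulation. */
	return reg | GICD_CTLR_ARE_NS | GICD_CTLR_DS;
}

/* GICD_TYPER as read by the guest, mirroring handle_mmio_typer(). */
static uint32_t demo_gicd_typer(unsigned int nr_irqs)
{
	unsigned int n = nr_irqs < 1024 ? nr_irqs : 1024;
	uint32_t reg = (n >> 5) - 1;			/* ITLinesNumber */

	reg |= (uint32_t)(INTERRUPT_ID_BITS - 1) << 19;	/* IDbits */
	return reg;
}
```

A guest derives the number of interrupt lines as 32 * (ITLinesNumber + 1), so nr_irqs = 256 yields TYPER = 0x480007 and 256 lines, while IDbits + 1 = 10 corresponds to the 1024 interrupt IDs mentioned in the handle_mmio_typer() comment.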
>  
> diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
> new file mode 100644
> index 0000000..97b5801
> --- /dev/null
> +++ b/virt/kvm/arm/vgic-v3-emul.c
> @@ -0,0 +1,904 @@
> +/*
> + * GICv3 distributor and redistributor emulation
> + *
> + * GICv3 emulation is currently only supported on a GICv3 host (because
> + * we rely on the hardware's CPU interface virtualization support), but
> + * supports both hardware with or without the optional GICv2 backwards
> + * compatibility features.
> + *
> + * Limitations of the emulation:
> + * (RAZ/WI: read as zero, writes ignored; RAO/WI: read as one, writes ignored)
> + * - We do not support LPIs (yet). TYPER.LPIS is reported as 0 and is RAZ/WI.
> + * - We do not support the message based interrupts (MBIs) triggered by
> + *   writes to the GICD_{SET,CLR}SPI_* registers. TYPER.MBIS is reported as 0.
> + * - We do not support the (optional) backwards compatibility feature.
> + *   GICD_CTLR.ARE resets to 1 and is RAO/WI. If the _host_ GIC supports
> + *   the compatibility feature, you can use a GICv2 in the guest, though.
> + * - We only support a single security state. GICD_CTLR.DS is 1 and is RAO/WI.
> + * - Priorities are not emulated (same as the GICv2 emulation). Linux
> + *   as a guest is fine with this, because it does not use priorities.
> + * - We only support Group1 interrupts. Again Linux uses only those.
> + *
> + * Copyright (C) 2014 ARM Ltd.
> + * Author: Andre Przywara <andre.przywara@arm.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/cpu.h>
> +#include <linux/kvm.h>
> +#include <linux/kvm_host.h>
> +#include <linux/interrupt.h>
> +
> +#include <linux/irqchip/arm-gic-v3.h>
> +#include <kvm/arm_vgic.h>
> +
> +#include <asm/kvm_emulate.h>
> +#include <asm/kvm_arm.h>
> +#include <asm/kvm_mmu.h>
> +
> +#include "vgic.h"
> +
> +static bool handle_mmio_rao_wi(struct kvm_vcpu *vcpu,
> +			       struct kvm_exit_mmio *mmio, phys_addr_t offset)
> +{
> +	u32 reg = 0xffffffff;
> +
> +	vgic_reg_access(mmio, &reg, offset,
> +			ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
> +
> +	return false;
> +}
> +
> +static bool handle_mmio_ctlr(struct kvm_vcpu *vcpu,
> +			     struct kvm_exit_mmio *mmio, phys_addr_t offset)
> +{
> +	u32 reg = 0;
> +
> +	/*
> +	 * Force ARE and DS to 1, the guest cannot change this.
> +	 * For the time being we only support Group1 interrupts.
> +	 */
> +	if (vcpu->kvm->arch.vgic.enabled)
> +		reg = GICD_CTLR_ENABLE_SS_G1;
> +	reg |= GICD_CTLR_ARE_NS | GICD_CTLR_DS;
> +
> +	vgic_reg_access(mmio, &reg, offset,
> +			ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
> +	if (mmio->is_write) {
> +		if (reg & GICD_CTLR_ENABLE_SS_G0)
> +			kvm_info("guest tried to enable unsupported Group0 interrupts\n");
> +		vcpu->kvm->arch.vgic.enabled = !!(reg & GICD_CTLR_ENABLE_SS_G1);
> +		vgic_update_state(vcpu->kvm);
> +		return true;
> +	}
> +	return false;
> +}
> +
> +/*
> + * As this implementation does not provide compatibility
> + * with GICv2 (ARE==1), we report zero CPUs in bits [5..7].
> + * Also LPIs and MBIs are not supported, so we set the respective bits to 0.
> + * Also we report at most 2**10=1024 interrupt IDs (to match 1024 SPIs).
> + */
> +#define INTERRUPT_ID_BITS 10
> +static bool handle_mmio_typer(struct kvm_vcpu *vcpu,
> +			      struct kvm_exit_mmio *mmio, phys_addr_t offset)
> +{
> +	u32 reg;
> +
> +	/* we report at most 1024 IRQs via this interface */
> +	reg = (min(vcpu->kvm->arch.vgic.nr_irqs, 1024) >> 5) - 1;
> +
> +	reg |= (INTERRUPT_ID_BITS - 1) << 19;
> +
> +	vgic_reg_access(mmio, &reg, offset,
> +			ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
> +
> +	return false;
> +}
> +
> +static bool handle_mmio_iidr(struct kvm_vcpu *vcpu,
> +			     struct kvm_exit_mmio *mmio, phys_addr_t offset)
> +{
> +	u32 reg;
> +
> +	reg = (PRODUCT_ID_KVM << 24) | (IMPLEMENTER_ARM << 0);
> +	vgic_reg_access(mmio, &reg, offset,
> +			ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
> +
> +	return false;
> +}
> +
> +static bool handle_mmio_set_enable_reg_dist(struct kvm_vcpu *vcpu,
> +					    struct kvm_exit_mmio *mmio,
> +					    phys_addr_t offset)
> +{
> +	if (likely(offset >= VGIC_NR_PRIVATE_IRQS / 8))
> +		return vgic_handle_enable_reg(vcpu->kvm, mmio, offset,
> +					      vcpu->vcpu_id,
> +					      ACCESS_WRITE_SETBIT);
> +
> +	vgic_reg_access(mmio, NULL, offset,
> +			ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
> +	return false;
> +}
> +
> +static bool handle_mmio_clear_enable_reg_dist(struct kvm_vcpu *vcpu,
> +					      struct kvm_exit_mmio *mmio,
> +					      phys_addr_t offset)
> +{
> +	if (likely(offset >= VGIC_NR_PRIVATE_IRQS / 8))
> +		return vgic_handle_enable_reg(vcpu->kvm, mmio, offset,
> +					      vcpu->vcpu_id,
> +					      ACCESS_WRITE_CLEARBIT);
> +
> +	vgic_reg_access(mmio, NULL, offset,
> +			ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
> +	return false;
> +}
> +
> +static bool handle_mmio_set_pending_reg_dist(struct kvm_vcpu *vcpu,
> +					     struct kvm_exit_mmio *mmio,
> +					     phys_addr_t offset)
> +{
> +	if (likely(offset >= VGIC_NR_PRIVATE_IRQS / 8))
> +		return vgic_handle_set_pending_reg(vcpu->kvm, mmio, offset,
> +						   vcpu->vcpu_id);
> +
> +	vgic_reg_access(mmio, NULL, offset,
> +			ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
> +	return false;
> +}
> +
> +static bool handle_mmio_clear_pending_reg_dist(struct kvm_vcpu *vcpu,
> +					       struct kvm_exit_mmio *mmio,
> +					       phys_addr_t offset)
> +{
> +	if (likely(offset >= VGIC_NR_PRIVATE_IRQS / 8))
> +		return vgic_handle_clear_pending_reg(vcpu->kvm, mmio, offset,
> +						     vcpu->vcpu_id);
> +
> +	vgic_reg_access(mmio, NULL, offset,
> +			ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
> +	return false;
> +}
> +
> +static bool handle_mmio_priority_reg_dist(struct kvm_vcpu *vcpu,
> +					  struct kvm_exit_mmio *mmio,
> +					  phys_addr_t offset)
> +{
> +	u32 *reg;
> +
> +	if (unlikely(offset < VGIC_NR_PRIVATE_IRQS)) {
> +		vgic_reg_access(mmio, NULL, offset,
> +				ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
> +		return false;
> +	}
> +
> +	reg = vgic_bytemap_get_reg(&vcpu->kvm->arch.vgic.irq_priority,
> +				   vcpu->vcpu_id, offset);
> +	vgic_reg_access(mmio, reg, offset,
> +		ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
> +	return false;
> +}
> +
> +static bool handle_mmio_cfg_reg_dist(struct kvm_vcpu *vcpu,
> +				     struct kvm_exit_mmio *mmio,
> +				     phys_addr_t offset)
> +{
> +	u32 *reg;
> +
> +	if (unlikely(offset < VGIC_NR_PRIVATE_IRQS / 4)) {
> +		vgic_reg_access(mmio, NULL, offset,
> +				ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
> +		return false;
> +	}
> +
> +	reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_cfg,
> +				  vcpu->vcpu_id, offset >> 1);
> +
> +	return vgic_handle_cfg_reg(reg, mmio, offset);
> +}
> +
> +/*
> + * We use a compressed version of the MPIDR (all 32 bits in one 32-bit word)
> + * when we store the target MPIDR written by the guest.
> + */
> +static u32 compress_mpidr(unsigned long mpidr)
> +{
> +	u32 ret;
> +
> +	ret = MPIDR_AFFINITY_LEVEL(mpidr, 0);
> +	ret |= MPIDR_AFFINITY_LEVEL(mpidr, 1) << 8;
> +	ret |= MPIDR_AFFINITY_LEVEL(mpidr, 2) << 16;
> +	ret |= MPIDR_AFFINITY_LEVEL(mpidr, 3) << 24;
> +
> +	return ret;
> +}
> +
> +static unsigned long uncompress_mpidr(u32 value)
> +{
> +	unsigned long mpidr;
> +
> +	mpidr  = ((value >>  0) & 0xFF) << MPIDR_LEVEL_SHIFT(0);
> +	mpidr |= ((value >>  8) & 0xFF) << MPIDR_LEVEL_SHIFT(1);
> +	mpidr |= ((value >> 16) & 0xFF) << MPIDR_LEVEL_SHIFT(2);
> +	mpidr |= (u64)((value >> 24) & 0xFF) << MPIDR_LEVEL_SHIFT(3);
> +
> +	return mpidr;
> +}
> +
> +/*
> + * Look up the given MPIDR value to get the vcpu_id (if there is one)
> + * and store that in the irq_spi_cpu[] array.
> + * This limits the number of VCPUs to 255 for now, extending the data
> + * type (or storing kvm_vcpu pointers) should lift the limit.
> + * Store the original MPIDR value in an extra array to support read-as-written.
> + * Unallocated MPIDRs are translated to a special value and caught
> + * before any array accesses.
> + */
> +static bool handle_mmio_route_reg(struct kvm_vcpu *vcpu,
> +				  struct kvm_exit_mmio *mmio,
> +				  phys_addr_t offset)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct vgic_dist *dist = &kvm->arch.vgic;
> +	int spi;
> +	u32 reg;
> +	int vcpu_id;
> +	unsigned long *bmap, mpidr;
> +
> +	/*
> +	 * The upper 32 bits of each 64 bit register are zero,
> +	 * as we don't support Aff3.
> +	 */
> +	if ((offset & 4)) {
> +		vgic_reg_access(mmio, NULL, offset,
> +				ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
> +		return false;
> +	}
> +
> +	/* This region only covers SPIs, so no handling of private IRQs here. */
> +	spi = offset / 8;
> +
> +	/* get the stored MPIDR for this IRQ */
> +	mpidr = uncompress_mpidr(dist->irq_spi_mpidr[spi]);
> +	mpidr &= MPIDR_HWID_BITMASK;
> +	reg = mpidr;
> +
> +	vgic_reg_access(mmio, &reg, offset,
> +			ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
> +
> +	if (!mmio->is_write)
> +		return false;
> +
> +	/*
> +	 * Now clear the currently assigned vCPU from the map, making room
> +	 * for the new one to be written below
> +	 */
> +	vcpu = kvm_mpidr_to_vcpu(kvm, mpidr);
> +	if (likely(vcpu)) {
> +		vcpu_id = vcpu->vcpu_id;
> +		bmap = vgic_bitmap_get_shared_map(&dist->irq_spi_target[vcpu_id]);
> +		__clear_bit(spi, bmap);
> +	}
> +
> +	dist->irq_spi_mpidr[spi] = compress_mpidr(reg);
> +	vcpu = kvm_mpidr_to_vcpu(kvm, reg & MPIDR_HWID_BITMASK);
> +
> +	/*
> +	 * The spec says that non-existent MPIDR values should not be
> +	 * forwarded to any existent (v)CPU, but should be able to become
> +	 * pending anyway. We simply keep the irq_spi_target[] array empty, so
> +	 * the interrupt will never be injected.
> +	 * irq_spi_cpu[irq] gets a magic value in this case.
> +	 */
> +	if (likely(vcpu)) {
> +		vcpu_id = vcpu->vcpu_id;
> +		dist->irq_spi_cpu[spi] = vcpu_id;
> +		bmap = vgic_bitmap_get_shared_map(&dist->irq_spi_target[vcpu_id]);
> +		__set_bit(spi, bmap);
> +	} else {
> +		dist->irq_spi_cpu[spi] = VCPU_NOT_ALLOCATED;
> +	}
> +
> +	vgic_update_state(kvm);
> +
> +	return true;
> +}
> +
> +/*
> + * We should be careful about promising too much when a guest reads
> + * this register. Don't claim to be like any hardware implementation,
> + * but just report the GIC as version 3 - which is what a Linux guest
> + * would check.
> + */
> +static bool handle_mmio_idregs(struct kvm_vcpu *vcpu,
> +			       struct kvm_exit_mmio *mmio,
> +			       phys_addr_t offset)
> +{
> +	u32 reg = 0;
> +
> +	switch (offset + GICD_IDREGS) {
> +	case GICD_PIDR2:
> +		reg = 0x3b;
> +		break;
> +	}
> +
> +	vgic_reg_access(mmio, &reg, offset,
> +			ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
> +
> +	return false;
> +}
> +
> +static const struct kvm_mmio_range vgic_v3_dist_ranges[] = {
> +	{
> +		.base           = GICD_CTLR,
> +		.len            = 0x04,
> +		.bits_per_irq   = 0,
> +		.handle_mmio    = handle_mmio_ctlr,
> +	},
> +	{
> +		.base           = GICD_TYPER,
> +		.len            = 0x04,
> +		.bits_per_irq   = 0,
> +		.handle_mmio    = handle_mmio_typer,
> +	},
> +	{
> +		.base           = GICD_IIDR,
> +		.len            = 0x04,
> +		.bits_per_irq   = 0,
> +		.handle_mmio    = handle_mmio_iidr,
> +	},
> +	{
> +		/* this register is optional, it is RAZ/WI if not implemented */
> +		.base           = GICD_STATUSR,
> +		.len            = 0x04,
> +		.bits_per_irq   = 0,
> +		.handle_mmio    = handle_mmio_raz_wi,
> +	},
> +	{
> +		/* this write only register is WI when TYPER.MBIS=0 */
> +		.base		= GICD_SETSPI_NSR,
> +		.len		= 0x04,
> +		.bits_per_irq	= 0,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		/* this write only register is WI when TYPER.MBIS=0 */
> +		.base		= GICD_CLRSPI_NSR,
> +		.len		= 0x04,
> +		.bits_per_irq	= 0,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		/* this is RAZ/WI when DS=1 */
> +		.base		= GICD_SETSPI_SR,
> +		.len		= 0x04,
> +		.bits_per_irq	= 0,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		/* this is RAZ/WI when DS=1 */
> +		.base		= GICD_CLRSPI_SR,
> +		.len		= 0x04,
> +		.bits_per_irq	= 0,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		.base		= GICD_IGROUPR,
> +		.len		= 0x80,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_rao_wi,
> +	},
> +	{
> +		.base		= GICD_ISENABLER,
> +		.len		= 0x80,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_set_enable_reg_dist,
> +	},
> +	{
> +		.base		= GICD_ICENABLER,
> +		.len		= 0x80,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_clear_enable_reg_dist,
> +	},
> +	{
> +		.base		= GICD_ISPENDR,
> +		.len		= 0x80,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_set_pending_reg_dist,
> +	},
> +	{
> +		.base		= GICD_ICPENDR,
> +		.len		= 0x80,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_clear_pending_reg_dist,
> +	},
> +	{
> +		.base		= GICD_ISACTIVER,
> +		.len		= 0x80,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		.base		= GICD_ICACTIVER,
> +		.len		= 0x80,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		.base		= GICD_IPRIORITYR,
> +		.len		= 0x400,
> +		.bits_per_irq	= 8,
> +		.handle_mmio	= handle_mmio_priority_reg_dist,
> +	},
> +	{
> +		/* TARGETSRn is RES0 when ARE=1 */
> +		.base		= GICD_ITARGETSR,
> +		.len		= 0x400,
> +		.bits_per_irq	= 8,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		.base		= GICD_ICFGR,
> +		.len		= 0x100,
> +		.bits_per_irq	= 2,
> +		.handle_mmio	= handle_mmio_cfg_reg_dist,
> +	},
> +	{
> +		/* this is RAZ/WI when DS=1 */
> +		.base		= GICD_IGRPMODR,
> +		.len		= 0x80,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		/* this is RAZ/WI when DS=1 */
> +		.base		= GICD_NSACR,
> +		.len		= 0x100,
> +		.bits_per_irq	= 2,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		/* this is RAZ/WI when ARE=1 */
> +		.base		= GICD_SGIR,
> +		.len		= 0x04,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		/* this is RAZ/WI when ARE=1 */
> +		.base		= GICD_CPENDSGIR,
> +		.len		= 0x10,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		/* this is RAZ/WI when ARE=1 */
> +		.base           = GICD_SPENDSGIR,
> +		.len            = 0x10,
> +		.handle_mmio    = handle_mmio_raz_wi,
> +	},
> +	{
> +		.base		= GICD_IROUTER + 0x100,
> +		.len		= 0x1edc,
> +		.bits_per_irq	= 64,
> +		.handle_mmio	= handle_mmio_route_reg,
> +	},
> +	{
> +		.base           = GICD_IDREGS,
> +		.len            = 0x30,
> +		.bits_per_irq   = 0,
> +		.handle_mmio    = handle_mmio_idregs,
> +	},
> +	{},
> +};
> +
> +static bool handle_mmio_set_enable_reg_redist(struct kvm_vcpu *vcpu,
> +					      struct kvm_exit_mmio *mmio,
> +					      phys_addr_t offset)
> +{
> +	struct kvm_vcpu *redist_vcpu = mmio->private;
> +
> +	return vgic_handle_enable_reg(vcpu->kvm, mmio, offset,
> +				      redist_vcpu->vcpu_id,
> +				      ACCESS_WRITE_SETBIT);
> +}
> +
> +static bool handle_mmio_clear_enable_reg_redist(struct kvm_vcpu *vcpu,
> +						struct kvm_exit_mmio *mmio,
> +						phys_addr_t offset)
> +{
> +	struct kvm_vcpu *redist_vcpu = mmio->private;
> +
> +	return vgic_handle_enable_reg(vcpu->kvm, mmio, offset,
> +				      redist_vcpu->vcpu_id,
> +				      ACCESS_WRITE_CLEARBIT);
> +}
> +
> +static bool handle_mmio_set_pending_reg_redist(struct kvm_vcpu *vcpu,
> +					       struct kvm_exit_mmio *mmio,
> +					       phys_addr_t offset)
> +{
> +	struct kvm_vcpu *redist_vcpu = mmio->private;
> +
> +	return vgic_handle_set_pending_reg(vcpu->kvm, mmio, offset,
> +					   redist_vcpu->vcpu_id);
> +}
> +
> +static bool handle_mmio_clear_pending_reg_redist(struct kvm_vcpu *vcpu,
> +						 struct kvm_exit_mmio *mmio,
> +						 phys_addr_t offset)
> +{
> +	struct kvm_vcpu *redist_vcpu = mmio->private;
> +
> +	return vgic_handle_clear_pending_reg(vcpu->kvm, mmio, offset,
> +					     redist_vcpu->vcpu_id);
> +}
> +
> +static bool handle_mmio_priority_reg_redist(struct kvm_vcpu *vcpu,
> +					    struct kvm_exit_mmio *mmio,
> +					    phys_addr_t offset)
> +{
> +	struct kvm_vcpu *redist_vcpu = mmio->private;
> +	u32 *reg;
> +
> +	reg = vgic_bytemap_get_reg(&vcpu->kvm->arch.vgic.irq_priority,
> +				   redist_vcpu->vcpu_id, offset);
> +	vgic_reg_access(mmio, reg, offset,
> +			ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
> +	return false;
> +}
> +
> +static bool handle_mmio_cfg_reg_redist(struct kvm_vcpu *vcpu,
> +				       struct kvm_exit_mmio *mmio,
> +				       phys_addr_t offset)
> +{
> +	struct kvm_vcpu *redist_vcpu = mmio->private;
> +
> +	u32 *reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_cfg,
> +				       redist_vcpu->vcpu_id, offset >> 1);
> +
> +	return vgic_handle_cfg_reg(reg, mmio, offset);
> +}
> +
> +static const struct kvm_mmio_range vgic_redist_sgi_ranges[] = {
> +	{
> +		.base		= GICR_IGROUPR0,
> +		.len		= 0x04,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		.base		= GICR_ISENABLER0,
> +		.len		= 0x04,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_set_enable_reg_redist,
> +	},
> +	{
> +		.base		= GICR_ICENABLER0,
> +		.len		= 0x04,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_clear_enable_reg_redist,
> +	},
> +	{
> +		.base		= GICR_ISPENDR0,
> +		.len		= 0x04,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_set_pending_reg_redist,
> +	},
> +	{
> +		.base		= GICR_ICPENDR0,
> +		.len		= 0x04,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_clear_pending_reg_redist,
> +	},
> +	{
> +		.base		= GICR_ISACTIVER0,
> +		.len		= 0x04,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		.base		= GICR_ICACTIVER0,
> +		.len		= 0x04,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		.base		= GICR_IPRIORITYR0,
> +		.len		= 0x20,
> +		.bits_per_irq	= 8,
> +		.handle_mmio	= handle_mmio_priority_reg_redist,
> +	},
> +	{
> +		.base		= GICR_ICFGR0,
> +		.len		= 0x08,
> +		.bits_per_irq	= 2,
> +		.handle_mmio	= handle_mmio_cfg_reg_redist,
> +	},
> +	{
> +		.base		= GICR_IGRPMODR0,
> +		.len		= 0x04,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		.base		= GICR_NSACR,
> +		.len		= 0x04,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{},
> +};
> +
> +static bool handle_mmio_ctlr_redist(struct kvm_vcpu *vcpu,
> +				    struct kvm_exit_mmio *mmio,
> +				    phys_addr_t offset)
> +{
> +	/* since we don't support LPIs, this register is zero for now */
> +	vgic_reg_access(mmio, NULL, offset,
> +			ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
> +	return false;
> +}
> +
> +static bool handle_mmio_typer_redist(struct kvm_vcpu *vcpu,
> +				     struct kvm_exit_mmio *mmio,
> +				     phys_addr_t offset)
> +{
> +	u32 reg;
> +	u64 mpidr;
> +	struct kvm_vcpu *redist_vcpu = mmio->private;
> +	int target_vcpu_id = redist_vcpu->vcpu_id;
> +
> +	/* the upper 32 bits contain the affinity value */
> +	if ((offset & ~3) == 4) {
> +		mpidr = kvm_vcpu_get_mpidr_aff(redist_vcpu);
> +		reg = compress_mpidr(mpidr);
> +
> +		vgic_reg_access(mmio, &reg, offset,
> +				ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
> +		return false;
> +	}
> +
> +	reg = redist_vcpu->vcpu_id << 8;
> +	if (target_vcpu_id == atomic_read(&vcpu->kvm->online_vcpus) - 1)
> +		reg |= GICR_TYPER_LAST;
> +	vgic_reg_access(mmio, &reg, offset,
> +			ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
> +	return false;
> +}
> +
> +static const struct kvm_mmio_range vgic_redist_ranges[] = {
> +	{
> +		.base           = GICR_CTLR,
> +		.len            = 0x04,
> +		.bits_per_irq   = 0,
> +		.handle_mmio    = handle_mmio_ctlr_redist,
> +	},
> +	{
> +		.base           = GICR_TYPER,
> +		.len            = 0x08,
> +		.bits_per_irq   = 0,
> +		.handle_mmio    = handle_mmio_typer_redist,
> +	},
> +	{
> +		.base           = GICR_IIDR,
> +		.len            = 0x04,
> +		.bits_per_irq   = 0,
> +		.handle_mmio    = handle_mmio_iidr,
> +	},
> +	{
> +		.base           = GICR_WAKER,
> +		.len            = 0x04,
> +		.bits_per_irq   = 0,
> +		.handle_mmio    = handle_mmio_raz_wi,
> +	},
> +	{
> +		.base           = GICR_IDREGS,
> +		.len            = 0x30,
> +		.bits_per_irq   = 0,
> +		.handle_mmio    = handle_mmio_idregs,
> +	},
> +	{},
> +};
> +
> +/*
> + * This function splits accesses between the distributor and the two
> + * redistributor parts (private/SPI). As each redistributor is accessible
> + * from any CPU, we have to determine the affected VCPU by taking the faulting
> + * address into account. We then pass this VCPU to the handler function via
> + * the private parameter.
> + */
> +#define SGI_BASE_OFFSET SZ_64K
> +static bool vgic_v3_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
> +				struct kvm_exit_mmio *mmio)
> +{
> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +	unsigned long dbase = dist->vgic_dist_base;
> +	unsigned long rdbase = dist->vgic_redist_base;
> +	int nrcpus = atomic_read(&vcpu->kvm->online_vcpus);
> +	int vcpu_id;
> +	const struct kvm_mmio_range *mmio_range;
> +
> +	if (is_in_range(mmio->phys_addr, mmio->len, dbase, GIC_V3_DIST_SIZE)) {
> +		return vgic_handle_mmio_range(vcpu, run, mmio,
> +					      vgic_v3_dist_ranges, dbase);
> +	}
> +
> +	if (!is_in_range(mmio->phys_addr, mmio->len, rdbase,
> +	    GIC_V3_REDIST_SIZE * nrcpus))
> +		return false;
> +
> +	vcpu_id = (mmio->phys_addr - rdbase) / GIC_V3_REDIST_SIZE;
> +	rdbase += (vcpu_id * GIC_V3_REDIST_SIZE);
> +	mmio->private = kvm_get_vcpu(vcpu->kvm, vcpu_id);
> +
> +	if (mmio->phys_addr >= rdbase + SGI_BASE_OFFSET) {
> +		rdbase += SGI_BASE_OFFSET;
> +		mmio_range = vgic_redist_sgi_ranges;
> +	} else {
> +		mmio_range = vgic_redist_ranges;
> +	}
> +	return vgic_handle_mmio_range(vcpu, run, mmio, mmio_range, rdbase);
> +}
> +
> +static bool vgic_v3_queue_sgi(struct kvm_vcpu *vcpu, int irq)
> +{
> +	if (vgic_queue_irq(vcpu, 0, irq)) {
> +		vgic_dist_irq_clear_pending(vcpu, irq);
> +		vgic_cpu_irq_clear(vcpu, irq);
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
> +static int vgic_v3_init_maps(struct vgic_dist *dist)
> +{
> +	int nr_spis = dist->nr_irqs - VGIC_NR_PRIVATE_IRQS;
> +
> +	dist->irq_spi_mpidr = kcalloc(nr_spis, sizeof(dist->irq_spi_mpidr[0]),
> +				      GFP_KERNEL);
> +
> +	if (!dist->irq_spi_mpidr)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +static int vgic_v3_init(struct kvm *kvm, const struct vgic_params *params)
> +{
> +	struct vgic_dist *dist = &kvm->arch.vgic;
> +	int ret, i;
> +	u32 mpidr;
> +
> +	if (IS_VGIC_ADDR_UNDEF(dist->vgic_dist_base) ||
> +	    IS_VGIC_ADDR_UNDEF(dist->vgic_redist_base)) {
> +		kvm_err("Need to set vgic distributor addresses first\n");
> +		return -ENXIO;
> +	}
> +
> +	/*
> +	 * FIXME: this should be moved to init_maps time, and may bite
> +	 * us when adding save/restore. Add a per-emulation hook?
> +	 */
> +	ret = vgic_v3_init_maps(dist);
> +	if (ret) {
> +		kvm_err("Unable to allocate maps\n");
> +		return ret;
> +	}
> +
> +	/* Initialize the target VCPUs for each IRQ to VCPU 0 */
> +	mpidr = compress_mpidr(kvm_vcpu_get_mpidr_aff(kvm_get_vcpu(kvm, 0)));
> +	for (i = VGIC_NR_PRIVATE_IRQS; i < dist->nr_irqs; i++) {
> +		dist->irq_spi_cpu[i - VGIC_NR_PRIVATE_IRQS] = 0;
> +		dist->irq_spi_mpidr[i - VGIC_NR_PRIVATE_IRQS] = mpidr;
> +		vgic_bitmap_set_irq_val(dist->irq_spi_target, 0, i, 1);
> +	}
> +
> +	return 0;
> +}
> +
> +/* GICv3 does not keep track of SGI sources anymore. */
> +static void vgic_v3_add_sgi_source(struct kvm_vcpu *vcpu, int irq, int source)
> +{
> +}
> +
> +int vgic_v3_init_emulation(struct kvm *kvm)
> +{
> +	struct vgic_dist *dist = &kvm->arch.vgic;
> +
> +	dist->vm_ops.handle_mmio = vgic_v3_handle_mmio;
> +	dist->vm_ops.queue_sgi = vgic_v3_queue_sgi;
> +	dist->vm_ops.add_sgi_source = vgic_v3_add_sgi_source;
> +	dist->vm_ops.vgic_init = vgic_v3_init;
> +
> +	kvm->arch.max_vcpus = KVM_MAX_VCPUS;
> +
> +	return 0;
> +}
> +
> +static int vgic_v3_create(struct kvm_device *dev, u32 type)
> +{
> +	return kvm_vgic_create(dev->kvm, type);
> +}
> +
> +static void vgic_v3_destroy(struct kvm_device *dev)
> +{
> +	kfree(dev);
> +}
> +
> +static int vgic_v3_set_attr(struct kvm_device *dev,
> +			    struct kvm_device_attr *attr)
> +{
> +	int ret;
> +
> +	ret = vgic_set_common_attr(dev, attr);
> +	if (ret != -ENXIO)
> +		return ret;
> +
> +	switch (attr->group) {
> +	case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
> +	case KVM_DEV_ARM_VGIC_GRP_CPU_REGS:
> +		return -ENXIO;
> +	}
> +
> +	return -ENXIO;
> +}
> +
> +static int vgic_v3_get_attr(struct kvm_device *dev,
> +			    struct kvm_device_attr *attr)
> +{
> +	int ret;
> +
> +	ret = vgic_get_common_attr(dev, attr);
> +	if (ret != -ENXIO)
> +		return ret;
> +
> +	switch (attr->group) {
> +	case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
> +	case KVM_DEV_ARM_VGIC_GRP_CPU_REGS:
> +		return -ENXIO;
> +	}
> +
> +	return -ENXIO;
> +}
> +
> +static int vgic_v3_has_attr(struct kvm_device *dev,
> +			    struct kvm_device_attr *attr)
> +{
> +	switch (attr->group) {
> +	case KVM_DEV_ARM_VGIC_GRP_ADDR:
> +		switch (attr->attr) {
> +		case KVM_VGIC_V2_ADDR_TYPE_DIST:
> +		case KVM_VGIC_V2_ADDR_TYPE_CPU:
> +			return -ENXIO;
> +		}
> +		break;
> +	case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
> +	case KVM_DEV_ARM_VGIC_GRP_CPU_REGS:
> +		return -ENXIO;
> +	case KVM_DEV_ARM_VGIC_GRP_NR_IRQS:
> +		return 0;
> +	}
> +	return -ENXIO;
> +}
> +
> +struct kvm_device_ops kvm_arm_vgic_v3_ops = {
> +	.name = "kvm-arm-vgic-v3",
> +	.create = vgic_v3_create,
> +	.destroy = vgic_v3_destroy,
> +	.set_attr = vgic_v3_set_attr,
> +	.get_attr = vgic_v3_get_attr,
> +	.has_attr = vgic_v3_has_attr,
> +};
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 335ffe0..b7de0f8 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1249,7 +1249,7 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>  	struct kvm_vcpu *vcpu;
>  	int edge_triggered, level_triggered;
>  	int enabled;
> -	bool ret = true;
> +	bool ret = true, can_inject = true;
>  
>  	spin_lock(&dist->lock);
>  
> @@ -1264,6 +1264,11 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>  
>  	if (irq_num >= VGIC_NR_PRIVATE_IRQS) {
>  		cpuid = dist->irq_spi_cpu[irq_num - VGIC_NR_PRIVATE_IRQS];
> +		if (cpuid == VCPU_NOT_ALLOCATED) {
> +			/* Pretend we use CPU0, and prevent injection */
> +			cpuid = 0;
> +			can_inject = false;
> +		}
>  		vcpu = kvm_get_vcpu(kvm, cpuid);
>  	}
>  
> @@ -1285,7 +1290,7 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>  
>  	enabled = vgic_irq_is_enabled(vcpu, irq_num);
>  
> -	if (!enabled) {
> +	if (!enabled || !can_inject) {
>  		ret = false;
>  		goto out;
>  	}
> @@ -1438,6 +1443,7 @@ void kvm_vgic_destroy(struct kvm *kvm)
>  	}
>  	kfree(dist->irq_sgi_sources);
>  	kfree(dist->irq_spi_cpu);
> +	kfree(dist->irq_spi_mpidr);
>  	kfree(dist->irq_spi_target);
>  	kfree(dist->irq_pending_on_cpu);
>  	dist->irq_sgi_sources = NULL;
> @@ -1628,6 +1634,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
>  	kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
>  	kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
>  	kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
> +	kvm->arch.vgic.vgic_redist_base = VGIC_ADDR_UNDEF;
>  
>  out_unlock:
>  	for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
> diff --git a/virt/kvm/arm/vgic.h b/virt/kvm/arm/vgic.h
> index ff3171a..b0c6b2f 100644
> --- a/virt/kvm/arm/vgic.h
> +++ b/virt/kvm/arm/vgic.h
> @@ -35,6 +35,8 @@
>  #define ACCESS_WRITE_VALUE	(3 << 1)
>  #define ACCESS_WRITE_MASK(x)	((x) & (3 << 1))
>  
> +#define VCPU_NOT_ALLOCATED	((u8)-1)
> +
>  unsigned long *vgic_bitmap_get_shared_map(struct vgic_bitmap *x);
>  
>  void vgic_update_state(struct kvm *kvm);
> @@ -115,5 +117,6 @@ int vgic_set_common_attr(struct kvm_device *dev, struct kvm_device_attr *attr);
>  int vgic_get_common_attr(struct kvm_device *dev, struct kvm_device_attr *attr);
>  
>  int vgic_v2_init_emulation(struct kvm *kvm);
> +int vgic_v3_init_emulation(struct kvm *kvm);
>  
>  #endif
> 
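The redistributor address decoding done by vgic_v3_handle_mmio() above can be modelled on its own. The following is a minimal standalone sketch, not the kernel code. It assumes that each redistributor occupies two 64 KiB frames (RD_base followed by SGI_base, i.e. 128 KiB per VCPU), since the value of GIC_V3_REDIST_SIZE is not shown in this excerpt. It also assumes that all redistributors are laid out contiguously from rdbase. The names sketch_decode_redist(), struct sketch_redist_decode and the SKETCH_* constants are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed layout (illustrative only): every redistributor is two 64 KiB
 * frames, RD_base then SGI_base, and all VCPUs' redistributors are
 * contiguous starting at rdbase.
 */
#define SKETCH_SZ_64K           0x10000ULL
#define SKETCH_REDIST_SIZE      (2 * SKETCH_SZ_64K)
#define SKETCH_SGI_BASE_OFFSET  SKETCH_SZ_64K

enum sketch_frame { SKETCH_FRAME_RD, SKETCH_FRAME_SGI };

struct sketch_redist_decode {
	int vcpu_id;             /* redistributor (VCPU) that was hit */
	enum sketch_frame frame; /* RD_base or SGI_base frame */
	uint64_t offset;         /* offset inside that 64 KiB frame */
};

/*
 * Decode a guest physical address into (VCPU, frame, offset).
 * Returns false if [addr, addr + len) is not fully inside the region
 * covering nr_vcpus redistributors.
 */
static bool sketch_decode_redist(uint64_t rdbase, int nr_vcpus,
				 uint64_t addr, uint64_t len,
				 struct sketch_redist_decode *out)
{
	uint64_t region_end, frame_base;

	assert(nr_vcpus >= 0 && out != NULL);
	region_end = rdbase + SKETCH_REDIST_SIZE * (uint64_t)nr_vcpus;
	if (addr < rdbase || addr + len > region_end)
		return false;

	/* Which VCPU's redistributor does the address fall into? */
	out->vcpu_id = (int)((addr - rdbase) / SKETCH_REDIST_SIZE);
	frame_base = rdbase + (uint64_t)out->vcpu_id * SKETCH_REDIST_SIZE;

	/* The second 64 KiB frame holds the SGI/PPI registers. */
	if (addr >= frame_base + SKETCH_SGI_BASE_OFFSET) {
		out->frame = SKETCH_FRAME_SGI;
		frame_base += SKETCH_SGI_BASE_OFFSET;
	} else {
		out->frame = SKETCH_FRAME_RD;
	}
	out->offset = addr - frame_base;
	return true;
}
```

For example, with four VCPUs an access at rdbase + 0x30100 decodes to VCPU 1, SGI_base frame, offset 0x100. An access starting at rdbase + 4 * 128 KiB falls outside the region and is rejected.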

* [PATCH v4 01/19] arm/arm64: KVM: rework MPIDR assignment and add accessors
  2014-11-14 10:07 ` [PATCH v4 01/19] arm/arm64: KVM: rework MPIDR assignment and add accessors Andre Przywara
  2014-11-18 10:35   ` Eric Auger
@ 2014-11-23  9:34   ` Christoffer Dall
  1 sibling, 0 replies; 80+ messages in thread
From: Christoffer Dall @ 2014-11-23  9:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Nov 14, 2014 at 10:07:45AM +0000, Andre Przywara wrote:
> The virtual MPIDR registers (containing topology information) for the
> guest are currently mapped linearly to the vcpu_id. Improve this
> mapping for arm64 by using three affinity levels, so as not to
> artificially limit the number of vCPUs.
> To support this, rename the kvm_vcpu_get_mpidr() function and change
> it to mask off the non-affinity bits of the MPIDR register.
> Also add an accessor to allow easier access to a vCPU with a given
> MPIDR later on. Use this new accessor in the PSCI emulation.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
> Changelog v3...v4:
> - rename kvm_vcpu_get_mpidr() to kvm_vcpu_get_mpidr_aff()
> - simplify kvm_mpidr_to_vcpu()
> - fixup comment
> 
>  arch/arm/include/asm/kvm_emulate.h   |    5 +++--
>  arch/arm/include/asm/kvm_host.h      |    2 ++
>  arch/arm/kvm/arm.c                   |   13 +++++++++++++
>  arch/arm/kvm/psci.c                  |   17 +++++------------
>  arch/arm64/include/asm/kvm_emulate.h |    5 +++--
>  arch/arm64/include/asm/kvm_host.h    |    2 ++
>  arch/arm64/kvm/sys_regs.c            |   11 +++++++++--
>  7 files changed, 37 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> index b9db269..3ae88ac 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -23,6 +23,7 @@
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_mmio.h>
>  #include <asm/kvm_arm.h>
> +#include <asm/cputype.h>
>  
>  unsigned long *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num);
>  unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu);
> @@ -162,9 +163,9 @@ static inline u32 kvm_vcpu_hvc_get_imm(struct kvm_vcpu *vcpu)
>  	return kvm_vcpu_get_hsr(vcpu) & HSR_HVC_IMM_MASK;
>  }
>  
> -static inline unsigned long kvm_vcpu_get_mpidr(struct kvm_vcpu *vcpu)
> +static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
>  {
> -	return vcpu->arch.cp15[c0_MPIDR];
> +	return vcpu->arch.cp15[c0_MPIDR] & MPIDR_HWID_BITMASK;
>  }
>  
>  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 53036e2..b443dfe 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -236,6 +236,8 @@ static inline void vgic_arch_setup(const struct vgic_params *vgic)
>  int kvm_perf_init(void);
>  int kvm_perf_teardown(void);
>  
> +struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> +
>  static inline void kvm_arch_hardware_disable(void) {}
>  static inline void kvm_arch_hardware_unsetup(void) {}
>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 9e193c8..c2a5c69 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -977,6 +977,19 @@ static void check_kvm_target_cpu(void *ret)
>  	*(int *)ret = kvm_target_cpu();
>  }
>  
> +struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
> +{
> +	struct kvm_vcpu *vcpu;
> +	int i;
> +
> +	mpidr &= MPIDR_HWID_BITMASK;
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		if (mpidr == kvm_vcpu_get_mpidr_aff(vcpu))
> +			return vcpu;
> +	}
> +	return NULL;
> +}
> +
>  /**
>   * Initialize Hyp-mode and memory mappings on all CPUs.
>   */
> diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> index 09cf377..84121b2 100644
> --- a/arch/arm/kvm/psci.c
> +++ b/arch/arm/kvm/psci.c
> @@ -21,6 +21,7 @@
>  #include <asm/cputype.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_psci.h>
> +#include <asm/kvm_host.h>
>  
>  /*
>   * This is an implementation of the Power State Coordination Interface
> @@ -65,25 +66,17 @@ static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
>  static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
>  {
>  	struct kvm *kvm = source_vcpu->kvm;
> -	struct kvm_vcpu *vcpu = NULL, *tmp;
> +	struct kvm_vcpu *vcpu = NULL;
>  	wait_queue_head_t *wq;
>  	unsigned long cpu_id;
>  	unsigned long context_id;
> -	unsigned long mpidr;
>  	phys_addr_t target_pc;
> -	int i;
>  
> -	cpu_id = *vcpu_reg(source_vcpu, 1);
> +	cpu_id = *vcpu_reg(source_vcpu, 1) & MPIDR_HWID_BITMASK;
>  	if (vcpu_mode_is_32bit(source_vcpu))
>  		cpu_id &= ~((u32) 0);
>  
> -	kvm_for_each_vcpu(i, tmp, kvm) {
> -		mpidr = kvm_vcpu_get_mpidr(tmp);
> -		if ((mpidr & MPIDR_HWID_BITMASK) == (cpu_id & MPIDR_HWID_BITMASK)) {
> -			vcpu = tmp;
> -			break;
> -		}
> -	}
> +	vcpu = kvm_mpidr_to_vcpu(kvm, cpu_id);
>  
>  	/*
>  	 * Make sure the caller requested a valid CPU and that the CPU is
> @@ -154,7 +147,7 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
>  	 * then ON else OFF
>  	 */
>  	kvm_for_each_vcpu(i, tmp, kvm) {
> -		mpidr = kvm_vcpu_get_mpidr(tmp);
> +		mpidr = kvm_vcpu_get_mpidr_aff(tmp);
>  		if (((mpidr & target_affinity_mask) == target_affinity) &&
>  		    !tmp->arch.pause) {
>  			return PSCI_0_2_AFFINITY_LEVEL_ON;
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 5674a55..d4daaa5 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -27,6 +27,7 @@
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_mmio.h>
>  #include <asm/ptrace.h>
> +#include <asm/cputype.h>
>  
>  unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num);
>  unsigned long *vcpu_spsr32(const struct kvm_vcpu *vcpu);
> @@ -182,9 +183,9 @@ static inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vcpu)
>  	return kvm_vcpu_get_hsr(vcpu) & ESR_EL2_FSC_TYPE;
>  }
>  
> -static inline unsigned long kvm_vcpu_get_mpidr(struct kvm_vcpu *vcpu)
> +static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
>  {
> -	return vcpu_sys_reg(vcpu, MPIDR_EL1);
> +	return vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
>  }
>  
>  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 2012c4b..286bb61 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -207,6 +207,8 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>  int kvm_perf_init(void);
>  int kvm_perf_teardown(void);
>  
> +struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> +
>  static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
>  				       phys_addr_t pgd_ptr,
>  				       unsigned long hyp_stack_ptr,
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 4cc3b71..fd3ffc3 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -252,10 +252,17 @@ static void reset_amair_el1(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
>  
>  static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
>  {
> +	u64 mpidr;
> +
>  	/*
> -	 * Simply map the vcpu_id into the Aff0 field of the MPIDR.
> +	 * Map the vcpu_id into the first three Aff fields of the MPIDR.
> +	 * We limit the number of VCPUs in Aff0 due to a limitation in the
> +	 * ICC_SGIxR registers of the GICv3.
>  	 */
> -	vcpu_sys_reg(vcpu, MPIDR_EL1) = (1UL << 31) | (vcpu->vcpu_id & 0xff);
> +	mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
> +	mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
> +	mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
> +	vcpu_sys_reg(vcpu, MPIDR_EL1) = (1ULL << 31) | mpidr;
>  }
>  
>  /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
> -- 
> 1.7.9.5
> 

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
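reset_mpidr() above spreads vcpu_id across Aff0 (only 4 bits, because of the ICC_SGIxR limitation mentioned in its comment), Aff1 (8 bits) and Aff2 (8 bits). The following is a standalone sketch of that mapping and of an arithmetic inverse. The helper names and SKETCH_* macros are hypothetical, and the mask value is assumed to match arm64's MPIDR_HWID_BITMASK. Note that the patch itself does not invert the mapping this way: kvm_mpidr_to_vcpu() searches all VCPUs for a matching affinity value.

```c
#include <assert.h>
#include <stdint.h>

/* Affinity field shift for levels 0..2 (8 bits per level), as on arm64. */
#define SKETCH_AFF_SHIFT(level)  (8 * (level))
/* Aff3 (bits 39:32) and Aff2..Aff0 (bits 23:0); assumed to match
 * MPIDR_HWID_BITMASK on arm64. */
#define SKETCH_HWID_MASK         0xff00ffffffULL

/* Forward mapping, following reset_mpidr() in the patch. */
static uint64_t sketch_vcpu_id_to_mpidr(uint32_t vcpu_id)
{
	uint64_t mpidr;

	/* 4 + 8 + 8 = 20 bits of vcpu_id fit into Aff0..Aff2. */
	assert(vcpu_id < (1u << 20));
	mpidr  = (uint64_t)(vcpu_id & 0x0f) << SKETCH_AFF_SHIFT(0);
	mpidr |= (uint64_t)((vcpu_id >> 4) & 0xff) << SKETCH_AFF_SHIFT(1);
	mpidr |= (uint64_t)((vcpu_id >> 12) & 0xff) << SKETCH_AFF_SHIFT(2);
	return (1ULL << 31) | mpidr; /* bit 31 is RES1 */
}

/* Hypothetical inverse; the patch instead searches all VCPUs. */
static uint32_t sketch_mpidr_to_vcpu_id(uint64_t mpidr)
{
	mpidr &= SKETCH_HWID_MASK; /* drop RES1/MT/U bits, keep affinity */
	return (uint32_t)(((mpidr >> SKETCH_AFF_SHIFT(0)) & 0x0f) |
			  (((mpidr >> SKETCH_AFF_SHIFT(1)) & 0xff) << 4) |
			  (((mpidr >> SKETCH_AFF_SHIFT(2)) & 0xff) << 12));
}
```

With this layout VCPU 17 (0x11) gets Aff1 = 1 and Aff0 = 1, i.e. MPIDR 0x80000101. At most 16 VCPUs share one Aff0 cluster.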

* [PATCH v4 04/19] arm/arm64: KVM: wrap 64 bit MMIO accesses with two 32 bit ones
  2014-11-14 10:07 ` [PATCH v4 04/19] arm/arm64: KVM: wrap 64 bit MMIO accesses with two 32 bit ones Andre Przywara
  2014-11-18 10:36   ` Eric Auger
@ 2014-11-23  9:42   ` Christoffer Dall
  2014-11-24 13:50     ` Andre Przywara
  1 sibling, 1 reply; 80+ messages in thread
From: Christoffer Dall @ 2014-11-23  9:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Nov 14, 2014 at 10:07:48AM +0000, Andre Przywara wrote:
> Some GICv3 registers can and will be accessed as 64 bit registers.
> Currently the register handling code can only deal with 32 bit
> accesses, so we do two consecutive calls to cover this.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
> Changelog v3...v4:
> - add comment explaining little endian handling
> 
>  virt/kvm/arm/vgic.c |   51 ++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 48 insertions(+), 3 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 5eee3de..dba51e4 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1033,6 +1033,51 @@ static bool vgic_validate_access(const struct vgic_dist *dist,
>  }
>  
>  /*
> + * Call the respective handler function for the given range.
> + * We split up any 64 bit accesses into two consecutive 32 bit
> + * handler calls and merge the result afterwards.
> + * We do this in a little endian fashion regardless of the host's
> + * or guest's endianness, because the GIC is always LE and the rest of
> + * the code (vgic_reg_access) also puts it in a LE fashion already.
> + */
> +static bool call_range_handler(struct kvm_vcpu *vcpu,
> +			       struct kvm_exit_mmio *mmio,
> +			       unsigned long offset,
> +			       const struct mmio_range *range)
> +{
> +	u32 *data32 = (void *)mmio->data;
> +	struct kvm_exit_mmio mmio32;
> +	bool ret;
> +
> +	if (likely(mmio->len <= 4))
> +		return range->handle_mmio(vcpu, mmio, offset);
> +
> +	/*
> +	 * Any access bigger than 4 bytes (that we currently handle in KVM)
> +	 * is actually 8 bytes long, caused by a 64-bit access
> +	 */
> +
> +	mmio32.len = 4;
> +	mmio32.is_write = mmio->is_write;
> +
> +	mmio32.phys_addr = mmio->phys_addr + 4;
> +	if (mmio->is_write)
> +		*(u32 *)mmio32.data = data32[1];
> +	ret = range->handle_mmio(vcpu, &mmio32, offset + 4);
> +	if (!mmio->is_write)
> +		data32[1] = *(u32 *)mmio32.data;
> +
> +	mmio32.phys_addr = mmio->phys_addr;
> +	if (mmio->is_write)
> +		*(u32 *)mmio32.data = data32[0];
> +	ret |= range->handle_mmio(vcpu, &mmio32, offset);

nit: if handle_mmio returns multiple error codes, we will now not
(necessarily) be preserving either, so you may just want to do a check
on ret above and return early in the case of error.  Only worth it if
you respin anyway.

Otherwise:
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>

> +	if (!mmio->is_write)
> +		data32[0] = *(u32 *)mmio32.data;
> +
> +	return ret;
> +}
> +
> +/*
>   * vgic_handle_mmio_range - handle an in-kernel MMIO access
>   * @vcpu:	pointer to the vcpu performing the access
>   * @run:	pointer to the kvm_run structure
> @@ -1063,10 +1108,10 @@ static bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run *run,
>  	spin_lock(&vcpu->kvm->arch.vgic.lock);
>  	offset -= range->base;
>  	if (vgic_validate_access(dist, range, offset)) {
> -		updated_state = range->handle_mmio(vcpu, mmio, offset);
> +		updated_state = call_range_handler(vcpu, mmio, offset, range);
>  	} else {
> -		vgic_reg_access(mmio, NULL, offset,
> -				ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
> +		if (!mmio->is_write)
> +			memset(mmio->data, 0, mmio->len);
>  		updated_state = false;
>  	}
>  	spin_unlock(&vcpu->kvm->arch.vgic.lock);
> -- 
> 1.7.9.5
> 
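The 64-bit splitting performed by call_range_handler() can be sketched outside the kernel. This is a minimal illustration under stated assumptions, not the patch's code. The handler type, sketch_split_access() and sketch_regfile_handler() are hypothetical, and the data buffer is assumed to hold the access in little-endian (GIC) byte order. Explicit byte packing replaces the patch's u32 casts, so the sketch does not depend on host endianness. Like the patch, it handles the upper word first and simply ORs the two handler results.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit register handler; returns true if state changed. */
typedef bool (*sketch_handler32)(void *ctx, uint32_t offset, bool is_write,
				 uint32_t *val);

/* Explicit little-endian packing, independent of host byte order. */
static uint32_t sketch_get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void sketch_put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

/*
 * Split a 4- or 8-byte access (data in GIC little-endian byte order)
 * into 32-bit handler calls. Like call_range_handler(), the upper word
 * (offset + 4) is handled first and the results are ORed together.
 */
static bool sketch_split_access(sketch_handler32 fn, void *ctx,
				uint32_t offset, bool is_write,
				uint8_t *data, unsigned int len)
{
	bool changed = false;
	unsigned int word;

	assert(len == 4 || len == 8);
	for (word = len / 4; word-- > 0; ) {
		uint32_t val = 0;

		if (is_write)
			val = sketch_get_le32(data + 4 * word);
		changed |= fn(ctx, offset + 4 * word, is_write, &val);
		if (!is_write)
			sketch_put_le32(data + 4 * word, val);
	}
	return changed;
}

/* Example handler backed by an array of 32-bit registers. */
static bool sketch_regfile_handler(void *ctx, uint32_t offset, bool is_write,
				   uint32_t *val)
{
	uint32_t *regs = ctx;

	if (is_write) {
		regs[offset / 4] = *val;
		return true;
	}
	*val = regs[offset / 4];
	return false;
}
```

An 8-byte write of the bytes 01 02 ... 08 at offset 0 therefore stores 0x04030201 in the lower register and 0x08070605 in the upper one, and reading them back reproduces the original byte sequence.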

* [PATCH v4 05/19] arm/arm64: KVM: introduce per-VM ops
  2014-11-14 10:07 ` [PATCH v4 05/19] arm/arm64: KVM: introduce per-VM ops Andre Przywara
@ 2014-11-23  9:58   ` Christoffer Dall
  0 siblings, 0 replies; 80+ messages in thread
From: Christoffer Dall @ 2014-11-23  9:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Nov 14, 2014 at 10:07:49AM +0000, Andre Przywara wrote:
> Currently we only have one virtual GIC model supported, so all guests
> use the same emulation code. With the addition of another model we
> end up with different guests using potentially different vGIC models,
> so we have to split up some functions to be per VM.
> Introduce a vgic_vm_ops struct to hold function pointers for those
> functions that are different and provide the necessary code to
> initialize them.
> Also split up the kvm_vgic_init() function to move the VGIC model
> specific functionality into a separate function, which will later be
> different for a GICv3 model.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
> Changelog v3...v4:
> - add accessor functions for vm_ops members
> - introduce init_vgic_model() to differentiate between guest GIC models
> - simplify vgic_v2_init_emulation() 
> - help debugging by hinting on handle_mmio codeflow in comment
> 
That helped a lot as this patch goes, thanks for reworking the function
pointers and init sequence:

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
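The per-VM ops approach described in the commit message above comes down to a table of function pointers that is selected once per VM. The following is a minimal standalone sketch. All names are hypothetical stand-ins for struct vgic_vm_ops and its accessors, and only an add_sgi_source hook is modelled. The GICv2 variant records the sending VCPU in a single per-VM array (a simplification of the real per-VCPU tracking). The GICv3 variant is a no-op, matching vgic_v3_add_sgi_source() in the GICv3 emulation patch quoted earlier.

```c
#include <assert.h>
#include <stdint.h>

enum sketch_gic_model { SKETCH_GIC_V2, SKETCH_GIC_V3 };

struct sketch_vm;

/* Model-specific callbacks, in the spirit of struct vgic_vm_ops. */
struct sketch_vm_ops {
	void (*add_sgi_source)(struct sketch_vm *vm, int irq, int source);
};

struct sketch_vm {
	struct sketch_vm_ops ops;
	uint8_t sgi_sources[16]; /* bitmap of source VCPUs per SGI (v2 only) */
};

/* GICv2 emulation: remember which VCPU sent the SGI. */
static void v2_add_sgi_source(struct sketch_vm *vm, int irq, int source)
{
	vm->sgi_sources[irq] |= (uint8_t)(1u << source);
}

/* GICv3 emulation: SGI sources are not tracked, so this is a no-op. */
static void v3_add_sgi_source(struct sketch_vm *vm, int irq, int source)
{
	(void)vm;
	(void)irq;
	(void)source;
}

/* Select the callbacks once, when the VM's virtual GIC is created. */
static int sketch_init_vgic_model(struct sketch_vm *vm,
				  enum sketch_gic_model model)
{
	switch (model) {
	case SKETCH_GIC_V2:
		vm->ops.add_sgi_source = v2_add_sgi_source;
		return 0;
	case SKETCH_GIC_V3:
		vm->ops.add_sgi_source = v3_add_sgi_source;
		return 0;
	}
	return -1; /* unknown model */
}

/* Accessor used by common code instead of calling the pointer directly. */
static void sketch_add_sgi_source(struct sketch_vm *vm, int irq, int source)
{
	assert(irq >= 0 && irq < 16 && source >= 0 && source < 8);
	vm->ops.add_sgi_source(vm, irq, source);
}
```

The common code then calls sketch_add_sgi_source() without knowing which model the guest uses. Routing calls through an accessor, as the v4 changelog describes for vm_ops, keeps the pointer dereference in one place.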

* [PATCH v4 08/19] arm/arm64: KVM: make the maximum number of vCPUs a per-VM value
  2014-11-14 10:07 ` [PATCH v4 08/19] arm/arm64: KVM: make the maximum number of vCPUs a per-VM value Andre Przywara
@ 2014-11-23 13:21   ` Christoffer Dall
  2014-12-08 14:10     ` Andre Przywara
  0 siblings, 1 reply; 80+ messages in thread
From: Christoffer Dall @ 2014-11-23 13:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Nov 14, 2014 at 10:07:52AM +0000, Andre Przywara wrote:
> Currently the maximum number of vCPUs supported is a global value
> limited by the GIC model in use. GICv3 will lift this limit, but we
> still need to observe it for guests using GICv2.
> So the maximum number of vCPUs is a per-VM value, depending on the
> GIC model the guest uses.
> Store and check the value in struct kvm_arch, but keep it down to
> 8 for now.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
> Changelog v3...v4:
> - initialize max_vcpus with limit based on host GIC
> - remove *_init_emul_* from VGIC backend
> - refine VCPU limit on VGIC creation
> - print warning when userland tries to create more VCPUs than supported
> 
>  arch/arm/include/asm/kvm_host.h   |    1 +
>  arch/arm/kvm/arm.c                |    8 ++++++++
>  arch/arm64/include/asm/kvm_host.h |    3 +++
>  include/kvm/arm_vgic.h            |    2 ++
>  virt/kvm/arm/vgic-v2.c            |    1 +
>  virt/kvm/arm/vgic-v3.c            |    1 +
>  virt/kvm/arm/vgic.c               |   22 ++++++++++++++++++++++
>  7 files changed, 38 insertions(+)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index b443dfe..7969e6e 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -68,6 +68,7 @@ struct kvm_arch {
>  
>  	/* Interrupt controller */
>  	struct vgic_dist	vgic;
> +	int max_vcpus;
>  };
>  
>  #define KVM_NR_MEM_OBJS     40
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 8817fbd..c3d0fbd 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -132,6 +132,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	/* Mark the initial VMID generation invalid */
>  	kvm->arch.vmid_gen = 0;
>  
> +	/* The maximum number of VCPUs is limited by the host's GIC model */
> +	kvm->arch.max_vcpus = kvm_vgic_get_max_vcpus();

I think you forgot to declare this one in arm_vgic.h for
v7-without-vgic-but-with-kvm-configure-case
(which-we-should-have-gotten-rid-of-a-while-back-perhaps).

> +
>  	return ret;
>  out_free_stage2_pgd:
>  	kvm_free_stage2_pgd(kvm);
> @@ -213,6 +216,11 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
>  	int err;
>  	struct kvm_vcpu *vcpu;
>  
> +	if (id >= kvm->arch.max_vcpus) {
> +		err = -EINVAL;
> +		goto out;
> +	}
> +
>  	vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
>  	if (!vcpu) {
>  		err = -ENOMEM;
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 286bb61..f9e130d 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -59,6 +59,9 @@ struct kvm_arch {
>  	/* VTTBR value associated with above pgd and vmid */
>  	u64    vttbr;
>  
> +	/* The maximum number of vCPUs depends on the used GIC model */
> +	int max_vcpus;
> +
>  	/* Interrupt controller */
>  	struct vgic_dist	vgic;
>  
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index bfb660a..09344ac 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -132,6 +132,7 @@ struct vgic_params {
>  	unsigned int	maint_irq;
>  	/* Virtual control interface base address */
>  	void __iomem	*vctrl_base;
> +	int		max_hw_vcpus;

nit: max_vcpus or max_gic_vcpus would be more meaningful imho.

>  };
>  
>  struct vgic_vm_ops {
> @@ -287,6 +288,7 @@ struct kvm_exit_mmio;
>  #ifdef CONFIG_KVM_ARM_VGIC
>  int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write);
>  int kvm_vgic_hyp_init(void);
> +int kvm_vgic_get_max_vcpus(void);
>  int kvm_vgic_init(struct kvm *kvm);
>  int kvm_vgic_create(struct kvm *kvm, u32 type);
>  void kvm_vgic_destroy(struct kvm *kvm);
> diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
> index e1cd3cb..49fb288 100644
> --- a/virt/kvm/arm/vgic-v2.c
> +++ b/virt/kvm/arm/vgic-v2.c
> @@ -237,6 +237,7 @@ int vgic_v2_probe(struct device_node *vgic_node,
>  		 vctrl_res.start, vgic->maint_irq);
>  
>  	vgic->type = VGIC_V2;
> +	vgic->max_hw_vcpus = 8;

Define a GIC_V2_MAX_CPUS constant?

>  	*ops = &vgic_v2_ops;
>  	*params = vgic;
>  	goto out;
> diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
> index d14c75f..acd256c7 100644
> --- a/virt/kvm/arm/vgic-v3.c
> +++ b/virt/kvm/arm/vgic-v3.c
> @@ -235,6 +235,7 @@ int vgic_v3_probe(struct device_node *vgic_node,
>  	vgic->vcpu_base = vcpu_res.start;
>  	vgic->vctrl_base = NULL;
>  	vgic->type = VGIC_V3;
> +	vgic->max_hw_vcpus = KVM_MAX_VCPUS;
>  
>  	kvm_info("%s@%llx IRQ%d\n", vgic_node->name,
>  		 vcpu_res.start, vgic->maint_irq);
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 4aa0b2f..4c72c66 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1841,6 +1841,17 @@ static int vgic_vcpu_init_maps(struct kvm_vcpu *vcpu, int nr_irqs)
>  }
>  
>  /**
> + * kvm_vgic_get_max_vcpus - Get the maximum number of VCPUs allowed by HW
> + *
> + * The host's GIC naturally limits the maximum amount of VCPUs a guest
> + * can use.
> + */
> +int kvm_vgic_get_max_vcpus(void)
> +{
> +	return vgic->max_hw_vcpus;
> +}
> +
> +/**
>   * kvm_vgic_vcpu_init - Initialize per-vcpu VGIC state
>   * @vcpu: pointer to the vcpu struct
>   *
> @@ -2056,6 +2067,8 @@ static int vgic_v2_init_emulation(struct kvm *kvm)
>  	dist->vm_ops.add_sgi_source = vgic_v2_add_sgi_source;
>  	dist->vm_ops.vgic_init = vgic_v2_init;
>  
> +	kvm->arch.max_vcpus = 8;

reuse the define here

> +
>  	return 0;
>  }
>  
> @@ -2072,6 +2085,15 @@ static int init_vgic_model(struct kvm *kvm, int type)
>  		break;
>  	}
>  
> +	if (ret)
> +		return ret;
> +
> +	if (kvm->arch.max_vcpus < atomic_read(&kvm->online_vcpus)) {

I would invert this check; if (online_vcpus > max_vcpus) ...

> +		pr_warn_ratelimited("VGIC model only supports up to %d vCPUs\n",
> +			kvm->arch.max_vcpus);
> +		ret = -EINVAL;
> +	}
> +
>  	return ret;
>  }
>  
> -- 
> 1.7.9.5
> 

So let me see if I got this right:

When you create the VM you set the maximum number of vcpus to whatever
the underlying vgic hardware allows.

Then, when you start creating vcpus, you complain if the user tries to
create more than what the hardware allows (check against
kvm->arch.max_vcpus).

Then, when you create the vgic, you further limit kvm->arch.max_vcpus
and check if you already created too many vcpus for the vgic model you
are trying to create, and error out in that case, and now also check
against the new value when user space is trying to create more vcpus.
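
The three-step flow summarized above can be modelled in a few lines of plain C. This is a hypothetical user-space sketch, not the kernel code: struct vm stands in for struct kvm, GIC_V2_MAX_CPUS is the define suggested earlier in this review, KVM_MAX_VCPUS is given an illustrative value, and the final check is written in the inverted form proposed above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define GIC_V2_MAX_CPUS	8	/* the define suggested in review */
#define KVM_MAX_VCPUS	255	/* assumption: illustrative host limit */

enum vgic_model { VGIC_MODEL_V2, VGIC_MODEL_V3 };

struct vm {
	int max_vcpus;		/* mirrors kvm->arch.max_vcpus */
	int online_vcpus;	/* mirrors atomic_read(&kvm->online_vcpus) */
};

/* Step 1: at VM creation, start from what the host GIC allows. */
static void vm_init(struct vm *vm, int host_max_vcpus)
{
	assert(host_max_vcpus > 0);
	vm->max_vcpus = host_max_vcpus;
	vm->online_vcpus = 0;
}

/* Step 2: every VCPU creation is checked against the current limit. */
static int vm_create_vcpu(struct vm *vm, unsigned int id)
{
	if (id >= (unsigned int)vm->max_vcpus)
		return -EINVAL;
	vm->online_vcpus++;
	return 0;
}

/*
 * Step 3: creating the VGIC may lower the limit (GICv2 emulation caps
 * it at 8); fail if more VCPUs than that already exist.
 */
static int vm_create_vgic(struct vm *vm, enum vgic_model model)
{
	if (model == VGIC_MODEL_V2)
		vm->max_vcpus = GIC_V2_MAX_CPUS;

	if (vm->online_vcpus > vm->max_vcpus) {
		fprintf(stderr, "VGIC model only supports up to %d vCPUs\n",
			vm->max_vcpus);
		return -EINVAL;
	}
	return 0;
}
```

Because step 3 updates max_vcpus in place, any later call to vm_create_vcpu() is automatically checked against the lowered limit, which is the behaviour described in the summary.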

Some questions:

(1) Is there currently a way to tell user space what the maximum number
of vcpus for a given setup is?

(2) Would it be simpler to just have kvm_vgic_max_vcpus() return its
best guess and always check against that from the outside?  Hmmm, maybe
not, but thought I'd throw it out there.

Thanks,
-Christoffer


* [PATCH v4 11/19] arm/arm64: KVM: refactor/wrap vgic_set/get_attr()
  2014-11-14 10:07 ` [PATCH v4 11/19] arm/arm64: KVM: refactor/wrap vgic_set/get_attr() Andre Przywara
@ 2014-11-23 13:27   ` Christoffer Dall
  0 siblings, 0 replies; 80+ messages in thread
From: Christoffer Dall @ 2014-11-23 13:27 UTC
  To: linux-arm-kernel

On Fri, Nov 14, 2014 at 10:07:55AM +0000, Andre Przywara wrote:
> vgic_set_attr() and vgic_get_attr() contain both code specific to
> the emulated GIC and code for the userland-facing, generic
> part of the GIC.
> Split the guest-GIC-facing code off from the generic part to allow
> easier splitting later.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>


* [PATCH v4 12/19] arm/arm64: KVM: add vgic.h header file
  2014-11-14 10:07 ` [PATCH v4 12/19] arm/arm64: KVM: add vgic.h header file Andre Przywara
  2014-11-18 14:07   ` Eric Auger
@ 2014-11-23 13:29   ` Christoffer Dall
  1 sibling, 0 replies; 80+ messages in thread
From: Christoffer Dall @ 2014-11-23 13:29 UTC
  To: linux-arm-kernel

On Fri, Nov 14, 2014 at 10:07:56AM +0000, Andre Przywara wrote:
> vgic.c is currently a mixture of generic vGIC emulation code and
> functions specific to emulating a GICv2. To ease the addition of
> GICv3 later, we create a new header file vgic.h, which holds constants
> and prototypes of commonly used functions.
> Rename some identifiers to avoid name space clutter.
> I removed the long-standing comment about using the kvm_io_bus API
> to tackle the GIC register ranges, as it wouldn't be a win for us
> anymore.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> 
> -------
> As the diff isn't always obvious here (and to aid eventual rebases),
> here is a list of high-level changes done to the code:
> * moved definitions and prototypes from vgic.c to vgic.h:
>   - VGIC_ADDR_UNDEF
>   - ACCESS_{READ,WRITE}_*
>   - vgic_update_state()
>   - vgic_kick_vcpus()
>   - vgic_get_vmcr()
>   - vgic_set_vmcr()
>   - struct mmio_range {} (renamed to struct kvm_mmio_range)
> * removed static keyword and exported prototype in vgic.h:
>   - vgic_bitmap_get_reg()
>   - vgic_bitmap_set_irq_val()
>   - vgic_bitmap_get_shared_map()
>   - vgic_bytemap_get_reg()
>   - vgic_dist_irq_set()
>   - vgic_dist_irq_clear()
>   - vgic_cpu_irq_clear()
>   - vgic_reg_access()
>   - handle_mmio_raz_wi()
>   - vgic_handle_enable_reg()
>   - vgic_handle_pending_reg()
>   - vgic_handle_cfg_reg()
>   - vgic_unqueue_irqs()
>   - find_matching_range() (renamed to vgic_find_range)
>   - vgic_handle_mmio_range()
>   - vgic_update_state()
>   - vgic_get_vmcr()
>   - vgic_set_vmcr()
>   - vgic_queue_irq()
>   - vgic_kick_vcpus()
>   - vgic_init_maps()
>   - vgic_has_attr_regs()
>   - vgic_set_common_attr()
>   - vgic_get_common_attr()
> * moved functions to vgic.h (static inline):
>   - mmio_data_read()
>   - mmio_data_write()
>   - is_in_range()
> ---
> Changelog v3...v4:
> - rename struct mmio_range to struct kvm_mmio_range
> - rename find_matching_range() to vgic_find_range()
> - remove vgic_create() and vgic_destroy() from header

Why are we removing these from the header but still changing them from
static to non-static?  What did I misunderstand?

Otherwise:

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>


* [PATCH v4 13/19] arm/arm64: KVM: split GICv2 specific emulation code from vgic.c
  2014-11-14 10:07 ` [PATCH v4 13/19] arm/arm64: KVM: split GICv2 specific emulation code from vgic.c Andre Przywara
@ 2014-11-23 13:32   ` Christoffer Dall
  0 siblings, 0 replies; 80+ messages in thread
From: Christoffer Dall @ 2014-11-23 13:32 UTC
  To: linux-arm-kernel

On Fri, Nov 14, 2014 at 10:07:57AM +0000, Andre Przywara wrote:
> vgic.c is currently a mixture of generic vGIC emulation code and
> functions specific to emulating a GICv2. To ease the addition of
> GICv3, split off strictly v2 specific parts into a new file
> vgic-v2-emul.c.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Acked-by: Christoffer Dall <christoffer.dall@linaro.org>

[...]

> 
> -
> -void vgic_destroy(struct kvm_device *dev)
> -{
> -	kfree(dev);
> -}
> -
> -int vgic_create(struct kvm_device *dev, u32 type)

ah, I see, you make it static again when it magically appears as vgic_v2_create.

never mind the comment on the last patch.

my ack still stands.

Thanks,
-Christoffer


* [PATCH v4 14/19] arm/arm64: KVM: add opaque private pointer to MMIO data
  2014-11-14 10:07 ` [PATCH v4 14/19] arm/arm64: KVM: add opaque private pointer to MMIO data Andre Przywara
@ 2014-11-23 13:33   ` Christoffer Dall
  0 siblings, 0 replies; 80+ messages in thread
From: Christoffer Dall @ 2014-11-23 13:33 UTC
  To: linux-arm-kernel

On Fri, Nov 14, 2014 at 10:07:58AM +0000, Andre Przywara wrote:
> For a GICv2 there is always only one (v)CPU involved: the one that
> does the access. On a GICv3 the access to a CPU redistributor is
> memory-mapped, but not banked, so the (v)CPU affected is determined by
> looking at the MMIO address region being accessed.
> To allow passing the affected CPU into the accessors later, extend
> struct kvm_exit_mmio to add an opaque private pointer parameter.
> The current GICv2 emulation just does not use it.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
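
The commit message says that on GICv3 the affected (v)CPU is determined by which redistributor region the MMIO access hits. Assuming the redistributors are laid out contiguously from vgic_redist_base, with one GIC_V3_REDIST_SIZE (0x20000) frame per VCPU, that lookup reduces to a division. A hypothetical sketch (redist_addr_to_vcpu() is not a function from the series):

```c
#include <assert.h>
#include <stdint.h>

#define GIC_V3_REDIST_SIZE	0x20000	/* as added to arm-gic-v3.h in this series */

/*
 * Map a guest physical address inside the redistributor region to the
 * index of the VCPU owning that redistributor, or -1 if the address is
 * outside the region. Optionally returns the offset within that frame.
 */
static int redist_addr_to_vcpu(uint64_t redist_base, int nr_vcpus,
			       uint64_t addr, uint64_t *offset)
{
	uint64_t rel;

	assert(nr_vcpus >= 0);
	if (addr < redist_base)
		return -1;
	rel = addr - redist_base;
	if (rel >= (uint64_t)nr_vcpus * GIC_V3_REDIST_SIZE)
		return -1;
	if (offset)
		*offset = rel % GIC_V3_REDIST_SIZE;
	return (int)(rel / GIC_V3_REDIST_SIZE);
}
```

The resulting VCPU is exactly the kind of value the new opaque private pointer in struct kvm_exit_mmio would carry to the accessors.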

Looks reasonable:

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>


* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-11-14 10:07 ` [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation Andre Przywara
  2014-11-14 11:07   ` Christoffer Dall
  2014-11-18 15:57   ` Eric Auger
@ 2014-11-23 14:38   ` Christoffer Dall
  2014-11-24 16:00     ` Andre Przywara
  2 siblings, 1 reply; 80+ messages in thread
From: Christoffer Dall @ 2014-11-23 14:38 UTC
  To: linux-arm-kernel

On Fri, Nov 14, 2014 at 10:07:59AM +0000, Andre Przywara wrote:
> With everything separated and prepared, we implement a model of a
> GICv3 distributor and redistributors by using the existing framework
> to provide handler functions for each register group.
> 
> Currently we limit the emulation to a model enforcing a single
> security state, with SRE==1 (forcing system register access) and
> ARE==1 (allowing more than 8 VCPUs).
> 
> We share some of the functions provided for GICv2 emulation, but take
> the different ways of addressing (v)CPUs into account.
> Save and restore is currently not implemented.
> 
> Similar to the split-off of the GICv2 specific code, the new emulation
> code goes into a new file (vgic-v3-emul.c).
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
> Changelog v3...v4:
> - remove ICC_SGI1R_EL1 register handling (moved into later patch)
> - add definitions for single security state

what exactly does this mean?

> - document emulation limitations in vgic-v3-emul.c header
> - move CTLR, TYPER and IIDR handling into separate functions
> - add RAO/WI handling for IGROUPRn registers
> - remove unneeded offset masking on calling vgic_reg_access()
> - rework handle_mmio_route_reg() to only handle SPIs
> - refine IROUTERn register range
> - use non-atomic bitops functions (__clear_bit() and __set_bit())
> - rename vgic_dist_ranges[] to vgic_v3_dist_ranges[]
> - add (RAZ/WI) implementation of GICD_STATUSR
> - add (RAZ/WI) implementations of MBI registers
> - adapt to new private passing (in struct kvm_exit_mmio instead of a parameter)
> - fix vcpu_id calculation bug in handle CFG registers

Which bug was that?  I can't see the difference between v3 and v4 on
this one.

> - always use hexadecimal numbers for .len member
> - simplify vgic_v3_handle_mmio()
> - add vgic_v3_create() and vgic_v3_destroy()
> - swap vgic_v3_[sg]et_attr() code location
> - add and improve comments
> - (adaptions to changes from earlier patches)
> 
>  arch/arm64/kvm/Makefile            |    1 +
>  include/kvm/arm_vgic.h             |    9 +-
>  include/linux/irqchip/arm-gic-v3.h |   32 ++
>  include/linux/kvm_host.h           |    1 +
>  include/uapi/linux/kvm.h           |    2 +
>  virt/kvm/arm/vgic-v3-emul.c        |  904 ++++++++++++++++++++++++++++++++++++
>  virt/kvm/arm/vgic.c                |   11 +-
>  virt/kvm/arm/vgic.h                |    3 +
>  8 files changed, 960 insertions(+), 3 deletions(-)
>  create mode 100644 virt/kvm/arm/vgic-v3-emul.c
> 
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index d957353..4e6e09e 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -24,5 +24,6 @@ kvm-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v2.o
>  kvm-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v2-emul.o
>  kvm-$(CONFIG_KVM_ARM_VGIC) += vgic-v2-switch.o
>  kvm-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v3.o
> +kvm-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v3-emul.o
>  kvm-$(CONFIG_KVM_ARM_VGIC) += vgic-v3-switch.o
>  kvm-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 421833f..c1ef5a9 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -160,7 +160,11 @@ struct vgic_dist {
>  
>  	/* Distributor and vcpu interface mapping in the guest */
>  	phys_addr_t		vgic_dist_base;
> -	phys_addr_t		vgic_cpu_base;
> +	/* GICv2 and GICv3 use different mapped register blocks */
> +	union {
> +		phys_addr_t		vgic_cpu_base;
> +		phys_addr_t		vgic_redist_base;
> +	};
>  
>  	/* Distributor enabled */
>  	u32			enabled;
> @@ -222,6 +226,9 @@ struct vgic_dist {
>  	 */
>  	struct vgic_bitmap	*irq_spi_target;
>  
> +	/* Target MPIDR for each IRQ (needed for GICv3 IROUTERn) only */
> +	u32			*irq_spi_mpidr;
> +
>  	/* Bitmap indicating which CPU has something pending */
>  	unsigned long		*irq_pending_on_cpu;
>  
> diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
> index 03a4ea3..726d898 100644
> --- a/include/linux/irqchip/arm-gic-v3.h
> +++ b/include/linux/irqchip/arm-gic-v3.h
> @@ -33,6 +33,7 @@
>  #define GICD_SETSPI_SR			0x0050
>  #define GICD_CLRSPI_SR			0x0058
>  #define GICD_SEIR			0x0068
> +#define GICD_IGROUPR			0x0080
>  #define GICD_ISENABLER			0x0100
>  #define GICD_ICENABLER			0x0180
>  #define GICD_ISPENDR			0x0200
> @@ -41,14 +42,37 @@
>  #define GICD_ICACTIVER			0x0380
>  #define GICD_IPRIORITYR			0x0400
>  #define GICD_ICFGR			0x0C00
> +#define GICD_IGRPMODR			0x0D00
> +#define GICD_NSACR			0x0E00
>  #define GICD_IROUTER			0x6000
> +#define GICD_IDREGS			0xFFD0
>  #define GICD_PIDR2			0xFFE8
>  
> +/*
> + * Those registers are actually from GICv2, but the spec demands that they
> + * are implemented as RES0 if ARE is 1 (which we do in KVM's emulated GICv3).
> + */
> +#define GICD_ITARGETSR			0x0800
> +#define GICD_SGIR			0x0F00
> +#define GICD_CPENDSGIR			0x0F10
> +#define GICD_SPENDSGIR			0x0F20
> +
>  #define GICD_CTLR_RWP			(1U << 31)
> +#define GICD_CTLR_DS			(1U << 6)
>  #define GICD_CTLR_ARE_NS		(1U << 4)
>  #define GICD_CTLR_ENABLE_G1A		(1U << 1)
>  #define GICD_CTLR_ENABLE_G1		(1U << 0)
>  
> +/*
> + * In systems with a single security state (what we emulate in KVM)
> + * the meaning of the interrupt group enable bits is slightly different
> + */
> +#define GICD_CTLR_ENABLE_SS_G1		(1U << 1)
> +#define GICD_CTLR_ENABLE_SS_G0		(1U << 0)
> +
> +#define GICD_TYPER_LPIS			(1U << 17)
> +#define GICD_TYPER_MBIS			(1U << 16)
> +
>  #define GICD_IROUTER_SPI_MODE_ONE	(0U << 31)
>  #define GICD_IROUTER_SPI_MODE_ANY	(1U << 31)
>  
> @@ -56,6 +80,8 @@
>  #define GIC_PIDR2_ARCH_GICv3		0x30
>  #define GIC_PIDR2_ARCH_GICv4		0x40
>  
> +#define GIC_V3_DIST_SIZE		0x10000
> +
>  /*
>   * Re-Distributor registers, offsets from RD_base
>   */
> @@ -74,6 +100,7 @@
>  #define GICR_SYNCR			0x00C0
>  #define GICR_MOVLPIR			0x0100
>  #define GICR_MOVALLR			0x0110
> +#define GICR_IDREGS			GICD_IDREGS
>  #define GICR_PIDR2			GICD_PIDR2
>  
>  #define GICR_WAKER_ProcessorSleep	(1U << 1)
> @@ -82,6 +109,7 @@
>  /*
>   * Re-Distributor registers, offsets from SGI_base
>   */
> +#define GICR_IGROUPR0			GICD_IGROUPR
>  #define GICR_ISENABLER0			GICD_ISENABLER
>  #define GICR_ICENABLER0			GICD_ICENABLER
>  #define GICR_ISPENDR0			GICD_ISPENDR
> @@ -90,10 +118,14 @@
>  #define GICR_ICACTIVER0			GICD_ICACTIVER
>  #define GICR_IPRIORITYR0		GICD_IPRIORITYR
>  #define GICR_ICFGR0			GICD_ICFGR
> +#define GICR_IGRPMODR0			GICD_IGRPMODR
> +#define GICR_NSACR			GICD_NSACR
>  
>  #define GICR_TYPER_VLPIS		(1U << 1)
>  #define GICR_TYPER_LAST			(1U << 4)
>  
> +#define GIC_V3_REDIST_SIZE		0x20000
> +
>  /*
>   * CPU interface registers
>   */
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 326ba7a..4a7798e 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1085,6 +1085,7 @@ void kvm_unregister_device_ops(u32 type);
>  extern struct kvm_device_ops kvm_mpic_ops;
>  extern struct kvm_device_ops kvm_xics_ops;
>  extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
> +extern struct kvm_device_ops kvm_arm_vgic_v3_ops;
>  
>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>  
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 6076882..24cb129 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -960,6 +960,8 @@ enum kvm_device_type {
>  #define KVM_DEV_TYPE_ARM_VGIC_V2	KVM_DEV_TYPE_ARM_VGIC_V2
>  	KVM_DEV_TYPE_FLIC,
>  #define KVM_DEV_TYPE_FLIC		KVM_DEV_TYPE_FLIC
> +	KVM_DEV_TYPE_ARM_VGIC_V3,
> +#define KVM_DEV_TYPE_ARM_VGIC_V3	KVM_DEV_TYPE_ARM_VGIC_V3
>  	KVM_DEV_TYPE_MAX,
>  };
>  
> diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
> new file mode 100644
> index 0000000..97b5801
> --- /dev/null
> +++ b/virt/kvm/arm/vgic-v3-emul.c
> @@ -0,0 +1,904 @@
> +/*
> + * GICv3 distributor and redistributor emulation
> + *
> + * GICv3 emulation is currently only supported on a GICv3 host (because
> + * we rely on the hardware's CPU interface virtualization support), but
> + * supports both hardware with or without the optional GICv2 backwards
> + * compatibility features.
> + *
> + * Limitations of the emulation:
> + * (RAZ/WI: read as zero, write ignore, RAO/WI: read as one, write ignore)
> + * - We do not support LPIs (yet). TYPER.LPIS is reported as 0 and is RAZ/WI.
> + * - We do not support the message based interrupts (MBIs) triggered by
> + *   writes to the GICD_{SET,CLR}SPI_* registers. TYPER.MBIS is reported as 0.
> + * - We do not support the (optional) backwards compatibility feature.
> + *   GICD_CTLR.ARE resets to 1 and is RAO/WI. If the _host_ GIC supports
> + *   the compatibility feature, you can use a GICv2 in the guest, though.
> + * - We only support a single security state. GICD_CTLR.DS is 1 and is RAO/WI.
> + * - Priorities are not emulated (same as the GICv2 emulation). Linux
> + *   as a guest is fine with this, because it does not use priorities.
> + * - We only support Group1 interrupts. Again Linux uses only those.
> + *
> + * Copyright (C) 2014 ARM Ltd.
> + * Author: Andre Przywara <andre.przywara@arm.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/cpu.h>
> +#include <linux/kvm.h>
> +#include <linux/kvm_host.h>
> +#include <linux/interrupt.h>
> +
> +#include <linux/irqchip/arm-gic-v3.h>
> +#include <kvm/arm_vgic.h>
> +
> +#include <asm/kvm_emulate.h>
> +#include <asm/kvm_arm.h>
> +#include <asm/kvm_mmu.h>
> +
> +#include "vgic.h"
> +
> +static bool handle_mmio_rao_wi(struct kvm_vcpu *vcpu,
> +			       struct kvm_exit_mmio *mmio, phys_addr_t offset)
> +{
> +	u32 reg = 0xffffffff;
> +
> +	vgic_reg_access(mmio, &reg, offset,
> +			ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
> +
> +	return false;
> +}
> +
> +static bool handle_mmio_ctlr(struct kvm_vcpu *vcpu,
> +			     struct kvm_exit_mmio *mmio, phys_addr_t offset)
> +{
> +	u32 reg = 0;
> +
> +	/*
> +	 * Force ARE and DS to 1, the guest cannot change this.
> +	 * For the time being we only support Group1 interrupts.
> +	 */
> +	if (vcpu->kvm->arch.vgic.enabled)
> +		reg = GICD_CTLR_ENABLE_SS_G1;
> +	reg |= GICD_CTLR_ARE_NS | GICD_CTLR_DS;
> +
> +	vgic_reg_access(mmio, &reg, offset,
> +			ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
> +	if (mmio->is_write) {
> +		if (reg & GICD_CTLR_ENABLE_SS_G0)
> +			kvm_info("guest tried to enable unsupported Group0 interrupts\n");
> +		vcpu->kvm->arch.vgic.enabled = !!(reg & GICD_CTLR_ENABLE_SS_G1);
> +		vgic_update_state(vcpu->kvm);
> +		return true;
> +	}
> +	return false;
> +}
> +
> +/*
> + * As this implementation does not provide compatibility
> + * with GICv2 (ARE==1), we report zero CPUs in bits [5..7].
> + * Also LPIs and MBIs are not supported, so we set the respective bits to 0.
> + * Also we report at most 2**10=1024 interrupt IDs (to match 1024 SPIs).
> + */
> +#define INTERRUPT_ID_BITS 10
> +static bool handle_mmio_typer(struct kvm_vcpu *vcpu,
> +			      struct kvm_exit_mmio *mmio, phys_addr_t offset)
> +{
> +	u32 reg;
> +
> +	/* we report at most 1024 IRQs via this interface */

hmmm, do we need to repeat ourselves here?

I get a bit confused by both the comment above and this one as to *why*
we are reporting this value.  And what is the bit about 'this interface'?
Is there another interface?

Perhaps what you're trying to get at here are the semantic differences
between ITLinesNumber and IDbits and how that helps a reader understand
the code.
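
For reference, the two GICD_TYPER fields involved encode different things: ITLinesNumber (bits [4:0]) gives the range of implemented interrupt IDs up to the highest SPI, as 32 * (N + 1), while IDbits (bits [23:19]) gives the number of interrupt ID bits supported, minus one. A small sketch reproducing the computation in handle_mmio_typer(); gicd_typer() is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

#define INTERRUPT_ID_BITS 10	/* as in the patch */

/*
 * ITLinesNumber = (lines / 32) - 1, with lines capped at 1024;
 * IDbits        = INTERRUPT_ID_BITS - 1, placed at bit 19.
 */
static uint32_t gicd_typer(unsigned int nr_irqs)
{
	unsigned int lines = nr_irqs < 1024 ? nr_irqs : 1024;

	assert(lines >= 32);
	return ((lines >> 5) - 1) | ((uint32_t)(INTERRUPT_ID_BITS - 1) << 19);
}
```

With 256 IRQs this yields 0x480007: ITLinesNumber 7 (256 interrupt IDs) and IDbits 9 (10-bit interrupt IDs).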

> +	reg = (min(vcpu->kvm->arch.vgic.nr_irqs, 1024) >> 5) - 1;
> +
> +	reg |= (INTERRUPT_ID_BITS - 1) << 19;
> +
> +	vgic_reg_access(mmio, &reg, offset,
> +			ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
> +
> +	return false;
> +}
> +
> +static bool handle_mmio_iidr(struct kvm_vcpu *vcpu,
> +			     struct kvm_exit_mmio *mmio, phys_addr_t offset)
> +{
> +	u32 reg;
> +
> +	reg = (PRODUCT_ID_KVM << 24) | (IMPLEMENTER_ARM << 0);
> +	vgic_reg_access(mmio, &reg, offset,
> +			ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
> +
> +	return false;
> +}
> +
> +static bool handle_mmio_set_enable_reg_dist(struct kvm_vcpu *vcpu,
> +					    struct kvm_exit_mmio *mmio,
> +					    phys_addr_t offset)
> +{
> +	if (likely(offset >= VGIC_NR_PRIVATE_IRQS / 8))
> +		return vgic_handle_enable_reg(vcpu->kvm, mmio, offset,
> +					      vcpu->vcpu_id,
> +					      ACCESS_WRITE_SETBIT);
> +
> +	vgic_reg_access(mmio, NULL, offset,
> +			ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
> +	return false;
> +}
> +
> +static bool handle_mmio_clear_enable_reg_dist(struct kvm_vcpu *vcpu,
> +					      struct kvm_exit_mmio *mmio,
> +					      phys_addr_t offset)
> +{
> +	if (likely(offset >= VGIC_NR_PRIVATE_IRQS / 8))
> +		return vgic_handle_enable_reg(vcpu->kvm, mmio, offset,
> +					      vcpu->vcpu_id,
> +					      ACCESS_WRITE_CLEARBIT);
> +
> +	vgic_reg_access(mmio, NULL, offset,
> +			ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
> +	return false;
> +}
> +
> +static bool handle_mmio_set_pending_reg_dist(struct kvm_vcpu *vcpu,
> +					     struct kvm_exit_mmio *mmio,
> +					     phys_addr_t offset)
> +{
> +	if (likely(offset >= VGIC_NR_PRIVATE_IRQS / 8))
> +		return vgic_handle_set_pending_reg(vcpu->kvm, mmio, offset,
> +						   vcpu->vcpu_id);
> +
> +	vgic_reg_access(mmio, NULL, offset,
> +			ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
> +	return false;
> +}
> +
> +static bool handle_mmio_clear_pending_reg_dist(struct kvm_vcpu *vcpu,
> +					       struct kvm_exit_mmio *mmio,
> +					       phys_addr_t offset)
> +{
> +	if (likely(offset >= VGIC_NR_PRIVATE_IRQS / 8))
> +		return vgic_handle_clear_pending_reg(vcpu->kvm, mmio, offset,
> +						     vcpu->vcpu_id);
> +
> +	vgic_reg_access(mmio, NULL, offset,
> +			ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
> +	return false;
> +}
> +
> +static bool handle_mmio_priority_reg_dist(struct kvm_vcpu *vcpu,
> +					  struct kvm_exit_mmio *mmio,
> +					  phys_addr_t offset)
> +{
> +	u32 *reg;
> +
> +	if (unlikely(offset < VGIC_NR_PRIVATE_IRQS)) {
> +		vgic_reg_access(mmio, NULL, offset,
> +				ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
> +		return false;
> +	}
> +
> +	reg = vgic_bytemap_get_reg(&vcpu->kvm->arch.vgic.irq_priority,
> +				   vcpu->vcpu_id, offset);
> +	vgic_reg_access(mmio, reg, offset,
> +		ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
> +	return false;
> +}
> +
> +static bool handle_mmio_cfg_reg_dist(struct kvm_vcpu *vcpu,
> +				     struct kvm_exit_mmio *mmio,
> +				     phys_addr_t offset)
> +{
> +	u32 *reg;
> +
> +	if (unlikely(offset < VGIC_NR_PRIVATE_IRQS / 4)) {
> +		vgic_reg_access(mmio, NULL, offset,
> +				ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
> +		return false;
> +	}
> +
> +	reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_cfg,
> +				  vcpu->vcpu_id, offset >> 1);
> +
> +	return vgic_handle_cfg_reg(reg, mmio, offset);
> +}
> +
> +/*
> + * We use a compressed version of the MPIDR (all 32 bits in one 32-bit word)
> + * when we store the target MPIDR written by the guest.
> + */
> +static u32 compress_mpidr(unsigned long mpidr)
> +{
> +	u32 ret;
> +
> +	ret = MPIDR_AFFINITY_LEVEL(mpidr, 0);
> +	ret |= MPIDR_AFFINITY_LEVEL(mpidr, 1) << 8;
> +	ret |= MPIDR_AFFINITY_LEVEL(mpidr, 2) << 16;
> +	ret |= MPIDR_AFFINITY_LEVEL(mpidr, 3) << 24;
> +
> +	return ret;
> +}
> +
> +static unsigned long uncompress_mpidr(u32 value)
> +{
> +	unsigned long mpidr;
> +
> +	mpidr  = ((value >>  0) & 0xFF) << MPIDR_LEVEL_SHIFT(0);
> +	mpidr |= ((value >>  8) & 0xFF) << MPIDR_LEVEL_SHIFT(1);
> +	mpidr |= ((value >> 16) & 0xFF) << MPIDR_LEVEL_SHIFT(2);
> +	mpidr |= (u64)((value >> 24) & 0xFF) << MPIDR_LEVEL_SHIFT(3);
> +
> +	return mpidr;
> +}
> +
> +/*
> + * Lookup the given MPIDR value to get the vcpu_id (if there is one)
> + * and store that in the irq_spi_cpu[] array.
> + * This limits the number of VCPUs to 255 for now, extending the data
> + * type (or storing kvm_vcpu poiners) should lift the limit.

s/poiners/pointers

> + * Store the original MPIDR value in an extra array to support read-as-written.
> + * Unallocated MPIDRs are translated to a special value and caught
> + * before any array accesses.

We may have covered this already, but why can't we restore the original
MPIDR based on the irq_spi_cpu array?

Is that because we lose information about 'which' unallocated MPIDR was
written?  If that's the case, it seems weird that we go through the
trouble but then throw away the aff3 field anyway...?

> + */
> +static bool handle_mmio_route_reg(struct kvm_vcpu *vcpu,
> +				  struct kvm_exit_mmio *mmio,
> +				  phys_addr_t offset)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct vgic_dist *dist = &kvm->arch.vgic;
> +	int spi;
> +	u32 reg;
> +	int vcpu_id;
> +	unsigned long *bmap, mpidr;
> +
> +	/*
> +	 * The upper 32 bits of each 64 bit register are zero,
> +	 * as we don't support Aff3.
> +	 */
> +	if ((offset & 4)) {
> +		vgic_reg_access(mmio, NULL, offset,
> +				ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
> +		return false;
> +	}
> +
> +	/* This region only covers SPIs, so no handling of private IRQs here. */
> +	spi = offset / 8;

that's not how I read the spec: it says that GICD_IROUTER0 to
GICD_IROUTER31 are not implemented (because they correspond to SGIs and
PPIs), and I read 'SPI ID m' as meaning that the lowest-numbered SPI ID
is 32, so you should do:

spi = offset / 8 - VGIC_NR_PRIVATE_IRQS;
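
The two candidate formulas differ only in what the offset is measured from. GICD_IROUTER<n> sits at GICD_IROUTER + 8 * n, and the range entry in this patch starts at GICD_IROUTER + 0x100, i.e. at INTID 32. A sketch computing the SPI index (0 == INTID 32) under both conventions, so they can be compared against whatever offset the MMIO dispatcher actually passes; all helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define GICD_IROUTER		0x6000	/* from arm-gic-v3.h */
#define VGIC_NR_PRIVATE_IRQS	32	/* SGIs + PPIs */

/* GICD_IROUTER<n> is a 64-bit register at GICD_IROUTER + 8 * n. */
static uint32_t irouter_offset(unsigned int intid)
{
	assert(intid >= VGIC_NR_PRIVATE_IRQS);	/* IROUTER0..31 are not implemented */
	return GICD_IROUTER + 8 * intid;
}

/* SPI index from an offset measured from GICD_IROUTER itself. */
static unsigned int spi_from_dist_offset(uint32_t off_from_irouter)
{
	assert(off_from_irouter >= 8 * VGIC_NR_PRIVATE_IRQS);
	return off_from_irouter / 8 - VGIC_NR_PRIVATE_IRQS;
}

/* SPI index from an offset measured from the range base GICD_IROUTER + 0x100. */
static unsigned int spi_from_range_offset(uint32_t off_from_range_base)
{
	return off_from_range_base / 8;
}
```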

> +
> +	/* get the stored MPIDR for this IRQ */
> +	mpidr = uncompress_mpidr(dist->irq_spi_mpidr[spi]);
> +	mpidr &= MPIDR_HWID_BITMASK;

is this mask needed after calling uncompress above?
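
A quick way to check the masking question: the sketch below copies compress_mpidr() and uncompress_mpidr() into user space, with the arm64 MPIDR helper macros reproduced as an assumption (MPIDR_HWID_BITMASK taken as 0xff00ffffff). uncompress_mpidr() only ever sets the four affinity fields, all of which lie inside that mask, so the extra AND looks like a no-op there.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the arm64 definitions in asm/cputype.h */
#define MPIDR_LEVEL_BITS	8
#define MPIDR_LEVEL_MASK	((1 << MPIDR_LEVEL_BITS) - 1)
#define MPIDR_LEVEL_SHIFT(level) \
	(((1 << (level)) >> 1) << 3)	/* 0, 8, 16, 32 */
#define MPIDR_AFFINITY_LEVEL(mpidr, level) \
	(((mpidr) >> MPIDR_LEVEL_SHIFT(level)) & MPIDR_LEVEL_MASK)
#define MPIDR_HWID_BITMASK	0xff00ffffffULL

static_assert(MPIDR_LEVEL_SHIFT(3) == 32, "Aff3 lives in bits [39:32]");

/* Pack Aff0..Aff3 into one 32-bit word, one byte each. */
static uint32_t compress_mpidr(uint64_t mpidr)
{
	uint32_t ret;

	ret  = (uint32_t)MPIDR_AFFINITY_LEVEL(mpidr, 0);
	ret |= (uint32_t)MPIDR_AFFINITY_LEVEL(mpidr, 1) << 8;
	ret |= (uint32_t)MPIDR_AFFINITY_LEVEL(mpidr, 2) << 16;
	ret |= (uint32_t)MPIDR_AFFINITY_LEVEL(mpidr, 3) << 24;
	return ret;
}

/* Unpack again; only affinity bits can ever be set in the result. */
static uint64_t uncompress_mpidr(uint32_t value)
{
	uint64_t mpidr;

	mpidr  = (uint64_t)((value >>  0) & 0xFF) << MPIDR_LEVEL_SHIFT(0);
	mpidr |= (uint64_t)((value >>  8) & 0xFF) << MPIDR_LEVEL_SHIFT(1);
	mpidr |= (uint64_t)((value >> 16) & 0xFF) << MPIDR_LEVEL_SHIFT(2);
	mpidr |= (uint64_t)((value >> 24) & 0xFF) << MPIDR_LEVEL_SHIFT(3);
	return mpidr;
}
```

Non-affinity bits such as MPIDR bit 31 or the MT bit (24) are already dropped by compress_mpidr(), so they never reach the stored value.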

> +	reg = mpidr;
> +
> +	vgic_reg_access(mmio, &reg, offset,
> +			ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
> +
> +	if (!mmio->is_write)
> +		return false;
> +
> +	/*
> +	 * Now clear the currently assigned vCPU from the map, making room
> +	 * for the new one to be written below
> +	 */
> +	vcpu = kvm_mpidr_to_vcpu(kvm, mpidr);
> +	if (likely(vcpu)) {
> +		vcpu_id = vcpu->vcpu_id;
> +		bmap = vgic_bitmap_get_shared_map(&dist->irq_spi_target[vcpu_id]);
> +		__clear_bit(spi, bmap);
> +	}
> +
> +	dist->irq_spi_mpidr[spi] = compress_mpidr(reg);
> +	vcpu = kvm_mpidr_to_vcpu(kvm, reg & MPIDR_HWID_BITMASK);
> +
> +	/*
> +	 * The spec says that non-existent MPIDR values should not be
> +	 * forwarded to any existent (v)CPU, but should be able to become
> +	 * pending anyway. We simply keep the irq_spi_target[] array empty, so
> +	 * the interrupt will never be injected.
> +	 * irq_spi_cpu[irq] gets a magic value in this case.
> +	 */
> +	if (likely(vcpu)) {
> +		vcpu_id = vcpu->vcpu_id;
> +		dist->irq_spi_cpu[spi] = vcpu_id;
> +		bmap = vgic_bitmap_get_shared_map(&dist->irq_spi_target[vcpu_id]);
> +		__set_bit(spi, bmap);
> +	} else {
> +		dist->irq_spi_cpu[spi] = VCPU_NOT_ALLOCATED;
> +	}
> +
> +	vgic_update_state(kvm);
> +
> +	return true;
> +}
> +
> +/*
> + * We should be careful about promising too much when a guest reads
> + * this register. Don't claim to be like any hardware implementation,
> + * but just report the GIC as version 3 - which is what a Linux guest
> + * would check.
> + */
> +static bool handle_mmio_idregs(struct kvm_vcpu *vcpu,
> +			       struct kvm_exit_mmio *mmio,
> +			       phys_addr_t offset)
> +{
> +	u32 reg = 0;
> +
> +	switch (offset + GICD_IDREGS) {
> +	case GICD_PIDR2:
> +		reg = 0x3b;
> +		break;
> +	}
> +
> +	vgic_reg_access(mmio, &reg, offset,
> +			ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
> +
> +	return false;
> +}
> +
> +static const struct kvm_mmio_range vgic_v3_dist_ranges[] = {
> +	{
> +		.base           = GICD_CTLR,
> +		.len            = 0x04,
> +		.bits_per_irq   = 0,
> +		.handle_mmio    = handle_mmio_ctlr,
> +	},
> +	{
> +		.base           = GICD_TYPER,
> +		.len            = 0x04,
> +		.bits_per_irq   = 0,
> +		.handle_mmio    = handle_mmio_typer,
> +	},
> +	{
> +		.base           = GICD_IIDR,
> +		.len            = 0x04,
> +		.bits_per_irq   = 0,
> +		.handle_mmio    = handle_mmio_iidr,
> +	},
> +	{
> +		/* this register is optional, it is RAZ/WI if not implemented */
> +		.base           = GICD_STATUSR,
> +		.len            = 0x04,
> +		.bits_per_irq   = 0,
> +		.handle_mmio    = handle_mmio_raz_wi,
> +	},
> +	{
> +		/* this write only register is WI when TYPER.MBIS=0 */
> +		.base		= GICD_SETSPI_NSR,
> +		.len		= 0x04,
> +		.bits_per_irq	= 0,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		/* this write only register is WI when TYPER.MBIS=0 */
> +		.base		= GICD_CLRSPI_NSR,
> +		.len		= 0x04,
> +		.bits_per_irq	= 0,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		/* this is RAZ/WI when DS=1 */
> +		.base		= GICD_SETSPI_SR,
> +		.len		= 0x04,
> +		.bits_per_irq	= 0,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		/* this is RAZ/WI when DS=1 */
> +		.base		= GICD_CLRSPI_SR,
> +		.len		= 0x04,
> +		.bits_per_irq	= 0,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		.base		= GICD_IGROUPR,
> +		.len		= 0x80,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_rao_wi,
> +	},
> +	{
> +		.base		= GICD_ISENABLER,
> +		.len		= 0x80,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_set_enable_reg_dist,
> +	},
> +	{
> +		.base		= GICD_ICENABLER,
> +		.len		= 0x80,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_clear_enable_reg_dist,
> +	},
> +	{
> +		.base		= GICD_ISPENDR,
> +		.len		= 0x80,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_set_pending_reg_dist,
> +	},
> +	{
> +		.base		= GICD_ICPENDR,
> +		.len		= 0x80,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_clear_pending_reg_dist,
> +	},
> +	{
> +		.base		= GICD_ISACTIVER,
> +		.len		= 0x80,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		.base		= GICD_ICACTIVER,
> +		.len		= 0x80,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		.base		= GICD_IPRIORITYR,
> +		.len		= 0x400,
> +		.bits_per_irq	= 8,
> +		.handle_mmio	= handle_mmio_priority_reg_dist,
> +	},
> +	{
> +		/* TARGETSRn is RES0 when ARE=1 */
> +		.base		= GICD_ITARGETSR,
> +		.len		= 0x400,
> +		.bits_per_irq	= 8,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		.base		= GICD_ICFGR,
> +		.len		= 0x100,
> +		.bits_per_irq	= 2,
> +		.handle_mmio	= handle_mmio_cfg_reg_dist,
> +	},
> +	{
> +		/* this is RAZ/WI when DS=1 */
> +		.base		= GICD_IGRPMODR,
> +		.len		= 0x80,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		/* this is RAZ/WI when DS=1 */
> +		.base		= GICD_NSACR,
> +		.len		= 0x100,
> +		.bits_per_irq	= 2,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		/* this is RAZ/WI when ARE=1 */
> +		.base		= GICD_SGIR,
> +		.len		= 0x04,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		/* this is RAZ/WI when ARE=1 */
> +		.base		= GICD_CPENDSGIR,
> +		.len		= 0x10,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		/* this is RAZ/WI when ARE=1 */
> +		.base           = GICD_SPENDSGIR,
> +		.len            = 0x10,
> +		.handle_mmio    = handle_mmio_raz_wi,
> +	},
> +	{
> +		.base		= GICD_IROUTER + 0x100,
> +		.len		= 0x1edc,
> +		.bits_per_irq	= 64,
> +		.handle_mmio	= handle_mmio_route_reg,
> +	},
> +	{
> +		.base           = GICD_IDREGS,
> +		.len            = 0x30,
> +		.bits_per_irq   = 0,
> +		.handle_mmio    = handle_mmio_idregs,
> +	},
> +	{},
> +};
> +
> +static bool handle_mmio_set_enable_reg_redist(struct kvm_vcpu *vcpu,
> +					      struct kvm_exit_mmio *mmio,
> +					      phys_addr_t offset)
> +{
> +	struct kvm_vcpu *redist_vcpu = mmio->private;
> +
> +	return vgic_handle_enable_reg(vcpu->kvm, mmio, offset,
> +				      redist_vcpu->vcpu_id,
> +				      ACCESS_WRITE_SETBIT);
> +}
> +
> +static bool handle_mmio_clear_enable_reg_redist(struct kvm_vcpu *vcpu,
> +						struct kvm_exit_mmio *mmio,
> +						phys_addr_t offset)
> +{
> +	struct kvm_vcpu *redist_vcpu = mmio->private;
> +
> +	return vgic_handle_enable_reg(vcpu->kvm, mmio, offset,
> +				      redist_vcpu->vcpu_id,
> +				      ACCESS_WRITE_CLEARBIT);
> +}
> +
> +static bool handle_mmio_set_pending_reg_redist(struct kvm_vcpu *vcpu,
> +					       struct kvm_exit_mmio *mmio,
> +					       phys_addr_t offset)
> +{
> +	struct kvm_vcpu *redist_vcpu = mmio->private;
> +
> +	return vgic_handle_set_pending_reg(vcpu->kvm, mmio, offset,
> +					   redist_vcpu->vcpu_id);
> +}
> +
> +static bool handle_mmio_clear_pending_reg_redist(struct kvm_vcpu *vcpu,
> +						 struct kvm_exit_mmio *mmio,
> +						 phys_addr_t offset)
> +{
> +	struct kvm_vcpu *redist_vcpu = mmio->private;
> +
> +	return vgic_handle_clear_pending_reg(vcpu->kvm, mmio, offset,
> +					     redist_vcpu->vcpu_id);
> +}
> +
> +static bool handle_mmio_priority_reg_redist(struct kvm_vcpu *vcpu,
> +					    struct kvm_exit_mmio *mmio,
> +					    phys_addr_t offset)
> +{
> +	struct kvm_vcpu *redist_vcpu = mmio->private;
> +	u32 *reg;
> +
> +	reg = vgic_bytemap_get_reg(&vcpu->kvm->arch.vgic.irq_priority,
> +				   redist_vcpu->vcpu_id, offset);
> +	vgic_reg_access(mmio, reg, offset,
> +			ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
> +	return false;
> +}
> +
> +static bool handle_mmio_cfg_reg_redist(struct kvm_vcpu *vcpu,
> +				       struct kvm_exit_mmio *mmio,
> +				       phys_addr_t offset)
> +{
> +	struct kvm_vcpu *redist_vcpu = mmio->private;
> +
> +	u32 *reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_cfg,
> +				       redist_vcpu->vcpu_id, offset >> 1);
> +
> +	return vgic_handle_cfg_reg(reg, mmio, offset);
> +}
> +
> +static const struct kvm_mmio_range vgic_redist_sgi_ranges[] = {
> +	{
> +		.base		= GICR_IGROUPR0,
> +		.len		= 0x04,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_raz_wi,

I think you were going to change this to handle_mmio_rao_wi() instead?
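For reference, the difference the comment above hinges on is visible to the guest. With RAZ/WI every GICR_IGROUPR0 bit reads as 0, so all SGIs/PPIs are reported as Group 0. With RAO/WI they are reported as Group 1, which matches the distributor's GICD_IGROUPR entry (already handle_mmio_rao_wi) and the Group 1 list registers programmed in patch 18. The following is a minimal standalone sketch of the two read behaviours. struct mmio_access and both helpers are hypothetical stand-ins, not the vgic's vgic_reg_access() machinery.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit guest MMIO access. */
struct mmio_access {
	bool is_write;
	uint32_t data;	/* value written by, or returned to, the guest */
};

/* Read-as-zero / writes-ignored: every bit reads as 0 (Group 0). */
static void raz_wi(struct mmio_access *a)
{
	if (!a->is_write)
		a->data = 0;
	/* writes are silently dropped */
}

/* Read-as-ones / writes-ignored: every bit reads as 1 (Group 1). */
static void rao_wi(struct mmio_access *a)
{
	if (!a->is_write)
		a->data = 0xffffffffu;
	/* writes are silently dropped */
}
```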

> +	},
> +	{
> +		.base		= GICR_ISENABLER0,
> +		.len		= 0x04,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_set_enable_reg_redist,
> +	},
> +	{
> +		.base		= GICR_ICENABLER0,
> +		.len		= 0x04,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_clear_enable_reg_redist,
> +	},
> +	{
> +		.base		= GICR_ISPENDR0,
> +		.len		= 0x04,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_set_pending_reg_redist,
> +	},
> +	{
> +		.base		= GICR_ICPENDR0,
> +		.len		= 0x04,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_clear_pending_reg_redist,
> +	},
> +	{
> +		.base		= GICR_ISACTIVER0,
> +		.len		= 0x04,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		.base		= GICR_ICACTIVER0,
> +		.len		= 0x04,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		.base		= GICR_IPRIORITYR0,
> +		.len		= 0x20,
> +		.bits_per_irq	= 8,
> +		.handle_mmio	= handle_mmio_priority_reg_redist,
> +	},
> +	{
> +		.base		= GICR_ICFGR0,
> +		.len		= 0x08,
> +		.bits_per_irq	= 2,
> +		.handle_mmio	= handle_mmio_cfg_reg_redist,
> +	},
> +	{
> +		.base		= GICR_IGRPMODR0,
> +		.len		= 0x04,
> +		.bits_per_irq	= 1,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{
> +		.base		= GICR_NSACR,
> +		.len		= 0x04,
> +		.handle_mmio	= handle_mmio_raz_wi,
> +	},
> +	{},
> +};
> +
> +static bool handle_mmio_ctlr_redist(struct kvm_vcpu *vcpu,
> +				    struct kvm_exit_mmio *mmio,
> +				    phys_addr_t offset)
> +{
> +	/* since we don't support LPIs, this register is zero for now */
> +	vgic_reg_access(mmio, NULL, offset,
> +			ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
> +	return false;
> +}
> +
> +static bool handle_mmio_typer_redist(struct kvm_vcpu *vcpu,
> +				     struct kvm_exit_mmio *mmio,
> +				     phys_addr_t offset)
> +{
> +	u32 reg;
> +	u64 mpidr;
> +	struct kvm_vcpu *redist_vcpu = mmio->private;
> +	int target_vcpu_id = redist_vcpu->vcpu_id;
> +
> +	/* the upper 32 bits contain the affinity value */
> +	if ((offset & ~3) == 4) {
> +		mpidr = kvm_vcpu_get_mpidr_aff(redist_vcpu);
> +		reg = compress_mpidr(mpidr);
> +
> +		vgic_reg_access(mmio, &reg, offset,
> +				ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
> +		return false;
> +	}
> +
> +	reg = redist_vcpu->vcpu_id << 8;
> +	if (target_vcpu_id == atomic_read(&vcpu->kvm->online_vcpus) - 1)
> +		reg |= GICR_TYPER_LAST;
> +	vgic_reg_access(mmio, &reg, offset,
> +			ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
> +	return false;
> +}
> +
> +static const struct kvm_mmio_range vgic_redist_ranges[] = {
> +	{
> +		.base           = GICR_CTLR,
> +		.len            = 0x04,
> +		.bits_per_irq   = 0,
> +		.handle_mmio    = handle_mmio_ctlr_redist,
> +	},
> +	{
> +		.base           = GICR_TYPER,
> +		.len            = 0x08,
> +		.bits_per_irq   = 0,
> +		.handle_mmio    = handle_mmio_typer_redist,
> +	},
> +	{
> +		.base           = GICR_IIDR,
> +		.len            = 0x04,
> +		.bits_per_irq   = 0,
> +		.handle_mmio    = handle_mmio_iidr,
> +	},
> +	{
> +		.base           = GICR_WAKER,
> +		.len            = 0x04,
> +		.bits_per_irq   = 0,
> +		.handle_mmio    = handle_mmio_raz_wi,
> +	},
> +	{
> +		.base           = GICR_IDREGS,
> +		.len            = 0x30,
> +		.bits_per_irq   = 0,
> +		.handle_mmio    = handle_mmio_idregs,
> +	},
> +	{},
> +};
> +
> +/*
> + * This function splits accesses between the distributor and the two
> + * redistributor parts (private/SPI). As each redistributor is accessible
> + * from any CPU, we have to determine the affected VCPU by taking the faulting
> + * address into account. We then pass this VCPU to the handler function via
> + * the private parameter.
> + */
> +#define SGI_BASE_OFFSET SZ_64K
> +static bool vgic_v3_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
> +				struct kvm_exit_mmio *mmio)
> +{
> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +	unsigned long dbase = dist->vgic_dist_base;
> +	unsigned long rdbase = dist->vgic_redist_base;
> +	int nrcpus = atomic_read(&vcpu->kvm->online_vcpus);
> +	int vcpu_id;
> +	const struct kvm_mmio_range *mmio_range;
> +
> +	if (is_in_range(mmio->phys_addr, mmio->len, dbase, GIC_V3_DIST_SIZE)) {
> +		return vgic_handle_mmio_range(vcpu, run, mmio,
> +					      vgic_v3_dist_ranges, dbase);
> +	}
> +
> +	if (!is_in_range(mmio->phys_addr, mmio->len, rdbase,
> +	    GIC_V3_REDIST_SIZE * nrcpus))
> +		return false;

Did you think more about the contiguous allocation issue here or can you
give me a pointer to the requirement in the spec?
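The contiguity question can be made concrete. vgic_v3_handle_mmio() finds the target redistributor with a single division, which only works if the redistributors of all online VCPUs sit back to back starting at vgic_redist_base. Below is a minimal sketch of that decoding. It assumes a 128K stride per redistributor: a 64K RD_base frame followed by a 64K SGI_base frame, consistent with SGI_BASE_OFFSET = SZ_64K. decode_redist() and struct redist_hit are hypothetical. Unlike the patch, the sketch ignores the access length.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_64K			0x10000ULL
#define SGI_BASE_OFFSET		SZ_64K
/* Assumption: each redistributor is two 64K frames (RD_base + SGI_base). */
#define REDIST_STRIDE		(2 * SZ_64K)

struct redist_hit {
	int vcpu_id;		/* whose redistributor was accessed */
	bool sgi_frame;		/* true: SGI_base frame, false: RD_base frame */
	uint64_t frame_offset;	/* offset within that 64K frame */
};

/*
 * Redistributors are assumed to be contiguous, one per online VCPU,
 * starting at rdbase. Returns false if addr lies outside that region.
 */
static bool decode_redist(uint64_t addr, uint64_t rdbase, int nr_vcpus,
			  struct redist_hit *hit)
{
	uint64_t off;

	if (addr < rdbase ||
	    addr - rdbase >= REDIST_STRIDE * (uint64_t)nr_vcpus)
		return false;

	off = addr - rdbase;
	hit->vcpu_id = (int)(off / REDIST_STRIDE);
	off %= REDIST_STRIDE;
	hit->sgi_frame = off >= SGI_BASE_OFFSET;
	hit->frame_offset = hit->sgi_frame ? off - SGI_BASE_OFFSET : off;
	return true;
}
```

If the spec allowed gaps between redistributors, the division would pick the wrong VCPU, and a per-VCPU base address lookup would be needed instead.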

> +
> +	vcpu_id = (mmio->phys_addr - rdbase) / GIC_V3_REDIST_SIZE;
> +	rdbase += (vcpu_id * GIC_V3_REDIST_SIZE);
> +	mmio->private = kvm_get_vcpu(vcpu->kvm, vcpu_id);
> +
> +	if (mmio->phys_addr >= rdbase + SGI_BASE_OFFSET) {
> +		rdbase += SGI_BASE_OFFSET;
> +		mmio_range = vgic_redist_sgi_ranges;
> +	} else {
> +		mmio_range = vgic_redist_ranges;
> +	}
> +	return vgic_handle_mmio_range(vcpu, run, mmio, mmio_range, rdbase);
> +}
> +
> +static bool vgic_v3_queue_sgi(struct kvm_vcpu *vcpu, int irq)
> +{
> +	if (vgic_queue_irq(vcpu, 0, irq)) {
> +		vgic_dist_irq_clear_pending(vcpu, irq);
> +		vgic_cpu_irq_clear(vcpu, irq);
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
> +static int vgic_v3_init_maps(struct vgic_dist *dist)
> +{
> +	int nr_spis = dist->nr_irqs - VGIC_NR_PRIVATE_IRQS;
> +
> +	dist->irq_spi_mpidr = kcalloc(nr_spis, sizeof(dist->irq_spi_mpidr[0]),
> +				      GFP_KERNEL);
> +
> +	if (!dist->irq_spi_mpidr)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +static int vgic_v3_init(struct kvm *kvm, const struct vgic_params *params)
> +{
> +	struct vgic_dist *dist = &kvm->arch.vgic;
> +	int ret, i;
> +	u32 mpidr;
> +
> +	if (IS_VGIC_ADDR_UNDEF(dist->vgic_dist_base) ||
> +	    IS_VGIC_ADDR_UNDEF(dist->vgic_redist_base)) {
> +		kvm_err("Need to set vgic distributor addresses first\n");
> +		return -ENXIO;
> +	}
> +
> +	/*
> +	 * FIXME: this should be moved to init_maps time, and may bite
> +	 * us when adding save/restore. Add a per-emulation hook?
> +	 */

Any progress on this FIXME?

> +	ret = vgic_v3_init_maps(dist);
> +	if (ret) {
> +		kvm_err("Unable to allocate maps\n");
> +		return ret;
> +	}
> +
> +	/* Initialize the target VCPUs for each IRQ to VCPU 0 */
> +	mpidr = compress_mpidr(kvm_vcpu_get_mpidr_aff(kvm_get_vcpu(kvm, 0)));
> +	for (i = VGIC_NR_PRIVATE_IRQS; i < dist->nr_irqs; i++) {
> +		dist->irq_spi_cpu[i - VGIC_NR_PRIVATE_IRQS] = 0;
> +		dist->irq_spi_mpidr[i - VGIC_NR_PRIVATE_IRQS] = mpidr;
> +		vgic_bitmap_set_irq_val(dist->irq_spi_target, 0, i, 1);
> +	}
> +
> +	return 0;
> +}
> +
> +/* GICv3 does not keep track of SGI sources anymore. */
> +static void vgic_v3_add_sgi_source(struct kvm_vcpu *vcpu, int irq, int source)
> +{
> +}
> +
> +int vgic_v3_init_emulation(struct kvm *kvm)
> +{
> +	struct vgic_dist *dist = &kvm->arch.vgic;
> +
> +	dist->vm_ops.handle_mmio = vgic_v3_handle_mmio;
> +	dist->vm_ops.queue_sgi = vgic_v3_queue_sgi;
> +	dist->vm_ops.add_sgi_source = vgic_v3_add_sgi_source;
> +	dist->vm_ops.vgic_init = vgic_v3_init;
> +
> +	kvm->arch.max_vcpus = KVM_MAX_VCPUS;
> +
> +	return 0;
> +}
> +
> +static int vgic_v3_create(struct kvm_device *dev, u32 type)
> +{
> +	return kvm_vgic_create(dev->kvm, type);
> +}
> +
> +static void vgic_v3_destroy(struct kvm_device *dev)
> +{
> +	kfree(dev);
> +}
> +
> +static int vgic_v3_set_attr(struct kvm_device *dev,
> +			    struct kvm_device_attr *attr)
> +{
> +	int ret;
> +
> +	ret = vgic_set_common_attr(dev, attr);
> +	if (ret != -ENXIO)
> +		return ret;
> +
> +	switch (attr->group) {
> +	case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
> +	case KVM_DEV_ARM_VGIC_GRP_CPU_REGS:
> +		return -ENXIO;
> +	}
> +
> +	return -ENXIO;
> +}
> +
> +static int vgic_v3_get_attr(struct kvm_device *dev,
> +			    struct kvm_device_attr *attr)
> +{
> +	int ret;
> +
> +	ret = vgic_get_common_attr(dev, attr);
> +	if (ret != -ENXIO)
> +		return ret;
> +
> +	switch (attr->group) {
> +	case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
> +	case KVM_DEV_ARM_VGIC_GRP_CPU_REGS:
> +		return -ENXIO;
> +	}
> +
> +	return -ENXIO;
> +}
> +
> +static int vgic_v3_has_attr(struct kvm_device *dev,
> +			    struct kvm_device_attr *attr)
> +{
> +	switch (attr->group) {
> +	case KVM_DEV_ARM_VGIC_GRP_ADDR:
> +		switch (attr->attr) {
> +		case KVM_VGIC_V2_ADDR_TYPE_DIST:
> +		case KVM_VGIC_V2_ADDR_TYPE_CPU:
> +			return -ENXIO;
> +		}
> +		break;
> +	case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
> +	case KVM_DEV_ARM_VGIC_GRP_CPU_REGS:
> +		return -ENXIO;
> +	case KVM_DEV_ARM_VGIC_GRP_NR_IRQS:
> +		return 0;
> +	}
> +	return -ENXIO;
> +}
> +
> +struct kvm_device_ops kvm_arm_vgic_v3_ops = {
> +	.name = "kvm-arm-vgic-v3",
> +	.create = vgic_v3_create,
> +	.destroy = vgic_v3_destroy,
> +	.set_attr = vgic_v3_set_attr,
> +	.get_attr = vgic_v3_get_attr,
> +	.has_attr = vgic_v3_has_attr,
> +};
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 335ffe0..b7de0f8 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1249,7 +1249,7 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>  	struct kvm_vcpu *vcpu;
>  	int edge_triggered, level_triggered;
>  	int enabled;
> -	bool ret = true;
> +	bool ret = true, can_inject = true;
>  
>  	spin_lock(&dist->lock);
>  
> @@ -1264,6 +1264,11 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>  
>  	if (irq_num >= VGIC_NR_PRIVATE_IRQS) {
>  		cpuid = dist->irq_spi_cpu[irq_num - VGIC_NR_PRIVATE_IRQS];
> +		if (cpuid == VCPU_NOT_ALLOCATED) {
> +			/* Pretend we use CPU0, and prevent injection */
> +			cpuid = 0;
> +			can_inject = false;
> +		}
>  		vcpu = kvm_get_vcpu(kvm, cpuid);
>  	}
>  
> @@ -1285,7 +1290,7 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>  
>  	enabled = vgic_irq_is_enabled(vcpu, irq_num);
>  
> -	if (!enabled) {
> +	if (!enabled || !can_inject) {
>  		ret = false;
>  		goto out;
>  	}
> @@ -1438,6 +1443,7 @@ void kvm_vgic_destroy(struct kvm *kvm)
>  	}
>  	kfree(dist->irq_sgi_sources);
>  	kfree(dist->irq_spi_cpu);
> +	kfree(dist->irq_spi_mpidr);
>  	kfree(dist->irq_spi_target);
>  	kfree(dist->irq_pending_on_cpu);
>  	dist->irq_sgi_sources = NULL;
> @@ -1628,6 +1634,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
>  	kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
>  	kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
>  	kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
> +	kvm->arch.vgic.vgic_redist_base = VGIC_ADDR_UNDEF;
>  
>  out_unlock:
>  	for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
> diff --git a/virt/kvm/arm/vgic.h b/virt/kvm/arm/vgic.h
> index ff3171a..b0c6b2f 100644
> --- a/virt/kvm/arm/vgic.h
> +++ b/virt/kvm/arm/vgic.h
> @@ -35,6 +35,8 @@
>  #define ACCESS_WRITE_VALUE	(3 << 1)
>  #define ACCESS_WRITE_MASK(x)	((x) & (3 << 1))
>  
> +#define VCPU_NOT_ALLOCATED	((u8)-1)
> +
>  unsigned long *vgic_bitmap_get_shared_map(struct vgic_bitmap *x);
>  
>  void vgic_update_state(struct kvm *kvm);
> @@ -115,5 +117,6 @@ int vgic_set_common_attr(struct kvm_device *dev, struct kvm_device_attr *attr);
>  int vgic_get_common_attr(struct kvm_device *dev, struct kvm_device_attr *attr);
>  
>  int vgic_v2_init_emulation(struct kvm *kvm);
> +int vgic_v3_init_emulation(struct kvm *kvm);
>  
>  #endif
> -- 
> 1.7.9.5
> 
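The VCPU_NOT_ALLOCATED handling added to vgic_update_irq_pending() above reduces to the check sketched below. The sentinel is ((u8)-1), i.e. 0xff, stored in the per-SPI irq_spi_cpu[] map (modelled as uint8_t here). It presumably marks an SPI whose GICD_IROUTER value matches no existing VCPU, but the routing handler that sets it is not quoted here. spi_target() is a hypothetical helper, and NR_PRIVATE_IRQS stands in for VGIC_NR_PRIVATE_IRQS.

```c
#include <stdbool.h>
#include <stdint.h>

#define VCPU_NOT_ALLOCATED	((uint8_t)-1)	/* 0xff */
#define NR_PRIVATE_IRQS		32

/*
 * Return the VCPU an SPI is routed to. If the SPI is unrouted, fall back
 * to VCPU 0 for bookkeeping only and report that it must not be injected.
 */
static int spi_target(const uint8_t *irq_spi_cpu, int irq_num, bool *can_inject)
{
	uint8_t cpuid = irq_spi_cpu[irq_num - NR_PRIVATE_IRQS];

	*can_inject = true;
	if (cpuid == VCPU_NOT_ALLOCATED) {
		*can_inject = false;
		return 0;
	}
	return cpuid;
}
```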

Thanks,
-Christoffer


* [PATCH v4 16/19] arm64: GICv3: introduce symbolic names for GICv3 ICC_SGI1R_EL1 fields
  2014-11-14 10:08 ` [PATCH v4 16/19] arm64: GICv3: introduce symbolic names for GICv3 ICC_SGI1R_EL1 fields Andre Przywara
@ 2014-11-23 14:43   ` Christoffer Dall
  0 siblings, 0 replies; 80+ messages in thread
From: Christoffer Dall @ 2014-11-23 14:43 UTC
  To: linux-arm-kernel

On Fri, Nov 14, 2014 at 10:08:00AM +0000, Andre Przywara wrote:
> The gic_send_sgi() function used hardcoded bit shift values to
> generate the ICC_SGI1R_EL1 register value.
> Replace this with symbolic names to allow reusing them later.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>


* [PATCH v4 17/19] arm64: KVM: add SGI generation register emulation
  2014-11-14 10:08 ` [PATCH v4 17/19] arm64: KVM: add SGI generation register emulation Andre Przywara
@ 2014-11-23 15:08   ` Christoffer Dall
  2014-11-24 16:37     ` Andre Przywara
  0 siblings, 1 reply; 80+ messages in thread
From: Christoffer Dall @ 2014-11-23 15:08 UTC
  To: linux-arm-kernel

On Fri, Nov 14, 2014 at 10:08:01AM +0000, Andre Przywara wrote:
> While the generation of a (virtual) inter-processor interrupt (SGI)
> on a GICv2 works by writing to a MMIO register, GICv3 uses the system
> register ICC_SGI1R_EL1 to trigger them.
> Trap that register on ARM64 hosts and handle it in a new handler
> function in the GICv3 emulation code.

Did you reorder something, or does my previous comment still apply that
you're not enabling trapping yet, just adding the handler? Those are
two different things.

You sort of left my question about access_gic_sgi() not checking if the
GICv3 is present hanging from the last thread, but I think I now
understand properly that as long as you're not setting
ICC_SRE_EL2.Enable = 1, we'll never get here, right?

> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
> Changelog v3...v4:
> - moved addition of vgic_v3_dispatch_sgi() from earlier patch into here
> - move MPIDR comparison into extra function
> - use new ICC_SGI1R_ field names
> - improve readability of vgic_v3_dispatch_sgi()
> - add and refine comments
> 
>  arch/arm64/kvm/sys_regs.c   |   26 ++++++++++
>  include/kvm/arm_vgic.h      |    1 +
>  virt/kvm/arm/vgic-v3-emul.c |  113 +++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 140 insertions(+)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index fd3ffc3..e59369a 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -165,6 +165,27 @@ static bool access_sctlr(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>  
> +/*
> + * Trap handler for the GICv3 SGI generation system register.
> + * Forward the request to the VGIC emulation.
> + * The cp15_64 code makes sure this automatically works
> + * for both AArch64 and AArch32 accesses.
> + */
> +static bool access_gic_sgi(struct kvm_vcpu *vcpu,
> +			   const struct sys_reg_params *p,
> +			   const struct sys_reg_desc *r)
> +{
> +	u64 val;
> +
> +	if (!p->is_write)
> +		return read_from_write_only(vcpu, p);
> +
> +	val = *vcpu_reg(vcpu, p->Rt);
> +	vgic_v3_dispatch_sgi(vcpu, val);
> +
> +	return true;
> +}
> +
>  static bool trap_raz_wi(struct kvm_vcpu *vcpu,
>  			const struct sys_reg_params *p,
>  			const struct sys_reg_desc *r)
> @@ -431,6 +452,9 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	/* VBAR_EL1 */
>  	{ Op0(0b11), Op1(0b000), CRn(0b1100), CRm(0b0000), Op2(0b000),
>  	  NULL, reset_val, VBAR_EL1, 0 },
> +	/* ICC_SGI1R_EL1 */
> +	{ Op0(0b11), Op1(0b000), CRn(0b1100), CRm(0b1011), Op2(0b101),
> +	  access_gic_sgi },
>  	/* CONTEXTIDR_EL1 */
>  	{ Op0(0b11), Op1(0b000), CRn(0b1101), CRm(0b0000), Op2(0b001),
>  	  access_vm_reg, reset_val, CONTEXTIDR_EL1, 0 },
> @@ -659,6 +683,8 @@ static const struct sys_reg_desc cp14_64_regs[] = {
>   * register).
>   */
>  static const struct sys_reg_desc cp15_regs[] = {
> +	{ Op1( 0), CRn( 0), CRm(12), Op2( 0), access_gic_sgi },
> +
>  	{ Op1( 0), CRn( 1), CRm( 0), Op2( 0), access_sctlr, NULL, c1_SCTLR },
>  	{ Op1( 0), CRn( 2), CRm( 0), Op2( 0), access_vm_reg, NULL, c2_TTBR0 },
>  	{ Op1( 0), CRn( 2), CRm( 0), Op2( 1), access_vm_reg, NULL, c2_TTBR1 },
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index c1ef5a9..357a935 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -305,6 +305,7 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu);
>  void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu);
>  int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>  			bool level);
> +void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>  bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
>  		      struct kvm_exit_mmio *mmio);
> diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
> index 97b5801..58d7457 100644
> --- a/virt/kvm/arm/vgic-v3-emul.c
> +++ b/virt/kvm/arm/vgic-v3-emul.c
> @@ -828,6 +828,119 @@ int vgic_v3_init_emulation(struct kvm *kvm)
>  	return 0;
>  }
>  
> +/*
> + * Compare a given affinity (level 1-3 and a level 0 mask, from the SGI
> + * generation register ICC_SGI1R_EL1) with a given VCPU.
> + * If the VCPU's MPIDR matches, return the level0 affinity, otherwise
> + * return -1.
> + */
> +static int match_mpidr(u64 sgi_aff, u16 sgi_cpu_mask, struct kvm_vcpu *vcpu)
> +{
> +	unsigned long affinity;
> +	int level0;
> +
> +	/*
> +	 * Split the current VCPU's MPIDR into affinity level 0 and the
> +	 * rest as this is what we have to compare against.
> +	 */
> +	affinity = kvm_vcpu_get_mpidr_aff(vcpu);
> +	level0 = MPIDR_AFFINITY_LEVEL(affinity, 0);
> +	affinity &= ~MPIDR_LEVEL_MASK;
> +
> +	/* bail out if the upper three levels don't match */
> +	if (sgi_aff != affinity)
> +		return -1;
> +
> +	/* Is this VCPU's bit set in the mask ? */
> +	if (!(sgi_cpu_mask & BIT(level0)))
> +		return -1;
> +
> +	return level0;
> +}
> +
> +#define SGI_AFFINITY_LEVEL(reg, level) \
> +	((((reg) & ICC_SGI1R_AFFINITY_## level ##_MASK) \
> +	>> ICC_SGI1R_AFFINITY_## level ##_SHIFT) << MPIDR_LEVEL_SHIFT(level))
> +
> +/**
> + * vgic_v3_dispatch_sgi - handle SGI requests from VCPUs
> + * @vcpu: The VCPU requesting a SGI
> + * @reg: The value written into the ICC_SGI1R_EL1 register by that VCPU
> + *
> + * With GICv3 (and ARE=1) CPUs trigger SGIs by writing to an architectural

what's a non-architectural system register?

> + * system register. This will trap in sys_regs.c and call this function.
> + * This ICC_SGI1R_EL1 register contains the upper three affinity levels of the
> + * target processors as well as a bitmask of 16 Aff0 CPUs.
> + * If the interrupt routing mode bit is not set, we iterate over all VCPUs to
> + * check for matching ones. If this bit is set, we signal all, but not the
> + * calling VCPU.
> + */
> +void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_vcpu *c_vcpu;
> +	struct vgic_dist *dist = &kvm->arch.vgic;
> +	u16 target_cpus;
> +	u64 mpidr;
> +	int sgi, c, vcpu_id;
> +	bool broadcast;
> +	int updated = 0;
> +
> +	vcpu_id = vcpu->vcpu_id;
> +
> +	sgi = (reg & ICC_SGI1R_SGI_ID_MASK) >> ICC_SGI1R_SGI_ID_SHIFT;
> +	broadcast = reg & BIT(ICC_SGI1R_IRQ_ROUTING_MODE_BIT);
> +	target_cpus = (reg & ICC_SGI1R_TARGET_LIST_MASK) >> ICC_SGI1R_TARGET_LIST_SHIFT;
> +	mpidr = SGI_AFFINITY_LEVEL(reg, 3);
> +	mpidr |= SGI_AFFINITY_LEVEL(reg, 2);
> +	mpidr |= SGI_AFFINITY_LEVEL(reg, 1);
> +	mpidr &= ~MPIDR_LEVEL_MASK;

do you need this last mask?  It should be 0 already, right?
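The question can be checked directly. SGI_AFFINITY_LEVEL() masks each ICC_SGI1R_EL1 affinity field to its 8 bits before shifting it into the Aff1, Aff2 or Aff3 position of an MPIDR, so bits [7:0] of the result are never set. Below is a standalone sketch of the same decoding. It uses the architectural field positions: TargetList [15:0], Aff1 [23:16], INTID [27:24], Aff2 [39:32], IRM bit 40, and Aff3 [55:48]. The macro and function names are hypothetical, not the kernel's ICC_SGI1R_* definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* ICC_SGI1R_EL1 field positions (hypothetical names). */
#define SGI1R_TARGET_LIST_MASK	0xffffULL	/* bits [15:0]  */
#define SGI1R_AFF1_SHIFT	16		/* bits [23:16] */
#define SGI1R_INTID_SHIFT	24		/* bits [27:24] */
#define SGI1R_AFF2_SHIFT	32		/* bits [39:32] */
#define SGI1R_IRM_BIT		40
#define SGI1R_AFF3_SHIFT	48		/* bits [55:48] */

/* Position of affinity level n in an arm64 MPIDR: 0, 8, 16, 32. */
static unsigned int mpidr_level_shift(int level)
{
	return ((1u << level) >> 1) << 3;
}

struct sgi_req {
	unsigned int intid;	/* SGI number, 0..15 */
	bool broadcast;		/* IRM=1: all PEs except the sender */
	uint16_t targets;	/* Aff0 bitmask */
	uint64_t aff;		/* Aff3.Aff2.Aff1 in MPIDR layout */
};

static struct sgi_req decode_sgi1r(uint64_t reg)
{
	struct sgi_req r;

	r.intid = (reg >> SGI1R_INTID_SHIFT) & 0xf;
	r.broadcast = (reg >> SGI1R_IRM_BIT) & 1;
	r.targets = reg & SGI1R_TARGET_LIST_MASK;
	/* Each field is cut to 8 bits first, so nothing lands in Aff0. */
	r.aff  = ((reg >> SGI1R_AFF3_SHIFT) & 0xff) << mpidr_level_shift(3);
	r.aff |= ((reg >> SGI1R_AFF2_SHIFT) & 0xff) << mpidr_level_shift(2);
	r.aff |= ((reg >> SGI1R_AFF1_SHIFT) & 0xff) << mpidr_level_shift(1);
	return r;
}
```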

> +
> +	/*
> +	 * We take the dist lock here, because we come from the sysregs
> +	 * code path and not from the MMIO one (which already takes the lock).
> +	 */
> +	spin_lock(&dist->lock);
> +
> +	/*
> +	 * We iterate over all VCPUs to find the MPIDRs matching the request.
> +	 * If we have handled one CPU, we clear its bit to detect early
> +	 * if we are already finished. This avoids iterating through all
> +	 * VCPUs when most of the times we just signal a single VCPU.
> +	 */
> +	kvm_for_each_vcpu(c, c_vcpu, kvm) {
> +
> +		/* Exit early if we have dealt with all requested CPUs */
> +		if (!broadcast && target_cpus == 0)
> +			break;
> +
> +		/* Don't signal the calling VCPU */
> +		if (broadcast && c == vcpu_id)
> +			continue;
> +
> +		if (!broadcast) {
> +			int level0;
> +
> +			level0 = match_mpidr(mpidr, target_cpus, c_vcpu);
> +			if (level0 == -1)
> +				continue;
> +
> +			/* remove this matching VCPU from the mask */
> +			target_cpus &= ~BIT(level0);
> +		}
> +
> +		/* Flag the SGI as pending */
> +		vgic_dist_irq_set_pending(c_vcpu, sgi);
> +		updated = 1;
> +		kvm_debug("SGI%d from CPU%d to CPU%d\n", sgi, vcpu_id, c);
> +	}
> +	if (updated)
> +		vgic_update_state(vcpu->kvm);
> +	spin_unlock(&dist->lock);
> +	if (updated)
> +		vgic_kick_vcpus(vcpu->kvm);
> +}
> +
>  static int vgic_v3_create(struct kvm_device *dev, u32 type)
>  {
>  	return kvm_vgic_create(dev->kvm, type);
> -- 
> 1.7.9.5
> 

Assuming you'll address the commit message stuff above:

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
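The target-matching loop in vgic_v3_dispatch_sgi() can be modelled outside the kernel as shown below. This is a hypothetical sketch, not the patch itself, and it makes these substitutions:

- an array of per-VCPU affinity values replaces kvm_for_each_vcpu();
- a bitmask replaces vgic_dist_irq_set_pending();
- at most 32 VCPUs are assumed;
- Aff0 values above 15 are treated as never matching the 16-bit target list.

```c
#include <stdbool.h>
#include <stdint.h>

#define AFF0_MASK	0xffULL

/*
 * mpidr[i] is the affinity of VCPU i; aff must have Aff0 == 0.
 * Sets bit i of *pending for each VCPU that receives the SGI and
 * returns the number of VCPUs signalled (nr_vcpus <= 32 assumed).
 */
static int dispatch_sgi(const uint64_t *mpidr, int nr_vcpus, int sender,
			uint64_t aff, uint16_t targets, bool broadcast,
			uint32_t *pending)
{
	int c, n = 0;

	for (c = 0; c < nr_vcpus; c++) {
		/* Stop once every requested Aff0 bit has been handled */
		if (!broadcast && targets == 0)
			break;

		/* IRM=1 means "everyone but the sender" */
		if (broadcast && c == sender)
			continue;

		if (!broadcast) {
			unsigned int level0 = mpidr[c] & AFF0_MASK;

			/* Aff3..Aff1 must match exactly */
			if ((mpidr[c] & ~AFF0_MASK) != aff)
				continue;
			if (level0 > 15 || !(targets & (1u << level0)))
				continue;
			/* Drop this target so the early exit can trigger */
			targets &= ~(1u << level0);
		}

		*pending |= 1u << c;
		n++;
	}
	return n;
}
```

The early exit only pays off when the targeted VCPUs come early in the iteration order. In the common single-target case, though, it avoids scanning the remaining VCPUs.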


* [PATCH v4 18/19] arm/arm64: KVM: enable kernel side of GICv3 emulation
  2014-11-14 10:08 ` [PATCH v4 18/19] arm/arm64: KVM: enable kernel side of GICv3 emulation Andre Przywara
@ 2014-11-24  9:09   ` Christoffer Dall
  2014-11-24 17:41     ` Andre Przywara
  0 siblings, 1 reply; 80+ messages in thread
From: Christoffer Dall @ 2014-11-24  9:09 UTC
  To: linux-arm-kernel

On Fri, Nov 14, 2014 at 10:08:02AM +0000, Andre Przywara wrote:
> With all the necessary GICv3 emulation code in place, we can now
> connect the code to the GICv3 backend in the kernel.
> The LR register handling is different depending on the emulated GIC
> model, so provide different implementations for each.
> Also allow non-v2-compatible GICv3 implementations (which don't
> provide MMIO regions for the virtual CPU interface in the DT), but
> restrict those hosts to support GICv3 guests only.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
> Changelog v3...v4:
> - handle differences between GICv2-on-v3 and GICv3-on-v3 in existing functions
> - remove init_*_emul() functions
> - remove max_vcpus setting (done in earlier patches now)
> - adapt to new vgic_v<n>_init_emulation behaviour
> 
>  virt/kvm/arm/vgic-v3.c |   83 ++++++++++++++++++++++++++++++++----------------
>  virt/kvm/arm/vgic.c    |    5 +++
>  2 files changed, 60 insertions(+), 28 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
> index a04d208..4894c59 100644
> --- a/virt/kvm/arm/vgic-v3.c
> +++ b/virt/kvm/arm/vgic-v3.c
> @@ -34,6 +34,7 @@
>  #define GICH_LR_VIRTUALID		(0x3ffUL << 0)
>  #define GICH_LR_PHYSID_CPUID_SHIFT	(10)
>  #define GICH_LR_PHYSID_CPUID		(7UL << GICH_LR_PHYSID_CPUID_SHIFT)
> +#define ICH_LR_VIRTUALID_MASK		(BIT_ULL(32) - 1)
>  
>  /*
>   * LRs are stored in reverse order in memory. make sure we index them
> @@ -48,12 +49,17 @@ static struct vgic_lr vgic_v3_get_lr(const struct kvm_vcpu *vcpu, int lr)
>  	struct vgic_lr lr_desc;
>  	u64 val = vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[LR_INDEX(lr)];
>  
> -	lr_desc.irq	= val & GICH_LR_VIRTUALID;
> -	if (lr_desc.irq <= 15)
> -		lr_desc.source	= (val >> GICH_LR_PHYSID_CPUID_SHIFT) & 0x7;
> +	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
> +		lr_desc.irq = val & ICH_LR_VIRTUALID_MASK;
>  	else
> -		lr_desc.source = 0;
> -	lr_desc.state	= 0;
> +		lr_desc.irq = val & GICH_LR_VIRTUALID;
> +
> +	lr_desc.source = 0;
> +	if (lr_desc.irq <= 15 &&
> +	    vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V2)
> +		lr_desc.source	= (val >> GICH_LR_PHYSID_CPUID_SHIFT) & 0x7;
> +
> +	lr_desc.state   = 0;

super-nit-only-if-you-respin: you have a couple of tabs and extra spaces
in the two lines above that need to just be a single space before the
assignment operator on each line.
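The model-dependent decoding in the vgic_v3_get_lr() hunk above reduces to the following. A GICv3 guest gets the full 32-bit vINTID field of the list register. For a GICv2 guest, only the 10-bit interrupt ID is used, and bits [12:10] carry the SGI source CPU. Below is a minimal sketch with hypothetical names. The masks mirror GICH_LR_VIRTUALID, GICH_LR_PHYSID_CPUID_SHIFT and ICH_LR_VIRTUALID_MASK from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define V2_VIRTUALID_MASK	0x3ffULL	/* GICv2-style 10-bit ID */
#define V2_CPUID_SHIFT		10		/* SGI source, bits [12:10] */
#define V3_VIRTUALID_MASK	0xffffffffULL	/* vINTID, bits [31:0] */

struct lr_fields {
	uint32_t irq;
	uint8_t source;		/* only meaningful for SGIs of a GICv2 guest */
};

static struct lr_fields decode_lr(uint64_t val, bool v3_guest)
{
	struct lr_fields f;

	f.irq = (uint32_t)(val & (v3_guest ? V3_VIRTUALID_MASK
					   : V2_VIRTUALID_MASK));
	f.source = 0;
	if (!v3_guest && f.irq <= 15)
		f.source = (val >> V2_CPUID_SHIFT) & 0x7;
	return f;
}
```

Reading the source bits unconditionally for a GICv3 guest would corrupt any interrupt ID of 1024 or above, which is why the source field is gated on the model.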

>  
>  	if (val & ICH_LR_PENDING_BIT)
>  		lr_desc.state |= LR_STATE_PENDING;
> @@ -68,8 +74,20 @@ static struct vgic_lr vgic_v3_get_lr(const struct kvm_vcpu *vcpu, int lr)
>  static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
>  			   struct vgic_lr lr_desc)
>  {
> -	u64 lr_val = (((u32)lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT) |
> -		      lr_desc.irq);
> +	u64 lr_val;
> +
> +	lr_val = lr_desc.irq;
> +
> +	/*
> +	 * currently all guest IRQs are Group1, as Group0 would result

I guess you could guess my comment from last time: can you please
begin sentences with upper-case? (only if you re-spin).

> +	 * in a FIQ in the guest, which it wouldn't expect.
> +	 * Eventually we want to make this configurable, so we may revisit
> +	 * this in the future.
> +	 */
> +	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
> +		lr_val |= ICH_LR_GROUP;
> +	else
> +		lr_val |= (u32)lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT;
>  
>  	if (lr_desc.state & LR_STATE_PENDING)
>  		lr_val |= ICH_LR_PENDING_BIT;
> @@ -154,7 +172,14 @@ static void vgic_v3_enable(struct kvm_vcpu *vcpu)
>  	 */
>  	vgic_v3->vgic_vmcr = 0;
>  
> -	vgic_v3->vgic_sre = 0;
> +	/*
> +	 * Set the SRE_EL1 value depending on the configured
> +	 * emulated vGIC model.
> +	 */
> +	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
> +		vgic_v3->vgic_sre = ICC_SRE_EL1_SRE;

I think we left some of my questions from last round unanswered here
(you said you needed to go and think about it). If the guest sets SRE=0,
I think we will currently not preserve this value.

The comment should clearly indicate that we're choosing a reset value
of the register for our implementation of the gic.

I'm fine with this change, but would like to know what the rationale
behind it is; wouldn't guests always initialize this value?

> +	else
> +		vgic_v3->vgic_sre = 0;
>  
>  	/* Get the show on the road... */
>  	vgic_v3->vgic_hcr = ICH_HCR_EN;
> @@ -215,28 +240,30 @@ int vgic_v3_probe(struct device_node *vgic_node,
>  
>  	gicv_idx += 3; /* Also skip GICD, GICC, GICH */
>  	if (of_address_to_resource(vgic_node, gicv_idx, &vcpu_res)) {
> -		kvm_err("Cannot obtain GICV region\n");
> -		ret = -ENXIO;
> -		goto out;
> -	}
> -
> -	if (!PAGE_ALIGNED(vcpu_res.start)) {
> -		kvm_err("GICV physical address 0x%llx not page aligned\n",
> -			(unsigned long long)vcpu_res.start);
> -		ret = -ENXIO;
> -		goto out;
> -	}
> -
> -	if (!PAGE_ALIGNED(resource_size(&vcpu_res))) {
> -		kvm_err("GICV size 0x%llx not a multiple of page size 0x%lx\n",
> -			(unsigned long long)resource_size(&vcpu_res),
> -			PAGE_SIZE);
> -		ret = -ENXIO;
> -		goto out;
> +		kvm_info("GICv3: GICv2 emulation not available\n");
> +		vgic->vcpu_base = 0;
> +	} else {
> +		if (!PAGE_ALIGNED(vcpu_res.start)) {
> +			kvm_err("GICV physical address 0x%llx not page aligned\n",
> +				(unsigned long long)vcpu_res.start);
> +			ret = -ENXIO;
> +			goto out;

Shouldn't we be allowing an emulated GICv3 using the system registers in
this case, then?

> +		}
> +
> +		if (!PAGE_ALIGNED(resource_size(&vcpu_res))) {
> +			kvm_err("GICV size 0x%llx not a multiple of page size 0x%lx\n",
> +				(unsigned long long)resource_size(&vcpu_res),
> +				PAGE_SIZE);
> +			ret = -ENXIO;
> +			goto out;

ditto?

> +		}
> +
> +		vgic->vcpu_base = vcpu_res.start;
> +		kvm_register_device_ops(&kvm_arm_vgic_v2_ops,
> +					KVM_DEV_TYPE_ARM_VGIC_V2);
>  	}
> -	kvm_register_device_ops(&kvm_arm_vgic_v2_ops, KVM_DEV_TYPE_ARM_VGIC_V2);
> +	kvm_register_device_ops(&kvm_arm_vgic_v3_ops, KVM_DEV_TYPE_ARM_VGIC_V3);
>  
> -	vgic->vcpu_base = vcpu_res.start;
>  	vgic->vctrl_base = NULL;
>  	vgic->type = VGIC_V3;
>  	vgic->max_hw_vcpus = KVM_MAX_VCPUS;
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index b7de0f8..1dbaeb5 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1577,6 +1577,11 @@ static int init_vgic_model(struct kvm *kvm, int type)
>  	case KVM_DEV_TYPE_ARM_VGIC_V2:
>  		ret = vgic_v2_init_emulation(kvm);
>  		break;
> +#ifdef CONFIG_ARM_GIC_V3
> +	case KVM_DEV_TYPE_ARM_VGIC_V3:
> +		ret = vgic_v3_init_emulation(kvm);
> +		break;
> +#endif
>  	default:
>  		ret = -ENODEV;
>  		break;
> -- 
> 1.7.9.5
> 

Thanks,
-Christoffer

* [PATCH v4 00/19] KVM GICv3 emulation
  2014-11-14 10:07 [PATCH v4 00/19] KVM GICv3 emulation Andre Przywara
                   ` (18 preceding siblings ...)
  2014-11-14 10:08 ` [PATCH v4 19/19] arm/arm64: KVM: allow userland to request a virtual GICv3 Andre Przywara
@ 2014-11-24  9:33 ` Eric Auger
  2014-11-24 17:46   ` Andre Przywara
  19 siblings, 1 reply; 80+ messages in thread
From: Eric Auger @ 2014-11-24  9:33 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/14/2014 11:07 AM, Andre Przywara wrote:
> This is version 4 of the GICv3 guest emulation series.
> 
> As the previous series, this is based on v3.18-rc2. I didn't bother to
> update this to keep it diff-able with v3 of the patch series.
> 
> I addressed most of the comments from Christoffer's last review round
> (mange tak!). There were numerous changes, though mostly reworking and
> style changes. To ease spotting them I updated the kvm-gicv3/v3 branch
> in the repo mentioned below to carry all the delta patches. Those
> patches are just for reference to see what has changed between
> v3 and v4. For review and all other purposes please use the v4 branch.
> 
> For a changelog summary see below; each patch now also carries its own
> changelog.
> Patches 02, 04, 06, 07, 09, 10 and 11 have had no code changes compared to
> their previous counterparts.
> The (v3) patch 06 is gone now, the new patch 14 has changed dramatically
> and a new patch (16/19) has been added.
> 
> I quickly tested this version with a GICv3 capable fast model in all
> endianness modes (LE guest on LE host, BE on LE, LE on BE, BE on BE).
> Both a GICv2 and a GICv3 guest were booted in all four combinations.
> In contrast to the v3 commit message, cross-endianness was working
> fine already with the previous patch series.
> 
> A git repo hosting all these patches lives in the kvm-gicv3/v4 branch
> of:
> http://www.linux-arm.org/git?p=linux-ap.git
> git://linux-arm.org/linux-ap.git
> -----
> 
> GICv3 is the ARM generic interrupt controller designed to overcome
> some limits of the prevalent GICv2. Most notably it lifts the 8-CPU
> limit. Though with Linux-3.17 Marc introduced support for hosts to
> use a GICv3, the CPU limitation still applies to KVM guests, since
> the current code emulates a GICv2 only.
> Also, GICv2 backward compatibility being optional in GICv3, a number
> of systems won't be able to run GICv2 guests.
> 
> This patch series provides code to emulate a GICv3 distributor and
> redistributor for any KVM guest. It requires a GICv3 in the host to
> work. With those patches one can run guests efficiently on any GICv3
> host. It has the following features:
> - Affinity routing (support for up to 255 VCPUs, more possible)
> - System registers (as opposed to MMIO access)
> - No ITS
> - No priority support (as the GICv2 emulation)
> - No save / restore support so far (will be added soon)
> - Only Group1 interrupts supported
> 
> The first patches actually refactor the current VGIC code to make
> room for a different VGIC model to be dropped in with patch 15.
> The remaining patches connect the new model to the kernel backend and
> the userland facing code.
> 
> The series goes on top of v3.18-rc2.
> The necessary patches for kvmtool to enable the guest's GICv3 have
> been posted here before [1], an updated version will follow soon.
> 
> There was some testing on the fast model with some I/O and interrupt
> affinity shuffling in a Linux guest with a varying number of VCPUs as
> well as some testing on a Juno board (GICv2 only, to spot regressions).
> 
> Please review and test.
> I would be grateful if people also tested for GICv2 regressions
> (so on a GICv2 host with current kvmtool/qemu), as there is quite
> a lot of refactoring on that front.

Hi Andre,

I tested your kvm-gicv3/v4 branch on Calxeda Midway with a GICv2 host in
a QEMU/VFIO passthrough use case. It behaves as expected.

Best Regards

Eric
> 
> Much of the code was inspired by MarcZ; kudos also to him for doing
> the rather painful rebase on top of v3.17-rc1.
> 
> Cheers,
> Andre.
> 
> [1] https://lists.cs.columbia.edu/pipermail/kvmarm/2014-June/010086.html
> 
> Changes v3 ... v4:
> * bug-fix in handling GICv3 redistributor CFG register
> * move set/get_lr from gic_vm_ops back to vgic_ops (get rid of v3 06/19)
> * get rid of init_emul() entirely
> * rework guest GIC model initialization
> * use non-atomic bit-set and bit-clear functions
> * split up handle_mmio_misc* into multiple functions
> * refine handling of some reserved registers
> * use symbolic names for ICC_SGI1R_EL1 register fields (new patch 16/19)
> * move private parameter from MMIO accessors to struct kvm_mmio_exit
> * added documentation of new GICv3 guest device
> * added lots of comments
> * some renaming of identifiers
> * minor changes in style and code flow of various functions
> 
> Changes v2 ... v3:
> * rebase to v3.18-rc2
> * adapt to new kvm_register_device() function
> * split up vm_ops patch and the GICv2 split-off patch to ease review
> * various smaller changes due to Christoffer's review
> * fix compilation for arm
> * remove support for trapping SGI sysreg accesses on arm hosts
> 
> Changes v1 ... v2:
> * rebase to v3.17-rc1, which caused quite a few changes to the init code
> * new 9/15 patch to make 10/15 smaller
> * fix wrongly ordered cp15 register trap entry (MarcZ)
> * fix SGI broadcast (thanks to wanghaibin for spotting)
> * fix broken bailout path in kvm_vgic_create (wanghaibin)
> * check return value of init_emulation_ops() (wanghaibin)
> * fix return value check in vgic_[sg]et_attr()
> * add header inclusion guards
> * remove double definition of VCPU_NOT_ALLOCATED
> * some code move-around
> * whitespace fixes
> 
> Andre Przywara (19):
>   arm/arm64: KVM: rework MPIDR assignment and add accessors
>   arm/arm64: KVM: pass down user space provided GIC type into vGIC code
>   arm/arm64: KVM: refactor vgic_handle_mmio() function
>   arm/arm64: KVM: wrap 64 bit MMIO accesses with two 32 bit ones
>   arm/arm64: KVM: introduce per-VM ops
>   arm/arm64: KVM: move kvm_register_device_ops() into vGIC probing
>   arm/arm64: KVM: dont rely on a valid GICH base address
>   arm/arm64: KVM: make the maximum number of vCPUs a per-VM value
>   arm/arm64: KVM: make the value of ICC_SRE_EL1 a per-VM variable
>   arm/arm64: KVM: refactor MMIO accessors
>   arm/arm64: KVM: refactor/wrap vgic_set/get_attr()
>   arm/arm64: KVM: add vgic.h header file
>   arm/arm64: KVM: split GICv2 specific emulation code from vgic.c
>   arm/arm64: KVM: add opaque private pointer to MMIO data
>   arm/arm64: KVM: add virtual GICv3 distributor emulation
>   arm64: GICv3: introduce symbolic names for GICv3 ICC_SGI1R_EL1 fields
>   arm64: KVM: add SGI generation register emulation
>   arm/arm64: KVM: enable kernel side of GICv3 emulation
>   arm/arm64: KVM: allow userland to request a virtual GICv3
> 
>  Documentation/virtual/kvm/devices/arm-vgic.txt |   21 +-
>  arch/arm/include/asm/kvm_emulate.h             |    5 +-
>  arch/arm/include/asm/kvm_host.h                |    3 +
>  arch/arm/include/asm/kvm_mmio.h                |    1 +
>  arch/arm/kvm/Makefile                          |    1 +
>  arch/arm/kvm/arm.c                             |   23 +-
>  arch/arm/kvm/psci.c                            |   17 +-
>  arch/arm64/include/asm/kvm_emulate.h           |    5 +-
>  arch/arm64/include/asm/kvm_host.h              |    5 +
>  arch/arm64/include/asm/kvm_mmio.h              |    1 +
>  arch/arm64/include/uapi/asm/kvm.h              |    7 +
>  arch/arm64/kernel/asm-offsets.c                |    1 +
>  arch/arm64/kvm/Makefile                        |    2 +
>  arch/arm64/kvm/sys_regs.c                      |   37 +-
>  arch/arm64/kvm/vgic-v3-switch.S                |   14 +-
>  drivers/irqchip/irq-gic-v3.c                   |   14 +-
>  include/kvm/arm_vgic.h                         |   34 +-
>  include/linux/irqchip/arm-gic-v3.h             |   44 +
>  include/linux/kvm_host.h                       |    2 +
>  include/uapi/linux/kvm.h                       |    2 +
>  virt/kvm/arm/vgic-v2-emul.c                    |  805 ++++++++++++++++++
>  virt/kvm/arm/vgic-v2.c                         |    3 +
>  virt/kvm/arm/vgic-v3-emul.c                    | 1020 +++++++++++++++++++++++
>  virt/kvm/arm/vgic-v3.c                         |   89 +-
>  virt/kvm/arm/vgic.c                            | 1065 ++++++------------------
>  virt/kvm/arm/vgic.h                            |  122 +++
>  26 files changed, 2469 insertions(+), 874 deletions(-)
>  create mode 100644 virt/kvm/arm/vgic-v2-emul.c
>  create mode 100644 virt/kvm/arm/vgic-v3-emul.c
>  create mode 100644 virt/kvm/arm/vgic.h
> 

* [PATCH v4 19/19] arm/arm64: KVM: allow userland to request a virtual GICv3
  2014-11-14 10:08 ` [PATCH v4 19/19] arm/arm64: KVM: allow userland to request a virtual GICv3 Andre Przywara
@ 2014-11-24  9:39   ` Christoffer Dall
  0 siblings, 0 replies; 80+ messages in thread
From: Christoffer Dall @ 2014-11-24  9:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Nov 14, 2014 at 10:08:03AM +0000, Andre Przywara wrote:
> With all of the GICv3 code in place, we now allow userland to ask the
> kernel to use a virtual GICv3 in the guest.
> We also provide the necessary support for userland to set the guest
> physical addresses of the virtual distributor and redistributors.
> Userland code has to make use of this feature and explicitly ask for
> a virtual GICv3.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
> Changelog v3...v4:
> - refine commit message
> - add documentation of new GICv3 KVM device
> 
>  Documentation/virtual/kvm/devices/arm-vgic.txt |   21 +++++++++--
>  arch/arm64/include/uapi/asm/kvm.h              |    7 ++++
>  include/kvm/arm_vgic.h                         |    4 +--
>  virt/kvm/arm/vgic-v3-emul.c                    |    3 ++
>  virt/kvm/arm/vgic.c                            |   46 +++++++++++++++++-------
>  5 files changed, 64 insertions(+), 17 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/devices/arm-vgic.txt b/Documentation/virtual/kvm/devices/arm-vgic.txt
> index df8b0c7..67e4c3e 100644
> --- a/Documentation/virtual/kvm/devices/arm-vgic.txt
> +++ b/Documentation/virtual/kvm/devices/arm-vgic.txt
> @@ -3,22 +3,37 @@ ARM Virtual Generic Interrupt Controller (VGIC)
>  
>  Device types supported:
>    KVM_DEV_TYPE_ARM_VGIC_V2     ARM Generic Interrupt Controller v2.0
> +  KVM_DEV_TYPE_ARM_VGIC_V3     ARM Generic Interrupt Controller v3.0
>  
>  Only one VGIC instance may be instantiated through either this API or the
>  legacy KVM_CREATE_IRQCHIP api.  The created VGIC will act as the VM interrupt
>  controller, requiring emulated user-space devices to inject interrupts to the
>  VGIC instead of directly to CPUs.

I would add a newline here.

> +Creating a guest GICv3 device requires a host GICv3 as well.
> +GICv3 implementations with hardware compatibility support allow a guest GICv2
> +as well.
>  
>  Groups:
>    KVM_DEV_ARM_VGIC_GRP_ADDR
>    Attributes:
>      KVM_VGIC_V2_ADDR_TYPE_DIST (rw, 64-bit)
>        Base address in the guest physical address space of the GIC distributor
> -      register mappings.
> +      register mappings. Only valid if a guest GICv2 has been instantiated.
>  
only valid for KVM_DEV_TYPE_ARM_VGIC_V2.

>      KVM_VGIC_V2_ADDR_TYPE_CPU (rw, 64-bit)
>        Base address in the guest physical address space of the GIC virtual cpu
> -      interface register mappings.
> +      interface register mappings. Only valid if a guest GICv2 has been
> +      instantiated.

same as above.

> +
> +    KVM_VGIC_V3_ADDR_TYPE_DIST (rw, 64-bit)
> +      Base address in the guest physical address space of the GICv3 distributor
> +      register mappings. Only valid if a guest GICv3 has been instantiated.

only valid for KVM_DEV_TYPE_ARM_VGIC_V3.  Are there not any alignment
restrictions here?

> +
> +    KVM_VGIC_V3_ADDR_TYPE_REDIST (rw, 64-bit)
> +      Base address in the guest physical address space of the GICv3
> +      redistributor register mappings. Only valid if a guest GICv3 has been
> +      instantiated.
> +

same as above

So this region's size just grows contiguously with the number of VCPUs
created?

How are we ensuring this doesn't conflict with any other potential
mappings (memregions) in the guest physical address space?
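The questions above about alignment, sizing and overlap can be made concrete with a small layout check. The sketch makes two assumptions. First, each VCPU gets one `GIC_V3_REDIST_SIZE` redistributor (128 KiB: an RD_base plus an SGI_base frame), laid out contiguously from the REDIST base. Second, both regions need 64 KiB alignment, a requirement the documentation hunk does not state. `vgic_v3_layout_ok()` and its helpers are hypothetical, and a check against guest memslots is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sizes as defined in include/linux/irqchip/arm-gic-v3.h by this series. */
#define GIC_V3_DIST_SIZE	0x10000ULL
#define GIC_V3_REDIST_SIZE	0x20000ULL	/* RD_base + SGI_base frame per VCPU */

/* Assumption: 64 KiB alignment for both regions (GICv3 uses 64 KiB frames). */
#define SKETCH_GICV3_ALIGN	0x10000ULL

/* Half-open range [base, base + size); overflow at the top of the space is ignored. */
struct gpa_range {
	uint64_t base;
	uint64_t size;
};

static bool ranges_overlap(struct gpa_range a, struct gpa_range b)
{
	return a.base < b.base + b.size && b.base < a.base + a.size;
}

/* The redistributor region grows by one frame pair per VCPU. */
static struct gpa_range redist_range(uint64_t redist_base, unsigned int nr_vcpus)
{
	struct gpa_range r = { redist_base,
			       (uint64_t)nr_vcpus * GIC_V3_REDIST_SIZE };

	assert(nr_vcpus > 0);
	return r;
}

/* Hypothetical check: aligned bases and no overlap between the two regions. */
static bool vgic_v3_layout_ok(uint64_t dist_base, uint64_t redist_base,
			      unsigned int nr_vcpus)
{
	struct gpa_range dist = { dist_base, GIC_V3_DIST_SIZE };

	if (dist_base % SKETCH_GICV3_ALIGN || redist_base % SKETCH_GICV3_ALIGN)
		return false;
	return !ranges_overlap(dist, redist_range(redist_base, nr_vcpus));
}
```

With the distributor at 0x08100000 and the redistributors at 0x080a0000, two VCPUs fit. Eight VCPUs would grow the redistributor region across the distributor.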

>  
>    KVM_DEV_ARM_VGIC_GRP_DIST_REGS
>    Attributes:
> @@ -36,6 +51,7 @@ Groups:
>      the register.
>    Limitations:
>      - Priorities are not implemented, and registers are RAZ/WI
> +    - Currently only implemented for GICv2.

Currently only implemented for KVM_DEV_TYPE_ARM_VGIC_V2?  Or does this
not work for KVM_DEV_TYPE_ARM_VGIC_V2 on GICv3 hardware?

>    Errors:
>      -ENODEV: Getting or setting this register is not yet supported
>      -EBUSY: One or more VCPUs are running
> @@ -68,6 +84,7 @@ Groups:
>  
>    Limitations:
>      - Priorities are not implemented, and registers are RAZ/WI
> +    - Currently only implemented for GICv2.

same as above.

>    Errors:
>      -ENODEV: Getting or setting this register is not yet supported
>      -EBUSY: One or more VCPUs are running


Thanks,
-Christoffer
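Christoffer's suggested wording ties each `KVM_DEV_ARM_VGIC_GRP_ADDR` attribute to the device type that was created, not to the host GIC. A minimal sketch of that rule follows. The enums are local stand-ins for the UAPI constants named in the comments. This patch does not specify which error code the kernel would return on a mismatch, so the helper just returns a bool.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Local stand-ins for the device types and KVM_DEV_ARM_VGIC_GRP_ADDR
 * attributes; real code would use the constants from the KVM UAPI headers.
 */
enum sketch_vgic_model { SKETCH_VGIC_V2, SKETCH_VGIC_V3 };
enum sketch_addr_type {
	SKETCH_V2_ADDR_DIST,	/* KVM_VGIC_V2_ADDR_TYPE_DIST */
	SKETCH_V2_ADDR_CPU,	/* KVM_VGIC_V2_ADDR_TYPE_CPU */
	SKETCH_V3_ADDR_DIST,	/* KVM_VGIC_V3_ADDR_TYPE_DIST */
	SKETCH_V3_ADDR_REDIST,	/* KVM_VGIC_V3_ADDR_TYPE_REDIST */
};

/* Each address attribute is only meaningful for the device type it names. */
static bool addr_type_valid(enum sketch_vgic_model model,
			    enum sketch_addr_type type)
{
	assert(model == SKETCH_VGIC_V2 || model == SKETCH_VGIC_V3);

	switch (type) {
	case SKETCH_V2_ADDR_DIST:
	case SKETCH_V2_ADDR_CPU:
		return model == SKETCH_VGIC_V2;
	case SKETCH_V3_ADDR_DIST:
	case SKETCH_V3_ADDR_REDIST:
		return model == SKETCH_VGIC_V3;
	}
	return false;
}
```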

* [PATCH v4 04/19] arm/arm64: KVM: wrap 64 bit MMIO accesses with two 32 bit ones
  2014-11-23  9:42   ` Christoffer Dall
@ 2014-11-24 13:50     ` Andre Przywara
  2014-11-24 14:40       ` Christoffer Dall
  0 siblings, 1 reply; 80+ messages in thread
From: Andre Przywara @ 2014-11-24 13:50 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Christoffer,

On 23/11/14 09:42, Christoffer Dall wrote:
> On Fri, Nov 14, 2014 at 10:07:48AM +0000, Andre Przywara wrote:
>> Some GICv3 registers can and will be accessed as 64 bit registers.
>> Currently the register handling code can only deal with 32 bit
>> accesses, so we do two consecutive calls to cover this.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>> Changelog v3...v4:
>> - add comment explaining little endian handling
>>
>>  virt/kvm/arm/vgic.c |   51 ++++++++++++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 48 insertions(+), 3 deletions(-)
>>
>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>> index 5eee3de..dba51e4 100644
>> --- a/virt/kvm/arm/vgic.c
>> +++ b/virt/kvm/arm/vgic.c
>> @@ -1033,6 +1033,51 @@ static bool vgic_validate_access(const struct vgic_dist *dist,
>>  }
>>  
>>  /*
>> + * Call the respective handler function for the given range.
>> + * We split up any 64 bit accesses into two consecutive 32 bit
>> + * handler calls and merge the result afterwards.
>> + * We do this in a little endian fashion regardless of the host's
>> + * or guest's endianness, because the GIC is always LE and the rest of
>> + * the code (vgic_reg_access) also puts it in a LE fashion already.
>> + */
>> +static bool call_range_handler(struct kvm_vcpu *vcpu,
>> +			       struct kvm_exit_mmio *mmio,
>> +			       unsigned long offset,
>> +			       const struct mmio_range *range)
>> +{
>> +	u32 *data32 = (void *)mmio->data;
>> +	struct kvm_exit_mmio mmio32;
>> +	bool ret;
>> +
>> +	if (likely(mmio->len <= 4))
>> +		return range->handle_mmio(vcpu, mmio, offset);
>> +
>> +	/*
>> +	 * Any access bigger than 4 bytes (that we currently handle in KVM)
>> +	 * is actually 8 bytes long, caused by a 64-bit access
>> +	 */
>> +
>> +	mmio32.len = 4;
>> +	mmio32.is_write = mmio->is_write;
>> +
>> +	mmio32.phys_addr = mmio->phys_addr + 4;
>> +	if (mmio->is_write)
>> +		*(u32 *)mmio32.data = data32[1];
>> +	ret = range->handle_mmio(vcpu, &mmio32, offset + 4);
>> +	if (!mmio->is_write)
>> +		data32[1] = *(u32 *)mmio32.data;
>> +
>> +	mmio32.phys_addr = mmio->phys_addr;
>> +	if (mmio->is_write)
>> +		*(u32 *)mmio32.data = data32[0];
>> +	ret |= range->handle_mmio(vcpu, &mmio32, offset);
> 
> nit: if handle_mmio returns multiple error codes, we will now not
> (necessarily) be preserving either, so you may just want to do a check
> on ret above and return early in the case of error.  Only worth it if
> you respin anyway.

Mmh, if I read this correctly, the return value actually turns into
updated_state, so technically I wouldn't call this an error code. I
think we must not exit after the first half and also have to keep the
OR-ing semantics of the two parts, right?
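The per-half results are `updated_state` flags, not error codes, so both halves must run and their results must be ORed. The following userspace sketch mirrors the order used by `call_range_handler()`. `struct sketch_mmio`, `split_64bit_access()` and the four-register demo device are hypothetical simplifications, not `struct kvm_exit_mmio` or a real vGIC handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical stand-in for struct kvm_exit_mmio. */
struct sketch_mmio {
	uint64_t phys_addr;
	uint8_t data[8];
	uint32_t len;
	bool is_write;
};

/* A 32-bit handler returns true if the emulated state was updated. */
typedef bool (*sketch_handler)(struct sketch_mmio *mmio, unsigned long offset);

/*
 * Split an 8-byte access into two 4-byte calls: the upper word at offset + 4
 * first, then the lower word at offset. The halves are copied byte for byte,
 * so data[0..3] always belongs to the lower address (little-endian layout),
 * whatever the host's endianness. Both handlers always run and their results
 * are ORed, never short-circuited.
 */
static bool split_64bit_access(struct sketch_mmio *mmio, unsigned long offset,
			       sketch_handler handle)
{
	struct sketch_mmio half = { .len = 4, .is_write = mmio->is_write };
	bool updated;

	if (mmio->len <= 4)
		return handle(mmio, offset);
	assert(mmio->len == 8);

	half.phys_addr = mmio->phys_addr + 4;
	if (mmio->is_write)
		memcpy(half.data, mmio->data + 4, 4);
	updated = handle(&half, offset + 4);
	if (!mmio->is_write)
		memcpy(mmio->data + 4, half.data, 4);

	half.phys_addr = mmio->phys_addr;
	if (mmio->is_write)
		memcpy(half.data, mmio->data, 4);
	updated |= handle(&half, offset);
	if (!mmio->is_write)
		memcpy(mmio->data, half.data, 4);

	return updated;
}

/* Hypothetical demo device: four 32-bit little-endian registers. */
static uint32_t demo_regs[4];

static bool demo_reg_handler(struct sketch_mmio *mmio, unsigned long offset)
{
	uint32_t *reg = &demo_regs[(offset / 4) % 4];
	uint32_t old = *reg;

	if (mmio->is_write) {
		*reg = (uint32_t)mmio->data[0] | (uint32_t)mmio->data[1] << 8 |
		       (uint32_t)mmio->data[2] << 16 | (uint32_t)mmio->data[3] << 24;
		return *reg != old;
	}
	for (int i = 0; i < 4; i++)
		mmio->data[i] = (uint8_t)(*reg >> (8 * i));
	return false;
}
```

Writing the bytes 11 22 33 44 55 66 77 88 at offset 0 would leave `demo_regs[0] == 0x44332211` and `demo_regs[1] == 0x88776655` on any host. The call reports an update if either half changed.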

Cheers,
Andre.

> 
> Otherwise:
> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
> 
>> +	if (!mmio->is_write)
>> +		data32[0] = *(u32 *)mmio32.data;
>> +
>> +	return ret;
>> +}
>> +
>> +/*
>>   * vgic_handle_mmio_range - handle an in-kernel MMIO access
>>   * @vcpu:	pointer to the vcpu performing the access
>>   * @run:	pointer to the kvm_run structure
>> @@ -1063,10 +1108,10 @@ static bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run *run,
>>  	spin_lock(&vcpu->kvm->arch.vgic.lock);
>>  	offset -= range->base;
>>  	if (vgic_validate_access(dist, range, offset)) {
>> -		updated_state = range->handle_mmio(vcpu, mmio, offset);
>> +		updated_state = call_range_handler(vcpu, mmio, offset, range);
>>  	} else {
>> -		vgic_reg_access(mmio, NULL, offset,
>> -				ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
>> +		if (!mmio->is_write)
>> +			memset(mmio->data, 0, mmio->len);
>>  		updated_state = false;
>>  	}
>>  	spin_unlock(&vcpu->kvm->arch.vgic.lock);
>> -- 
>> 1.7.9.5
>>
> 

* [PATCH v4 04/19] arm/arm64: KVM: wrap 64 bit MMIO accesses with two 32 bit ones
  2014-11-24 13:50     ` Andre Przywara
@ 2014-11-24 14:40       ` Christoffer Dall
  0 siblings, 0 replies; 80+ messages in thread
From: Christoffer Dall @ 2014-11-24 14:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Nov 24, 2014 at 01:50:27PM +0000, Andre Przywara wrote:
> Hi Christoffer,
> 
> On 23/11/14 09:42, Christoffer Dall wrote:
> > On Fri, Nov 14, 2014 at 10:07:48AM +0000, Andre Przywara wrote:
> >> Some GICv3 registers can and will be accessed as 64 bit registers.
> >> Currently the register handling code can only deal with 32 bit
> >> accesses, so we do two consecutive calls to cover this.
> >>
> >> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> >> ---
> >> Changelog v3...v4:
> >> - add comment explaining little endian handling
> >>
> >>  virt/kvm/arm/vgic.c |   51 ++++++++++++++++++++++++++++++++++++++++++++++++---
> >>  1 file changed, 48 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> >> index 5eee3de..dba51e4 100644
> >> --- a/virt/kvm/arm/vgic.c
> >> +++ b/virt/kvm/arm/vgic.c
> >> @@ -1033,6 +1033,51 @@ static bool vgic_validate_access(const struct vgic_dist *dist,
> >>  }
> >>  
> >>  /*
> >> + * Call the respective handler function for the given range.
> >> + * We split up any 64 bit accesses into two consecutive 32 bit
> >> + * handler calls and merge the result afterwards.
> >> + * We do this in a little endian fashion regardless of the host's
> >> + * or guest's endianness, because the GIC is always LE and the rest of
> >> + * the code (vgic_reg_access) also puts it in a LE fashion already.
> >> + */
> >> +static bool call_range_handler(struct kvm_vcpu *vcpu,
> >> +			       struct kvm_exit_mmio *mmio,
> >> +			       unsigned long offset,
> >> +			       const struct mmio_range *range)
> >> +{
> >> +	u32 *data32 = (void *)mmio->data;
> >> +	struct kvm_exit_mmio mmio32;
> >> +	bool ret;
> >> +
> >> +	if (likely(mmio->len <= 4))
> >> +		return range->handle_mmio(vcpu, mmio, offset);
> >> +
> >> +	/*
> >> +	 * Any access bigger than 4 bytes (that we currently handle in KVM)
> >> +	 * is actually 8 bytes long, caused by a 64-bit access
> >> +	 */
> >> +
> >> +	mmio32.len = 4;
> >> +	mmio32.is_write = mmio->is_write;
> >> +
> >> +	mmio32.phys_addr = mmio->phys_addr + 4;
> >> +	if (mmio->is_write)
> >> +		*(u32 *)mmio32.data = data32[1];
> >> +	ret = range->handle_mmio(vcpu, &mmio32, offset + 4);
> >> +	if (!mmio->is_write)
> >> +		data32[1] = *(u32 *)mmio32.data;
> >> +
> >> +	mmio32.phys_addr = mmio->phys_addr;
> >> +	if (mmio->is_write)
> >> +		*(u32 *)mmio32.data = data32[0];
> >> +	ret |= range->handle_mmio(vcpu, &mmio32, offset);
> > 
> > nit: if handle_mmio returns multiple error codes, we will now not
> > (necessarily) be preserving either, so you may just want to do a check
> > on ret above and return early in the case of error.  Only worth it if
> > you respin anyway.
> 
> Mmh, if I read this correctly, the return value actually turns into
> updated_state, so technically I wouldn't call this an error code. I
> think we must not exit after the first half and also have to keep the
> OR-ing semantics of the two parts, right?
> 
Doh, it's a bool, forget what I said.

-Christoffer

* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-11-23 14:38   ` Christoffer Dall
@ 2014-11-24 16:00     ` Andre Przywara
  2014-11-25 10:41       ` Christoffer Dall
  0 siblings, 1 reply; 80+ messages in thread
From: Andre Przywara @ 2014-11-24 16:00 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On 23/11/14 14:38, Christoffer Dall wrote:
> On Fri, Nov 14, 2014 at 10:07:59AM +0000, Andre Przywara wrote:
>> With everything separated and prepared, we implement a model of a
>> GICv3 distributor and redistributors by using the existing framework
>> to provide handler functions for each register group.
>>
>> Currently we limit the emulation to a model enforcing a single
>> security state, with SRE==1 (forcing system register access) and
>> ARE==1 (allowing more than 8 VCPUs).
>>
>> We share some of the functions provided for GICv2 emulation, but take
>> the different ways of addressing (v)CPUs into account.
>> Save and restore is currently not implemented.
>>
>> Similar to the split-off of the GICv2 specific code, the new emulation
>> code goes into a new file (vgic-v3-emul.c).
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>> Changelog v3...v4:
>> - remove ICC_SGI1R_EL1 register handling (moved into later patch)
>> - add definitions for single security state
> 
> what exactly does this mean?

Sorry, forgot to add that I was talking about GICD_CTLR bit values here.
Unfortunately the meaning of bits 0..2 differs depending on the security
setup. I added the names for the single security case (the third group
in the spec).
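With DS set (a single security state), bits 0 and 1 of GICD_CTLR are the Group0 and Group1 enables. The sketch below is a userspace rendering of the value `handle_mmio_ctlr()` in this patch composes for a guest read. It uses the bit definitions the patch adds; `emulated_gicd_ctlr()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit definitions as added to include/linux/irqchip/arm-gic-v3.h by this patch. */
#define GICD_CTLR_DS		(1U << 6)
#define GICD_CTLR_ARE_NS	(1U << 4)
#define GICD_CTLR_ENABLE_SS_G1	(1U << 1)
#define GICD_CTLR_ENABLE_SS_G0	(1U << 0)

/*
 * Value a guest reads from the emulated GICD_CTLR: ARE and DS always read
 * as 1, and only the Group1 enable reflects the distributor state, because
 * the emulation supports Group1 interrupts only.
 */
static uint32_t emulated_gicd_ctlr(bool dist_enabled)
{
	uint32_t reg = 0;

	if (dist_enabled)
		reg = GICD_CTLR_ENABLE_SS_G1;
	reg |= GICD_CTLR_ARE_NS | GICD_CTLR_DS;

	/* Group0 is never reported as enabled. */
	assert((reg & GICD_CTLR_ENABLE_SS_G0) == 0);
	return reg;
}
```

An enabled distributor would therefore read back as 0x52 and a disabled one as 0x50.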

>> - document emulation limitations in vgic-v3-emul.c header
>> - move CTLR, TYPER and IIDR handling into separate functions
>> - add RAO/WI handling for IGROUPRn registers
>> - remove unneeded offset masking on calling vgic_reg_access()
>> - rework handle_mmio_route_reg() to only handle SPIs
>> - refine IROUTERn register range
>> - use non-atomic bitops functions (__clear_bit() and __set_bit())
>> - rename vgic_dist_ranges[] to vgic_v3_dist_ranges[]
>> - add (RAZ/WI) implementation of GICD_STATUSR
>> - add (RAZ/WI) implementations of MBI registers
>> - adapt to new private passing (in struct kvm_exit_mmio instead of a parameter)
>> - fix vcpu_id calculation bug in handle CFG registers
> 
> which bug was that?  I can't see the difference between v3 and v4 on
> this one?

http://www.linux-arm.org/git?p=linux-ap.git;a=commitdiff;h=dae47cd0b1c233241112037cea717f8c364a34e9
That was obviously introduced during some rework, where I didn't catch
all the cases.

>> - always use hexadecimal numbers for .len member
>> - simplify vgic_v3_handle_mmio()
>> - add vgic_v3_create() and vgic_v3_destroy()
>> - swap vgic_v3_[sg]et_attr() code location
>> - add and improve comments
>> - (adaptions to changes from earlier patches)
>>
>>  arch/arm64/kvm/Makefile            |    1 +
>>  include/kvm/arm_vgic.h             |    9 +-
>>  include/linux/irqchip/arm-gic-v3.h |   32 ++
>>  include/linux/kvm_host.h           |    1 +
>>  include/uapi/linux/kvm.h           |    2 +
>>  virt/kvm/arm/vgic-v3-emul.c        |  904 ++++++++++++++++++++++++++++++++++++
>>  virt/kvm/arm/vgic.c                |   11 +-
>>  virt/kvm/arm/vgic.h                |    3 +
>>  8 files changed, 960 insertions(+), 3 deletions(-)
>>  create mode 100644 virt/kvm/arm/vgic-v3-emul.c
>>
>> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>> index d957353..4e6e09e 100644
>> --- a/arch/arm64/kvm/Makefile
>> +++ b/arch/arm64/kvm/Makefile
>> @@ -24,5 +24,6 @@ kvm-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v2.o
>>  kvm-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v2-emul.o
>>  kvm-$(CONFIG_KVM_ARM_VGIC) += vgic-v2-switch.o
>>  kvm-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v3.o
>> +kvm-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v3-emul.o
>>  kvm-$(CONFIG_KVM_ARM_VGIC) += vgic-v3-switch.o
>>  kvm-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>> index 421833f..c1ef5a9 100644
>> --- a/include/kvm/arm_vgic.h
>> +++ b/include/kvm/arm_vgic.h
>> @@ -160,7 +160,11 @@ struct vgic_dist {
>>
>>       /* Distributor and vcpu interface mapping in the guest */
>>       phys_addr_t             vgic_dist_base;
>> -     phys_addr_t             vgic_cpu_base;
>> +     /* GICv2 and GICv3 use different mapped register blocks */
>> +     union {
>> +             phys_addr_t             vgic_cpu_base;
>> +             phys_addr_t             vgic_redist_base;
>> +     };
>>
>>       /* Distributor enabled */
>>       u32                     enabled;
>> @@ -222,6 +226,9 @@ struct vgic_dist {
>>        */
>>       struct vgic_bitmap      *irq_spi_target;
>>
>> +     /* Target MPIDR for each IRQ (needed for GICv3 IROUTERn) only */
>> +     u32                     *irq_spi_mpidr;
>> +
>>       /* Bitmap indicating which CPU has something pending */
>>       unsigned long           *irq_pending_on_cpu;
>>
>> diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
>> index 03a4ea3..726d898 100644
>> --- a/include/linux/irqchip/arm-gic-v3.h
>> +++ b/include/linux/irqchip/arm-gic-v3.h
>> @@ -33,6 +33,7 @@
>>  #define GICD_SETSPI_SR                       0x0050
>>  #define GICD_CLRSPI_SR                       0x0058
>>  #define GICD_SEIR                    0x0068
>> +#define GICD_IGROUPR                 0x0080
>>  #define GICD_ISENABLER                       0x0100
>>  #define GICD_ICENABLER                       0x0180
>>  #define GICD_ISPENDR                 0x0200
>> @@ -41,14 +42,37 @@
>>  #define GICD_ICACTIVER                       0x0380
>>  #define GICD_IPRIORITYR                      0x0400
>>  #define GICD_ICFGR                   0x0C00
>> +#define GICD_IGRPMODR                        0x0D00
>> +#define GICD_NSACR                   0x0E00
>>  #define GICD_IROUTER                 0x6000
>> +#define GICD_IDREGS                  0xFFD0
>>  #define GICD_PIDR2                   0xFFE8
>>
>> +/*
>> + * Those registers are actually from GICv2, but the spec demands that they
>> + * are implemented as RES0 if ARE is 1 (which we do in KVM's emulated GICv3).
>> + */
>> +#define GICD_ITARGETSR                       0x0800
>> +#define GICD_SGIR                    0x0F00
>> +#define GICD_CPENDSGIR                       0x0F10
>> +#define GICD_SPENDSGIR                       0x0F20
>> +
>>  #define GICD_CTLR_RWP                        (1U << 31)
>> +#define GICD_CTLR_DS                 (1U << 6)
>>  #define GICD_CTLR_ARE_NS             (1U << 4)
>>  #define GICD_CTLR_ENABLE_G1A         (1U << 1)
>>  #define GICD_CTLR_ENABLE_G1          (1U << 0)
>>
>> +/*
>> + * In systems with a single security state (what we emulate in KVM)
>> + * the meaning of the interrupt group enable bits is slightly different
>> + */
>> +#define GICD_CTLR_ENABLE_SS_G1               (1U << 1)
>> +#define GICD_CTLR_ENABLE_SS_G0               (1U << 0)
>> +
>> +#define GICD_TYPER_LPIS                      (1U << 17)
>> +#define GICD_TYPER_MBIS                      (1U << 16)
>> +
>>  #define GICD_IROUTER_SPI_MODE_ONE    (0U << 31)
>>  #define GICD_IROUTER_SPI_MODE_ANY    (1U << 31)
>>
>> @@ -56,6 +80,8 @@
>>  #define GIC_PIDR2_ARCH_GICv3         0x30
>>  #define GIC_PIDR2_ARCH_GICv4         0x40
>>
>> +#define GIC_V3_DIST_SIZE             0x10000
>> +
>>  /*
>>   * Re-Distributor registers, offsets from RD_base
>>   */
>> @@ -74,6 +100,7 @@
>>  #define GICR_SYNCR                   0x00C0
>>  #define GICR_MOVLPIR                 0x0100
>>  #define GICR_MOVALLR                 0x0110
>> +#define GICR_IDREGS                  GICD_IDREGS
>>  #define GICR_PIDR2                   GICD_PIDR2
>>
>>  #define GICR_WAKER_ProcessorSleep    (1U << 1)
>> @@ -82,6 +109,7 @@
>>  /*
>>   * Re-Distributor registers, offsets from SGI_base
>>   */
>> +#define GICR_IGROUPR0                        GICD_IGROUPR
>>  #define GICR_ISENABLER0                      GICD_ISENABLER
>>  #define GICR_ICENABLER0                      GICD_ICENABLER
>>  #define GICR_ISPENDR0                        GICD_ISPENDR
>> @@ -90,10 +118,14 @@
>>  #define GICR_ICACTIVER0                      GICD_ICACTIVER
>>  #define GICR_IPRIORITYR0             GICD_IPRIORITYR
>>  #define GICR_ICFGR0                  GICD_ICFGR
>> +#define GICR_IGRPMODR0                       GICD_IGRPMODR
>> +#define GICR_NSACR                   GICD_NSACR
>>
>>  #define GICR_TYPER_VLPIS             (1U << 1)
>>  #define GICR_TYPER_LAST                      (1U << 4)
>>
>> +#define GIC_V3_REDIST_SIZE           0x20000
>> +
>>  /*
>>   * CPU interface registers
>>   */
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 326ba7a..4a7798e 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -1085,6 +1085,7 @@ void kvm_unregister_device_ops(u32 type);
>>  extern struct kvm_device_ops kvm_mpic_ops;
>>  extern struct kvm_device_ops kvm_xics_ops;
>>  extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
>> +extern struct kvm_device_ops kvm_arm_vgic_v3_ops;
>>
>>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>>
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 6076882..24cb129 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -960,6 +960,8 @@ enum kvm_device_type {
>>  #define KVM_DEV_TYPE_ARM_VGIC_V2     KVM_DEV_TYPE_ARM_VGIC_V2
>>       KVM_DEV_TYPE_FLIC,
>>  #define KVM_DEV_TYPE_FLIC            KVM_DEV_TYPE_FLIC
>> +     KVM_DEV_TYPE_ARM_VGIC_V3,
>> +#define KVM_DEV_TYPE_ARM_VGIC_V3     KVM_DEV_TYPE_ARM_VGIC_V3
>>       KVM_DEV_TYPE_MAX,
>>  };
>>
>> diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
>> new file mode 100644
>> index 0000000..97b5801
>> --- /dev/null
>> +++ b/virt/kvm/arm/vgic-v3-emul.c
>> @@ -0,0 +1,904 @@
>> +/*
>> + * GICv3 distributor and redistributor emulation
>> + *
>> + * GICv3 emulation is currently only supported on a GICv3 host (because
>> + * we rely on the hardware's CPU interface virtualization support), but
>> + * supports both hardware with or without the optional GICv2 backwards
>> + * compatibility features.
>> + *
>> + * Limitations of the emulation:
>> + * (RAZ/WI: read as zero, write ignore, RAO/WI: read as one, write ignore)
>> + * - We do not support LPIs (yet). TYPER.LPIS is reported as 0 and is RAZ/WI.
>> + * - We do not support the message based interrupts (MBIs) triggered by
>> + *   writes to the GICD_{SET,CLR}SPI_* registers. TYPER.MBIS is reported as 0.
>> + * - We do not support the (optional) backwards compatibility feature.
>> + *   GICD_CTLR.ARE resets to 1 and is RAO/WI. If the _host_ GIC supports
>> + *   the compatibility feature, you can use a GICv2 in the guest, though.
>> + * - We only support a single security state. GICD_CTLR.DS is 1 and is RAO/WI.
>> + * - Priorities are not emulated (same as the GICv2 emulation). Linux
>> + *   as a guest is fine with this, because it does not use priorities.
>> + * - We only support Group1 interrupts. Again Linux uses only those.
>> + *
>> + * Copyright (C) 2014 ARM Ltd.
>> + * Author: Andre Przywara <andre.przywara@arm.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <linux/cpu.h>
>> +#include <linux/kvm.h>
>> +#include <linux/kvm_host.h>
>> +#include <linux/interrupt.h>
>> +
>> +#include <linux/irqchip/arm-gic-v3.h>
>> +#include <kvm/arm_vgic.h>
>> +
>> +#include <asm/kvm_emulate.h>
>> +#include <asm/kvm_arm.h>
>> +#include <asm/kvm_mmu.h>
>> +
>> +#include "vgic.h"
>> +
>> +static bool handle_mmio_rao_wi(struct kvm_vcpu *vcpu,
>> +                            struct kvm_exit_mmio *mmio, phys_addr_t offset)
>> +{
>> +     u32 reg = 0xffffffff;
>> +
>> +     vgic_reg_access(mmio, &reg, offset,
>> +                     ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
>> +
>> +     return false;
>> +}
>> +
>> +static bool handle_mmio_ctlr(struct kvm_vcpu *vcpu,
>> +                          struct kvm_exit_mmio *mmio, phys_addr_t offset)
>> +{
>> +     u32 reg = 0;
>> +
>> +     /*
>> +      * Force ARE and DS to 1, the guest cannot change this.
>> +      * For the time being we only support Group1 interrupts.
>> +      */
>> +     if (vcpu->kvm->arch.vgic.enabled)
>> +             reg = GICD_CTLR_ENABLE_SS_G1;
>> +     reg |= GICD_CTLR_ARE_NS | GICD_CTLR_DS;
>> +
>> +     vgic_reg_access(mmio, &reg, offset,
>> +                     ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
>> +     if (mmio->is_write) {
>> +             if (reg & GICD_CTLR_ENABLE_SS_G0)
>> +                     kvm_info("guest tried to enable unsupported Group0 interrupts\n");
>> +             vcpu->kvm->arch.vgic.enabled = !!(reg & GICD_CTLR_ENABLE_SS_G1);
>> +             vgic_update_state(vcpu->kvm);
>> +             return true;
>> +     }
>> +     return false;
>> +}
>> +
>> +/*
>> + * As this implementation does not provide compatibility
>> + * with GICv2 (ARE==1), we report zero CPUs in bits [5..7].
>> + * Also LPIs and MBIs are not supported, so we set the respective bits to 0.
>> + * Also we report at most 2**10=1024 interrupt IDs (to match 1024 SPIs).
>> + */
>> +#define INTERRUPT_ID_BITS 10
>> +static bool handle_mmio_typer(struct kvm_vcpu *vcpu,
>> +                           struct kvm_exit_mmio *mmio, phys_addr_t offset)
>> +{
>> +     u32 reg;
>> +
>> +     /* we report at most 1024 IRQs via this interface */
> 
> hmmm, do we need to repeat ourselves here?

Actually ... not.
To avoid confusion, I will probably just drop this comment.

> I get a bit confused by both the comment above and here, as to *why* we
> are reporting this value?  And what is the bit about 'this interface'?

By "this interface" I meant the number of SPIs, which is communicated
here in a GICv2-compatible way (ITLinesNumber). With LPI support in
mind, I didn't want to use the term IRQ without some qualification.

> Is there another interface?

IDbits, though admittedly that isn't clear from the comment.
I'm not sure that justifies more comments before we add ITS support, though.

> Perhaps what you're trying to get at here are the semantic differences
> between ITLinesNumber and IDbits and how that helps a reader understand
> the code.

I can add a better comment.

>> +     reg = (min(vcpu->kvm->arch.vgic.nr_irqs, 1024) >> 5) - 1;
>> +
>> +     reg |= (INTERRUPT_ID_BITS - 1) << 19;
>> +
>> +     vgic_reg_access(mmio, &reg, offset,
>> +                     ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
>> +
>> +     return false;
>> +}
>> +
>> +static bool handle_mmio_iidr(struct kvm_vcpu *vcpu,
>> +                          struct kvm_exit_mmio *mmio, phys_addr_t offset)
>> +{
>> +     u32 reg;
>> +
>> +     reg = (PRODUCT_ID_KVM << 24) | (IMPLEMENTER_ARM << 0);
>> +     vgic_reg_access(mmio, &reg, offset,
>> +                     ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
>> +
>> +     return false;
>> +}
>> +
>> +static bool handle_mmio_set_enable_reg_dist(struct kvm_vcpu *vcpu,
>> +                                         struct kvm_exit_mmio *mmio,
>> +                                         phys_addr_t offset)
>> +{
>> +     if (likely(offset >= VGIC_NR_PRIVATE_IRQS / 8))
>> +             return vgic_handle_enable_reg(vcpu->kvm, mmio, offset,
>> +                                           vcpu->vcpu_id,
>> +                                           ACCESS_WRITE_SETBIT);
>> +
>> +     vgic_reg_access(mmio, NULL, offset,
>> +                     ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
>> +     return false;
>> +}
>> +
>> +static bool handle_mmio_clear_enable_reg_dist(struct kvm_vcpu *vcpu,
>> +                                           struct kvm_exit_mmio *mmio,
>> +                                           phys_addr_t offset)
>> +{
>> +     if (likely(offset >= VGIC_NR_PRIVATE_IRQS / 8))
>> +             return vgic_handle_enable_reg(vcpu->kvm, mmio, offset,
>> +                                           vcpu->vcpu_id,
>> +                                           ACCESS_WRITE_CLEARBIT);
>> +
>> +     vgic_reg_access(mmio, NULL, offset,
>> +                     ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
>> +     return false;
>> +}
>> +
>> +static bool handle_mmio_set_pending_reg_dist(struct kvm_vcpu *vcpu,
>> +                                          struct kvm_exit_mmio *mmio,
>> +                                          phys_addr_t offset)
>> +{
>> +     if (likely(offset >= VGIC_NR_PRIVATE_IRQS / 8))
>> +             return vgic_handle_set_pending_reg(vcpu->kvm, mmio, offset,
>> +                                                vcpu->vcpu_id);
>> +
>> +     vgic_reg_access(mmio, NULL, offset,
>> +                     ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
>> +     return false;
>> +}
>> +
>> +static bool handle_mmio_clear_pending_reg_dist(struct kvm_vcpu *vcpu,
>> +                                            struct kvm_exit_mmio *mmio,
>> +                                            phys_addr_t offset)
>> +{
>> +     if (likely(offset >= VGIC_NR_PRIVATE_IRQS / 8))
>> +             return vgic_handle_clear_pending_reg(vcpu->kvm, mmio, offset,
>> +                                                  vcpu->vcpu_id);
>> +
>> +     vgic_reg_access(mmio, NULL, offset,
>> +                     ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
>> +     return false;
>> +}
>> +
>> +static bool handle_mmio_priority_reg_dist(struct kvm_vcpu *vcpu,
>> +                                       struct kvm_exit_mmio *mmio,
>> +                                       phys_addr_t offset)
>> +{
>> +     u32 *reg;
>> +
>> +     if (unlikely(offset < VGIC_NR_PRIVATE_IRQS)) {
>> +             vgic_reg_access(mmio, NULL, offset,
>> +                             ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
>> +             return false;
>> +     }
>> +
>> +     reg = vgic_bytemap_get_reg(&vcpu->kvm->arch.vgic.irq_priority,
>> +                                vcpu->vcpu_id, offset);
>> +     vgic_reg_access(mmio, reg, offset,
>> +             ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
>> +     return false;
>> +}
>> +
>> +static bool handle_mmio_cfg_reg_dist(struct kvm_vcpu *vcpu,
>> +                                  struct kvm_exit_mmio *mmio,
>> +                                  phys_addr_t offset)
>> +{
>> +     u32 *reg;
>> +
>> +     if (unlikely(offset < VGIC_NR_PRIVATE_IRQS / 4)) {
>> +             vgic_reg_access(mmio, NULL, offset,
>> +                             ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
>> +             return false;
>> +     }
>> +
>> +     reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_cfg,
>> +                               vcpu->vcpu_id, offset >> 1);
>> +
>> +     return vgic_handle_cfg_reg(reg, mmio, offset);
>> +}
>> +
>> +/*
>> + * We use a compressed version of the MPIDR (all 32 bits in one 32-bit word)
>> + * when we store the target MPIDR written by the guest.
>> + */
>> +static u32 compress_mpidr(unsigned long mpidr)
>> +{
>> +     u32 ret;
>> +
>> +     ret = MPIDR_AFFINITY_LEVEL(mpidr, 0);
>> +     ret |= MPIDR_AFFINITY_LEVEL(mpidr, 1) << 8;
>> +     ret |= MPIDR_AFFINITY_LEVEL(mpidr, 2) << 16;
>> +     ret |= MPIDR_AFFINITY_LEVEL(mpidr, 3) << 24;
>> +
>> +     return ret;
>> +}
>> +
>> +static unsigned long uncompress_mpidr(u32 value)
>> +{
>> +     unsigned long mpidr;
>> +
>> +     mpidr  = ((value >>  0) & 0xFF) << MPIDR_LEVEL_SHIFT(0);
>> +     mpidr |= ((value >>  8) & 0xFF) << MPIDR_LEVEL_SHIFT(1);
>> +     mpidr |= ((value >> 16) & 0xFF) << MPIDR_LEVEL_SHIFT(2);
>> +     mpidr |= (u64)((value >> 24) & 0xFF) << MPIDR_LEVEL_SHIFT(3);
>> +
>> +     return mpidr;
>> +}
>> +
>> +/*
>> + * Look up the given MPIDR value to get the vcpu_id (if there is one)
>> + * and store that in the irq_spi_cpu[] array.
>> + * This limits the number of VCPUs to 255 for now, extending the data
>> + * type (or storing kvm_vcpu poiners) should lift the limit.
> 
> s/poiners/pointers

OK.

>> + * Store the original MPIDR value in an extra array to support read-as-written.
>> + * Unallocated MPIDRs are translated to a special value and caught
>> + * before any array accesses.
> 
> We may have covered this already, but why can't we restore the original
> MPIDR based on the irq_spi_cpu array?
> 
> Is that because we lose information about 'which' unallocated MPIDR was
> written?

Yes.

> If that's the case, it seems weird that we go through the
> trouble but we anyway throw away the aff3 field...?

Not supporting the Aff3 field saves us from having to care about the
atomicity of GICD_IROUTER accesses (Aff3 lives in the upper word of this
64-bit register).
Not supporting Aff3 is an architectural option in GICv3, so this
seems like a viable solution.
I had some code to support "real" 64-bit accesses, which would allow
Aff3 support, but I will have to get that past Marc again sometime in
the future ;-)

>> + */
>> +static bool handle_mmio_route_reg(struct kvm_vcpu *vcpu,
>> +                               struct kvm_exit_mmio *mmio,
>> +                               phys_addr_t offset)
>> +{
>> +     struct kvm *kvm = vcpu->kvm;
>> +     struct vgic_dist *dist = &kvm->arch.vgic;
>> +     int spi;
>> +     u32 reg;
>> +     int vcpu_id;
>> +     unsigned long *bmap, mpidr;
>> +
>> +     /*
>> +      * The upper 32 bits of each 64 bit register are zero,
>> +      * as we don't support Aff3.
>> +      */
>> +     if ((offset & 4)) {
>> +             vgic_reg_access(mmio, NULL, offset,
>> +                             ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
>> +             return false;
>> +     }
>> +
>> +     /* This region only covers SPIs, so no handling of private IRQs here. */
>> +     spi = offset / 8;
> 
> that's not how I read the spec, it says that GICD_IROUTER0 to
> GICD_IROUTER31 are not implemented (because they are SGIs and PPIs), and
> I read the 'SPI ID m' as the lowest numbered SPI ID being 32, thus you
> should do:
> 
> spi = offset / 8 - VGIC_NR_PRIVATE_IRQS;

Well, below I changed the description of the IROUTER range to:
+     {
+             .base           = GICD_IROUTER + 0x100,
+             .len            = 0x1edc,
+             .bits_per_irq   = 64,
+             .handle_mmio    = handle_mmio_route_reg,
+     },

This was in response to your comment on v3, where you correctly pointed
out that the spec describes the private IRQ part of IROUTER differently
from the other registers (not implemented/reserved vs. RAZ/WI).

So the offset in this function is relative to 0x6100 (GICD_IROUTER +
0x100), and offset / 8 directly gives the SPI number, counting from 0
for INTID 32.

>> +
>> +     /* get the stored MPIDR for this IRQ */
>> +     mpidr = uncompress_mpidr(dist->irq_spi_mpidr[spi]);
>> +     mpidr &= MPIDR_HWID_BITMASK;
> 
> is this mask needed after calling uncompress above?

Indeed not.

>> +     reg = mpidr;
>> +
>> +     vgic_reg_access(mmio, &reg, offset,
>> +                     ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
>> +
>> +     if (!mmio->is_write)
>> +             return false;
>> +
>> +     /*
>> +      * Now clear the currently assigned vCPU from the map, making room
>> +      * for the new one to be written below
>> +      */
>> +     vcpu = kvm_mpidr_to_vcpu(kvm, mpidr);
>> +     if (likely(vcpu)) {
>> +             vcpu_id = vcpu->vcpu_id;
>> +             bmap = vgic_bitmap_get_shared_map(&dist->irq_spi_target[vcpu_id]);
>> +             __clear_bit(spi, bmap);
>> +     }
>> +
>> +     dist->irq_spi_mpidr[spi] = compress_mpidr(reg);
>> +     vcpu = kvm_mpidr_to_vcpu(kvm, reg & MPIDR_HWID_BITMASK);
>> +
>> +     /*
>> +      * The spec says that non-existent MPIDR values should not be
>> +      * forwarded to any existent (v)CPU, but should be able to become
>> +      * pending anyway. We simply keep the irq_spi_target[] array empty, so
>> +      * the interrupt will never be injected.
>> +      * irq_spi_cpu[irq] gets a magic value in this case.
>> +      */
>> +     if (likely(vcpu)) {
>> +             vcpu_id = vcpu->vcpu_id;
>> +             dist->irq_spi_cpu[spi] = vcpu_id;
>> +             bmap = vgic_bitmap_get_shared_map(&dist->irq_spi_target[vcpu_id]);
>> +             __set_bit(spi, bmap);
>> +     } else {
>> +             dist->irq_spi_cpu[spi] = VCPU_NOT_ALLOCATED;
>> +     }
>> +
>> +     vgic_update_state(kvm);
>> +
>> +     return true;
>> +}
>> +
>> +/*
>> + * We should be careful about promising too much when a guest reads
>> + * this register. Don't claim to be like any hardware implementation,
>> + * but just report the GIC as version 3 - which is what a Linux guest
>> + * would check.
>> + */
>> +static bool handle_mmio_idregs(struct kvm_vcpu *vcpu,
>> +                            struct kvm_exit_mmio *mmio,
>> +                            phys_addr_t offset)
>> +{
>> +     u32 reg = 0;
>> +
>> +     switch (offset + GICD_IDREGS) {
>> +     case GICD_PIDR2:
>> +             reg = 0x3b;
>> +             break;
>> +     }
>> +
>> +     vgic_reg_access(mmio, &reg, offset,
>> +                     ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
>> +
>> +     return false;
>> +}
>> +
>> +static const struct kvm_mmio_range vgic_v3_dist_ranges[] = {
>> +     {
>> +             .base           = GICD_CTLR,
>> +             .len            = 0x04,
>> +             .bits_per_irq   = 0,
>> +             .handle_mmio    = handle_mmio_ctlr,
>> +     },
>> +     {
>> +             .base           = GICD_TYPER,
>> +             .len            = 0x04,
>> +             .bits_per_irq   = 0,
>> +             .handle_mmio    = handle_mmio_typer,
>> +     },
>> +     {
>> +             .base           = GICD_IIDR,
>> +             .len            = 0x04,
>> +             .bits_per_irq   = 0,
>> +             .handle_mmio    = handle_mmio_iidr,
>> +     },
>> +     {
>> +             /* this register is optional, it is RAZ/WI if not implemented */
>> +             .base           = GICD_STATUSR,
>> +             .len            = 0x04,
>> +             .bits_per_irq   = 0,
>> +             .handle_mmio    = handle_mmio_raz_wi,
>> +     },
>> +     {
>> +             /* this write only register is WI when TYPER.MBIS=0 */
>> +             .base           = GICD_SETSPI_NSR,
>> +             .len            = 0x04,
>> +             .bits_per_irq   = 0,
>> +             .handle_mmio    = handle_mmio_raz_wi,
>> +     },
>> +     {
>> +             /* this write only register is WI when TYPER.MBIS=0 */
>> +             .base           = GICD_CLRSPI_NSR,
>> +             .len            = 0x04,
>> +             .bits_per_irq   = 0,
>> +             .handle_mmio    = handle_mmio_raz_wi,
>> +     },
>> +     {
>> +             /* this is RAZ/WI when DS=1 */
>> +             .base           = GICD_SETSPI_SR,
>> +             .len            = 0x04,
>> +             .bits_per_irq   = 0,
>> +             .handle_mmio    = handle_mmio_raz_wi,
>> +     },
>> +     {
>> +             /* this is RAZ/WI when DS=1 */
>> +             .base           = GICD_CLRSPI_SR,
>> +             .len            = 0x04,
>> +             .bits_per_irq   = 0,
>> +             .handle_mmio    = handle_mmio_raz_wi,
>> +     },
>> +     {
>> +             .base           = GICD_IGROUPR,
>> +             .len            = 0x80,
>> +             .bits_per_irq   = 1,
>> +             .handle_mmio    = handle_mmio_rao_wi,
>> +     },
>> +     {
>> +             .base           = GICD_ISENABLER,
>> +             .len            = 0x80,
>> +             .bits_per_irq   = 1,
>> +             .handle_mmio    = handle_mmio_set_enable_reg_dist,
>> +     },
>> +     {
>> +             .base           = GICD_ICENABLER,
>> +             .len            = 0x80,
>> +             .bits_per_irq   = 1,
>> +             .handle_mmio    = handle_mmio_clear_enable_reg_dist,
>> +     },
>> +     {
>> +             .base           = GICD_ISPENDR,
>> +             .len            = 0x80,
>> +             .bits_per_irq   = 1,
>> +             .handle_mmio    = handle_mmio_set_pending_reg_dist,
>> +     },
>> +     {
>> +             .base           = GICD_ICPENDR,
>> +             .len            = 0x80,
>> +             .bits_per_irq   = 1,
>> +             .handle_mmio    = handle_mmio_clear_pending_reg_dist,
>> +     },
>> +     {
>> +             .base           = GICD_ISACTIVER,
>> +             .len            = 0x80,
>> +             .bits_per_irq   = 1,
>> +             .handle_mmio    = handle_mmio_raz_wi,
>> +     },
>> +     {
>> +             .base           = GICD_ICACTIVER,
>> +             .len            = 0x80,
>> +             .bits_per_irq   = 1,
>> +             .handle_mmio    = handle_mmio_raz_wi,
>> +     },
>> +     {
>> +             .base           = GICD_IPRIORITYR,
>> +             .len            = 0x400,
>> +             .bits_per_irq   = 8,
>> +             .handle_mmio    = handle_mmio_priority_reg_dist,
>> +     },
>> +     {
>> +             /* TARGETSRn is RES0 when ARE=1 */
>> +             .base           = GICD_ITARGETSR,
>> +             .len            = 0x400,
>> +             .bits_per_irq   = 8,
>> +             .handle_mmio    = handle_mmio_raz_wi,
>> +     },
>> +     {
>> +             .base           = GICD_ICFGR,
>> +             .len            = 0x100,
>> +             .bits_per_irq   = 2,
>> +             .handle_mmio    = handle_mmio_cfg_reg_dist,
>> +     },
>> +     {
>> +             /* this is RAZ/WI when DS=1 */
>> +             .base           = GICD_IGRPMODR,
>> +             .len            = 0x80,
>> +             .bits_per_irq   = 1,
>> +             .handle_mmio    = handle_mmio_raz_wi,
>> +     },
>> +     {
>> +             /* this is RAZ/WI when DS=1 */
>> +             .base           = GICD_NSACR,
>> +             .len            = 0x100,
>> +             .bits_per_irq   = 2,
>> +             .handle_mmio    = handle_mmio_raz_wi,
>> +     },
>> +     {
>> +             /* this is RAZ/WI when ARE=1 */
>> +             .base           = GICD_SGIR,
>> +             .len            = 0x04,
>> +             .handle_mmio    = handle_mmio_raz_wi,
>> +     },
>> +     {
>> +             /* this is RAZ/WI when ARE=1 */
>> +             .base           = GICD_CPENDSGIR,
>> +             .len            = 0x10,
>> +             .handle_mmio    = handle_mmio_raz_wi,
>> +     },
>> +     {
>> +             /* this is RAZ/WI when ARE=1 */
>> +             .base           = GICD_SPENDSGIR,
>> +             .len            = 0x10,
>> +             .handle_mmio    = handle_mmio_raz_wi,
>> +     },
>> +     {
>> +             .base           = GICD_IROUTER + 0x100,
>> +             .len            = 0x1edc,
>> +             .bits_per_irq   = 64,
>> +             .handle_mmio    = handle_mmio_route_reg,
>> +     },
>> +     {
>> +             .base           = GICD_IDREGS,
>> +             .len            = 0x30,
>> +             .bits_per_irq   = 0,
>> +             .handle_mmio    = handle_mmio_idregs,
>> +     },
>> +     {},
>> +};
>> +
>> +static bool handle_mmio_set_enable_reg_redist(struct kvm_vcpu *vcpu,
>> +                                           struct kvm_exit_mmio *mmio,
>> +                                           phys_addr_t offset)
>> +{
>> +     struct kvm_vcpu *redist_vcpu = mmio->private;
>> +
>> +     return vgic_handle_enable_reg(vcpu->kvm, mmio, offset,
>> +                                   redist_vcpu->vcpu_id,
>> +                                   ACCESS_WRITE_SETBIT);
>> +}
>> +
>> +static bool handle_mmio_clear_enable_reg_redist(struct kvm_vcpu *vcpu,
>> +                                             struct kvm_exit_mmio *mmio,
>> +                                             phys_addr_t offset)
>> +{
>> +     struct kvm_vcpu *redist_vcpu = mmio->private;
>> +
>> +     return vgic_handle_enable_reg(vcpu->kvm, mmio, offset,
>> +                                   redist_vcpu->vcpu_id,
>> +                                   ACCESS_WRITE_CLEARBIT);
>> +}
>> +
>> +static bool handle_mmio_set_pending_reg_redist(struct kvm_vcpu *vcpu,
>> +                                            struct kvm_exit_mmio *mmio,
>> +                                            phys_addr_t offset)
>> +{
>> +     struct kvm_vcpu *redist_vcpu = mmio->private;
>> +
>> +     return vgic_handle_set_pending_reg(vcpu->kvm, mmio, offset,
>> +                                        redist_vcpu->vcpu_id);
>> +}
>> +
>> +static bool handle_mmio_clear_pending_reg_redist(struct kvm_vcpu *vcpu,
>> +                                              struct kvm_exit_mmio *mmio,
>> +                                              phys_addr_t offset)
>> +{
>> +     struct kvm_vcpu *redist_vcpu = mmio->private;
>> +
>> +     return vgic_handle_clear_pending_reg(vcpu->kvm, mmio, offset,
>> +                                          redist_vcpu->vcpu_id);
>> +}
>> +
>> +static bool handle_mmio_priority_reg_redist(struct kvm_vcpu *vcpu,
>> +                                         struct kvm_exit_mmio *mmio,
>> +                                         phys_addr_t offset)
>> +{
>> +     struct kvm_vcpu *redist_vcpu = mmio->private;
>> +     u32 *reg;
>> +
>> +     reg = vgic_bytemap_get_reg(&vcpu->kvm->arch.vgic.irq_priority,
>> +                                redist_vcpu->vcpu_id, offset);
>> +     vgic_reg_access(mmio, reg, offset,
>> +                     ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
>> +     return false;
>> +}
>> +
>> +static bool handle_mmio_cfg_reg_redist(struct kvm_vcpu *vcpu,
>> +                                    struct kvm_exit_mmio *mmio,
>> +                                    phys_addr_t offset)
>> +{
>> +     struct kvm_vcpu *redist_vcpu = mmio->private;
>> +
>> +     u32 *reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_cfg,
>> +                                    redist_vcpu->vcpu_id, offset >> 1);
>> +
>> +     return vgic_handle_cfg_reg(reg, mmio, offset);
>> +}
>> +
>> +static const struct kvm_mmio_range vgic_redist_sgi_ranges[] = {
>> +     {
>> +             .base           = GICR_IGROUPR0,
>> +             .len            = 0x04,
>> +             .bits_per_irq   = 1,
>> +             .handle_mmio    = handle_mmio_raz_wi,
> 
> I think you were going to change this to handle_mmio_rao_wi() instead?

Indeed, thanks for spotting!

>> +     },
>> +     {
>> +             .base           = GICR_ISENABLER0,
>> +             .len            = 0x04,
>> +             .bits_per_irq   = 1,
>> +             .handle_mmio    = handle_mmio_set_enable_reg_redist,
>> +     },
>> +     {
>> +             .base           = GICR_ICENABLER0,
>> +             .len            = 0x04,
>> +             .bits_per_irq   = 1,
>> +             .handle_mmio    = handle_mmio_clear_enable_reg_redist,
>> +     },
>> +     {
>> +             .base           = GICR_ISPENDR0,
>> +             .len            = 0x04,
>> +             .bits_per_irq   = 1,
>> +             .handle_mmio    = handle_mmio_set_pending_reg_redist,
>> +     },
>> +     {
>> +             .base           = GICR_ICPENDR0,
>> +             .len            = 0x04,
>> +             .bits_per_irq   = 1,
>> +             .handle_mmio    = handle_mmio_clear_pending_reg_redist,
>> +     },
>> +     {
>> +             .base           = GICR_ISACTIVER0,
>> +             .len            = 0x04,
>> +             .bits_per_irq   = 1,
>> +             .handle_mmio    = handle_mmio_raz_wi,
>> +     },
>> +     {
>> +             .base           = GICR_ICACTIVER0,
>> +             .len            = 0x04,
>> +             .bits_per_irq   = 1,
>> +             .handle_mmio    = handle_mmio_raz_wi,
>> +     },
>> +     {
>> +             .base           = GICR_IPRIORITYR0,
>> +             .len            = 0x20,
>> +             .bits_per_irq   = 8,
>> +             .handle_mmio    = handle_mmio_priority_reg_redist,
>> +     },
>> +     {
>> +             .base           = GICR_ICFGR0,
>> +             .len            = 0x08,
>> +             .bits_per_irq   = 2,
>> +             .handle_mmio    = handle_mmio_cfg_reg_redist,
>> +     },
>> +     {
>> +             .base           = GICR_IGRPMODR0,
>> +             .len            = 0x04,
>> +             .bits_per_irq   = 1,
>> +             .handle_mmio    = handle_mmio_raz_wi,
>> +     },
>> +     {
>> +             .base           = GICR_NSACR,
>> +             .len            = 0x04,
>> +             .handle_mmio    = handle_mmio_raz_wi,
>> +     },
>> +     {},
>> +};
>> +
>> +static bool handle_mmio_ctlr_redist(struct kvm_vcpu *vcpu,
>> +                                 struct kvm_exit_mmio *mmio,
>> +                                 phys_addr_t offset)
>> +{
>> +     /* since we don't support LPIs, this register is zero for now */
>> +     vgic_reg_access(mmio, NULL, offset,
>> +                     ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
>> +     return false;
>> +}
>> +
>> +static bool handle_mmio_typer_redist(struct kvm_vcpu *vcpu,
>> +                                  struct kvm_exit_mmio *mmio,
>> +                                  phys_addr_t offset)
>> +{
>> +     u32 reg;
>> +     u64 mpidr;
>> +     struct kvm_vcpu *redist_vcpu = mmio->private;
>> +     int target_vcpu_id = redist_vcpu->vcpu_id;
>> +
>> +     /* the upper 32 bits contain the affinity value */
>> +     if ((offset & ~3) == 4) {
>> +             mpidr = kvm_vcpu_get_mpidr_aff(redist_vcpu);
>> +             reg = compress_mpidr(mpidr);
>> +
>> +             vgic_reg_access(mmio, &reg, offset,
>> +                             ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
>> +             return false;
>> +     }
>> +
>> +     reg = redist_vcpu->vcpu_id << 8;
>> +     if (target_vcpu_id == atomic_read(&vcpu->kvm->online_vcpus) - 1)
>> +             reg |= GICR_TYPER_LAST;
>> +     vgic_reg_access(mmio, &reg, offset,
>> +                     ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
>> +     return false;
>> +}
>> +
>> +static const struct kvm_mmio_range vgic_redist_ranges[] = {
>> +     {
>> +             .base           = GICR_CTLR,
>> +             .len            = 0x04,
>> +             .bits_per_irq   = 0,
>> +             .handle_mmio    = handle_mmio_ctlr_redist,
>> +     },
>> +     {
>> +             .base           = GICR_TYPER,
>> +             .len            = 0x08,
>> +             .bits_per_irq   = 0,
>> +             .handle_mmio    = handle_mmio_typer_redist,
>> +     },
>> +     {
>> +             .base           = GICR_IIDR,
>> +             .len            = 0x04,
>> +             .bits_per_irq   = 0,
>> +             .handle_mmio    = handle_mmio_iidr,
>> +     },
>> +     {
>> +             .base           = GICR_WAKER,
>> +             .len            = 0x04,
>> +             .bits_per_irq   = 0,
>> +             .handle_mmio    = handle_mmio_raz_wi,
>> +     },
>> +     {
>> +             .base           = GICR_IDREGS,
>> +             .len            = 0x30,
>> +             .bits_per_irq   = 0,
>> +             .handle_mmio    = handle_mmio_idregs,
>> +     },
>> +     {},
>> +};
>> +
>> +/*
>> + * This function splits accesses between the distributor and the two
>> + * redistributor parts (private/SPI). As each redistributor is accessible
>> + * from any CPU, we have to determine the affected VCPU by taking the faulting
>> + * address into account. We then pass this VCPU to the handler function via
>> + * the private parameter.
>> + */
>> +#define SGI_BASE_OFFSET SZ_64K
>> +static bool vgic_v3_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
>> +                             struct kvm_exit_mmio *mmio)
>> +{
>> +     struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>> +     unsigned long dbase = dist->vgic_dist_base;
>> +     unsigned long rdbase = dist->vgic_redist_base;
>> +     int nrcpus = atomic_read(&vcpu->kvm->online_vcpus);
>> +     int vcpu_id;
>> +     const struct kvm_mmio_range *mmio_range;
>> +
>> +     if (is_in_range(mmio->phys_addr, mmio->len, dbase, GIC_V3_DIST_SIZE)) {
>> +             return vgic_handle_mmio_range(vcpu, run, mmio,
>> +                                           vgic_v3_dist_ranges, dbase);
>> +     }
>> +
>> +     if (!is_in_range(mmio->phys_addr, mmio->len, rdbase,
>> +         GIC_V3_REDIST_SIZE * nrcpus))
>> +             return false;
> 
> Did you think more about the contiguous allocation issue here or can you
> give me a pointer to the requirement in the spec?

See section 5.4.1, "Re-Distributor Addressing", in the GICv3 spec.

>> +
>> +     vcpu_id = (mmio->phys_addr - rdbase) / GIC_V3_REDIST_SIZE;
>> +     rdbase += (vcpu_id * GIC_V3_REDIST_SIZE);
>> +     mmio->private = kvm_get_vcpu(vcpu->kvm, vcpu_id);
>> +
>> +     if (mmio->phys_addr >= rdbase + SGI_BASE_OFFSET) {
>> +             rdbase += SGI_BASE_OFFSET;
>> +             mmio_range = vgic_redist_sgi_ranges;
>> +     } else {
>> +             mmio_range = vgic_redist_ranges;
>> +     }
>> +     return vgic_handle_mmio_range(vcpu, run, mmio, mmio_range, rdbase);
>> +}
>> +
>> +static bool vgic_v3_queue_sgi(struct kvm_vcpu *vcpu, int irq)
>> +{
>> +     if (vgic_queue_irq(vcpu, 0, irq)) {
>> +             vgic_dist_irq_clear_pending(vcpu, irq);
>> +             vgic_cpu_irq_clear(vcpu, irq);
>> +             return true;
>> +     }
>> +
>> +     return false;
>> +}
>> +
>> +static int vgic_v3_init_maps(struct vgic_dist *dist)
>> +{
>> +     int nr_spis = dist->nr_irqs - VGIC_NR_PRIVATE_IRQS;
>> +
>> +     dist->irq_spi_mpidr = kcalloc(nr_spis, sizeof(dist->irq_spi_mpidr[0]),
>> +                                   GFP_KERNEL);
>> +
>> +     if (!dist->irq_spi_mpidr)
>> +             return -ENOMEM;
>> +
>> +     return 0;
>> +}
>> +
>> +static int vgic_v3_init(struct kvm *kvm, const struct vgic_params *params)
>> +{
>> +     struct vgic_dist *dist = &kvm->arch.vgic;
>> +     int ret, i;
>> +     u32 mpidr;
>> +
>> +     if (IS_VGIC_ADDR_UNDEF(dist->vgic_dist_base) ||
>> +         IS_VGIC_ADDR_UNDEF(dist->vgic_redist_base)) {
>> +             kvm_err("Need to set vgic distributor addresses first\n");
>> +             return -ENXIO;
>> +     }
>> +
>> +     /*
>> +      * FIXME: this should be moved to init_maps time, and may bite
>> +      * us when adding save/restore. Add a per-emulation hook?
>> +      */
> 
> progress on this fixme?

Progress supplies the ISS, but not this piece of code (read: none) ;-)
I am more in favour of a follow-up patch on this one ...

Many thanks for reading through all this mess^Wcode ... again.

Cheers,
Andre.

>> +     ret = vgic_v3_init_maps(dist);
>> +     if (ret) {
>> +             kvm_err("Unable to allocate maps\n");
>> +             return ret;
>> +     }
>> +
>> +     /* Initialize the target VCPUs for each IRQ to VCPU 0 */
>> +     mpidr = compress_mpidr(kvm_vcpu_get_mpidr_aff(kvm_get_vcpu(kvm, 0)));
>> +     for (i = VGIC_NR_PRIVATE_IRQS; i < dist->nr_irqs; i++) {
>> +             dist->irq_spi_cpu[i - VGIC_NR_PRIVATE_IRQS] = 0;
>> +             dist->irq_spi_mpidr[i - VGIC_NR_PRIVATE_IRQS] = mpidr;
>> +             vgic_bitmap_set_irq_val(dist->irq_spi_target, 0, i, 1);
>> +     }
>> +
>> +     return 0;
>> +}
>> +
>> +/* GICv3 does not keep track of SGI sources anymore. */
>> +static void vgic_v3_add_sgi_source(struct kvm_vcpu *vcpu, int irq, int source)
>> +{
>> +}
>> +
>> +int vgic_v3_init_emulation(struct kvm *kvm)
>> +{
>> +     struct vgic_dist *dist = &kvm->arch.vgic;
>> +
>> +     dist->vm_ops.handle_mmio = vgic_v3_handle_mmio;
>> +     dist->vm_ops.queue_sgi = vgic_v3_queue_sgi;
>> +     dist->vm_ops.add_sgi_source = vgic_v3_add_sgi_source;
>> +     dist->vm_ops.vgic_init = vgic_v3_init;
>> +
>> +     kvm->arch.max_vcpus = KVM_MAX_VCPUS;
>> +
>> +     return 0;
>> +}
>> +
>> +static int vgic_v3_create(struct kvm_device *dev, u32 type)
>> +{
>> +     return kvm_vgic_create(dev->kvm, type);
>> +}
>> +
>> +static void vgic_v3_destroy(struct kvm_device *dev)
>> +{
>> +     kfree(dev);
>> +}
>> +
>> +static int vgic_v3_set_attr(struct kvm_device *dev,
>> +                         struct kvm_device_attr *attr)
>> +{
>> +     int ret;
>> +
>> +     ret = vgic_set_common_attr(dev, attr);
>> +     if (ret != -ENXIO)
>> +             return ret;
>> +
>> +     switch (attr->group) {
>> +     case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
>> +     case KVM_DEV_ARM_VGIC_GRP_CPU_REGS:
>> +             return -ENXIO;
>> +     }
>> +
>> +     return -ENXIO;
>> +}
>> +
>> +static int vgic_v3_get_attr(struct kvm_device *dev,
>> +                         struct kvm_device_attr *attr)
>> +{
>> +     int ret;
>> +
>> +     ret = vgic_get_common_attr(dev, attr);
>> +     if (ret != -ENXIO)
>> +             return ret;
>> +
>> +     switch (attr->group) {
>> +     case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
>> +     case KVM_DEV_ARM_VGIC_GRP_CPU_REGS:
>> +             return -ENXIO;
>> +     }
>> +
>> +     return -ENXIO;
>> +}
>> +
>> +static int vgic_v3_has_attr(struct kvm_device *dev,
>> +                         struct kvm_device_attr *attr)
>> +{
>> +     switch (attr->group) {
>> +     case KVM_DEV_ARM_VGIC_GRP_ADDR:
>> +             switch (attr->attr) {
>> +             case KVM_VGIC_V2_ADDR_TYPE_DIST:
>> +             case KVM_VGIC_V2_ADDR_TYPE_CPU:
>> +                     return -ENXIO;
>> +             }
>> +             break;
>> +     case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
>> +     case KVM_DEV_ARM_VGIC_GRP_CPU_REGS:
>> +             return -ENXIO;
>> +     case KVM_DEV_ARM_VGIC_GRP_NR_IRQS:
>> +             return 0;
>> +     }
>> +     return -ENXIO;
>> +}
>> +
>> +struct kvm_device_ops kvm_arm_vgic_v3_ops = {
>> +     .name = "kvm-arm-vgic-v3",
>> +     .create = vgic_v3_create,
>> +     .destroy = vgic_v3_destroy,
>> +     .set_attr = vgic_v3_set_attr,
>> +     .get_attr = vgic_v3_get_attr,
>> +     .has_attr = vgic_v3_has_attr,
>> +};
>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>> index 335ffe0..b7de0f8 100644
>> --- a/virt/kvm/arm/vgic.c
>> +++ b/virt/kvm/arm/vgic.c
>> @@ -1249,7 +1249,7 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>>       struct kvm_vcpu *vcpu;
>>       int edge_triggered, level_triggered;
>>       int enabled;
>> -     bool ret = true;
>> +     bool ret = true, can_inject = true;
>>
>>       spin_lock(&dist->lock);
>>
>> @@ -1264,6 +1264,11 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>>
>>       if (irq_num >= VGIC_NR_PRIVATE_IRQS) {
>>               cpuid = dist->irq_spi_cpu[irq_num - VGIC_NR_PRIVATE_IRQS];
>> +             if (cpuid == VCPU_NOT_ALLOCATED) {
>> +                     /* Pretend we use CPU0, and prevent injection */
>> +                     cpuid = 0;
>> +                     can_inject = false;
>> +             }
>>               vcpu = kvm_get_vcpu(kvm, cpuid);
>>       }
>>
>> @@ -1285,7 +1290,7 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>>
>>       enabled = vgic_irq_is_enabled(vcpu, irq_num);
>>
>> -     if (!enabled) {
>> +     if (!enabled || !can_inject) {
>>               ret = false;
>>               goto out;
>>       }
>> @@ -1438,6 +1443,7 @@ void kvm_vgic_destroy(struct kvm *kvm)
>>       }
>>       kfree(dist->irq_sgi_sources);
>>       kfree(dist->irq_spi_cpu);
>> +     kfree(dist->irq_spi_mpidr);
>>       kfree(dist->irq_spi_target);
>>       kfree(dist->irq_pending_on_cpu);
>>       dist->irq_sgi_sources = NULL;
>> @@ -1628,6 +1634,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
>>       kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
>>       kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
>>       kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
>> +     kvm->arch.vgic.vgic_redist_base = VGIC_ADDR_UNDEF;
>>
>>  out_unlock:
>>       for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
>> diff --git a/virt/kvm/arm/vgic.h b/virt/kvm/arm/vgic.h
>> index ff3171a..b0c6b2f 100644
>> --- a/virt/kvm/arm/vgic.h
>> +++ b/virt/kvm/arm/vgic.h
>> @@ -35,6 +35,8 @@
>>  #define ACCESS_WRITE_VALUE   (3 << 1)
>>  #define ACCESS_WRITE_MASK(x) ((x) & (3 << 1))
>>
>> +#define VCPU_NOT_ALLOCATED   ((u8)-1)
>> +
>>  unsigned long *vgic_bitmap_get_shared_map(struct vgic_bitmap *x);
>>
>>  void vgic_update_state(struct kvm *kvm);
>> @@ -115,5 +117,6 @@ int vgic_set_common_attr(struct kvm_device *dev, struct kvm_device_attr *attr);
>>  int vgic_get_common_attr(struct kvm_device *dev, struct kvm_device_attr *attr);
>>
>>  int vgic_v2_init_emulation(struct kvm *kvm);
>> +int vgic_v3_init_emulation(struct kvm *kvm);
>>
>>  #endif
>> --
>> 1.7.9.5
>>
> 
> Thanks,
> -Christoffer
> 
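A standalone sketch of the VCPU_NOT_ALLOCATED gating that the vgic.c hunk above adds to vgic_update_irq_pending(). It assumes an SPI's target is stored as a u8 vCPU index, as in irq_spi_cpu. The lookup helper mpidr_to_vcpu() and the function spi_can_inject() are hypothetical and only illustrate the control flow: a route to a non-existent MPIDR becomes the 0xff sentinel, is accounted against vCPU 0, and is never injected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VCPU_NOT_ALLOCATED ((uint8_t)-1) /* same sentinel value as the patch */

/* Hypothetical lookup: index of the vCPU with this MPIDR, or the sentinel. */
static uint8_t mpidr_to_vcpu(uint32_t mpidr, const uint32_t *vcpu_mpidrs,
			     int nr_vcpus)
{
	for (int i = 0; i < nr_vcpus; i++)
		if (vcpu_mpidrs[i] == mpidr)
			return (uint8_t)i;
	return VCPU_NOT_ALLOCATED;
}

/*
 * Sketch of the check in vgic_update_irq_pending(): an SPI routed to an
 * unallocated vCPU is bookkept on vCPU 0 but never injected.
 */
static bool spi_can_inject(uint8_t target, bool enabled,
			   uint8_t *bookkeeping_cpu)
{
	bool can_inject = true;

	if (target == VCPU_NOT_ALLOCATED) {
		target = 0;		/* pretend we use CPU0 ... */
		can_inject = false;	/* ... but prevent injection */
	}
	*bookkeeping_cpu = target;
	return enabled && can_inject;
}
```

Because the sentinel is 0xff, a u8 index only leaves room for 255 real vCPUs. The index alone also cannot say *which* unallocated MPIDR the guest wrote, which is why the patch keeps the original value in irq_spi_mpidr for read-as-written semantics.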

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v4 17/19] arm64: KVM: add SGI generation register emulation
  2014-11-23 15:08   ` Christoffer Dall
@ 2014-11-24 16:37     ` Andre Przywara
  2014-11-25 11:03       ` Christoffer Dall
  0 siblings, 1 reply; 80+ messages in thread
From: Andre Przywara @ 2014-11-24 16:37 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On 23/11/14 15:08, Christoffer Dall wrote:
> On Fri, Nov 14, 2014 at 10:08:01AM +0000, Andre Przywara wrote:
>> While the generation of a (virtual) inter-processor interrupt (SGI)
>> on a GICv2 works by writing to a MMIO register, GICv3 uses the system
>> register ICC_SGI1R_EL1 to trigger them.
>> Trap that register on ARM64 hosts and handle it in a new handler
>> function in the GICv3 emulation code.
> 
> Did you reorder something or does my previous comment still apply that
> you're not enabling trapping yet, you're just adding the handler - those
> are two different things.

Yes, I can fix the wording.

> You sort of left my question about access_gic_sgi() not checking if the
> gicv3 is presetn hanging from the last thread, but I think I'm
> understanding properly now, that as long as you're not setting the
> ICC_SRE_EL2.Enable = 1, then we'll never get here, right?

Right, that is the idea. Just to make sure that I got this right from
the discussion the other day: We will not trap to EL2 as long as
ICC_SRE_EL2.Enable is 0 - which it should still be at this point, right?
(I am asking because I struggle to find this in the spec).

So actually your ICC_SRE_EL1 trap patch solved that problem ;-)

>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>> Changelog v3...v4:
>> - moved addition of vgic_v3_dispatch_sgi() from earlier patch into here
>> - move MPIDR comparison into extra function
>> - use new ICC_SGI1R_ field names
>> - improve readability of vgic_v3_dispatch_sgi()
>> - add and refine comments
>>
>>  arch/arm64/kvm/sys_regs.c   |   26 ++++++++++
>>  include/kvm/arm_vgic.h      |    1 +
>>  virt/kvm/arm/vgic-v3-emul.c |  113 +++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 140 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index fd3ffc3..e59369a 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -165,6 +165,27 @@ static bool access_sctlr(struct kvm_vcpu *vcpu,
>>  	return true;
>>  }
>>  
>> +/*
>> + * Trap handler for the GICv3 SGI generation system register.
>> + * Forward the request to the VGIC emulation.
>> + * The cp15_64 code makes sure this automatically works
>> + * for both AArch64 and AArch32 accesses.
>> + */
>> +static bool access_gic_sgi(struct kvm_vcpu *vcpu,
>> +			   const struct sys_reg_params *p,
>> +			   const struct sys_reg_desc *r)
>> +{
>> +	u64 val;
>> +
>> +	if (!p->is_write)
>> +		return read_from_write_only(vcpu, p);
>> +
>> +	val = *vcpu_reg(vcpu, p->Rt);
>> +	vgic_v3_dispatch_sgi(vcpu, val);
>> +
>> +	return true;
>> +}
>> +
>>  static bool trap_raz_wi(struct kvm_vcpu *vcpu,
>>  			const struct sys_reg_params *p,
>>  			const struct sys_reg_desc *r)
>> @@ -431,6 +452,9 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>>  	/* VBAR_EL1 */
>>  	{ Op0(0b11), Op1(0b000), CRn(0b1100), CRm(0b0000), Op2(0b000),
>>  	  NULL, reset_val, VBAR_EL1, 0 },
>> +	/* ICC_SGI1R_EL1 */
>> +	{ Op0(0b11), Op1(0b000), CRn(0b1100), CRm(0b1011), Op2(0b101),
>> +	  access_gic_sgi },
>>  	/* CONTEXTIDR_EL1 */
>>  	{ Op0(0b11), Op1(0b000), CRn(0b1101), CRm(0b0000), Op2(0b001),
>>  	  access_vm_reg, reset_val, CONTEXTIDR_EL1, 0 },
>> @@ -659,6 +683,8 @@ static const struct sys_reg_desc cp14_64_regs[] = {
>>   * register).
>>   */
>>  static const struct sys_reg_desc cp15_regs[] = {
>> +	{ Op1( 0), CRn( 0), CRm(12), Op2( 0), access_gic_sgi },
>> +
>>  	{ Op1( 0), CRn( 1), CRm( 0), Op2( 0), access_sctlr, NULL, c1_SCTLR },
>>  	{ Op1( 0), CRn( 2), CRm( 0), Op2( 0), access_vm_reg, NULL, c2_TTBR0 },
>>  	{ Op1( 0), CRn( 2), CRm( 0), Op2( 1), access_vm_reg, NULL, c2_TTBR1 },
>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>> index c1ef5a9..357a935 100644
>> --- a/include/kvm/arm_vgic.h
>> +++ b/include/kvm/arm_vgic.h
>> @@ -305,6 +305,7 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu);
>>  void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu);
>>  int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>>  			bool level);
>> +void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
>>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>>  bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
>>  		      struct kvm_exit_mmio *mmio);
>> diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
>> index 97b5801..58d7457 100644
>> --- a/virt/kvm/arm/vgic-v3-emul.c
>> +++ b/virt/kvm/arm/vgic-v3-emul.c
>> @@ -828,6 +828,119 @@ int vgic_v3_init_emulation(struct kvm *kvm)
>>  	return 0;
>>  }
>>  
>> +/*
>> + * Compare a given affinity (level 1-3 and a level 0 mask, from the SGI
>> + * generation register ICC_SGI1R_EL1) with a given VCPU.
>> + * If the VCPU's MPIDR matches, return the level0 affinity, otherwise
>> + * return -1.
>> + */
>> +static int match_mpidr(u64 sgi_aff, u16 sgi_cpu_mask, struct kvm_vcpu *vcpu)
>> +{
>> +	unsigned long affinity;
>> +	int level0;
>> +
>> +	/*
>> +	 * Split the current VCPU's MPIDR into affinity level 0 and the
>> +	 * rest as this is what we have to compare against.
>> +	 */
>> +	affinity = kvm_vcpu_get_mpidr_aff(vcpu);
>> +	level0 = MPIDR_AFFINITY_LEVEL(affinity, 0);
>> +	affinity &= ~MPIDR_LEVEL_MASK;
>> +
>> +	/* bail out if the upper three levels don't match */
>> +	if (sgi_aff != affinity)
>> +		return -1;
>> +
>> +	/* Is this VCPU's bit set in the mask ? */
>> +	if (!(sgi_cpu_mask & BIT(level0)))
>> +		return -1;
>> +
>> +	return level0;
>> +}
>> +
>> +#define SGI_AFFINITY_LEVEL(reg, level) \
>> +	((((reg) & ICC_SGI1R_AFFINITY_## level ##_MASK) \
>> +	>> ICC_SGI1R_AFFINITY_## level ##_SHIFT) << MPIDR_LEVEL_SHIFT(level))
>> +
>> +/**
>> + * vgic_v3_dispatch_sgi - handle SGI requests from VCPUs
>> + * @vcpu: The VCPU requesting a SGI
>> + * @reg: The value written into the ICC_SGI1R_EL1 register by that VCPU
>> + *
>> + * With GICv3 (and ARE=1) CPUs trigger SGIs by writing to an architectural
> 
> what's a non-architectural system register?

architectural vs. implementation defined.
Are you suggesting that I should drop "architectural" because it is a
tautology?

>> + * system register. This will trap in sys_regs.c and call this function.
>> + * This ICC_SGI1R_EL1 register contains the upper three affinity levels of the
>> + * target processors as well as a bitmask of 16 Aff0 CPUs.
>> + * If the interrupt routing mode bit is not set, we iterate over all VCPUs to
>> + * check for matching ones. If this bit is set, we signal all VCPUs except
>> + * the calling one.
>> + */
>> +void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg)
>> +{
>> +	struct kvm *kvm = vcpu->kvm;
>> +	struct kvm_vcpu *c_vcpu;
>> +	struct vgic_dist *dist = &kvm->arch.vgic;
>> +	u16 target_cpus;
>> +	u64 mpidr;
>> +	int sgi, c, vcpu_id;
>> +	bool broadcast;
>> +	int updated = 0;
>> +
>> +	vcpu_id = vcpu->vcpu_id;
>> +
>> +	sgi = (reg & ICC_SGI1R_SGI_ID_MASK) >> ICC_SGI1R_SGI_ID_SHIFT;
>> +	broadcast = reg & BIT(ICC_SGI1R_IRQ_ROUTING_MODE_BIT);
>> +	target_cpus = (reg & ICC_SGI1R_TARGET_LIST_MASK) >> ICC_SGI1R_TARGET_LIST_SHIFT;
>> +	mpidr = SGI_AFFINITY_LEVEL(reg, 3);
>> +	mpidr |= SGI_AFFINITY_LEVEL(reg, 2);
>> +	mpidr |= SGI_AFFINITY_LEVEL(reg, 1);
>> +	mpidr &= ~MPIDR_LEVEL_MASK;
> 
> do you need this last mask?  It should be 0 already, right?

Indeed, I can drop this.

Thanks for looking at this!
Andre.

>> +
>> +	/*
>> +	 * We take the dist lock here, because we come from the sysregs
>> +	 * code path and not from the MMIO one (which already takes the lock).
>> +	 */
>> +	spin_lock(&dist->lock);
>> +
>> +	/*
>> +	 * We iterate over all VCPUs to find the MPIDRs matching the request.
>> +	 * If we have handled one CPU, we clear its bit to detect early
>> +	 * if we are already finished. This avoids iterating through all
>> +	 * VCPUs when most of the time we just signal a single VCPU.
>> +	 */
>> +	kvm_for_each_vcpu(c, c_vcpu, kvm) {
>> +
>> +		/* Exit early if we have dealt with all requested CPUs */
>> +		if (!broadcast && target_cpus == 0)
>> +			break;
>> +
>> +		 /* Don't signal the calling VCPU */
>> +		if (broadcast && c == vcpu_id)
>> +			continue;
>> +
>> +		if (!broadcast) {
>> +			int level0;
>> +
>> +			level0 = match_mpidr(mpidr, target_cpus, c_vcpu);
>> +			if (level0 == -1)
>> +				continue;
>> +
>> +			/* remove this matching VCPU from the mask */
>> +			target_cpus &= ~BIT(level0);
>> +		}
>> +
>> +		/* Flag the SGI as pending */
>> +		vgic_dist_irq_set_pending(c_vcpu, sgi);
>> +		updated = 1;
>> +		kvm_debug("SGI%d from CPU%d to CPU%d\n", sgi, vcpu_id, c);
>> +	}
>> +	if (updated)
>> +		vgic_update_state(vcpu->kvm);
>> +	spin_unlock(&dist->lock);
>> +	if (updated)
>> +		vgic_kick_vcpus(vcpu->kvm);
>> +}
>> +
>>  static int vgic_v3_create(struct kvm_device *dev, u32 type)
>>  {
>>  	return kvm_vgic_create(dev->kvm, type);
>> -- 
>> 1.7.9.5
>>
> 
> Assuming you'll address the commit message stuff above:
> 
> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
> 
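A userspace sketch of the ICC_SGI1R_EL1 decoding performed by vgic_v3_dispatch_sgi() and match_mpidr() above, using the architectural field positions (TargetList [15:0], Aff1 [23:16], INTID [27:24], Aff2 [39:32], IRM [40], Aff3 [55:48]). The macro and function names are hypothetical, not the kernel's ICC_SGI1R_* definitions. The sketch assumes the MPIDR values contain only affinity fields and that there are at most 32 vCPUs, so the result fits in a 32-bit mask.

```c
#include <assert.h>
#include <stdint.h>

/* ICC_SGI1R_EL1 fields; names are local to this sketch */
#define SGI1R_TARGET_LIST(r) ((uint16_t)((r) & 0xffff))          /* [15:0]  */
#define SGI1R_AFF1(r)        (((r) >> 16) & 0xff)                /* [23:16] */
#define SGI1R_INTID(r)       ((unsigned int)(((r) >> 24) & 0xf)) /* [27:24] */
#define SGI1R_AFF2(r)        (((r) >> 32) & 0xff)                /* [39:32] */
#define SGI1R_IRM(r)         (((r) >> 40) & 0x1)                 /* [40]    */
#define SGI1R_AFF3(r)        (((r) >> 48) & 0xff)                /* [55:48] */

/* Rebuild Aff3.Aff2.Aff1 in MPIDR layout (Aff1 at 8, Aff2 at 16, Aff3 at 32). */
static uint64_t sgi1r_to_mpidr_upper(uint64_t reg)
{
	return (SGI1R_AFF3(reg) << 32) | (SGI1R_AFF2(reg) << 16) |
	       (SGI1R_AFF1(reg) << 8);
}

/* Return a bitmask of vCPU indices that should receive the SGI. */
static uint32_t sgi_targets(uint64_t reg, const uint64_t *mpidrs,
			    int nr_vcpus, int self)
{
	uint64_t upper = sgi1r_to_mpidr_upper(reg);
	uint16_t list = SGI1R_TARGET_LIST(reg);
	uint32_t out = 0;

	for (int i = 0; i < nr_vcpus; i++) {
		unsigned int aff0 = mpidrs[i] & 0xff;

		if (SGI1R_IRM(reg)) {
			/* IRM == 1: every vCPU except the sender */
			if (i != self)
				out |= 1u << i;
			continue;
		}
		/* Stop early once every requested Aff0 bit has been handled */
		if (list == 0)
			break;
		/* Upper affinity must match; TargetList only covers Aff0 0..15 */
		if ((mpidrs[i] & ~0xffULL) != upper || aff0 > 15)
			continue;
		if (list & (1u << aff0)) {
			out |= 1u << i;
			list &= (uint16_t)~(1u << aff0);
		}
	}
	return out;
}
```

With vCPU MPIDRs {0x000, 0x001, 0x002, 0x100, 0x101}, a write with Aff1=1 and TargetList=0b10 selects only vCPU 4. Clearing matched bits from the local copy of TargetList gives the same early exit that the patch implements with target_cpus.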

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v4 18/19] arm/arm64: KVM: enable kernel side of GICv3 emulation
  2014-11-24  9:09   ` Christoffer Dall
@ 2014-11-24 17:41     ` Andre Przywara
  2014-11-25 11:08       ` Christoffer Dall
  0 siblings, 1 reply; 80+ messages in thread
From: Andre Przywara @ 2014-11-24 17:41 UTC (permalink / raw)
  To: linux-arm-kernel

(Marc, can you comment on the last question below about the unaligned
GICV mapping?)

Hi Christoffer,

On 24/11/14 09:09, Christoffer Dall wrote:
> On Fri, Nov 14, 2014 at 10:08:02AM +0000, Andre Przywara wrote:
>> With all the necessary GICv3 emulation code in place, we can now
>> connect the code to the GICv3 backend in the kernel.
>> The LR register handling is different depending on the emulated GIC
>> model, so provide different implementations for each.
>> Also allow non-v2-compatible GICv3 implementations (which don't
>> provide MMIO regions for the virtual CPU interface in the DT), but
>> restrict those hosts to support GICv3 guests only.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>> Changelog v3...v4:
>> - handle differences between GICv2-on-v3 and GICv3-on-v3 in existing functions
>> - remove init_*_emul() functions
>> - remove max_vcpus setting (done in earlier patches now)
>> - adapt to new vgic_v<n>_init_emulation behaviour
>>
>>  virt/kvm/arm/vgic-v3.c |   83 ++++++++++++++++++++++++++++++++----------------
>>  virt/kvm/arm/vgic.c    |    5 +++
>>  2 files changed, 60 insertions(+), 28 deletions(-)
>>
>> diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
>> index a04d208..4894c59 100644
>> --- a/virt/kvm/arm/vgic-v3.c
>> +++ b/virt/kvm/arm/vgic-v3.c
>> @@ -34,6 +34,7 @@
>>  #define GICH_LR_VIRTUALID		(0x3ffUL << 0)
>>  #define GICH_LR_PHYSID_CPUID_SHIFT	(10)
>>  #define GICH_LR_PHYSID_CPUID		(7UL << GICH_LR_PHYSID_CPUID_SHIFT)
>> +#define ICH_LR_VIRTUALID_MASK		(BIT_ULL(32) - 1)
>>  
>>  /*
>>   * LRs are stored in reverse order in memory. make sure we index them
>> @@ -48,12 +49,17 @@ static struct vgic_lr vgic_v3_get_lr(const struct kvm_vcpu *vcpu, int lr)
>>  	struct vgic_lr lr_desc;
>>  	u64 val = vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[LR_INDEX(lr)];
>>  
>> -	lr_desc.irq	= val & GICH_LR_VIRTUALID;
>> -	if (lr_desc.irq <= 15)
>> -		lr_desc.source	= (val >> GICH_LR_PHYSID_CPUID_SHIFT) & 0x7;
>> +	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
>> +		lr_desc.irq = val & ICH_LR_VIRTUALID_MASK;
>>  	else
>> -		lr_desc.source = 0;
>> -	lr_desc.state	= 0;
>> +		lr_desc.irq = val & GICH_LR_VIRTUALID;
>> +
>> +	lr_desc.source = 0;
>> +	if (lr_desc.irq <= 15 &&
>> +	    vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V2)
>> +		lr_desc.source	= (val >> GICH_LR_PHYSID_CPUID_SHIFT) & 0x7;
>> +
>> +	lr_desc.state   = 0;
> 
> super-nit-only-if-you-respin: you have a couple of tabs and extra spaces
> in the two lines above that need to just be a single space before the
> assignment operator on each line.
> 
>>  
>>  	if (val & ICH_LR_PENDING_BIT)
>>  		lr_desc.state |= LR_STATE_PENDING;
>> @@ -68,8 +74,20 @@ static struct vgic_lr vgic_v3_get_lr(const struct kvm_vcpu *vcpu, int lr)
>>  static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
>>  			   struct vgic_lr lr_desc)
>>  {
>> -	u64 lr_val = (((u32)lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT) |
>> -		      lr_desc.irq);
>> +	u64 lr_val;
>> +
>> +	lr_val = lr_desc.irq;
>> +
>> +	/*
>> +	 * currently all guest IRQs are Group1, as Group0 would result
> 
> I guess you couldn't guess my comment from last time: can you please
> begin sentences with upper-case?  (only if you re-spin).
> 
>> +	 * in a FIQ in the guest, which it wouldn't expect.
>> +	 * Eventually we want to make this configurable, so we may revisit
>> +	 * this in the future.
>> +	 */
>> +	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
>> +		lr_val |= ICH_LR_GROUP;
>> +	else
>> +		lr_val |= (u32)lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT;
>>  
>>  	if (lr_desc.state & LR_STATE_PENDING)
>>  		lr_val |= ICH_LR_PENDING_BIT;
>> @@ -154,7 +172,14 @@ static void vgic_v3_enable(struct kvm_vcpu *vcpu)
>>  	 */
>>  	vgic_v3->vgic_vmcr = 0;
>>  
>> -	vgic_v3->vgic_sre = 0;
>> +	/*
>> +	 * Set the SRE_EL1 value depending on the configured
>> +	 * emulated vGIC model.
>> +	 */
>> +	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
>> +		vgic_v3->vgic_sre = ICC_SRE_EL1_SRE;
> 
> I think we left some of my questions from last round unanswered here
> (you said you needed to go and think about it).  If the guest sets SRE=0
> we will not currently preserve this value I think.

Which is intentional, as we are following "4.4.7 Implementations with
Fixed System Register Enables", which allows RAO/WI semantics for the
SRE bit if the GIC implementation does not care about GICv2
compatibility (which is what we do for the guest).
Does that make sense or am I missing something?

> The comment should clearly indicate that we're choosing a reset value
> of the register for our implementation of the gic.
> 
> I'm fine with this change, but would like to know what the rationale
> behind it is; wouldn't guests always initialize this value?
> 
>> +	else
>> +		vgic_v3->vgic_sre = 0;
>>  
>>  	/* Get the show on the road... */
>>  	vgic_v3->vgic_hcr = ICH_HCR_EN;
>> @@ -215,28 +240,30 @@ int vgic_v3_probe(struct device_node *vgic_node,
>>  
>>  	gicv_idx += 3; /* Also skip GICD, GICC, GICH */
>>  	if (of_address_to_resource(vgic_node, gicv_idx, &vcpu_res)) {
>> -		kvm_err("Cannot obtain GICV region\n");
>> -		ret = -ENXIO;
>> -		goto out;
>> -	}
>> -
>> -	if (!PAGE_ALIGNED(vcpu_res.start)) {
>> -		kvm_err("GICV physical address 0x%llx not page aligned\n",
>> -			(unsigned long long)vcpu_res.start);
>> -		ret = -ENXIO;
>> -		goto out;
>> -	}
>> -
>> -	if (!PAGE_ALIGNED(resource_size(&vcpu_res))) {
>> -		kvm_err("GICV size 0x%llx not a multiple of page size 0x%lx\n",
>> -			(unsigned long long)resource_size(&vcpu_res),
>> -			PAGE_SIZE);
>> -		ret = -ENXIO;
>> -		goto out;
>> +		kvm_info("GICv3: GICv2 emulation not available\n");
>> +		vgic->vcpu_base = 0;
>> +	} else {
>> +		if (!PAGE_ALIGNED(vcpu_res.start)) {
>> +			kvm_err("GICV physical address 0x%llx not page aligned\n",
>> +				(unsigned long long)vcpu_res.start);
>> +			ret = -ENXIO;
>> +			goto out;
> 
> shouldn't we be allowing an emulated gicv3 using the system registers in
> this case then?

I guess not. If we have a GICV address that is not passing those tests,
we should warn the user about it instead of unexpectedly restricting the
virtualization capabilities, right?
If the user desperately wants to use GIC virtualization despite the odd
mapping, they could remove the GICC/GICH/GICV regions from the DTB altogether.

Marc, any thoughts?

Cheers,
Andre.

>> +		}
>> +
>> +		if (!PAGE_ALIGNED(resource_size(&vcpu_res))) {
>> +			kvm_err("GICV size 0x%llx not a multiple of page size 0x%lx\n",
>> +				(unsigned long long)resource_size(&vcpu_res),
>> +				PAGE_SIZE);
>> +			ret = -ENXIO;
>> +			goto out;
> 
> ditto?
> 
>> +		}
>> +
>> +		vgic->vcpu_base = vcpu_res.start;
>> +		kvm_register_device_ops(&kvm_arm_vgic_v2_ops,
>> +					KVM_DEV_TYPE_ARM_VGIC_V2);
>>  	}
>> -	kvm_register_device_ops(&kvm_arm_vgic_v2_ops, KVM_DEV_TYPE_ARM_VGIC_V2);
>> +	kvm_register_device_ops(&kvm_arm_vgic_v3_ops, KVM_DEV_TYPE_ARM_VGIC_V3);
>>  
>> -	vgic->vcpu_base = vcpu_res.start;
>>  	vgic->vctrl_base = NULL;
>>  	vgic->type = VGIC_V3;
>>  	vgic->max_hw_vcpus = KVM_MAX_VCPUS;
>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>> index b7de0f8..1dbaeb5 100644
>> --- a/virt/kvm/arm/vgic.c
>> +++ b/virt/kvm/arm/vgic.c
>> @@ -1577,6 +1577,11 @@ static int init_vgic_model(struct kvm *kvm, int type)
>>  	case KVM_DEV_TYPE_ARM_VGIC_V2:
>>  		ret = vgic_v2_init_emulation(kvm);
>>  		break;
>> +#ifdef CONFIG_ARM_GIC_V3
>> +	case KVM_DEV_TYPE_ARM_VGIC_V3:
>> +		ret = vgic_v3_init_emulation(kvm);
>> +		break;
>> +#endif
>>  	default:
>>  		ret = -ENODEV;
>>  		break;
>> -- 
>> 1.7.9.5
>>
> 
> Thanks,
> -Christoffer
> 
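A sketch of the model-dependent list register encoding that vgic_v3_get_lr()/vgic_v3_set_lr() implement above, using the GICv3 ICH_LR<n>_EL2 bit positions (vINTID [31:0], Group bit 60, pending bit 62, active bit 63). Priority and HW fields are omitted. The names encode_lr(), decode_irq() and decode_source() are hypothetical. The point is that a GICv2 guest packs a 10-bit ID plus a 3-bit SGI source CPU into the low bits, while a GICv3 guest uses the full 32-bit vINTID and sets the Group1 bit instead.

```c
#include <assert.h>
#include <stdint.h>

/* ICH_LR<n>_EL2 layout; names are local to this sketch */
#define LR_VINTID_MASK  ((1ULL << 32) - 1) /* GICv3 guest: vINTID [31:0] */
#define LR_V2_IRQ_MASK  0x3ffULL           /* GICv2 guest: 10-bit ID */
#define LR_V2_SRC_SHIFT 10                 /* GICv2 SGI source CPU [12:10] */
#define LR_GROUP        (1ULL << 60)
#define LR_PENDING      (1ULL << 62)
#define LR_ACTIVE       (1ULL << 63)

enum guest_model { GUEST_V2, GUEST_V3 };

static uint64_t encode_lr(enum guest_model model, uint32_t irq,
			  unsigned int source, int pending, int active)
{
	uint64_t val = irq;

	if (model == GUEST_V3)
		val |= LR_GROUP; /* Group1: Group0 would arrive as a FIQ */
	else
		val |= (uint64_t)(source & 0x7) << LR_V2_SRC_SHIFT;
	if (pending)
		val |= LR_PENDING;
	if (active)
		val |= LR_ACTIVE;
	return val;
}

static uint32_t decode_irq(enum guest_model model, uint64_t val)
{
	return (uint32_t)(val & (model == GUEST_V3 ? LR_VINTID_MASK
						   : LR_V2_IRQ_MASK));
}

/* Only GICv2 guests track an SGI source CPU */
static unsigned int decode_source(enum guest_model model, uint64_t val)
{
	if (model == GUEST_V2 && decode_irq(model, val) <= 15)
		return (val >> LR_V2_SRC_SHIFT) & 0x7;
	return 0;
}
```

Decoding a GICv2-style value with the GICv3 mask would fold the source CPU into the interrupt ID (3 with source 5 reads back as 5123). This is why both accessors have to check vgic_model.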

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v4 00/19] KVM GICv3 emulation
  2014-11-24  9:33 ` [PATCH v4 00/19] KVM GICv3 emulation Eric Auger
@ 2014-11-24 17:46   ` Andre Przywara
  0 siblings, 0 replies; 80+ messages in thread
From: Andre Przywara @ 2014-11-24 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

On 24/11/14 09:33, Eric Auger wrote:
> On 11/14/2014 11:07 AM, Andre Przywara wrote:
>> This is version 4 of the GICv3 guest emulation series.
>>
[ ... ]
>>
>> Please review and test.
>> I would be grateful for people to test for GICv2 regressions also
>> (so on a GICv2 host with current kvmtool/qemu), as there is quite
>> some refactoring on that front.
> 
> Hi Andre,
> 
> I tested your kvm-gicv3/v4 branch on Calxeda Midway with GICv2 host in
> QEMU/VFIO passthrough use case. It behaves as expected.

Thanks a lot, Eric!
Much appreciated.

Cheers,
Andre.

> 
> Best Regards
> 
> Eric
>>
>> Much of the code was inspired by MarcZ, also kudos to him for doing
>> the rather painful rebase on top of v3.17-rc1.
>>
>> Cheers,
>> Andre.
>>
>> [1] https://lists.cs.columbia.edu/pipermail/kvmarm/2014-June/010086.html
>>
>> Changes v3 ... v4:
>> * bug-fix in handling GICv3 redistributor CFG register
>> * move set/get_lr from gic_vm_ops back to vgic_ops (get rid of v3 06/19)
>> * get rid of init_emul() entirely
>> * rework guest GIC model initialization
>> * use non-atomic bit-set and bit-clear functions
>> * split up handle_mmio_misc* into multiple functions
>> * refine handling of some reserved registers
>> * use symbolic names for ICC_SGI1R_EL1 register fields (new patch 16/19)
>> * move private parameter from MMIO accessors to struct kvm_mmio_exit
>> * added documentation of new GICv3 guest device
>> * added lots of comments
>> * some renaming of identifiers
>> * minor changes in style and code flow of various functions
>>
>> Changes v2 ... v3:
>> * rebase to v3.18-rc2
>> * adapt to new kvm_register_device() function
>> * split up vm_ops patch and the GICv2 split-off patch to ease review
>> * various smaller changes due to Christoffer's review
>> * fix compilation for arm
>> * remove support for trapping SGI sysreg accesses on arm hosts
>>
>> Changes v1 ... v2:
>> * rebase to v3.17-rc1, caused quite some changes to the init code
>> * new 9/15 patch to make 10/15 smaller
>> * fix wrongly ordered cp15 register trap entry (MarcZ)
>> * fix SGI broadcast (thanks to wanghaibin for spotting)
>> * fix broken bailout path in kvm_vgic_create (wanghaibin)
>> * check return value of init_emulation_ops() (wanghaibin)
>> * fix return value check in vgic_[sg]et_attr()
>> * add header inclusion guards
>> * remove double definition of VCPU_NOT_ALLOCATED
>> * some code move-around
>> * whitespace fixes
>>
>> Andre Przywara (19):
>>   arm/arm64: KVM: rework MPIDR assignment and add accessors
>>   arm/arm64: KVM: pass down user space provided GIC type into vGIC code
>>   arm/arm64: KVM: refactor vgic_handle_mmio() function
>>   arm/arm64: KVM: wrap 64 bit MMIO accesses with two 32 bit ones
>>   arm/arm64: KVM: introduce per-VM ops
>>   arm/arm64: KVM: move kvm_register_device_ops() into vGIC probing
>>   arm/arm64: KVM: dont rely on a valid GICH base address
>>   arm/arm64: KVM: make the maximum number of vCPUs a per-VM value
>>   arm/arm64: KVM: make the value of ICC_SRE_EL1 a per-VM variable
>>   arm/arm64: KVM: refactor MMIO accessors
>>   arm/arm64: KVM: refactor/wrap vgic_set/get_attr()
>>   arm/arm64: KVM: add vgic.h header file
>>   arm/arm64: KVM: split GICv2 specific emulation code from vgic.c
>>   arm/arm64: KVM: add opaque private pointer to MMIO data
>>   arm/arm64: KVM: add virtual GICv3 distributor emulation
>>   arm64: GICv3: introduce symbolic names for GICv3 ICC_SGI1R_EL1 fields
>>   arm64: KVM: add SGI generation register emulation
>>   arm/arm64: KVM: enable kernel side of GICv3 emulation
>>   arm/arm64: KVM: allow userland to request a virtual GICv3
>>
>>  Documentation/virtual/kvm/devices/arm-vgic.txt |   21 +-
>>  arch/arm/include/asm/kvm_emulate.h             |    5 +-
>>  arch/arm/include/asm/kvm_host.h                |    3 +
>>  arch/arm/include/asm/kvm_mmio.h                |    1 +
>>  arch/arm/kvm/Makefile                          |    1 +
>>  arch/arm/kvm/arm.c                             |   23 +-
>>  arch/arm/kvm/psci.c                            |   17 +-
>>  arch/arm64/include/asm/kvm_emulate.h           |    5 +-
>>  arch/arm64/include/asm/kvm_host.h              |    5 +
>>  arch/arm64/include/asm/kvm_mmio.h              |    1 +
>>  arch/arm64/include/uapi/asm/kvm.h              |    7 +
>>  arch/arm64/kernel/asm-offsets.c                |    1 +
>>  arch/arm64/kvm/Makefile                        |    2 +
>>  arch/arm64/kvm/sys_regs.c                      |   37 +-
>>  arch/arm64/kvm/vgic-v3-switch.S                |   14 +-
>>  drivers/irqchip/irq-gic-v3.c                   |   14 +-
>>  include/kvm/arm_vgic.h                         |   34 +-
>>  include/linux/irqchip/arm-gic-v3.h             |   44 +
>>  include/linux/kvm_host.h                       |    2 +
>>  include/uapi/linux/kvm.h                       |    2 +
>>  virt/kvm/arm/vgic-v2-emul.c                    |  805 ++++++++++++++++++
>>  virt/kvm/arm/vgic-v2.c                         |    3 +
>>  virt/kvm/arm/vgic-v3-emul.c                    | 1020 +++++++++++++++++++++++
>>  virt/kvm/arm/vgic-v3.c                         |   89 +-
>>  virt/kvm/arm/vgic.c                            | 1065 ++++++------------------
>>  virt/kvm/arm/vgic.h                            |  122 +++
>>  26 files changed, 2469 insertions(+), 874 deletions(-)
>>  create mode 100644 virt/kvm/arm/vgic-v2-emul.c
>>  create mode 100644 virt/kvm/arm/vgic-v3-emul.c
>>  create mode 100644 virt/kvm/arm/vgic.h
>>
> 
> 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-11-24 16:00     ` Andre Przywara
@ 2014-11-25 10:41       ` Christoffer Dall
  2014-11-28 15:24         ` Andre Przywara
  0 siblings, 1 reply; 80+ messages in thread
From: Christoffer Dall @ 2014-11-25 10:41 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Andre,

On Mon, Nov 24, 2014 at 04:00:46PM +0000, Andre Przywara wrote:

[...]

> >> +
> >> +/*
> >> + * As this implementation does not provide compatibility
> >> + * with GICv2 (ARE==1), we report zero CPUs in bits [5..7].
> >> + * Also LPIs and MBIs are not supported, so we set the respective bits to 0.
> >> + * Also we report at most 2**10=1024 interrupt IDs (to match 1024 SPIs).
> >> + */
> >> +#define INTERRUPT_ID_BITS 10
> >> +static bool handle_mmio_typer(struct kvm_vcpu *vcpu,
> >> +                           struct kvm_exit_mmio *mmio, phys_addr_t offset)
> >> +{
> >> +     u32 reg;
> >> +
> >> +     /* we report at most 1024 IRQs via this interface */
> > 
> > hmmm, do we need to repeat ourselves here?
> 
> Actually ... not.
> To avoid confusion, I will probably just drop this comment.
> 
> > I get a bit confused by both the comment above and here, as to *why* we
> > are reporting this value?  And what is the bit about 'this interface'?
> 
> By 'this interface' I meant the number of SPIs, which is communicated
> here in a GICv2-compatible way (ITLinesNumber). Looking ahead to LPI
> support, I didn't want to use the term IRQ without qualifying it.
> 
> > Is there another interface?
> 
> IDbits, but admittedly this isn't clear from the comment.
> Not sure if that justifies more comments before we add ITS support, though.
> 
> > Perhaps what you're trying to get at here are the semantic differences
> > between ITLinesNumber and IDbits and how that helps a reader understand
> > the code.
> 
> I can add a better comment.
> 

I think you just need to clarify the comment above the function or let
the code speak for itself.

> >> +     reg = (min(vcpu->kvm->arch.vgic.nr_irqs, 1024) >> 5) - 1;
> >> +
> >> +     reg |= (INTERRUPT_ID_BITS - 1) << 19;
> >> +
> >> +     vgic_reg_access(mmio, &reg, offset,
> >> +                     ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
> >> +
> >> +     return false;
> >> +}
> >> +
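A sketch of the GICD_TYPER value that handle_mmio_typer() above produces. It shows the difference between the two fields Christoffer asks about: ITLinesNumber ([4:0]) encodes how many SPI lines exist, in units of 32 (32 * (N + 1) interrupt IDs), while IDbits ([23:19]) encodes how wide an interrupt ID can be ((N + 1) bits). The helper names are hypothetical, and nr_irqs is assumed to be a positive multiple of 32, as the vGIC requires.

```c
#include <assert.h>
#include <stdint.h>

#define INTERRUPT_ID_BITS 10 /* as in the patch: 2^10 = 1024 IDs */

/* Same arithmetic as handle_mmio_typer() */
static uint32_t vgic_v3_typer(unsigned int nr_irqs)
{
	unsigned int n = nr_irqs < 1024 ? nr_irqs : 1024;
	uint32_t reg = (n >> 5) - 1;                     /* ITLinesNumber */

	reg |= (uint32_t)(INTERRUPT_ID_BITS - 1) << 19; /* IDbits */
	return reg;
}

/* Number of interrupt IDs covered by ITLinesNumber */
static unsigned int typer_it_lines(uint32_t reg)
{
	return 32 * ((reg & 0x1f) + 1);
}

/* Width of an interrupt ID according to IDbits */
static unsigned int typer_id_bits(uint32_t reg)
{
	return ((reg >> 19) & 0x1f) + 1;
}
```

For nr_irqs = 128 this yields 0x480003: ITLinesNumber reports 128 IDs while IDbits still advertises 10-bit IDs. The two fields only coincide at 1024. Once LPIs are added, IDbits would have to grow beyond what ITLinesNumber can express.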


[...]

> >> + * Store the original MPIDR value in an extra array to support read-as-written.
> >> + * Unallocated MPIDRs are translated to a special value and caught
> >> + * before any array accesses.
> > 
> > We may have covered this already, but why can't we restore the original
> > MPIDR based on the irq_spi_cpu array?
> > 
> > Is that because we lose information about 'which' unallocated MPIDR was
> > written?
> 
> Yes.
> 
> > If that's the case, it seems weird that we go through the
> > trouble but we anyway throw away the aff3 field...?
> 
> Not supporting the aff3 field saves us from caring about atomicity on
> GICD_IROUTER accesses (where aff3 is in the upper word of this 64-bit
> register).
> Not supporting Aff3 is an architectural option in the GICv3, so this
> seems like a viable solution.
> I had some code to support "real" 64-bit accesses, which would allow
> Aff3 support, but I will have to get this past Marc first, sometime in
> the future ;-)
> 

I didn't realize it was an architecturally allowed option not to support
Aff3; in that case it's not worth the bother at this point.

> >> + */
> >> +static bool handle_mmio_route_reg(struct kvm_vcpu *vcpu,
> >> +                               struct kvm_exit_mmio *mmio,
> >> +                               phys_addr_t offset)
> >> +{
> >> +     struct kvm *kvm = vcpu->kvm;
> >> +     struct vgic_dist *dist = &kvm->arch.vgic;
> >> +     int spi;
> >> +     u32 reg;
> >> +     int vcpu_id;
> >> +     unsigned long *bmap, mpidr;
> >> +
> >> +     /*
> >> +      * The upper 32 bits of each 64 bit register are zero,
> >> +      * as we don't support Aff3.
> >> +      */
> >> +     if ((offset & 4)) {
> >> +             vgic_reg_access(mmio, NULL, offset,
> >> +                             ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
> >> +             return false;
> >> +     }
> >> +
> >> +     /* This region only covers SPIs, so no handling of private IRQs here. */
> >> +     spi = offset / 8;
> > 
> > that's not how I read the spec, it says that GICD_IROUTER0 to
> > GICD_IROUTER31 are not implemented (because they are SGIs and PPIs), and
> > I read the 'SPI ID m' as the lowest numbered SPI ID being 32, thus you
> > should do:
> > 
> > spi = offset / 8 - VGIC_NR_PRIVATE_IRQS;
> 
> Well, below I changed the description of the IROUTER range to:

oh, now it's finally coming back together for me, I think I
misunderstood your point from the last rounds of review because I didn't
realize that GICD_IROUTER was defined as 0x6000 (which I actually think
is a bit backwards, but this is not the place or time).

> +     {
> +             .base           = GICD_IROUTER + 0x100,
> +             .len            = 0x1edc,

in that case, len should be 0x1ee0:

 $ printf '0x%x\n' $(( (0x7fd8 + 8) - 0x6100 ))

> +             .bits_per_irq   = 64,
> +             .handle_mmio    = handle_mmio_route_reg,
> +     },
> 
> This was due to a comment on v3 by you, where you correctly stated the
> difference in the spec's description between IROUTER and the other
> registers regarding the private IRQ handling (not implemented/reserved
> vs. RAZ/WI).
> 
> So the offset in this function is relative to 0x6100 and thus depicts
> directly the SPI number.
> 

got it now, yes, the code is correct.
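To make the offset arithmetic concrete, here is a minimal sketch that assumes the layout discussed above: GICD_IROUTER at 0x6000, and the emulated range starting at GICD_IROUTER + 0x100 with length 0x1ee0. `decode_irouter()` and `struct irouter_access` are hypothetical names. Offsets are relative to the distributor base. The upper 32-bit half of each register (Aff3) is flagged so that it can be treated as RAZ/WI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GICD_IROUTER      0x6000u                 /* GICD_IROUTER<0> */
#define IROUTER_SPI_START (GICD_IROUTER + 0x100u) /* GICD_IROUTER<32> */
#define IROUTER_SPI_LEN   0x1ee0u                 /* (0x7fd8 + 8) - 0x6100 */

struct irouter_access {
	unsigned int spi;  /* SPI index; the INTID is spi + 32 */
	bool upper_word;   /* access to bits [63:32] (Aff3), RAZ/WI here */
};

/* Returns false for offsets outside the SPI part of the IROUTER range. */
static bool decode_irouter(uint32_t dist_offset, struct irouter_access *acc)
{
	uint32_t off;

	assert(acc != NULL);
	if (dist_offset < IROUTER_SPI_START ||
	    dist_offset >= IROUTER_SPI_START + IROUTER_SPI_LEN)
		return false;

	off = dist_offset - IROUTER_SPI_START; /* relative offset, as in the handler */
	acc->spi = off / 8;                    /* 8 bytes per 64-bit register */
	acc->upper_word = (off & 4) != 0;
	return true;
}
```

The last register, GICD_IROUTER1019, starts at 0x7fd8 and ends at 0x7fdf. This is why the length must be 0x1ee0 rather than 0x1edc.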

[...]

> >> +
> >> +/*
> >> + * This function splits accesses between the distributor and the two
> >> + * redistributor parts (private/SPI). As each redistributor is accessible
> >> + * from any CPU, we have to determine the affected VCPU by taking the faulting
> >> + * address into account. We then pass this VCPU to the handler function via
> >> + * the private parameter.
> >> + */
> >> +#define SGI_BASE_OFFSET SZ_64K
> >> +static bool vgic_v3_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
> >> +                             struct kvm_exit_mmio *mmio)
> >> +{
> >> +     struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> >> +     unsigned long dbase = dist->vgic_dist_base;
> >> +     unsigned long rdbase = dist->vgic_redist_base;
> >> +     int nrcpus = atomic_read(&vcpu->kvm->online_vcpus);
> >> +     int vcpu_id;
> >> +     const struct kvm_mmio_range *mmio_range;
> >> +
> >> +     if (is_in_range(mmio->phys_addr, mmio->len, dbase, GIC_V3_DIST_SIZE)) {
> >> +             return vgic_handle_mmio_range(vcpu, run, mmio,
> >> +                                           vgic_v3_dist_ranges, dbase);
> >> +     }
> >> +
> >> +     if (!is_in_range(mmio->phys_addr, mmio->len, rdbase,
> >> +         GIC_V3_REDIST_SIZE * nrcpus))
> >> +             return false;
> > 
> > Did you think more about the contiguous allocation issue here or can you
> > give me a pointer to the requirement in the spec?
> 
> 5.4.1 Re-Distributor Addressing
> 

Section 5.4.1 talks about the pages within a single re-distributor having
to be contiguous, not about all the re-distributor regions having to be
contiguous, right?
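The VCPU lookup in vgic_v3_handle_mmio() depends on how the redistributors are laid out. Below is a minimal sketch of that decoding. It assumes each redistributor occupies two 64K frames (RD_base followed by SGI_base) and that consecutive redistributors are `stride` bytes apart. `decode_redist()` and its parameters are hypothetical, and unlike is_in_range() the sketch ignores the access length. With a contiguous layout the stride equals the redistributor size. A larger stride corresponds to the padding that Marc's stride binding can describe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_64K      0x10000ULL
#define REDIST_SIZE (2 * SZ_64K) /* assumption: RD_base + SGI_base frames */

struct redist_target {
	int vcpu_id;
	bool sgi_frame;        /* true if the access hits the SGI_base frame */
	uint64_t frame_offset; /* offset within that 64K frame */
};

static bool decode_redist(uint64_t addr, uint64_t rdbase, uint64_t stride,
			  int nr_vcpus, struct redist_target *t)
{
	uint64_t off;

	assert(t != NULL && nr_vcpus > 0 && stride >= REDIST_SIZE);
	if (addr < rdbase || addr - rdbase >= stride * (uint64_t)nr_vcpus)
		return false;

	off = addr - rdbase;
	t->vcpu_id = (int)(off / stride);
	off %= stride;
	if (off >= REDIST_SIZE)
		return false; /* padding between redistributors */

	t->sgi_frame = off >= SZ_64K;
	t->frame_offset = off % SZ_64K;
	return true;
}
```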

> >> +
> >> +static int vgic_v3_init(struct kvm *kvm, const struct vgic_params *params)
> >> +{
> >> +     struct vgic_dist *dist = &kvm->arch.vgic;
> >> +     int ret, i;
> >> +     u32 mpidr;
> >> +
> >> +     if (IS_VGIC_ADDR_UNDEF(dist->vgic_dist_base) ||
> >> +         IS_VGIC_ADDR_UNDEF(dist->vgic_redist_base)) {
> >> +             kvm_err("Need to set vgic distributor addresses first\n");
> >> +             return -ENXIO;
> >> +     }
> >> +
> >> +     /*
> >> +      * FIXME: this should be moved to init_maps time, and may bite
> >> +      * us when adding save/restore. Add a per-emulation hook?
> >> +      */
> > 
> > progress on this fixme?
> 
> Progress supplies the ISS, but not this piece of code (read: none) ;-)
> I am more in favour of a follow-up patch on this one ...

hmmm, I'm not a fan of merging code with this kind of a comment in it,
because it looks scary, and I don't really understand the problem from
just reading the comment, so something needs to be done here.

Thanks,

-Christoffer


* [PATCH v4 17/19] arm64: KVM: add SGI generation register emulation
  2014-11-24 16:37     ` Andre Przywara
@ 2014-11-25 11:03       ` Christoffer Dall
  2014-11-28 15:40         ` Andre Przywara
  0 siblings, 1 reply; 80+ messages in thread
From: Christoffer Dall @ 2014-11-25 11:03 UTC
  To: linux-arm-kernel

Hi Andre,

On Mon, Nov 24, 2014 at 04:37:58PM +0000, Andre Przywara wrote:
> Hi,
> 
> On 23/11/14 15:08, Christoffer Dall wrote:
> > On Fri, Nov 14, 2014 at 10:08:01AM +0000, Andre Przywara wrote:
> >> While the generation of a (virtual) inter-processor interrupt (SGI)
> >> on a GICv2 works by writing to a MMIO register, GICv3 uses the system
> >> register ICC_SGI1R_EL1 to trigger them.
> >> Trap that register on ARM64 hosts and handle it in a new handler
> >> function in the GICv3 emulation code.
> > 
> > Did you reorder something or does my previous comment still apply that
> > you're not enabling trapping yet, you're just adding the handler - those
> > are two different things.
> 
> Yes, I can fix the wording.
> 
> > You sort of left my question about access_gic_sgi() not checking if the
> > gicv3 is presetn hanging from the last thread, but I think I'm
> > understanding properly now, that as long as you're not setting the
> > ICC_SRE_EL2.Enable = 1, then we'll never get here, right?
> 
> Right, that is the idea. Just to make sure that I got this right from
> the discussion the other day: We will not trap to EL2 as long as
> ICC_SRE_EL2.Enable is 0 - which it should still be at this point, right?

No, when ICC_SRE_EL2.Enable is 0, Non-secure EL1 accesses to
ICC_SRE_EL1 trap to EL2 (see Section 5.7.39 in the spec), which means
that accesses to the ICC_SGIx registers will cause an undefined
exception in the guest because we set ICC_SRE_EL1.SRE to 0 for the
guest and the guest cannot change this.

Now, when we set ICC_SRE_EL2.Enable to 1, then the guest can set
ICC_SRE_EL1.SRE to 1 (and we also happen to reset it to 1), and we will
indeed trap on guest access to the ICC_SGIx registers, because all
virtual accesses to these registers trap.

(Going back and checking where 'virtual accesses' is defined in the spec
left me somewhere without any results, but I am guessing that because we
set the ICH_HCR_EL2.En to 1, all accesses will be deemed virtual
accesses; maybe the spec should be clarified on this matter?).

Anyhow, to get back to my original question, getting here requires
a situation where the guest copy of the ICC_SRE_EL1.SRE is 1, which we
only allow when we have properly initialized the GICv3 data structures.

> (I am asking because I struggle to find this in the spec).
> 
> So actually your ICC_SRE_EL1 trap patch solved that problem ;-)
> 

So I think this is a different thing, not related that closely to my
question above.

That patch was about when ICC_SRE_EL2.Enable is 0, then we would trap
guest accesses to ICC_SRE_EL1 which did not have any sysreg handler
installed, and ended up with an undefined exception in the guest instead
of handling the trap as RAZ/WI.

> >>
> >> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> >> ---
> >> Changelog v3...v4:
> >> - moved addition of vgic_v3_dispatch_sgi() from earlier patch into here
> >> - move MPIDR comparison into extra function
> >> - use new ICC_SGI1R_ field names
> >> - improve readability of vgic_v3_dispatch_sgi()
> >> - add and refine comments
> >>
> >>  arch/arm64/kvm/sys_regs.c   |   26 ++++++++++
> >>  include/kvm/arm_vgic.h      |    1 +
> >>  virt/kvm/arm/vgic-v3-emul.c |  113 +++++++++++++++++++++++++++++++++++++++++++
> >>  3 files changed, 140 insertions(+)
> >>
> >> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> >> index fd3ffc3..e59369a 100644
> >> --- a/arch/arm64/kvm/sys_regs.c
> >> +++ b/arch/arm64/kvm/sys_regs.c
> >> @@ -165,6 +165,27 @@ static bool access_sctlr(struct kvm_vcpu *vcpu,
> >>  	return true;
> >>  }
> >>  
> >> +/*
> >> + * Trap handler for the GICv3 SGI generation system register.
> >> + * Forward the request to the VGIC emulation.
> >> + * The cp15_64 code makes sure this automatically works
> >> + * for both AArch64 and AArch32 accesses.
> >> + */
> >> +static bool access_gic_sgi(struct kvm_vcpu *vcpu,
> >> +			   const struct sys_reg_params *p,
> >> +			   const struct sys_reg_desc *r)
> >> +{
> >> +	u64 val;
> >> +
> >> +	if (!p->is_write)
> >> +		return read_from_write_only(vcpu, p);
> >> +
> >> +	val = *vcpu_reg(vcpu, p->Rt);
> >> +	vgic_v3_dispatch_sgi(vcpu, val);
> >> +
> >> +	return true;
> >> +}
> >> +
> >>  static bool trap_raz_wi(struct kvm_vcpu *vcpu,
> >>  			const struct sys_reg_params *p,
> >>  			const struct sys_reg_desc *r)
> >> @@ -431,6 +452,9 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> >>  	/* VBAR_EL1 */
> >>  	{ Op0(0b11), Op1(0b000), CRn(0b1100), CRm(0b0000), Op2(0b000),
> >>  	  NULL, reset_val, VBAR_EL1, 0 },
> >> +	/* ICC_SGI1R_EL1 */
> >> +	{ Op0(0b11), Op1(0b000), CRn(0b1100), CRm(0b1011), Op2(0b101),
> >> +	  access_gic_sgi },
> >>  	/* CONTEXTIDR_EL1 */
> >>  	{ Op0(0b11), Op1(0b000), CRn(0b1101), CRm(0b0000), Op2(0b001),
> >>  	  access_vm_reg, reset_val, CONTEXTIDR_EL1, 0 },
> >> @@ -659,6 +683,8 @@ static const struct sys_reg_desc cp14_64_regs[] = {
> >>   * register).
> >>   */
> >>  static const struct sys_reg_desc cp15_regs[] = {
> >> +	{ Op1( 0), CRn( 0), CRm(12), Op2( 0), access_gic_sgi },
> >> +
> >>  	{ Op1( 0), CRn( 1), CRm( 0), Op2( 0), access_sctlr, NULL, c1_SCTLR },
> >>  	{ Op1( 0), CRn( 2), CRm( 0), Op2( 0), access_vm_reg, NULL, c2_TTBR0 },
> >>  	{ Op1( 0), CRn( 2), CRm( 0), Op2( 1), access_vm_reg, NULL, c2_TTBR1 },
> >> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> >> index c1ef5a9..357a935 100644
> >> --- a/include/kvm/arm_vgic.h
> >> +++ b/include/kvm/arm_vgic.h
> >> @@ -305,6 +305,7 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu);
> >>  void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu);
> >>  int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
> >>  			bool level);
> >> +void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
> >>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
> >>  bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
> >>  		      struct kvm_exit_mmio *mmio);
> >> diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
> >> index 97b5801..58d7457 100644
> >> --- a/virt/kvm/arm/vgic-v3-emul.c
> >> +++ b/virt/kvm/arm/vgic-v3-emul.c
> >> @@ -828,6 +828,119 @@ int vgic_v3_init_emulation(struct kvm *kvm)
> >>  	return 0;
> >>  }
> >>  
> >> +/*
> >> + * Compare a given affinity (level 1-3 and a level 0 mask, from the SGI
> >> + * generation register ICC_SGI1R_EL1) with a given VCPU.
> >> + * If the VCPU's MPIDR matches, return the level0 affinity, otherwise
> >> + * return -1.
> >> + */
> >> +static int match_mpidr(u64 sgi_aff, u16 sgi_cpu_mask, struct kvm_vcpu *vcpu)
> >> +{
> >> +	unsigned long affinity;
> >> +	int level0;
> >> +
> >> +	/*
> >> +	 * Split the current VCPU's MPIDR into affinity level 0 and the
> >> +	 * rest as this is what we have to compare against.
> >> +	 */
> >> +	affinity = kvm_vcpu_get_mpidr_aff(vcpu);
> >> +	level0 = MPIDR_AFFINITY_LEVEL(affinity, 0);
> >> +	affinity &= ~MPIDR_LEVEL_MASK;
> >> +
> >> +	/* bail out if the upper three levels don't match */
> >> +	if (sgi_aff != affinity)
> >> +		return -1;
> >> +
> >> +	/* Is this VCPU's bit set in the mask ? */
> >> +	if (!(sgi_cpu_mask & BIT(level0)))
> >> +		return -1;
> >> +
> >> +	return level0;
> >> +}
> >> +
> >> +#define SGI_AFFINITY_LEVEL(reg, level) \
> >> +	((((reg) & ICC_SGI1R_AFFINITY_## level ##_MASK) \
> >> +	>> ICC_SGI1R_AFFINITY_## level ##_SHIFT) << MPIDR_LEVEL_SHIFT(level))
> >> +
> >> +/**
> >> + * vgic_v3_dispatch_sgi - handle SGI requests from VCPUs
> >> + * @vcpu: The VCPU requesting a SGI
> >> + * @reg: The value written into the ICC_SGI1R_EL1 register by that VCPU
> >> + *
> >> + * With GICv3 (and ARE=1) CPUs trigger SGIs by writing to an architectural
> > 
> > what's a non-architectural system register?
> 
> architectural vs. implementation defined.
> Are you suggesting that I should drop "architectural" because it is a
> tautology?

When you write 'architectural' here, it leads the reader to believe this is
something of importance compared to any other system register write,
which I don't believe it is here, so I would drop the word, but it's up
to you.
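For readers following match_mpidr() and SGI_AFFINITY_LEVEL() quoted above, here is a minimal user-space sketch of the same matching. It assumes the architectural ICC_SGI1R_EL1 layout: TargetList [15:0], Aff1 [23:16], INTID [27:24], Aff2 [39:32], IRM [40], Aff3 [55:48]. It also assumes MPIDR affinity fields at bits [7:0], [15:8], [23:16] and [39:32]. `sgi1r_target_mask()` is a hypothetical helper that returns a bitmap of targeted VCPU indices. It does not model the IRM (broadcast) bit.

```c
#include <assert.h>
#include <stdint.h>

#define SGI1R_TARGET_LIST(r) ((unsigned int)((r) & 0xffffULL))
#define SGI1R_INTID(r)       ((unsigned int)(((r) >> 24) & 0xfULL))
#define SGI1R_AFF1(r)        (((r) >> 16) & 0xffULL)
#define SGI1R_AFF2(r)        (((r) >> 32) & 0xffULL)
#define SGI1R_AFF3(r)        (((r) >> 48) & 0xffULL)

#define MPIDR_UPPER_AFF_MASK 0xff00ffff00ULL /* Aff3, Aff2, Aff1 */

/* Move Aff3.Aff2.Aff1 from their ICC_SGI1R_EL1 positions to MPIDR positions. */
static uint64_t sgi1r_upper_aff(uint64_t reg)
{
	return (SGI1R_AFF3(reg) << 32) | (SGI1R_AFF2(reg) << 16) |
	       (SGI1R_AFF1(reg) << 8);
}

/*
 * Return a bitmap of the indices in mpidrs[] targeted by an ICC_SGI1R_EL1
 * write: the upper affinity levels must match, and Aff0 must be set in
 * the 16-bit TargetList.
 */
static uint32_t sgi1r_target_mask(uint64_t reg, const uint64_t *mpidrs, int n)
{
	uint64_t aff = sgi1r_upper_aff(reg);
	uint32_t mask = 0;
	int i;

	assert(n >= 0 && n <= 32);
	for (i = 0; i < n; i++) {
		unsigned int aff0 = mpidrs[i] & 0xff;

		if ((mpidrs[i] & MPIDR_UPPER_AFF_MASK) != aff)
			continue;
		if (aff0 < 16 && (SGI1R_TARGET_LIST(reg) & (1u << aff0)))
			mask |= 1u << i;
	}
	return mask;
}
```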

Thanks,
-Christoffer


* [PATCH v4 18/19] arm/arm64: KVM: enable kernel side of GICv3 emulation
  2014-11-24 17:41     ` Andre Przywara
@ 2014-11-25 11:08       ` Christoffer Dall
  0 siblings, 0 replies; 80+ messages in thread
From: Christoffer Dall @ 2014-11-25 11:08 UTC
  To: linux-arm-kernel

On Mon, Nov 24, 2014 at 05:41:07PM +0000, Andre Przywara wrote:
> (Marc, can you comment on the last question below about the unaligned
> GICV mapping?)
> 
> Hi Christoffer,
> 
> On 24/11/14 09:09, Christoffer Dall wrote:
> > On Fri, Nov 14, 2014 at 10:08:02AM +0000, Andre Przywara wrote:
> >> With all the necessary GICv3 emulation code in place, we can now
> >> connect the code to the GICv3 backend in the kernel.
> >> The LR register handling is different depending on the emulated GIC
> >> model, so provide different implementations for each.
> >> Also allow non-v2-compatible GICv3 implementations (which don't
> >> provide MMIO regions for the virtual CPU interface in the DT), but
> >> restrict those hosts to support GICv3 guests only.
> >>
> >> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> >> ---
> >> Changelog v3...v4:
> >> - handle differences between GICv2-on-v3 and GICv3-on-v3 in existing functions
> >> - remove init_*_emul() functions
> >> - remove max_vcpus setting (done in earlier patches now)
> >> - adapt to new vgic_v<n>_init_emulation behaviour
> >>
> >>  virt/kvm/arm/vgic-v3.c |   83 ++++++++++++++++++++++++++++++++----------------
> >>  virt/kvm/arm/vgic.c    |    5 +++
> >>  2 files changed, 60 insertions(+), 28 deletions(-)
> >>
> >> diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
> >> index a04d208..4894c59 100644
> >> --- a/virt/kvm/arm/vgic-v3.c
> >> +++ b/virt/kvm/arm/vgic-v3.c
> >> @@ -34,6 +34,7 @@
> >>  #define GICH_LR_VIRTUALID		(0x3ffUL << 0)
> >>  #define GICH_LR_PHYSID_CPUID_SHIFT	(10)
> >>  #define GICH_LR_PHYSID_CPUID		(7UL << GICH_LR_PHYSID_CPUID_SHIFT)
> >> +#define ICH_LR_VIRTUALID_MASK		(BIT_ULL(32) - 1)
> >>  
> >>  /*
> >>   * LRs are stored in reverse order in memory. make sure we index them
> >> @@ -48,12 +49,17 @@ static struct vgic_lr vgic_v3_get_lr(const struct kvm_vcpu *vcpu, int lr)
> >>  	struct vgic_lr lr_desc;
> >>  	u64 val = vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[LR_INDEX(lr)];
> >>  
> >> -	lr_desc.irq	= val & GICH_LR_VIRTUALID;
> >> -	if (lr_desc.irq <= 15)
> >> -		lr_desc.source	= (val >> GICH_LR_PHYSID_CPUID_SHIFT) & 0x7;
> >> +	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
> >> +		lr_desc.irq = val & ICH_LR_VIRTUALID_MASK;
> >>  	else
> >> -		lr_desc.source = 0;
> >> -	lr_desc.state	= 0;
> >> +		lr_desc.irq = val & GICH_LR_VIRTUALID;
> >> +
> >> +	lr_desc.source = 0;
> >> +	if (lr_desc.irq <= 15 &&
> >> +	    vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V2)
> >> +		lr_desc.source	= (val >> GICH_LR_PHYSID_CPUID_SHIFT) & 0x7;
> >> +
> >> +	lr_desc.state   = 0;
> > 
> > super-nit-only-if-you-respin: you have a couple of tabs and extra spaces
> > in the two lines above that need to just be a single space before the
> > assignment operator on each line.
> > 
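The model-dependent handling in vgic_v3_get_lr() and vgic_v3_set_lr() in this diff can be summarised as a round trip. The sketch below reuses the field layout from the diff: a 10-bit vINTID plus a 3-bit source CPU ID at bit 10 for a GICv2 guest, and a 32-bit vINTID for a GICv3 guest. It assumes ICH_LR_GROUP is bit 60 of ICH_LR<n>_EL2 and leaves out the state bits. `encode_lr()` and `decode_lr()` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LR_V2_VIRTUALID      0x3ffULL      /* GICH_LR_VIRTUALID */
#define LR_V2_CPUID_SHIFT    10            /* GICH_LR_PHYSID_CPUID_SHIFT */
#define LR_V3_VIRTUALID_MASK 0xffffffffULL /* ICH_LR_VIRTUALID_MASK */
#define LR_GROUP1            (1ULL << 60)  /* assumed ICH_LR_GROUP */

struct lr_fields {
	uint32_t irq;
	uint8_t source; /* SGI source CPU, only meaningful for GICv2 guests */
};

static uint64_t encode_lr(struct lr_fields f, bool v3_guest)
{
	uint64_t val = f.irq;

	if (v3_guest)
		return val | LR_GROUP1; /* all guest IRQs are Group 1 */

	assert(f.irq <= LR_V2_VIRTUALID && f.source <= 7);
	return val | ((uint64_t)f.source << LR_V2_CPUID_SHIFT);
}

static struct lr_fields decode_lr(uint64_t val, bool v3_guest)
{
	struct lr_fields f = { 0, 0 };

	if (v3_guest) {
		f.irq = (uint32_t)(val & LR_V3_VIRTUALID_MASK);
	} else {
		f.irq = (uint32_t)(val & LR_V2_VIRTUALID);
		if (f.irq <= 15) /* only SGIs carry a source CPU */
			f.source = (val >> LR_V2_CPUID_SHIFT) & 0x7;
	}
	return f;
}
```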
> >>  
> >>  	if (val & ICH_LR_PENDING_BIT)
> >>  		lr_desc.state |= LR_STATE_PENDING;
> >> @@ -68,8 +74,20 @@ static struct vgic_lr vgic_v3_get_lr(const struct kvm_vcpu *vcpu, int lr)
> >>  static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
> >>  			   struct vgic_lr lr_desc)
> >>  {
> >> -	u64 lr_val = (((u32)lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT) |
> >> -		      lr_desc.irq);
> >> +	u64 lr_val;
> >> +
> >> +	lr_val = lr_desc.irq;
> >> +
> >> +	/*
> >> +	 * currently all guest IRQs are Group1, as Group0 would result
> > 
> > I guess you couldn't guess my comment, from last time, can you please
> > begin sentences with upper-case?  (only if you re-spin).
> > 
> >> +	 * in a FIQ in the guest, which it wouldn't expect.
> >> +	 * Eventually we want to make this configurable, so we may revisit
> >> +	 * this in the future.
> >> +	 */
> >> +	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
> >> +		lr_val |= ICH_LR_GROUP;
> >> +	else
> >> +		lr_val |= (u32)lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT;
> >>  
> >>  	if (lr_desc.state & LR_STATE_PENDING)
> >>  		lr_val |= ICH_LR_PENDING_BIT;
> >> @@ -154,7 +172,14 @@ static void vgic_v3_enable(struct kvm_vcpu *vcpu)
> >>  	 */
> >>  	vgic_v3->vgic_vmcr = 0;
> >>  
> >> -	vgic_v3->vgic_sre = 0;
> >> +	/*
> >> +	 * Set the SRE_EL1 value depending on the configured
> >> +	 * emulated vGIC model.
> >> +	 */
> >> +	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
> >> +		vgic_v3->vgic_sre = ICC_SRE_EL1_SRE;
> > 
> > I think we left some of my questions from last round unanswered here
> > (you said you needed to go and think about it).  If the guest sets SRE=0
> > we will not currently preserve this value I think.
> 
> Which is intentional, as we are following "4.4.7 Implementations with
> Fixed System Register Enables", which allows RAO/WI semantics for the
> SRE bit if the GIC implementation does not care about GICv2
> compatibility (which is what we do for the guest).
> Does that make sense or am I missing something?
> 

I really hope you didn't mean that you left my question unanswered
intentionally, but that the code behaves the way it does, on purpose ;)

So I think the comment should focus on the non-trivial part: the code
clearly does something based on the vgic_model, but your
*implementation choice* for the vgic is not trivial, and that is what
deserves a comment.

Hint: how many times did we come back to this so far?

> > The comment should clearly indicate that we're choosing a reset value
> > of the register for our implementation of the gic.
> > 
> > I'm fine with this change, but would like to know what the rationale
> > behind it is; wouldn't guests always initialize this value?
> > 
> >> +	else
> >> +		vgic_v3->vgic_sre = 0;
> >>  
> >>  	/* Get the show on the road... */
> >>  	vgic_v3->vgic_hcr = ICH_HCR_EN;
> >> @@ -215,28 +240,30 @@ int vgic_v3_probe(struct device_node *vgic_node,
> >>  
> >>  	gicv_idx += 3; /* Also skip GICD, GICC, GICH */
> >>  	if (of_address_to_resource(vgic_node, gicv_idx, &vcpu_res)) {
> >> -		kvm_err("Cannot obtain GICV region\n");
> >> -		ret = -ENXIO;
> >> -		goto out;
> >> -	}
> >> -
> >> -	if (!PAGE_ALIGNED(vcpu_res.start)) {
> >> -		kvm_err("GICV physical address 0x%llx not page aligned\n",
> >> -			(unsigned long long)vcpu_res.start);
> >> -		ret = -ENXIO;
> >> -		goto out;
> >> -	}
> >> -
> >> -	if (!PAGE_ALIGNED(resource_size(&vcpu_res))) {
> >> -		kvm_err("GICV size 0x%llx not a multiple of page size 0x%lx\n",
> >> -			(unsigned long long)resource_size(&vcpu_res),
> >> -			PAGE_SIZE);
> >> -		ret = -ENXIO;
> >> -		goto out;
> >> +		kvm_info("GICv3: GICv2 emulation not available\n");
> >> +		vgic->vcpu_base = 0;
> >> +	} else {
> >> +		if (!PAGE_ALIGNED(vcpu_res.start)) {
> >> +			kvm_err("GICV physical address 0x%llx not page aligned\n",
> >> +				(unsigned long long)vcpu_res.start);
> >> +			ret = -ENXIO;
> >> +			goto out;
> > 
> > shouldn't we be allowing an emulated gicv3 using the system registers in
> > this case then?
> 
> I guess not. If we have a GICV address that is not passing those tests,
> we should warn the user about it instead of unexpectedly restricting the
> virtualization capabilities, right?
> If the user desperately wants to use GIC virtualization despite having
> the odd mapping, he could remove GICC/GICH/GICV from the DTB altogether.
> 

You can still inform the user but allow the system register interface at
the same time.

We already have hardware where they got this wrong on gicv2, so there's
a chance we'll see it again on gicv3, and I see no technical reason why
the two things should be related. We may as well disable the entire gic then
and panic the kernel if we're going down that road, which we obviously
don't want to do.
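Here is a minimal sketch of the alternative suggested above. A missing or misaligned GICV region disables only GICv2 guest emulation, and GICv3 guests are still offered through the system register interface. `probe_caps()`, `struct vgic_caps` and the fixed 4K page size are assumptions for illustration. The real probe code would also report the reason with kvm_info().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumption: 4K pages */

struct vgic_caps {
	bool can_emulate_v2; /* needs a usable GICV region */
	bool can_emulate_v3; /* system register interface only */
};

static bool page_aligned(uint64_t x)
{
	return (x & (SKETCH_PAGE_SIZE - 1)) == 0;
}

static struct vgic_caps probe_caps(bool have_gicv, uint64_t start, uint64_t size)
{
	struct vgic_caps caps = { .can_emulate_v2 = false, .can_emulate_v3 = true };

	assert(!have_gicv || size != 0);
	/* An unusable GICV only costs us GICv2 guests, not GICv3 guests. */
	if (have_gicv && page_aligned(start) && page_aligned(size))
		caps.can_emulate_v2 = true;
	return caps;
}
```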

-Christoffer


* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-11-25 10:41       ` Christoffer Dall
@ 2014-11-28 15:24         ` Andre Przywara
  2014-11-30  8:30           ` Christoffer Dall
  0 siblings, 1 reply; 80+ messages in thread
From: Andre Przywara @ 2014-11-28 15:24 UTC
  To: linux-arm-kernel

Hej Christoffer,

On 25/11/14 10:41, Christoffer Dall wrote:
> Hi Andre,
> 
> On Mon, Nov 24, 2014 at 04:00:46PM +0000, Andre Przywara wrote:
> 

[...]

>>>> +
>>>> +     /* This region only covers SPIs, so no handling of private IRQs here. */
>>>> +     spi = offset / 8;
>>>
>>> that's not how I read the spec, it says that GICD_IROUTER0 to
>>> GICD_IROUTER31 are not implemented (because they are SGIs and PPIs), and
>>> I read the 'SPI ID m' as the lowest numbered SPI ID being 32, thus you
>>> should do:
>>>
>>> spi = offset / 8 - VGIC_NR_PRIVATE_IRQS;
>>
>> Well, below I changed the description of the IROUTER range to:
> 
> oh, now it's finally coming back together for me, I think I
> misunderstood your point from the last rounds of review because I didn't
> realize that GICD_IROUTER was defined as 0x6000 (which I actually think
> is a bit backwards, but this is not the place or time).
> 
>> +     {
>> +             .base           = GICD_IROUTER + 0x100,
>> +             .len            = 0x1edc,
> 
> in that case, len should be 0x1ee0:
> 
>  $ printf '0x%x\n' $(( (0x7fd8 + 8) - 0x6100 ))

Ah yes, the spec gives the beginning of the last register, not the end
of the region. Thanks for spotting this!

[...]

>>>> +
>>>> +     if (!is_in_range(mmio->phys_addr, mmio->len, rdbase,
>>>> +         GIC_V3_REDIST_SIZE * nrcpus))
>>>> +             return false;
>>>
>>> Did you think more about the contiguous allocation issue here or can you
>>> give me a pointer to the requirement in the spec?
>>
>> 5.4.1 Re-Distributor Addressing
>>
> 
> Section 5.4.1 talks about the pages within a single re-distributor having
> to be contiguous, not about all the re-distributor regions having to be
> contiguous, right?

Ah yes, you are right. But I still think it does not matter:
1) We are "implementing" the GICv3. So as the spec does not forbid this,
we just state that the redistributor register maps for each VCPU are
contiguous. Also we create the FDT accordingly. I will add a comment in
the documentation to state this.

2) The kernel's GICv3 DT bindings assume this allocation is the default.
Although Marc added bindings to work around this (stride), it seems much
more logical to me to not use it.

>>>> +
>>>> +static int vgic_v3_init(struct kvm *kvm, const struct vgic_params *params)
>>>> +{
>>>> +     struct vgic_dist *dist = &kvm->arch.vgic;
>>>> +     int ret, i;
>>>> +     u32 mpidr;
>>>> +
>>>> +     if (IS_VGIC_ADDR_UNDEF(dist->vgic_dist_base) ||
>>>> +         IS_VGIC_ADDR_UNDEF(dist->vgic_redist_base)) {
>>>> +             kvm_err("Need to set vgic distributor addresses first\n");
>>>> +             return -ENXIO;
>>>> +     }
>>>> +
>>>> +     /*
>>>> +      * FIXME: this should be moved to init_maps time, and may bite
>>>> +      * us when adding save/restore. Add a per-emulation hook?
>>>> +      */
>>>
>>> progress on this fixme?
>>
>> Progress supplies the ISS, but not this piece of code (read: none) ;-)
>> I am more in favour of a follow-up patch on this one ...
> 
> hmmm, I'm not a fan of merging code with this kind of a comment in it,
> because it looks scary, and I don't really understand the problem from
> just reading the comment, so something needs to be done here.

I see. What about moving this unconditionally into vgic_init_maps,
allocating it for both v2 and v3 guests and getting rid of the whole
function? It only allocates memory for irq_spi_mpidr, which is 4
bytes per configured SPI (so less than 4 KB at most, but usually just
128 bytes per guest). This would be a pretty quick solution. Does that
sound too hackish?
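The size estimate can be checked with a small sketch that assumes one 32-bit MPIDR entry per SPI and 32 private interrupts. `spi_mpidr_bytes()` and `spi_mpidr_alloc()` are hypothetical stand-ins for allocating irq_spi_mpidr in a common init path. The sketch uses calloc() where the kernel would use one of its own allocators.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_PRIVATE_IRQS 32u

/* Bytes needed for one MPIDR entry per SPI. */
static size_t spi_mpidr_bytes(unsigned int nr_irqs)
{
	assert(nr_irqs >= NR_PRIVATE_IRQS);
	return (size_t)(nr_irqs - NR_PRIVATE_IRQS) * sizeof(uint32_t);
}

/* Zeroed array, allocated regardless of the emulated GIC model. */
static uint32_t *spi_mpidr_alloc(unsigned int nr_irqs)
{
	assert(nr_irqs >= NR_PRIVATE_IRQS);
	return calloc(nr_irqs - NR_PRIVATE_IRQS, sizeof(uint32_t));
}
```

With the maximum of 1020 interrupt IDs this comes to 3952 bytes, and 128 bytes corresponds to 32 SPIs.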

After your comments about the per-VM ops function pointers, I am a bit
reluctant to introduce another one (which would be the obvious way
following the comment) for just this simple kalloc().
On the other hand the ITS emulation may later make better use of a GICv3
specific allocation function.

What's your opinion on this?

Cheers,
Andre.


* [PATCH v4 17/19] arm64: KVM: add SGI generation register emulation
  2014-11-25 11:03       ` Christoffer Dall
@ 2014-11-28 15:40         ` Andre Przywara
  2014-11-30  8:45           ` Christoffer Dall
  0 siblings, 1 reply; 80+ messages in thread
From: Andre Przywara @ 2014-11-28 15:40 UTC
  To: linux-arm-kernel

Hej Christoffer,

On 25/11/14 11:03, Christoffer Dall wrote:
> Hi Andre,
> 
> On Mon, Nov 24, 2014 at 04:37:58PM +0000, Andre Przywara wrote:
>> Hi,
>>
>> On 23/11/14 15:08, Christoffer Dall wrote:
>>> On Fri, Nov 14, 2014 at 10:08:01AM +0000, Andre Przywara wrote:
>>>> While the generation of a (virtual) inter-processor interrupt (SGI)
>>>> on a GICv2 works by writing to a MMIO register, GICv3 uses the system
>>>> register ICC_SGI1R_EL1 to trigger them.
>>>> Trap that register on ARM64 hosts and handle it in a new handler
>>>> function in the GICv3 emulation code.
>>>
>>> Did you reorder something or does my previous comment still apply that
>>> you're not enabling trapping yet, you're just adding the handler - those
>>> are two different things.
>>
>> Yes, I can fix the wording.
>>
>>> You sort of left my question about access_gic_sgi() not checking if the
>>> gicv3 is presetn hanging from the last thread, but I think I'm
>>> understanding properly now, that as long as you're not setting the
>>> ICC_SRE_EL2.Enable = 1, then we'll never get here, right?
>>
>> Right, that is the idea. Just to make sure that I got this right from
>> the discussion the other day: We will not trap to EL2 as long as
>> ICC_SRE_EL2.Enable is 0 - which it should still be at this point, right?
> 
> No, when ICC_SRE_EL2.Enable is 0, Non-secure EL1 accesses to
> ICC_SRE_EL1 trap to EL2 (see Section 5.7.39 in the spec), which means
> that accesses to the ICC_SGIx registers will cause an undefined
> exception in the guest because we set ICC_SRE_EL1.SRE to 0 for the
> guest and the guest cannot change this.
> 
> Now, when we set ICC_SRE_EL2.Enable to 1, then the guest can set
> ICC_SRE_EL1.SRE to 1 (and we also happen to reset it to 1), and we will
> indeed trap on guest access to the ICC_SGIx registers, because all
> virtual accesses to these registers trap.
> 
> (Going back and checking where 'virtual accesses' is defined in the spec
> left me somewhere without any results, but I am guessing that because we
> set the ICH_HCR_EL2.En to 1, all accesses will be deemed virtual
> accesses; maybe the spec should be clarified on this matter?).
> 
> Anyhow, to get back to my original question, getting here requires
> a situation where the guest copy of the ICC_SRE_EL1.SRE is 1, which we
> only allow when we have properly initialized the GICv3 data structures.

So to summarize (and check) this: There is no real issue at this point?
And the code is totally fine after 19/19?

Would this kind of problem actually matter _inside_ a patch series? To
trigger an issue, we would need a bogus guest and bogus userland
(because at this point neither of them would see/inject a GICv3 FDT
node). I'd assume that running a kernel at this point is just for
debugging/bisecting? Where you wouldn't care about every corner case of
execution?

Please tell me if I should give my email reading a seventh pass ;-)

Regards,
Andre.

>> (I am asking because I struggle to find this in the spec).
>>
>> So actually your ICC_SRE_EL1 trap patch solved that problem ;-)
>>
> 
> So I think this is a different thing, not related that closely to my
> question above.
> 
> That patch was about when ICC_SRE_EL2.Enable is 0, then we would trap
> guest accesses to ICC_SRE_EL1 which did not have any sysreg handler
> installed, and ended up with an undefined exception in the guest instead
> of handling the trap as RAZ/WI.
> 


* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-11-28 15:24         ` Andre Przywara
@ 2014-11-30  8:30           ` Christoffer Dall
  2014-12-02 16:24             ` Andre Przywara
  0 siblings, 1 reply; 80+ messages in thread
From: Christoffer Dall @ 2014-11-30  8:30 UTC
  To: linux-arm-kernel

On Fri, Nov 28, 2014 at 03:24:11PM +0000, Andre Przywara wrote:
> Hej Christoffer,
> 
> On 25/11/14 10:41, Christoffer Dall wrote:
> > Hi Andre,
> > 
> > On Mon, Nov 24, 2014 at 04:00:46PM +0000, Andre Przywara wrote:
> > 
> 
> [...]
> 
> >>>> +
> >>>> +     /* This region only covers SPIs, so no handling of private IRQs here. */
> >>>> +     spi = offset / 8;
> >>>
> >>> that's not how I read the spec, it says that GICD_IROUTER0 to
> >>> GICD_IROUTER31 are not implemented (because they are SGIs and PPIs), and
> >>> I read the 'SPI ID m' as the lowest numbered SPI ID being 32, thus you
> >>> should do:
> >>>
> >>> spi = offset / 8 - VGIC_NR_PRIVATE_IRQS;
> >>
> >> Well, below I changed the description of the IROUTER range to:
> > 
> > oh, now it's finally coming back together for me, I think I
> > misunderstood your point from the last rounds of review because I didn't
> > realize that GICD_IROUTER was defined as 0x6000 (which I actually think
> > is a bit backwards, but this is not the place or time).
> > 
> >> +     {
> >> +             .base           = GICD_IROUTER + 0x100,
> >> +             .len            = 0x1edc,
> > 
> > in that case, len should be 0x1ee0:
> > 
> >  $ printf '0x%x\n' $(( (0x7fd8 + 8) - 0x6100 ))
> 
> Ah yes, the spec gives the beginning of the last register, not the end
> of the region. Thanks for spotting this!
> 
> [...]
> 
> >>>> +
> >>>> +     if (!is_in_range(mmio->phys_addr, mmio->len, rdbase,
> >>>> +         GIC_V3_REDIST_SIZE * nrcpus))
> >>>> +             return false;
> >>>
> >>> Did you think more about the contiguous allocation issue here or can you
> >>> give me a pointer to the requirement in the spec?
> >>
> >> 5.4.1 Re-Distributor Addressing
> >>
> > 
> > Section 5.4.1 talks about the pages within a single re-distributor having
> > to be contiguous, not all the re-distributor regions having to be
> > contiguous, right?
> 
> Ah yes, you are right. But I still think it does not matter:
> 1) We are "implementing" the GICv3. So as the spec does not forbid this,
> we just state that the redistributor register maps for each VCPU are
> contiguous. Also we create the FDT accordingly. I will add a comment in
> the documentation to state this.
> 
> 2) The kernel's GICv3 DT bindings assume this allocation is the default.
> Although Marc added bindings to work around this (stride), it seems much
> more logical to me to not use it.

I don't disagree (and never have) with the fact that it is up to us to
decide.

My original question, which we haven't talked about yet, is if it is
*reasonable* to assume that all re-distributor regions will always be
contiguous?

How will you handle VCPU hotplug for example?  Where in the guest
physical memory map of our various virt machines should these regions
sit so that we can allocate enough re-distributors for VCPUs etc.?

I just want to make sure we're not limiting ourselves by some amount of
functionality or ABI (redistributor base addresses) that will be hard to
expand in the future.

> 
> >>>> +
> >>>> +static int vgic_v3_init(struct kvm *kvm, const struct vgic_params *params)
> >>>> +{
> >>>> +     struct vgic_dist *dist = &kvm->arch.vgic;
> >>>> +     int ret, i;
> >>>> +     u32 mpidr;
> >>>> +
> >>>> +     if (IS_VGIC_ADDR_UNDEF(dist->vgic_dist_base) ||
> >>>> +         IS_VGIC_ADDR_UNDEF(dist->vgic_redist_base)) {
> >>>> +             kvm_err("Need to set vgic distributor addresses first\n");
> >>>> +             return -ENXIO;
> >>>> +     }
> >>>> +
> >>>> +     /*
> >>>> +      * FIXME: this should be moved to init_maps time, and may bite
> >>>> +      * us when adding save/restore. Add a per-emulation hook?
> >>>> +      */
> >>>
> >>> progress on this fixme?
> >>
> >> Progress supplies the ISS, but not this piece of code (read: none) ;-)
> >> I am more in favour of a follow-up patch on this one ...
> > 
> > hmmm, I'm not a fan of merging code with this kind of a comment in it,
> > because it looks scary, and I don't really understand the problem from
> > just reading the comment, so something needs to be done here.
> 
> I see. What about we are moving this unconditionally into vgic_init_maps
> and allocate it for both v2 and v3 guests and get rid of the whole
> function? It allocates only memory for the irq_spi_mpidr, which is 4
> bytes per configured SPI (so at most less than 4 KB, but usually just
> 128 Bytes per guest). This would be a pretty quick solution. Does that
> sound too hackish?
> 
> After your comments about the per-VM ops function pointers I am a bit
> reluctant to introduce another one (which would be the obvious way
> following the comment) for just this simple kalloc().
> On the other hand the ITS emulation may later make better use of a GICv3
> specific allocation function.

What I really disliked was the configuration of a function pointer,
which, when invoked, configured other function pointers.  That just made
my head spin.  So adding another per-gic-model init_maps method is not
that bad, but on the other hand, the only problem with keeping this here
is that when we restore the vgic state, then user space wants to be able
to populate all the data before running any VCPUs, and we don't create
the data structures before the first VCPU is run.

However, Eric has a problem with this "init-when-we-run-the-first-VCPU"
approach as well, so one argument is that we need to add a method to
both the gicv2 and gicv3 device API to say "VGIC_INIT" which userspace
can call after having created all the VCPUs.  And, in fact, we may want
to enforce this for the gicv3 right now and only maintain the existing
behavior for gicv2.

(Eric's use case is configuring IRQFD, which must logically be done
before running the machine, but also needs to be done after the vgic is
fully ready.).
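The two initialization policies can be modelled in a few lines. This is a hypothetical sketch: none of these names are real KVM symbols, and the error codes are assumptions. It shows lazy init on the first VCPU run kept only for GICv2, while a GICv3 VGIC must be initialized explicitly, so that state restore or IRQFD setup can happen before any VCPU runs.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the vgic init policies; not real KVM code. */
enum vgic_model { VGIC_MODEL_V2, VGIC_MODEL_V3 };

struct vgic_state {
    enum vgic_model model;
    bool initialized;   /* per-IRQ data structures allocated */
};

/* Explicit "VGIC_INIT", issued by userspace after creating all VCPUs. */
static int vgic_explicit_init(struct vgic_state *v)
{
    if (v->initialized)
        return -EBUSY;
    v->initialized = true;   /* allocate per-IRQ state here */
    return 0;
}

/* Restoring state or wiring up an IRQFD needs the structures to exist. */
static int vgic_set_state(const struct vgic_state *v)
{
    return v->initialized ? 0 : -ENXIO;
}

/* Called before entering the guest on the first VCPU run. */
static int vgic_first_vcpu_run(struct vgic_state *v)
{
    if (v->initialized)
        return 0;
    if (v->model == VGIC_MODEL_V2) {
        v->initialized = true;   /* legacy lazy init, kept for compatibility */
        return 0;
    }
    return -ENXIO;   /* GICv3: explicit init is mandatory */
}
```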

Does this make sense?

We could consider scheduling a call for this if you think that would be
helpful.

-Christoffer


* [PATCH v4 17/19] arm64: KVM: add SGI generation register emulation
  2014-11-28 15:40         ` Andre Przywara
@ 2014-11-30  8:45           ` Christoffer Dall
  2014-12-03 17:50             ` Andre Przywara
  0 siblings, 1 reply; 80+ messages in thread
From: Christoffer Dall @ 2014-11-30  8:45 UTC
  To: linux-arm-kernel

On Fri, Nov 28, 2014 at 03:40:12PM +0000, Andre Przywara wrote:
> Hej Christoffer,
> 
> On 25/11/14 11:03, Christoffer Dall wrote:
> > Hi Andre,
> > 
> > On Mon, Nov 24, 2014 at 04:37:58PM +0000, Andre Przywara wrote:
> >> Hi,
> >>
> >> On 23/11/14 15:08, Christoffer Dall wrote:
> >>> On Fri, Nov 14, 2014 at 10:08:01AM +0000, Andre Przywara wrote:
> >>>> While the generation of a (virtual) inter-processor interrupt (SGI)
> >>>> on a GICv2 works by writing to a MMIO register, GICv3 uses the system
> >>>> register ICC_SGI1R_EL1 to trigger them.
> >>>> Trap that register on ARM64 hosts and handle it in a new handler
> >>>> function in the GICv3 emulation code.
> >>>
> >>> Did you reorder something or does my previous comment still apply that
> >>> you're not enabling trapping yet, you're just adding the handler - those
> >>> are two different things.
> >>
> >> Yes, I can fix the wording.
> >>
> >>> You sort of left my question about access_gic_sgi() not checking if the
> >>> gicv3 is present hanging from the last thread, but I think I'm
> >>> understanding properly now, that as long as you're not setting the
> >>> ICC_SRE_EL2.Enable = 1, then we'll never get here, right?
> >>
> >> Right, that is the idea. Just to make sure that I got this right from
> >> the discussion the other day: We will not trap to EL2 as long as
> >> ICC_SRE_EL2.Enable is 0 - which it should still be at this point, right?
> > 
> > No, when ICC_SRE_EL2.Enable is 0, then Non-secure EL1 accesses to
> > ICC_SRE_EL1 trap to EL2 (See Section 5.7.39 in the spec), which means
> > that accesses to the ICC_SGIx registers will cause an undefined
> > exception in the guest because we set ICC_SRE_EL1.SRE to 0 for the
> > guest and the guest cannot change this.
> > 
> > Now, when we set ICC_SRE_EL2.Enable to 1, then the guest can set
> > ICC_SRE_EL1.SRE to 1 (and we also happen to reset it to 1), and we will
> > indeed trap on guest access to the ICC_SGIx registers, because all
> > virtual accesses to these registers trap.
> > 
> > (Going back and checking where 'virtual accesses' is defined in the spec
> > left me somewhere without any results, but I am guessing that because we
> > set the ICH_HCR_EL2.En to 1, all accesses will be deemed virtual
> > accesses, maybe the spec should be clarified on this matter?).
> > 
> > Anyhow, to get back to my original question, getting here requires
> > a situation where the guest copy of the ICC_SRE_EL1.SRE is 1, which we
> > only allow when we have properly initialized the GICv3 data structures.
> 
> So to summarize (and check) this: There is no real issue at this point?
> And the code is totally fine after 19/19?

There is no issue at this point, no.

> 
> Would this kind of problem actually matter _inside_ a patch series? To
> trigger an issue, we would need a bogus guest and bogus userland
> (because at this point neither of them would see/inject a GICv3 FDT
> node). I'd assume that running a kernel at this point is just for
> debugging/bisecting? Where you wouldn't care about every corner case of
> execution?

The argument about bogus guests / fdts should *never* be considered in
the context of these discussions.  If we have code that looks like the
guest can kill the host, or do a NULL pointer dereference, then we need
to address it.

Your point about it being inside a patch series, sure, it's unlikely
that people will run this, but I'm reviewing this patch right now, and
honestly not considering how this changes in the subsequent patch.  For
this sort of thing, if we were leaving a gaping hole open, that would at
least require a clear note in the commit message on why we're doing it.

Hopefully you understood and agreed with my deduction about the various
SRE settings above though?
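The SRE deduction above can be summarized as a small decision model. This is a simplified, hypothetical sketch of the reasoning in this mail, not an architectural model: it assumes, as argued above, that KVM keeps the guest's ICC_SRE_EL1.SRE at 0 unless ICC_SRE_EL2.Enable is set, and that with SRE == 1 all guest accesses to the ICC_SGIx registers trap to EL2 (which itself rests on the unconfirmed reading of "virtual accesses" with ICH_HCR_EL2.En set).

```c
#include <stdbool.h>

/* Outcome of a guest (non-secure EL1) access to an ICC_SGIx register. */
enum sgi_access_outcome {
    SGI_UNDEF_IN_GUEST,   /* system register interface disabled for the guest */
    SGI_TRAP_TO_EL2,      /* trapped and handled by the KVM emulation code */
};

/* Guest write to ICC_SRE_EL1.SRE. With ICC_SRE_EL2.Enable == 0 the access
 * traps to EL2 and KVM keeps the guest's SRE at 0 (model assumption). */
static bool guest_set_sre(bool sre_el2_enable, bool requested_sre)
{
    return sre_el2_enable ? requested_sre : false;
}

/* SGI register access: only reaches the EL2 handler when SRE == 1. */
static enum sgi_access_outcome guest_sgi_access(bool guest_sre)
{
    return guest_sre ? SGI_TRAP_TO_EL2 : SGI_UNDEF_IN_GUEST;
}
```

In this model the handler can only be reached once the hypervisor has set ICC_SRE_EL2.Enable, which is the point Christoffer makes about the code being safe at this stage of the series.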

-Christoffer


* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-11-30  8:30           ` Christoffer Dall
@ 2014-12-02 16:24             ` Andre Przywara
  2014-12-02 17:06               ` Marc Zyngier
  2014-12-03 10:28               ` Christoffer Dall
  0 siblings, 2 replies; 80+ messages in thread
From: Andre Przywara @ 2014-12-02 16:24 UTC
  To: linux-arm-kernel

Hej Christoffer,

On 30/11/14 08:30, Christoffer Dall wrote:
> On Fri, Nov 28, 2014 at 03:24:11PM +0000, Andre Przywara wrote:
>> Hej Christoffer,
>>
>> On 25/11/14 10:41, Christoffer Dall wrote:
>>> Hi Andre,
>>>
>>> On Mon, Nov 24, 2014 at 04:00:46PM +0000, Andre Przywara wrote:
>>>
>>

[...]

>>>>>> +
>>>>>> +     if (!is_in_range(mmio->phys_addr, mmio->len, rdbase,
>>>>>> +         GIC_V3_REDIST_SIZE * nrcpus))
>>>>>> +             return false;
>>>>>
>>>>> Did you think more about the contiguous allocation issue here or can you
>>>>> give me a pointer to the requirement in the spec?
>>>>
>>>> 5.4.1 Re-Distributor Addressing
>>>>
>>>
>>> Section 5.4.1 talks about the pages within a single re-distributor having
>>> to be contiguous, not all the re-distributor regions having to be
>>> contiguous, right?
>>
>> Ah yes, you are right. But I still think it does not matter:
>> 1) We are "implementing" the GICv3. So as the spec does not forbid this,
>> we just state that the redistributor register maps for each VCPU are
>> contiguous. Also we create the FDT accordingly. I will add a comment in
>> the documentation to state this.
>>
>> 2) The kernel's GICv3 DT bindings assume this allocation is the default.
>> Although Marc added bindings to work around this (stride), it seems much
>> more logical to me to not use it.
> 
> I don't disagree (and never have) with the fact that it is up to us to
> decide.
> 
> My original question, which we haven't talked about yet, is if it is
> *reasonable* to assume that all re-distributor regions will always be
> contiguous?
> 
> How will you handle VCPU hotplug for example?

As kvmtool does not support hotplug, I haven't thought about this yet.
To me it looks like userland should just use maxcpus for the allocation.
If I get the current QEMU code right, there is room for 127 GICv3 VCPUs
(2*64K per VCPU + 64K for the distributor in 16M space) at the moment.
Kvmtool uses a different mapping, which allows to share 1G with virtio,
so the limit is around 8000ish VCPUs here.
Are there any issues with changing the QEMU virt mapping later?
Migration, maybe?
If the UART, the RTC and the virtio regions are moved more towards the
beginning of the 256MB PCI mapping, then there should be space for a bit
less than 1024 VCPUs, if I get this right.
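The VCPU counts quoted above follow from simple division. A minimal sketch, assuming (as stated above) two 64K frames per redistributor and a 64K distributor carved out of the same window; the helper name is illustrative. A 16M window minus 64K gives 127 redistributors, and a full 1G window gives 8192 before any space is shared with virtio.

```c
#include <stdint.h>

#define SZ_64K             0x10000ULL
/* Assumed per-VCPU redistributor footprint: 2 x 64K frames. */
#define GIC_V3_REDIST_SIZE (2 * SZ_64K)

/* Number of redistributors that fit in a window of 'window' bytes after
 * 'reserved' bytes (e.g. the distributor) have been taken out of it. */
static uint64_t max_redist_vcpus(uint64_t window, uint64_t reserved)
{
    if (reserved >= window)
        return 0;
    return (window - reserved) / GIC_V3_REDIST_SIZE;
}
```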

> Where in the guest
> physical memory map of our various virt machines should these regions
> sit so that we can allocate enough re-distributors for VCPUs etc.?

Various? Are there other mappings than those described in hw/arm/virt.c?

> I just want to make sure we're not limiting ourselves by some amount of
> functionality or ABI (redistributor base addresses) that will be hard to
> expand in the future.

If we are flexible with the mapping at VM creation time, QEMU could just
use a mapping depending on max_cpus:
< 128 VCPUs: use the current mapping
128 <= x < 1020: use a more compressed mapping
>= 1020: map the redistributor somewhere above 4 GB

As the device tree binding for GICv3 just supports a stride value, we
don't have any other real options beside this, right? So how I see this,
a contiguous mapping (with possible holes) is the only way.

>>>>>> +
>>>>>> +static int vgic_v3_init(struct kvm *kvm, const struct vgic_params *params)
>>>>>> +{
>>>>>> +     struct vgic_dist *dist = &kvm->arch.vgic;
>>>>>> +     int ret, i;
>>>>>> +     u32 mpidr;
>>>>>> +
>>>>>> +     if (IS_VGIC_ADDR_UNDEF(dist->vgic_dist_base) ||
>>>>>> +         IS_VGIC_ADDR_UNDEF(dist->vgic_redist_base)) {
>>>>>> +             kvm_err("Need to set vgic distributor addresses first\n");
>>>>>> +             return -ENXIO;
>>>>>> +     }
>>>>>> +
>>>>>> +     /*
>>>>>> +      * FIXME: this should be moved to init_maps time, and may bite
>>>>>> +      * us when adding save/restore. Add a per-emulation hook?
>>>>>> +      */
>>>>>
>>>>> progress on this fixme?
>>>>
>>>> Progress supplies the ISS, but not this piece of code (read: none) ;-)
>>>> I am more in favour of a follow-up patch on this one ...
>>>
>>> hmmm, I'm not a fan of merging code with this kind of a comment in it,
>>> because it looks scary, and I don't really understand the problem from
>>> just reading the comment, so something needs to be done here.
>>
>> I see. What about we are moving this unconditionally into vgic_init_maps
>> and allocate it for both v2 and v3 guests and get rid of the whole
>> function? It allocates only memory for the irq_spi_mpidr, which is 4
>> bytes per configured SPI (so at most less than 4 KB, but usually just
>> 128 Bytes per guest). This would be a pretty quick solution. Does that
>> sound too hackish?
>>
>> After your comments about the per-VM ops function pointers I am a bit
>> reluctant to introduce another one (which would be the obvious way
>> following the comment) for just this simple kalloc().
>> On the other hand the ITS emulation may later make better use of a GICv3
>> specific allocation function.
> 
> What I really disliked was the configuration of a function pointer,
> which, when invoked, configured other function pointers.  That just made
> my head spin.  So adding another per-gic-model init_maps method is not
> that bad, but on the other hand, the only problem with keeping this here
> is that when we restore the vgic state, then user space wants to be able
> to populate all the data before running any VCPUs, and we don't create
> the data structures before the first VCPU is run.
> 
> However, Eric has a problem with this "init-when-we-run-the-first-VCPU"
> approach as well, so one argument is that we need to add a method to
> both the gicv2 and gicv3 device API to say "VGIC_INIT" which userspace
> can call after having created all the VCPUs.  And, in fact, we may want
> to enforce this for the gicv3 right now and only maintain the existing
> behavior for gicv2.
> 
> (Eric's use case is configuring IRQFD, which must logically be done
> before running the machine, but also needs to be done after the vgic is
> fully ready.).
> 
> Does this make sense?

So if we would avoid that spooky "detect-if-a-VCPU-has-run" code and
rely on an explicit ioctl, I am in favor of this. We would need to keep
the current approach for compatibility, though, right?

So what about we either keep the current GICv3 allocation as it stands
in my patches right now (or move the GICv3 specific part into the
general vgic_init_maps()) and adapt that to the VGIC_INIT call once that
has appeared (or even handle this in that series then).

Does that make sense? What is the time frame for that VGIC_INIT call?

> We could consider scheduling a call for this if you think that would be
> helpful.

Depends on your answer to the above ;-)

Cheers,
Andre.


* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-12-02 16:24             ` Andre Przywara
@ 2014-12-02 17:06               ` Marc Zyngier
  2014-12-02 17:32                 ` Andre Przywara
  2014-12-03 10:29                 ` Christoffer Dall
  2014-12-03 10:28               ` Christoffer Dall
  1 sibling, 2 replies; 80+ messages in thread
From: Marc Zyngier @ 2014-12-02 17:06 UTC
  To: linux-arm-kernel

On 02/12/14 16:24, Andre Przywara wrote:
> Hej Christoffer,
> 
> On 30/11/14 08:30, Christoffer Dall wrote:
>> On Fri, Nov 28, 2014 at 03:24:11PM +0000, Andre Przywara wrote:
>>> Hej Christoffer,
>>>
>>> On 25/11/14 10:41, Christoffer Dall wrote:
>>>> Hi Andre,
>>>>
>>>> On Mon, Nov 24, 2014 at 04:00:46PM +0000, Andre Przywara wrote:
>>>>
>>>
> 
> [...]
> 
>>>>>>> +
>>>>>>> +     if (!is_in_range(mmio->phys_addr, mmio->len, rdbase,
>>>>>>> +         GIC_V3_REDIST_SIZE * nrcpus))
>>>>>>> +             return false;
>>>>>>
>>>>>> Did you think more about the contiguous allocation issue here or can you
>>>>>> give me a pointer to the requirement in the spec?
>>>>>
>>>>> 5.4.1 Re-Distributor Addressing
>>>>>
>>>>
>>>> Section 5.4.1 talks about the pages within a single re-distributor having
>>>>> to be contiguous, not all the re-distributor regions having to be
>>>> contiguous, right?
>>>
>>> Ah yes, you are right. But I still think it does not matter:
>>> 1) We are "implementing" the GICv3. So as the spec does not forbid this,
>>> we just state that the redistributor register maps for each VCPU are
>>> contiguous. Also we create the FDT accordingly. I will add a comment in
>>> the documentation to state this.
>>>
>>> 2) The kernel's GICv3 DT bindings assume this allocation is the default.
>>> Although Marc added bindings to work around this (stride), it seems much
>>> more logical to me to not use it.
>>
>> I don't disagree (and never have) with the fact that it is up to us to
>> decide.
>>
>> My original question, which we haven't talked about yet, is if it is
>> *reasonable* to assume that all re-distributor regions will always be
>> contiguous?
>>
>> How will you handle VCPU hotplug for example?
> 
> As kvmtool does not support hotplug, I haven't thought about this yet.
> To me it looks like userland should just use maxcpus for the allocation.
> If I get the current QEMU code right, there is room for 127 GICv3 VCPUs
> (2*64K per VCPU + 64K for the distributor in 16M space) at the moment.
> Kvmtool uses a different mapping, which allows to share 1G with virtio,
> so the limit is around 8000ish VCPUs here.
> Are there any issues with changing the QEMU virt mapping later?
> Migration, maybe?
> If the UART, the RTC and the virtio regions are moved more towards the
> beginning of the 256MB PCI mapping, then there should be space for a bit
> less than 1024 VCPUs, if I get this right.
> 
>> Where in the guest
>> physical memory map of our various virt machines should these regions
>> sit so that we can allocate enough re-distributors for VCPUs etc.?
> 
> Various? Are there other mappings than those described in hw/arm/virt.c?
> 
>> I just want to make sure we're not limiting ourselves by some amount of
>> functionality or ABI (redistributor base addresses) that will be hard to
>> expand in the future.
> 
> If we are flexible with the mapping at VM creation time, QEMU could just
> use a mapping depending on max_cpus:
> < 128 VCPUs: use the current mapping
> 128 <= x < 1020: use a more compressed mapping
> >= 1020: map the redistributor somewhere above 4 GB
> 
> As the device tree binding for GICv3 just supports a stride value, we
> don't have any other real options beside this, right? So how I see this,
> a contiguous mapping (with possible holes) is the only way.

Not really. The GICv3 binding definitely supports having several regions
for the redistributors (see the binding documentation). This allows for
the pathological case where you have N regions for N CPUs. Not that we
ever want to go there, really.
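With several redistributor regions, the MMIO range check becomes a lookup over a small table instead of a single contiguous is_in_range() test. A minimal sketch, assuming two 64K frames per redistributor and a hypothetical region descriptor (not a structure from the patch); a single contiguous region is simply the n == 1 case.

```c
#include <stddef.h>
#include <stdint.h>

#define GIC_V3_REDIST_SIZE 0x20000ULL /* assumed: 2 x 64K frames per VCPU */

/* Hypothetical descriptor for one contiguous block of redistributors. */
struct redist_region {
    uint64_t base;        /* guest physical base address */
    uint32_t count;       /* number of redistributors in this region */
    uint32_t first_vcpu;  /* VCPU served by the region's first redistributor */
};

/* Map a trapped MMIO address to (vcpu, offset within its redistributor).
 * Returns 0 on success, -1 if the address falls into no region. */
static int redist_lookup(const struct redist_region *regions, size_t n,
                         uint64_t addr, uint32_t *vcpu, uint64_t *offset)
{
    for (size_t i = 0; i < n; i++) {
        const struct redist_region *r = &regions[i];
        uint64_t size = (uint64_t)r->count * GIC_V3_REDIST_SIZE;

        if (addr < r->base || addr - r->base >= size)
            continue;
        *vcpu = r->first_vcpu +
                (uint32_t)((addr - r->base) / GIC_V3_REDIST_SIZE);
        *offset = (addr - r->base) % GIC_V3_REDIST_SIZE;
        return 0;
    }
    return -1;
}
```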

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-12-02 17:06               ` Marc Zyngier
@ 2014-12-02 17:32                 ` Andre Przywara
  2014-12-03 10:30                   ` Christoffer Dall
  2014-12-03 10:29                 ` Christoffer Dall
  1 sibling, 1 reply; 80+ messages in thread
From: Andre Przywara @ 2014-12-02 17:32 UTC
  To: linux-arm-kernel

On 02/12/14 17:06, Marc Zyngier wrote:
> On 02/12/14 16:24, Andre Przywara wrote:
>> Hej Christoffer,
>>
>> On 30/11/14 08:30, Christoffer Dall wrote:
>>> On Fri, Nov 28, 2014 at 03:24:11PM +0000, Andre Przywara wrote:
>>>> Hej Christoffer,
>>>>
>>>> On 25/11/14 10:41, Christoffer Dall wrote:
>>>>> Hi Andre,
>>>>>
>>>>> On Mon, Nov 24, 2014 at 04:00:46PM +0000, Andre Przywara wrote:
>>>>>
>>>>
>>
>> [...]
>>
>>>>>>>> +
>>>>>>>> +     if (!is_in_range(mmio->phys_addr, mmio->len, rdbase,
>>>>>>>> +         GIC_V3_REDIST_SIZE * nrcpus))
>>>>>>>> +             return false;
>>>>>>>
>>>>>>> Did you think more about the contiguous allocation issue here or can you
>>>>>>> give me a pointer to the requirement in the spec?
>>>>>>
>>>>>> 5.4.1 Re-Distributor Addressing
>>>>>>
>>>>>
>>>>> Section 5.4.1 talks about the pages within a single re-distributor having
>>>>>> to be contiguous, not all the re-distributor regions having to be
>>>>> contiguous, right?
>>>>
>>>> Ah yes, you are right. But I still think it does not matter:
>>>> 1) We are "implementing" the GICv3. So as the spec does not forbid this,
>>>> we just state that the redistributor register maps for each VCPU are
>>>> contiguous. Also we create the FDT accordingly. I will add a comment in
>>>> the documentation to state this.
>>>>
>>>> 2) The kernel's GICv3 DT bindings assume this allocation is the default.
>>>> Although Marc added bindings to work around this (stride), it seems much
>>>> more logical to me to not use it.
>>>
>>> I don't disagree (and never have) with the fact that it is up to us to
>>> decide.
>>>
>>> My original question, which we haven't talked about yet, is if it is
>>> *reasonable* to assume that all re-distributor regions will always be
>>> contiguous?
>>>
>>> How will you handle VCPU hotplug for example?
>>
>> As kvmtool does not support hotplug, I haven't thought about this yet.
>> To me it looks like userland should just use maxcpus for the allocation.
>> If I get the current QEMU code right, there is room for 127 GICv3 VCPUs
>> (2*64K per VCPU + 64K for the distributor in 16M space) at the moment.
>> Kvmtool uses a different mapping, which allows to share 1G with virtio,
>> so the limit is around 8000ish VCPUs here.
>> Are there any issues with changing the QEMU virt mapping later?
>> Migration, maybe?
>> If the UART, the RTC and the virtio regions are moved more towards the
>> beginning of the 256MB PCI mapping, then there should be space for a bit
>> less than 1024 VCPUs, if I get this right.
>>
>>> Where in the guest
>>> physical memory map of our various virt machines should these regions
>>> sit so that we can allocate enough re-distributors for VCPUs etc.?
>>
>> Various? Are there other mappings than those described in hw/arm/virt.c?
>>
>>> I just want to make sure we're not limiting ourselves by some amount of
>>> functionality or ABI (redistributor base addresses) that will be hard to
>>> expand in the future.
>>
>> If we are flexible with the mapping at VM creation time, QEMU could just
>> use a mapping depending on max_cpus:
>> < 128 VCPUs: use the current mapping
>> 128 <= x < 1020: use a more compressed mapping
>> >= 1020: map the redistributor somewhere above 4 GB
>>
>> As the device tree binding for GICv3 just supports a stride value, we
>> don't have any other real options beside this, right? So how I see this,
>> a contiguous mapping (with possible holes) is the only way.
> 
> Not really. The GICv3 binding definitely supports having several regions
> for the redistributors (see the binding documentation). This allows for
> the pathological case where you have N regions for N CPUs. Not that we
> ever want to go there, really.

Ah yes, thanks for pointing that out. I was mixing this up with the
stride parameter, which is independent of this. Sorry for that.

So from a userland point of view we probably would like to have the
first n VCPU's redistributors mapped at their current places and allow
for more VCPUs to use memory above 4 GB.
Which would require quite some changes to the code to support this in a
very flexible way. I think this could be much easier if we confine
ourselves to two regions (one contiguous lower (< 4 GB) and one
contiguous upper region (>4 GB)), so we don't need to support arbitrary
per VCPU addresses, but could just use the 1st or 2nd map depending on
the VCPU number.
Is this too hackish?
If not, I would add another vgic_addr type (like
KVM_VGIC_V3_ADDR_TYPE_REDIST_UPPER or so) to be used from userland and
use that in the handle_mmio region detection.
Let me know if that sounds reasonable.
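The two-region proposal reduces the per-VCPU address calculation to a single comparison. A minimal sketch, assuming two 64K frames per redistributor; the struct and function names are hypothetical, and KVM_VGIC_V3_ADDR_TYPE_REDIST_UPPER above is only a proposed name, so it is not used here.

```c
#include <stdint.h>

#define GIC_V3_REDIST_SIZE 0x20000ULL /* assumed: 2 x 64K frames per VCPU */

/* Hypothetical layout: the first 'lower_count' VCPUs use the region
 * below 4 GB, all remaining VCPUs use the region above 4 GB. */
struct redist_layout {
    uint64_t lower_base;
    uint32_t lower_count;
    uint64_t upper_base;
};

static uint64_t redist_base_for_vcpu(const struct redist_layout *l,
                                     uint32_t vcpu)
{
    if (vcpu < l->lower_count)
        return l->lower_base + (uint64_t)vcpu * GIC_V3_REDIST_SIZE;
    return l->upper_base +
           (uint64_t)(vcpu - l->lower_count) * GIC_V3_REDIST_SIZE;
}
```

Keeping the number of regions fixed at two avoids per-VCPU address tables, at the cost of being less general than the multi-region DT binding.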

Cheers,
Andre.


* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-12-02 16:24             ` Andre Przywara
  2014-12-02 17:06               ` Marc Zyngier
@ 2014-12-03 10:28               ` Christoffer Dall
  2014-12-03 11:10                 ` Andre Przywara
  1 sibling, 1 reply; 80+ messages in thread
From: Christoffer Dall @ 2014-12-03 10:28 UTC
  To: linux-arm-kernel

On Tue, Dec 02, 2014 at 04:24:53PM +0000, Andre Przywara wrote:
> Hej Christoffer,
> 
> On 30/11/14 08:30, Christoffer Dall wrote:
> > On Fri, Nov 28, 2014 at 03:24:11PM +0000, Andre Przywara wrote:
> >> Hej Christoffer,
> >>
> >> On 25/11/14 10:41, Christoffer Dall wrote:
> >>> Hi Andre,
> >>>
> >>> On Mon, Nov 24, 2014 at 04:00:46PM +0000, Andre Przywara wrote:
> >>>
> >>
> 
> [...]
> 
> >>>>>> +
> >>>>>> +     if (!is_in_range(mmio->phys_addr, mmio->len, rdbase,
> >>>>>> +         GIC_V3_REDIST_SIZE * nrcpus))
> >>>>>> +             return false;
> >>>>>
> >>>>> Did you think more about the contiguous allocation issue here or can you
> >>>>> give me a pointer to the requirement in the spec?
> >>>>
> >>>> 5.4.1 Re-Distributor Addressing
> >>>>
> >>>
> >>> Section 5.4.1 talks about the pages within a single re-distributor having
> >>> to be contiguous, not all the re-distributor regions having to be
> >>> contiguous, right?
> >>
> >> Ah yes, you are right. But I still think it does not matter:
> >> 1) We are "implementing" the GICv3. So as the spec does not forbid this,
> >> we just state that the redistributor register maps for each VCPU are
> >> contiguous. Also we create the FDT accordingly. I will add a comment in
> >> the documentation to state this.
> >>
> >> 2) The kernel's GICv3 DT bindings assume this allocation is the default.
> >> Although Marc added bindings to work around this (stride), it seems much
> >> more logical to me to not use it.
> > 
> > I don't disagree (and never have) with the fact that it is up to us to
> > decide.
> > 
> > My original question, which we haven't talked about yet, is if it is
> > *reasonable* to assume that all re-distributor regions will always be
> > contiguous?
> > 
> > How will you handle VCPU hotplug for example?
> 
> As kvmtool does not support hotplug, I haven't thought about this yet.
> To me it looks like userland should just use maxcpus for the allocation.
> If I get the current QEMU code right, there is room for 127 GICv3 VCPUs
> (2*64K per VCPU + 64K for the distributor in 16M space) at the moment.
> Kvmtool uses a different mapping, which allows to share 1G with virtio,
> so the limit is around 8000ish VCPUs here.
> Are there any issues with changing the QEMU virt mapping later?

Not issues as such, but we try to keep it as stable as possible.  Soon,
at least, you will also have to worry about UEFI working with such
changes, for example.

> Migration, maybe?
> If the UART, the RTC and the virtio regions are moved more towards the
> beginning of the 256MB PCI mapping, then there should be space for a bit
> less than 1024 VCPUs, if I get this right.
> 
> > Where in the guest
> > physical memory map of our various virt machines should these regions
> > sit so that we can allocate enough re-distributors for VCPUs etc.?
> 
> Various? Are there other mappings than those described in hw/arm/virt.c?
> 

QEMU's and kvmtool's, for starters.

> > I just want to make sure we're not limiting ourselves by some amount of
> > functionality or ABI (redistributor base addresses) that will be hard to
> > expand in the future.
> 
> If we are flexible with the mapping at VM creation time, QEMU could just
> use a mapping depending on max_cpus:
> < 128 VCPUs: use the current mapping
> 128 <= x < 1020: use a more compressed mapping
> >= 1020: map the redistributor somewhere above 4 GB

I don't understand what the compressed mapping is.  Is the current code
easy to expand such that you can add a secondary separate redistributor
region above the 4GB map?

> 
> As the device tree binding for GICv3 just supports a stride value, we
> don't have any other real options beside this, right? So how I see this,
> a contiguous mapping (with possible holes) is the only way.
> 
> >>>>>> +
> >>>>>> +static int vgic_v3_init(struct kvm *kvm, const struct vgic_params *params)
> >>>>>> +{
> >>>>>> +     struct vgic_dist *dist = &kvm->arch.vgic;
> >>>>>> +     int ret, i;
> >>>>>> +     u32 mpidr;
> >>>>>> +
> >>>>>> +     if (IS_VGIC_ADDR_UNDEF(dist->vgic_dist_base) ||
> >>>>>> +         IS_VGIC_ADDR_UNDEF(dist->vgic_redist_base)) {
> >>>>>> +             kvm_err("Need to set vgic distributor addresses first\n");
> >>>>>> +             return -ENXIO;
> >>>>>> +     }
> >>>>>> +
> >>>>>> +     /*
> >>>>>> +      * FIXME: this should be moved to init_maps time, and may bite
> >>>>>> +      * us when adding save/restore. Add a per-emulation hook?
> >>>>>> +      */
> >>>>>
> >>>>> progress on this fixme?
> >>>>
> >>>> Progress supplies the ISS, but not this piece of code (read: none) ;-)
> >>>> I am more in favour of a follow-up patch on this one ...
> >>>
> >>> hmmm, I'm not a fan of merging code with this kind of a comment in it,
> >>> because it looks scary, and I don't really understand the problem from
> >>> just reading the comment, so something needs to be done here.
> >>
> >> I see. What about we are moving this unconditionally into vgic_init_maps
> >> and allocate it for both v2 and v3 guests and get rid of the whole
> >> function? It allocates only memory for the irq_spi_mpidr, which is 4
> >> bytes per configured SPI (so at most less than 4 KB, but usually just
> >> 128 Bytes per guest). This would be a pretty quick solution. Does that
> >> sound too hackish?
> >>
> >> After your comments about the per-VM ops function pointers I am a bit
> >> reluctant to introduce another one (which would be the obvious way
> >> following the comment) for just this simple kalloc().
> >> On the other hand the ITS emulation may later make better use of a GICv3
> >> specific allocation function.
> > 
> > What I really disliked was the configuration of a function pointer,
> > which, when invoked, configured other function pointers.  That just made
> > my head spin.  So adding another per-gic-model init_maps method is not
> > that bad, but on the other hand, the only problem with keeping this here
> > is that when we restore the vgic state, then user space wants to be able
> > to populate all the data before running any VCPUs, and we don't create
> > the data structures before the first VCPU is run.
> > 
> > However, Eric has a problem with this "init-when-we-run-the-first-VCPU"
> > approach as well, so one argument is that we need to add a method to
> > both the gicv2 and gicv3 device API to say "VGIC_INIT" which userspace
> > can call after having created all the VCPUs.  And, in fact, we may want
> > to enforce this for the gicv3 right now and only maintain the existing
> > behavior for gicv2.
> > 
> > (Eric's use case is configuring IRQFD, which must logically be done
> > before running the machine, but also needs to be done after the vgic is
> > fully ready.).
> > 
> > Does this make sense?
> 
> So if we would avoid that spooky "detect-if-a-VCPU-has-run" code and
> rely on an explicit ioctl, I am in favor of this. We would need to keep
> the current approach for compatibility, though, right?

not for gicv3, only for gicv2, which is exactly why I'm bringing it up
here.

> 
> So what about we either keep the current GICv3 allocation as it stands
> in my patches right now (or move the GICv3 specific part into the
> general vgic_init_maps()) and adapt that to the VGIC_INIT call once that
> has appeared (or even handle this in that series then).
> 
> Does that make sense? What is the time frame for that VGIC_INIT call?
> 

Eric posted some patches in private yesterday, I think he's planning on
sending them out any day now.

-Christoffer

* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-12-02 17:06               ` Marc Zyngier
  2014-12-02 17:32                 ` Andre Przywara
@ 2014-12-03 10:29                 ` Christoffer Dall
  2014-12-03 10:44                   ` Marc Zyngier
  1 sibling, 1 reply; 80+ messages in thread
From: Christoffer Dall @ 2014-12-03 10:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Dec 02, 2014 at 05:06:09PM +0000, Marc Zyngier wrote:
> On 02/12/14 16:24, Andre Przywara wrote:
> > Hej Christoffer,
> > 
> > On 30/11/14 08:30, Christoffer Dall wrote:
> >> On Fri, Nov 28, 2014 at 03:24:11PM +0000, Andre Przywara wrote:
> >>> Hej Christoffer,
> >>>
> >>> On 25/11/14 10:41, Christoffer Dall wrote:
> >>>> Hi Andre,
> >>>>
> >>>> On Mon, Nov 24, 2014 at 04:00:46PM +0000, Andre Przywara wrote:
> >>>>
> >>>
> > 
> > [...]
> > 
> >>>>>>> +
> >>>>>>> +     if (!is_in_range(mmio->phys_addr, mmio->len, rdbase,
> >>>>>>> +         GIC_V3_REDIST_SIZE * nrcpus))
> >>>>>>> +             return false;
> >>>>>>
> >>>>>> Did you think more about the contiguous allocation issue here or can you
> >>>>>> give me a pointer to the requirement in the spec?
> >>>>>
> >>>>> 5.4.1 Re-Distributor Addressing
> >>>>>
> >>>>
> >>>> Section 5.4.1 talks about the pages within a single re-distributor having
> >>>> to be contiguous, not all the re-distributor regions having to be
> >>>> contiguous, right?
> >>>
> >>> Ah yes, you are right. But I still think it does not matter:
> >>> 1) We are "implementing" the GICv3. So as the spec does not forbid this,
> >>> we just state that the redistributor register maps for each VCPU are
> >>> contiguous. Also we create the FDT accordingly. I will add a comment in
> >>> the documentation to state this.
> >>>
> >>> 2) The kernel's GICv3 DT bindings assume this allocation is the default.
> >>> Although Marc added bindings to work around this (stride), it seems much
> >>> more logical to me to not use it.
> >>
> >> I don't disagree (and never have) with the fact that it is up to us to
> >> decide.
> >>
> >> My original question, which we haven't talked about yet, is if it is
> >> *reasonable* to assume that all re-distributor regions will always be
> >> contiguous?
> >>
> >> How will you handle VCPU hotplug for example?
> > 
> > As kvmtool does not support hotplug, I haven't thought about this yet.
> > To me it looks like userland should just use maxcpus for the allocation.
> > If I get the current QEMU code right, there is room for 127 GICv3 VCPUs
> > (2*64K per VCPU + 64K for the distributor in 16M space) at the moment.
> > Kvmtool uses a different mapping, which allows to share 1G with virtio,
> > so the limit is around 8000ish VCPUs here.
> > Are there any issues with changing the QEMU virt mapping later?
> > Migration, maybe?
> > If the UART, the RTC and the virtio regions are moved more towards the
> > beginning of the 256MB PCI mapping, then there should be space for a bit
> > less than 1024 VCPUs, if I get this right.
> > 
> >> Where in the guest
> >> physical memory map of our various virt machines should these regions
> >> sit so that we can allocate enough re-distributors for VCPUs etc.?
> > 
> > Various? Are there other mappings than those described in hw/arm/virt.c?
> > 
> >> I just want to make sure we're not limiting ourselves by some amount of
> >> functionality or ABI (redistributor base addresses) that will be hard to
> >> expand in the future.
> > 
> > If we are flexible with the mapping at VM creation time, QEMU could just
> > use a mapping depending on max_cpus:
> > < 128 VCPUs: use the current mapping
> > 128 <= x < 1020: use a more compressed mapping
> > x >= 1020: map the redistributor somewhere above 4 GB
> > 
> > As the device tree binding for GICv3 just supports a stride value, we
> > don't have any other real options beside this, right? So how I see this,
> > a contiguous mapping (with possible holes) is the only way.
> 
> Not really. The GICv3 binding definitely supports having several regions
> for the redistributors (see the binding documentation). This allows for
> the pathological case where you have N regions for N CPUs. Not that we
> ever want to go there, really.
> 
What are your thoughts on mapping all of the redistributor regions in
one consecutive guest phys address space chunk?  Am I making an issue
out of nothing?

-Christoffer
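
To make the single-chunk question concrete: with one contiguous region, routing an MMIO access to a redistributor reduces to the range check from the patch plus a division. The sketch below assumes two 64K frames per redistributor (the "2*64K per VCPU" figure from this thread) and re-implements `is_in_range()` under that assumption; `redist_vcpu_index()` is a hypothetical helper, not code from the series:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: two 64K frames per redistributor, as described in the thread. */
#define SZ_64K			0x10000ULL
#define GIC_V3_REDIST_SIZE	(2 * SZ_64K)

/* Re-implementation of the range helper: [addr, addr+len) inside [base, base+size). */
static bool is_in_range(uint64_t addr, uint64_t len,
			uint64_t base, uint64_t size)
{
	return addr >= base && addr + len <= base + size;
}

/* Hypothetical: VCPU index owning addr, or -1 if outside the region. */
static int redist_vcpu_index(uint64_t addr, uint64_t len,
			     uint64_t rdbase, int nrcpus)
{
	assert(nrcpus > 0);
	if (!is_in_range(addr, len, rdbase, GIC_V3_REDIST_SIZE * nrcpus))
		return -1;
	return (int)((addr - rdbase) / GIC_V3_REDIST_SIZE);
}
```

Supporting fragmented regions would turn this into a search over a list of (base, first VCPU, count) entries, which is the extra complexity being weighed here.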

* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-12-02 17:32                 ` Andre Przywara
@ 2014-12-03 10:30                   ` Christoffer Dall
  2014-12-03 10:47                     ` Andre Przywara
  0 siblings, 1 reply; 80+ messages in thread
From: Christoffer Dall @ 2014-12-03 10:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Dec 02, 2014 at 05:32:45PM +0000, Andre Przywara wrote:
> On 02/12/14 17:06, Marc Zyngier wrote:
> > On 02/12/14 16:24, Andre Przywara wrote:
> >> Hej Christoffer,
> >>
> >> On 30/11/14 08:30, Christoffer Dall wrote:
> >>> On Fri, Nov 28, 2014 at 03:24:11PM +0000, Andre Przywara wrote:
> >>>> Hej Christoffer,
> >>>>
> >>>> On 25/11/14 10:41, Christoffer Dall wrote:
> >>>>> Hi Andre,
> >>>>>
> >>>>> On Mon, Nov 24, 2014 at 04:00:46PM +0000, Andre Przywara wrote:
> >>>>>
> >>>>
> >>
> >> [...]
> >>
> >>>>>>>> +
> >>>>>>>> +     if (!is_in_range(mmio->phys_addr, mmio->len, rdbase,
> >>>>>>>> +         GIC_V3_REDIST_SIZE * nrcpus))
> >>>>>>>> +             return false;
> >>>>>>>
> >>>>>>> Did you think more about the contiguous allocation issue here or can you
> >>>>>>> give me a pointer to the requirement in the spec?
> >>>>>>
> >>>>>> 5.4.1 Re-Distributor Addressing
> >>>>>>
> >>>>>
> >>>>> Section 5.4.1 talks about the pages within a single re-distributor having
> >>>>> to be contiguous, not all the re-distributor regions having to be
> >>>>> contiguous, right?
> >>>>
> >>>> Ah yes, you are right. But I still think it does not matter:
> >>>> 1) We are "implementing" the GICv3. So as the spec does not forbid this,
> >>>> we just state that the redistributor register maps for each VCPU are
> >>>> contiguous. Also we create the FDT accordingly. I will add a comment in
> >>>> the documentation to state this.
> >>>>
> >>>> 2) The kernel's GICv3 DT bindings assume this allocation is the default.
> >>>> Although Marc added bindings to work around this (stride), it seems much
> >>>> more logical to me to not use it.
> >>>
> >>> I don't disagree (and never have) with the fact that it is up to us to
> >>> decide.
> >>>
> >>> My original question, which we haven't talked about yet, is if it is
> >>> *reasonable* to assume that all re-distributor regions will always be
> >>> contiguous?
> >>>
> >>> How will you handle VCPU hotplug for example?
> >>
> >> As kvmtool does not support hotplug, I haven't thought about this yet.
> >> To me it looks like userland should just use maxcpus for the allocation.
> >> If I get the current QEMU code right, there is room for 127 GICv3 VCPUs
> >> (2*64K per VCPU + 64K for the distributor in 16M space) at the moment.
> >> Kvmtool uses a different mapping, which allows to share 1G with virtio,
> >> so the limit is around 8000ish VCPUs here.
> >> Are there any issues with changing the QEMU virt mapping later?
> >> Migration, maybe?
> >> If the UART, the RTC and the virtio regions are moved more towards the
> >> beginning of the 256MB PCI mapping, then there should be space for a bit
> >> less than 1024 VCPUs, if I get this right.
> >>
> >>> Where in the guest
> >>> physical memory map of our various virt machines should these regions
> >>> sit so that we can allocate enough re-distributors for VCPUs etc.?
> >>
> >> Various? Are there other mappings than those described in hw/arm/virt.c?
> >>
> >>> I just want to make sure we're not limiting ourselves by some amount of
> >>> functionality or ABI (redistributor base addresses) that will be hard to
> >>> expand in the future.
> >>
> >> If we are flexible with the mapping at VM creation time, QEMU could just
> >> use a mapping depending on max_cpus:
> >> < 128 VCPUs: use the current mapping
> >> 128 <= x < 1020: use a more compressed mapping
> >> x >= 1020: map the redistributor somewhere above 4 GB
> >>
> >> As the device tree binding for GICv3 just supports a stride value, we
> >> don't have any other real options beside this, right? So how I see this,
> >> a contiguous mapping (with possible holes) is the only way.
> > 
> > Not really. The GICv3 binding definitely supports having several regions
> > for the redistributors (see the binding documentation). This allows for
> > the pathological case where you have N regions for N CPUs. Not that we
> > ever want to go there, really.
> 
> Ah yes, thanks for pointing that out. I was mixing this up with the
> stride parameter, which is independent of this. Sorry for that.
> 
> So from a userland point of view we probably would like to have the
> first n VCPU's redistributors mapped at their current places and allow
> for more VCPUs to use memory above 4 GB.
> Which would require quite some changes to the code to support this in a
> very flexible way. I think this could be much easier if we confine
> ourselves to two regions (one contiguous lower (< 4 GB) and one
> contiguous upper region (>4 GB)), so we don't need to support arbitrary
> per VCPU addresses, but could just use the 1st or 2nd map depending on
> the VCPU number.
> Is this too hackish?
> If not, I would add another vgic_addr type (like
> KVM_VGIC_V3_ADDR_TYPE_REDIST_UPPER or so) to be used from userland and
> use that in the handle_mmio region detection.
> Let me know if that sounds reasonable.
> 
The point that I've been trying to make sure we think about is if we'll
regret not being able to fragment the redistributor regions a bit.  Even
if it's technically possible, we may regret requiring a huge contiguous
allocation in the guest physical address space.  But maybe we don't care
when we have 40 bits to play with?

-Christoffer

* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-12-03 10:29                 ` Christoffer Dall
@ 2014-12-03 10:44                   ` Marc Zyngier
  2014-12-03 11:07                     ` Christoffer Dall
  0 siblings, 1 reply; 80+ messages in thread
From: Marc Zyngier @ 2014-12-03 10:44 UTC (permalink / raw)
  To: linux-arm-kernel

On 03/12/14 10:29, Christoffer Dall wrote:
> On Tue, Dec 02, 2014 at 05:06:09PM +0000, Marc Zyngier wrote:
>> On 02/12/14 16:24, Andre Przywara wrote:
>>> Hej Christoffer,
>>>
>>> On 30/11/14 08:30, Christoffer Dall wrote:
>>>> On Fri, Nov 28, 2014 at 03:24:11PM +0000, Andre Przywara wrote:
>>>>> Hej Christoffer,
>>>>>
>>>>> On 25/11/14 10:41, Christoffer Dall wrote:
>>>>>> Hi Andre,
>>>>>>
>>>>>> On Mon, Nov 24, 2014 at 04:00:46PM +0000, Andre Przywara wrote:
>>>>>>
>>>>>
>>>
>>> [...]
>>>
>>>>>>>>> +
>>>>>>>>> +     if (!is_in_range(mmio->phys_addr, mmio->len, rdbase,
>>>>>>>>> +         GIC_V3_REDIST_SIZE * nrcpus))
>>>>>>>>> +             return false;
>>>>>>>>
>>>>>>>> Did you think more about the contiguous allocation issue here or can you
>>>>>>>> give me a pointer to the requirement in the spec?
>>>>>>>
>>>>>>> 5.4.1 Re-Distributor Addressing
>>>>>>>
>>>>>>
>>>>>> Section 5.4.1 talks about the pages within a single re-distributor having
>>>>>> to be contiguous, not all the re-distributor regions having to be
>>>>>> contiguous, right?
>>>>>
>>>>> Ah yes, you are right. But I still think it does not matter:
>>>>> 1) We are "implementing" the GICv3. So as the spec does not forbid this,
>>>>> we just state that the redistributor register maps for each VCPU are
>>>>> contiguous. Also we create the FDT accordingly. I will add a comment in
>>>>> the documentation to state this.
>>>>>
>>>>> 2) The kernel's GICv3 DT bindings assume this allocation is the default.
>>>>> Although Marc added bindings to work around this (stride), it seems much
>>>>> more logical to me to not use it.
>>>>
>>>> I don't disagree (and never have) with the fact that it is up to us to
>>>> decide.
>>>>
>>>> My original question, which we haven't talked about yet, is if it is
>>>> *reasonable* to assume that all re-distributor regions will always be
>>>> contiguous?
>>>>
>>>> How will you handle VCPU hotplug for example?
>>>
>>> As kvmtool does not support hotplug, I haven't thought about this yet.
>>> To me it looks like userland should just use maxcpus for the allocation.
>>> If I get the current QEMU code right, there is room for 127 GICv3 VCPUs
>>> (2*64K per VCPU + 64K for the distributor in 16M space) at the moment.
>>> Kvmtool uses a different mapping, which allows to share 1G with virtio,
>>> so the limit is around 8000ish VCPUs here.
>>> Are there any issues with changing the QEMU virt mapping later?
>>> Migration, maybe?
>>> If the UART, the RTC and the virtio regions are moved more towards the
>>> beginning of the 256MB PCI mapping, then there should be space for a bit
>>> less than 1024 VCPUs, if I get this right.
>>>
>>>> Where in the guest
>>>> physical memory map of our various virt machines should these regions
>>>> sit so that we can allocate enough re-distributors for VCPUs etc.?
>>>
>>> Various? Are there other mappings than those described in hw/arm/virt.c?
>>>
>>>> I just want to make sure we're not limiting ourselves by some amount of
>>>> functionality or ABI (redistributor base addresses) that will be hard to
>>>> expand in the future.
>>>
>>> If we are flexible with the mapping at VM creation time, QEMU could just
>>> use a mapping depending on max_cpus:
>>> < 128 VCPUs: use the current mapping
>>> 128 <= x < 1020: use a more compressed mapping
>>> x >= 1020: map the redistributor somewhere above 4 GB
>>>
>>> As the device tree binding for GICv3 just supports a stride value, we
>>> don't have any other real options beside this, right? So how I see this,
>>> a contiguous mapping (with possible holes) is the only way.
>>
>> Not really. The GICv3 binding definitely supports having several regions
>> for the redistributors (see the binding documentation). This allows for
>> the pathological case where you have N regions for N CPUs. Not that we
>> ever want to go there, really.
>>
> What are your thoughts on mapping all of the redistributor regions in
> one consecutive guest phys address space chunk?  Am I making an issue
> out of nothing?

I don't think this is too bad. It puts constraints on the physical
memory map, but we do have a massive IPA space anyway (at least on
arm64). Of course, the issue is slightly more acute on 32bit guests,
where IPA space is at a premium. But this is fairly accurately modelling
a monolithic GICv3 (as opposed to distributed).

I imagine that, over time, we'll have to introduce support for "split"
redistributor ranges, but that probably only becomes an issue when you
want to support guests with several hundred vcpus.

Another interesting point you raise is vcpu hotplug. I'm not completely
sure how that would work. Do we pre-allocate redistributors, do we have
a more coarse-grained "socket hot-plug"? I think that we need to give it
some thought, as this probably requires a slightly different model for
GICv3.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
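
The max_cpus-based placement tiers quoted in this message can be written down as a simple selection function. The thresholds are the ones from the thread; the enum and function names are hypothetical, and what "compressed mapping" means concretely is left to the machine model:

```c
#include <assert.h>

/* Hypothetical encoding of the placement tiers quoted above. */
enum redist_placement {
	REDIST_CURRENT_MAP,	/* max_cpus < 128 */
	REDIST_COMPRESSED_MAP,	/* 128 <= max_cpus < 1020 */
	REDIST_ABOVE_4G,	/* max_cpus >= 1020 */
};

static enum redist_placement choose_redist_placement(unsigned int max_cpus)
{
	assert(max_cpus > 0);
	if (max_cpus < 128)
		return REDIST_CURRENT_MAP;
	if (max_cpus < 1020)
		return REDIST_COMPRESSED_MAP;
	return REDIST_ABOVE_4G;
}
```

Because the choice depends only on max_cpus, the layout is fixed at VM creation time, which keeps it stable across migration as long as both sides use the same max_cpus.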

* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-12-03 10:30                   ` Christoffer Dall
@ 2014-12-03 10:47                     ` Andre Przywara
  2014-12-03 11:06                       ` Christoffer Dall
  0 siblings, 1 reply; 80+ messages in thread
From: Andre Przywara @ 2014-12-03 10:47 UTC (permalink / raw)
  To: linux-arm-kernel

On 03/12/14 10:30, Christoffer Dall wrote:
> On Tue, Dec 02, 2014 at 05:32:45PM +0000, Andre Przywara wrote:
>> On 02/12/14 17:06, Marc Zyngier wrote:
>>> On 02/12/14 16:24, Andre Przywara wrote:
>>>> Hej Christoffer,
>>>>
>>>> On 30/11/14 08:30, Christoffer Dall wrote:
>>>>> On Fri, Nov 28, 2014 at 03:24:11PM +0000, Andre Przywara wrote:
>>>>>> Hej Christoffer,
>>>>>>
>>>>>> On 25/11/14 10:41, Christoffer Dall wrote:
>>>>>>> Hi Andre,
>>>>>>>
>>>>>>> On Mon, Nov 24, 2014 at 04:00:46PM +0000, Andre Przywara wrote:
>>>>>>>
>>>>>>
>>>>
>>>> [...]
>>>>
>>>>>>>>>> +
>>>>>>>>>> +     if (!is_in_range(mmio->phys_addr, mmio->len, rdbase,
>>>>>>>>>> +         GIC_V3_REDIST_SIZE * nrcpus))
>>>>>>>>>> +             return false;
>>>>>>>>>
>>>>>>>>> Did you think more about the contiguous allocation issue here or can you
>>>>>>>>> give me a pointer to the requirement in the spec?
>>>>>>>>
>>>>>>>> 5.4.1 Re-Distributor Addressing
>>>>>>>>
>>>>>>>
>>>>>>> Section 5.4.1 talks about the pages within a single re-distributor having
>>>>>>> to be contiguous, not all the re-distributor regions having to be
>>>>>>> contiguous, right?
>>>>>>
>>>>>> Ah yes, you are right. But I still think it does not matter:
>>>>>> 1) We are "implementing" the GICv3. So as the spec does not forbid this,
>>>>>> we just state that the redistributor register maps for each VCPU are
>>>>>> contiguous. Also we create the FDT accordingly. I will add a comment in
>>>>>> the documentation to state this.
>>>>>>
>>>>>> 2) The kernel's GICv3 DT bindings assume this allocation is the default.
>>>>>> Although Marc added bindings to work around this (stride), it seems much
>>>>>> more logical to me to not use it.
>>>>>
>>>>> I don't disagree (and never have) with the fact that it is up to us to
>>>>> decide.
>>>>>
>>>>> My original question, which we haven't talked about yet, is if it is
>>>>> *reasonable* to assume that all re-distributor regions will always be
>>>>> contiguous?
>>>>>
>>>>> How will you handle VCPU hotplug for example?
>>>>
>>>> As kvmtool does not support hotplug, I haven't thought about this yet.
>>>> To me it looks like userland should just use maxcpus for the allocation.
>>>> If I get the current QEMU code right, there is room for 127 GICv3 VCPUs
>>>> (2*64K per VCPU + 64K for the distributor in 16M space) at the moment.
>>>> Kvmtool uses a different mapping, which allows to share 1G with virtio,
>>>> so the limit is around 8000ish VCPUs here.
>>>> Are there any issues with changing the QEMU virt mapping later?
>>>> Migration, maybe?
>>>> If the UART, the RTC and the virtio regions are moved more towards the
>>>> beginning of the 256MB PCI mapping, then there should be space for a bit
>>>> less than 1024 VCPUs, if I get this right.
>>>>
>>>>> Where in the guest
>>>>> physical memory map of our various virt machines should these regions
>>>>> sit so that we can allocate enough re-distributors for VCPUs etc.?
>>>>
>>>> Various? Are there other mappings than those described in hw/arm/virt.c?
>>>>
>>>>> I just want to make sure we're not limiting ourselves by some amount of
>>>>> functionality or ABI (redistributor base addresses) that will be hard to
>>>>> expand in the future.
>>>>
>>>> If we are flexible with the mapping at VM creation time, QEMU could just
>>>> use a mapping depending on max_cpus:
>>>> < 128 VCPUs: use the current mapping
>>>> 128 <= x < 1020: use a more compressed mapping
>>>> x >= 1020: map the redistributor somewhere above 4 GB
>>>>
>>>> As the device tree binding for GICv3 just supports a stride value, we
>>>> don't have any other real options beside this, right? So how I see this,
>>>> a contiguous mapping (with possible holes) is the only way.
>>>
>>> Not really. The GICv3 binding definitely supports having several regions
>>> for the redistributors (see the binding documentation). This allows for
>>> the pathological case where you have N regions for N CPUs. Not that we
>>> ever want to go there, really.
>>
>> Ah yes, thanks for pointing that out. I was mixing this up with the
>> stride parameter, which is independent of this. Sorry for that.
>>
>> So from a userland point of view we probably would like to have the
>> first n VCPU's redistributors mapped at their current places and allow
>> for more VCPUs to use memory above 4 GB.
>> Which would require quite some changes to the code to support this in a
>> very flexible way. I think this could be much easier if we confine
>> ourselves to two regions (one contiguous lower (< 4 GB) and one
>> contiguous upper region (>4 GB)), so we don't need to support arbitrary
>> per VCPU addresses, but could just use the 1st or 2nd map depending on
>> the VCPU number.
>> Is this too hackish?
>> If not, I would add another vgic_addr type (like
>> KVM_VGIC_V3_ADDR_TYPE_REDIST_UPPER or so) to be used from userland and
>> use that in the handle_mmio region detection.
>> Let me know if that sounds reasonable.
>>
> The point that I've been trying to make sure we think about is if we'll
> regret not being able to fragment the redistributor regions a bit.  Even
> if it's technically possible, we may regret requiring a huge contiguous
> allocation in the guest physical address space.  But maybe we don't care
> when we have 40 bits to play with?

40 bits are more than enough. But are we OK with using only memory above
4GB? Is there some code before the Linux kernel that is limited to 4GB?
I am thinking about 32bit guests in particular, which may have some
firmware blob executed before which may not use the MMU.

If this is not an issue, I'd rather stay with one contiguous region - at
least for the time being. The current GICv3 code has a limit of 255
VCPUs anyway, so this requires at most 32MB, which should be easily
fitted anywhere.

Should we later need to extend the number of VCPUs, we can in the worst
case adjust the code to support split regions if the 4GB limit issue
persists. This would be done via a new KVM capability and some new
register groups in the KVM device ioctl to set a second (or following)
region, so in a backwards compatible way.

Cheers,
Andre.
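
The sizing figures in this exchange can be checked with a few lines of arithmetic. The sketch assumes, as stated in the thread, 2 x 64K per redistributor and a single 64K distributor frame sharing the window; the helper names are illustrative only:

```c
#include <assert.h>
#include <stdint.h>

#define SZ_64K		0x10000ULL
#define REDIST_SIZE	(2 * SZ_64K)	/* per VCPU, as stated in the thread */
#define DIST_SIZE	SZ_64K

/* How many redistributors fit in a window that also holds the distributor. */
static uint64_t max_vcpus_in_window(uint64_t window)
{
	assert(window >= DIST_SIZE);
	return (window - DIST_SIZE) / REDIST_SIZE;
}

/* Size of one contiguous redistributor region for nr_vcpus. */
static uint64_t redist_region_size(uint64_t nr_vcpus)
{
	return nr_vcpus * REDIST_SIZE;
}
```

A 16M window gives (256 - 1) frames / 2 = 127 VCPUs, matching the QEMU estimate, and 255 VCPUs need 255 x 128K = 32640K, just under the 32MB quoted above.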

* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-12-03 10:47                     ` Andre Przywara
@ 2014-12-03 11:06                       ` Christoffer Dall
  0 siblings, 0 replies; 80+ messages in thread
From: Christoffer Dall @ 2014-12-03 11:06 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Dec 03, 2014 at 10:47:34AM +0000, Andre Przywara wrote:
> 
> 
> On 03/12/14 10:30, Christoffer Dall wrote:
> > On Tue, Dec 02, 2014 at 05:32:45PM +0000, Andre Przywara wrote:
> >> On 02/12/14 17:06, Marc Zyngier wrote:
> >>> On 02/12/14 16:24, Andre Przywara wrote:
> >>>> Hej Christoffer,
> >>>>
> >>>> On 30/11/14 08:30, Christoffer Dall wrote:
> >>>>> On Fri, Nov 28, 2014 at 03:24:11PM +0000, Andre Przywara wrote:
> >>>>>> Hej Christoffer,
> >>>>>>
> >>>>>> On 25/11/14 10:41, Christoffer Dall wrote:
> >>>>>>> Hi Andre,
> >>>>>>>
> >>>>>>> On Mon, Nov 24, 2014 at 04:00:46PM +0000, Andre Przywara wrote:
> >>>>>>>
> >>>>>>
> >>>>
> >>>> [...]
> >>>>
> >>>>>>>>>> +
> >>>>>>>>>> +     if (!is_in_range(mmio->phys_addr, mmio->len, rdbase,
> >>>>>>>>>> +         GIC_V3_REDIST_SIZE * nrcpus))
> >>>>>>>>>> +             return false;
> >>>>>>>>>
> >>>>>>>>> Did you think more about the contiguous allocation issue here or can you
> >>>>>>>>> give me a pointer to the requirement in the spec?
> >>>>>>>>
> >>>>>>>> 5.4.1 Re-Distributor Addressing
> >>>>>>>>
> >>>>>>>
> >>>>>>> Section 5.4.1 talks about the pages within a single re-distributor having
> >>>>>>> to be contiguous, not all the re-distributor regions having to be
> >>>>>>> contiguous, right?
> >>>>>>
> >>>>>> Ah yes, you are right. But I still think it does not matter:
> >>>>>> 1) We are "implementing" the GICv3. So as the spec does not forbid this,
> >>>>>> we just state that the redistributor register maps for each VCPU are
> >>>>>> contiguous. Also we create the FDT accordingly. I will add a comment in
> >>>>>> the documentation to state this.
> >>>>>>
> >>>>>> 2) The kernel's GICv3 DT bindings assume this allocation is the default.
> >>>>>> Although Marc added bindings to work around this (stride), it seems much
> >>>>>> more logical to me to not use it.
> >>>>>
> >>>>> I don't disagree (and never have) with the fact that it is up to us to
> >>>>> decide.
> >>>>>
> >>>>> My original question, which we haven't talked about yet, is if it is
> >>>>> *reasonable* to assume that all re-distributor regions will always be
> >>>>> contiguous?
> >>>>>
> >>>>> How will you handle VCPU hotplug for example?
> >>>>
> >>>> As kvmtool does not support hotplug, I haven't thought about this yet.
> >>>> To me it looks like userland should just use maxcpus for the allocation.
> >>>> If I get the current QEMU code right, there is room for 127 GICv3 VCPUs
> >>>> (2*64K per VCPU + 64K for the distributor in 16M space) at the moment.
> >>>> Kvmtool uses a different mapping, which allows to share 1G with virtio,
> >>>> so the limit is around 8000ish VCPUs here.
> >>>> Are there any issues with changing the QEMU virt mapping later?
> >>>> Migration, maybe?
> >>>> If the UART, the RTC and the virtio regions are moved more towards the
> >>>> beginning of the 256MB PCI mapping, then there should be space for a bit
> >>>> less than 1024 VCPUs, if I get this right.
> >>>>
> >>>>> Where in the guest
> >>>>> physical memory map of our various virt machines should these regions
> >>>>> sit so that we can allocate enough re-distributors for VCPUs etc.?
> >>>>
> >>>> Various? Are there other mappings than those described in hw/arm/virt.c?
> >>>>
> >>>>> I just want to make sure we're not limiting ourselves by some amount of
> >>>>> functionality or ABI (redistributor base addresses) that will be hard to
> >>>>> expand in the future.
> >>>>
> >>>> If we are flexible with the mapping at VM creation time, QEMU could just
> >>>> use a mapping depending on max_cpus:
> >>>> < 128 VCPUs: use the current mapping
> >>>> 128 <= x < 1020: use a more compressed mapping
> >>>> x >= 1020: map the redistributor somewhere above 4 GB
> >>>>
> >>>> As the device tree binding for GICv3 just supports a stride value, we
> >>>> don't have any other real options beside this, right? So how I see this,
> >>>> a contiguous mapping (with possible holes) is the only way.
> >>>
> >>> Not really. The GICv3 binding definitely supports having several regions
> >>> for the redistributors (see the binding documentation). This allows for
> >>> the pathological case where you have N regions for N CPUs. Not that we
> >>> ever want to go there, really.
> >>
> >> Ah yes, thanks for pointing that out. I was mixing this up with the
> >> stride parameter, which is independent of this. Sorry for that.
> >>
> >> So from a userland point of view we probably would like to have the
> >> first n VCPU's redistributors mapped at their current places and allow
> >> for more VCPUs to use memory above 4 GB.
> >> Which would require quite some changes to the code to support this in a
> >> very flexible way. I think this could be much easier if we confine
> >> ourselves to two regions (one contiguous lower (< 4 GB) and one
> >> contiguous upper region (>4 GB)), so we don't need to support arbitrary
> >> per VCPU addresses, but could just use the 1st or 2nd map depending on
> >> the VCPU number.
> >> Is this too hackish?
> >> If not, I would add another vgic_addr type (like
> >> KVM_VGIC_V3_ADDR_TYPE_REDIST_UPPER or so) to be used from userland and
> >> use that in the handle_mmio region detection.
> >> Let me know if that sounds reasonable.
> >>
> > The point that I've been trying to make sure we think about is if we'll
> > regret not being able to fragment the redistributor regions a bit.  Even
> > if it's technically possible, we may regret requiring a huge contiguous
> > allocation in the guest physical address space.  But maybe we don't care
> > when we have 40 bits to play with?
> 
> 40 bits are more than enough. But are we OK with using only memory above
> 4GB? Is there some code before the Linux kernel that is limited to 4GB?
> I am thinking about 32bit guests in particular, which may have some
> firmware blob executed before which may not use the MMU.
> 
> If this is not an issue, I'd rather stay with one contiguous region - at
> least for the time being. The current GICv3 code has a limit of 255
> VCPUs anyway, so this requires at most 32MB, which should be easily
> fitted anywhere.
> 
> Should we later need to extend the number of VCPUs, we can in the worst
> case adjust the code to support split regions if the 4GB limit issue
> persists. This would be done via a new KVM capability and some new
> register groups in the KVM device ioctl to set a second (or following)
> region, so in a backwards compatible way.
> 
ok, sounds reasonable.  I'll shut up then.

Thanks,
-Christoffer

* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-12-03 10:44                   ` Marc Zyngier
@ 2014-12-03 11:07                     ` Christoffer Dall
  0 siblings, 0 replies; 80+ messages in thread
From: Christoffer Dall @ 2014-12-03 11:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Dec 03, 2014 at 10:44:32AM +0000, Marc Zyngier wrote:
> On 03/12/14 10:29, Christoffer Dall wrote:
> > On Tue, Dec 02, 2014 at 05:06:09PM +0000, Marc Zyngier wrote:
> >> On 02/12/14 16:24, Andre Przywara wrote:
> >>> Hej Christoffer,
> >>>
> >>> On 30/11/14 08:30, Christoffer Dall wrote:
> >>>> On Fri, Nov 28, 2014 at 03:24:11PM +0000, Andre Przywara wrote:
> >>>>> Hej Christoffer,
> >>>>>
> >>>>> On 25/11/14 10:41, Christoffer Dall wrote:
> >>>>>> Hi Andre,
> >>>>>>
> >>>>>> On Mon, Nov 24, 2014 at 04:00:46PM +0000, Andre Przywara wrote:
> >>>>>>
> >>>>>
> >>>
> >>> [...]
> >>>
> >>>>>>>>> +
> >>>>>>>>> +     if (!is_in_range(mmio->phys_addr, mmio->len, rdbase,
> >>>>>>>>> +         GIC_V3_REDIST_SIZE * nrcpus))
> >>>>>>>>> +             return false;
> >>>>>>>>
> >>>>>>>> Did you think more about the contiguous allocation issue here or can you
> >>>>>>>> give me a pointer to the requirement in the spec?
> >>>>>>>
> >>>>>>> 5.4.1 Re-Distributor Addressing
> >>>>>>>
> >>>>>>
> >>>>>> Section 5.4.1 talks about the pages within a single re-distributor having
> >>>>>> to be contiguous, not all the re-distributor regions having to be
> >>>>>> contiguous, right?
> >>>>>
> >>>>> Ah yes, you are right. But I still think it does not matter:
> >>>>> 1) We are "implementing" the GICv3. So as the spec does not forbid this,
> >>>>> we just state that the redistributor register maps for each VCPU are
> >>>>> contiguous. Also we create the FDT accordingly. I will add a comment in
> >>>>> the documentation to state this.
> >>>>>
> >>>>> 2) The kernel's GICv3 DT bindings assume this allocation is the default.
> >>>>> Although Marc added bindings to work around this (stride), it seems much
> >>>>> more logical to me to not use it.
> >>>>
> >>>> I don't disagree (and never have) with the fact that it is up to us to
> >>>> decide.
> >>>>
> >>>> My original question, which we haven't talked about yet, is if it is
> >>>> *reasonable* to assume that all re-distributor regions will always be
> >>>> contiguous?
> >>>>
> >>>> How will you handle VCPU hotplug for example?
> >>>
> >>> As kvmtool does not support hotplug, I haven't thought about this yet.
> >>> To me it looks like userland should just use maxcpus for the allocation.
> >>> If I get the current QEMU code right, there is room for 127 GICv3 VCPUs
> >>> (2*64K per VCPU + 64K for the distributor in 16M space) at the moment.
> >>> Kvmtool uses a different mapping, which allows sharing 1G with virtio,
> >>> so the limit is around 8000ish VCPUs here.
> >>> Are there any issues with changing the QEMU virt mapping later?
> >>> Migration, maybe?
> >>> If the UART, the RTC and the virtio regions are moved more towards the
> >>> beginning of the 256MB PCI mapping, then there should be space for a bit
> >>> less than 1024 VCPUs, if I get this right.
> >>>
> >>>> Where in the guest
> >>>> physical memory map of our various virt machines should these regions
> >>>> sit so that we can allocate enough re-distributors for VCPUs etc.?
> >>>
> >>> Various? Are there other mappings than those described in hw/arm/virt.c?
> >>>
> >>>> I just want to make sure we're not limiting ourselves by some amount of
> >>>> functionality or ABI (redistributor base addresses) that will be hard to
> >>>> expand in the future.
> >>>
> >>> If we are flexible with the mapping at VM creation time, QEMU could just
> >>> use a mapping depending on max_cpus:
> >>> < 128 VCPUs: use the current mapping
> >>> 128 <= x < 1020: use a more compressed mapping
> >>> >= 1020: map the redistributor somewhere above 4 GB
> >>>
> >>> As the device tree binding for GICv3 just supports a stride value, we
> >>> don't have any other real options beside this, right? So how I see this,
> >>> a contiguous mapping (with possible holes) is the only way.
> >>
> >> Not really. The GICv3 binding definitely supports having several regions
> >> for the redistributors (see the binding documentation). This allows for
> >> the pathological case where you have N regions for N CPUs. Not that we
> >> ever want to go there, really.
> >>
> > What are your thoughts on mapping all of the redistributor regions in
> > one consecutive guest phys address space chunk?  Am I making an issue
> > out of nothing?
> 
> I don't think this is too bad. It puts constraints on the physical
> memory map, but we do have a massive IPA space anyway (at least on
> arm64). Of course, the issue is slightly more acute on 32bit guests,
> where IPA space is at a premium. But this is fairly accurately modelling
> a monolithic GICv3 (as opposed to distributed).
> 
> I imagine that, over time, we'll have to introduce support for "split"
> redistributor ranges, but that probably only becomes an issue when you
> want to support guests with several hundred vcpus.
> 
> Another interesting point you raise is vcpu hotplug. I'm not completely
> sure how that would work. Do we pre-allocate redistributors, do we have
> a more coarse grained "socket hot-plug"? I think that we need to give it
> some thoughts, as this probably require a slightly different model for
> GICv3.
> 
Hotplug is indeed probably a larger can of worms. Let's move forward
with these patches for now and patch up things later then.
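
The contiguous-layout check quoted at the top of this thread can be sketched in isolation. This is a simplified userspace model, not the patch's code: it assumes a 128 KB redistributor frame (two 64 KB pages) per VCPU, all frames packed back to back from one base, and uses a hypothetical `redist_vcpu_index()` helper. `is_in_range()` here is only a stand-in with the same argument order as in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: two 64 KB pages (RD_base + SGI_base) per redistributor. */
#define SZ_64K             0x10000ULL
#define GIC_V3_REDIST_SIZE (2 * SZ_64K)

/* True if [addr, addr + len) lies entirely inside [base, base + size). */
static bool is_in_range(uint64_t addr, uint64_t len,
                        uint64_t base, uint64_t size)
{
    return addr >= base && len <= size && addr - base <= size - len;
}

/*
 * Hypothetical helper: with all frames contiguous, the VCPU owning an
 * MMIO access is simply the frame index. Returns -1 if out of range.
 */
static int redist_vcpu_index(uint64_t addr, uint64_t len,
                             uint64_t rdbase, unsigned int nrcpus)
{
    if (!is_in_range(addr, len, rdbase, GIC_V3_REDIST_SIZE * nrcpus))
        return -1;
    return (int)((addr - rdbase) / GIC_V3_REDIST_SIZE);
}
```

The single division is what makes the contiguous layout attractive; split ranges would need a per-region lookup before it.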

-Christoffer


* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-12-03 10:28               ` Christoffer Dall
@ 2014-12-03 11:10                 ` Andre Przywara
  2014-12-03 11:28                   ` Arnd Bergmann
  0 siblings, 1 reply; 80+ messages in thread
From: Andre Przywara @ 2014-12-03 11:10 UTC (permalink / raw)
  To: linux-arm-kernel

Hej Christoffer,

On 03/12/14 10:28, Christoffer Dall wrote:
> On Tue, Dec 02, 2014 at 04:24:53PM +0000, Andre Przywara wrote:
>> Hej Christoffer,
>>
>> On 30/11/14 08:30, Christoffer Dall wrote:
>>> On Fri, Nov 28, 2014 at 03:24:11PM +0000, Andre Przywara wrote:
>>>> Hej Christoffer,
>>>>
>>>> On 25/11/14 10:41, Christoffer Dall wrote:
>>>>> Hi Andre,
>>>>>
>>>>> On Mon, Nov 24, 2014 at 04:00:46PM +0000, Andre Przywara wrote:
>>>>>
>>>>
>>
>> [...]
>>
>>>>>>>> +
>>>>>>>> +     if (!is_in_range(mmio->phys_addr, mmio->len, rdbase,
>>>>>>>> +         GIC_V3_REDIST_SIZE * nrcpus))
>>>>>>>> +             return false;
>>>>>>>
>>>>>>> Did you think more about the contiguous allocation issue here or can you
>>>>>>> give me a pointer to the requirement in the spec?
>>>>>>
>>>>>> 5.4.1 Re-Distributor Addressing
>>>>>>
>>>>>
>>>>> Section 5.4.1 talks about the pages within a single re-distributor having
>>>>> to be contiguous, not all the re-distributor regions having to be
>>>>> contiguous, right?
>>>>
>>>> Ah yes, you are right. But I still think it does not matter:
>>>> 1) We are "implementing" the GICv3. So as the spec does not forbid this,
>>>> we just state that the redistributor register maps for each VCPU are
>>>> contiguous. Also we create the FDT accordingly. I will add a comment in
>>>> the documentation to state this.
>>>>
>>>> 2) The kernel's GICv3 DT bindings assume this allocation is the default.
>>>> Although Marc added bindings to work around this (stride), it seems much
>>>> more logical to me to not use it.
>>>
>>> I don't disagree (and never have) with the fact that it is up to us to
>>> decide.
>>>
>>> My original question, which we haven't talked about yet, is if it is
>>> *reasonable* to assume that all re-distributor regions will always be
>>> contiguous?
>>>
>>> How will you handle VCPU hotplug for example?
>>
>> As kvmtool does not support hotplug, I haven't thought about this yet.
>> To me it looks like userland should just use maxcpus for the allocation.
>> If I get the current QEMU code right, there is room for 127 GICv3 VCPUs
>> (2*64K per VCPU + 64K for the distributor in 16M space) at the moment.
>> Kvmtool uses a different mapping, which allows sharing 1G with virtio,
>> so the limit is around 8000ish VCPUs here.
>> Are there any issues with changing the QEMU virt mapping later?
> 
> Not issues as such, but we try to keep it as stable as possible.  At
> least soon, you have to worry about UEFI working with such changes, for
> example.
> 
>> Migration, maybe?
>> If the UART, the RTC and the virtio regions are moved more towards the
>> beginning of the 256MB PCI mapping, then there should be space for a bit
>> less than 1024 VCPUs, if I get this right.
>>
>>> Where in the guest
>>> physical memory map of our various virt machines should these regions
>>> sit so that we can allocate enough re-distributors for VCPUs etc.?
>>
>> Various? Are there other mappings than those described in hw/arm/virt.c?
>>
> 
> QEMU's and kvmtool's, for starters.
> 
>>> I just want to make sure we're not limiting ourselves by some amount of
>>> functionality or ABI (redistributor base addresses) that will be hard to
>>> expand in the future.
>>
>> If we are flexible with the mapping at VM creation time, QEMU could just
>> use a mapping depending on max_cpus:
>> < 128 VCPUs: use the current mapping
>> 128 <= x < 1020: use a more compressed mapping
>> >= 1020: map the redistributor somewhere above 4 GB
> 
> I don't understand what the compressed mapping is. 

Currently we have in QEMU:
   0 MB -  128 MB: Flash
 128 MB -  144 MB: CPU peripherals (GIC for now only)
 144 MB -  144 MB: UART (64k)
 144 MB -  144 MB: RTC (64k)
 160 MB -  256 MB: virtio (32 * 512 Bytes used)
 256 MB - 1024 MB: PCI
1024 MB - ???? MB: DRAM

So we waste quite some space between 144 MB and 256 MB, where we
currently use only 3 64K pages. If we would move those three pages
closer to the 256 MB region, we could extend the GIC mapping space from
16 MB (127 VCPUs) to almost 128 MB (good for 1022 VCPUs).
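
The VCPU counts above follow from simple division. A minimal sketch, assuming (as stated earlier in the thread) 2 x 64 KB of redistributor space per VCPU plus one 64 KB distributor page in the same window; `max_gicv3_vcpus()` and its `reserved` parameter (bytes left to other devices) are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_64K      0x10000ULL
#define SZ_1M       0x100000ULL
#define DIST_SIZE   SZ_64K          /* one distributor page */
#define REDIST_SIZE (2 * SZ_64K)    /* per VCPU */

/* How many VCPUs fit in a GIC window of 'window' bytes, after
 * 'reserved' bytes are left to other devices in the same window. */
static uint64_t max_gicv3_vcpus(uint64_t window, uint64_t reserved)
{
    if (window < reserved + DIST_SIZE)
        return 0;
    return (window - reserved - DIST_SIZE) / REDIST_SIZE;
}
```

With a 16 MB window this gives 127; with 128 MB minus the three 64 KB device pages it gives 1022, matching the figures above.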

> Is the current code
> easy to expand such that you can add a secondary separate redistributor
> region above the 4GB map?

Yes, reasonably easy. I have sketched this: we would just need two more
attribute registers (KVM_VGIC_V3_ADDR_TYPE_REDIST_UPPER and
KVM_VGIC_V3_NR_LOWER_VCPUS or so) in the KVM device. But I would refrain
from doing this for now, as the rest of the GICv3 code is limited to 255
VCPUs anyway (x86 has the same limit).
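
To picture the two-region idea, here is a lookup that places the first N VCPUs in the low window and the rest above 4 GB. This is a sketch only: the attribute names above were floated as possibilities, and the struct, field and function names below are hypothetical, not an existing KVM interface.

```c
#include <assert.h>
#include <stdint.h>

#define REDIST_SIZE 0x20000ULL   /* assumed: 2 x 64 KB per VCPU */

/* Hypothetical per-VM layout mirroring the two proposed attributes. */
struct redist_layout {
    uint64_t     lower_base;      /* existing redistributor base */
    uint64_t     upper_base;      /* e.g. from ..._REDIST_UPPER */
    unsigned int nr_lower_vcpus;  /* e.g. from ..._NR_LOWER_VCPUS */
};

/* Redistributor frame base for a given VCPU under the split layout. */
static uint64_t redist_base_for_vcpu(const struct redist_layout *l,
                                     unsigned int vcpu_id)
{
    if (vcpu_id < l->nr_lower_vcpus)
        return l->lower_base + (uint64_t)vcpu_id * REDIST_SIZE;
    return l->upper_base +
           (uint64_t)(vcpu_id - l->nr_lower_vcpus) * REDIST_SIZE;
}
```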

>> As the device tree binding for GICv3 just supports a stride value, we
>> don't have any other real options beside this, right? So how I see this,
>> a contiguous mapping (with possible holes) is the only way.
>>
>>>>>>>> +
>>>>>>>> +static int vgic_v3_init(struct kvm *kvm, const struct vgic_params *params)
>>>>>>>> +{
>>>>>>>> +     struct vgic_dist *dist = &kvm->arch.vgic;
>>>>>>>> +     int ret, i;
>>>>>>>> +     u32 mpidr;
>>>>>>>> +
>>>>>>>> +     if (IS_VGIC_ADDR_UNDEF(dist->vgic_dist_base) ||
>>>>>>>> +         IS_VGIC_ADDR_UNDEF(dist->vgic_redist_base)) {
>>>>>>>> +             kvm_err("Need to set vgic distributor addresses first\n");
>>>>>>>> +             return -ENXIO;
>>>>>>>> +     }
>>>>>>>> +
>>>>>>>> +     /*
>>>>>>>> +      * FIXME: this should be moved to init_maps time, and may bite
>>>>>>>> +      * us when adding save/restore. Add a per-emulation hook?
>>>>>>>> +      */
>>>>>>>
>>>>>>> progress on this fixme?
>>>>>>
>>>>>> Progress supplies the ISS, but not this piece of code (read: none) ;-)
>>>>>> I am more in favour of a follow-up patch on this one ...
>>>>>
>>>>> hmmm, I'm not a fan of merging code with this kind of a comment in it,
>>>>> because it looks scary, and I don't really understand the problem from
>>>>> just reading the comment, so something needs to be done here.
>>>>
>>>> I see. What about we are moving this unconditionally into vgic_init_maps
>>>> and allocate it for both v2 and v3 guests and get rid of the whole
>>>> function? It allocates only memory for the irq_spi_mpidr, which is 4
>>>> bytes per configured SPI (so at most less than 4 KB, but usually just
>>>> 128 Bytes per guest). This would be a pretty quick solution. Does that
>>>> sound too hackish?
>>>>
>>>> After your comments about the per-VM ops function pointers I am a bit
>>>> reluctant to introduce another one (which would be the obvious way
>>>> following the comment) for just this simple kalloc().
>>>> On the other hand the ITS emulation may later make better use of a GICv3
>>>> specific allocation function.
>>>
>>> What I really disliked was the configuration of a function pointer,
>>> which, when invoked, configured other function pointers.  That just made
>>> my head spin.  So adding another per-gic-model init_maps method is not
>>> that bad, but on the other hand, the only problem with keeping this here
>>> is that when we restore the vgic state, then user space wants to be able
>>> to populate all the data before running any VCPUs, and we don't create
>>> the data structures before the first VCPU is run.
>>>
>>> However, Eric has a problem with this "init-when-we-run-the-first-VCPU"
>>> approach as well, so one argument is that we need to add a method to
>>> both the gicv2 and gicv3 device API to say "VGIC_INIT" which userspace
>>> can call after having created all the VCPUs.  And, in fact, we may want
>>> to enforce this for the gicv3 right now and only maintain the existing
>>> behavior for gicv2.
>>>
>>> (Eric's use case is configuring IRQFD, which must logically be done
>>> before running the machine, but also needs to be done after the vgic is
>>> fully ready.).
>>>
>>> Does this make sense?
>>
>> So if we would avoid that spooky "detect-if-a-VCPU-has-run" code and
>> rely on an explicit ioctl, I am in favor for this. We would need to keep
>> the current approach for compatibility, though, right?
> 
> not for gicv3, only for gicv2, which is exactly why I'm bringing it up
> here.

Ah, I see.

>>
>> So what about we either keep the current GICv3 allocation as it stands
>> in my patches right now (or move the GICv3 specific part into the
>> general vgic_init_maps()) and adapt that to the VGIC_INIT call once that
>> has appeared (or even handle this in that series then).
>>
>> Does that make sense? What is the time frame for that VGIC_INIT call?
>>
> 
> Eric posted some patches in private yesterday, I think he's planning on
> sending them out any day now.

I just found them on the list. Let me take a look...
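
The ordering rule discussed above (lazy initialisation on first VCPU run kept for GICv2 compatibility, an explicit init step required for GICv3) can be modelled as a small state check. All names here are hypothetical; this is not the VGIC_INIT interface from Eric's series, whose details are not part of this thread. The allocation size uses the figure of 4 bytes per configured SPI given earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum vgic_model { MODEL_GICV2, MODEL_GICV3 };

struct vgic_state {
    enum vgic_model model;
    bool            initialized;
    unsigned int    nr_spis;
};

/* Bytes for a per-SPI u32 table such as irq_spi_mpidr. */
static size_t spi_mpidr_bytes(const struct vgic_state *v)
{
    return (size_t)v->nr_spis * sizeof(uint32_t);
}

/* Explicit init requested by userspace after all VCPUs exist. */
static void vgic_explicit_init(struct vgic_state *v)
{
    v->initialized = true;
}

/* Called on first VCPU run: false means the VM may not run yet. */
static bool vgic_ready_to_run(struct vgic_state *v)
{
    if (v->initialized)
        return true;
    if (v->model == MODEL_GICV2) {   /* legacy: initialise lazily */
        v->initialized = true;
        return true;
    }
    return false;                    /* GICv3: userspace must init first */
}
```

The maximum of 988 SPIs (INTIDs 32 to 1019) needs 3952 bytes, just under 4 KB; 32 SPIs need 128 bytes.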

Cheers,
Andre.


* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-12-03 11:10                 ` Andre Przywara
@ 2014-12-03 11:28                   ` Arnd Bergmann
  2014-12-03 11:39                     ` Peter Maydell
  0 siblings, 1 reply; 80+ messages in thread
From: Arnd Bergmann @ 2014-12-03 11:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Wednesday 03 December 2014 11:10:18 Andre Przywara wrote:
> On 03/12/14 10:28, Christoffer Dall wrote:
> > On Tue, Dec 02, 2014 at 04:24:53PM +0000, Andre Przywara wrote:
> >> On 30/11/14 08:30, Christoffer Dall wrote:
> >>
> >> If we are flexible with the mapping at VM creation time, QEMU could just
> >> use a mapping depending on max_cpus:
> >> < 128 VCPUs: use the current mapping
> >> 128 <= x < 1020: use a more compressed mapping
> >> >= 1020: map the redistributor somewhere above 4 GB
> > 
> > I don't understand what the compressed mapping is. 
> 
> Currently we have in QEMU:
>    0 MB -  128 MB: Flash
>  128 MB -  144 MB: CPU peripherals (GIC for now only)
>  144 MB -  144 MB: UART (64k)
>  144 MB -  144 MB: RTC (64k)
>  160 MB -  256 MB: virtio (32 * 512 Bytes used)
>  256 MB - 1024 MB: PCI
> 1024 MB - ???? MB: DRAM
> 
> So we waste quite some space between 144 MB and 256 MB, where we
> currently use only 3 64K pages. If we would move those three pages
> closer to the 256 MB region, we could extend the GIC mapping space from
> 16 MB (127 VCPUs) to almost 128 MB (good for 1022 VCPUs).

Having a more compressed mapping sounds useful, but you could also consider
carving the GIC registers out of the PCI range and make PCI memory space
smaller if there are lots of CPUs.

Is there also a 64-bit PCI memory space behind the DRAM?

	Arnd


* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-12-03 11:28                   ` Arnd Bergmann
@ 2014-12-03 11:39                     ` Peter Maydell
  2014-12-03 12:03                       ` Andre Przywara
  0 siblings, 1 reply; 80+ messages in thread
From: Peter Maydell @ 2014-12-03 11:39 UTC (permalink / raw)
  To: linux-arm-kernel

On 3 December 2014 at 11:28, Arnd Bergmann <arnd@arndb.de> wrote:
> On Wednesday 03 December 2014 11:10:18 Andre Przywara wrote:
>> Currently we have in QEMU:
>>    0 MB -  128 MB: Flash
>>  128 MB -  144 MB: CPU peripherals (GIC for now only)
>>  144 MB -  144 MB: UART (64k)
>>  144 MB -  144 MB: RTC (64k)
>>  160 MB -  256 MB: virtio (32 * 512 Bytes used)
>>  256 MB - 1024 MB: PCI
>> 1024 MB - ???? MB: DRAM
>>
>> So we waste quite some space between 144 MB and 256 MB, where we
>> currently use only 3 64K pages. If we would move those three pages
>> closer to the 256 MB region, we could extend the GIC mapping space from
>> 16 MB (127 VCPUs) to almost 128 MB (good for 1022 VCPUs).

Note that part of the reason for that gap is to give us space
to add more memory-mapped peripherals as they are needed.
For instance the fw_cfg conduit for talking to UEFI firmware
is going to need another 64K page in that section.

> Having a more compressed mapping sounds useful, but you could also consider
> carving the GIC registers out of the PCI range and make PCI memory space
> smaller if there are lots of CPUs.
>
> Is there also a 64-bit PCI memory space behind the DRAM?

The "PCI" section at the moment is purely hypothetical -- it's
a lump of reserved space in the memory map that I put in so
we had a place to put PCI later... The DRAM is just the last
thing in the memory map and goes up for as much DRAM as you
asked QEMU to provide, so in that sense there's no "behind"
there.

We could (at least at the moment) shuffle things around a bit,
since guests should all be looking at the DTB to figure where
things are. About the only thing we can't do is move the base
address of RAM or put the flash somewhere other than 0. If
we do need to make changes I'd rather we figured out the new
layout and switched to it rather than doing a set of piecemeal
tweaks, though.

-- PMM


* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-12-03 11:39                     ` Peter Maydell
@ 2014-12-03 12:03                       ` Andre Przywara
  2014-12-03 13:14                         ` Arnd Bergmann
  2014-12-04  9:34                         ` Christoffer Dall
  0 siblings, 2 replies; 80+ messages in thread
From: Andre Przywara @ 2014-12-03 12:03 UTC (permalink / raw)
  To: linux-arm-kernel

On 03/12/14 11:39, Peter Maydell wrote:
> On 3 December 2014 at 11:28, Arnd Bergmann <arnd@arndb.de> wrote:
>> On Wednesday 03 December 2014 11:10:18 Andre Przywara wrote:
>>> Currently we have in QEMU:
>>>    0 MB -  128 MB: Flash
>>>  128 MB -  144 MB: CPU peripherals (GIC for now only)
>>>  144 MB -  144 MB: UART (64k)
>>>  144 MB -  144 MB: RTC (64k)
>>>  160 MB -  256 MB: virtio (32 * 512 Bytes used)
>>>  256 MB - 1024 MB: PCI
>>> 1024 MB - ???? MB: DRAM
>>>
>>> So we waste quite some space between 144 MB and 256 MB, where we
>>> currently use only 3 64K pages. If we would move those three pages
>>> closer to the 256 MB region, we could extend the GIC mapping space from
>>> 16 MB (127 VCPUs) to almost 128 MB (good for 1022 VCPUs).
> 
> Note that part of the reason for that gap is to give us space
> to add more memory-mapped peripherals as they are needed.
> For instance the fw_cfg conduit for talking to UEFI firmware
> is going to need another 64K page in that section.

Sure, that was just to show the situation in general. We could do
something like kvmtool and grow the CPU peripheral space and the device
MMIO pages from different directions to not waste space needlessly.
So UART, RTC, virtio and future devices use space from 128MB upwards,
whereas the GIC uses as much space below 256 MB as needed.

Btw.: Is there an issue with not aligning each virtio device to a 64K
page? Will we never need to separate access to virtio devices in a guest
with MMU help?
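
The kvmtool-style idea of growing two allocations towards each other can be sketched as below. It is illustrative only: the 128 MB to 256 MB window comes from the QEMU map above, but the allocator itself (`struct mmio_window`, `alloc_device_page()`, `alloc_gic_top()`) is hypothetical and not how QEMU or kvmtool actually assign addresses.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_64K 0x10000ULL
#define SZ_1M  0x100000ULL

/* Devices grow up from 'low', the GIC grows down from 'high'.
 * Invariant: low <= high. 0 is returned on exhaustion, which is safe
 * here because address 0 is flash and never part of this window. */
struct mmio_window {
    uint64_t low;   /* next free device address */
    uint64_t high;  /* lowest address already used by the GIC */
};

/* Hand out one 64 KB device page from the bottom. */
static uint64_t alloc_device_page(struct mmio_window *w)
{
    if (w->high - w->low < SZ_64K)
        return 0;
    uint64_t addr = w->low;
    w->low += SZ_64K;
    return addr;
}

/* Carve 'size' bytes for the GIC from the top. */
static uint64_t alloc_gic_top(struct mmio_window *w, uint64_t size)
{
    if (w->high - w->low < size)
        return 0;
    w->high -= size;
    return w->high;
}
```

Neither side needs a fixed boundary; the window is only exhausted when the two ends meet.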

>> Having a more compressed mapping sounds useful, but you could also consider
>> carving the GIC registers out of the PCI range and make PCI memory space
>> smaller if there are lots of CPUs.
>>
>> Is there also a 64-bit PCI memory space behind the DRAM?
> 
> The "PCI" section at the moment is purely hypothetical -- it's
> a lump of reserved space in the memory map that I put in so
> we had a place to put PCI later... The DRAM is just the last
> thing in the memory map and goes up for as much DRAM as you
> asked QEMU to provide, so in that sense there's no "behind"
> there.
> 
> We could (at least at the moment) shuffle things around a bit,
> since guests should all be looking at the DTB to figure where
> things are. About the only thing we can't do is move the base
> address of RAM or put the flash somewhere other than 0. If
> we do need to make changes I'd rather we figured out the new
> layout and switched to it rather than doing a set of piecemeal
> tweaks, though.

Agreed. As QEMU does not support GICv3 anyway at the moment, there is no
need to rush things.

Is there any sign of a PCI host controller emulation for QEMU on the
horizon?

Cheers,
Andre.


* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-12-03 12:03                       ` Andre Przywara
@ 2014-12-03 13:14                         ` Arnd Bergmann
  2014-12-04  9:34                         ` Christoffer Dall
  1 sibling, 0 replies; 80+ messages in thread
From: Arnd Bergmann @ 2014-12-03 13:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Wednesday 03 December 2014 12:03:59 Andre Przywara wrote:
> On 03/12/14 11:39, Peter Maydell wrote:
> > On 3 December 2014 at 11:28, Arnd Bergmann <arnd@arndb.de> wrote:
> >> On Wednesday 03 December 2014 11:10:18 Andre Przywara wrote:
> >>> Currently we have in QEMU:
> >>>    0 MB -  128 MB: Flash
> >>>  128 MB -  144 MB: CPU peripherals (GIC for now only)
> >>>  144 MB -  144 MB: UART (64k)
> >>>  144 MB -  144 MB: RTC (64k)
> >>>  160 MB -  256 MB: virtio (32 * 512 Bytes used)
> >>>  256 MB - 1024 MB: PCI
> >>> 1024 MB - ???? MB: DRAM
> >>>
> >>> So we waste quite some space between 144 MB and 256 MB, where we
> >>> currently use only 3 64K pages. If we would move those three pages
> >>> closer to the 256 MB region, we could extend the GIC mapping space from
> >>> 16 MB (127 VCPUs) to almost 128 MB (good for 1022 VCPUs).
> > 
> > Note that part of the reason for that gap is to give us space
> > to add more memory-mapped peripherals as they are needed.
> > For instance the fw_cfg conduit for talking to UEFI firmware
> > is going to need another 64K page in that section.
>
> Sure, that was just to show the situation in general. We could do
> something like kvmtool and grow the CPU peripheral space and the device
> MMIO pages from different directions to not waste space needlessly.
> So UART, RTC, virtio and future devices use space from 128MB upwards,
> whereas the GIC uses as much space below 256 MB as needed.

I also think that would be good. I assume 128MB of flash is required for
running UEFI, or could that be optional or variable-sized too?

A very easy method for doing this would be to just allocate any devices with
a 64k spacing from the bottom (right after the end of flash) and use
whatever remains below 1GB for 32-bit PCI memory space, then have all
RAM after that, and finally do a 64-bit PCI memory space at the end
of RAM, probably at the next power-of-two aligned address.
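
The "next power-of-two aligned address" for a 64-bit PCI window can be computed as below. A sketch under assumptions: RAM starts at 1 GB as in the QEMU map above, `pci64_base()` is an illustrative name, and "power-of-two aligned" is read as rounding the end of RAM up to the next power of two (an end that already is a power of two is kept).

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1G 0x40000000ULL

/* Smallest power of two >= x; requires 0 < x <= 2^63. */
static uint64_t roundup_pow2(uint64_t x)
{
    uint64_t p = 1;
    while (p < x)
        p <<= 1;
    return p;
}

/* Base of a 64-bit PCI memory window placed after the end of RAM. */
static uint64_t pci64_base(uint64_t ram_base, uint64_t ram_size)
{
    return roundup_pow2(ram_base + ram_size);
}
```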

Within the PCI area, it could look something like this:

<variable start>
* 64KB I/O space
* 1MB aligned ECAM config space, 1MB per emulated bus, possibly
  some spare to allow PCI hotplug
* prefetchable memory space, multiples of 1MB, only needed when
  doing device passthrough
* non-prefetchable memory space, all the rest up to 1GB
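
The PCI layout in the list above can be turned into concrete offsets. This sketch assumes the 1 GB upper limit from the list, rounds the start of ECAM up to 1 MB, and takes the bus count (including any hotplug spare) and prefetchable size as inputs; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_64K 0x10000ULL
#define SZ_1M  0x100000ULL
#define SZ_1G  0x40000000ULL

#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

struct pci_layout {
    uint64_t io_base, ecam_base, pref_base, np_base, np_size;
};

/* Returns 0 on success, -1 if the windows do not fit below 1 GB. */
static int layout_pci(uint64_t start, unsigned int nr_buses,
                      uint64_t pref_size, struct pci_layout *out)
{
    out->io_base   = start;                              /* 64 KB I/O */
    out->ecam_base = ALIGN_UP(start + SZ_64K, SZ_1M);    /* 1 MB aligned */
    out->pref_base = out->ecam_base + (uint64_t)nr_buses * SZ_1M;
    pref_size      = ALIGN_UP(pref_size, SZ_1M);         /* 1 MB units */
    out->np_base   = out->pref_base + pref_size;
    if (out->np_base >= SZ_1G)
        return -1;
    out->np_size   = SZ_1G - out->np_base;               /* the rest */
    return 0;
}
```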

> Btw.: Is there an issue with not aligning each virtio device to a 64K
> page? Will we never need to separate access to virtio devices in a guest
> with MMU help?

As far as I'm concerned, none of the integrated devices need to be aligned
to 64K. The only requirement for 64K alignment is if you have a qemu
emulated machine that runs a hypervisor in it and does a passthrough of
its devices to an unprivileged guest running under the hypervisor.

I can't think of why anybody would ever do that for a uart or rtc, but
maybe my imagination is too limited.
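
The 64K requirement matters because a hypervisor can only hand a device to a guest at page granularity. A minimal check, assuming 64 KB pages and using a hypothetical `can_isolate()` helper: a device can be passed through on its own only if no other device shares any of its 64 KB pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_64K 0x10000ULL

struct mmio_dev { uint64_t base, size; };   /* size must be > 0 */

static uint64_t first_page(const struct mmio_dev *d)
{
    return d->base / PAGE_64K;
}

static uint64_t last_page(const struct mmio_dev *d)
{
    return (d->base + d->size - 1) / PAGE_64K;
}

/* True if no other device touches any 64 KB page that 'dev' occupies. */
static bool can_isolate(const struct mmio_dev *devs, int n, int dev)
{
    for (int i = 0; i < n; i++) {
        if (i == dev)
            continue;
        if (first_page(&devs[i]) <= last_page(&devs[dev]) &&
            first_page(&devs[dev]) <= last_page(&devs[i]))
            return false;   /* page ranges overlap */
    }
    return true;
}
```

With virtio transports packed 512 bytes apart, as in the QEMU map above, none of them can be isolated this way, while a device alone in its own 64 KB page can.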

Note that some of the older SoCs had all their peripherals within
a single 4kb page!
 

	Arnd


* [PATCH v4 17/19] arm64: KVM: add SGI generation register emulation
  2014-11-30  8:45           ` Christoffer Dall
@ 2014-12-03 17:50             ` Andre Przywara
  2014-12-03 20:22               ` Christoffer Dall
  0 siblings, 1 reply; 80+ messages in thread
From: Andre Przywara @ 2014-12-03 17:50 UTC (permalink / raw)
  To: linux-arm-kernel

On 30/11/14 08:45, Christoffer Dall wrote:
> On Fri, Nov 28, 2014 at 03:40:12PM +0000, Andre Przywara wrote:
>> Hej Christoffer,
>>
>> On 25/11/14 11:03, Christoffer Dall wrote:
>>> Hi Andre,
>>>
>>> On Mon, Nov 24, 2014 at 04:37:58PM +0000, Andre Przywara wrote:
>>>> Hi,
>>>>
>>>> On 23/11/14 15:08, Christoffer Dall wrote:
>>>>> On Fri, Nov 14, 2014 at 10:08:01AM +0000, Andre Przywara wrote:
>>>>>> While the generation of a (virtual) inter-processor interrupt (SGI)
>>>>>> on a GICv2 works by writing to a MMIO register, GICv3 uses the system
>>>>>> register ICC_SGI1R_EL1 to trigger them.
>>>>>> Trap that register on ARM64 hosts and handle it in a new handler
>>>>>> function in the GICv3 emulation code.
>>>>>
>>>>> Did you reorder something or does my previous comment still apply that
>>>>> you're not enabling trapping yet, you're just adding the handler - those
>>>>> are two different things.
>>>>
>>>> Yes, I can fix the wording.
>>>>
>>>>> You sort of left my question about access_gic_sgi() not checking if the
>>>>> gicv3 is present hanging from the last thread, but I think I'm
>>>>> understanding properly now, that as long as you're not setting the
>>>>> ICC_SRE_EL2.Enable = 1, then we'll never get here, right?
>>>>
>>>> Right, that is the idea. Just to make sure that I got this right from
>>>> the discussion the other day: We will not trap to EL2 as long as
>>>> ICC_SRE_EL2.Enable is 0 - which it should still be at this point, right?
>>>
>>> No, when ICC_SRE_EL2.Enable is 0, then Non-secure EL1 access to
>>> ICC_SRE_EL1 trap to EL2 (See Section 5.7.39 in the spec), which means
>>> that accesses to the ICC_SGIx registers will cause an undefined
>>> exception in the guest because we set ICC_SRE_EL1.SRE to 0 for the
>>> guest and the guest cannot change this.
>>>
>>> Now, when we set ICC_SRE_EL2.Enable to 1, then the guest can set
>>> ICC_SRE_EL1.SRE to 1 (and we also happen to reset it to 1), and we will
>>> indeed trap on guest access to the ICC_SGIx registers, because all
>>> virtual accesses to these registers trap.
>>>
>>> (Going back and checking where 'virtual accesses' is defined in the spec
>>> left me somewhere without any results, but I am guessing that because we
>>> set the ICH_HCR_EL2.En to 1, all accesses will be deemed virtual
>>> accesses, maybe the spec should be clarified on this matter?).
>>>
>>> Anyhow, to get back to my original question, getting here requires
>>> a situation where the guest copy of the ICC_SRE_EL1.SRE is 1, which we
>>> only allow when we have properly initialized the GICv3 data structures.
>>
>> So to summarize (and check) this: There is no real issue at this point?
>> And the code is totally fine after 19/19?
> 
> There is no issue at this point, no.
> 
>>
>> Would this kind of problem actually matter _inside_ a patch series? To
>> trigger an issue, we would need a bogus guest and bogus userland
>> (because at this point neither of them would see/inject a GICv3 FDT
>> node). I'd assume that running a kernel at this point is just for
>> debugging/bisecting? Where you wouldn't care about every corner case of
>> execution?
> 
> The argument about bogus guests / fdts should *never* be considered in
> the context of these discussions.  If we have code that looks like the
> guest can kill the host, or do a NULL pointer dereference, then we need
> to address it.
> 
> Your point about it being inside a patch series, sure, it's unlikely
> that people will run this, but I'm reviewing this patch right now, and
> honestly not considering how this changes in the subsequent patch.  For
> this sort of thing, if we were leaving a gaping hole open, that would at
> least require a clear note in the commit message on why we're doing it.

I see, makes sense.
So I thought about adding a line like this to the very beginning of
vgic_v3_dispatch_sgi(). This would cover all cases of spurious traps.
Does that sound useful as a security precaution (though unneeded as
described)?
Shall there be a warning before the return?

+	/* only valid for an initialized VGICv3 */
+	if (!vgic_initialized(kvm)  ||
+	    kvm->arch.vgic.vgic_model != KVM_DEV_TYPE_ARM_VGIC_V3)
+		return;
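
The guard above would sit in front of the decoding of the trapped ICC_SGI1R_EL1 value. A userspace sketch, assuming the field layout from the GICv3 architecture specification (TargetList [15:0], Aff1 [23:16], INTID [27:24], Aff2 [39:32], IRM [40], Aff3 [55:48]); the struct and function names are hypothetical, and the range-selector field of later spec revisions is ignored. As in the proposal, a write arriving at an uninitialised or non-GICv3 VM is simply dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sgi_req {
    uint8_t  intid;        /* SGI number, 0..15 */
    bool     broadcast;    /* IRM: all PEs except the sender */
    uint8_t  aff3, aff2, aff1;
    uint16_t target_list;  /* bit n set => target PE with Aff0 == n */
};

/* Returns false (leaving *req untouched) if the VM cannot take it. */
static bool decode_sgi1r(bool vgic_ready, bool is_gicv3, uint64_t reg,
                         struct sgi_req *req)
{
    /* only valid for an initialized VGICv3 */
    if (!vgic_ready || !is_gicv3)
        return false;

    req->target_list = reg & 0xffff;
    req->aff1        = (reg >> 16) & 0xff;
    req->intid       = (reg >> 24) & 0xf;
    req->aff2        = (reg >> 32) & 0xff;
    req->broadcast   = (reg >> 40) & 1;
    req->aff3        = (reg >> 48) & 0xff;
    return true;
}
```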

> Hopefully you understood and agreed with my deduction about the various
> SRE settings above though?

Yes, I got this. We are safe as long as ICC_SRE_EL1.SRE is 0, which is
true until patch 19/19 allows userland to request a GICv3 guest, which
will force it to 1.
I also tested this explicitly, starting with patch 17/19 (for the host
kernel) and going over the remaining two as well. Starting a guest with
GICv2 and accessing ICC_SRE_EL1 and ICC_SGI1R_EL1 from a custom module
inside the guest will always keep ICC_SRE_EL1.SRE to 0 (thanks to your
recent trap patch), accesses to ICC_SGI1R_EL1 provoke an #UNDEF
exception in the guest. The host was never bothered.
Creating a guest with a GICv3 was only successful after patch 19/19, and
ICC_SRE_EL1.SRE couldn't be cleared.

So I consider this topic done.
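
The reasoning above reduces to a small decision table. This is a simplified model of the behaviour described in this thread (ICC_SRE_EL1.SRE kept at 0 for a GICv2 guest, forced to 1 for a GICv3 guest once patch 19/19 allows one), not a full model of the architecture; the enum and function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

enum sgi1r_outcome {
    GUEST_UNDEF,   /* guest takes an undefined-instruction exception */
    TRAP_TO_EL2    /* the host's SGI handler runs */
};

/* Guest-visible ICC_SRE_EL1.SRE as KVM configures it in this series. */
static bool guest_sre(bool gicv3_guest)
{
    return gicv3_guest;   /* 0 for GICv2 guests, forced to 1 for GICv3 */
}

/* What a guest access to ICC_SGI1R_EL1 leads to. */
static enum sgi1r_outcome sgi1r_access(bool gicv3_guest)
{
    return guest_sre(gicv3_guest) ? TRAP_TO_EL2 : GUEST_UNDEF;
}
```

The host handler is therefore only reachable from a VM created with a GICv3, which is the case the tests above exercised.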

Cheers,
Andre.


* [PATCH v4 17/19] arm64: KVM: add SGI generation register emulation
  2014-12-03 17:50             ` Andre Przywara
@ 2014-12-03 20:22               ` Christoffer Dall
  0 siblings, 0 replies; 80+ messages in thread
From: Christoffer Dall @ 2014-12-03 20:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Dec 03, 2014 at 05:50:51PM +0000, Andre Przywara wrote:
> On 30/11/14 08:45, Christoffer Dall wrote:
> > On Fri, Nov 28, 2014 at 03:40:12PM +0000, Andre Przywara wrote:
> >> Hej Christoffer,
> >>
> >> On 25/11/14 11:03, Christoffer Dall wrote:
> >>> Hi Andre,
> >>>
> >>> On Mon, Nov 24, 2014 at 04:37:58PM +0000, Andre Przywara wrote:
> >>>> Hi,
> >>>>
> >>>> On 23/11/14 15:08, Christoffer Dall wrote:
> >>>>> On Fri, Nov 14, 2014 at 10:08:01AM +0000, Andre Przywara wrote:
> >>>>>> While the generation of a (virtual) inter-processor interrupt (SGI)
> >>>>>> on a GICv2 works by writing to a MMIO register, GICv3 uses the system
> >>>>>> register ICC_SGI1R_EL1 to trigger them.
> >>>>>> Trap that register on ARM64 hosts and handle it in a new handler
> >>>>>> function in the GICv3 emulation code.
> >>>>>
> >>>>> Did you reorder something or does my previous comment still apply that
> >>>>> you're not enabling trapping yet, you're just adding the handler - those
> >>>>> are two different things.
> >>>>
> >>>> Yes, I can fix the wording.
> >>>>
> >>>>> You sort of left my question about access_gic_sgi() not checking if the
> >>>>> gicv3 is present hanging from the last thread, but I think I'm
> >>>>> understanding properly now, that as long as you're not setting the
> >>>>> ICC_SRE_EL2.Enable = 1, then we'll never get here, right?
> >>>>
> >>>> Right, that is the idea. Just to make sure that I got this right from
> >>>> the discussion the other day: We will not trap to EL2 as long as
> >>>> ICC_SRE_EL2.Enable is 0 - which it should still be at this point, right?
> >>>
> >>> No, when ICC_SRE_EL2.Enable is 0, then Non-secure EL1 access to
> >>> ICC_SRE_EL1 trap to EL2 (See Section 5.7.39 in the spec), which means
> >>> that accesses to the ICC_SGIx registers will cause an undefined
> >>> exception in the guest because we set ICC_SRE_EL1.SRE to 0 for the
> >>> guest and the guest cannot change this.
> >>>
> >>> Now, when we set ICC_SRE_EL2.Enable to 1, then the guest can set
> >>> ICC_SRE_EL1.SRE to 1 (and we also happen to reset it to 1), and we will
> >>> indeed trap on guest access to the ICC_SGIx registers, because all
> >>> virtual accesses to these registers trap.
> >>>
> >>> (Going back and checking where 'virtual accesses' is defined in the spec
> >>> left me somewhere without any results, but I am guessing that because we
> >>> set the ICH_HCR_EL2.En to 1, all accesses will be deemed virtual
> >>> accesses, maybe the spec should be clarified on this matter?).
> >>>
> >>> Anyhow, to get back to my original question, getting here requires
> >>> a situation where the guest copy of the ICC_SRE_EL1.SRE is 1, which we
> >>> only allow when we have properly initialized the GICv3 data structures.
> >>
> >> So to summarize (and check) this: There is no real issue at this point?
> >> And the code is totally fine after 19/19?
> > 
> > There is no issue at this point, no.
> > 
> >>
> >> Would this kind of problem actually matter _inside_ a patch series? To
> >> trigger an issue, we would need a bogus guest and bogus userland
> >> (because at this point neither of them would see/inject a GICv3 FDT
> >> node). I'd assume that running a kernel at this point is just for
> >> debugging/bisecting? Where you wouldn't care about every corner case of
> >> execution?
> > 
> > The argument about bogus guests / fdts should *never* be considered in
> > the context of these discussions.  If we have code that looks like the
> > guest can kill the host, or do a NULL pointer dereference, then we need
> > to address it.
> > 
> > Your point about it being inside a patch series, sure, it's unlikely
> > that people will run this, but I'm reviewing this patch right now, and
> > honestly not considering how this changes in the subsequent patch.  For
> > this sort of thing, if we were leaving a gaping hole open, that would at
> > least require a clear note in the commit message on why we're doing it.
> 
> I see, makes sense.
> So I thought about adding a line like this to the very beginning of
> vgic_v3_dispatch_sgi(). This would cover all cases of spurious traps.
> Does that sound useful as a security precaution (though unneeded as
> described)?
> Shall there be a warning before the return?
> 
> +	/* only valid for an initialized VGICv3 */
> +	if (!vgic_initialized(kvm)  ||
> +	    kvm->arch.vgic.vgic_model != KVM_DEV_TYPE_ARM_VGIC_V3)
> +		return;
> 

If anything you should have a BUG_ON(), but especially when you've
tested this, it won't happen.
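The two options here differ in how a "can't happen" trap is reported. The proposed early return silently swallows it. A BUG_ON() turns it into a loud failure. Below is a userspace sketch of both variants. It assumes hypothetical stand-ins (struct vgic_dist_model, dispatch_sgi_guarded(), dispatch_sgi_bug_on()) in place of the real struct kvm, vgic_initialized() and vgic_v3_dispatch_sgi(), and uses assert() in place of the kernel's BUG_ON():

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's BUG_ON(); note that assert()
 * compiles to nothing when NDEBUG is defined. */
#define BUG_ON(cond) assert(!(cond))

enum vgic_model { VGIC_MODEL_V2, VGIC_MODEL_V3 };

/* Hypothetical stand-in for the relevant bits of struct kvm. */
struct vgic_dist_model {
	bool initialized;	/* what vgic_initialized(kvm) would report */
	enum vgic_model model;	/* kvm->arch.vgic.vgic_model */
};

static int sgis_dispatched;	/* counts SGIs that reached the emulation */

/* Variant 1: silently ignore a trap that should be impossible. */
static void dispatch_sgi_guarded(const struct vgic_dist_model *d)
{
	/* only valid for an initialized VGICv3 */
	if (!d->initialized || d->model != VGIC_MODEL_V3)
		return;
	sgis_dispatched++;
}

/* Variant 2: treat the same condition as a host bug. */
static void dispatch_sgi_bug_on(const struct vgic_dist_model *d)
{
	BUG_ON(!d->initialized || d->model != VGIC_MODEL_V3);
	sgis_dispatched++;
}
```

In the kernel, a BUG_ON() takes down the host. It therefore only fits invariants that genuinely cannot be violated, such as the one established by the SRE argument in this thread.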

> > Hopefully you understood and agreed with my deduction about the various
> > SRE settings above though?
> 
> Yes, I got this. We are safe as long as ICC_SRE_EL1.SRE is 0, which is
> true until patch 19/19 allows userland to request a GICv3 guest, which
> will force it to 1.
> I also tested this explicitly, starting with patch 17/19 (for the host
> kernel) and going over the remaining two as well. Starting a guest with
> GICv2 and accessing ICC_SRE_EL1 and ICC_SGI1R_EL1 from a custom module
> inside the guest will always keep ICC_SRE_EL1.SRE at 0 (thanks to your
> recent trap patch), accesses to ICC_SGI1R_EL1 provoke an #UNDEF
> exception in the guest. The host was never bothered.
> Creating a guest with a GICv3 was only successful after patch 19/19, and
> ICC_SRE_EL1.SRE couldn't be cleared.
> 
> So I consider this topic done.
> 
Indeed!
-Christoffer
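The outcome of the testing above can be summarised as a small state model:

- KVM picks the guest-visible ICC_SRE_EL1.SRE per VM: 0 for a GICv2 guest, 1 for a GICv3 guest.
- Guest writes cannot change it.
- A guest write to ICC_SGI1R_EL1 either raises an undefined-instruction exception in the guest (SRE == 0) or traps to EL2, where KVM emulates it (SRE == 1).

This sketch is a simplification of the observations in this thread. It does not capture the full architectural rules around ICC_SRE_EL2.Enable, and all names are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical names throughout; simplified model of the thread's
 * observations, not of the GICv3 architecture specification. */
enum guest_gic_model { GUEST_GICV2, GUEST_GICV3 };

enum sgi_access_outcome {
	SGI_UNDEF_IN_GUEST,	/* guest takes an undefined-instruction exception */
	SGI_TRAP_TO_EL2,	/* access traps and KVM emulates the SGI */
};

struct vcpu_sre_model {
	bool sre;	/* guest-visible ICC_SRE_EL1.SRE, chosen per VM by KVM */
};

/* On VCPU reset, SRE reflects the GIC model the VM was created with. */
static struct vcpu_sre_model vcpu_reset_sre(enum guest_gic_model model)
{
	struct vcpu_sre_model v = { .sre = (model == GUEST_GICV3) };
	return v;
}

/* Guest writes to ICC_SRE_EL1.SRE are ignored in both cases: a GICv2
 * guest keeps 0 and a GICv3 guest cannot clear the bit. */
static void guest_write_sre(struct vcpu_sre_model *v, bool val)
{
	(void)v;
	(void)val;
}

/* A guest write to ICC_SGI1R_EL1 only reaches the host when SRE is 1. */
static enum sgi_access_outcome guest_write_sgi1r(const struct vcpu_sre_model *v)
{
	return v->sre ? SGI_TRAP_TO_EL2 : SGI_UNDEF_IN_GUEST;
}
```

The model shows that the host-side SGI emulation is only reachable when SRE is 1. That state is only allowed once the GICv3 data structures have been properly initialized.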


* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-12-03 12:03                       ` Andre Przywara
  2014-12-03 13:14                         ` Arnd Bergmann
@ 2014-12-04  9:34                         ` Christoffer Dall
  2014-12-04 10:02                           ` Eric Auger
  1 sibling, 1 reply; 80+ messages in thread
From: Christoffer Dall @ 2014-12-04  9:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Dec 03, 2014 at 12:03:59PM +0000, Andre Przywara wrote:
> On 03/12/14 11:39, Peter Maydell wrote:
> > On 3 December 2014 at 11:28, Arnd Bergmann <arnd@arndb.de> wrote:
> >> On Wednesday 03 December 2014 11:10:18 Andre Przywara wrote:
> >>> Currently we have in QEMU:
> >>>    0 MB -  128 MB: Flash
> >>>  128 MB -  144 MB: CPU peripherals (GIC for now only)
> >>>  144 MB -  144 MB: UART (64k)
> >>>  144 MB -  144 MB: RTC (64k)
> >>>  160 MB -  256 MB: virtio (32 * 512 Bytes used)
> >>>  256 MB - 1024 MB: PCI
> >>> 1024 MB - ???? MB: DRAM
> >>>
> >>> So we waste quite some space between 144 MB and 256 MB, where we
> >>> currently use only 3 64K pages. If we would move those three pages
> >>> closer to the 256 MB region, we could extend the GIC mapping space from
> >>> 16 MB (127 VCPUs) to almost 128 MB (good for 1022 VCPUs).
> > 
> > Note that part of the reason for that gap is to give us space
> > to add more memory-mapped peripherals as they are needed.
> > For instance the fw_cfg conduit for talking to UEFI firmware
> > is going to need another 64K page in that section.
> 
> Sure, that was just to show the situation in general. We could do
> something like kvmtool and grow the CPU peripheral space and the device
> MMIO pages from different directions to not waste space needlessly.
> So UART, RTC, virtio and future devices use space from 128MB upwards,
> whereas the GIC uses as much space below 256 MB as needed.
> 
> Btw.: Is there an issue with not aligning each virtio device to a 64K
> page? Will we never need to separate access to virtio devices in a guest
> with MMU help?
> 
> >> Having a more compressed mapping sounds useful, but you could also consider
> >> carving the GIC registers out of the PCI range and make PCI memory space
> >> smaller if there are lots of CPUs.
> >>
> >> Is there also a 64-bit PCI memory space behind the DRAM?
> > 
> > The "PCI" section at the moment is purely hypothetical -- it's
> > a lump of reserved space in the memory map that I put in so
> > we had a place to put PCI later... The DRAM is just the last
> > thing in the memory map and goes up for as much DRAM as you
> > asked QEMU to provide, so in that sense there's no "behind"
> > there.
> > 
> > We could (at least at the moment) shuffle things around a bit,
> > since guests should all be looking at the DTB to figure where
> > things are. About the only thing we can't do is move the base
> > address of RAM or put the flash somewhere other than 0. If
> > we do need to make changes I'd rather we figured out the new
> > layout and switched to it rather than doing a set of piecemeal
> > tweaks, though.
> 
> Agreed. As QEMU does not support GICv3 anyway at the moment, there is no
> need to rush things.
> 
> Is there any sign of a PCI host controller emulation for QEMU on the
> horizon?
> 
Yes, we need to have this by Q1 2015 at the latest. Linaro has scheduled
this work, but patches are still a little while out, I'm afraid.

-Christoffer
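The vCPU counts quoted above follow from simple arithmetic under two assumptions. First, the GICv3 region holds one 64K distributor frame. Second, each vCPU gets one redistributor made of two 64K frames (RD_base and SGI_base), i.e. 128K per vCPU. The helper and constants below are illustrative and are not QEMU or KVM code:

```c
#include <stdint.h>

#define SZ_64K	0x10000ULL
#define SZ_1M	0x100000ULL

/* Assumed layout: one distributor frame, then one redistributor
 * (two 64K frames) per vCPU. */
#define GICD_SIZE	SZ_64K
#define GICR_STRIDE	(2 * SZ_64K)

/* How many redistributors fit after the distributor in a region. */
static uint64_t max_vcpus_in_region(uint64_t region_size)
{
	if (region_size < GICD_SIZE)
		return 0;
	return (region_size - GICD_SIZE) / GICR_STRIDE;
}
```

Under these assumptions the helper reproduces the figures above. A 16 MB region gives 127 vCPUs. A 128 MB region minus the three relocated 64K pages gives exactly 1022.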


* [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation
  2014-12-04  9:34                         ` Christoffer Dall
@ 2014-12-04 10:02                           ` Eric Auger
  0 siblings, 0 replies; 80+ messages in thread
From: Eric Auger @ 2014-12-04 10:02 UTC (permalink / raw)
  To: linux-arm-kernel

On 12/04/2014 10:34 AM, Christoffer Dall wrote:
> On Wed, Dec 03, 2014 at 12:03:59PM +0000, Andre Przywara wrote:
>> On 03/12/14 11:39, Peter Maydell wrote:
>>> On 3 December 2014 at 11:28, Arnd Bergmann <arnd@arndb.de> wrote:
>>>> On Wednesday 03 December 2014 11:10:18 Andre Przywara wrote:
>>>>> Currently we have in QEMU:
>>>>>    0 MB -  128 MB: Flash
>>>>>  128 MB -  144 MB: CPU peripherals (GIC for now only)
>>>>>  144 MB -  144 MB: UART (64k)
>>>>>  144 MB -  144 MB: RTC (64k)
>>>>>  160 MB -  256 MB: virtio (32 * 512 Bytes used)
>>>>>  256 MB - 1024 MB: PCI
>>>>> 1024 MB - ???? MB: DRAM
>>>>>
>>>>> So we waste quite some space between 144 MB and 256 MB, where we
>>>>> currently use only 3 64K pages. If we would move those three pages
>>>>> closer to the 256 MB region, we could extend the GIC mapping space from
>>>>> 16 MB (127 VCPUs) to almost 128 MB (good for 1022 VCPUs).
>>>
>>> Note that part of the reason for that gap is to give us space
>>> to add more memory-mapped peripherals as they are needed.
>>> For instance the fw_cfg conduit for talking to UEFI firmware
>>> is going to need another 64K page in that section.

Hi,

For info, I also planned to use 4 MB between the RTC and virtio for the
platform bus (all dynamically instantiated sysbus devices, including VFIO
platform devices), currently plugged at 148 MB
(http://lists.gnu.org/archive/html/qemu-devel/2014-11/msg04346.html)

Best Regards

Eric
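Andre proposed earlier in this thread a kvmtool-style layout in which device MMIO grows upward from 128 MB while the GIC grows downward from 256 MB. Combined with a reservation like Eric's 4 MB platform bus, that layout can be sketched as a two-ended allocator. struct mmio_window, alloc_device() and alloc_gic() are hypothetical, and alignment is deliberately ignored:

```c
#include <stdint.h>

#define MB(x)	((uint64_t)(x) << 20)

/* Hypothetical window shared by devices (growing up) and the GIC
 * (growing down); invariant: low <= high. */
struct mmio_window {
	uint64_t low;	/* next free address for devices */
	uint64_t high;	/* lowest address already used by the GIC */
};

/* Take size bytes from the bottom of the window. Returns 0 on failure,
 * which is never a valid base here since the window starts at 128 MB.
 * No alignment is applied. */
static uint64_t alloc_device(struct mmio_window *w, uint64_t size)
{
	uint64_t base = w->low;

	if (size > w->high - w->low)
		return 0;
	w->low += size;
	return base;
}

/* Take size bytes from the top of the window for the GIC. */
static uint64_t alloc_gic(struct mmio_window *w, uint64_t size)
{
	if (size > w->high - w->low)
		return 0;
	w->high -= size;
	return w->high;
}
```

Consider placing a 64K UART, a 64K RTC and a 4 MB platform bus from the bottom, and a 16 MB GIC region from the top. The UART lands at 128 MB and the GIC at 240 MB, and the space in between stays available to either side.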



* [PATCH v4 08/19] arm/arm64: KVM: make the maximum number of vCPUs a per-VM value
  2014-11-23 13:21   ` Christoffer Dall
@ 2014-12-08 14:10     ` Andre Przywara
  0 siblings, 0 replies; 80+ messages in thread
From: Andre Przywara @ 2014-12-08 14:10 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Christoffer,

On 23/11/14 13:21, Christoffer Dall wrote:
> On Fri, Nov 14, 2014 at 10:07:52AM +0000, Andre Przywara wrote:
>> Currently the maximum number of vCPUs supported is a global value
>> limited by the used GIC model. GICv3 will lift this limit, but we
>> still need to observe it for guests using GICv2.
> >> So the maximum number of vCPUs is a per-VM value, depending on the
>> GIC model the guest uses.
>> Store and check the value in struct kvm_arch, but keep it down to
>> 8 for now.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>> Changelog v3...v4:
>> - initialize max_vcpus with limit based on host GIC
>> - remove *_init_emul_* from VGIC backend
>> - refine VCPU limit on VGIC creation
>> - print warning when userland tries to create more VCPUs than supported
>>
>>  arch/arm/include/asm/kvm_host.h   |    1 +
>>  arch/arm/kvm/arm.c                |    8 ++++++++
>>  arch/arm64/include/asm/kvm_host.h |    3 +++
>>  include/kvm/arm_vgic.h            |    2 ++
>>  virt/kvm/arm/vgic-v2.c            |    1 +
>>  virt/kvm/arm/vgic-v3.c            |    1 +
>>  virt/kvm/arm/vgic.c               |   22 ++++++++++++++++++++++
>>  7 files changed, 38 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index b443dfe..7969e6e 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -68,6 +68,7 @@ struct kvm_arch {
>>  
>>  	/* Interrupt controller */
>>  	struct vgic_dist	vgic;
>> +	int max_vcpus;
>>  };
>>  
>>  #define KVM_NR_MEM_OBJS     40
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 8817fbd..c3d0fbd 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -132,6 +132,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>>  	/* Mark the initial VMID generation invalid */
>>  	kvm->arch.vmid_gen = 0;
>>  
>> +	/* The maximum number of VCPUs is limited by the host's GIC model */
>> +	kvm->arch.max_vcpus = kvm_vgic_get_max_vcpus();
> 
> I think you forgot to declare this one in arm_vgic.h for
> v7-without-vgic-but-with-kvm-configure-case
> (which-we-should-have-gotten-rid-of-a-while-back-perhaps).

Ah yes, thanks for pointing this out. Fixed in v5.

>> +
>>  	return ret;
>>  out_free_stage2_pgd:
>>  	kvm_free_stage2_pgd(kvm);
>> @@ -213,6 +216,11 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
>>  	int err;
>>  	struct kvm_vcpu *vcpu;
>>  
>> +	if (id >= kvm->arch.max_vcpus) {
>> +		err = -EINVAL;
>> +		goto out;
>> +	}
>> +
>>  	vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
>>  	if (!vcpu) {
>>  		err = -ENOMEM;
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 286bb61..f9e130d 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -59,6 +59,9 @@ struct kvm_arch {
>>  	/* VTTBR value associated with above pgd and vmid */
>>  	u64    vttbr;
>>  
>> +	/* The maximum number of vCPUs depends on the used GIC model */
>> +	int max_vcpus;
>> +
>>  	/* Interrupt controller */
>>  	struct vgic_dist	vgic;
>>  
>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>> index bfb660a..09344ac 100644
>> --- a/include/kvm/arm_vgic.h
>> +++ b/include/kvm/arm_vgic.h
>> @@ -132,6 +132,7 @@ struct vgic_params {
>>  	unsigned int	maint_irq;
>>  	/* Virtual control interface base address */
>>  	void __iomem	*vctrl_base;
>> +	int		max_hw_vcpus;
> 
> nit: max_vcpus or max_gic_vcpus would be more meaningful imho.

Sure.

>>  };
>>  
>>  struct vgic_vm_ops {
>> @@ -287,6 +288,7 @@ struct kvm_exit_mmio;
>>  #ifdef CONFIG_KVM_ARM_VGIC
>>  int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write);
>>  int kvm_vgic_hyp_init(void);
>> +int kvm_vgic_get_max_vcpus(void);
>>  int kvm_vgic_init(struct kvm *kvm);
>>  int kvm_vgic_create(struct kvm *kvm, u32 type);
>>  void kvm_vgic_destroy(struct kvm *kvm);
>> diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
>> index e1cd3cb..49fb288 100644
>> --- a/virt/kvm/arm/vgic-v2.c
>> +++ b/virt/kvm/arm/vgic-v2.c
>> @@ -237,6 +237,7 @@ int vgic_v2_probe(struct device_node *vgic_node,
>>  		 vctrl_res.start, vgic->maint_irq);
>>  
>>  	vgic->type = VGIC_V2;
>> +	vgic->max_hw_vcpus = 8;
> 
> define GIC_V2_MAX_CPUS ?

Yes, makes sense.

>>  	*ops = &vgic_v2_ops;
>>  	*params = vgic;
>>  	goto out;
>> diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
>> index d14c75f..acd256c7 100644
>> --- a/virt/kvm/arm/vgic-v3.c
>> +++ b/virt/kvm/arm/vgic-v3.c
>> @@ -235,6 +235,7 @@ int vgic_v3_probe(struct device_node *vgic_node,
>>  	vgic->vcpu_base = vcpu_res.start;
>>  	vgic->vctrl_base = NULL;
>>  	vgic->type = VGIC_V3;
>> +	vgic->max_hw_vcpus = KVM_MAX_VCPUS;
>>  
>>  	kvm_info("%s@%llx IRQ%d\n", vgic_node->name,
>>  		 vcpu_res.start, vgic->maint_irq);
>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>> index 4aa0b2f..4c72c66 100644
>> --- a/virt/kvm/arm/vgic.c
>> +++ b/virt/kvm/arm/vgic.c
>> @@ -1841,6 +1841,17 @@ static int vgic_vcpu_init_maps(struct kvm_vcpu *vcpu, int nr_irqs)
>>  }
>>  
>>  /**
>> + * kvm_vgic_get_max_vcpus - Get the maximum number of VCPUs allowed by HW
>> + *
>> + * The host's GIC naturally limits the maximum amount of VCPUs a guest
>> + * can use.
>> + */
>> +int kvm_vgic_get_max_vcpus(void)
>> +{
>> +	return vgic->max_hw_vcpus;
>> +}
>> +
>> +/**
>>   * kvm_vgic_vcpu_init - Initialize per-vcpu VGIC state
>>   * @vcpu: pointer to the vcpu struct
>>   *
>> @@ -2056,6 +2067,8 @@ static int vgic_v2_init_emulation(struct kvm *kvm)
>>  	dist->vm_ops.add_sgi_source = vgic_v2_add_sgi_source;
>>  	dist->vm_ops.vgic_init = vgic_v2_init;
>>  
>> +	kvm->arch.max_vcpus = 8;
> 
> reuse the define here
> 
>> +
>>  	return 0;
>>  }
>>  
>> @@ -2072,6 +2085,15 @@ static int init_vgic_model(struct kvm *kvm, int type)
>>  		break;
>>  	}
>>  
>> +	if (ret)
>> +		return ret;
>> +
>> +	if (kvm->arch.max_vcpus < atomic_read(&kvm->online_vcpus)) {
> 
> I would invert this check; if (online_vcpus > max_vcpus) ...

Done.

>> +		pr_warn_ratelimited("VGIC model only supports up to %d vCPUs\n",
>> +			kvm->arch.max_vcpus);
>> +		ret = -EINVAL;
>> +	}
>> +
>>  	return ret;
>>  }
>>  
>> -- 
>> 1.7.9.5
>>
> 
> So let me see if I got this right:
> 
> When you create the VM you set the maximum number of vcpus to whatever
> the underlying vgic hardware allows.
>
> Then, when you start creating vcpus, you complain if the user tries to
> create more than what the hardware allows (check against
> kvm->arch.max_vcpus).
> 
> Then, when you create the vgic, you further limit kvm->arch.max_vcpus
> and check if you already created too many vcpus for the vgic model you
> are trying to create, and error out in that case, and now also check
> against the new value when user space is trying to create more vcpus.

Yes, that was my idea.
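The three steps summarised above can be modelled in userspace. The model uses the GIC_V2_MAX_CPUS define suggested in the review, and the check in vgic_create_v2() is written the inverted way requested (online_vcpus > max_vcpus). struct vm_model and the function names are hypothetical stand-ins for the kernel code in the patch:

```c
#include <errno.h>

#define GIC_V2_MAX_CPUS	8	/* name proposed in the review */

/* Hypothetical stand-in for the per-VM fields touched by the patch. */
struct vm_model {
	int max_vcpus;
	int online_vcpus;
};

/* Step 1: like kvm_arch_init_vm(), start from the host GIC's limit. */
static void vm_init(struct vm_model *vm, int host_gic_max_vcpus)
{
	vm->max_vcpus = host_gic_max_vcpus;
	vm->online_vcpus = 0;
}

/* Step 2: like kvm_arch_vcpu_create(), reject IDs at or above the limit. */
static int vcpu_create(struct vm_model *vm, unsigned int id)
{
	if (id >= (unsigned int)vm->max_vcpus)
		return -EINVAL;
	vm->online_vcpus++;
	return 0;
}

/* Step 3: creating a GICv2 model lowers the limit and fails if too
 * many vCPUs already exist. */
static int vgic_create_v2(struct vm_model *vm)
{
	vm->max_vcpus = GIC_V2_MAX_CPUS;
	if (vm->online_vcpus > vm->max_vcpus)
		return -EINVAL;
	return 0;
}
```

After step 3, step 2 automatically enforces the lower GICv2 limit, because both read the same per-VM field.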

> Some questions:
> 
> (1) Is there currently a way to tell user space what the maximum number
> of vcpus for a given setup is?

Not that I know of (also not for the other architectures, AFAICT).
I guess we would inherit the problem that the result changes depending
on whether the ioctl is issued before or after the VGIC creation, so I
am not sure such an ioctl would really be useful. Maybe as part of the
VGIC kvm_device interface, which would automatically ensure there is
only one answer?

> (2) Would it be simpler to just have kvm_vgic_max_vcpus() return its
> best guess and always check against that from the outside?  Hmmm, maybe
> not, but thought I'd throw it out there.

Maybe, maybe not; let's make this a separate bikeshed discussion right
before the holidays ;-)
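Alternative (2) could look roughly like the following. Instead of caching the limit in kvm->arch.max_vcpus, a single helper derives a best guess on demand from the VGIC model (if one has been created yet) and the host GIC's limit. vgic_max_vcpus() and enum vgic_model here are hypothetical:

```c
#define GIC_V2_MAX_CPUS	8	/* name proposed in the review */

/* VGIC_NONE: no VGIC has been created for this VM yet. */
enum vgic_model { VGIC_NONE, VGIC_V2, VGIC_V3 };

/* Best-guess limit, recomputed on every call rather than stored. */
static int vgic_max_vcpus(enum vgic_model model, int host_limit)
{
	if (model == VGIC_V2 && host_limit > GIC_V2_MAX_CPUS)
		return GIC_V2_MAX_CPUS;
	return host_limit;
}
```

The answer still changes once a VGIC has been created. This is the same ambiguity raised above about a possible ioctl.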

Cheers,
Andre.


Thread overview: 80+ messages
2014-11-14 10:07 [PATCH v4 00/19] KVM GICv3 emulation Andre Przywara
2014-11-14 10:07 ` [PATCH v4 01/19] arm/arm64: KVM: rework MPIDR assignment and add accessors Andre Przywara
2014-11-18 10:35   ` Eric Auger
2014-11-23  9:34   ` Christoffer Dall
2014-11-14 10:07 ` [PATCH v4 02/19] arm/arm64: KVM: pass down user space provided GIC type into vGIC code Andre Przywara
2014-11-18 10:36   ` Eric Auger
2014-11-14 10:07 ` [PATCH v4 03/19] arm/arm64: KVM: refactor vgic_handle_mmio() function Andre Przywara
2014-11-18 10:35   ` Eric Auger
2014-11-14 10:07 ` [PATCH v4 04/19] arm/arm64: KVM: wrap 64 bit MMIO accesses with two 32 bit ones Andre Przywara
2014-11-18 10:36   ` Eric Auger
2014-11-23  9:42   ` Christoffer Dall
2014-11-24 13:50     ` Andre Przywara
2014-11-24 14:40       ` Christoffer Dall
2014-11-14 10:07 ` [PATCH v4 05/19] arm/arm64: KVM: introduce per-VM ops Andre Przywara
2014-11-23  9:58   ` Christoffer Dall
2014-11-14 10:07 ` [PATCH v4 06/19] arm/arm64: KVM: move kvm_register_device_ops() into vGIC probing Andre Przywara
2014-11-18 10:43   ` Eric Auger
2014-11-18 10:58     ` Eric Auger
2014-11-18 11:03       ` Andre Przywara
2014-11-14 10:07 ` [PATCH v4 07/19] arm/arm64: KVM: dont rely on a valid GICH base address Andre Przywara
2014-11-14 10:07 ` [PATCH v4 08/19] arm/arm64: KVM: make the maximum number of vCPUs a per-VM value Andre Przywara
2014-11-23 13:21   ` Christoffer Dall
2014-12-08 14:10     ` Andre Przywara
2014-11-14 10:07 ` [PATCH v4 09/19] arm/arm64: KVM: make the value of ICC_SRE_EL1 a per-VM variable Andre Przywara
2014-11-14 10:07 ` [PATCH v4 10/19] arm/arm64: KVM: refactor MMIO accessors Andre Przywara
2014-11-14 10:07 ` [PATCH v4 11/19] arm/arm64: KVM: refactor/wrap vgic_set/get_attr() Andre Przywara
2014-11-23 13:27   ` Christoffer Dall
2014-11-14 10:07 ` [PATCH v4 12/19] arm/arm64: KVM: add vgic.h header file Andre Przywara
2014-11-18 14:07   ` Eric Auger
2014-11-18 15:24     ` Andre Przywara
2014-11-23 13:29   ` Christoffer Dall
2014-11-14 10:07 ` [PATCH v4 13/19] arm/arm64: KVM: split GICv2 specific emulation code from vgic.c Andre Przywara
2014-11-23 13:32   ` Christoffer Dall
2014-11-14 10:07 ` [PATCH v4 14/19] arm/arm64: KVM: add opaque private pointer to MMIO data Andre Przywara
2014-11-23 13:33   ` Christoffer Dall
2014-11-14 10:07 ` [PATCH v4 15/19] arm/arm64: KVM: add virtual GICv3 distributor emulation Andre Przywara
2014-11-14 11:07   ` Christoffer Dall
2014-11-17 13:58     ` Andre Przywara
2014-11-17 23:46       ` Christoffer Dall
2014-11-18 15:57   ` Eric Auger
2014-11-23 14:38   ` Christoffer Dall
2014-11-24 16:00     ` Andre Przywara
2014-11-25 10:41       ` Christoffer Dall
2014-11-28 15:24         ` Andre Przywara
2014-11-30  8:30           ` Christoffer Dall
2014-12-02 16:24             ` Andre Przywara
2014-12-02 17:06               ` Marc Zyngier
2014-12-02 17:32                 ` Andre Przywara
2014-12-03 10:30                   ` Christoffer Dall
2014-12-03 10:47                     ` Andre Przywara
2014-12-03 11:06                       ` Christoffer Dall
2014-12-03 10:29                 ` Christoffer Dall
2014-12-03 10:44                   ` Marc Zyngier
2014-12-03 11:07                     ` Christoffer Dall
2014-12-03 10:28               ` Christoffer Dall
2014-12-03 11:10                 ` Andre Przywara
2014-12-03 11:28                   ` Arnd Bergmann
2014-12-03 11:39                     ` Peter Maydell
2014-12-03 12:03                       ` Andre Przywara
2014-12-03 13:14                         ` Arnd Bergmann
2014-12-04  9:34                         ` Christoffer Dall
2014-12-04 10:02                           ` Eric Auger
2014-11-14 10:08 ` [PATCH v4 16/19] arm64: GICv3: introduce symbolic names for GICv3 ICC_SGI1R_EL1 fields Andre Przywara
2014-11-23 14:43   ` Christoffer Dall
2014-11-14 10:08 ` [PATCH v4 17/19] arm64: KVM: add SGI generation register emulation Andre Przywara
2014-11-23 15:08   ` Christoffer Dall
2014-11-24 16:37     ` Andre Przywara
2014-11-25 11:03       ` Christoffer Dall
2014-11-28 15:40         ` Andre Przywara
2014-11-30  8:45           ` Christoffer Dall
2014-12-03 17:50             ` Andre Przywara
2014-12-03 20:22               ` Christoffer Dall
2014-11-14 10:08 ` [PATCH v4 18/19] arm/arm64: KVM: enable kernel side of GICv3 emulation Andre Przywara
2014-11-24  9:09   ` Christoffer Dall
2014-11-24 17:41     ` Andre Przywara
2014-11-25 11:08       ` Christoffer Dall
2014-11-14 10:08 ` [PATCH v4 19/19] arm/arm64: KVM: allow userland to request a virtual GICv3 Andre Przywara
2014-11-24  9:39   ` Christoffer Dall
2014-11-24  9:33 ` [PATCH v4 00/19] KVM GICv3 emulation Eric Auger
2014-11-24 17:46   ` Andre Przywara
