All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-14  0:04 Christoffer Dall
  2012-10-14  0:04 ` [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl Christoffer Dall
                   ` (3 more replies)
  0 siblings, 4 replies; 102+ messages in thread
From: Christoffer Dall @ 2012-10-14  0:04 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: peter.maydell, marc.zyngier, Christoffer Dall

*** warning: this RFC patch series is only compile-tested ***

We need a way to specify the address at which we expect VMs to access
the interrupt controller (both the emulated distributor and the hardware
interface supporting virtualization).  User space should decide on this
address as user space decides on an emulated board and loads a device
tree describing these details directly to the guest.

Instead of modifying the copying KVM_CREATE_IRQCHIP to an ARM specific
ioctl with a a highly device specific set of parameters, we try
something slightly more generic, that should fit well with how user
space (read QEMU) first builds the individual devices and later sets up
the emulated platform.

Comments welcome!

Christoffer Dall (3):
  KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl
  KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
  KVM: ARM: Split KVM_CREATE_IRQCHIP and KVM_INIT_IRQCHIP

 Documentation/virtual/kvm/api.txt |   46 +++++++++++++++++++++++
 arch/arm/include/asm/kvm.h        |   13 +++++++
 arch/arm/include/asm/kvm_mmu.h    |    1 +
 arch/arm/include/asm/kvm_vgic.h   |   12 ++++++
 arch/arm/kvm/arm.c                |   38 ++++++++++++++++++-
 arch/arm/kvm/vgic.c               |   74 +++++++++++++++++++++++++++++++------
 include/linux/kvm.h               |   11 ++++++
 7 files changed, 183 insertions(+), 12 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 102+ messages in thread

* [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl
  2012-10-14  0:04 [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses Christoffer Dall
@ 2012-10-14  0:04 ` Christoffer Dall
  2012-10-17 20:21   ` Peter Maydell
  2012-10-18 12:20   ` Avi Kivity
  2012-10-14  0:04 ` [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl Christoffer Dall
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 102+ messages in thread
From: Christoffer Dall @ 2012-10-14  0:04 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: peter.maydell, marc.zyngier, Christoffer Dall

Used to initialize the in-kernel interrupt controller. On ARM we need to
map the virtual generic interrupt controller (vGIC) into Hyp the guest's
physicall address space so the guest can access the virtual cpu
interface. This must be done after the IRQ chips is create and after a
base address has been provided for the emulated platform (patch is
following), but before the CPU is initally run.

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 Documentation/virtual/kvm/api.txt |   16 ++++++++++++++++
 arch/arm/kvm/arm.c                |    1 +
 include/linux/kvm.h               |    3 +++
 3 files changed, 20 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 25eacc6..26e953d 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2102,6 +2102,22 @@ This ioctl returns the guest registers that are supported for the
 KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
 
 
+4.79 KVM_INIT_IRQCHIP
+
+Capability: KVM_CAP_INIT_IRQCHIP
+Architectures: arm
+Type: vm ioctl
+Parameters: none
+Returns: 0 on success, -1 on error
+
+Initialize the in-kernel interrupt controller. On ARM we need to map the
+virtual generic interrupt controller (vGIC) into Hyp the guest's physicall
+address space so the guest can access the virtual cpu interface. This must be
+done after the IRQ chips is create and after a base address has been provided
+for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is
+initally run.
+
+
 5. The kvm_run structure
 ------------------------
 
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index f8c377b..85c76e4 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -195,6 +195,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	switch (ext) {
 #ifdef CONFIG_KVM_ARM_VGIC
 	case KVM_CAP_IRQCHIP:
+	case KVM_CAP_INIT_IRQCHIP:
 		r = vgic_present;
 		break;
 #endif
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 8091b1d..90ee023 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -626,6 +626,7 @@ struct kvm_ppc_smmu_info {
 #ifdef __KVM_HAVE_READONLY_MEM
 #define KVM_CAP_READONLY_MEM 81
 #endif
+#define KVM_CAP_INIT_IRQCHIP 82
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -839,6 +840,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_PPC_GET_SMMU_INFO	  _IOR(KVMIO,  0xa6, struct kvm_ppc_smmu_info)
 /* Available with KVM_CAP_PPC_ALLOC_HTAB */
 #define KVM_PPC_ALLOCATE_HTAB	  _IOWR(KVMIO, 0xa7, __u32)
+/* Available with KVM_CAP_INIT_IRQCHIP */
+#define KVM_INIT_IRQCHIP	  _IO(KVMIO,   0xa8)
 
 /*
  * ioctls for vcpu fds
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
  2012-10-14  0:04 [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses Christoffer Dall
  2012-10-14  0:04 ` [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl Christoffer Dall
@ 2012-10-14  0:04 ` Christoffer Dall
  2012-10-17 20:29   ` Peter Maydell
  2012-10-14  0:04 ` [RFC PATCH 3/3] KVM: ARM: Split KVM_CREATE_IRQCHIP and KVM_INIT_IRQCHIP Christoffer Dall
  2012-10-17 20:38   ` Alexander Graf
  3 siblings, 1 reply; 102+ messages in thread
From: Christoffer Dall @ 2012-10-14  0:04 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: peter.maydell, marc.zyngier, Christoffer Dall

On ARM (and possibly other architectures) some bits are specific to the
model being emulated for the guest and user space needs a way to tell
the kernel about those bits.  An example is mmio device base addresses,
where KVM must know the base address for a given device to properly
emulate mmio accesses within a certain address range or directly map a
device with virtualiation extensions into the guest address space.

We try to make this API slightly more generic than for our specific use,
but so far only the VGIC uses this feature.

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 Documentation/virtual/kvm/api.txt |   30 ++++++++++++++++++++++++++++++
 arch/arm/include/asm/kvm.h        |   13 +++++++++++++
 arch/arm/include/asm/kvm_mmu.h    |    1 +
 arch/arm/include/asm/kvm_vgic.h   |    6 ++++++
 arch/arm/kvm/arm.c                |   31 ++++++++++++++++++++++++++++++-
 arch/arm/kvm/vgic.c               |   34 +++++++++++++++++++++++++++++++---
 include/linux/kvm.h               |    8 ++++++++
 7 files changed, 119 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 26e953d..30ddcac 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2118,6 +2118,36 @@ for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is
 initally run.
 
 
+4.80 KVM_SET_DEVICE_ADDRESS
+
+Capability: KVM_CAP_SET_DEVICE_ADDRESS
+Architectures: arm
+Type: vm ioctl
+Parameters: struct kvm_device_address (in)
+Returns: 0 on success, -1 on error
+Errors:
+  ENODEV: The device id is unknwown
+  ENXIO:  Device not supported in configuration
+  E2BIG:  Address outside of guest physical address space
+
+struct kvm_device_address {
+	__u32 id;
+	__u64 addr;
+};
+
+Specify a device address in the guest's physical address space where guests
+can access emulated or directly exposed devices, which the host kernel needs
+to know about. The id field is an architecture specific identifier for a
+specific device.
+
+ARM divides the id field into two parts, a device ID and an address type id
+specific to the individual device.
+
+  bits:  | 31    ...    16 | 15    ...    0 |
+  field: |     device id   |  addr type id  |
+
+
+
 5. The kvm_run structure
 ------------------------
 
diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
index b6eaf0c..dfd60cc 100644
--- a/arch/arm/include/asm/kvm.h
+++ b/arch/arm/include/asm/kvm.h
@@ -42,6 +42,19 @@ struct kvm_regs {
 #define KVM_ARM_TARGET_CORTEX_A15	0
 #define KVM_ARM_NUM_TARGETS		1
 
+/* KVM_SET_DEVICE_ADDRESS ioctl id encoding */
+#define KVM_DEVICE_TYPE_SHIFT		0
+#define KVM_DEVICE_TYPE_MASK		(0xffff << KVM_DEVICE_TYPE_SHIFT)
+#define KVM_DEVICE_ID_SHIFT		16
+#define KVM_DEVICE_ID_MASK		(0xffff << KVM_DEVICE_ID_SHIFT)
+
+/* Supported device IDs */
+#define KVM_ARM_DEVICE_VGIC_V2		0
+
+/* Supported VGIC address types  */
+#define KVM_VGIC_V2_ADDR_TYPE_DIST	0
+#define KVM_VGIC_V2_ADDR_TYPE_CPU	1
+
 struct kvm_vcpu_init {
 	__u32 target;
 	__u32 features[7];
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index ecfaaf0..0aef24f 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -26,6 +26,7 @@
  * To save a bit of memory and to avoid alignment issues we assume 39-bit IPA
  * for now, but remember that the level-1 table must be aligned to its size.
  */
+#define KVM_MAX_IPA	((1ULL << 38) - 1)
 #define PTRS_PER_PGD2	512
 #define PGD2_ORDER	get_order(PTRS_PER_PGD2 * sizeof(pgd_t))
 
diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h
index 588c637..a688132 100644
--- a/arch/arm/include/asm/kvm_vgic.h
+++ b/arch/arm/include/asm/kvm_vgic.h
@@ -242,6 +242,7 @@ struct kvm_exit_mmio;
 
 #ifdef CONFIG_KVM_ARM_VGIC
 int kvm_vgic_hyp_init(void);
+int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr);
 int kvm_vgic_init(struct kvm *kvm);
 void kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu);
 void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu);
@@ -261,6 +262,11 @@ static inline int kvm_vgic_hyp_init(void)
 	return 0;
 }
 
+static inline int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr)
+{
+	return 0;
+}
+
 static inline int kvm_vgic_init(struct kvm *kvm)
 {
 	return 0;
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 85c76e4..67c8cc2 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -207,6 +207,9 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_COALESCED_MMIO:
 		r = KVM_COALESCED_MMIO_PAGE_OFFSET;
 		break;
+	case KVM_CAP_SET_DEVICE_ADDR:
+		r = 1;
+		break;
 	default:
 		r = 0;
 		break;
@@ -859,20 +862,46 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 	return -EINVAL;
 }
 
+static int kvm_vm_ioctl_set_device_address(struct kvm *kvm,
+					   struct kvm_device_address *dev_addr)
+{
+	unsigned long dev_id, type;
+
+	dev_id = (dev_addr->id & KVM_DEVICE_ID_MASK) >> KVM_DEVICE_ID_SHIFT;
+	type = (dev_addr->id & KVM_DEVICE_TYPE_MASK) >> KVM_DEVICE_TYPE_SHIFT;
+
+	switch (dev_id) {
+	case KVM_ARM_DEVICE_VGIC_V2:
+		if (!vgic_present)
+			return -ENXIO;
+		return kvm_vgic_set_addr(kvm, type, dev_addr->addr);
+	default:
+		return -ENODEV;
+	}
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
 		       unsigned int ioctl, unsigned long arg)
 {
+	struct kvm *kvm = filp->private_data;
+	void __user *argp = (void __user *)arg;
 
 	switch (ioctl) {
 #ifdef CONFIG_KVM_ARM_VGIC
 	case KVM_CREATE_IRQCHIP: {
-		struct kvm *kvm = filp->private_data;
 		if (vgic_present)
 			return kvm_vgic_init(kvm);
 		else
 			return -EINVAL;
 	}
 #endif
+	case KVM_SET_DEVICE_ADDRESS: {
+		struct kvm_device_address dev_addr;
+
+		if (copy_from_user(&dev_addr, argp, sizeof(dev_addr)))
+			return -EFAULT;
+		return kvm_vm_ioctl_set_device_address(kvm, &dev_addr);
+	}
 	default:
 		return -EINVAL;
 	}
diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c
index 494d94d..1e4be2d 100644
--- a/arch/arm/kvm/vgic.c
+++ b/arch/arm/kvm/vgic.c
@@ -65,12 +65,17 @@
  *   interrupt line to be sampled again.
  */
 
-/* Temporary hacks, need to be provided by userspace emulation */
-#define VGIC_DIST_BASE		0x2c001000
+#define VGIC_ADDR_UNDEF		(-1)
 #define VGIC_DIST_SIZE		0x1000
-#define VGIC_CPU_BASE		0x2c002000
 #define VGIC_CPU_SIZE		0x2000
 
+/* Physical address of vgic virtual cpu interface */
+static phys_addr_t vgic_vcpu_base;
+
+/* Guest physical addresses used by the guest to access the vgic */
+static unsigned long vgic_guest_dist_base = VGIC_ADDR_UNDEF;
+static unsigned long vgic_guest_cpu_base = VGIC_ADDR_UNDEF;
+
 /* Virtual control interface base address */
 static void __iomem *vgic_vctrl_base;
 
@@ -1117,3 +1122,26 @@ out:
 
 	return ret;
 }
+
+int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr)
+{
+	int r = 0;
+
+	if (addr > KVM_MAX_IPA)
+		return -E2BIG;
+
+	mutex_lock(&kvm->lock);
+	switch (type) {
+	case KVM_VGIC_V2_ADDR_TYPE_DIST:
+		vgic_guest_dist_base = addr;
+		break;
+	case KVM_VGIC_V2_ADDR_TYPE_CPU:
+		vgic_guest_cpu_base = addr;
+		break;
+	default:
+		r = -ENODEV;
+	}
+
+	mutex_unlock(&kvm->lock);
+	return r;
+}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 90ee023..e9da8ec 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -627,6 +627,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_READONLY_MEM 81
 #endif
 #define KVM_CAP_INIT_IRQCHIP 82
+#define KVM_CAP_SET_DEVICE_ADDR 83
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -760,6 +761,11 @@ struct kvm_msi {
 	__u8  pad[16];
 };
 
+struct kvm_device_address {
+	__u32 id;
+	__u64 addr;
+};
+
 /*
  * ioctls for VM fds
  */
@@ -842,6 +848,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_PPC_ALLOCATE_HTAB	  _IOWR(KVMIO, 0xa7, __u32)
 /* Available with KVM_CAP_INIT_IRQCHIP */
 #define KVM_INIT_IRQCHIP	  _IO(KVMIO,   0xa8)
+/* Available with KVM_CAP_SET_DEVICE_ADDR */
+#define KVM_SET_DEVICE_ADDRESS	  _IOW(KVMIO,  0xa9, struct kvm_device_address)
 
 /*
  * ioctls for vcpu fds
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [RFC PATCH 3/3] KVM: ARM: Split KVM_CREATE_IRQCHIP and KVM_INIT_IRQCHIP
  2012-10-14  0:04 [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses Christoffer Dall
  2012-10-14  0:04 ` [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl Christoffer Dall
  2012-10-14  0:04 ` [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl Christoffer Dall
@ 2012-10-14  0:04 ` Christoffer Dall
  2012-10-18 11:15   ` Peter Maydell
  2012-10-17 20:38   ` Alexander Graf
  3 siblings, 1 reply; 102+ messages in thread
From: Christoffer Dall @ 2012-10-14  0:04 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: peter.maydell, marc.zyngier, Christoffer Dall

We need this two factor initialization step to support a sane user space
initialization of the emulated model.  We simply follow the names of the
ioctls for the internal vgic implementation steps and check if we have
everything we need on the host side when we create the vgic and set up
the rest on init.

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 arch/arm/include/asm/kvm_vgic.h |    6 ++++++
 arch/arm/kvm/arm.c              |    6 ++++++
 arch/arm/kvm/vgic.c             |   40 +++++++++++++++++++++++++++++++--------
 3 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h
index a688132..8bd1426 100644
--- a/arch/arm/include/asm/kvm_vgic.h
+++ b/arch/arm/include/asm/kvm_vgic.h
@@ -243,6 +243,7 @@ struct kvm_exit_mmio;
 #ifdef CONFIG_KVM_ARM_VGIC
 int kvm_vgic_hyp_init(void);
 int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr);
+int kvm_vgic_create(struct kvm *kvm);
 int kvm_vgic_init(struct kvm *kvm);
 void kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu);
 void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu);
@@ -267,6 +268,11 @@ static inline int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 add
 	return 0;
 }
 
+static inline int kvm_vgic_create(struct kvm *kvm)
+{
+	return 0;
+}
+
 static inline int kvm_vgic_init(struct kvm *kvm)
 {
 	return 0;
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 67c8cc2..c0af87e 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -890,6 +890,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
 #ifdef CONFIG_KVM_ARM_VGIC
 	case KVM_CREATE_IRQCHIP: {
 		if (vgic_present)
+			return kvm_vgic_create(kvm);
+		else
+			return -EINVAL;
+	}
+	case KVM_INIT_IRQCHIP: {
+		if (vgic_present)
 			return kvm_vgic_init(kvm);
 		else
 			return -EINVAL;
diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c
index 1e4be2d..78d590c 100644
--- a/arch/arm/kvm/vgic.c
+++ b/arch/arm/kvm/vgic.c
@@ -1084,12 +1084,12 @@ out_free_irq:
 int kvm_vgic_init(struct kvm *kvm)
 {
 	int ret, i;
-	struct resource vcpu_res;
 
 	mutex_lock(&kvm->lock);
 
-	if (of_address_to_resource(vgic_node, 3, &vcpu_res)) {
-		kvm_err("Cannot obtain VCPU resource\n");
+	if (vgic_guest_dist_base == VGIC_ADDR_UNDEF ||
+	    vgic_guest_cpu_base == VGIC_ADDR_UNDEF) {
+		kvm_err("Need to set vgic cpu and dist addresses first\n");
 		ret = -ENXIO;
 		goto out;
 	}
@@ -1101,11 +1101,12 @@ int kvm_vgic_init(struct kvm *kvm)
 
 	spin_lock_init(&kvm->arch.vgic.lock);
 	kvm->arch.vgic.vctrl_base = vgic_vctrl_base;
-	kvm->arch.vgic.vgic_dist_base = VGIC_DIST_BASE;
+	kvm->arch.vgic.vgic_dist_base = vgic_guest_dist_base;
 	kvm->arch.vgic.vgic_dist_size = VGIC_DIST_SIZE;
 
-	ret = kvm_phys_addr_ioremap(kvm, VGIC_CPU_BASE,
-				    vcpu_res.start, VGIC_CPU_SIZE);
+	ret = kvm_phys_addr_ioremap(kvm, vgic_guest_cpu_base,
+				    vgic_vcpu_base, VGIC_CPU_SIZE);
+
 	if (ret) {
 		kvm_err("Unable to remap VGIC CPU to VCPU\n");
 		goto out;
@@ -1113,13 +1114,36 @@ int kvm_vgic_init(struct kvm *kvm)
 
 	for (i = 32; i < VGIC_NR_IRQS; i += 4)
 		vgic_set_target_reg(kvm, 0, i);
-
 out:
 	mutex_unlock(&kvm->lock);
-
 	if (!ret)
 		kvm_timer_init(kvm);
+	return ret;
+}
+
+int kvm_vgic_create(struct kvm *kvm)
+{
+	int ret;
+	struct resource vcpu_res;
+
+	mutex_lock(&kvm->lock);
 
+	if (of_address_to_resource(vgic_node, 3, &vcpu_res)) {
+		kvm_err("Cannot obtain VCPU resource\n");
+		ret = -ENXIO;
+		goto out;
+	}
+
+	if (atomic_read(&kvm->online_vcpus) || kvm->arch.vgic.vctrl_base) {
+		ret = -EEXIST;
+		goto out;
+	}
+
+
+	vgic_vcpu_base = vcpu_res.start;
+	ret = 0;
+out:
+	mutex_unlock(&kvm->lock);
 	return ret;
 }
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl
  2012-10-14  0:04 ` [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl Christoffer Dall
@ 2012-10-17 20:21   ` Peter Maydell
  2012-10-17 20:23     ` Christoffer Dall
  2012-10-18 12:20   ` Avi Kivity
  1 sibling, 1 reply; 102+ messages in thread
From: Peter Maydell @ 2012-10-17 20:21 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm, marc.zyngier

On 14 October 2012 01:04, Christoffer Dall
<c.dall@virtualopensystems.com> wrote:
> Used to initialize the in-kernel interrupt controller. On ARM we need to
> map the virtual generic interrupt controller (vGIC) into Hyp the guest's
> physicall address space so the guest can access the virtual cpu
> interface. This must be done after the IRQ chips is create and after a
> base address has been provided for the emulated platform (patch is
> following), but before the CPU is initally run.

I've now written the code for that patch but don't have access to a machine
with the ARM cross compile setup to build it until tomorrow.

>
> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> ---
>  Documentation/virtual/kvm/api.txt |   16 ++++++++++++++++
>  arch/arm/kvm/arm.c                |    1 +
>  include/linux/kvm.h               |    3 +++
>  3 files changed, 20 insertions(+)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 25eacc6..26e953d 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2102,6 +2102,22 @@ This ioctl returns the guest registers that are supported for the
>  KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
>
>
> +4.79 KVM_INIT_IRQCHIP
> +
> +Capability: KVM_CAP_INIT_IRQCHIP
> +Architectures: arm
> +Type: vm ioctl
> +Parameters: none
> +Returns: 0 on success, -1 on error
> +
> +Initialize the in-kernel interrupt controller. On ARM we need to map the
> +virtual generic interrupt controller (vGIC) into Hyp the guest's physicall

Should that "Hyp" be deleted?

"physical"

> +address space so the guest can access the virtual cpu interface. This must be
> +done after the IRQ chips is create and after a base address has been provided

"chip". "created".

> +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is
> +initally run.

"initially".

(all these typos are also in your commit message)

> +
> +
>  5. The kvm_run structure
>  ------------------------
>
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index f8c377b..85c76e4 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -195,6 +195,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>         switch (ext) {
>  #ifdef CONFIG_KVM_ARM_VGIC
>         case KVM_CAP_IRQCHIP:
> +       case KVM_CAP_INIT_IRQCHIP:
>                 r = vgic_present;
>                 break;
>  #endif
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 8091b1d..90ee023 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -626,6 +626,7 @@ struct kvm_ppc_smmu_info {
>  #ifdef __KVM_HAVE_READONLY_MEM
>  #define KVM_CAP_READONLY_MEM 81
>  #endif
> +#define KVM_CAP_INIT_IRQCHIP 82
>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
> @@ -839,6 +840,8 @@ struct kvm_s390_ucas_mapping {
>  #define KVM_PPC_GET_SMMU_INFO    _IOR(KVMIO,  0xa6, struct kvm_ppc_smmu_info)
>  /* Available with KVM_CAP_PPC_ALLOC_HTAB */
>  #define KVM_PPC_ALLOCATE_HTAB    _IOWR(KVMIO, 0xa7, __u32)
> +/* Available with KVM_CAP_INIT_IRQCHIP */
> +#define KVM_INIT_IRQCHIP         _IO(KVMIO,   0xa8)
>
>  /*
>   * ioctls for vcpu fds
> --
> 1.7.9.5
>


-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl
  2012-10-17 20:21   ` Peter Maydell
@ 2012-10-17 20:23     ` Christoffer Dall
  2012-10-17 20:31       ` Peter Maydell
  0 siblings, 1 reply; 102+ messages in thread
From: Christoffer Dall @ 2012-10-17 20:23 UTC (permalink / raw)
  To: Peter Maydell; +Cc: kvmarm, kvm, marc.zyngier

On Wed, Oct 17, 2012 at 4:21 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 14 October 2012 01:04, Christoffer Dall
> <c.dall@virtualopensystems.com> wrote:
>> Used to initialize the in-kernel interrupt controller. On ARM we need to
>> map the virtual generic interrupt controller (vGIC) into Hyp the guest's
>> physicall address space so the guest can access the virtual cpu
>> interface. This must be done after the IRQ chips is create and after a
>> base address has been provided for the emulated platform (patch is
>> following), but before the CPU is initally run.
>
> I've now written the code for that patch but don't have access to a machine
> with the ARM cross compile setup to build it until tomorrow.
>
>>
>> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
>> ---
>>  Documentation/virtual/kvm/api.txt |   16 ++++++++++++++++
>>  arch/arm/kvm/arm.c                |    1 +
>>  include/linux/kvm.h               |    3 +++
>>  3 files changed, 20 insertions(+)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index 25eacc6..26e953d 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -2102,6 +2102,22 @@ This ioctl returns the guest registers that are supported for the
>>  KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
>>
>>
>> +4.79 KVM_INIT_IRQCHIP
>> +
>> +Capability: KVM_CAP_INIT_IRQCHIP
>> +Architectures: arm
>> +Type: vm ioctl
>> +Parameters: none
>> +Returns: 0 on success, -1 on error
>> +
>> +Initialize the in-kernel interrupt controller. On ARM we need to map the
>> +virtual generic interrupt controller (vGIC) into Hyp the guest's physicall
>
> Should that "Hyp" be deleted?

yup

>
> "physical"
>
>> +address space so the guest can access the virtual cpu interface. This must be
>> +done after the IRQ chips is create and after a base address has been provided
>
> "chip". "created".
>
>> +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is
>> +initally run.
>
> "initially".

thanks a bunch for those, and sorry about the sloppyness.

>
> (all these typos are also in your commit message)
>

yeah, you caught my -ECUTANDPASTE there ;)

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
  2012-10-14  0:04 ` [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl Christoffer Dall
@ 2012-10-17 20:29   ` Peter Maydell
  2012-10-19 18:46     ` Christoffer Dall
  0 siblings, 1 reply; 102+ messages in thread
From: Peter Maydell @ 2012-10-17 20:29 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm, marc.zyngier

On 14 October 2012 01:04, Christoffer Dall
<c.dall@virtualopensystems.com> wrote:
> On ARM (and possibly other architectures) some bits are specific to the
> model being emulated for the guest and user space needs a way to tell
> the kernel about those bits.  An example is mmio device base addresses,
> where KVM must know the base address for a given device to properly
> emulate mmio accesses within a certain address range or directly map a
> device with virtualiation extensions into the guest address space.
>
> We try to make this API slightly more generic than for our specific use,
> but so far only the VGIC uses this feature.
>
> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> ---
>  Documentation/virtual/kvm/api.txt |   30 ++++++++++++++++++++++++++++++
>  arch/arm/include/asm/kvm.h        |   13 +++++++++++++
>  arch/arm/include/asm/kvm_mmu.h    |    1 +
>  arch/arm/include/asm/kvm_vgic.h   |    6 ++++++
>  arch/arm/kvm/arm.c                |   31 ++++++++++++++++++++++++++++++-
>  arch/arm/kvm/vgic.c               |   34 +++++++++++++++++++++++++++++++---
>  include/linux/kvm.h               |    8 ++++++++
>  7 files changed, 119 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 26e953d..30ddcac 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2118,6 +2118,36 @@ for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is
>  initally run.
>
>
> +4.80 KVM_SET_DEVICE_ADDRESS
> +
> +Capability: KVM_CAP_SET_DEVICE_ADDRESS
> +Architectures: arm
> +Type: vm ioctl
> +Parameters: struct kvm_device_address (in)
> +Returns: 0 on success, -1 on error
> +Errors:
> +  ENODEV: The device id is unknwown

"unknown"

> +  ENXIO:  Device not supported in configuration

"in this configuration" ? (I'm guessing this is for "you tried to
map a GIC when this CPU doesn't have a GIC" and similar errors?)

> +  E2BIG:  Address outside of guest physical address space

I would say "outside" rather than "outside of" here.

> +
> +struct kvm_device_address {
> +       __u32 id;
> +       __u64 addr;
> +};
> +
> +Specify a device address in the guest's physical address space where guests
> +can access emulated or directly exposed devices, which the host kernel needs
> +to know about. The id field is an architecture specific identifier for a
> +specific device.
> +
> +ARM divides the id field into two parts, a device ID and an address type id

We should be consistent about whether ID is capitalised or not.

> +specific to the individual device.
> +
> +  bits:  | 31    ...    16 | 15    ...    0 |
> +  field: |     device id   |  addr type id  |

This doesn't say whether userspace is allowed to make this ioctl
multiple times for the same device. This could be any of:
 * undefined behaviour
 * second call fails with some errno
 * second call overrides first one

It also doesn't say that you're supposed to call this after CREATE
and before INIT of the irqchip. (Nor does it say what happens if
you call it at some other time.)

-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl
  2012-10-17 20:23     ` Christoffer Dall
@ 2012-10-17 20:31       ` Peter Maydell
  2012-10-17 20:39         ` Christoffer Dall
  0 siblings, 1 reply; 102+ messages in thread
From: Peter Maydell @ 2012-10-17 20:31 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm, marc.zyngier

On 17 October 2012 21:23, Christoffer Dall
<c.dall@virtualopensystems.com> wrote:
> On Wed, Oct 17, 2012 at 4:21 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>>> +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is
>>> +initally run.
>>
>> "initially".
>
> thanks a bunch for those, and sorry about the sloppyness.

No problem. Also just noticed "platform" there :-)

-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-14  0:04 [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses Christoffer Dall
@ 2012-10-17 20:38   ` Alexander Graf
  2012-10-14  0:04 ` [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl Christoffer Dall
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 102+ messages in thread
From: Alexander Graf @ 2012-10-17 20:38 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm, Benjamin Herrenschmidt, kvm-ppc

On 10/14/2012 02:04 AM, Christoffer Dall wrote:
> *** warning: this RFC patch series is only compile-tested ***
>
> We need a way to specify the address at which we expect VMs to access
> the interrupt controller (both the emulated distributor and the hardware
> interface supporting virtualization).  User space should decide on this
> address as user space decides on an emulated board and loads a device
> tree describing these details directly to the guest.
>
> Instead of modifying the copying KVM_CREATE_IRQCHIP to an ARM specific
> ioctl with a a highly device specific set of parameters, we try
> something slightly more generic, that should fit well with how user
> space (read QEMU) first builds the individual devices and later sets up
> the emulated platform.

Have you talked to Ben about this one? He wanted to design a new, more 
flexible irqchip API that would work for XICS & MPIC. Maybe there's some 
room for cooperation here?


Alex


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-17 20:38   ` Alexander Graf
  0 siblings, 0 replies; 102+ messages in thread
From: Alexander Graf @ 2012-10-17 20:38 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm, Benjamin Herrenschmidt, kvm-ppc

On 10/14/2012 02:04 AM, Christoffer Dall wrote:
> *** warning: this RFC patch series is only compile-tested ***
>
> We need a way to specify the address at which we expect VMs to access
> the interrupt controller (both the emulated distributor and the hardware
> interface supporting virtualization).  User space should decide on this
> address as user space decides on an emulated board and loads a device
> tree describing these details directly to the guest.
>
> Instead of modifying the copying KVM_CREATE_IRQCHIP to an ARM specific
> ioctl with a a highly device specific set of parameters, we try
> something slightly more generic, that should fit well with how user
> space (read QEMU) first builds the individual devices and later sets up
> the emulated platform.

Have you talked to Ben about this one? He wanted to design a new, more 
flexible irqchip API that would work for XICS & MPIC. Maybe there's some 
room for cooperation here?


Alex


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl
  2012-10-17 20:31       ` Peter Maydell
@ 2012-10-17 20:39         ` Christoffer Dall
  0 siblings, 0 replies; 102+ messages in thread
From: Christoffer Dall @ 2012-10-17 20:39 UTC (permalink / raw)
  To: Peter Maydell; +Cc: kvmarm, kvm, marc.zyngier

On Wed, Oct 17, 2012 at 4:31 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 17 October 2012 21:23, Christoffer Dall
> <c.dall@virtualopensystems.com> wrote:
>> On Wed, Oct 17, 2012 at 4:21 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>>>> +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is
>>>> +initally run.
>>>
>>> "initially".
>>
>> thanks a bunch for those, and sorry about the sloppyness.
>
> No problem. Also just noticed "platform" there :-)
>
I'll spell check the diff just to be sure. :)

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-17 20:38   ` Alexander Graf
@ 2012-10-17 20:39     ` Christoffer Dall
  -1 siblings, 0 replies; 102+ messages in thread
From: Christoffer Dall @ 2012-10-17 20:39 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvmarm, kvm, Benjamin Herrenschmidt, kvm-ppc

On Wed, Oct 17, 2012 at 4:38 PM, Alexander Graf <agraf@suse.de> wrote:
> On 10/14/2012 02:04 AM, Christoffer Dall wrote:
>>
>> *** warning: this RFC patch series is only compile-tested ***
>>
>> We need a way to specify the address at which we expect VMs to access
>> the interrupt controller (both the emulated distributor and the hardware
>> interface supporting virtualization).  User space should decide on this
>> address as user space decides on an emulated board and loads a device
>> tree describing these details directly to the guest.
>>
>> Instead of modifying the copying KVM_CREATE_IRQCHIP to an ARM specific
>> ioctl with a a highly device specific set of parameters, we try
>> something slightly more generic, that should fit well with how user
>> space (read QEMU) first builds the individual devices and later sets up
>> the emulated platform.
>
>
> Have you talked to Ben about this one? He wanted to design a new, more
> flexible irqchip API that would work for XICS & MPIC. Maybe there's some
> room for cooperation here?
>
I have not - Ben, what do you have in mind?

-Christoffer

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-17 20:39     ` Christoffer Dall
  0 siblings, 0 replies; 102+ messages in thread
From: Christoffer Dall @ 2012-10-17 20:39 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvmarm, kvm, Benjamin Herrenschmidt, kvm-ppc

On Wed, Oct 17, 2012 at 4:38 PM, Alexander Graf <agraf@suse.de> wrote:
> On 10/14/2012 02:04 AM, Christoffer Dall wrote:
>>
>> *** warning: this RFC patch series is only compile-tested ***
>>
>> We need a way to specify the address at which we expect VMs to access
>> the interrupt controller (both the emulated distributor and the hardware
>> interface supporting virtualization).  User space should decide on this
>> address as user space decides on an emulated board and loads a device
>> tree describing these details directly to the guest.
>>
>> Instead of modifying the copying KVM_CREATE_IRQCHIP to an ARM specific
>> ioctl with a a highly device specific set of parameters, we try
>> something slightly more generic, that should fit well with how user
>> space (read QEMU) first builds the individual devices and later sets up
>> the emulated platform.
>
>
> Have you talked to Ben about this one? He wanted to design a new, more
> flexible irqchip API that would work for XICS & MPIC. Maybe there's some
> room for cooperation here?
>
I have not - Ben, what do you have in mind?

-Christoffer

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-17 20:39     ` Christoffer Dall
@ 2012-10-17 21:19       ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-17 21:19 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Alexander Graf, kvmarm, kvm, kvm-ppc, Paul Mackerras

On Wed, 2012-10-17 at 16:39 -0400, Christoffer Dall wrote:

> > Have you talked to Ben about this one? He wanted to design a new, more
> > flexible irqchip API that would work for XICS & MPIC. Maybe there's some
> > room for cooperation here?
> >
> I have not - Ben, what do you have in mind?

I've been sidetracked to some other stuff so for now Paul (CC) is taking
over my interrupt patches.

We initially changes IRQ_CREATE_IRQCHIP to take an argument but that was
causing an x86 ABI breakage (ioctl number changing). So we'll probably
be creating a new one.

>From there, nothing fancy really, just an ioctl with an IRQ chip type at
the beginning followed by a union of type-specific parameters.

The main problem we haven't sorted out yet is how to replace some of the
horrors related to mapping interrupts that have tendrils all the way
into virtio-pci etc... in kemu that don't apply to use (well mostly) and
the interaction with in-kernel generated interrupts to avoid going
through qemu for vhost ec...

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-17 21:19       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-17 21:19 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Alexander Graf, kvmarm, kvm, kvm-ppc, Paul Mackerras

On Wed, 2012-10-17 at 16:39 -0400, Christoffer Dall wrote:

> > Have you talked to Ben about this one? He wanted to design a new, more
> > flexible irqchip API that would work for XICS & MPIC. Maybe there's some
> > room for cooperation here?
> >
> I have not - Ben, what do you have in mind?

I've been sidetracked to some other stuff so for now Paul (CC) is taking
over my interrupt patches.

We initially changes IRQ_CREATE_IRQCHIP to take an argument but that was
causing an x86 ABI breakage (ioctl number changing). So we'll probably
be creating a new one.

From there, nothing fancy really, just an ioctl with an IRQ chip type at
the beginning followed by a union of type-specific parameters.

The main problem we haven't sorted out yet is how to replace some of the
horrors related to mapping interrupts that have tendrils all the way
into virtio-pci etc... in kemu that don't apply to use (well mostly) and
the interaction with in-kernel generated interrupts to avoid going
through qemu for vhost ec...

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-17 20:39     ` Christoffer Dall
@ 2012-10-17 22:10       ` Paul Mackerras
  -1 siblings, 0 replies; 102+ messages in thread
From: Paul Mackerras @ 2012-10-17 22:10 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Alexander Graf, kvmarm, kvm, Benjamin Herrenschmidt, kvm-ppc

On Wed, Oct 17, 2012 at 04:39:57PM -0400, Christoffer Dall wrote:
> On Wed, Oct 17, 2012 at 4:38 PM, Alexander Graf <agraf@suse.de> wrote:
> > On 10/14/2012 02:04 AM, Christoffer Dall wrote:
> >>
> >> *** warning: this RFC patch series is only compile-tested ***
> >>
> >> We need a way to specify the address at which we expect VMs to access
> >> the interrupt controller (both the emulated distributor and the hardware
> >> interface supporting virtualization).  User space should decide on this
> >> address as user space decides on an emulated board and loads a device
> >> tree describing these details directly to the guest.
> >>
> >> Instead of modifying the copying KVM_CREATE_IRQCHIP to an ARM specific
> >> ioctl with a a highly device specific set of parameters, we try
> >> something slightly more generic, that should fit well with how user
> >> space (read QEMU) first builds the individual devices and later sets up
> >> the emulated platform.
> >
> >
> > Have you talked to Ben about this one? He wanted to design a new, more
> > flexible irqchip API that would work for XICS & MPIC. Maybe there's some
> > room for cooperation here?
> >
> I have not - Ben, what do you have in mind?

I've taken over Ben's patches in this area and I'm currently working
on getting them ready for submission.  So far we only have XICS
emulation, and it is accessed through hypercalls, so there are no
addresses in the create-iochip ioctl argument yet.

What we have so far is a new ioctl:

#define KVM_CREATE_IRQCHIP_ARGS   _IOW(KVMIO,  0xac, struct kvm_irqchip_args)

where kvm_irqchip_args is defined in an arch header and currently
looks like this:

/* for KVM_CAP_SPAPR_XICS */
#define __KVM_HAVE_IRQCHIP_ARGS
struct kvm_irqchip_args {
#define KVM_IRQCHIP_TYPE_ICP	0	/* XICS: ICP (presentation controller) */
#define KVM_IRQCHIP_TYPE_ICS	1	/* XICS: ICS (source controller) */
	__u32 type;
	union {
		/* XICS ICP arguments. This needs to be called once before
		 * creating any VCPU to initialize the main kernel XICS data
		 * structures.
		 */
		struct {
#define KVM_ICP_FLAG_NOREALMODE		0x00000001 /* Disable real mode ICP */
			__u32 flags;
		} icp;

		/* XICS ICS arguments. You can call this for every BUID you
		 * want to make available.
		 *
		 * The BUID is 12 bits, the interrupt number within a BUID
		 * is up to 12 bits as well. The resulting interrupt numbers
		 * exposed to the guest are BUID || IRQ which is 24 bit
		 *
		 * BUID cannot be 0.
		 */
		struct {
			__u32 flags;
			__u16 buid;
			__u16 nr_irqs;
		} ics;
	};
};

With the XICS, there are two types of irqchip: a source controller and
a presentation controller.  There is one presentation controller per
vcpu and typically one source controller per PCI host bridge (a source
controller can manage multiple sources).  The "buid" above is
basically an identifier for a source controller.

So with the above, it would be quite easy to add new types and
arguments for them.

Thoughts?

Paul.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-17 22:10       ` Paul Mackerras
  0 siblings, 0 replies; 102+ messages in thread
From: Paul Mackerras @ 2012-10-17 22:10 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Alexander Graf, kvmarm, kvm, Benjamin Herrenschmidt, kvm-ppc

On Wed, Oct 17, 2012 at 04:39:57PM -0400, Christoffer Dall wrote:
> On Wed, Oct 17, 2012 at 4:38 PM, Alexander Graf <agraf@suse.de> wrote:
> > On 10/14/2012 02:04 AM, Christoffer Dall wrote:
> >>
> >> *** warning: this RFC patch series is only compile-tested ***
> >>
> >> We need a way to specify the address at which we expect VMs to access
> >> the interrupt controller (both the emulated distributor and the hardware
> >> interface supporting virtualization).  User space should decide on this
> >> address as user space decides on an emulated board and loads a device
> >> tree describing these details directly to the guest.
> >>
> >> Instead of modifying the copying KVM_CREATE_IRQCHIP to an ARM specific
> >> ioctl with a a highly device specific set of parameters, we try
> >> something slightly more generic, that should fit well with how user
> >> space (read QEMU) first builds the individual devices and later sets up
> >> the emulated platform.
> >
> >
> > Have you talked to Ben about this one? He wanted to design a new, more
> > flexible irqchip API that would work for XICS & MPIC. Maybe there's some
> > room for cooperation here?
> >
> I have not - Ben, what do you have in mind?

I've taken over Ben's patches in this area and I'm currently working
on getting them ready for submission.  So far we only have XICS
emulation, and it is accessed through hypercalls, so there are no
addresses in the create-iochip ioctl argument yet.

What we have so far is a new ioctl:

#define KVM_CREATE_IRQCHIP_ARGS   _IOW(KVMIO,  0xac, struct kvm_irqchip_args)

where kvm_irqchip_args is defined in an arch header and currently
looks like this:

/* for KVM_CAP_SPAPR_XICS */
#define __KVM_HAVE_IRQCHIP_ARGS
struct kvm_irqchip_args {
#define KVM_IRQCHIP_TYPE_ICP	0	/* XICS: ICP (presentation controller) */
#define KVM_IRQCHIP_TYPE_ICS	1	/* XICS: ICS (source controller) */
	__u32 type;
	union {
		/* XICS ICP arguments. This needs to be called once before
		 * creating any VCPU to initialize the main kernel XICS data
		 * structures.
		 */
		struct {
#define KVM_ICP_FLAG_NOREALMODE		0x00000001 /* Disable real mode ICP */
			__u32 flags;
		} icp;

		/* XICS ICS arguments. You can call this for every BUID you
		 * want to make available.
		 *
		 * The BUID is 12 bits, the interrupt number within a BUID
		 * is up to 12 bits as well. The resulting interrupt numbers
		 * exposed to the guest are BUID || IRQ which is 24 bit
		 *
		 * BUID cannot be 0.
		 */
		struct {
			__u32 flags;
			__u16 buid;
			__u16 nr_irqs;
		} ics;
	};
};

With the XICS, there are two types of irqchip: a source controller and
a presentation controller.  There is one presentation controller per
vcpu and typically one source controller per PCI host bridge (a source
controller can manage multiple sources).  The "buid" above is
basically an identifier for a source controller.

So with the above, it would be quite easy to add new types and
arguments for them.

Thoughts?

Paul.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-17 22:10       ` Paul Mackerras
@ 2012-10-17 23:58         ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-17 23:58 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Christoffer Dall, Alexander Graf, kvmarm, kvm, kvm-ppc

On Thu, 2012-10-18 at 09:10 +1100, Paul Mackerras wrote:
> 
> With the XICS, there are two types of irqchip: a source controller and
> a presentation controller.  There is one presentation controller per
> vcpu and typically one source controller per PCI host bridge (a source
> controller can manage multiple sources).  The "buid" above is
> basically an identifier for a source controller.
> 
> So with the above, it would be quite easy to add new types and
> arguments for them. 

The only possible issue is that afiak, the ioctl number depends on the
structure size, no ? If it does, then we should add some padding to the
union to leave room for new types.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-17 23:58         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-17 23:58 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Christoffer Dall, Alexander Graf, kvmarm, kvm, kvm-ppc

On Thu, 2012-10-18 at 09:10 +1100, Paul Mackerras wrote:
> 
> With the XICS, there are two types of irqchip: a source controller and
> a presentation controller.  There is one presentation controller per
> vcpu and typically one source controller per PCI host bridge (a source
> controller can manage multiple sources).  The "buid" above is
> basically an identifier for a source controller.
> 
> So with the above, it would be quite easy to add new types and
> arguments for them. 

The only possible issue is that afiak, the ioctl number depends on the
structure size, no ? If it does, then we should add some padding to the
union to leave room for new types.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC PATCH 3/3] KVM: ARM: Split KVM_CREATE_IRQCHIP and KVM_INIT_IRQCHIP
  2012-10-14  0:04 ` [RFC PATCH 3/3] KVM: ARM: Split KVM_CREATE_IRQCHIP and KVM_INIT_IRQCHIP Christoffer Dall
@ 2012-10-18 11:15   ` Peter Maydell
  0 siblings, 0 replies; 102+ messages in thread
From: Peter Maydell @ 2012-10-18 11:15 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm, marc.zyngier

On 14 October 2012 01:04, Christoffer Dall
<c.dall@virtualopensystems.com> wrote:
> We need this two factor initialization step to support a sane user space
> initialization of the emulated model.  We simply follow the names of the
> ioctls for the internal vgic implementation steps and check if we have
> everything we need on the host side when we create the vgic and set up
> the rest on init.

With this patch I find that KVM_INIT_IRQCHIP fails EEXIST because it's
hitting the check in kvm_vgic_init() that online_vcpus is 0. I think
this check should be removed now as INIT_IRQCHIP will always happen
late, after we've created vcpus. (The patch puts this check in
kvm_vgic_create() so I guess I'm saying the check should be moved
rather than copied.)

On the other hand, I removed that check, and the host kernel oopses:

Unable to handle kernel paging request at virtual address 78656e75
pgd = dea94c80
[78656e75] *pgd=00000000
Internal error: Oops: 205 [#1] SMP ARM
CPU: 1    Tainted: G        W     (3.6.0+ #87)
PC is at vsnprintf+0x38/0x400
LR is at panic+0x60/0x1dc
pc : [<c01bb7c4>]    lr : [<c037eb98>]    psr: 20000093
sp : de145df0  ip : de145e6c  fp : 00000020
r10: c04c3dd4  r9 : 00000000  r8 : de145e6c
r7 : 00000000  r6 : cfdfdfdf  r5 : 00000000  r4 : 78656e75
r3 : de145e6c  r2 : 78656e75  r1 : 00000400  r0 : c04c3dd4
Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 30c5387d  Table: 9ea94c80  DAC: 00000000
Process qemu-system-arm (pid: 1078, stack limit = 0xde1442f8)
Stack: (0xde145df0 to 0xde146000)
5de0:                                     00030003 c006a07c 00000000 00000001
5e00: c038830c c04c3dd4 00000400 00000000 00000000 00000000 00000001 c04c3db0
5e20: 00000000 cfdfdfdf 00000000 00000000 00000000 00000001 de1642c4 c037eb98
5e40: 00000000 de145e6c fffffce0 00000000 cfdfdfdf 00000000 00000000 00000000
5e60: 00000001 c0022ac8 78656e75 c0022ac8 00000000 00000000 de144000 00000001
5e80: 00000002 00000000 00000000 00010000 9eacc000 00000000 7ffbfeff fffffffe
5ea0: 00000000 de164000 dea96600 00000000 de144000 00000000 00000000 0000ae80
5ec0: b63da6a4 c001ef5c 00000001 00000002 00000000 c00689a4 00000000 c003c794
5ee0: de073ec4 0000000a de400e18 00000000 dea96600 0000000b c000ed08 de144000
5f00: 00000000 c00c36fc fffffffa 00000434 00000000 c04c3618 c04974c0 c006cdf4
5f20: 00000100 3fb69f7c 00000000 00000004 00000084 7fffffff 00000001 00000001
5f40: 00000081 00000000 00000001 007bc068 de144000 00000000 b63da6a4 c0069418
5f60: 00000002 dea96600 00000000 0000ae80 0000000b c000ed08 de144000 00000000
5f80: b63da6a4 c00c3ca8 00000002 00000001 00000000 b63da470 00000000 00000000
5fa0: 00000036 c000eb80 b63da470 00000000 0000000b 0000ae80 00000000 0000ae80
5fc0: b63da470 00000000 00000000 00000036 00000000 00000000 beab7628 b63da6a4
5fe0: 0037e250 b63d9d6c 00290181 b6d9c2ec 600f0010 0000000b dfdfdfcf cfdfdfdf
[<c01bb7c4>] (vsnprintf+0x38/0x400) from [<c037eb98>] (panic+0x60/0x1dc)
[<c037eb98>] (panic+0x60/0x1dc) from [<c0022ac8>]
(kvm_arch_vcpu_ioctl_run+0xe8/0x404)
[<c0022ac8>] (kvm_arch_vcpu_ioctl_run+0xe8/0x404) from [<c001ef5c>]
(kvm_vcpu_ioctl+0x4d0/0x6a0)
[<c001ef5c>] (kvm_vcpu_ioctl+0x4d0/0x6a0) from [<c00c36fc>]
(do_vfs_ioctl+0x84/0x5f8)
[<c00c36fc>] (do_vfs_ioctl+0x84/0x5f8) from [<c00c3ca8>] (sys_ioctl+0x38/0x5c)
[<c00c3ca8>] (sys_ioctl+0x38/0x5c) from [<c000eb80>] (ret_fast_syscall+0x0/0x30)
Code: ba0000e5 e59da014 e3a0b020 e59d1018 (e5d23000)
---[ end trace 1b75b31a2719ed1e ]---

QEMU test code is here:
git://git.linaro.org/people/pmaydell/qemu-arm.git kvm-arm-dev-addr-test

thanks
-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl
  2012-10-14  0:04 ` [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl Christoffer Dall
  2012-10-17 20:21   ` Peter Maydell
@ 2012-10-18 12:20   ` Avi Kivity
  2012-10-19 18:42     ` Christoffer Dall
  1 sibling, 1 reply; 102+ messages in thread
From: Avi Kivity @ 2012-10-18 12:20 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm, peter.maydell, marc.zyngier

On 10/14/2012 02:04 AM, Christoffer Dall wrote:
> Used to initialize the in-kernel interrupt controller. On ARM we need to
> map the virtual generic interrupt controller (vGIC) into Hyp the guest's
> physicall address space so the guest can access the virtual cpu
> interface. This must be done after the IRQ chips is create and after a
> base address has been provided for the emulated platform (patch is
> following), but before the CPU is initally run.
> 
>  
> +4.79 KVM_INIT_IRQCHIP
> +
> +Capability: KVM_CAP_INIT_IRQCHIP
> +Architectures: arm
> +Type: vm ioctl
> +Parameters: none
> +Returns: 0 on success, -1 on error
> +
> +Initialize the in-kernel interrupt controller. On ARM we need to map the
> +virtual generic interrupt controller (vGIC) into Hyp the guest's physicall
> +address space so the guest can access the virtual cpu interface. This must be
> +done after the IRQ chips is create and after a base address has been provided
> +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is
> +initally run.
> +

What enforces this?

Can it be done automatically?  issue a
kvm_make_request(KVM_REQ_INIT_IRQCHIP) on vcpu creation, and you'll
automatically be notified before the first guest entry.

Having an ioctl that must be called after point A but before point B
seems pointless, when A and B are both known.

> +
>  5. The kvm_run structure
>  ------------------------
>  
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index f8c377b..85c76e4 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -195,6 +195,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>  	switch (ext) {
>  #ifdef CONFIG_KVM_ARM_VGIC
>  	case KVM_CAP_IRQCHIP:
> +	case KVM_CAP_INIT_IRQCHIP:

This could be part of a baseline, if you don't envision ever taking it out.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-17 23:58         ` Benjamin Herrenschmidt
@ 2012-10-18 13:48           ` Christoffer Dall
  -1 siblings, 0 replies; 102+ messages in thread
From: Christoffer Dall @ 2012-10-18 13:48 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Paul Mackerras, Alexander Graf, kvmarm, kvm, kvm-ppc, Peter Maydell

On Wed, Oct 17, 2012 at 7:58 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Thu, 2012-10-18 at 09:10 +1100, Paul Mackerras wrote:
>>
>> With the XICS, there are two types of irqchip: a source controller and
>> a presentation controller.  There is one presentation controller per
>> vcpu and typically one source controller per PCI host bridge (a source
>> controller can manage multiple sources).  The "buid" above is
>> basically an identifier for a source controller.
>>
>> So with the above, it would be quite easy to add new types and
>> arguments for them.
>
> The only possible issue is that afiak, the ioctl number depends on the
> structure size, no ? If it does, then we should add some padding to the
> union to leave room for new types.
>
It sounds overall ok to me, however, Peter Maydell pointed out that
QEMU is architected in such a way that creating the IRQ chip happens
before QEMU knows how the chip fits with the model it is emulating and
therefore lacks bits of information that we require for initializing
the irq chip.

Specifically on ARM the information needed is the base address of a
hardware peripheral, which is virtualization aware, and is mapped
directly into the guest's physical address space.

One could argue that's not a concern for designing a good kernel api,
but on the other hand, qemu is already quite a large system on its
own, and we have to be practical.

Another alternative is to just let qemu init the device, later give
the base address and check before each execution of the vcpu whether
the base address has been set and simply return an error if not. This
latter approach just seems horrible and unintuitive to me.

Thoughts?

-Christoffer

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-18 13:48           ` Christoffer Dall
  0 siblings, 0 replies; 102+ messages in thread
From: Christoffer Dall @ 2012-10-18 13:48 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Paul Mackerras, Alexander Graf, kvmarm, kvm, kvm-ppc, Peter Maydell

On Wed, Oct 17, 2012 at 7:58 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Thu, 2012-10-18 at 09:10 +1100, Paul Mackerras wrote:
>>
>> With the XICS, there are two types of irqchip: a source controller and
>> a presentation controller.  There is one presentation controller per
>> vcpu and typically one source controller per PCI host bridge (a source
>> controller can manage multiple sources).  The "buid" above is
>> basically an identifier for a source controller.
>>
>> So with the above, it would be quite easy to add new types and
>> arguments for them.
>
> The only possible issue is that afiak, the ioctl number depends on the
> structure size, no ? If it does, then we should add some padding to the
> union to leave room for new types.
>
It sounds overall ok to me, however, Peter Maydell pointed out that
QEMU is architected in such a way that creating the IRQ chip happens
before QEMU knows how the chip fits with the model it is emulating and
therefore lacks bits of information that we require for initializing
the irq chip.

Specifically on ARM the information needed is the base address of a
hardware peripheral, which is virtualization aware, and is mapped
directly into the guest's physical address space.

One could argue that's not a concern for designing a good kernel api,
but on the other hand, qemu is already quite a large system on its
own, and we have to be practical.

Another alternative is to just let qemu init the device, later give
the base address and check before each execution of the vcpu whether
the base address has been set and simply return an error if not. This
latter approach just seems horrible and unintuitive to me.

Thoughts?

-Christoffer

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-18 13:48           ` Christoffer Dall
@ 2012-10-18 13:49             ` Alexander Graf
  -1 siblings, 0 replies; 102+ messages in thread
From: Alexander Graf @ 2012-10-18 13:49 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Benjamin Herrenschmidt, Paul Mackerras, kvmarm, kvm, kvm-ppc,
	Peter Maydell


On 18.10.2012, at 15:48, Christoffer Dall wrote:

> On Wed, Oct 17, 2012 at 7:58 PM, Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
>> On Thu, 2012-10-18 at 09:10 +1100, Paul Mackerras wrote:
>>> 
>>> With the XICS, there are two types of irqchip: a source controller and
>>> a presentation controller.  There is one presentation controller per
>>> vcpu and typically one source controller per PCI host bridge (a source
>>> controller can manage multiple sources).  The "buid" above is
>>> basically an identifier for a source controller.
>>> 
>>> So with the above, it would be quite easy to add new types and
>>> arguments for them.
>> 
>> The only possible issue is that afiak, the ioctl number depends on the
>> structure size, no ? If it does, then we should add some padding to the
>> union to leave room for new types.
>> 
> It sounds overall ok to me, however, Peter Maydell pointed out that
> QEMU is architected in such a way that creating the IRQ chip happens
> before QEMU knows how the chip fits with the model it is emulating and
> therefore lacks bits of information that we require for initializing
> the irq chip.
> 
> Specifically on ARM the information needed is the base address of a
> hardware peripheral, which is virtualization aware, and is mapped
> directly into the guest's physical address space.
> 
> One could argue that's not a concern for designing a good kernel api,
> but on the other hand, qemu is already quite a large system on its
> own, and we have to be practical.
> 
> Another alternative is to just let qemu init the device, later give
> the base address and check before each execution of the vcpu whether
> the base address has been set and simply return an error if not. This
> latter approach just seems horrible and unintuitive to me.
> 
> Thoughts?

Wasn't there some "sane" field in the vcpu structs that you can hijack to make sure you only ever VCPU_RUN when the VM is complete?


Alex

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-18 13:49             ` Alexander Graf
  0 siblings, 0 replies; 102+ messages in thread
From: Alexander Graf @ 2012-10-18 13:49 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Benjamin Herrenschmidt, Paul Mackerras, kvmarm, kvm, kvm-ppc,
	Peter Maydell


On 18.10.2012, at 15:48, Christoffer Dall wrote:

> On Wed, Oct 17, 2012 at 7:58 PM, Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
>> On Thu, 2012-10-18 at 09:10 +1100, Paul Mackerras wrote:
>>> 
>>> With the XICS, there are two types of irqchip: a source controller and
>>> a presentation controller.  There is one presentation controller per
>>> vcpu and typically one source controller per PCI host bridge (a source
>>> controller can manage multiple sources).  The "buid" above is
>>> basically an identifier for a source controller.
>>> 
>>> So with the above, it would be quite easy to add new types and
>>> arguments for them.
>> 
>> The only possible issue is that afiak, the ioctl number depends on the
>> structure size, no ? If it does, then we should add some padding to the
>> union to leave room for new types.
>> 
> It sounds overall ok to me, however, Peter Maydell pointed out that
> QEMU is architected in such a way that creating the IRQ chip happens
> before QEMU knows how the chip fits with the model it is emulating and
> therefore lacks bits of information that we require for initializing
> the irq chip.
> 
> Specifically on ARM the information needed is the base address of a
> hardware peripheral, which is virtualization aware, and is mapped
> directly into the guest's physical address space.
> 
> One could argue that's not a concern for designing a good kernel api,
> but on the other hand, qemu is already quite a large system on its
> own, and we have to be practical.
> 
> Another alternative is to just let qemu init the device, later give
> the base address and check before each execution of the vcpu whether
> the base address has been set and simply return an error if not. This
> latter approach just seems horrible and unintuitive to me.
> 
> Thoughts?

Wasn't there some "sane" field in the vcpu structs that you can hijack to make sure you only ever VCPU_RUN when the VM is complete?


Alex


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-18 13:49             ` Alexander Graf
@ 2012-10-18 15:25               ` Avi Kivity
  -1 siblings, 0 replies; 102+ messages in thread
From: Avi Kivity @ 2012-10-18 15:25 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Christoffer Dall, Benjamin Herrenschmidt, Paul Mackerras, kvmarm,
	kvm, kvm-ppc, Peter Maydell

On 10/18/2012 03:49 PM, Alexander Graf wrote:
> 
> On 18.10.2012, at 15:48, Christoffer Dall wrote:
> 
>> On Wed, Oct 17, 2012 at 7:58 PM, Benjamin Herrenschmidt
>> <benh@kernel.crashing.org> wrote:
>>> On Thu, 2012-10-18 at 09:10 +1100, Paul Mackerras wrote:
>>>> 
>>>> With the XICS, there are two types of irqchip: a source controller and
>>>> a presentation controller.  There is one presentation controller per
>>>> vcpu and typically one source controller per PCI host bridge (a source
>>>> controller can manage multiple sources).  The "buid" above is
>>>> basically an identifier for a source controller.
>>>> 
>>>> So with the above, it would be quite easy to add new types and
>>>> arguments for them.
>>> 
>>> The only possible issue is that afiak, the ioctl number depends on the
>>> structure size, no ? If it does, then we should add some padding to the
>>> union to leave room for new types.
>>> 
>> It sounds overall ok to me, however, Peter Maydell pointed out that
>> QEMU is architected in such a way that creating the IRQ chip happens
>> before QEMU knows how the chip fits with the model it is emulating and
>> therefore lacks bits of information that we require for initializing
>> the irq chip.
>> 
>> Specifically on ARM the information needed is the base address of a
>> hardware peripheral, which is virtualization aware, and is mapped
>> directly into the guest's physical address space.
>> 
>> One could argue that's not a concern for designing a good kernel api,
>> but on the other hand, qemu is already quite a large system on its
>> own, and we have to be practical.
>> 
>> Another alternative is to just let qemu init the device, later give
>> the base address and check before each execution of the vcpu whether
>> the base address has been set and simply return an error if not. This
>> latter approach just seems horrible and unintuitive to me.
>> 
>> Thoughts?
> 
> Wasn't there some "sane" field in the vcpu structs that you can hijack to make sure you only ever VCPU_RUN when the VM is complete?

vcpu->requests is checked on each entry.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-18 15:25               ` Avi Kivity
  0 siblings, 0 replies; 102+ messages in thread
From: Avi Kivity @ 2012-10-18 15:25 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Christoffer Dall, Benjamin Herrenschmidt, Paul Mackerras, kvmarm,
	kvm, kvm-ppc, Peter Maydell

On 10/18/2012 03:49 PM, Alexander Graf wrote:
> 
> On 18.10.2012, at 15:48, Christoffer Dall wrote:
> 
>> On Wed, Oct 17, 2012 at 7:58 PM, Benjamin Herrenschmidt
>> <benh@kernel.crashing.org> wrote:
>>> On Thu, 2012-10-18 at 09:10 +1100, Paul Mackerras wrote:
>>>> 
>>>> With the XICS, there are two types of irqchip: a source controller and
>>>> a presentation controller.  There is one presentation controller per
>>>> vcpu and typically one source controller per PCI host bridge (a source
>>>> controller can manage multiple sources).  The "buid" above is
>>>> basically an identifier for a source controller.
>>>> 
>>>> So with the above, it would be quite easy to add new types and
>>>> arguments for them.
>>> 
>>> The only possible issue is that afiak, the ioctl number depends on the
>>> structure size, no ? If it does, then we should add some padding to the
>>> union to leave room for new types.
>>> 
>> It sounds overall ok to me, however, Peter Maydell pointed out that
>> QEMU is architected in such a way that creating the IRQ chip happens
>> before QEMU knows how the chip fits with the model it is emulating and
>> therefore lacks bits of information that we require for initializing
>> the irq chip.
>> 
>> Specifically on ARM the information needed is the base address of a
>> hardware peripheral, which is virtualization aware, and is mapped
>> directly into the guest's physical address space.
>> 
>> One could argue that's not a concern for designing a good kernel api,
>> but on the other hand, qemu is already quite a large system on its
>> own, and we have to be practical.
>> 
>> Another alternative is to just let qemu init the device, later give
>> the base address and check before each execution of the vcpu whether
>> the base address has been set and simply return an error if not. This
>> latter approach just seems horrible and unintuitive to me.
>> 
>> Thoughts?
> 
> Wasn't there some "sane" field in the vcpu structs that you can hijack to make sure you only ever VCPU_RUN when the VM is complete?

vcpu->requests is checked on each entry.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl
  2012-10-18 12:20   ` Avi Kivity
@ 2012-10-19 18:42     ` Christoffer Dall
  0 siblings, 0 replies; 102+ messages in thread
From: Christoffer Dall @ 2012-10-19 18:42 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvmarm, kvm, peter.maydell, marc.zyngier

On Thu, Oct 18, 2012 at 8:20 AM, Avi Kivity <avi@redhat.com> wrote:
> On 10/14/2012 02:04 AM, Christoffer Dall wrote:
>> Used to initialize the in-kernel interrupt controller. On ARM we need to
>> map the virtual generic interrupt controller (vGIC) into Hyp the guest's
>> physicall address space so the guest can access the virtual cpu
>> interface. This must be done after the IRQ chips is create and after a
>> base address has been provided for the emulated platform (patch is
>> following), but before the CPU is initally run.
>>
>>
>> +4.79 KVM_INIT_IRQCHIP
>> +
>> +Capability: KVM_CAP_INIT_IRQCHIP
>> +Architectures: arm
>> +Type: vm ioctl
>> +Parameters: none
>> +Returns: 0 on success, -1 on error
>> +
>> +Initialize the in-kernel interrupt controller. On ARM we need to map the
>> +virtual generic interrupt controller (vGIC) into Hyp the guest's physicall
>> +address space so the guest can access the virtual cpu interface. This must be
>> +done after the IRQ chips is create and after a base address has been provided
>> +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is
>> +initally run.
>> +
>
> What enforces this?
>
> Can it be done automatically?  issue a
> kvm_make_request(KVM_REQ_INIT_IRQCHIP) on vcpu creation, and you'll
> automatically be notified before the first guest entry.
>
> Having an ioctl that must be called after point A but before point B
> seems pointless, when A and B are both known.
>

I reworked this according to your comments, patches on the way.

thanks for the input.

>> +
>>  5. The kvm_run structure
>>  ------------------------
>>
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index f8c377b..85c76e4 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -195,6 +195,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>>       switch (ext) {
>>  #ifdef CONFIG_KVM_ARM_VGIC
>>       case KVM_CAP_IRQCHIP:
>> +     case KVM_CAP_INIT_IRQCHIP:
>
> This could be part of a baseline, if you don't envision ever taking it out.
>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
  2012-10-17 20:29   ` Peter Maydell
@ 2012-10-19 18:46     ` Christoffer Dall
  2012-10-19 20:24       ` Peter Maydell
  0 siblings, 1 reply; 102+ messages in thread
From: Christoffer Dall @ 2012-10-19 18:46 UTC (permalink / raw)
  To: Peter Maydell; +Cc: kvmarm, kvm, marc.zyngier

On Wed, Oct 17, 2012 at 4:29 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 14 October 2012 01:04, Christoffer Dall
> <c.dall@virtualopensystems.com> wrote:
>> On ARM (and possibly other architectures) some bits are specific to the
>> model being emulated for the guest and user space needs a way to tell
>> the kernel about those bits.  An example is mmio device base addresses,
>> where KVM must know the base address for a given device to properly
>> emulate mmio accesses within a certain address range or directly map a
>> device with virtualiation extensions into the guest address space.
>>
>> We try to make this API slightly more generic than for our specific use,
>> but so far only the VGIC uses this feature.
>>
>> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
>> ---
>>  Documentation/virtual/kvm/api.txt |   30 ++++++++++++++++++++++++++++++
>>  arch/arm/include/asm/kvm.h        |   13 +++++++++++++
>>  arch/arm/include/asm/kvm_mmu.h    |    1 +
>>  arch/arm/include/asm/kvm_vgic.h   |    6 ++++++
>>  arch/arm/kvm/arm.c                |   31 ++++++++++++++++++++++++++++++-
>>  arch/arm/kvm/vgic.c               |   34 +++++++++++++++++++++++++++++++---
>>  include/linux/kvm.h               |    8 ++++++++
>>  7 files changed, 119 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index 26e953d..30ddcac 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -2118,6 +2118,36 @@ for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is
>>  initally run.
>>
>>
>> +4.80 KVM_SET_DEVICE_ADDRESS
>> +
>> +Capability: KVM_CAP_SET_DEVICE_ADDRESS
>> +Architectures: arm
>> +Type: vm ioctl
>> +Parameters: struct kvm_device_address (in)
>> +Returns: 0 on success, -1 on error
>> +Errors:
>> +  ENODEV: The device id is unknwown
>
> "unknown"
>
>> +  ENXIO:  Device not supported in configuration
>
> "in this configuration" ? (I'm guessing this is for "you tried to
> map a GIC when this CPU doesn't have a GIC" and similar errors?)
>
>> +  E2BIG:  Address outside of guest physical address space
>
> I would say "outside" rather than "outside of" here.
>
>> +
>> +struct kvm_device_address {
>> +       __u32 id;
>> +       __u64 addr;
>> +};
>> +
>> +Specify a device address in the guest's physical address space where guests
>> +can access emulated or directly exposed devices, which the host kernel needs
>> +to know about. The id field is an architecture specific identifier for a
>> +specific device.
>> +
>> +ARM divides the id field into two parts, a device ID and an address type id
>
> We should be consistent about whether ID is capitalised or not.
>

indeed

>> +specific to the individual device.
>> +
>> +  bits:  | 31    ...    16 | 15    ...    0 |
>> +  field: |     device id   |  addr type id  |
>
> This doesn't say whether userspace is allowed to make this ioctl
> multiple times for the same device. This could be any of:
>  * undefined behaviour
>  * second call fails with some errno
>  * second call overrides first one
>

I added an error condition EEXIST, but since this is trying to not be
arm-vgic specific this is really up to the individual device - maybe
we can have some polymorphic device that moves around later.

> It also doesn't say that you're supposed to call this after CREATE
> and before INIT of the irqchip. (Nor does it say what happens if
> you call it at some other time.)
>

same non-device specific argument as above.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
  2012-10-19 18:46     ` Christoffer Dall
@ 2012-10-19 20:24       ` Peter Maydell
  2012-10-19 20:27         ` Christoffer Dall
  0 siblings, 1 reply; 102+ messages in thread
From: Peter Maydell @ 2012-10-19 20:24 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm, marc.zyngier

On 19 October 2012 19:46, Christoffer Dall
<c.dall@virtualopensystems.com> wrote:
> On Wed, Oct 17, 2012 at 4:29 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>> This doesn't say whether userspace is allowed to make this ioctl
>> multiple times for the same device. This could be any of:
>>  * undefined behaviour
>>  * second call fails with some errno
>>  * second call overrides first one
>>
>
> I added an error condition EEXIST, but since this is trying to not be
> arm-vgic specific this is really up to the individual device - maybe
> we can have some polymorphic device that moves around later.
>
>> It also doesn't say that you're supposed to call this after CREATE
>> and before INIT of the irqchip. (Nor does it say what happens if
>> you call it at some other time.)
>>
>
> same non-device specific argument as above.

We could have a section in the docs that says "On ARM platforms
there are devices X and Y and they have such-and-such properties
and requirements" [and other devices later can have further docs
as appropriate].

-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
  2012-10-19 20:24       ` Peter Maydell
@ 2012-10-19 20:27         ` Christoffer Dall
  2012-10-19 20:33           ` Christoffer Dall
  0 siblings, 1 reply; 102+ messages in thread
From: Christoffer Dall @ 2012-10-19 20:27 UTC (permalink / raw)
  To: Peter Maydell; +Cc: kvmarm, kvm, marc.zyngier

On Fri, Oct 19, 2012 at 4:24 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 19 October 2012 19:46, Christoffer Dall
> <c.dall@virtualopensystems.com> wrote:
>> On Wed, Oct 17, 2012 at 4:29 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>>> This doesn't say whether userspace is allowed to make this ioctl
>>> multiple times for the same device. This could be any of:
>>>  * undefined behaviour
>>>  * second call fails with some errno
>>>  * second call overrides first one
>>>
>>
>> I added an error condition EEXIST, but since this is trying to not be
>> arm-vgic specific this is really up to the individual device - maybe
>> we can have some polymorphic device that moves around later.
>>
>>> It also doesn't say that you're supposed to call this after CREATE
>>> and before INIT of the irqchip. (Nor does it say what happens if
>>> you call it at some other time.)
>>>
>>
>> same non-device specific argument as above.
>
> We could have a section in the docs that says "On ARM platforms
> there are devices X and Y and they have such-and-such properties
> and requirements" [and other devices later can have further docs
> as appropriate].
>
sure, I can add that.

-Christoffer

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
  2012-10-19 20:27         ` Christoffer Dall
@ 2012-10-19 20:33           ` Christoffer Dall
  0 siblings, 0 replies; 102+ messages in thread
From: Christoffer Dall @ 2012-10-19 20:33 UTC (permalink / raw)
  To: Peter Maydell; +Cc: kvmarm, kvm, marc.zyngier

On Fri, Oct 19, 2012 at 4:27 PM, Christoffer Dall
<c.dall@virtualopensystems.com> wrote:
> On Fri, Oct 19, 2012 at 4:24 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>> On 19 October 2012 19:46, Christoffer Dall
>> <c.dall@virtualopensystems.com> wrote:
>>> On Wed, Oct 17, 2012 at 4:29 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>>>> This doesn't say whether userspace is allowed to make this ioctl
>>>> multiple times for the same device. This could be any of:
>>>>  * undefined behaviour
>>>>  * second call fails with some errno
>>>>  * second call overrides first one
>>>>
>>>
>>> I added an error condition EEXIST, but since this is trying to not be
>>> arm-vgic specific this is really up to the individual device - maybe
>>> we can have some polymorphic device that moves around later.
>>>
>>>> It also doesn't say that you're supposed to call this after CREATE
>>>> and before INIT of the irqchip. (Nor does it say what happens if
>>>> you call it at some other time.)
>>>>
>>>
>>> same non-device specific argument as above.
>>
>> We could have a section in the docs that says "On ARM platforms
>> there are devices X and Y and they have such-and-such properties
>> and requirements" [and other devices later can have further docs
>> as appropriate].
>>


diff --git a/Documentation/virtual/kvm/api.txt
b/Documentation/virtual/kvm/api.txt
index 65aacc5..1380885 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2131,6 +2131,12 @@ specific to the individual device.
   bits:  | 31    ...    16 | 15    ...    0 |
   field: |     device id   |  addr type id  |

+ARM currently only require this when using the in-kernel GIC support for the
+hardware vGIC features, using KVM_ARM_DEVICE_VGIC_V2 as the device id.  When
+setting the base address for the guest's mapping of the vGIC virtual CPU
+and distributor interface, the ioctl must be called after calling
+KVM_CREATE_IRQCHIP, but before calling KVM_RUN on any of the VCPUs.  Calling
+this ioctl twice for any of the base addresses will return -EEXIST.


 5. The kvm_run structure

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-18 13:48           ` Christoffer Dall
@ 2012-10-23 10:48             ` Jan Kiszka
  -1 siblings, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-23 10:48 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Alexander Graf, kvmarm,
	kvm, kvm-ppc, Peter Maydell

On 2012-10-18 15:48, Christoffer Dall wrote:
> On Wed, Oct 17, 2012 at 7:58 PM, Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
>> On Thu, 2012-10-18 at 09:10 +1100, Paul Mackerras wrote:
>>>
>>> With the XICS, there are two types of irqchip: a source controller and
>>> a presentation controller.  There is one presentation controller per
>>> vcpu and typically one source controller per PCI host bridge (a source
>>> controller can manage multiple sources).  The "buid" above is
>>> basically an identifier for a source controller.
>>>
>>> So with the above, it would be quite easy to add new types and
>>> arguments for them.
>>
>> The only possible issue is that afiak, the ioctl number depends on the
>> structure size, no ? If it does, then we should add some padding to the
>> union to leave room for new types.
>>
> It sounds overall ok to me, however, Peter Maydell pointed out that
> QEMU is architected in such a way that creating the IRQ chip happens
> before QEMU knows how the chip fits with the model it is emulating and
> therefore lacks bits of information that we require for initializing
> the irq chip.
> 
> Specifically on ARM the information needed is the base address of a
> hardware peripheral, which is virtualization aware, and is mapped
> directly into the guest's physical address space.
> 
> One could argue that's not a concern for designing a good kernel api,
> but on the other hand, qemu is already quite a large system on its
> own, and we have to be practical.
> 
> Another alternative is to just let qemu init the device, later give
> the base address and check before each execution of the vcpu whether
> the base address has been set and simply return an error if not. This
> latter approach just seems horrible and unintuitive to me.
> 
> Thoughts?

The current irqchip API is like this:

KVM_CREATE_IRQCHIP (without any parameters)
...
KVM_CREATE_VCPU
KVM_SET_IRQCHIP (or the other way around)
...
KVM_RUN

The arguments you cannot pass via KVM_CREATE_IRQCHIP - which is more
like a "Hey, there will be an IRQ chip!" - could be passed via
KVM_SET_IRQCHIP (it has 512 bytes space). Provided there are sane
configuration defaults, at least after KVM_CREATE_VCPU, KVM_SET_IRQCHIP
becomes optional. Then you don't need the a check on KVM_RUN.

What do we need in addition to this in any of the non-x86 archs?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-23 10:48             ` Jan Kiszka
  0 siblings, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-23 10:48 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Alexander Graf, kvmarm,
	kvm, kvm-ppc, Peter Maydell

On 2012-10-18 15:48, Christoffer Dall wrote:
> On Wed, Oct 17, 2012 at 7:58 PM, Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
>> On Thu, 2012-10-18 at 09:10 +1100, Paul Mackerras wrote:
>>>
>>> With the XICS, there are two types of irqchip: a source controller and
>>> a presentation controller.  There is one presentation controller per
>>> vcpu and typically one source controller per PCI host bridge (a source
>>> controller can manage multiple sources).  The "buid" above is
>>> basically an identifier for a source controller.
>>>
>>> So with the above, it would be quite easy to add new types and
>>> arguments for them.
>>
>> The only possible issue is that afiak, the ioctl number depends on the
>> structure size, no ? If it does, then we should add some padding to the
>> union to leave room for new types.
>>
> It sounds overall ok to me, however, Peter Maydell pointed out that
> QEMU is architected in such a way that creating the IRQ chip happens
> before QEMU knows how the chip fits with the model it is emulating and
> therefore lacks bits of information that we require for initializing
> the irq chip.
> 
> Specifically on ARM the information needed is the base address of a
> hardware peripheral, which is virtualization aware, and is mapped
> directly into the guest's physical address space.
> 
> One could argue that's not a concern for designing a good kernel api,
> but on the other hand, qemu is already quite a large system on its
> own, and we have to be practical.
> 
> Another alternative is to just let qemu init the device, later give
> the base address and check before each execution of the vcpu whether
> the base address has been set and simply return an error if not. This
> latter approach just seems horrible and unintuitive to me.
> 
> Thoughts?

The current irqchip API is like this:

KVM_CREATE_IRQCHIP (without any parameters)
...
KVM_CREATE_VCPU
KVM_SET_IRQCHIP (or the other way around)
...
KVM_RUN

The arguments you cannot pass via KVM_CREATE_IRQCHIP - which is more
like a "Hey, there will be an IRQ chip!" - could be passed via
KVM_SET_IRQCHIP (it has 512 bytes space). Provided there are sane
configuration defaults, at least after KVM_CREATE_VCPU, KVM_SET_IRQCHIP
becomes optional. Then you don't need the a check on KVM_RUN.

What do we need in addition to this in any of the non-x86 archs?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-23 10:48             ` Jan Kiszka
@ 2012-10-23 10:52               ` Peter Maydell
  -1 siblings, 0 replies; 102+ messages in thread
From: Peter Maydell @ 2012-10-23 10:52 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Christoffer Dall, Benjamin Herrenschmidt, Paul Mackerras,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On 23 October 2012 11:48, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> The current irqchip API is like this:
>
> KVM_CREATE_IRQCHIP (without any parameters)
> ...
> KVM_CREATE_VCPU
> KVM_SET_IRQCHIP (or the other way around)
> ...
> KVM_RUN
>
> The arguments you cannot pass via KVM_CREATE_IRQCHIP - which is more
> like a "Hey, there will be an IRQ chip!" - could be passed via
> KVM_SET_IRQCHIP (it has 512 bytes space). Provided there are sane
> configuration defaults, at least after KVM_CREATE_VCPU, KVM_SET_IRQCHIP
> becomes optional. Then you don't need the a check on KVM_RUN.

KVM_SET_IRQCHIP is the state load ioctl, right (companion to
KVM_GET_IRQCHIP)? It seems like a bit of an abuse to use that
for configuration rather than for migration/sync of state
with userspace...

(For ARM we will just use the ONE_REG ABI for irqchip state
save/load anyway, so KVM_SET_IRQCHIP isn't relevant.)

-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-23 10:52               ` Peter Maydell
  0 siblings, 0 replies; 102+ messages in thread
From: Peter Maydell @ 2012-10-23 10:52 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Christoffer Dall, Benjamin Herrenschmidt, Paul Mackerras,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On 23 October 2012 11:48, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> The current irqchip API is like this:
>
> KVM_CREATE_IRQCHIP (without any parameters)
> ...
> KVM_CREATE_VCPU
> KVM_SET_IRQCHIP (or the other way around)
> ...
> KVM_RUN
>
> The arguments you cannot pass via KVM_CREATE_IRQCHIP - which is more
> like a "Hey, there will be an IRQ chip!" - could be passed via
> KVM_SET_IRQCHIP (it has 512 bytes space). Provided there are sane
> configuration defaults, at least after KVM_CREATE_VCPU, KVM_SET_IRQCHIP
> becomes optional. Then you don't need the a check on KVM_RUN.

KVM_SET_IRQCHIP is the state load ioctl, right (companion to
KVM_GET_IRQCHIP)? It seems like a bit of an abuse to use that
for configuration rather than for migration/sync of state
with userspace...

(For ARM we will just use the ONE_REG ABI for irqchip state
save/load anyway, so KVM_SET_IRQCHIP isn't relevant.)

-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-23 10:52               ` Peter Maydell
@ 2012-10-23 11:00                 ` Jan Kiszka
  -1 siblings, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-23 11:00 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Christoffer Dall, Benjamin Herrenschmidt, Paul Mackerras,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On 2012-10-23 12:52, Peter Maydell wrote:
> On 23 October 2012 11:48, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> The current irqchip API is like this:
>>
>> KVM_CREATE_IRQCHIP (without any parameters)
>> ...
>> KVM_CREATE_VCPU
>> KVM_SET_IRQCHIP (or the other way around)
>> ...
>> KVM_RUN
>>
>> The arguments you cannot pass via KVM_CREATE_IRQCHIP - which is more
>> like a "Hey, there will be an IRQ chip!" - could be passed via
>> KVM_SET_IRQCHIP (it has 512 bytes space). Provided there are sane
>> configuration defaults, at least after KVM_CREATE_VCPU, KVM_SET_IRQCHIP
>> becomes optional. Then you don't need the a check on KVM_RUN.
> 
> KVM_SET_IRQCHIP is the state load ioctl, right (companion to
> KVM_GET_IRQCHIP)? It seems like a bit of an abuse to use that
> for configuration rather than for migration/sync of state
> with userspace...

Depends on how reconfigurable your chip is. x86 also sends the mapping
down this way, which happens to be guest configurable but is practically
static.

> 
> (For ARM we will just use the ONE_REG ABI for irqchip state
> save/load anyway, so KVM_SET_IRQCHIP isn't relevant.)

The less problems you should have using the SET interface to perform
one-time configuration.

BTW, I guess we will regret that one-reg ABI one day and have to
introduce a multi-reg version again for hot-standby, i.e. continuous
state migration. I know we also do this for c86 MSRs - that interface
has the same limitation.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-23 11:00                 ` Jan Kiszka
  0 siblings, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-23 11:00 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Christoffer Dall, Benjamin Herrenschmidt, Paul Mackerras,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On 2012-10-23 12:52, Peter Maydell wrote:
> On 23 October 2012 11:48, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> The current irqchip API is like this:
>>
>> KVM_CREATE_IRQCHIP (without any parameters)
>> ...
>> KVM_CREATE_VCPU
>> KVM_SET_IRQCHIP (or the other way around)
>> ...
>> KVM_RUN
>>
>> The arguments you cannot pass via KVM_CREATE_IRQCHIP - which is more
>> like a "Hey, there will be an IRQ chip!" - could be passed via
>> KVM_SET_IRQCHIP (it has 512 bytes space). Provided there are sane
>> configuration defaults, at least after KVM_CREATE_VCPU, KVM_SET_IRQCHIP
>> becomes optional. Then you don't need the a check on KVM_RUN.
> 
> KVM_SET_IRQCHIP is the state load ioctl, right (companion to
> KVM_GET_IRQCHIP)? It seems like a bit of an abuse to use that
> for configuration rather than for migration/sync of state
> with userspace...

Depends on how reconfigurable your chip is. x86 also sends the mapping
down this way, which happens to be guest configurable but is practically
static.

> 
> (For ARM we will just use the ONE_REG ABI for irqchip state
> save/load anyway, so KVM_SET_IRQCHIP isn't relevant.)

The less problems you should have using the SET interface to perform
one-time configuration.

BTW, I guess we will regret that one-reg ABI one day and have to
introduce a multi-reg version again for hot-standby, i.e. continuous
state migration. I know we also do this for c86 MSRs - that interface
has the same limitation.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-23 11:00                 ` Jan Kiszka
@ 2012-10-23 11:04                   ` Peter Maydell
  -1 siblings, 0 replies; 102+ messages in thread
From: Peter Maydell @ 2012-10-23 11:04 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Christoffer Dall, Benjamin Herrenschmidt, Paul Mackerras,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On 23 October 2012 12:00, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> BTW, I guess we will regret that one-reg ABI one day and have to
> introduce a multi-reg version again for hot-standby, i.e. continuous
> state migration. I know we also do this for c86 MSRs - that interface
> has the same limitation.

The multi-reg ABI idea has been floated, it would be easy enough to add.
We just don't want to tie up getting ARM KVM support in with arguments
over yet another ABI -- ONE_REG is sufficient for our current
purposes.

-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-23 11:04                   ` Peter Maydell
  0 siblings, 0 replies; 102+ messages in thread
From: Peter Maydell @ 2012-10-23 11:04 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Christoffer Dall, Benjamin Herrenschmidt, Paul Mackerras,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On 23 October 2012 12:00, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> BTW, I guess we will regret that one-reg ABI one day and have to
> introduce a multi-reg version again for hot-standby, i.e. continuous
> state migration. I know we also do this for c86 MSRs - that interface
> has the same limitation.

The multi-reg ABI idea has been floated, it would be easy enough to add.
We just don't want to tie up getting ARM KVM support in with arguments
over yet another ABI -- ONE_REG is sufficient for our current
purposes.

-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-23 11:04                   ` Peter Maydell
@ 2012-10-23 11:08                     ` Jan Kiszka
  -1 siblings, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-23 11:08 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Christoffer Dall, Benjamin Herrenschmidt, Paul Mackerras,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On 2012-10-23 13:04, Peter Maydell wrote:
> On 23 October 2012 12:00, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> BTW, I guess we will regret that one-reg ABI one day and have to
>> introduce a multi-reg version again for hot-standby, i.e. continuous
>> state migration. I know we also do this for c86 MSRs - that interface
>> has the same limitation.
> 
> The multi-reg ABI idea has been floated, it would be easy enough to add.
> We just don't want to tie up getting ARM KVM support in with arguments
> over yet another ABI -- ONE_REG is sufficient for our current
> purposes.

Good that the number of IOCTLs we can encode is still large. ;)

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-23 11:08                     ` Jan Kiszka
  0 siblings, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-23 11:08 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Christoffer Dall, Benjamin Herrenschmidt, Paul Mackerras,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On 2012-10-23 13:04, Peter Maydell wrote:
> On 23 October 2012 12:00, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> BTW, I guess we will regret that one-reg ABI one day and have to
>> introduce a multi-reg version again for hot-standby, i.e. continuous
>> state migration. I know we also do this for c86 MSRs - that interface
>> has the same limitation.
> 
> The multi-reg ABI idea has been floated, it would be easy enough to add.
> We just don't want to tie up getting ARM KVM support in with arguments
> over yet another ABI -- ONE_REG is sufficient for our current
> purposes.

Good that the number of IOCTLs we can encode is still large. ;)

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-23 10:48             ` Jan Kiszka
@ 2012-10-24  0:50               ` Paul Mackerras
  -1 siblings, 0 replies; 102+ messages in thread
From: Paul Mackerras @ 2012-10-24  0:50 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Christoffer Dall, Benjamin Herrenschmidt, Alexander Graf, kvmarm,
	kvm, kvm-ppc, Peter Maydell

On Tue, Oct 23, 2012 at 12:48:28PM +0200, Jan Kiszka wrote:

> The current irqchip API is like this:
> 
> KVM_CREATE_IRQCHIP (without any parameters)
> ...
> KVM_CREATE_VCPU
> KVM_SET_IRQCHIP (or the other way around)
> ...
> KVM_RUN
> 
> The arguments you cannot pass via KVM_CREATE_IRQCHIP - which is more
> like a "Hey, there will be an IRQ chip!" - could be passed via
> KVM_SET_IRQCHIP (it has 512 bytes space). Provided there are sane
> configuration defaults, at least after KVM_CREATE_VCPU, KVM_SET_IRQCHIP
> becomes optional. Then you don't need the a check on KVM_RUN.

Interesting.  How many times do you call KVM_CREATE_IRQCHIP per VM?
Just once?

> What do we need in addition to this in any of the non-x86 archs?

What we have with the XICS, and to some extent with the OpenPIC, is a
separation between "source" and "presentation" controllers, with a
message-passing fabric between them.  The source controllers handle
the details of some number of interrupt sources, such as the priority
and destination of each interrupt source, and the presentation
controllers handle the interface to the CPUs, so there is one
presentation controller per CPU.  The presentation controller for a
CPU has registers for the CPU priority, IPI request priority, and
pending interrupt status.

So we could indeed use the existing KVM_CREATE_IRQCHIP to tell KVM to
create a presentation controller per vcpu.  But then how do we tell
KVM how many source controllers we want and how many interrupts each
source controller should handle?  The source controllers are not tied
to any particular vcpu, and a source controller could potentially have
100s of interrupts or more (particularly with MSIs).  Configuration of
each source fits into 64 bits, so if we tried to use KVM_SET_IRQCHIP
for configuring a source controller we'd be limited to 64 interrupts
per source controller.

What I have in my patches to do XICS emulation in the kernel is a new
KVM_CREATE_IRQCHIP_ARGS ioctl, which takes an argument struct with a
type, and for source controllers, an identifying number ("bus unit ID"
or BUID, since that's what the hardware calls it) and the number of
sources.  Then for saving/restoring the presentation controller state
there's a register identifier for the KVM_GET/SET_ONE_REG ioctls, and
for the source controllers there are new KVM_IRQCHIP_GET/SET_SOURCES
ioctls that take an argument struct like this:

struct kvm_irq_sources {
	__u32 start_irq_number;
	__u32 nr_irqs;
	__u64 *irqbuf;
};

OpenPIC also can handle 100s or 1000s of interrupt sources and can
have the sources divided up into blocks (which tends to make it
desirable to have multiple source controllers).  It also has per-CPU
interrupt delivery registers and per-source interrupt source
registers.

So I think all this could be shoehorned into KVM_CREATE/GET/SET_IRQCHIP
for small configurations, but it seems like it would run out of space
for larger configurations.

Paul.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-24  0:50               ` Paul Mackerras
  0 siblings, 0 replies; 102+ messages in thread
From: Paul Mackerras @ 2012-10-24  0:50 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Christoffer Dall, Benjamin Herrenschmidt, Alexander Graf, kvmarm,
	kvm, kvm-ppc, Peter Maydell

On Tue, Oct 23, 2012 at 12:48:28PM +0200, Jan Kiszka wrote:

> The current irqchip API is like this:
> 
> KVM_CREATE_IRQCHIP (without any parameters)
> ...
> KVM_CREATE_VCPU
> KVM_SET_IRQCHIP (or the other way around)
> ...
> KVM_RUN
> 
> The arguments you cannot pass via KVM_CREATE_IRQCHIP - which is more
> like a "Hey, there will be an IRQ chip!" - could be passed via
> KVM_SET_IRQCHIP (it has 512 bytes space). Provided there are sane
> configuration defaults, at least after KVM_CREATE_VCPU, KVM_SET_IRQCHIP
> becomes optional. Then you don't need the a check on KVM_RUN.

Interesting.  How many times do you call KVM_CREATE_IRQCHIP per VM?
Just once?

> What do we need in addition to this in any of the non-x86 archs?

What we have with the XICS, and to some extent with the OpenPIC, is a
separation between "source" and "presentation" controllers, with a
message-passing fabric between them.  The source controllers handle
the details of some number of interrupt sources, such as the priority
and destination of each interrupt source, and the presentation
controllers handle the interface to the CPUs, so there is one
presentation controller per CPU.  The presentation controller for a
CPU has registers for the CPU priority, IPI request priority, and
pending interrupt status.

So we could indeed use the existing KVM_CREATE_IRQCHIP to tell KVM to
create a presentation controller per vcpu.  But then how do we tell
KVM how many source controllers we want and how many interrupts each
source controller should handle?  The source controllers are not tied
to any particular vcpu, and a source controller could potentially have
100s of interrupts or more (particularly with MSIs).  Configuration of
each source fits into 64 bits, so if we tried to use KVM_SET_IRQCHIP
for configuring a source controller we'd be limited to 64 interrupts
per source controller.

What I have in my patches to do XICS emulation in the kernel is a new
KVM_CREATE_IRQCHIP_ARGS ioctl, which takes an argument struct with a
type, and for source controllers, an identifying number ("bus unit ID"
or BUID, since that's what the hardware calls it) and the number of
sources.  Then for saving/restoring the presentation controller state
there's a register identifier for the KVM_GET/SET_ONE_REG ioctls, and
for the source controllers there are new KVM_IRQCHIP_GET/SET_SOURCES
ioctls that take an argument struct like this:

struct kvm_irq_sources {
	__u32 start_irq_number;
	__u32 nr_irqs;
	__u64 *irqbuf;
};

OpenPIC also can handle 100s or 1000s of interrupt sources and can
have the sources divided up into blocks (which tends to make it
desirable to have multiple source controllers).  It also has per-CPU
interrupt delivery registers and per-source interrupt source
registers.

So I think all this could be shoehorned into KVM_CREATE/GET/SET_IRQCHIP
for small configurations, but it seems like it would run out of space
for larger configurations.

Paul.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-24  0:50               ` Paul Mackerras
@ 2012-10-25 11:57                 ` Jan Kiszka
  -1 siblings, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-25 11:57 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Christoffer Dall, Benjamin Herrenschmidt, Alexander Graf, kvmarm,
	kvm, kvm-ppc, Peter Maydell

On 2012-10-24 02:50, Paul Mackerras wrote:
> On Tue, Oct 23, 2012 at 12:48:28PM +0200, Jan Kiszka wrote:
> 
>> The current irqchip API is like this:
>>
>> KVM_CREATE_IRQCHIP (without any parameters)
>> ...
>> KVM_CREATE_VCPU
>> KVM_SET_IRQCHIP (or the other way around)
>> ...
>> KVM_RUN
>>
>> The arguments you cannot pass via KVM_CREATE_IRQCHIP - which is more
>> like a "Hey, there will be an IRQ chip!" - could be passed via
>> KVM_SET_IRQCHIP (it has 512 bytes space). Provided there are sane
>> configuration defaults, at least after KVM_CREATE_VCPU, KVM_SET_IRQCHIP
>> becomes optional. Then you don't need the a check on KVM_RUN.
> 
> Interesting.  How many times do you call KVM_CREATE_IRQCHIP per VM?
> Just once?

Yes, just once. The model is that we switch between user space and
kernel space emulation (or hardware-assisted virtualization) of the
irqchip(s).

> 
>> What do we need in addition to this in any of the non-x86 archs?
> 
> What we have with the XICS, and to some extent with the OpenPIC, is a
> separation between "source" and "presentation" controllers, with a
> message-passing fabric between them.  The source controllers handle
> the details of some number of interrupt sources, such as the priority
> and destination of each interrupt source, and the presentation
> controllers handle the interface to the CPUs, so there is one
> presentation controller per CPU.  The presentation controller for a
> CPU has registers for the CPU priority, IPI request priority, and
> pending interrupt status.
> 
> So we could indeed use the existing KVM_CREATE_IRQCHIP to tell KVM to
> create a presentation controller per vcpu.  But then how do we tell
> KVM how many source controllers we want and how many interrupts each
> source controller should handle?  The source controllers are not tied
> to any particular vcpu, and a source controller could potentially have
> 100s of interrupts or more (particularly with MSIs).  Configuration of
> each source fits into 64 bits, so if we tried to use KVM_SET_IRQCHIP
> for configuring a source controller we'd be limited to 64 interrupts
> per source controller.
> 
> What I have in my patches to do XICS emulation in the kernel is a new
> KVM_CREATE_IRQCHIP_ARGS ioctl, which takes an argument struct with a
> type, and for source controllers, an identifying number ("bus unit ID"
> or BUID, since that's what the hardware calls it) and the number of
> sources.  Then for saving/restoring the presentation controller state
> there's a register identifier for the KVM_GET/SET_ONE_REG ioctls, and
> for the source controllers there are new KVM_IRQCHIP_GET/SET_SOURCES
> ioctls that take an argument struct like this:
> 
> struct kvm_irq_sources {
> 	__u32 start_irq_number;
> 	__u32 nr_irqs;
> 	__u64 *irqbuf;
> };
> 
> OpenPIC also can handle 100s or 1000s of interrupt sources and can
> have the sources divided up into blocks (which tends to make it
> desirable to have multiple source controllers).  It also has per-CPU
> interrupt delivery registers and per-source interrupt source
> registers.
> 
> So I think all this could be shoehorned into KVM_CREATE/GET/SET_IRQCHIP
> for small configurations, but it seems like it would run out of space
> for larger configurations.

Our architectures are not that different. We'll have the same challenge
on x86 one day as well as there can be several IOAPICs (source
controllers), not just one as today. Those should be addressed via
chip_id of struct kvm_irqchip (we have a 32-bit address space there).

Also there the question is when to instantiate the chips. Without adding
another IOCTL, they could be created on first SET_IRQCHIP. For Power,
the number of IRQ lines can become a set-once field in the source
controller state, i.e. must never be written twice with different values.

But, of course, some KVM_CREATE_IRQCHIP[2|_ARGS] that takes extra
arguments and specifies those details is also be an option. There the
question is how often it should be called: once with a list of all
necessary parameters or multiple times as in your model. As I'd like to
see a new IOCTL being able to replace the old one (though we will still
support it for older user space, of course), I'm leaning more toward the
first option.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-25 11:57                 ` Jan Kiszka
  0 siblings, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-25 11:57 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Christoffer Dall, Benjamin Herrenschmidt, Alexander Graf, kvmarm,
	kvm, kvm-ppc, Peter Maydell

On 2012-10-24 02:50, Paul Mackerras wrote:
> On Tue, Oct 23, 2012 at 12:48:28PM +0200, Jan Kiszka wrote:
> 
>> The current irqchip API is like this:
>>
>> KVM_CREATE_IRQCHIP (without any parameters)
>> ...
>> KVM_CREATE_VCPU
>> KVM_SET_IRQCHIP (or the other way around)
>> ...
>> KVM_RUN
>>
>> The arguments you cannot pass via KVM_CREATE_IRQCHIP - which is more
>> like a "Hey, there will be an IRQ chip!" - could be passed via
>> KVM_SET_IRQCHIP (it has 512 bytes space). Provided there are sane
>> configuration defaults, at least after KVM_CREATE_VCPU, KVM_SET_IRQCHIP
>> becomes optional. Then you don't need the a check on KVM_RUN.
> 
> Interesting.  How many times do you call KVM_CREATE_IRQCHIP per VM?
> Just once?

Yes, just once. The model is that we switch between user space and
kernel space emulation (or hardware-assisted virtualization) of the
irqchip(s).

> 
>> What do we need in addition to this in any of the non-x86 archs?
> 
> What we have with the XICS, and to some extent with the OpenPIC, is a
> separation between "source" and "presentation" controllers, with a
> message-passing fabric between them.  The source controllers handle
> the details of some number of interrupt sources, such as the priority
> and destination of each interrupt source, and the presentation
> controllers handle the interface to the CPUs, so there is one
> presentation controller per CPU.  The presentation controller for a
> CPU has registers for the CPU priority, IPI request priority, and
> pending interrupt status.
> 
> So we could indeed use the existing KVM_CREATE_IRQCHIP to tell KVM to
> create a presentation controller per vcpu.  But then how do we tell
> KVM how many source controllers we want and how many interrupts each
> source controller should handle?  The source controllers are not tied
> to any particular vcpu, and a source controller could potentially have
> 100s of interrupts or more (particularly with MSIs).  Configuration of
> each source fits into 64 bits, so if we tried to use KVM_SET_IRQCHIP
> for configuring a source controller we'd be limited to 64 interrupts
> per source controller.
> 
> What I have in my patches to do XICS emulation in the kernel is a new
> KVM_CREATE_IRQCHIP_ARGS ioctl, which takes an argument struct with a
> type, and for source controllers, an identifying number ("bus unit ID"
> or BUID, since that's what the hardware calls it) and the number of
> sources.  Then for saving/restoring the presentation controller state
> there's a register identifier for the KVM_GET/SET_ONE_REG ioctls, and
> for the source controllers there are new KVM_IRQCHIP_GET/SET_SOURCES
> ioctls that take an argument struct like this:
> 
> struct kvm_irq_sources {
> 	__u32 start_irq_number;
> 	__u32 nr_irqs;
> 	__u64 *irqbuf;
> };
> 
> OpenPIC also can handle 100s or 1000s of interrupt sources and can
> have the sources divided up into blocks (which tends to make it
> desirable to have multiple source controllers).  It also has per-CPU
> interrupt delivery registers and per-source interrupt source
> registers.
> 
> So I think all this could be shoehorned into KVM_CREATE/GET/SET_IRQCHIP
> for small configurations, but it seems like it would run out of space
> for larger configurations.

Our architectures are not that different. We'll have the same challenge
on x86 one day as well as there can be several IOAPICs (source
controllers), not just one as today. Those should be addressed via
chip_id of struct kvm_irqchip (we have a 32-bit address space there).

Also there the question is when to instantiate the chips. Without adding
another IOCTL, they could be created on first SET_IRQCHIP. For Power,
the number of IRQ lines can become a set-once field in the source
controller state, i.e. must never be written twice with different values.

But, of course, some KVM_CREATE_IRQCHIP[2|_ARGS] that takes extra
arguments and specifies those details is also be an option. There the
question is how often it should be called: once with a list of all
necessary parameters or multiple times as in your model. As I'd like to
see a new IOCTL being able to replace the old one (though we will still
support it for older user space, of course), I'm leaning more toward the
first option.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-24  0:50               ` Paul Mackerras
@ 2012-10-25 16:14                 ` Paolo Bonzini
  -1 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-25 16:14 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Jan Kiszka, Christoffer Dall, Benjamin Herrenschmidt,
	Alexander Graf, kvmarm, kvm, kvm-ppc, Peter Maydell

Il 24/10/2012 02:50, Paul Mackerras ha scritto:
> So we could indeed use the existing KVM_CREATE_IRQCHIP to tell KVM to
> create a presentation controller per vcpu.  But then how do we tell
> KVM how many source controllers we want and how many interrupts each
> source controller should handle?  The source controllers are not tied
> to any particular vcpu, and a source controller could potentially have
> 100s of interrupts or more (particularly with MSIs).  Configuration of
> each source fits into 64 bits, so if we tried to use KVM_SET_IRQCHIP
> for configuring a source controller we'd be limited to 64 interrupts
> per source controller.

As Jan said, there's more than a few similarities between the x86 and
PPC model.

Taking inspiration from the x86 API, configuring the source controller
seems to be more of a task for KVM_SET_GSI_ROUTING (a very x86-centric
name, I must admit).

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-25 16:14                 ` Paolo Bonzini
  0 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-25 16:14 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Jan Kiszka, Christoffer Dall, Benjamin Herrenschmidt,
	Alexander Graf, kvmarm, kvm, kvm-ppc, Peter Maydell

Il 24/10/2012 02:50, Paul Mackerras ha scritto:
> So we could indeed use the existing KVM_CREATE_IRQCHIP to tell KVM to
> create a presentation controller per vcpu.  But then how do we tell
> KVM how many source controllers we want and how many interrupts each
> source controller should handle?  The source controllers are not tied
> to any particular vcpu, and a source controller could potentially have
> 100s of interrupts or more (particularly with MSIs).  Configuration of
> each source fits into 64 bits, so if we tried to use KVM_SET_IRQCHIP
> for configuring a source controller we'd be limited to 64 interrupts
> per source controller.

As Jan said, there's more than a few similarities between the x86 and
PPC model.

Taking inspiration from the x86 API, configuring the source controller
seems to be more of a task for KVM_SET_GSI_ROUTING (a very x86-centric
name, I must admit).

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-25 16:14                 ` Paolo Bonzini
@ 2012-10-25 16:32                   ` Jan Kiszka
  -1 siblings, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-25 16:32 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Christoffer Dall, Benjamin Herrenschmidt,
	Alexander Graf, kvmarm, kvm, kvm-ppc, Peter Maydell

On 2012-10-25 18:14, Paolo Bonzini wrote:
> Il 24/10/2012 02:50, Paul Mackerras ha scritto:
>> So we could indeed use the existing KVM_CREATE_IRQCHIP to tell KVM to
>> create a presentation controller per vcpu.  But then how do we tell
>> KVM how many source controllers we want and how many interrupts each
>> source controller should handle?  The source controllers are not tied
>> to any particular vcpu, and a source controller could potentially have
>> 100s of interrupts or more (particularly with MSIs).  Configuration of
>> each source fits into 64 bits, so if we tried to use KVM_SET_IRQCHIP
>> for configuring a source controller we'd be limited to 64 interrupts
>> per source controller.
> 
> As Jan said, there's more than a few similarities between the x86 and
> PPC model.
> 
> Taking inspiration from the x86 API, configuring the source controller
> seems to be more of a task for KVM_SET_GSI_ROUTING (a very x86-centric
> name, I must admit).

For wiring things together, I agree. But the IOAPIC has a fixed number
of input lines per chip, Power needs to configure them. I don't think
deriving this from addressed lines is the best solution. Some lines may
be configured (too) late, when the chip should have defined its
configuration already.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-25 16:32                   ` Jan Kiszka
  0 siblings, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-25 16:32 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Christoffer Dall, Benjamin Herrenschmidt,
	Alexander Graf, kvmarm, kvm, kvm-ppc, Peter Maydell

On 2012-10-25 18:14, Paolo Bonzini wrote:
> Il 24/10/2012 02:50, Paul Mackerras ha scritto:
>> So we could indeed use the existing KVM_CREATE_IRQCHIP to tell KVM to
>> create a presentation controller per vcpu.  But then how do we tell
>> KVM how many source controllers we want and how many interrupts each
>> source controller should handle?  The source controllers are not tied
>> to any particular vcpu, and a source controller could potentially have
>> 100s of interrupts or more (particularly with MSIs).  Configuration of
>> each source fits into 64 bits, so if we tried to use KVM_SET_IRQCHIP
>> for configuring a source controller we'd be limited to 64 interrupts
>> per source controller.
> 
> As Jan said, there's more than a few similarities between the x86 and
> PPC model.
> 
> Taking inspiration from the x86 API, configuring the source controller
> seems to be more of a task for KVM_SET_GSI_ROUTING (a very x86-centric
> name, I must admit).

For wiring things together, I agree. But the IOAPIC has a fixed number
of input lines per chip, Power needs to configure them. I don't think
deriving this from addressed lines is the best solution. Some lines may
be configured (too) late, when the chip should have defined its
configuration already.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-25 16:32                   ` Jan Kiszka
@ 2012-10-25 18:27                     ` Paolo Bonzini
  -1 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-25 18:27 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Paul Mackerras, Christoffer Dall, Benjamin Herrenschmidt,
	Alexander Graf, kvmarm, kvm, kvm-ppc, Peter Maydell

Il 25/10/2012 18:32, Jan Kiszka ha scritto:
> For wiring things together, I agree. But the IOAPIC has a fixed number
> of input lines per chip, Power needs to configure them. I don't think
> deriving this from addressed lines is the best solution. Some lines may
> be configured (too) late, when the chip should have defined its
> configuration already.

Sure, I was replying to this only: "Configuration of each source fits
into 64 bits, so if we tried to use KVM_SET_IRQCHIP for configuring a
source controller we'd be limited to 64 interrupts per source
controller" (I figure 64 is the size of the KVM_SET_IRQCHIP union / 64
bits per entry).

Probably you do need a variant of KVM_CREATE_IRQCHIP to create the
IOAPICs/source controllers (Paul's proposal at
http://permalink.gmane.org/gmane.comp.emulators.kvm.powerpc.devel/5674
for example), assign chip ids to them, set the number of input lines,
etc. but the configuration should work well with the existing ioctls,
with no limit on the number of sources.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-25 18:27                     ` Paolo Bonzini
  0 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-25 18:27 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Paul Mackerras, Christoffer Dall, Benjamin Herrenschmidt,
	Alexander Graf, kvmarm, kvm, kvm-ppc, Peter Maydell

Il 25/10/2012 18:32, Jan Kiszka ha scritto:
> For wiring things together, I agree. But the IOAPIC has a fixed number
> of input lines per chip, Power needs to configure them. I don't think
> deriving this from addressed lines is the best solution. Some lines may
> be configured (too) late, when the chip should have defined its
> configuration already.

Sure, I was replying to this only: "Configuration of each source fits
into 64 bits, so if we tried to use KVM_SET_IRQCHIP for configuring a
source controller we'd be limited to 64 interrupts per source
controller" (I figure 64 is the size of the KVM_SET_IRQCHIP union / 64
bits per entry).

Probably you do need a variant of KVM_CREATE_IRQCHIP to create the
IOAPICs/source controllers (Paul's proposal at
http://permalink.gmane.org/gmane.comp.emulators.kvm.powerpc.devel/5674
for example), assign chip ids to them, set the number of input lines,
etc. but the configuration should work well with the existing ioctls,
with no limit on the number of sources.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-25 16:14                 ` Paolo Bonzini
@ 2012-10-25 19:39                   ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-25 19:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Jan Kiszka, Christoffer Dall, Alexander Graf,
	kvmarm, kvm, kvm-ppc, Peter Maydell

On Thu, 2012-10-25 at 18:14 +0200, Paolo Bonzini wrote:
> As Jan said, there's more than a few similarities between the x86 and
> PPC model.
> 
> Taking inspiration from the x86 API, configuring the source controller
> seems to be more of a task for KVM_SET_GSI_ROUTING (a very x86-centric
> name, I must admit). 

Very x86 centric name, very x86 centric functionality as well with a
very x86 centric implementation. I don't know what makes you think
there's anything remotely re-usable but I looked at it a while back and
there isn't that I can find.

The "configuration" of the source controller consists of creating it
with a number of sources and a BUID, that's about it. Then there's also
APIs to extract/set its state for migration.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-25 19:39                   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-25 19:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Jan Kiszka, Christoffer Dall, Alexander Graf,
	kvmarm, kvm, kvm-ppc, Peter Maydell

On Thu, 2012-10-25 at 18:14 +0200, Paolo Bonzini wrote:
> As Jan said, there's more than a few similarities between the x86 and
> PPC model.
> 
> Taking inspiration from the x86 API, configuring the source controller
> seems to be more of a task for KVM_SET_GSI_ROUTING (a very x86-centric
> name, I must admit). 

Very x86 centric name, very x86 centric functionality as well with a
very x86 centric implementation. I don't know what makes you think
there's anything remotely re-usable but I looked at it a while back and
there isn't that I can find.

The "configuration" of the source controller consists of creating it
with a number of sources and a BUID, that's about it. Then there's also
APIs to extract/set its state for migration.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-25 18:27                     ` Paolo Bonzini
@ 2012-10-25 19:40                       ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-25 19:40 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf,
	kvmarm, kvm, kvm-ppc, Peter Maydell

On Thu, 2012-10-25 at 20:27 +0200, Paolo Bonzini wrote:
> Probably you do need a variant of KVM_CREATE_IRQCHIP to create the
> IOAPICs/source controllers (Paul's proposal at
> http://permalink.gmane.org/gmane.comp.emulators.kvm.powerpc.devel/5674
> for example), assign chip ids to them, set the number of input lines,
> etc. but the configuration should work well with the existing ioctls,
> with no limit on the number of sources. 

But what do you mean by "configuration" really ? I don't see anything in
common there.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-25 19:40                       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-25 19:40 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf,
	kvmarm, kvm, kvm-ppc, Peter Maydell

On Thu, 2012-10-25 at 20:27 +0200, Paolo Bonzini wrote:
> Probably you do need a variant of KVM_CREATE_IRQCHIP to create the
> IOAPICs/source controllers (Paul's proposal at
> http://permalink.gmane.org/gmane.comp.emulators.kvm.powerpc.devel/5674
> for example), assign chip ids to them, set the number of input lines,
> etc. but the configuration should work well with the existing ioctls,
> with no limit on the number of sources. 

But what do you mean by "configuration" really ? I don't see anything in
common there.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-25 19:40                       ` Benjamin Herrenschmidt
@ 2012-10-26  9:58                         ` Paolo Bonzini
  -1 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-26  9:58 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf,
	kvmarm, kvm, kvm-ppc, Peter Maydell

Il 25/10/2012 21:40, Benjamin Herrenschmidt ha scritto:
>> > Probably you do need a variant of KVM_CREATE_IRQCHIP to create the
>> > IOAPICs/source controllers (Paul's proposal at
>> > http://permalink.gmane.org/gmane.comp.emulators.kvm.powerpc.devel/5674
>> > for example), assign chip ids to them, set the number of input lines,
>> > etc. but the configuration should work well with the existing ioctls,
>> > with no limit on the number of sources. 
> But what do you mean by "configuration" really ? I don't see anything in
> common there.

Wiring which MSI-X interrupts go to which source controllers.  If you
have one source controller per PCI bridge, you need to tell the kernel
the mapping between MSI messages interrupts and PCI bridges, and update
it whenever the MSI masking changes.

The other problem is configuring the redirection table.  If you need >64
sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-26  9:58                         ` Paolo Bonzini
  0 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-26  9:58 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf,
	kvmarm, kvm, kvm-ppc, Peter Maydell

Il 25/10/2012 21:40, Benjamin Herrenschmidt ha scritto:
>> > Probably you do need a variant of KVM_CREATE_IRQCHIP to create the
>> > IOAPICs/source controllers (Paul's proposal at
>> > http://permalink.gmane.org/gmane.comp.emulators.kvm.powerpc.devel/5674
>> > for example), assign chip ids to them, set the number of input lines,
>> > etc. but the configuration should work well with the existing ioctls,
>> > with no limit on the number of sources. 
> But what do you mean by "configuration" really ? I don't see anything in
> common there.

Wiring which MSI-X interrupts go to which source controllers.  If you
have one source controller per PCI bridge, you need to tell the kernel
the mapping between MSI messages interrupts and PCI bridges, and update
it whenever the MSI masking changes.

The other problem is configuring the redirection table.  If you need >64
sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-26  9:58                         ` Paolo Bonzini
@ 2012-10-26 10:09                           ` Peter Maydell
  -1 siblings, 0 replies; 102+ messages in thread
From: Peter Maydell @ 2012-10-26 10:09 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Benjamin Herrenschmidt, Jan Kiszka, Paul Mackerras,
	Christoffer Dall, Alexander Graf, kvmarm, kvm, kvm-ppc

On 26 October 2012 10:58, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Wiring which MSI-X interrupts go to which source controllers.  If you
> have one source controller per PCI bridge, you need to tell the kernel
> the mapping between MSI messages interrupts and PCI bridges, and update
> it whenever the MSI masking changes.
>
> The other problem is configuring the redirection table.  If you need >64
> sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG.

Why would you want an extra ONE_REG-like ioctl? The existing ONE_REG
ioctls have plenty of space in the ID range to allow you to devote
a subsection of it to your irqchip. (This is exactly how the ARM
VGIC save/load is going to work.)

Whether you want to do startup configuration and board wiring via
the same ioctl that handles runtime state save/load/migration is
a different question, of course.

-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-26 10:09                           ` Peter Maydell
  0 siblings, 0 replies; 102+ messages in thread
From: Peter Maydell @ 2012-10-26 10:09 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Benjamin Herrenschmidt, Jan Kiszka, Paul Mackerras,
	Christoffer Dall, Alexander Graf, kvmarm, kvm, kvm-ppc

On 26 October 2012 10:58, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Wiring which MSI-X interrupts go to which source controllers.  If you
> have one source controller per PCI bridge, you need to tell the kernel
> the mapping between MSI messages interrupts and PCI bridges, and update
> it whenever the MSI masking changes.
>
> The other problem is configuring the redirection table.  If you need >64
> sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG.

Why would you want an extra ONE_REG-like ioctl? The existing ONE_REG
ioctls have plenty of space in the ID range to allow you to devote
a subsection of it to your irqchip. (This is exactly how the ARM
VGIC save/load is going to work.)

Whether you want to do startup configuration and board wiring via
the same ioctl that handles runtime state save/load/migration is
a different question, of course.

-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-26 10:09                           ` Peter Maydell
@ 2012-10-26 10:15                             ` Paolo Bonzini
  -1 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-26 10:15 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Benjamin Herrenschmidt, Jan Kiszka, Paul Mackerras,
	Christoffer Dall, Alexander Graf, kvmarm, kvm, kvm-ppc

Il 26/10/2012 12:09, Peter Maydell ha scritto:
>> >
>> > The other problem is configuring the redirection table.  If you need >64
>> > sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG.
> Why would you want an extra ONE_REG-like ioctl? The existing ONE_REG
> ioctls have plenty of space in the ID range to allow you to devote
> a subsection of it to your irqchip. (This is exactly how the ARM
> VGIC save/load is going to work.)

Ok, I stand corrected. :)

> Whether you want to do startup configuration and board wiring via
> the same ioctl that handles runtime state save/load/migration is
> a different question, of course.

QEMU's MSI-X routing is not x86-specific, so it should use the same
KVM_SET_GSI_ROUTING ioctl that x86 uses.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-26 10:15                             ` Paolo Bonzini
  0 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-26 10:15 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Benjamin Herrenschmidt, Jan Kiszka, Paul Mackerras,
	Christoffer Dall, Alexander Graf, kvmarm, kvm, kvm-ppc

Il 26/10/2012 12:09, Peter Maydell ha scritto:
>> >
>> > The other problem is configuring the redirection table.  If you need >64
>> > sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG.
> Why would you want an extra ONE_REG-like ioctl? The existing ONE_REG
> ioctls have plenty of space in the ID range to allow you to devote
> a subsection of it to your irqchip. (This is exactly how the ARM
> VGIC save/load is going to work.)

Ok, I stand corrected. :)

> Whether you want to do startup configuration and board wiring via
> the same ioctl that handles runtime state save/load/migration is
> a different question, of course.

QEMU's MSI-X routing is not x86-specific, so it should use the same
KVM_SET_GSI_ROUTING ioctl that x86 uses.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-26 10:15                             ` Paolo Bonzini
@ 2012-10-26 10:22                               ` Jan Kiszka
  -1 siblings, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-26 10:22 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, Benjamin Herrenschmidt, Paul Mackerras,
	Christoffer Dall, Alexander Graf, kvmarm, kvm, kvm-ppc

On 2012-10-26 12:15, Paolo Bonzini wrote:
> Il 26/10/2012 12:09, Peter Maydell ha scritto:
>>>>
>>>> The other problem is configuring the redirection table.  If you need >64
>>>> sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG.
>> Why would you want an extra ONE_REG-like ioctl? The existing ONE_REG
>> ioctls have plenty of space in the ID range to allow you to devote
>> a subsection of it to your irqchip. (This is exactly how the ARM
>> VGIC save/load is going to work.)
> 
> Ok, I stand corrected. :)
> 
>> Whether you want to do startup configuration and board wiring via
>> the same ioctl that handles runtime state save/load/migration is
>> a different question, of course.
> 
> QEMU's MSI-X routing is not x86-specific, so it should use the same
> KVM_SET_GSI_ROUTING ioctl that x86 uses.

And it's not only MSI[-X]. Most IRQ sources need to be rounted, either
from userspace or from irqfd or from some other in-kernel source to a
specific IRQ controller. That allows to customize things according to a
specific board / SoC emulation.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-26 10:22                               ` Jan Kiszka
  0 siblings, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-26 10:22 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, Benjamin Herrenschmidt, Paul Mackerras,
	Christoffer Dall, Alexander Graf, kvmarm, kvm, kvm-ppc

On 2012-10-26 12:15, Paolo Bonzini wrote:
> Il 26/10/2012 12:09, Peter Maydell ha scritto:
>>>>
>>>> The other problem is configuring the redirection table.  If you need >64
>>>> sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG.
>> Why would you want an extra ONE_REG-like ioctl? The existing ONE_REG
>> ioctls have plenty of space in the ID range to allow you to devote
>> a subsection of it to your irqchip. (This is exactly how the ARM
>> VGIC save/load is going to work.)
> 
> Ok, I stand corrected. :)
> 
>> Whether you want to do startup configuration and board wiring via
>> the same ioctl that handles runtime state save/load/migration is
>> a different question, of course.
> 
> QEMU's MSI-X routing is not x86-specific, so it should use the same
> KVM_SET_GSI_ROUTING ioctl that x86 uses.

And it's not only MSI[-X]. Most IRQ sources need to be rounted, either
from userspace or from irqfd or from some other in-kernel source to a
specific IRQ controller. That allows to customize things according to a
specific board / SoC emulation.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-26  9:58                         ` Paolo Bonzini
@ 2012-10-26 10:37                           ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-26 10:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf,
	kvmarm, kvm, kvm-ppc, Peter Maydell

On Fri, 2012-10-26 at 11:58 +0200, Paolo Bonzini wrote:
> Il 25/10/2012 21:40, Benjamin Herrenschmidt ha scritto:
> >> > Probably you do need a variant of KVM_CREATE_IRQCHIP to create the
> >> > IOAPICs/source controllers (Paul's proposal at
> >> > http://permalink.gmane.org/gmane.comp.emulators.kvm.powerpc.devel/5674
> >> > for example), assign chip ids to them, set the number of input lines,
> >> > etc. but the configuration should work well with the existing ioctls,
> >> > with no limit on the number of sources. 
> > But what do you mean by "configuration" really ? I don't see anything in
> > common there.
> 
> Wiring which MSI-X interrupts go to which source controllers.  If you
> have one source controller per PCI bridge, you need to tell the kernel
> the mapping between MSI messages interrupts and PCI bridges, and update
> it whenever the MSI masking changes.

Not sure I get it. Are you talking in the context of PCI pass-through ?
Each PCI bridge on POWER has its own set of MSIs though for emulated
bridges it's a non-issue, it's all dealt with by qemu, so I'm not sure
what you mean here.

> The other problem is configuring the redirection table.  If you need >64
> sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG.

Well, all of that is totally specific to the IO-APIC design &
limitations as far as I can tell. What is the "redirection table" ?

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-26 10:37                           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-26 10:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf,
	kvmarm, kvm, kvm-ppc, Peter Maydell

On Fri, 2012-10-26 at 11:58 +0200, Paolo Bonzini wrote:
> Il 25/10/2012 21:40, Benjamin Herrenschmidt ha scritto:
> >> > Probably you do need a variant of KVM_CREATE_IRQCHIP to create the
> >> > IOAPICs/source controllers (Paul's proposal at
> >> > http://permalink.gmane.org/gmane.comp.emulators.kvm.powerpc.devel/5674
> >> > for example), assign chip ids to them, set the number of input lines,
> >> > etc. but the configuration should work well with the existing ioctls,
> >> > with no limit on the number of sources. 
> > But what do you mean by "configuration" really ? I don't see anything in
> > common there.
> 
> Wiring which MSI-X interrupts go to which source controllers.  If you
> have one source controller per PCI bridge, you need to tell the kernel
> the mapping between MSI messages interrupts and PCI bridges, and update
> it whenever the MSI masking changes.

Not sure I get it. Are you talking in the context of PCI pass-through ?
Each PCI bridge on POWER has its own set of MSIs though for emulated
bridges it's a non-issue, it's all dealt with by qemu, so I'm not sure
what you mean here.

> The other problem is configuring the redirection table.  If you need >64
> sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG.

Well, all of that is totally specific to the IO-APIC design &
limitations as far as I can tell. What is the "redirection table" ?

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-26 10:37                           ` Benjamin Herrenschmidt
@ 2012-10-26 10:40                             ` Paolo Bonzini
  -1 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-26 10:40 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf,
	kvmarm, kvm, kvm-ppc, Peter Maydell

Il 26/10/2012 12:37, Benjamin Herrenschmidt ha scritto:
>> > Wiring which MSI-X interrupts go to which source controllers.  If you
>> > have one source controller per PCI bridge, you need to tell the kernel
>> > the mapping between MSI messages interrupts and PCI bridges, and update
>> > it whenever the MSI masking changes.
> Not sure I get it. Are you talking in the context of PCI pass-through ?

Not just that, it's also for emulated devices that do MSI-X (virtio-pci
does).

> > The other problem is configuring the redirection table.  If you need >64
> > sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG.
> Well, all of that is totally specific to the IO-APIC design &
> limitations as far as I can tell. What is the "redirection table" ?

The wiring between source and presentation controllers, roughly.  I
suppose that's what Paul referred to when he said there's 64 bits of
config info per source in the source controllers.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-26 10:40                             ` Paolo Bonzini
  0 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-26 10:40 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf,
	kvmarm, kvm, kvm-ppc, Peter Maydell

Il 26/10/2012 12:37, Benjamin Herrenschmidt ha scritto:
>> > Wiring which MSI-X interrupts go to which source controllers.  If you
>> > have one source controller per PCI bridge, you need to tell the kernel
>> > the mapping between MSI messages interrupts and PCI bridges, and update
>> > it whenever the MSI masking changes.
> Not sure I get it. Are you talking in the context of PCI pass-through ?

Not just that, it's also for emulated devices that do MSI-X (virtio-pci
does).

> > The other problem is configuring the redirection table.  If you need >64
> > sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG.
> Well, all of that is totally specific to the IO-APIC design &
> limitations as far as I can tell. What is the "redirection table" ?

The wiring between source and presentation controllers, roughly.  I
suppose that's what Paul referred to when he said there's 64 bits of
config info per source in the source controllers.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-26 10:15                             ` Paolo Bonzini
@ 2012-10-26 10:44                               ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-26 10:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, Jan Kiszka, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On Fri, 2012-10-26 at 12:15 +0200, Paolo Bonzini wrote:
> > Whether you want to do startup configuration and board wiring via
> > the same ioctl that handles runtime state save/load/migration is
> > a different question, of course.
> 
> QEMU's MSI-X routing is not x86-specific, so it should use the same
> KVM_SET_GSI_ROUTING ioctl that x86 uses. 

Well, that's the thing, I haven't managed to figure that out so far, it
looks very x86-specific to me. To begin with there's no such thing as a
"GSI" in our world.

Basically we have a global interrupt number space. Interrupt numbers are
24-bit long quantities. On real HW, some bits are called the "BUID" and
identify a given source controller and some bits are the interrupt
within that source controller but that's fairly flexible and generally
the OS doesn't care about it. The firmware sets up the mappings and
tells us the final numbers via the device-tree.

Under a hypervisor, it's totally virtualized already so we show a flat
24 bit number space to the guest.

MSIs don't work exactly like x86 either. On real HW, we have a different
MSI port per "partitionable endpoint" which are use purely for
validation of access permission. The message itself contain the
interrupt source number within the BUID of the bridge. A given bridge
today can contains up to 256 of these on a P7IOC chip but upcoming stuff
can have thousands. The final interrupt number seen by the OS is thus
just that MSI message in the bottom bits and the BUID in the top bits.

Here too, under a hypervisor, it's all virtualized so qemu just gives 24
bit numbers to the various emulated MSIs as part of the global interrupt
number space.

I'm not sure how any of that would need kernel communication. All we
need is to be able to associate a given global interrupt with an
eventfd.

I might just miss some subtleties here but so far I haven't been able to
figure out how to "shoehorn" our stuff in the very x86-centric existing
interfaces to the kernel APICs. In fact all that code is in a generic
location in kvm but is really x86/ia64 centric and the interfaces seem
to be as well.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-26 10:44                               ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-26 10:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, Jan Kiszka, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On Fri, 2012-10-26 at 12:15 +0200, Paolo Bonzini wrote:
> > Whether you want to do startup configuration and board wiring via
> > the same ioctl that handles runtime state save/load/migration is
> > a different question, of course.
> 
> QEMU's MSI-X routing is not x86-specific, so it should use the same
> KVM_SET_GSI_ROUTING ioctl that x86 uses. 

Well, that's the thing, I haven't managed to figure that out so far, it
looks very x86-specific to me. To begin with there's no such thing as a
"GSI" in our world.

Basically we have a global interrupt number space. Interrupt numbers are
24-bit long quantities. On real HW, some bits are called the "BUID" and
identify a given source controller and some bits are the interrupt
within that source controller but that's fairly flexible and generally
the OS doesn't care about it. The firmware sets up the mappings and
tells us the final numbers via the device-tree.

Under a hypervisor, it's totally virtualized already so we show a flat
24 bit number space to the guest.

MSIs don't work exactly like x86 either. On real HW, we have a different
MSI port per "partitionable endpoint" which are use purely for
validation of access permission. The message itself contain the
interrupt source number within the BUID of the bridge. A given bridge
today can contains up to 256 of these on a P7IOC chip but upcoming stuff
can have thousands. The final interrupt number seen by the OS is thus
just that MSI message in the bottom bits and the BUID in the top bits.

Here too, under a hypervisor, it's all virtualized so qemu just gives 24
bit numbers to the various emulated MSIs as part of the global interrupt
number space.

I'm not sure how any of that would need kernel communication. All we
need is to be able to associate a given global interrupt with an
eventfd.

I might just miss some subtleties here but so far I haven't been able to
figure out how to "shoehorn" our stuff in the very x86-centric existing
interfaces to the kernel APICs. In fact all that code is in a generic
location in kvm but is really x86/ia64 centric and the interfaces seem
to be as well.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-26 10:40                             ` Paolo Bonzini
@ 2012-10-26 10:47                               ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-26 10:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf,
	kvmarm, kvm, kvm-ppc, Peter Maydell

On Fri, 2012-10-26 at 12:40 +0200, Paolo Bonzini wrote:
> Il 26/10/2012 12:37, Benjamin Herrenschmidt ha scritto:
> >> > Wiring which MSI-X interrupts go to which source controllers.  If you
> >> > have one source controller per PCI bridge, you need to tell the kernel
> >> > the mapping between MSI messages interrupts and PCI bridges, and update
> >> > it whenever the MSI masking changes.
> > Not sure I get it. Are you talking in the context of PCI pass-through ?
> 
> Not just that, it's also for emulated devices that do MSI-X (virtio-pci
> does).

Then I really don't get it.

> > > The other problem is configuring the redirection table.  If you need >64
> > > sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG.
> > Well, all of that is totally specific to the IO-APIC design &
> > limitations as far as I can tell. What is the "redirection table" ?
> 
> The wiring between source and presentation controllers, roughly.  I
> suppose that's what Paul referred to when he said there's 64 bits of
> config info per source in the source controllers.

But that's the point. We don't have such "wiring". The interrupt number
space is global. In HW it's via special messages in the fabric. The
firmware configures the various source controllers at boot time by
assigning them a BUID which basically comprises the top bits of the
interrupt number.

Or do you mean the routing configured by the user ? IE. Affinity ? If
yes, then that's indeed what the 64-bit per source is. Each interrupt
source has some state including the configured target presentation
controller (plus associated link info for distributed interrupts), a
priority setting, and some internal state bits that need to be preserved
in the case of migration.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-26 10:47                               ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-26 10:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf,
	kvmarm, kvm, kvm-ppc, Peter Maydell

On Fri, 2012-10-26 at 12:40 +0200, Paolo Bonzini wrote:
> Il 26/10/2012 12:37, Benjamin Herrenschmidt ha scritto:
> >> > Wiring which MSI-X interrupts go to which source controllers.  If you
> >> > have one source controller per PCI bridge, you need to tell the kernel
> >> > the mapping between MSI messages interrupts and PCI bridges, and update
> >> > it whenever the MSI masking changes.
> > Not sure I get it. Are you talking in the context of PCI pass-through ?
> 
> Not just that, it's also for emulated devices that do MSI-X (virtio-pci
> does).

Then I really don't get it.

> > > The other problem is configuring the redirection table.  If you need >64
> > > sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG.
> > Well, all of that is totally specific to the IO-APIC design &
> > limitations as far as I can tell. What is the "redirection table" ?
> 
> The wiring between source and presentation controllers, roughly.  I
> suppose that's what Paul referred to when he said there's 64 bits of
> config info per source in the source controllers.

But that's the point. We don't have such "wiring". The interrupt number
space is global. In HW it's via special messages in the fabric. The
firmware configures the various source controllers at boot time by
assigning them a BUID which basically comprises the top bits of the
interrupt number.

Or do you mean the routing configured by the user ? IE. Affinity ? If
yes, then that's indeed what the 64-bit per source is. Each interrupt
source has some state including the configured target presentation
controller (plus associated link info for distributed interrupts), a
priority setting, and some internal state bits that need to be preserved
in the case of migration.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-26 10:44                               ` Benjamin Herrenschmidt
@ 2012-10-26 11:00                                 ` Jan Kiszka
  -1 siblings, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-26 11:00 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Paolo Bonzini, Peter Maydell, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On 2012-10-26 12:44, Benjamin Herrenschmidt wrote:
> On Fri, 2012-10-26 at 12:15 +0200, Paolo Bonzini wrote:
>>> Whether you want to do startup configuration and board wiring via
>>> the same ioctl that handles runtime state save/load/migration is
>>> a different question, of course.
>>
>> QEMU's MSI-X routing is not x86-specific, so it should use the same
>> KVM_SET_GSI_ROUTING ioctl that x86 uses. 
> 
> Well, that's the thing, I haven't managed to figure that out so far, it
> looks very x86-specific to me. To begin with there's no such thing as a
> "GSI" in our world.
> 
> Basically we have a global interrupt number space. Interrupt numbers are
> 24-bit long quantities. On real HW, some bits are called the "BUID" and
> identify a given source controller and some bits are the interrupt
> within that source controller but that's fairly flexible and generally
> the OS doesn't care about it. The firmware sets up the mappings and
> tells us the final numbers via the device-tree.
> 
> Under a hypervisor, it's totally virtualized already so we show a flat
> 24 bit number space to the guest.
> 
> MSIs don't work exactly like x86 either. On real HW, we have a different
> MSI port per "partitionable endpoint" which are use purely for
> validation of access permission. The message itself contain the
> interrupt source number within the BUID of the bridge. A given bridge
> today can contains up to 256 of these on a P7IOC chip but upcoming stuff
> can have thousands. The final interrupt number seen by the OS is thus
> just that MSI message in the bottom bits and the BUID in the top bits.
> 
> Here too, under a hypervisor, it's all virtualized so qemu just gives 24
> bit numbers to the various emulated MSIs as part of the global interrupt
> number space.
> 
> I'm not sure how any of that would need kernel communication. All we
> need is to be able to associate a given global interrupt with an
> eventfd.

And at latest there you will need the IRQ routing infrastructure of KVM.
It tells KVM which "virtual IRQ" (badly named "GSI") triggers which
event at which input, e.g. a physical IRQ line at some IRQ controller or
a specific message at some MSI receiver. You shouldn't try to invent a
Power wheel here, rather tune the existing one to become more generic.
We could even try to get rid of that unfortunate GSI name (when leaving
aliases behind), though that is cosmetic.

> 
> I might just miss some subtleties here but so far I haven't been able to
> figure out how to "shoehorn" our stuff in the very x86-centric existing
> interfaces to the kernel APICs. In fact all that code is in a generic
> location in kvm but is really x86/ia64 centric and the interfaces seem
> to be as well.

That's not true in general, though you surely find a lot of traces and
still a few concrete x86 bits under virt/kvm.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-26 11:00                                 ` Jan Kiszka
  0 siblings, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-26 11:00 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Paolo Bonzini, Peter Maydell, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On 2012-10-26 12:44, Benjamin Herrenschmidt wrote:
> On Fri, 2012-10-26 at 12:15 +0200, Paolo Bonzini wrote:
>>> Whether you want to do startup configuration and board wiring via
>>> the same ioctl that handles runtime state save/load/migration is
>>> a different question, of course.
>>
>> QEMU's MSI-X routing is not x86-specific, so it should use the same
>> KVM_SET_GSI_ROUTING ioctl that x86 uses. 
> 
> Well, that's the thing, I haven't managed to figure that out so far, it
> looks very x86-specific to me. To begin with there's no such thing as a
> "GSI" in our world.
> 
> Basically we have a global interrupt number space. Interrupt numbers are
> 24-bit long quantities. On real HW, some bits are called the "BUID" and
> identify a given source controller and some bits are the interrupt
> within that source controller but that's fairly flexible and generally
> the OS doesn't care about it. The firmware sets up the mappings and
> tells us the final numbers via the device-tree.
> 
> Under a hypervisor, it's totally virtualized already so we show a flat
> 24 bit number space to the guest.
> 
> MSIs don't work exactly like x86 either. On real HW, we have a different
> MSI port per "partitionable endpoint" which are use purely for
> validation of access permission. The message itself contain the
> interrupt source number within the BUID of the bridge. A given bridge
> today can contains up to 256 of these on a P7IOC chip but upcoming stuff
> can have thousands. The final interrupt number seen by the OS is thus
> just that MSI message in the bottom bits and the BUID in the top bits.
> 
> Here too, under a hypervisor, it's all virtualized so qemu just gives 24
> bit numbers to the various emulated MSIs as part of the global interrupt
> number space.
> 
> I'm not sure how any of that would need kernel communication. All we
> need is to be able to associate a given global interrupt with an
> eventfd.

And at latest there you will need the IRQ routing infrastructure of KVM.
It tells KVM which "virtual IRQ" (badly named "GSI") triggers which
event at which input, e.g. a physical IRQ line at some IRQ controller or
a specific message at some MSI receiver. You shouldn't try to invent a
Power wheel here, rather tune the existing one to become more generic.
We could even try to get rid of that unfortunate GSI name (when leaving
aliases behind), though that is cosmetic.

> 
> I might just miss some subtleties here but so far I haven't been able to
> figure out how to "shoehorn" our stuff in the very x86-centric existing
> interfaces to the kernel APICs. In fact all that code is in a generic
> location in kvm but is really x86/ia64 centric and the interfaces seem
> to be as well.

That's not true in general, though you surely find a lot of traces and
still a few concrete x86 bits under virt/kvm.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-26 11:00                                 ` Jan Kiszka
@ 2012-10-26 11:09                                   ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-26 11:09 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Paolo Bonzini, Peter Maydell, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On Fri, 2012-10-26 at 13:00 +0200, Jan Kiszka wrote:

> And at latest there you will need the IRQ routing infrastructure of KVM.
> It tells KVM which "virtual IRQ" (badly named "GSI") triggers which
> event at which input, e.g. a physical IRQ line at some IRQ controller or
> a specific message at some MSI receiver. You shouldn't try to invent a
> Power wheel here, rather tune the existing one to become more generic.
> We could even try to get rid of that unfortunate GSI name (when leaving
> aliases behind), though that is cosmetic.

Ok, there must be something wrong with me, I still don't understand what
you are talking about.

What "MSI receiver" ? What physical IRQ line are you talking about ? How
is the kernel involved ?

The only cases I can think of are the association between a virtual
interrupt (ie, an interrupt in the guest interrupt number space) and an
in-kernel source for that interrupt, ie, vhost and PCI pass-through
essentially.

Anything else is under qemu control. IE. MSIs or LSIs generated by
emulated devices are just normal interrupt that go through our ioctl to
trigger the in-kernel source with the same number.

I don't see any "routing" happening anywhere in that picture really. The
firmware calls done by the guest to change the target of interrupts
(which CPU/presentation controller to direct a given interrupt to) are
handled entirely in the kernel in platform specific code and update our
internal ICS state.

> > 
> > I might just miss some subtleties here but so far I haven't been able to
> > figure out how to "shoehorn" our stuff in the very x86-centric existing
> > interfaces to the kernel APICs. In fact all that code is in a generic
> > location in kvm but is really x86/ia64 centric and the interfaces seem
> > to be as well.
> 
> That's not true in general, though you surely find a lot of traces and
> still a few concrete x86 bits under virt/kvm.

Well, I haven't found anything in virt/kvm/irq_comm.c that was of any
use to us. Again, I might be sufferring from a major misunderstanding
here but as far as I can tell, the model is totally different. Besides,
that file has a hard coded list of what looks like completely x86
specific mappings between "GSI" and "interrupt numbers" (again I don't
understand completely the distinction and I don't think we have anything
like it on power).

Ben.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-26 11:09                                   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-26 11:09 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Paolo Bonzini, Peter Maydell, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On Fri, 2012-10-26 at 13:00 +0200, Jan Kiszka wrote:

> And at latest there you will need the IRQ routing infrastructure of KVM.
> It tells KVM which "virtual IRQ" (badly named "GSI") triggers which
> event at which input, e.g. a physical IRQ line at some IRQ controller or
> a specific message at some MSI receiver. You shouldn't try to invent a
> Power wheel here, rather tune the existing one to become more generic.
> We could even try to get rid of that unfortunate GSI name (when leaving
> aliases behind), though that is cosmetic.

Ok, there must be something wrong with me, I still don't understand what
you are talking about.

What "MSI receiver" ? What physical IRQ line are you talking about ? How
is the kernel involved ?

The only cases I can think of are the association between a virtual
interrupt (ie, an interrupt in the guest interrupt number space) and an
in-kernel source for that interrupt, ie, vhost and PCI pass-through
essentially.

Anything else is under qemu control. IE. MSIs or LSIs generated by
emulated devices are just normal interrupt that go through our ioctl to
trigger the in-kernel source with the same number.

I don't see any "routing" happening anywhere in that picture really. The
firmware calls done by the guest to change the target of interrupts
(which CPU/presentation controller to direct a given interrupt to) are
handled entirely in the kernel in platform specific code and update our
internal ICS state.

> > 
> > I might just miss some subtleties here but so far I haven't been able to
> > figure out how to "shoehorn" our stuff in the very x86-centric existing
> > interfaces to the kernel APICs. In fact all that code is in a generic
> > location in kvm but is really x86/ia64 centric and the interfaces seem
> > to be as well.
> 
> That's not true in general, though you surely find a lot of traces and
> still a few concrete x86 bits under virt/kvm.

Well, I haven't found anything in virt/kvm/irq_comm.c that was of any
use to us. Again, I might be sufferring from a major misunderstanding
here but as far as I can tell, the model is totally different. Besides,
that file has a hard coded list of what looks like completely x86
specific mappings between "GSI" and "interrupt numbers" (again I don't
understand completely the distinction and I don't think we have anything
like it on power).

Ben.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-26 10:44                               ` Benjamin Herrenschmidt
@ 2012-10-26 11:17                                 ` Peter Maydell
  -1 siblings, 0 replies; 102+ messages in thread
From: Peter Maydell @ 2012-10-26 11:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Paolo Bonzini, Jan Kiszka, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On 26 October 2012 11:44, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Fri, 2012-10-26 at 12:15 +0200, Paolo Bonzini wrote:
>> QEMU's MSI-X routing is not x86-specific, so it should use the same
>> KVM_SET_GSI_ROUTING ioctl that x86 uses.
>
> Well, that's the thing, I haven't managed to figure that out so far, it
> looks very x86-specific to me. To begin with there's no such thing as a
> "GSI" in our world.

This was roughly the feeling I had looking at these APIs. There
might be some underlying generic concept but there is a definite
tendency for the surface representation to use x86 specific
terminology to the extent that you can't tell whether an API
is x86 specific or merely apparently so...

-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-26 11:17                                 ` Peter Maydell
  0 siblings, 0 replies; 102+ messages in thread
From: Peter Maydell @ 2012-10-26 11:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Paolo Bonzini, Jan Kiszka, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On 26 October 2012 11:44, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Fri, 2012-10-26 at 12:15 +0200, Paolo Bonzini wrote:
>> QEMU's MSI-X routing is not x86-specific, so it should use the same
>> KVM_SET_GSI_ROUTING ioctl that x86 uses.
>
> Well, that's the thing, I haven't managed to figure that out so far, it
> looks very x86-specific to me. To begin with there's no such thing as a
> "GSI" in our world.

This was roughly the feeling I had looking at these APIs. There
might be some underlying generic concept but there is a definite
tendency for the surface representation to use x86 specific
terminology to the extent that you can't tell whether an API
is x86 specific or merely apparently so...

-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-26 11:17                                 ` Peter Maydell
@ 2012-10-26 11:39                                   ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-26 11:39 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Paolo Bonzini, Jan Kiszka, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On Fri, 2012-10-26 at 12:17 +0100, Peter Maydell wrote:
> > Well, that's the thing, I haven't managed to figure that out so far,
> it
> > looks very x86-specific to me. To begin with there's no such thing
> as a
> > "GSI" in our world.
> 
> This was roughly the feeling I had looking at these APIs. There
> might be some underlying generic concept but there is a definite
> tendency for the surface representation to use x86 specific
> terminology to the extent that you can't tell whether an API
> is x86 specific or merely apparently so...

Right. Which is why I'm sure I'm actually missing something there :-)
And I'm hoping Paolo and Jan will help shed some light.

It might help if somebody could explain a bit more what a GSI is in x86
land and how it relates to the various APICs, along with what exactly
they mean by "routing" , ie. what are the different elements that get
associated. Basically, if somebody could describe how the x86 stuff
works, that might help.

>From my view of things, we have various "sources" of interrupts. On my
list are emulated device LSIs, emulated device MSIs, both in qemu, then
vhost and finally pass-through, I suppose on some platforms IPIs come in
as well though. Those "sources" need, one way or another, to hit a
source controller which will then itself, in a platform specific way,
shoot the interrupt to a presentation controller.

The routing between source and presentation controllers is fairly
platform specific as far as I can tell even within a given CPU family.
Ie the way an OpenPIC (aka MPIC, used on macs) does it is different than
the way the XICS system does it on pseries, and is different from most
embedded stuff (which typically doesn't have that source/presentation
distinction but just cascaded dumber PICs). The amount of
configurability, the type of configuration information etc... that
governs such a layout is also very specific to the platform and the type
of interrupt controller system used on it.

Remains the "routing" between source of "events" and actual "inputs" to
a source controller.

This too doesn't seem totally obvious to generalize. For example an
embedded platform with a bunch of cascaded dumb interrupt controllers
doesn't have a concept of a flat number space in HW, an interrupt
"input" to be identified properly, needs to identify the controller and
the interrupt within that controller. However, within KVM/qemu, it's
pretty easy to assign to each controller a number and by collating the
two, get some kind of flat space, though it's not arbitrary and the
routing is thus fairly constrained if not totally fixed.

In the pseries case, the global number is split in two bit fields, the
BUID identifying the specific source controller and the source within
that controller. Here too it's fairly fixed though. So the ioctl we use
to create a source controller in the kernel takes the BUID as an
argument, and from there the kernel will "find" the right source
controller based solely on the interrupt number.

So basically on one side we have a global interrupt number that
identifies an "input", I assume that's what x86 calls a GSI ?

Remains how to associate the various sources of interrupts to that
'global number'... and that is fairly specific to each source type isn't
it ?

In our current powerpc code, the emulated devices toggle the qirq which
ends up shooting an ioctl to set/reset or "message" (for MSIs) the
corresponding global interrupt. The mapping is established entirely
within qemu, we just tell the kernel to trigger a given interrupt.

We haven't really sorted vhost out yet so I'm not sure how that will
work out but the idea would be to have an ioctl to associate an eventfd
or whatever vhost uses as interrupt "outputs" with a global interrupt
number.

For pass-through, currently our VFIO is dumb, interrupts get to qemu
which then shoots them back to the kernel using the standard qirq stuff
used by emulated devices. Here I suppose we would want something similar
to vhost to associate the VFIO irq fd with a "global number".

Is that what the existing ioctl's provide ? Their semantics aren't
totally obvious to me.

Note that for pass-through at least, and possibly for vhost, we'd like
to actually totally bypass the irqfd & eventfd stuff for performance
reasons. At least for VFIO, if we are going to get the max performance
out of it, we need to take all generic code out of the picture. IE. If
the interrupts are routed to the physical CPU where the guest is
running, we want to be able to catch and distribute the interrupts to
the guest entirely within guest context, ie, with KVM arch specific low
level code that runs in "real mode" (ie MMU off) without context
switching the MMU back to the host, which on POWER is fairly costly.

That means that at least the association between a guest global
interrupt number and a host global interrupt number for pass-through
will be something that goes entirely through arch specific code path. We
might still be able to use generic APIs to establish it if they are
suitable though.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-26 11:39                                   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-26 11:39 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Paolo Bonzini, Jan Kiszka, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On Fri, 2012-10-26 at 12:17 +0100, Peter Maydell wrote:
> > Well, that's the thing, I haven't managed to figure that out so far,
> it
> > looks very x86-specific to me. To begin with there's no such thing
> as a
> > "GSI" in our world.
> 
> This was roughly the feeling I had looking at these APIs. There
> might be some underlying generic concept but there is a definite
> tendency for the surface representation to use x86 specific
> terminology to the extent that you can't tell whether an API
> is x86 specific or merely apparently so...

Right. Which is why I'm sure I'm actually missing something there :-)
And I'm hoping Paolo and Jan will help shed some light.

It might help if somebody could explain a bit more what a GSI is in x86
land and how it relates to the various APICs, along with what exactly
they mean by "routing" , ie. what are the different elements that get
associated. Basically, if somebody could describe how the x86 stuff
works, that might help.

From my view of things, we have various "sources" of interrupts. On my
list are emulated device LSIs, emulated device MSIs, both in qemu, then
vhost and finally pass-through, I suppose on some platforms IPIs come in
as well though. Those "sources" need, one way or another, to hit a
source controller which will then itself, in a platform specific way,
shoot the interrupt to a presentation controller.

The routing between source and presentation controllers is fairly
platform specific as far as I can tell even within a given CPU family.
Ie the way an OpenPIC (aka MPIC, used on macs) does it is different than
the way the XICS system does it on pseries, and is different from most
embedded stuff (which typically doesn't have that source/presentation
distinction but just cascaded dumber PICs). The amount of
configurability, the type of configuration information etc... that
governs such a layout is also very specific to the platform and the type
of interrupt controller system used on it.

Remains the "routing" between source of "events" and actual "inputs" to
a source controller.

This too doesn't seem totally obvious to generalize. For example an
embedded platform with a bunch of cascaded dumb interrupt controllers
doesn't have a concept of a flat number space in HW, an interrupt
"input" to be identified properly, needs to identify the controller and
the interrupt within that controller. However, within KVM/qemu, it's
pretty easy to assign to each controller a number and by collating the
two, get some kind of flat space, though it's not arbitrary and the
routing is thus fairly constrained if not totally fixed.

In the pseries case, the global number is split in two bit fields, the
BUID identifying the specific source controller and the source within
that controller. Here too it's fairly fixed though. So the ioctl we use
to create a source controller in the kernel takes the BUID as an
argument, and from there the kernel will "find" the right source
controller based solely on the interrupt number.

So basically on one side we have a global interrupt number that
identifies an "input", I assume that's what x86 calls a GSI ?

Remains how to associate the various sources of interrupts to that
'global number'... and that is fairly specific to each source type isn't
it ?

In our current powerpc code, the emulated devices toggle the qirq which
ends up shooting an ioctl to set/reset or "message" (for MSIs) the
corresponding global interrupt. The mapping is established entirely
within qemu, we just tell the kernel to trigger a given interrupt.

We haven't really sorted vhost out yet so I'm not sure how that will
work out but the idea would be to have an ioctl to associate an eventfd
or whatever vhost uses as interrupt "outputs" with a global interrupt
number.

For pass-through, currently our VFIO is dumb, interrupts get to qemu
which then shoots them back to the kernel using the standard qirq stuff
used by emulated devices. Here I suppose we would want something similar
to vhost to associate the VFIO irq fd with a "global number".

Is that what the existing ioctl's provide ? Their semantics aren't
totally obvious to me.

Note that for pass-through at least, and possibly for vhost, we'd like
to actually totally bypass the irqfd & eventfd stuff for performance
reasons. At least for VFIO, if we are going to get the max performance
out of it, we need to take all generic code out of the picture. IE. If
the interrupts are routed to the physical CPU where the guest is
running, we want to be able to catch and distribute the interrupts to
the guest entirely within guest context, ie, with KVM arch specific low
level code that runs in "real mode" (ie MMU off) without context
switching the MMU back to the host, which on POWER is fairly costly.

That means that at least the association between a guest global
interrupt number and a host global interrupt number for pass-through
will be something that goes entirely through arch specific code path. We
might still be able to use generic APIs to establish it if they are
suitable though.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-26 10:47                               ` Benjamin Herrenschmidt
@ 2012-10-26 11:47                                 ` Paolo Bonzini
  -1 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-26 11:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf,
	kvmarm, kvm, kvm-ppc, Peter Maydell

[snipping some parts that Jan answered about already]

Il 26/10/2012 12:47, Benjamin Herrenschmidt ha scritto:
> Or do you mean the routing configured by the user ? IE. Affinity ? If
> yes, then that's indeed what the 64-bit per source is. Each interrupt
> source has some state including the configured target presentation
> controller (plus associated link info for distributed interrupts), a
> priority setting, and some internal state bits that need to be preserved
> in the case of migration.

Yes, that's pretty much the contents of the IOAPIC redirection table.
x86 has more stuff such as the polarity (low/high), masking, triggering
mode (edge/level), etc., but the main thing is the destination and vector.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-26 11:47                                 ` Paolo Bonzini
  0 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-26 11:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jan Kiszka, Paul Mackerras, Christoffer Dall, Alexander Graf,
	kvmarm, kvm, kvm-ppc, Peter Maydell

[snipping some parts that Jan answered about already]

Il 26/10/2012 12:47, Benjamin Herrenschmidt ha scritto:
> Or do you mean the routing configured by the user ? IE. Affinity ? If
> yes, then that's indeed what the 64-bit per source is. Each interrupt
> source has some state including the configured target presentation
> controller (plus associated link info for distributed interrupts), a
> priority setting, and some internal state bits that need to be preserved
> in the case of migration.

Yes, that's pretty much the contents of the IOAPIC redirection table.
x86 has more stuff such as the polarity (low/high), masking, triggering
mode (edge/level), etc., but the main thing is the destination and vector.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-26 11:09                                   ` Benjamin Herrenschmidt
@ 2012-10-26 11:57                                     ` Paolo Bonzini
  -1 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-26 11:57 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jan Kiszka, Peter Maydell, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

Il 26/10/2012 13:09, Benjamin Herrenschmidt ha scritto:
> The only cases I can think of are the association between a virtual
> interrupt (ie, an interrupt in the guest interrupt number space) and an
> in-kernel source for that interrupt, ie, vhost and PCI pass-through
> essentially.

If you exclude old-style PCI pass-through and limit yourself to vhost
and VFIO, you can treat irqfd as "the" in-kernel source of the
interrupt.  Then you need a mapping between MSIs and numbers used in
KVM_IRQFD ("GSIs").

This is what KVM_SET_GSI_ROUTING modifies, and basically the mapping is
modified every time a vector is masked/unmasked in the MSI-X table.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-26 11:57                                     ` Paolo Bonzini
  0 siblings, 0 replies; 102+ messages in thread
From: Paolo Bonzini @ 2012-10-26 11:57 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jan Kiszka, Peter Maydell, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

Il 26/10/2012 13:09, Benjamin Herrenschmidt ha scritto:
> The only cases I can think of are the association between a virtual
> interrupt (ie, an interrupt in the guest interrupt number space) and an
> in-kernel source for that interrupt, ie, vhost and PCI pass-through
> essentially.

If you exclude old-style PCI pass-through and limit yourself to vhost
and VFIO, you can treat irqfd as "the" in-kernel source of the
interrupt.  Then you need a mapping between MSIs and numbers used in
KVM_IRQFD ("GSIs").

This is what KVM_SET_GSI_ROUTING modifies, and basically the mapping is
modified every time a vector is masked/unmasked in the MSI-X table.

Paolo

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-26 11:57                                     ` Paolo Bonzini
@ 2012-10-26 12:08                                       ` Peter Maydell
  -1 siblings, 0 replies; 102+ messages in thread
From: Peter Maydell @ 2012-10-26 12:08 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Benjamin Herrenschmidt, Jan Kiszka, Paul Mackerras,
	Christoffer Dall, Alexander Graf, kvmarm, kvm, kvm-ppc

On 26 October 2012 12:57, Paolo Bonzini <pbonzini@redhat.com> wrote:
> If you exclude old-style PCI pass-through and limit yourself to vhost
> and VFIO, you can treat irqfd as "the" in-kernel source of the
> interrupt.  Then you need a mapping between MSIs and numbers used in
> KVM_IRQFD ("GSIs").
>
> This is what KVM_SET_GSI_ROUTING modifies, and basically the mapping is
> modified every time a vector is masked/unmasked in the MSI-X table.

So SET_GSI_ROUTING sets the routing for MSIs? Very logical...

-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-26 12:08                                       ` Peter Maydell
  0 siblings, 0 replies; 102+ messages in thread
From: Peter Maydell @ 2012-10-26 12:08 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Benjamin Herrenschmidt, Jan Kiszka, Paul Mackerras,
	Christoffer Dall, Alexander Graf, kvmarm, kvm, kvm-ppc

On 26 October 2012 12:57, Paolo Bonzini <pbonzini@redhat.com> wrote:
> If you exclude old-style PCI pass-through and limit yourself to vhost
> and VFIO, you can treat irqfd as "the" in-kernel source of the
> interrupt.  Then you need a mapping between MSIs and numbers used in
> KVM_IRQFD ("GSIs").
>
> This is what KVM_SET_GSI_ROUTING modifies, and basically the mapping is
> modified every time a vector is masked/unmasked in the MSI-X table.

So SET_GSI_ROUTING sets the routing for MSIs? Very logical...

-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-26 11:39                                   ` Benjamin Herrenschmidt
@ 2012-10-26 12:39                                     ` Jan Kiszka
  -1 siblings, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-26 12:39 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Peter Maydell, Paolo Bonzini, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On 2012-10-26 13:39, Benjamin Herrenschmidt wrote:
> On Fri, 2012-10-26 at 12:17 +0100, Peter Maydell wrote:
>>> Well, that's the thing, I haven't managed to figure that out so far,
>> it
>>> looks very x86-specific to me. To begin with there's no such thing
>> as a
>>> "GSI" in our world.
>>
>> This was roughly the feeling I had looking at these APIs. There
>> might be some underlying generic concept but there is a definite
>> tendency for the surface representation to use x86 specific
>> terminology to the extent that you can't tell whether an API
>> is x86 specific or merely apparently so...
> 
> Right. Which is why I'm sure I'm actually missing something there :-)
> And I'm hoping Paolo and Jan will help shed some light.
> 
> It might help if somebody could explain a bit more what a GSI is in x86
> land and how it relates to the various APICs, along with what exactly
> they mean by "routing" , ie. what are the different elements that get
> associated. Basically, if somebody could describe how the x86 stuff
> works, that might help.
> 
> From my view of things, we have various "sources" of interrupts. On my
> list are emulated device LSIs, emulated device MSIs, both in qemu, then
> vhost and finally pass-through, I suppose on some platforms IPIs come in
> as well though. Those "sources" need, one way or another, to hit a
> source controller which will then itself, in a platform specific way,
> shoot the interrupt to a presentation controller.
> 
> The routing between source and presentation controllers is fairly
> platform specific as far as I can tell even within a given CPU family.
> Ie the way an OpenPIC (aka MPIC, used on macs) does it is different than
> the way the XICS system does it on pseries, and is different from most
> embedded stuff (which typically doesn't have that source/presentation
> distinction but just cascaded dumber PICs). The amount of
> configurability, the type of configuration information etc... that
> governs such a layout is also very specific to the platform and the type
> of interrupt controller system used on it.

But we are just talking about sending messages from A to B or soldering
an input to an output pin. That's pretty generic. Give each output event
a virtual IRQ number and define where its output "line" should be linked
to (input pin of target controller). All what will be specific are the
IDs of those controllers.

Of course, all that provided you do their emulation in kernel space. For
x86, that even makes sense when the IRQ sources are in user space as the
guest may still have to interact during IRQ delivery with IOAPIC, thus
we save some costly heavy-weight exits when putting it in the kernel.

> 
> Remains the "routing" between source of "events" and actual "inputs" to
> a source controller.
> 
> This too doesn't seem totally obvious to generalize. For example an
> embedded platform with a bunch of cascaded dumb interrupt controllers
> doesn't have a concept of a flat number space in HW, an interrupt
> "input" to be identified properly, needs to identify the controller and
> the interrupt within that controller. However, within KVM/qemu, it's
> pretty easy to assign to each controller a number and by collating the
> two, get some kind of flat space, though it's not arbitrary and the
> routing is thus fairly constrained if not totally fixed.

IRQ routing entry:
 - virq number ("gsi")
 - type (controller ID, MSI, whatever you like)
 - some flags (to extend it)
 - type-specific data (MSI message, controller input pin, etc.)

And there can be multiple entries with the same virq, thus you can
deliver to multiple targets. I bet you can model quite a lot of your
platform specific routing this way. I'm not saying our generic code will
work out of the box, but at least the interfaces and concepts are there.

> 
> In the pseries case, the global number is split in two bit fields, the
> BUID identifying the specific source controller and the source within
> that controller. Here too it's fairly fixed though. So the ioctl we use
> to create a source controller in the kernel takes the BUID as an
> argument, and from there the kernel will "find" the right source
> controller based solely on the interrupt number.
> 
> So basically on one side we have a global interrupt number that
> identifies an "input", I assume that's what x86 calls a GSI ?

Right. The virtual IRQ numbers we call "GSI" is partially occupied by
the actual x86-GSIs (0..n, with n=23 so far), directed to the IOAPIC and
PIC there, and then followed by IRQs that are mapped on MSI messages.
But that's just how we _use_ it on x86, not how it has to work for other
archs.

> 
> Remains how to associate the various sources of interrupts to that
> 'global number'... and that is fairly specific to each source type isn't
> it ?
> 
> In our current powerpc code, the emulated devices toggle the qirq which
> ends up shooting an ioctl to set/reset or "message" (for MSIs) the
> corresponding global interrupt. The mapping is established entirely
> within qemu, we just tell the kernel to trigger a given interrupt.
> 
> We haven't really sorted vhost out yet so I'm not sure how that will
> work out but the idea would be to have an ioctl to associate an eventfd
> or whatever vhost uses as interrupt "outputs" with a global interrupt
> number.

KVM_IRQFD is already there. It associates an irqfd file descriptor with
a virtual IRQ. Once that triggers, the IRQ routing table is used to
define the actual interrupt type and destination chip to use, see above.

> 
> For pass-through, currently our VFIO is dumb, interrupts get to qemu
> which then shoots them back to the kernel using the standard qirq stuff
> used by emulated devices. Here I suppose we would want something similar
> to vhost to associate the VFIO irq fd with a "global number".
> 
> Is that what the existing ioctl's provide ? Their semantics aren't
> totally obvious to me.

Provided you want to trigger a MSI message, you first need to register
it via kvm_irqchip_add_msi_route (will trigger KVM_SET_GSI_ROUTING).
That will give you a virtual IRQ number which can be associate with an
irqfd file descriptor as explained above (KVM_IRQFD). But you may also
create a different kind of routing table entry if MSI is not all you
need to inject via irqfd. Could be a plain IRQ line as well, routed to a
specific in-kernel IRQ controller model.

> 
> Note that for pass-through at least, and possibly for vhost, we'd like
> to actually totally bypass the irqfd & eventfd stuff for performance
> reasons. At least for VFIO, if we are going to get the max performance
> out of it, we need to take all generic code out of the picture. IE. If
> the interrupts are routed to the physical CPU where the guest is
> running, we want to be able to catch and distribute the interrupts to
> the guest entirely within guest context, ie, with KVM arch specific low
> level code that runs in "real mode" (ie MMU off) without context
> switching the MMU back to the host, which on POWER is fairly costly.
> 
> That means that at least the association between a guest global
> interrupt number and a host global interrupt number for pass-through
> will be something that goes entirely through arch specific code path. We
> might still be able to use generic APIs to establish it if they are
> suitable though.

The same will happen on x86: direct injection to a target VCPU. Maybe
again a topic for our IRQ routing table, just with specialized target types.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-26 12:39                                     ` Jan Kiszka
  0 siblings, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-26 12:39 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Peter Maydell, Paolo Bonzini, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On 2012-10-26 13:39, Benjamin Herrenschmidt wrote:
> On Fri, 2012-10-26 at 12:17 +0100, Peter Maydell wrote:
>>> Well, that's the thing, I haven't managed to figure that out so far,
>> it
>>> looks very x86-specific to me. To begin with there's no such thing
>> as a
>>> "GSI" in our world.
>>
>> This was roughly the feeling I had looking at these APIs. There
>> might be some underlying generic concept but there is a definite
>> tendency for the surface representation to use x86 specific
>> terminology to the extent that you can't tell whether an API
>> is x86 specific or merely apparently so...
> 
> Right. Which is why I'm sure I'm actually missing something there :-)
> And I'm hoping Paolo and Jan will help shed some light.
> 
> It might help if somebody could explain a bit more what a GSI is in x86
> land and how it relates to the various APICs, along with what exactly
> they mean by "routing" , ie. what are the different elements that get
> associated. Basically, if somebody could describe how the x86 stuff
> works, that might help.
> 
> From my view of things, we have various "sources" of interrupts. On my
> list are emulated device LSIs, emulated device MSIs, both in qemu, then
> vhost and finally pass-through, I suppose on some platforms IPIs come in
> as well though. Those "sources" need, one way or another, to hit a
> source controller which will then itself, in a platform specific way,
> shoot the interrupt to a presentation controller.
> 
> The routing between source and presentation controllers is fairly
> platform specific as far as I can tell even within a given CPU family.
> Ie the way an OpenPIC (aka MPIC, used on macs) does it is different than
> the way the XICS system does it on pseries, and is different from most
> embedded stuff (which typically doesn't have that source/presentation
> distinction but just cascaded dumber PICs). The amount of
> configurability, the type of configuration information etc... that
> governs such a layout is also very specific to the platform and the type
> of interrupt controller system used on it.

But we are just talking about sending messages from A to B or soldering
an input to an output pin. That's pretty generic. Give each output event
a virtual IRQ number and define where its output "line" should be linked
to (input pin of target controller). All what will be specific are the
IDs of those controllers.

Of course, all that provided you do their emulation in kernel space. For
x86, that even makes sense when the IRQ sources are in user space as the
guest may still have to interact during IRQ delivery with IOAPIC, thus
we save some costly heavy-weight exits when putting it in the kernel.

> 
> Remains the "routing" between source of "events" and actual "inputs" to
> a source controller.
> 
> This too doesn't seem totally obvious to generalize. For example an
> embedded platform with a bunch of cascaded dumb interrupt controllers
> doesn't have a concept of a flat number space in HW, an interrupt
> "input" to be identified properly, needs to identify the controller and
> the interrupt within that controller. However, within KVM/qemu, it's
> pretty easy to assign to each controller a number and by collating the
> two, get some kind of flat space, though it's not arbitrary and the
> routing is thus fairly constrained if not totally fixed.

IRQ routing entry:
 - virq number ("gsi")
 - type (controller ID, MSI, whatever you like)
 - some flags (to extend it)
 - type-specific data (MSI message, controller input pin, etc.)

And there can be multiple entries with the same virq, thus you can
deliver to multiple targets. I bet you can model quite a lot of your
platform specific routing this way. I'm not saying our generic code will
work out of the box, but at least the interfaces and concepts are there.

> 
> In the pseries case, the global number is split in two bit fields, the
> BUID identifying the specific source controller and the source within
> that controller. Here too it's fairly fixed though. So the ioctl we use
> to create a source controller in the kernel takes the BUID as an
> argument, and from there the kernel will "find" the right source
> controller based solely on the interrupt number.
> 
> So basically on one side we have a global interrupt number that
> identifies an "input", I assume that's what x86 calls a GSI ?

Right. The virtual IRQ numbers we call "GSI" is partially occupied by
the actual x86-GSIs (0..n, with n# so far), directed to the IOAPIC and
PIC there, and then followed by IRQs that are mapped on MSI messages.
But that's just how we _use_ it on x86, not how it has to work for other
archs.

> 
> Remains how to associate the various sources of interrupts to that
> 'global number'... and that is fairly specific to each source type isn't
> it ?
> 
> In our current powerpc code, the emulated devices toggle the qirq which
> ends up shooting an ioctl to set/reset or "message" (for MSIs) the
> corresponding global interrupt. The mapping is established entirely
> within qemu, we just tell the kernel to trigger a given interrupt.
> 
> We haven't really sorted vhost out yet so I'm not sure how that will
> work out but the idea would be to have an ioctl to associate an eventfd
> or whatever vhost uses as interrupt "outputs" with a global interrupt
> number.

KVM_IRQFD is already there. It associates an irqfd file descriptor with
a virtual IRQ. Once that triggers, the IRQ routing table is used to
define the actual interrupt type and destination chip to use, see above.

> 
> For pass-through, currently our VFIO is dumb, interrupts get to qemu
> which then shoots them back to the kernel using the standard qirq stuff
> used by emulated devices. Here I suppose we would want something similar
> to vhost to associate the VFIO irq fd with a "global number".
> 
> Is that what the existing ioctl's provide ? Their semantics aren't
> totally obvious to me.

Provided you want to trigger a MSI message, you first need to register
it via kvm_irqchip_add_msi_route (will trigger KVM_SET_GSI_ROUTING).
That will give you a virtual IRQ number which can be associate with an
irqfd file descriptor as explained above (KVM_IRQFD). But you may also
create a different kind of routing table entry if MSI is not all you
need to inject via irqfd. Could be a plain IRQ line as well, routed to a
specific in-kernel IRQ controller model.

> 
> Note that for pass-through at least, and possibly for vhost, we'd like
> to actually totally bypass the irqfd & eventfd stuff for performance
> reasons. At least for VFIO, if we are going to get the max performance
> out of it, we need to take all generic code out of the picture. IE. If
> the interrupts are routed to the physical CPU where the guest is
> running, we want to be able to catch and distribute the interrupts to
> the guest entirely within guest context, ie, with KVM arch specific low
> level code that runs in "real mode" (ie MMU off) without context
> switching the MMU back to the host, which on POWER is fairly costly.
> 
> That means that at least the association between a guest global
> interrupt number and a host global interrupt number for pass-through
> will be something that goes entirely through arch specific code path. We
> might still be able to use generic APIs to establish it if they are
> suitable though.

The same will happen on x86: direct injection to a target VCPU. Maybe
again a topic for our IRQ routing table, just with specialized target types.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-26 12:08                                       ` Peter Maydell
@ 2012-10-26 12:41                                         ` Jan Kiszka
  -1 siblings, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-26 12:41 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Paolo Bonzini, Benjamin Herrenschmidt, Paul Mackerras,
	Christoffer Dall, Alexander Graf, kvmarm, kvm, kvm-ppc

On 2012-10-26 14:08, Peter Maydell wrote:
> On 26 October 2012 12:57, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> If you exclude old-style PCI pass-through and limit yourself to vhost
>> and VFIO, you can treat irqfd as "the" in-kernel source of the
>> interrupt.  Then you need a mapping between MSIs and numbers used in
>> KVM_IRQFD ("GSIs").
>>
>> This is what KVM_SET_GSI_ROUTING modifies, and basically the mapping is
>> modified every time a vector is masked/unmasked in the MSI-X table.
> 
> So SET_GSI_ROUTING sets the routing for MSIs? Very logical...

See my reply to Ben: It is used for MSIs as well, but not only. The
concept is absolutely generic, you just need to define specific target
types and provide ways to associate specific sources with a virtual IRQ
number ("GSI").

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-26 12:41                                         ` Jan Kiszka
  0 siblings, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-26 12:41 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Paolo Bonzini, Benjamin Herrenschmidt, Paul Mackerras,
	Christoffer Dall, Alexander Graf, kvmarm, kvm, kvm-ppc

On 2012-10-26 14:08, Peter Maydell wrote:
> On 26 October 2012 12:57, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> If you exclude old-style PCI pass-through and limit yourself to vhost
>> and VFIO, you can treat irqfd as "the" in-kernel source of the
>> interrupt.  Then you need a mapping between MSIs and numbers used in
>> KVM_IRQFD ("GSIs").
>>
>> This is what KVM_SET_GSI_ROUTING modifies, and basically the mapping is
>> modified every time a vector is masked/unmasked in the MSI-X table.
> 
> So SET_GSI_ROUTING sets the routing for MSIs? Very logical...

See my reply to Ben: It is used for MSIs as well, but not only. The
concept is absolutely generic, you just need to define specific target
types and provide ways to associate specific sources with a virtual IRQ
number ("GSI").

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-26 11:57                                     ` Paolo Bonzini
@ 2012-10-26 20:21                                       ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-26 20:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jan Kiszka, Peter Maydell, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On Fri, 2012-10-26 at 13:57 +0200, Paolo Bonzini wrote:
> Il 26/10/2012 13:09, Benjamin Herrenschmidt ha scritto:
> > The only cases I can think of are the association between a virtual
> > interrupt (ie, an interrupt in the guest interrupt number space) and an
> > in-kernel source for that interrupt, ie, vhost and PCI pass-through
> > essentially.
> 
> If you exclude old-style PCI pass-through and limit yourself to vhost
> and VFIO, you can treat irqfd as "the" in-kernel source of the
> interrupt.  Then you need a mapping between MSIs and numbers used in
> KVM_IRQFD ("GSIs").

Argh. Ok, I get that we need a mapping between an irqfd and a global
number, I don't see where MSIs come into the picture at all here. At
least for us they don't.

> This is what KVM_SET_GSI_ROUTING modifies, and basically the mapping is
> modified every time a vector is masked/unmasked in the MSI-X table.

Right and that doesn't make sense for us. In fact I don't understand how
it makes sense for x86 either, but that's because I don't understand how
the APIC works I suppose.

Ben.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-26 20:21                                       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-26 20:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jan Kiszka, Peter Maydell, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On Fri, 2012-10-26 at 13:57 +0200, Paolo Bonzini wrote:
> Il 26/10/2012 13:09, Benjamin Herrenschmidt ha scritto:
> > The only cases I can think of are the association between a virtual
> > interrupt (ie, an interrupt in the guest interrupt number space) and an
> > in-kernel source for that interrupt, ie, vhost and PCI pass-through
> > essentially.
> 
> If you exclude old-style PCI pass-through and limit yourself to vhost
> and VFIO, you can treat irqfd as "the" in-kernel source of the
> interrupt.  Then you need a mapping between MSIs and numbers used in
> KVM_IRQFD ("GSIs").

Argh. Ok, I get that we need a mapping between an irqfd and a global
number, I don't see where MSIs come into the picture at all here. At
least for us they don't.

> This is what KVM_SET_GSI_ROUTING modifies, and basically the mapping is
> modified every time a vector is masked/unmasked in the MSI-X table.

Right and that doesn't make sense for us. In fact I don't understand how
it makes sense for x86 either, but that's because I don't understand how
the APIC works I suppose.

Ben.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-26 12:39                                     ` Jan Kiszka
@ 2012-10-26 20:45                                       ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-26 20:45 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Peter Maydell, Paolo Bonzini, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On Fri, 2012-10-26 at 14:39 +0200, Jan Kiszka wrote:

> But we are just talking about sending messages from A to B or soldering
> an input to an output pin. That's pretty generic. Give each output event
> a virtual IRQ number and define where its output "line" should be linked
> to (input pin of target controller). All what will be specific are the
> IDs of those controllers.

Hrm you seem to be saying something very different from Paolo here.
Unless it's just a very very confused terminology.

So let's see the powerpc "pseries" case. Things like embedded etc...
might be quite different.

We have essentially two "outputs" here. One is qemu itself shooting
interrupts (emulated devices, virtio, etc...). This is an ioctl and that
gives you a global interrupt number. So this goes directly to the source
controller which then uses it's internal logic to send that to the
presentation controller in ways that are entirely implementation
specific.

The specific source controller is located using the top bits of the
global interrupt number (the BUID). When we create source controllers,
we pass as argument to the ioctl the BUID for that source controller and
the number of interrupts it handles.

The other "output" is irqfd for kernel originated events. Here I assume
there's an in-kernel way to directly call a function rather than queue
something for qemu to consume later, anything else would be horribly
wasteful. Here too, what we need here is a global interrupt number, so
we can find the source controller by BUID and shoot it the interrupt.

So that's the only case I see where we need an association of some kind,
which is irqfs -> global number. I don't see where the "MSIs" that Paolo
keep talking about come into play. User space (emulated) MSIs are dealt
within qemu entirely and MSIs from VFIO end up as irqfd.

Finally there is the "routing" between a given interrupt source (an
entry in the source controller state table) and the target processor
(the corresponding presentation controller).

That routing is purely a field in the source controller field, which is
there along with the interrupt priority and a few state bits. (We don't
need to deal with level/edge because of the way the ICS work, we just
say at the time of the triggering of an interrupt whether it's a level
set, level reset, or message, and it will do the right thing).

This field is accessed (programmed) by the guest using a firmware
interface that is implemented in the kernel part of KVM. It's a platform
specific API and it accesses the source controller (it's implemented
three really). I don't see where any generic API here would make sense
other than maybe adding useless bloat.

The only place where qemu might "see" that stuff is for migration where
it needs to save all the state of all the sources and restore it on the
target.

The actual communication between source controllers and presentation
controllers is also entirely platform specific. It follows a somewhat
specified protocol (we mimmic what the HW actually does) and here too, I
see no room for anything generic. 

> Of course, all that provided you do their emulation in kernel space. For
> x86, that even makes sense when the IRQ sources are in user space as the
> guest may still have to interact during IRQ delivery with IOAPIC, thus
> we save some costly heavy-weight exits when putting it in the kernel.

We have a way to lower that cost. Since the interaction with the
presentation controller is done by hypervisor calls, we handle them
directly in real mode within the guest MMU context unless some
exceptional condition is hit (such as the need to trigger a resend from
one of the source controllers or an interrupt rejection).

> > 
> > Remains the "routing" between source of "events" and actual "inputs" to
> > a source controller.
> > 
> > This too doesn't seem totally obvious to generalize. For example an
> > embedded platform with a bunch of cascaded dumb interrupt controllers
> > doesn't have a concept of a flat number space in HW, an interrupt
> > "input" to be identified properly, needs to identify the controller and
> > the interrupt within that controller. However, within KVM/qemu, it's
> > pretty easy to assign to each controller a number and by collating the
> > two, get some kind of flat space, though it's not arbitrary and the
> > routing is thus fairly constrained if not totally fixed.
> 
> IRQ routing entry:
>  - virq number ("gsi")
>  - type (controller ID, MSI, whatever you like)

What is "controller ID" ? That doesn't mean anything to me. In our case,
the specific source controller is known from the virq number (the top
bits of it basically).

>  - some flags (to extend it)
>  - type-specific data (MSI message, controller input pin, etc.)

I don't understand that business about MSIs really. I suppose it has to
do with the way you do old-style device assignment ? Either MSIs come
from virtual/emulated devices in which case they are a qemu fiction and
qemu just sends us an ioctl with the virq number or they come from real
devices in which case they are setup normally by the host kernel using
host kernel MSI addresses, and we catch them via irqfd (or some platform
specific bypass that we might implement in the future).

We do have some per-interrupt data I mentioned earlier, the target
presentation controller (known as server ID, basically the HW CPU number
of the target) and a few bits of state. We only ever access it from qemu
for migration though.

> And there can be multiple entries with the same virq, thus you can
> deliver to multiple targets. I bet you can model quite a lot of your
> platform specific routing this way. I'm not saying our generic code will
> work out of the box, but at least the interfaces and concepts are there.

I don't see how we can model anything using that. Qemu doesn't actually
look or modify any of that state other than during migration anyway. We
do have a concept of delivery to multiple targets via a "link" mechanism
which allows an interrupt to bounce to another target within a "ring" if
the original target is busy (that's 2 bits of state) but this too is
configured via firmware interfaces that are handled entirely in the
kernel, not in qemu.

> > In the pseries case, the global number is split in two bit fields, the
> > BUID identifying the specific source controller and the source within
> > that controller. Here too it's fairly fixed though. So the ioctl we use
> > to create a source controller in the kernel takes the BUID as an
> > argument, and from there the kernel will "find" the right source
> > controller based solely on the interrupt number.
> > 
> > So basically on one side we have a global interrupt number that
> > identifies an "input", I assume that's what x86 calls a GSI ?
> 
> Right. The virtual IRQ numbers we call "GSI" is partially occupied by
> the actual x86-GSIs (0..n, with n=23 so far), directed to the IOAPIC and
> PIC there, and then followed by IRQs that are mapped on MSI messages.
> But that's just how we _use_ it on x86, not how it has to work for other
> archs.
> 
> > 
> > Remains how to associate the various sources of interrupts to that
> > 'global number'... and that is fairly specific to each source type isn't
> > it ?
> > 
> > In our current powerpc code, the emulated devices toggle the qirq which
> > ends up shooting an ioctl to set/reset or "message" (for MSIs) the
> > corresponding global interrupt. The mapping is established entirely
> > within qemu, we just tell the kernel to trigger a given interrupt.
> > 
> > We haven't really sorted vhost out yet so I'm not sure how that will
> > work out but the idea would be to have an ioctl to associate an eventfd
> > or whatever vhost uses as interrupt "outputs" with a global interrupt
> > number.
> 
> KVM_IRQFD is already there. It associates an irqfd file descriptor with
> a virtual IRQ. Once that triggers, the IRQ routing table is used to
> define the actual interrupt type and destination chip to use, see above.

We only need irqfd to virtual irq. The rests doesn't make sense to me
(the "routing table bit").

> > For pass-through, currently our VFIO is dumb, interrupts get to qemu
> > which then shoots them back to the kernel using the standard qirq stuff
> > used by emulated devices. Here I suppose we would want something similar
> > to vhost to associate the VFIO irq fd with a "global number".
> > 
> > Is that what the existing ioctl's provide ? Their semantics aren't
> > totally obvious to me.
> 
> Provided you want to trigger a MSI message, you first need to register
> it via kvm_irqchip_add_msi_route (will trigger KVM_SET_GSI_ROUTING).

Why ? Again, I don't get it.

> That will give you a virtual IRQ number which can be associate with an
> irqfd file descriptor as explained above (KVM_IRQFD). 

But virq numbers are entirely under qemu control. qemu creates the
source controllers, and assign the virq numbers to the devices. 

I really don't quite get how that concept of "GSI routing" means
anything for us. We certainly don't want to have the kernel return a
virq number to qemu.

That whole business with MSIs makes very little sense to me.

> But you may also
> create a different kind of routing table entry if MSI is not all you
> need to inject via irqfd. Could be a plain IRQ line as well, routed to a
> specific in-kernel IRQ controller model.

Sorry, I must be totally stupid or something but I don't understand.
Doesn't make sense to me.

> > Note that for pass-through at least, and possibly for vhost, we'd like
> > to actually totally bypass the irqfd & eventfd stuff for performance
> > reasons. At least for VFIO, if we are going to get the max performance
> > out of it, we need to take all generic code out of the picture. IE. If
> > the interrupts are routed to the physical CPU where the guest is
> > running, we want to be able to catch and distribute the interrupts to
> > the guest entirely within guest context, ie, with KVM arch specific low
> > level code that runs in "real mode" (ie MMU off) without context
> > switching the MMU back to the host, which on POWER is fairly costly.
> > 
> > That means that at least the association between a guest global
> > interrupt number and a host global interrupt number for pass-through
> > will be something that goes entirely through arch specific code path. We
> > might still be able to use generic APIs to establish it if they are
> > suitable though.
> 
> The same will happen on x86: direct injection to a target VCPU. Maybe
> again a topic for our IRQ routing table, just with specialized target types.

Ben.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-26 20:45                                       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-26 20:45 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Peter Maydell, Paolo Bonzini, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On Fri, 2012-10-26 at 14:39 +0200, Jan Kiszka wrote:

> But we are just talking about sending messages from A to B or soldering
> an input to an output pin. That's pretty generic. Give each output event
> a virtual IRQ number and define where its output "line" should be linked
> to (input pin of target controller). All what will be specific are the
> IDs of those controllers.

Hrm you seem to be saying something very different from Paolo here.
Unless it's just a very very confused terminology.

So let's see the powerpc "pseries" case. Things like embedded etc...
might be quite different.

We have essentially two "outputs" here. One is qemu itself shooting
interrupts (emulated devices, virtio, etc...). This is an ioctl and that
gives you a global interrupt number. So this goes directly to the source
controller which then uses it's internal logic to send that to the
presentation controller in ways that are entirely implementation
specific.

The specific source controller is located using the top bits of the
global interrupt number (the BUID). When we create source controllers,
we pass as argument to the ioctl the BUID for that source controller and
the number of interrupts it handles.

The other "output" is irqfd for kernel originated events. Here I assume
there's an in-kernel way to directly call a function rather than queue
something for qemu to consume later, anything else would be horribly
wasteful. Here too, what we need here is a global interrupt number, so
we can find the source controller by BUID and shoot it the interrupt.

So that's the only case I see where we need an association of some kind,
which is irqfs -> global number. I don't see where the "MSIs" that Paolo
keep talking about come into play. User space (emulated) MSIs are dealt
within qemu entirely and MSIs from VFIO end up as irqfd.

Finally there is the "routing" between a given interrupt source (an
entry in the source controller state table) and the target processor
(the corresponding presentation controller).

That routing is purely a field in the source controller field, which is
there along with the interrupt priority and a few state bits. (We don't
need to deal with level/edge because of the way the ICS work, we just
say at the time of the triggering of an interrupt whether it's a level
set, level reset, or message, and it will do the right thing).

This field is accessed (programmed) by the guest using a firmware
interface that is implemented in the kernel part of KVM. It's a platform
specific API and it accesses the source controller (it's implemented
three really). I don't see where any generic API here would make sense
other than maybe adding useless bloat.

The only place where qemu might "see" that stuff is for migration where
it needs to save all the state of all the sources and restore it on the
target.

The actual communication between source controllers and presentation
controllers is also entirely platform specific. It follows a somewhat
specified protocol (we mimmic what the HW actually does) and here too, I
see no room for anything generic. 

> Of course, all that provided you do their emulation in kernel space. For
> x86, that even makes sense when the IRQ sources are in user space as the
> guest may still have to interact during IRQ delivery with IOAPIC, thus
> we save some costly heavy-weight exits when putting it in the kernel.

We have a way to lower that cost. Since the interaction with the
presentation controller is done by hypervisor calls, we handle them
directly in real mode within the guest MMU context unless some
exceptional condition is hit (such as the need to trigger a resend from
one of the source controllers or an interrupt rejection).

> > 
> > Remains the "routing" between source of "events" and actual "inputs" to
> > a source controller.
> > 
> > This too doesn't seem totally obvious to generalize. For example an
> > embedded platform with a bunch of cascaded dumb interrupt controllers
> > doesn't have a concept of a flat number space in HW, an interrupt
> > "input" to be identified properly, needs to identify the controller and
> > the interrupt within that controller. However, within KVM/qemu, it's
> > pretty easy to assign to each controller a number and by collating the
> > two, get some kind of flat space, though it's not arbitrary and the
> > routing is thus fairly constrained if not totally fixed.
> 
> IRQ routing entry:
>  - virq number ("gsi")
>  - type (controller ID, MSI, whatever you like)

What is "controller ID" ? That doesn't mean anything to me. In our case,
the specific source controller is known from the virq number (the top
bits of it basically).

>  - some flags (to extend it)
>  - type-specific data (MSI message, controller input pin, etc.)

I don't understand that business about MSIs really. I suppose it has to
do with the way you do old-style device assignment ? Either MSIs come
from virtual/emulated devices in which case they are a qemu fiction and
qemu just sends us an ioctl with the virq number or they come from real
devices in which case they are setup normally by the host kernel using
host kernel MSI addresses, and we catch them via irqfd (or some platform
specific bypass that we might implement in the future).

We do have some per-interrupt data I mentioned earlier, the target
presentation controller (known as server ID, basically the HW CPU number
of the target) and a few bits of state. We only ever access it from qemu
for migration though.

> And there can be multiple entries with the same virq, thus you can
> deliver to multiple targets. I bet you can model quite a lot of your
> platform specific routing this way. I'm not saying our generic code will
> work out of the box, but at least the interfaces and concepts are there.

I don't see how we can model anything using that. Qemu doesn't actually
look or modify any of that state other than during migration anyway. We
do have a concept of delivery to multiple targets via a "link" mechanism
which allows an interrupt to bounce to another target within a "ring" if
the original target is busy (that's 2 bits of state) but this too is
configured via firmware interfaces that are handled entirely in the
kernel, not in qemu.

> > In the pseries case, the global number is split in two bit fields, the
> > BUID identifying the specific source controller and the source within
> > that controller. Here too it's fairly fixed though. So the ioctl we use
> > to create a source controller in the kernel takes the BUID as an
> > argument, and from there the kernel will "find" the right source
> > controller based solely on the interrupt number.
> > 
> > So basically on one side we have a global interrupt number that
> > identifies an "input", I assume that's what x86 calls a GSI ?
> 
> Right. The virtual IRQ numbers we call "GSI" is partially occupied by
> the actual x86-GSIs (0..n, with n# so far), directed to the IOAPIC and
> PIC there, and then followed by IRQs that are mapped on MSI messages.
> But that's just how we _use_ it on x86, not how it has to work for other
> archs.
> 
> > 
> > Remains how to associate the various sources of interrupts to that
> > 'global number'... and that is fairly specific to each source type isn't
> > it ?
> > 
> > In our current powerpc code, the emulated devices toggle the qirq which
> > ends up shooting an ioctl to set/reset or "message" (for MSIs) the
> > corresponding global interrupt. The mapping is established entirely
> > within qemu, we just tell the kernel to trigger a given interrupt.
> > 
> > We haven't really sorted vhost out yet so I'm not sure how that will
> > work out but the idea would be to have an ioctl to associate an eventfd
> > or whatever vhost uses as interrupt "outputs" with a global interrupt
> > number.
> 
> KVM_IRQFD is already there. It associates an irqfd file descriptor with
> a virtual IRQ. Once that triggers, the IRQ routing table is used to
> define the actual interrupt type and destination chip to use, see above.

We only need irqfd to virtual irq. The rests doesn't make sense to me
(the "routing table bit").

> > For pass-through, currently our VFIO is dumb, interrupts get to qemu
> > which then shoots them back to the kernel using the standard qirq stuff
> > used by emulated devices. Here I suppose we would want something similar
> > to vhost to associate the VFIO irq fd with a "global number".
> > 
> > Is that what the existing ioctl's provide ? Their semantics aren't
> > totally obvious to me.
> 
> Provided you want to trigger a MSI message, you first need to register
> it via kvm_irqchip_add_msi_route (will trigger KVM_SET_GSI_ROUTING).

Why ? Again, I don't get it.

> That will give you a virtual IRQ number which can be associate with an
> irqfd file descriptor as explained above (KVM_IRQFD). 

But virq numbers are entirely under qemu control. qemu creates the
source controllers, and assign the virq numbers to the devices. 

I really don't quite get how that concept of "GSI routing" means
anything for us. We certainly don't want to have the kernel return a
virq number to qemu.

That whole business with MSIs makes very little sense to me.

> But you may also
> create a different kind of routing table entry if MSI is not all you
> need to inject via irqfd. Could be a plain IRQ line as well, routed to a
> specific in-kernel IRQ controller model.

Sorry, I must be totally stupid or something but I don't understand.
Doesn't make sense to me.

> > Note that for pass-through at least, and possibly for vhost, we'd like
> > to actually totally bypass the irqfd & eventfd stuff for performance
> > reasons. At least for VFIO, if we are going to get the max performance
> > out of it, we need to take all generic code out of the picture. IE. If
> > the interrupts are routed to the physical CPU where the guest is
> > running, we want to be able to catch and distribute the interrupts to
> > the guest entirely within guest context, ie, with KVM arch specific low
> > level code that runs in "real mode" (ie MMU off) without context
> > switching the MMU back to the host, which on POWER is fairly costly.
> > 
> > That means that at least the association between a guest global
> > interrupt number and a host global interrupt number for pass-through
> > will be something that goes entirely through arch specific code path. We
> > might still be able to use generic APIs to establish it if they are
> > suitable though.
> 
> The same will happen on x86: direct injection to a target VCPU. Maybe
> again a topic for our IRQ routing table, just with specialized target types.

Ben.




^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-26 20:45                                       ` Benjamin Herrenschmidt
@ 2012-10-26 22:03                                         ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-26 22:03 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Peter Maydell, Paolo Bonzini, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On Sat, 2012-10-27 at 07:45 +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2012-10-26 at 14:39 +0200, Jan Kiszka wrote:
> 
> > But we are just talking about sending messages from A to B or soldering
> > an input to an output pin. That's pretty generic. Give each output event
> > a virtual IRQ number and define where its output "line" should be linked
> > to (input pin of target controller). All what will be specific are the
> > IDs of those controllers.
> 
> Hrm you seem to be saying something very different from Paolo here.
> Unless it's just a very very confused terminology.
> 
> So let's see the powerpc "pseries" case. Things like embedded etc...
> might be quite different.

So I had a chat with Anthony who explained to me a bit more about what
the x86 stuff is about. It's pretty horrible I must say :-)

So correct me if I'm wrong but you essentially have to differentiate
between MSI "outputs" and other (GSI) "outputs" due to the fact that
MSIs in x86 land don't act as normal interrupts going through a source
controller but instead get shot directly to the target CPU.

Then you have to establish some kind of "routing" from those GSIs to
some IO/APIC, and from MSIs to local APICs.

That's where I think there is a fairly fundamental difference with us.

So let's cut that problem in two. The GSI bit and the MSI bit. The
reason is that the way x86 does MSIs seems to be fairly x86 specific, I
wouldn't be surprised if everybody else did MSIs like we do them, that
is turn them into normal interrupts (ie, GSIs). But let's discuss that
below.

So the GSI bit. We can assume that GSIs in that context are basically
our "global interrupt number". This would apply to pretty much every
platform indeed.

The routing here, if I understand things correctly, consists of
associating such a global interrupt number with a specific input pin (or
virtual pin) of a specific source controller (ie, IO APIC).

This would generally make sense in embedded space as well I suppose,
where you can have multiple or even cascaded interrupt controllers of
different breeds etc...

However, in the pseries system, this routing is essentially encoded in
the interrupt number itself. As I think I explained earlier, the number
is arbitrarily split in two parts, the top bits indicating the source
controller and the bottom bits the source within that controller. In
qemu/kvm we have made an arbitrary split (whose size I don't remember
precisely) and we currently create only one fairly big source controller
but we might change that in the future.

This there is no such thing as needing to "associate" or create routing
entries here. qemu will directly shoot "GSIs" using an ioctl and our
code can directly map that to a source controller without any routing
table of any sort. In fact, adding one would complicate things since
we'd have a requirement that it's populated 1:1 or thing would get very
confused indeed so overall, there's no point for us to implement or use
that API or the "generic" code behind it, it would be pure bloat,
complication and problems.

However, making  that code more generic might make sense for other
platforms (including other powerpc platforms such as embedded) where
multiple interrupt controllers may exist though here too, it's probably
going to be fairly common that the GSI numbers are essentially be a bit
field split with entire ranges assigned to a given PIC. We don't have to
emulate x86+ACPI ability to individually remap interrupts.

The case if MSIs now. My understanding from what Anthony says is that
your MSIs essentially bypass the IO APIC and route directly to the local
APIC, which is equivalent to our presentation controller. You thus need
specific APIs to associate an MSI (which isn't a GSI) to as specific
local APIC.

We have no such need at all. Our MSIs are decoded by the PCI host bridge
and directly turned into "normal" interrupt. In fact, in HW, our bridges
contain a special source controller that *is* essentially the thing that
gets hit by MSIs. 

So our MSIs are just normal interrupts in the global space. Their
numbers are assigned by qemu, the kernel never knows about them. When an
emulated device triggers an MSI that turns into a normal "trigger global
interrupt X" ioctl to the kernel. The only "knowledge" the kernel
emulation gets along the way is an argument to the ioctl that indicates
whether this is a level set, level reset, or edge type action (MSIs are
edge obviously) which dictates how the delivery state machine will work
(one shot vs. continuous until cleared).

So qemu assigns interrupt numbers to MSIs and there's never any routing
to establish at the kernel level. That also means that the current API
that has tendrils all the way into devices in qemu for "getting the virq
for a given MSI" is totally unsuitable for us. In fact we don't need a
different API for KVM vs. full emulation. Everything in qemu side is the
same, until the qirq gets actually delivered in which case with KVM
we'll shoot an ioctl rather than emulating the source controller.

So the only APIs we need as these:

 - Create the IRQ chips themselves
 - Shoot an interrupt
 - Save and Restore of individual source state for migration

(The content of the state is very specific to a given IRQ chip
implementation).

I fail to see how we can shoe horn any of that in generic code, it
doesn't fit the model you currently have at all and making it do so
would add bloat and complexity without any benefit. IE. We wouldn't
"share" code, we would "add" code not otherwise useful.

The ARM situation might be different (and the powerpc situation for
other platforms such as mac99 and embedded) in that there might be some
value in having that GSI -> PIC input mapping, but here too I tend to
doubt it. We are probably better off starting with a cleaner slate
without the gross x86 baggage and use a unified flat number space in the
kernel, leaving all the complication of who is connected to whome to
qemu.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-26 22:03                                         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-26 22:03 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Peter Maydell, Paolo Bonzini, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On Sat, 2012-10-27 at 07:45 +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2012-10-26 at 14:39 +0200, Jan Kiszka wrote:
> 
> > But we are just talking about sending messages from A to B or soldering
> > an input to an output pin. That's pretty generic. Give each output event
> > a virtual IRQ number and define where its output "line" should be linked
> > to (input pin of target controller). All what will be specific are the
> > IDs of those controllers.
> 
> Hrm you seem to be saying something very different from Paolo here.
> Unless it's just a very very confused terminology.
> 
> So let's see the powerpc "pseries" case. Things like embedded etc...
> might be quite different.

So I had a chat with Anthony who explained to me a bit more about what
the x86 stuff is about. It's pretty horrible I must say :-)

So correct me if I'm wrong but you essentially have to differentiate
between MSI "outputs" and other (GSI) "outputs" due to the fact that
MSIs in x86 land don't act as normal interrupts going through a source
controller but instead get shot directly to the target CPU.

Then you have to establish some kind of "routing" from those GSIs to
some IO/APIC, and from MSIs to local APICs.

That's where I think there is a fairly fundamental difference with us.

So let's cut that problem in two. The GSI bit and the MSI bit. The
reason is that the way x86 does MSIs seems to be fairly x86 specific, I
wouldn't be surprised if everybody else did MSIs like we do them, that
is turn them into normal interrupts (ie, GSIs). But let's discuss that
below.

So the GSI bit. We can assume that GSIs in that context are basically
our "global interrupt number". This would apply to pretty much every
platform indeed.

The routing here, if I understand things correctly, consists of
associating such a global interrupt number with a specific input pin (or
virtual pin) of a specific source controller (ie, IO APIC).

This would generally make sense in embedded space as well I suppose,
where you can have multiple or even cascaded interrupt controllers of
different breeds etc...

However, in the pseries system, this routing is essentially encoded in
the interrupt number itself. As I think I explained earlier, the number
is arbitrarily split in two parts, the top bits indicating the source
controller and the bottom bits the source within that controller. In
qemu/kvm we have made an arbitrary split (whose size I don't remember
precisely) and we currently create only one fairly big source controller
but we might change that in the future.

This there is no such thing as needing to "associate" or create routing
entries here. qemu will directly shoot "GSIs" using an ioctl and our
code can directly map that to a source controller without any routing
table of any sort. In fact, adding one would complicate things since
we'd have a requirement that it's populated 1:1 or thing would get very
confused indeed so overall, there's no point for us to implement or use
that API or the "generic" code behind it, it would be pure bloat,
complication and problems.

However, making  that code more generic might make sense for other
platforms (including other powerpc platforms such as embedded) where
multiple interrupt controllers may exist though here too, it's probably
going to be fairly common that the GSI numbers are essentially be a bit
field split with entire ranges assigned to a given PIC. We don't have to
emulate x86+ACPI ability to individually remap interrupts.

The case if MSIs now. My understanding from what Anthony says is that
your MSIs essentially bypass the IO APIC and route directly to the local
APIC, which is equivalent to our presentation controller. You thus need
specific APIs to associate an MSI (which isn't a GSI) to as specific
local APIC.

We have no such need at all. Our MSIs are decoded by the PCI host bridge
and directly turned into "normal" interrupt. In fact, in HW, our bridges
contain a special source controller that *is* essentially the thing that
gets hit by MSIs. 

So our MSIs are just normal interrupts in the global space. Their
numbers are assigned by qemu, the kernel never knows about them. When an
emulated device triggers an MSI that turns into a normal "trigger global
interrupt X" ioctl to the kernel. The only "knowledge" the kernel
emulation gets along the way is an argument to the ioctl that indicates
whether this is a level set, level reset, or edge type action (MSIs are
edge obviously) which dictates how the delivery state machine will work
(one shot vs. continuous until cleared).

So qemu assigns interrupt numbers to MSIs and there's never any routing
to establish at the kernel level. That also means that the current API
that has tendrils all the way into devices in qemu for "getting the virq
for a given MSI" is totally unsuitable for us. In fact we don't need a
different API for KVM vs. full emulation. Everything in qemu side is the
same, until the qirq gets actually delivered in which case with KVM
we'll shoot an ioctl rather than emulating the source controller.

So the only APIs we need as these:

 - Create the IRQ chips themselves
 - Shoot an interrupt
 - Save and Restore of individual source state for migration

(The content of the state is very specific to a given IRQ chip
implementation).

I fail to see how we can shoe horn any of that in generic code, it
doesn't fit the model you currently have at all and making it do so
would add bloat and complexity without any benefit. IE. We wouldn't
"share" code, we would "add" code not otherwise useful.

The ARM situation might be different (and the powerpc situation for
other platforms such as mac99 and embedded) in that there might be some
value in having that GSI -> PIC input mapping, but here too I tend to
doubt it. We are probably better off starting with a cleaner slate
without the gross x86 baggage and use a unified flat number space in the
kernel, leaving all the complication of who is connected to whome to
qemu.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-26 22:03                                         ` Benjamin Herrenschmidt
@ 2012-10-27  8:06                                           ` Jan Kiszka
  -1 siblings, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-27  8:06 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Peter Maydell, Paolo Bonzini, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

[-- Attachment #1: Type: text/plain, Size: 8944 bytes --]

On 2012-10-27 00:03, Benjamin Herrenschmidt wrote:
> On Sat, 2012-10-27 at 07:45 +1100, Benjamin Herrenschmidt wrote:
>> On Fri, 2012-10-26 at 14:39 +0200, Jan Kiszka wrote:
>>
>>> But we are just talking about sending messages from A to B or soldering
>>> an input to an output pin. That's pretty generic. Give each output event
>>> a virtual IRQ number and define where its output "line" should be linked
>>> to (input pin of target controller). All what will be specific are the
>>> IDs of those controllers.
>>
>> Hrm you seem to be saying something very different from Paolo here.
>> Unless it's just a very very confused terminology.
>>
>> So let's see the powerpc "pseries" case. Things like embedded etc...
>> might be quite different.
> 
> So I had a chat with Anthony who explained to me a bit more about what
> the x86 stuff is about. It's pretty horrible I must say :-)
> 
> So correct me if I'm wrong but you essentially have to differentiate
> between MSI "outputs" and other (GSI) "outputs" due to the fact that
> MSIs in x86 land don't act as normal interrupts going through a source
> controller but instead get shot directly to the target CPU.
> 
> Then you have to establish some kind of "routing" from those GSIs to
> some IO/APIC, and from MSIs to local APICs.

I'm afraid there are still some misconceptions about what is happening
on x86 and what role the bits play that are more generic.

The fact that we can inject MSI messages directly to the target APIC
doesn't affect the need to have IRQ routing support. That is used for
two reason on x86:

 - define the wiring from a classic IRQ line to the various (legacy) IRQ
   controllers we have, namely the IOAPIC and the PIC
 - define the IRQ input that should be generated when an irqfd triggers
   (that's currently just irqfd->MSI associations, but irqfd->irqchip
   may come as well for vfio)

> 
> That's where I think there is a fairly fundamental difference with us.
> 
> So let's cut that problem in two. The GSI bit and the MSI bit. The
> reason is that the way x86 does MSIs seems to be fairly x86 specific, I
> wouldn't be surprised if everybody else did MSIs like we do them, that
> is turn them into normal interrupts (ie, GSIs). But let's discuss that
> below.
> 
> So the GSI bit. We can assume that GSIs in that context are basically
> our "global interrupt number". This would apply to pretty much every
> platform indeed.
> 
> The routing here, if I understand things correctly, consists of
> associating such a global interrupt number with a specific input pin (or
> virtual pin) of a specific source controller (ie, IO APIC).

...or PIC or whatever you have on your platform.

> 
> This would generally make sense in embedded space as well I suppose,
> where you can have multiple or even cascaded interrupt controllers of
> different breeds etc...
> 
> However, in the pseries system, this routing is essentially encoded in
> the interrupt number itself. As I think I explained earlier, the number
> is arbitrarily split in two parts, the top bits indicating the source
> controller and the bottom bits the source within that controller. In
> qemu/kvm we have made an arbitrary split (whose size I don't remember
> precisely) and we currently create only one fairly big source controller
> but we might change that in the future.
> 
> This there is no such thing as needing to "associate" or create routing
> entries here. qemu will directly shoot "GSIs" using an ioctl and our
> code can directly map that to a source controller without any routing
> table of any sort. In fact, adding one would complicate things since
> we'd have a requirement that it's populated 1:1 or thing would get very
> confused indeed so overall, there's no point for us to implement or use
> that API or the "generic" code behind it, it would be pure bloat,
> complication and problems.

OK, that puts the IRQ injection IOCTL on pseries in the same category as
KVM_SIGNAL_MSI on x86. Both require no routing as the target address is
fully encoded and not otherwise dynamically remapped in the kernel.

> 
> However, making  that code more generic might make sense for other
> platforms (including other powerpc platforms such as embedded) where
> multiple interrupt controllers may exist though here too, it's probably
> going to be fairly common that the GSI numbers are essentially be a bit
> field split with entire ranges assigned to a given PIC. We don't have to
> emulate x86+ACPI ability to individually remap interrupts.

In KVM, a "GSI" is an index to its central IRQ routing table where the
target IRQ controller or MSI message is defined. Other interpretations
of the number passed down the IRQ injection IOCTL are of course possible
(like you do on pseries), but then it becomes arch-specific.

> 
> The case if MSIs now. My understanding from what Anthony says is that
> your MSIs essentially bypass the IO APIC and route directly to the local
> APIC, which is equivalent to our presentation controller. You thus need
> specific APIs to associate an MSI (which isn't a GSI) to as specific
> local APIC.

First part is right, second part not. We only need special APIs on x86
for the legacy PCI assignment mess.

For MSI delivery, we now have KVM_SIGNAL_MSI, and - as indicated above -
that bypasses any routing. Only if your MSI event is delivered to the
kernel via a virtual IRQ line (means irqfd today), you need a routing
table entry.

> 
> We have no such need at all. Our MSIs are decoded by the PCI host bridge
> and directly turned into "normal" interrupt. In fact, in HW, our bridges
> contain a special source controller that *is* essentially the thing that
> gets hit by MSIs. 
> 
> So our MSIs are just normal interrupts in the global space. Their
> numbers are assigned by qemu, the kernel never knows about them. When an
> emulated device triggers an MSI that turns into a normal "trigger global
> interrupt X" ioctl to the kernel. The only "knowledge" the kernel
> emulation gets along the way is an argument to the ioctl that indicates
> whether this is a level set, level reset, or edge type action (MSIs are
> edge obviously) which dictates how the delivery state machine will work
> (one shot vs. continuous until cleared).
> 
> So qemu assigns interrupt numbers to MSIs and there's never any routing
> to establish at the kernel level. That also means that the current API
> that has tendrils all the way into devices in qemu for "getting the virq
> for a given MSI" is totally unsuitable for us.

Just like it is for x86 when the MSI event is generated in userspace -
and that's why we avoid it in that case.

> In fact we don't need a
> different API for KVM vs. full emulation. Everything in qemu side is the
> same, until the qirq gets actually delivered in which case with KVM
> we'll shoot an ioctl rather than emulating the source controller.
> 
> So the only APIs we need as these:
> 
>  - Create the IRQ chips themselves
>  - Shoot an interrupt
>  - Save and Restore of individual source state for migration
> 
> (The content of the state is very specific to a given IRQ chip
> implementation).
> 
> I fail to see how we can shoe horn any of that in generic code, it
> doesn't fit the model you currently have at all and making it do so
> would add bloat and complexity without any benefit. IE. We wouldn't
> "share" code, we would "add" code not otherwise useful.

I agree that you have no logical need for IRQ routing. Still, if you
want to use irqfd without reimplementing it for pseries, you will have
to populate a GSI routing table. That's because irqfd injects a "GSI"
(via kvm_set_irq), and that is dispatched according to the routing table.

So you need to define for the GSI number of an irqfd which pseries
global IRQ number should be generated. There is a bit of refactoring
needed in irq_comm.c to pull out remaining x86 bits and place some arch
callbacks, e.g. in setup_routing_entry so that you can hook up your
specific IRQ handler.

Options for configuring IRQ chips from userspace were discussed already.

> 
> The ARM situation might be different (and the powerpc situation for
> other platforms such as mac99 and embedded) in that there might be some
> value in having that GSI -> PIC input mapping, but here too I tend to
> doubt it. We are probably better off starting with a cleaner slate
> without the gross x86 baggage and use a unified flat number space in the
> kernel, leaving all the complication of who is connected to whome to
> qemu.

Because the specialties of pseries have not much use for IRQ routing,
you shouldn't derive that it is useless for all other archs. The
interface we have are generic enough, and there is no excuse to ignore
them without actually having tried them on ARM or embedded Power.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-27  8:06                                           ` Jan Kiszka
  0 siblings, 0 replies; 102+ messages in thread
From: Jan Kiszka @ 2012-10-27  8:06 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Peter Maydell, Paolo Bonzini, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

[-- Attachment #1: Type: text/plain, Size: 8944 bytes --]

On 2012-10-27 00:03, Benjamin Herrenschmidt wrote:
> On Sat, 2012-10-27 at 07:45 +1100, Benjamin Herrenschmidt wrote:
>> On Fri, 2012-10-26 at 14:39 +0200, Jan Kiszka wrote:
>>
>>> But we are just talking about sending messages from A to B or soldering
>>> an input to an output pin. That's pretty generic. Give each output event
>>> a virtual IRQ number and define where its output "line" should be linked
>>> to (input pin of target controller). All what will be specific are the
>>> IDs of those controllers.
>>
>> Hrm you seem to be saying something very different from Paolo here.
>> Unless it's just a very very confused terminology.
>>
>> So let's see the powerpc "pseries" case. Things like embedded etc...
>> might be quite different.
> 
> So I had a chat with Anthony who explained to me a bit more about what
> the x86 stuff is about. It's pretty horrible I must say :-)
> 
> So correct me if I'm wrong but you essentially have to differentiate
> between MSI "outputs" and other (GSI) "outputs" due to the fact that
> MSIs in x86 land don't act as normal interrupts going through a source
> controller but instead get shot directly to the target CPU.
> 
> Then you have to establish some kind of "routing" from those GSIs to
> some IO/APIC, and from MSIs to local APICs.

I'm afraid there are still some misconceptions about what is happening
on x86 and what role the bits play that are more generic.

The fact that we can inject MSI messages directly to the target APIC
doesn't affect the need to have IRQ routing support. That is used for
two reason on x86:

 - define the wiring from a classic IRQ line to the various (legacy) IRQ
   controllers we have, namely the IOAPIC and the PIC
 - define the IRQ input that should be generated when an irqfd triggers
   (that's currently just irqfd->MSI associations, but irqfd->irqchip
   may come as well for vfio)

> 
> That's where I think there is a fairly fundamental difference with us.
> 
> So let's cut that problem in two. The GSI bit and the MSI bit. The
> reason is that the way x86 does MSIs seems to be fairly x86 specific, I
> wouldn't be surprised if everybody else did MSIs like we do them, that
> is turn them into normal interrupts (ie, GSIs). But let's discuss that
> below.
> 
> So the GSI bit. We can assume that GSIs in that context are basically
> our "global interrupt number". This would apply to pretty much every
> platform indeed.
> 
> The routing here, if I understand things correctly, consists of
> associating such a global interrupt number with a specific input pin (or
> virtual pin) of a specific source controller (ie, IO APIC).

...or PIC or whatever you have on your platform.

> 
> This would generally make sense in embedded space as well I suppose,
> where you can have multiple or even cascaded interrupt controllers of
> different breeds etc...
> 
> However, in the pseries system, this routing is essentially encoded in
> the interrupt number itself. As I think I explained earlier, the number
> is arbitrarily split in two parts, the top bits indicating the source
> controller and the bottom bits the source within that controller. In
> qemu/kvm we have made an arbitrary split (whose size I don't remember
> precisely) and we currently create only one fairly big source controller
> but we might change that in the future.
> 
> This there is no such thing as needing to "associate" or create routing
> entries here. qemu will directly shoot "GSIs" using an ioctl and our
> code can directly map that to a source controller without any routing
> table of any sort. In fact, adding one would complicate things since
> we'd have a requirement that it's populated 1:1 or thing would get very
> confused indeed so overall, there's no point for us to implement or use
> that API or the "generic" code behind it, it would be pure bloat,
> complication and problems.

OK, that puts the IRQ injection IOCTL on pseries in the same category as
KVM_SIGNAL_MSI on x86. Both require no routing as the target address is
fully encoded and not otherwise dynamically remapped in the kernel.

> 
> However, making  that code more generic might make sense for other
> platforms (including other powerpc platforms such as embedded) where
> multiple interrupt controllers may exist though here too, it's probably
> going to be fairly common that the GSI numbers are essentially be a bit
> field split with entire ranges assigned to a given PIC. We don't have to
> emulate x86+ACPI ability to individually remap interrupts.

In KVM, a "GSI" is an index to its central IRQ routing table where the
target IRQ controller or MSI message is defined. Other interpretations
of the number passed down the IRQ injection IOCTL are of course possible
(like you do on pseries), but then it becomes arch-specific.

> 
> The case if MSIs now. My understanding from what Anthony says is that
> your MSIs essentially bypass the IO APIC and route directly to the local
> APIC, which is equivalent to our presentation controller. You thus need
> specific APIs to associate an MSI (which isn't a GSI) to as specific
> local APIC.

First part is right, second part not. We only need special APIs on x86
for the legacy PCI assignment mess.

For MSI delivery, we now have KVM_SIGNAL_MSI, and - as indicated above -
that bypasses any routing. Only if your MSI event is delivered to the
kernel via a virtual IRQ line (means irqfd today), you need a routing
table entry.

> 
> We have no such need at all. Our MSIs are decoded by the PCI host bridge
> and directly turned into "normal" interrupt. In fact, in HW, our bridges
> contain a special source controller that *is* essentially the thing that
> gets hit by MSIs. 
> 
> So our MSIs are just normal interrupts in the global space. Their
> numbers are assigned by qemu, the kernel never knows about them. When an
> emulated device triggers an MSI that turns into a normal "trigger global
> interrupt X" ioctl to the kernel. The only "knowledge" the kernel
> emulation gets along the way is an argument to the ioctl that indicates
> whether this is a level set, level reset, or edge type action (MSIs are
> edge obviously) which dictates how the delivery state machine will work
> (one shot vs. continuous until cleared).
> 
> So qemu assigns interrupt numbers to MSIs and there's never any routing
> to establish at the kernel level. That also means that the current API
> that has tendrils all the way into devices in qemu for "getting the virq
> for a given MSI" is totally unsuitable for us.

Just like it is for x86 when the MSI event is generated in userspace -
and that's why we avoid it in that case.

> In fact we don't need a
> different API for KVM vs. full emulation. Everything in qemu side is the
> same, until the qirq gets actually delivered in which case with KVM
> we'll shoot an ioctl rather than emulating the source controller.
> 
> So the only APIs we need as these:
> 
>  - Create the IRQ chips themselves
>  - Shoot an interrupt
>  - Save and Restore of individual source state for migration
> 
> (The content of the state is very specific to a given IRQ chip
> implementation).
> 
> I fail to see how we can shoe horn any of that in generic code, it
> doesn't fit the model you currently have at all and making it do so
> would add bloat and complexity without any benefit. IE. We wouldn't
> "share" code, we would "add" code not otherwise useful.

I agree that you have no logical need for IRQ routing. Still, if you
want to use irqfd without reimplementing it for pseries, you will have
to populate a GSI routing table. That's because irqfd injects a "GSI"
(via kvm_set_irq), and that is dispatched according to the routing table.

So you need to define for the GSI number of an irqfd which pseries
global IRQ number should be generated. There is a bit of refactoring
needed in irq_comm.c to pull out remaining x86 bits and place some arch
callbacks, e.g. in setup_routing_entry so that you can hook up your
specific IRQ handler.

Options for configuring IRQ chips from userspace were discussed already.

> 
> The ARM situation might be different (and the powerpc situation for
> other platforms such as mac99 and embedded) in that there might be some
> value in having that GSI -> PIC input mapping, but here too I tend to
> doubt it. We are probably better off starting with a cleaner slate
> without the gross x86 baggage and use a unified flat number space in the
> kernel, leaving all the complication of who is connected to whome to
> qemu.

Because the specialties of pseries have not much use for IRQ routing,
you shouldn't derive that it is useless for all other archs. The
interface we have are generic enough, and there is no excuse to ignore
them without actually having tried them on ARM or embedded Power.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-26 22:03                                         ` Benjamin Herrenschmidt
@ 2012-10-27 10:01                                           ` Peter Maydell
  -1 siblings, 0 replies; 102+ messages in thread
From: Peter Maydell @ 2012-10-27 10:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jan Kiszka, Paolo Bonzini, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On 26 October 2012 23:03, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> So the GSI bit. We can assume that GSIs in that context are basically
> our "global interrupt number". This would apply to pretty much every
> platform indeed.
>
> The routing here, if I understand things correctly, consists of
> associating such a global interrupt number with a specific input pin (or
> virtual pin) of a specific source controller (ie, IO APIC).
>
> This would generally make sense in embedded space as well I suppose,
> where you can have multiple or even cascaded interrupt controllers of
> different breeds etc...
>
> However, in the pseries system, this routing is essentially encoded in
> the interrupt number itself. As I think I explained earlier, the number
> is arbitrarily split in two parts, the top bits indicating the source
> controller and the bottom bits the source within that controller. In
> qemu/kvm we have made an arbitrary split (whose size I don't remember
> precisely) and we currently create only one fairly big source controller
> but we might change that in the future.

This is more or less how ARM has done it (though our specific encoding
of interrupt numbers is different, obviously).

If I were designing an interface for this kind of thing from scratch
I'd probably want it to look like "create a specific irqchip and give
me some kind of handle to it" and then have an interface for "assert
interrupt line X on that irqchip". Lacking that, a plausible encoding
scheme on the global interrupt numbers works OK given that you know
there aren't going to be that many irqchips in practice...

-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-27 10:01                                           ` Peter Maydell
  0 siblings, 0 replies; 102+ messages in thread
From: Peter Maydell @ 2012-10-27 10:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jan Kiszka, Paolo Bonzini, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On 26 October 2012 23:03, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> So the GSI bit. We can assume that GSIs in that context are basically
> our "global interrupt number". This would apply to pretty much every
> platform indeed.
>
> The routing here, if I understand things correctly, consists of
> associating such a global interrupt number with a specific input pin (or
> virtual pin) of a specific source controller (ie, IO APIC).
>
> This would generally make sense in embedded space as well I suppose,
> where you can have multiple or even cascaded interrupt controllers of
> different breeds etc...
>
> However, in the pseries system, this routing is essentially encoded in
> the interrupt number itself. As I think I explained earlier, the number
> is arbitrarily split in two parts, the top bits indicating the source
> controller and the bottom bits the source within that controller. In
> qemu/kvm we have made an arbitrary split (whose size I don't remember
> precisely) and we currently create only one fairly big source controller
> but we might change that in the future.

This is more or less how ARM has done it (though our specific encoding
of interrupt numbers is different, obviously).

If I were designing an interface for this kind of thing from scratch
I'd probably want it to look like "create a specific irqchip and give
me some kind of handle to it" and then have an interface for "assert
interrupt line X on that irqchip". Lacking that, a plausible encoding
scheme on the global interrupt numbers works OK given that you know
there aren't going to be that many irqchips in practice...

-- PMM

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
  2012-10-27 10:01                                           ` Peter Maydell
@ 2012-10-28 22:19                                             ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-28 22:19 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Jan Kiszka, Paolo Bonzini, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On Sat, 2012-10-27 at 11:01 +0100, Peter Maydell wrote:

> This is more or less how ARM has done it (though our specific encoding
> of interrupt numbers is different, obviously).
> 
> If I were designing an interface for this kind of thing from scratch
> I'd probably want it to look like "create a specific irqchip and give
> me some kind of handle to it" and then have an interface for "assert
> interrupt line X on that irqchip". Lacking that, a plausible encoding
> scheme on the global interrupt numbers works OK given that you know
> there aren't going to be that many irqchips in practice...

Agreed. It feels like the whole business with routing is not useful for
either of us and just adds confusion...

Note: I'll be in KVM forum next week (hopefully) so if necessary (and if
you guys are there too) we can have a "live" discussion.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
@ 2012-10-28 22:19                                             ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 102+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-28 22:19 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Jan Kiszka, Paolo Bonzini, Paul Mackerras, Christoffer Dall,
	Alexander Graf, kvmarm, kvm, kvm-ppc

On Sat, 2012-10-27 at 11:01 +0100, Peter Maydell wrote:

> This is more or less how ARM has done it (though our specific encoding
> of interrupt numbers is different, obviously).
> 
> If I were designing an interface for this kind of thing from scratch
> I'd probably want it to look like "create a specific irqchip and give
> me some kind of handle to it" and then have an interface for "assert
> interrupt line X on that irqchip". Lacking that, a plausible encoding
> scheme on the global interrupt numbers works OK given that you know
> there aren't going to be that many irqchips in practice...

Agreed. It feels like the whole business with routing is not useful for
either of us and just adds confusion...

Note: I'll be in KVM forum next week (hopefully) so if necessary (and if
you guys are there too) we can have a "live" discussion.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 102+ messages in thread

end of thread, other threads:[~2012-10-28 22:19 UTC | newest]

Thread overview: 102+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-14  0:04 [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses Christoffer Dall
2012-10-14  0:04 ` [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl Christoffer Dall
2012-10-17 20:21   ` Peter Maydell
2012-10-17 20:23     ` Christoffer Dall
2012-10-17 20:31       ` Peter Maydell
2012-10-17 20:39         ` Christoffer Dall
2012-10-18 12:20   ` Avi Kivity
2012-10-19 18:42     ` Christoffer Dall
2012-10-14  0:04 ` [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl Christoffer Dall
2012-10-17 20:29   ` Peter Maydell
2012-10-19 18:46     ` Christoffer Dall
2012-10-19 20:24       ` Peter Maydell
2012-10-19 20:27         ` Christoffer Dall
2012-10-19 20:33           ` Christoffer Dall
2012-10-14  0:04 ` [RFC PATCH 3/3] KVM: ARM: Split KVM_CREATE_IRQCHIP and KVM_INIT_IRQCHIP Christoffer Dall
2012-10-18 11:15   ` Peter Maydell
2012-10-17 20:38 ` [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses Alexander Graf
2012-10-17 20:38   ` Alexander Graf
2012-10-17 20:39   ` Christoffer Dall
2012-10-17 20:39     ` Christoffer Dall
2012-10-17 21:19     ` Benjamin Herrenschmidt
2012-10-17 21:19       ` Benjamin Herrenschmidt
2012-10-17 22:10     ` Paul Mackerras
2012-10-17 22:10       ` Paul Mackerras
2012-10-17 23:58       ` Benjamin Herrenschmidt
2012-10-17 23:58         ` Benjamin Herrenschmidt
2012-10-18 13:48         ` Christoffer Dall
2012-10-18 13:48           ` Christoffer Dall
2012-10-18 13:49           ` Alexander Graf
2012-10-18 13:49             ` Alexander Graf
2012-10-18 15:25             ` Avi Kivity
2012-10-18 15:25               ` Avi Kivity
2012-10-23 10:48           ` Jan Kiszka
2012-10-23 10:48             ` Jan Kiszka
2012-10-23 10:52             ` Peter Maydell
2012-10-23 10:52               ` Peter Maydell
2012-10-23 11:00               ` Jan Kiszka
2012-10-23 11:00                 ` Jan Kiszka
2012-10-23 11:04                 ` Peter Maydell
2012-10-23 11:04                   ` Peter Maydell
2012-10-23 11:08                   ` Jan Kiszka
2012-10-23 11:08                     ` Jan Kiszka
2012-10-24  0:50             ` Paul Mackerras
2012-10-24  0:50               ` Paul Mackerras
2012-10-25 11:57               ` Jan Kiszka
2012-10-25 11:57                 ` Jan Kiszka
2012-10-25 16:14               ` Paolo Bonzini
2012-10-25 16:14                 ` Paolo Bonzini
2012-10-25 16:32                 ` Jan Kiszka
2012-10-25 16:32                   ` Jan Kiszka
2012-10-25 18:27                   ` Paolo Bonzini
2012-10-25 18:27                     ` Paolo Bonzini
2012-10-25 19:40                     ` Benjamin Herrenschmidt
2012-10-25 19:40                       ` Benjamin Herrenschmidt
2012-10-26  9:58                       ` Paolo Bonzini
2012-10-26  9:58                         ` Paolo Bonzini
2012-10-26 10:09                         ` Peter Maydell
2012-10-26 10:09                           ` Peter Maydell
2012-10-26 10:15                           ` Paolo Bonzini
2012-10-26 10:15                             ` Paolo Bonzini
2012-10-26 10:22                             ` Jan Kiszka
2012-10-26 10:22                               ` Jan Kiszka
2012-10-26 10:44                             ` Benjamin Herrenschmidt
2012-10-26 10:44                               ` Benjamin Herrenschmidt
2012-10-26 11:00                               ` Jan Kiszka
2012-10-26 11:00                                 ` Jan Kiszka
2012-10-26 11:09                                 ` Benjamin Herrenschmidt
2012-10-26 11:09                                   ` Benjamin Herrenschmidt
2012-10-26 11:57                                   ` Paolo Bonzini
2012-10-26 11:57                                     ` Paolo Bonzini
2012-10-26 12:08                                     ` Peter Maydell
2012-10-26 12:08                                       ` Peter Maydell
2012-10-26 12:41                                       ` Jan Kiszka
2012-10-26 12:41                                         ` Jan Kiszka
2012-10-26 20:21                                     ` Benjamin Herrenschmidt
2012-10-26 20:21                                       ` Benjamin Herrenschmidt
2012-10-26 11:17                               ` Peter Maydell
2012-10-26 11:17                                 ` Peter Maydell
2012-10-26 11:39                                 ` Benjamin Herrenschmidt
2012-10-26 11:39                                   ` Benjamin Herrenschmidt
2012-10-26 12:39                                   ` Jan Kiszka
2012-10-26 12:39                                     ` Jan Kiszka
2012-10-26 20:45                                     ` Benjamin Herrenschmidt
2012-10-26 20:45                                       ` Benjamin Herrenschmidt
2012-10-26 22:03                                       ` Benjamin Herrenschmidt
2012-10-26 22:03                                         ` Benjamin Herrenschmidt
2012-10-27  8:06                                         ` Jan Kiszka
2012-10-27  8:06                                           ` Jan Kiszka
2012-10-27 10:01                                         ` Peter Maydell
2012-10-27 10:01                                           ` Peter Maydell
2012-10-28 22:19                                           ` Benjamin Herrenschmidt
2012-10-28 22:19                                             ` Benjamin Herrenschmidt
2012-10-26 10:37                         ` Benjamin Herrenschmidt
2012-10-26 10:37                           ` Benjamin Herrenschmidt
2012-10-26 10:40                           ` Paolo Bonzini
2012-10-26 10:40                             ` Paolo Bonzini
2012-10-26 10:47                             ` Benjamin Herrenschmidt
2012-10-26 10:47                               ` Benjamin Herrenschmidt
2012-10-26 11:47                               ` Paolo Bonzini
2012-10-26 11:47                                 ` Paolo Bonzini
2012-10-25 19:39                 ` Benjamin Herrenschmidt
2012-10-25 19:39                   ` Benjamin Herrenschmidt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.