* [PATCH 01/16] kvm: ioctl for KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ
2009-03-12 13:36 [PATCH 0/16 v5] Device assignment improvement in userspace Sheng Yang
@ 2009-03-12 13:36 ` Sheng Yang
2009-03-12 13:40 ` Sheng Yang
2009-03-12 13:36 ` [PATCH 02/16] kvm: deassign irq for INTx Sheng Yang
` (15 subsequent siblings)
16 siblings, 1 reply; 32+ messages in thread
From: Sheng Yang @ 2009-03-12 13:36 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tosatti, Anthony Liguori
Cc: kvm, Marcelo Tosatti, Sheng Yang
From: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
libkvm/libkvm.c | 38 +++++++++++++++++++++++++++++++++++++-
libkvm/libkvm.h | 21 +++++++++++++++++----
2 files changed, 54 insertions(+), 5 deletions(-)
diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index 0ac1c28..80a0481 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -1141,7 +1141,7 @@ int kvm_assign_pci_device(kvm_context_t kvm,
return ret;
}
-int kvm_assign_irq(kvm_context_t kvm,
+static int kvm_old_assign_irq(kvm_context_t kvm,
struct kvm_assigned_irq *assigned_irq)
{
int ret;
@@ -1152,6 +1152,42 @@ int kvm_assign_irq(kvm_context_t kvm,
return ret;
}
+
+#ifdef KVM_CAP_ASSIGN_DEV_IRQ
+int kvm_assign_irq(kvm_context_t kvm,
+ struct kvm_assigned_irq *assigned_irq)
+{
+ int ret;
+
+ ret = ioctl(kvm->fd, KVM_CHECK_EXTENSION, KVM_CAP_ASSIGN_DEV_IRQ);
+ if (ret > 0) {
+ ret = ioctl(kvm->vm_fd, KVM_ASSIGN_DEV_IRQ, assigned_irq);
+ if (ret < 0)
+ return -errno;
+ return ret;
+ }
+
+ return kvm_old_assign_irq(kvm, assigned_irq);
+}
+
+int kvm_deassign_irq(kvm_context_t kvm,
+ struct kvm_assigned_irq *assigned_irq)
+{
+ int ret;
+
+ ret = ioctl(kvm->vm_fd, KVM_DEASSIGN_DEV_IRQ, assigned_irq);
+ if (ret < 0)
+ return -errno;
+
+ return ret;
+}
+#else
+int kvm_assign_irq(kvm_context_t kvm,
+ struct kvm_assigned_irq *assigned_irq)
+{
+ return kvm_old_assign_irq(kvm, assigned_irq);
+}
+#endif
#endif
#ifdef KVM_CAP_DEVICE_DEASSIGNMENT
diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h
index 0239cb6..3e5efe0 100644
--- a/libkvm/libkvm.h
+++ b/libkvm/libkvm.h
@@ -715,11 +715,10 @@ int kvm_assign_pci_device(kvm_context_t kvm,
struct kvm_assigned_pci_dev *assigned_dev);
/*!
- * \brief Notifies host kernel about changes to IRQ for an assigned device
+ * \brief Assign IRQ for an assigned device
*
- * Used for PCI device assignment, this function notifies the host
- * kernel about the changes in IRQ number for an assigned physical
- * PCI device.
+ * Used for PCI device assignment, this function assigns IRQ numbers for
+ * an physical device and guest IRQ handling.
*
* \param kvm Pointer to the current kvm_context
* \param assigned_irq Parameters, like dev id, host irq, guest irq, etc
@@ -727,6 +726,20 @@ int kvm_assign_pci_device(kvm_context_t kvm,
int kvm_assign_irq(kvm_context_t kvm,
struct kvm_assigned_irq *assigned_irq);
+#ifdef KVM_CAP_ASSIGN_DEV_IRQ
+/*!
+ * \brief Deassign IRQ for an assigned device
+ *
+ * Used for PCI device assignment, this function deassigns IRQ numbers
+ * for an assigned device.
+ *
+ * \param kvm Pointer to the current kvm_context
+ * \param assigned_irq Parameters, like dev id, host irq, guest irq, etc
+ */
+int kvm_deassign_irq(kvm_context_t kvm,
+ struct kvm_assigned_irq *assigned_irq);
+#endif
+
/*!
* \brief Determines whether destroying memory regions is allowed
*
--
1.5.4.5
* Re: [PATCH 01/16] kvm: ioctl for KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ
2009-03-12 13:36 ` [PATCH 01/16] kvm: ioctl for KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ Sheng Yang
@ 2009-03-12 13:40 ` Sheng Yang
2009-03-16 8:30 ` Sheng Yang
2009-03-16 9:04 ` Avi Kivity
0 siblings, 2 replies; 32+ messages in thread
From: Sheng Yang @ 2009-03-12 13:40 UTC (permalink / raw)
To: Avi Kivity; +Cc: Marcelo Tosatti, Anthony Liguori, kvm
On Thursday 12 March 2009 21:36:44 Sheng Yang wrote:
> From: Marcelo Tosatti <mtosatti@redhat.com>
>
> Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Oops.. Should be Marcelo's signed-off here...
--
regards
Yang, Sheng
> ---
> libkvm/libkvm.c | 38 +++++++++++++++++++++++++++++++++++++-
> libkvm/libkvm.h | 21 +++++++++++++++++----
> 2 files changed, 54 insertions(+), 5 deletions(-)
>
> diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
> index 0ac1c28..80a0481 100644
> --- a/libkvm/libkvm.c
> +++ b/libkvm/libkvm.c
> @@ -1141,7 +1141,7 @@ int kvm_assign_pci_device(kvm_context_t kvm,
> return ret;
> }
>
> -int kvm_assign_irq(kvm_context_t kvm,
> +static int kvm_old_assign_irq(kvm_context_t kvm,
> struct kvm_assigned_irq *assigned_irq)
> {
> int ret;
> @@ -1152,6 +1152,42 @@ int kvm_assign_irq(kvm_context_t kvm,
>
> return ret;
> }
> +
> +#ifdef KVM_CAP_ASSIGN_DEV_IRQ
> +int kvm_assign_irq(kvm_context_t kvm,
> + struct kvm_assigned_irq *assigned_irq)
> +{
> + int ret;
> +
> + ret = ioctl(kvm->fd, KVM_CHECK_EXTENSION, KVM_CAP_ASSIGN_DEV_IRQ);
> + if (ret > 0) {
> + ret = ioctl(kvm->vm_fd, KVM_ASSIGN_DEV_IRQ, assigned_irq);
> + if (ret < 0)
> + return -errno;
> + return ret;
> + }
> +
> + return kvm_old_assign_irq(kvm, assigned_irq);
> +}
> +
> +int kvm_deassign_irq(kvm_context_t kvm,
> + struct kvm_assigned_irq *assigned_irq)
> +{
> + int ret;
> +
> + ret = ioctl(kvm->vm_fd, KVM_DEASSIGN_DEV_IRQ, assigned_irq);
> + if (ret < 0)
> + return -errno;
> +
> + return ret;
> +}
> +#else
> +int kvm_assign_irq(kvm_context_t kvm,
> + struct kvm_assigned_irq *assigned_irq)
> +{
> + return kvm_old_assign_irq(kvm, assigned_irq);
> +}
> +#endif
> #endif
>
> #ifdef KVM_CAP_DEVICE_DEASSIGNMENT
> diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h
> index 0239cb6..3e5efe0 100644
> --- a/libkvm/libkvm.h
> +++ b/libkvm/libkvm.h
> @@ -715,11 +715,10 @@ int kvm_assign_pci_device(kvm_context_t kvm,
> struct kvm_assigned_pci_dev *assigned_dev);
>
> /*!
> - * \brief Notifies host kernel about changes to IRQ for an assigned device
> + * \brief Assign IRQ for an assigned device
> *
> - * Used for PCI device assignment, this function notifies the host
> - * kernel about the changes in IRQ number for an assigned physical
> - * PCI device.
> + * Used for PCI device assignment, this function assigns IRQ numbers for
> + * an physical device and guest IRQ handling.
> *
> * \param kvm Pointer to the current kvm_context
> * \param assigned_irq Parameters, like dev id, host irq, guest irq, etc
> @@ -727,6 +726,20 @@ int kvm_assign_pci_device(kvm_context_t kvm,
> int kvm_assign_irq(kvm_context_t kvm,
> struct kvm_assigned_irq *assigned_irq);
>
> +#ifdef KVM_CAP_ASSIGN_DEV_IRQ
> +/*!
> + * \brief Deassign IRQ for an assigned device
> + *
> + * Used for PCI device assignment, this function deassigns IRQ numbers
> + * for an assigned device.
> + *
> + * \param kvm Pointer to the current kvm_context
> + * \param assigned_irq Parameters, like dev id, host irq, guest irq, etc
> + */
> +int kvm_deassign_irq(kvm_context_t kvm,
> + struct kvm_assigned_irq *assigned_irq);
> +#endif
> +
> /*!
> * \brief Determines whether destroying memory regions is allowed
> *
* [PATCH 01/16] kvm: ioctl for KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ
2009-03-12 13:40 ` Sheng Yang
@ 2009-03-16 8:30 ` Sheng Yang
2009-03-16 9:04 ` Avi Kivity
1 sibling, 0 replies; 32+ messages in thread
From: Sheng Yang @ 2009-03-16 8:30 UTC (permalink / raw)
To: Avi Kivity, Anthony Liguori; +Cc: kvm, Marcelo Tosatti
From: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
libkvm/libkvm.c | 38 +++++++++++++++++++++++++++++++++++++-
libkvm/libkvm.h | 21 +++++++++++++++++----
2 files changed, 54 insertions(+), 5 deletions(-)
diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index 0ac1c28..80a0481 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -1141,7 +1141,7 @@ int kvm_assign_pci_device(kvm_context_t kvm,
return ret;
}
-int kvm_assign_irq(kvm_context_t kvm,
+static int kvm_old_assign_irq(kvm_context_t kvm,
struct kvm_assigned_irq *assigned_irq)
{
int ret;
@@ -1152,6 +1152,42 @@ int kvm_assign_irq(kvm_context_t kvm,
return ret;
}
+
+#ifdef KVM_CAP_ASSIGN_DEV_IRQ
+int kvm_assign_irq(kvm_context_t kvm,
+ struct kvm_assigned_irq *assigned_irq)
+{
+ int ret;
+
+ ret = ioctl(kvm->fd, KVM_CHECK_EXTENSION, KVM_CAP_ASSIGN_DEV_IRQ);
+ if (ret > 0) {
+ ret = ioctl(kvm->vm_fd, KVM_ASSIGN_DEV_IRQ, assigned_irq);
+ if (ret < 0)
+ return -errno;
+ return ret;
+ }
+
+ return kvm_old_assign_irq(kvm, assigned_irq);
+}
+
+int kvm_deassign_irq(kvm_context_t kvm,
+ struct kvm_assigned_irq *assigned_irq)
+{
+ int ret;
+
+ ret = ioctl(kvm->vm_fd, KVM_DEASSIGN_DEV_IRQ, assigned_irq);
+ if (ret < 0)
+ return -errno;
+
+ return ret;
+}
+#else
+int kvm_assign_irq(kvm_context_t kvm,
+ struct kvm_assigned_irq *assigned_irq)
+{
+ return kvm_old_assign_irq(kvm, assigned_irq);
+}
+#endif
#endif
#ifdef KVM_CAP_DEVICE_DEASSIGNMENT
diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h
index 0239cb6..3e5efe0 100644
--- a/libkvm/libkvm.h
+++ b/libkvm/libkvm.h
@@ -715,11 +715,10 @@ int kvm_assign_pci_device(kvm_context_t kvm,
struct kvm_assigned_pci_dev *assigned_dev);
/*!
- * \brief Notifies host kernel about changes to IRQ for an assigned device
+ * \brief Assign IRQ for an assigned device
*
- * Used for PCI device assignment, this function notifies the host
- * kernel about the changes in IRQ number for an assigned physical
- * PCI device.
+ * Used for PCI device assignment, this function assigns IRQ numbers for
+ * an physical device and guest IRQ handling.
*
* \param kvm Pointer to the current kvm_context
* \param assigned_irq Parameters, like dev id, host irq, guest irq, etc
@@ -727,6 +726,20 @@ int kvm_assign_pci_device(kvm_context_t kvm,
int kvm_assign_irq(kvm_context_t kvm,
struct kvm_assigned_irq *assigned_irq);
+#ifdef KVM_CAP_ASSIGN_DEV_IRQ
+/*!
+ * \brief Deassign IRQ for an assigned device
+ *
+ * Used for PCI device assignment, this function deassigns IRQ numbers
+ * for an assigned device.
+ *
+ * \param kvm Pointer to the current kvm_context
+ * \param assigned_irq Parameters, like dev id, host irq, guest irq, etc
+ */
+int kvm_deassign_irq(kvm_context_t kvm,
+ struct kvm_assigned_irq *assigned_irq);
+#endif
+
/*!
* \brief Determines whether destroying memory regions is allowed
*
--
1.5.4.5
* Re: [PATCH 01/16] kvm: ioctl for KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ
2009-03-12 13:40 ` Sheng Yang
2009-03-16 8:30 ` Sheng Yang
@ 2009-03-16 9:04 ` Avi Kivity
2009-03-16 9:11 ` Sheng Yang
1 sibling, 1 reply; 32+ messages in thread
From: Avi Kivity @ 2009-03-16 9:04 UTC (permalink / raw)
To: Sheng Yang; +Cc: Marcelo Tosatti, Anthony Liguori, kvm
Sheng Yang wrote:
> On Thursday 12 March 2009 21:36:44 Sheng Yang wrote:
>
>> From: Marcelo Tosatti <mtosatti@redhat.com>
>>
>> Signed-off-by: Sheng Yang <sheng@linux.intel.com>
>>
> Oops.. Should be Marcelo's signed-off here...
>
Should be both. If you are forwarding someone else's patch, even with
no changes, you still need to add your signoff.
--
error compiling committee.c: too many arguments to function
* Re: [PATCH 01/16] kvm: ioctl for KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ
2009-03-16 9:04 ` Avi Kivity
@ 2009-03-16 9:11 ` Sheng Yang
2009-03-16 9:14 ` Sheng Yang
0 siblings, 1 reply; 32+ messages in thread
From: Sheng Yang @ 2009-03-16 9:11 UTC (permalink / raw)
To: Avi Kivity; +Cc: Marcelo Tosatti, Anthony Liguori, kvm
On Monday 16 March 2009 17:04:19 Avi Kivity wrote:
> Sheng Yang wrote:
> > On Thursday 12 March 2009 21:36:44 Sheng Yang wrote:
> >> From: Marcelo Tosatti <mtosatti@redhat.com>
> >>
> >> Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> >
> > Oops.. Should be Marcelo's signed-off here...
>
> Should be both. If you are forwarding someone else's patch, even with
> no changes, you still need to add your signoff.
Thanks, I will update it again...
--
regards
Yang, Sheng
* [PATCH 01/16] kvm: ioctl for KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ
2009-03-16 9:11 ` Sheng Yang
@ 2009-03-16 9:14 ` Sheng Yang
0 siblings, 0 replies; 32+ messages in thread
From: Sheng Yang @ 2009-03-16 9:14 UTC (permalink / raw)
To: Avi Kivity, Anthony Liguori; +Cc: kvm, Marcelo Tosatti, Sheng Yang
From: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
libkvm/libkvm.c | 38 +++++++++++++++++++++++++++++++++++++-
libkvm/libkvm.h | 21 +++++++++++++++++----
2 files changed, 54 insertions(+), 5 deletions(-)
diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index 0ac1c28..80a0481 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -1141,7 +1141,7 @@ int kvm_assign_pci_device(kvm_context_t kvm,
return ret;
}
-int kvm_assign_irq(kvm_context_t kvm,
+static int kvm_old_assign_irq(kvm_context_t kvm,
struct kvm_assigned_irq *assigned_irq)
{
int ret;
@@ -1152,6 +1152,42 @@ int kvm_assign_irq(kvm_context_t kvm,
return ret;
}
+
+#ifdef KVM_CAP_ASSIGN_DEV_IRQ
+int kvm_assign_irq(kvm_context_t kvm,
+ struct kvm_assigned_irq *assigned_irq)
+{
+ int ret;
+
+ ret = ioctl(kvm->fd, KVM_CHECK_EXTENSION, KVM_CAP_ASSIGN_DEV_IRQ);
+ if (ret > 0) {
+ ret = ioctl(kvm->vm_fd, KVM_ASSIGN_DEV_IRQ, assigned_irq);
+ if (ret < 0)
+ return -errno;
+ return ret;
+ }
+
+ return kvm_old_assign_irq(kvm, assigned_irq);
+}
+
+int kvm_deassign_irq(kvm_context_t kvm,
+ struct kvm_assigned_irq *assigned_irq)
+{
+ int ret;
+
+ ret = ioctl(kvm->vm_fd, KVM_DEASSIGN_DEV_IRQ, assigned_irq);
+ if (ret < 0)
+ return -errno;
+
+ return ret;
+}
+#else
+int kvm_assign_irq(kvm_context_t kvm,
+ struct kvm_assigned_irq *assigned_irq)
+{
+ return kvm_old_assign_irq(kvm, assigned_irq);
+}
+#endif
#endif
#ifdef KVM_CAP_DEVICE_DEASSIGNMENT
diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h
index 0239cb6..3e5efe0 100644
--- a/libkvm/libkvm.h
+++ b/libkvm/libkvm.h
@@ -715,11 +715,10 @@ int kvm_assign_pci_device(kvm_context_t kvm,
struct kvm_assigned_pci_dev *assigned_dev);
/*!
- * \brief Notifies host kernel about changes to IRQ for an assigned device
+ * \brief Assign IRQ for an assigned device
*
- * Used for PCI device assignment, this function notifies the host
- * kernel about the changes in IRQ number for an assigned physical
- * PCI device.
+ * Used for PCI device assignment, this function assigns IRQ numbers for
+ * an physical device and guest IRQ handling.
*
* \param kvm Pointer to the current kvm_context
* \param assigned_irq Parameters, like dev id, host irq, guest irq, etc
@@ -727,6 +726,20 @@ int kvm_assign_pci_device(kvm_context_t kvm,
int kvm_assign_irq(kvm_context_t kvm,
struct kvm_assigned_irq *assigned_irq);
+#ifdef KVM_CAP_ASSIGN_DEV_IRQ
+/*!
+ * \brief Deassign IRQ for an assigned device
+ *
+ * Used for PCI device assignment, this function deassigns IRQ numbers
+ * for an assigned device.
+ *
+ * \param kvm Pointer to the current kvm_context
+ * \param assigned_irq Parameters, like dev id, host irq, guest irq, etc
+ */
+int kvm_deassign_irq(kvm_context_t kvm,
+ struct kvm_assigned_irq *assigned_irq);
+#endif
+
/*!
* \brief Determines whether destroying memory regions is allowed
*
--
1.5.4.5
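
The patch above combines a build-time guard (#ifdef KVM_CAP_ASSIGN_DEV_IRQ) with a run-time probe (KVM_CHECK_EXTENSION) and falls back to the older ioctl when the probe reports nothing. A minimal sketch of that dispatch, assuming a hypothetical irq_backend structure in place of the real kernel interface (none of the names below belong to libkvm):

```c
#include <errno.h>

/* Hypothetical stand-in for the kernel interface; not libkvm's API. */
struct irq_backend {
    int cap_assign_dev_irq;   /* what a KVM_CHECK_EXTENSION-style probe returned */
    int (*assign_new)(int host_irq, int guest_irq);
    int (*assign_old)(int host_irq, int guest_irq);
};

/* Prefer the newer call when the probe reports it; otherwise use the legacy one. */
static int assign_irq_compat(const struct irq_backend *be,
                             int host_irq, int guest_irq)
{
    if (be->cap_assign_dev_irq > 0)
        return be->assign_new(host_irq, guest_irq);
    return be->assign_old(host_irq, guest_irq);
}

/* Demo callbacks so the dispatch can be observed without a kernel. */
static int demo_new(int host_irq, int guest_irq)
{
    (void)host_irq; (void)guest_irq;
    return 1;
}

static int demo_old(int host_irq, int guest_irq)
{
    (void)host_irq; (void)guest_irq;
    return 2;
}

static int demo_fail(int host_irq, int guest_irq)
{
    (void)host_irq; (void)guest_irq;
    return -ENXIO;            /* negative errno, the convention the patch uses */
}
```

The point of the split is that a binary built against new headers still runs on an older kernel: the probe decides at run time, while the #ifdef only decides whether the newer path can be compiled at all.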
* [PATCH 02/16] kvm: deassign irq for INTx
2009-03-12 13:36 [PATCH 0/16 v5] Device assignment improvement in userspace Sheng Yang
2009-03-12 13:36 ` [PATCH 01/16] kvm: ioctl for KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ Sheng Yang
@ 2009-03-12 13:36 ` Sheng Yang
2009-03-12 13:36 ` [PATCH 03/16] kvm: Replace force type convert with container_of() Sheng Yang
` (14 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Sheng Yang @ 2009-03-12 13:36 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tosatti, Anthony Liguori
Cc: kvm, Marcelo Tosatti, Sheng Yang
From: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
qemu/hw/device-assignment.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index 7c73210..19848b4 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -536,6 +536,14 @@ static int assign_irq(AssignedDevInfo *adev)
calc_assigned_dev_id(dev->h_busnr, dev->h_devfn);
assigned_irq_data.guest_irq = irq;
assigned_irq_data.host_irq = dev->real_device.irq;
+#ifdef KVM_CAP_ASSIGN_DEV_IRQ
+ assigned_irq_data.flags = KVM_DEV_IRQ_HOST_INTX | KVM_DEV_IRQ_GUEST_INTX;
+ r = kvm_deassign_irq(kvm_context, &assigned_irq_data);
+ /* -ENXIO means no assigned irq */
+ if (r && r != -ENXIO)
+ perror("assign_irq: deassign");
+#endif
+
r = kvm_assign_irq(kvm_context, &assigned_irq_data);
if (r < 0) {
fprintf(stderr, "Failed to assign irq for \"%s\": %s\n",
--
1.5.4.5
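
The hunk above deassigns any existing INTx mapping before assigning the new one, and treats -ENXIO ("no assigned irq") as harmless. A rough sketch of that control flow, assuming hypothetical callback types in place of kvm_deassign_irq() and kvm_assign_irq():

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the libkvm calls; ctx carries whatever they need. */
typedef int (*irq_op)(void *ctx);

/*
 * Drop any previous mapping, then install the new one.
 * -ENXIO from the deassign step means nothing was assigned yet, which is
 * expected on the first call, so it is not reported.
 */
static int reassign_irq(irq_op deassign, irq_op assign, void *ctx)
{
    int r = deassign(ctx);

    if (r < 0 && r != -ENXIO)
        fprintf(stderr, "deassign failed: %s\n", strerror(-r));

    /* As in the patch, a deassign failure is logged but does not stop the assign. */
    return assign(ctx);
}

/* Demo callbacks: one reports "nothing assigned", one counts its calls. */
static int demo_deassign_none(void *ctx)
{
    (void)ctx;
    return -ENXIO;
}

static int demo_assign_ok(void *ctx)
{
    int *calls = ctx;
    (*calls)++;
    return 0;
}
```

Printing strerror(-r) rather than relying on errno keeps the message tied to the value the callee actually returned.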
* [PATCH 03/16] kvm: Replace force type convert with container_of()
2009-03-12 13:36 [PATCH 0/16 v5] Device assignment improvement in userspace Sheng Yang
2009-03-12 13:36 ` [PATCH 01/16] kvm: ioctl for KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ Sheng Yang
2009-03-12 13:36 ` [PATCH 02/16] kvm: deassign irq for INTx Sheng Yang
@ 2009-03-12 13:36 ` Sheng Yang
2009-03-12 13:36 ` [PATCH 04/16] Make device assignment depend on libpci Sheng Yang
` (13 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Sheng Yang @ 2009-03-12 13:36 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tosatti, Anthony Liguori; +Cc: kvm, Sheng Yang
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
qemu/hw/device-assignment.c | 20 ++++++++++++--------
1 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index 19848b4..e8a69ba 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -144,7 +144,7 @@ static uint32_t assigned_dev_ioport_readl(void *opaque, uint32_t addr)
static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
uint32_t e_phys, uint32_t e_size, int type)
{
- AssignedDevice *r_dev = (AssignedDevice *) pci_dev;
+ AssignedDevice *r_dev = container_of(pci_dev, AssignedDevice, dev);
AssignedDevRegion *region = &r_dev->v_addrs[region_num];
uint32_t old_ephys = region->e_physbase;
uint32_t old_esize = region->e_size;
@@ -175,7 +175,7 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
static void assigned_dev_ioport_map(PCIDevice *pci_dev, int region_num,
uint32_t addr, uint32_t size, int type)
{
- AssignedDevice *r_dev = (AssignedDevice *) pci_dev;
+ AssignedDevice *r_dev = container_of(pci_dev, AssignedDevice, dev);
AssignedDevRegion *region = &r_dev->v_addrs[region_num];
int first_map = (region->e_size == 0);
CPUState *env;
@@ -224,6 +224,7 @@ static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address,
{
int fd;
ssize_t ret;
+ AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
@@ -245,7 +246,7 @@ static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address,
((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
(uint16_t) address, val, len);
- fd = ((AssignedDevice *)d)->real_device.config_fd;
+ fd = pci_dev->real_device.config_fd;
again:
ret = pwrite(fd, &val, len, address);
@@ -266,6 +267,7 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address,
uint32_t val = 0;
int fd;
ssize_t ret;
+ AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
if ((address >= 0x10 && address <= 0x24) || address == 0x34 ||
address == 0x3c || address == 0x3d) {
@@ -279,7 +281,7 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address,
if (address == 0xFC)
goto do_log;
- fd = ((AssignedDevice *)d)->real_device.config_fd;
+ fd = pci_dev->real_device.config_fd;
again:
ret = pread(fd, &val, len, address);
@@ -618,15 +620,17 @@ struct PCIDevice *init_assigned_device(AssignedDevInfo *adev, PCIBus *bus)
{
int r;
AssignedDevice *dev;
+ PCIDevice *pci_dev;
uint8_t e_device, e_intx;
DEBUG("Registering real physical device %s (bus=%x dev=%x func=%x)\n",
adev->name, adev->bus, adev->dev, adev->func);
- dev = (AssignedDevice *)
- pci_register_device(bus, adev->name, sizeof(AssignedDevice),
- -1, assigned_dev_pci_read_config,
- assigned_dev_pci_write_config);
+ pci_dev = pci_register_device(bus, adev->name,
+ sizeof(AssignedDevice), -1, assigned_dev_pci_read_config,
+ assigned_dev_pci_write_config);
+ dev = container_of(pci_dev, AssignedDevice, dev);
+
if (NULL == dev) {
fprintf(stderr, "%s: Error: Couldn't register real device %s\n",
__func__, adev->name);
--
1.5.4.5
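
A plain cast from PCIDevice * to AssignedDevice * is only correct while the embedded PCIDevice is the first member of the structure; container_of() subtracts the member's real offset instead. A small illustration, assuming simplified stand-in structures (pci_device and assigned_device here are not the QEMU types):

```c
#include <stddef.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct pci_device {
    int devfn;
};

struct assigned_device {
    int config_fd;            /* deliberately placed before the embedded member */
    struct pci_device dev;
};

/* Works wherever 'dev' sits inside struct assigned_device. */
static struct assigned_device *to_assigned(struct pci_device *d)
{
    return container_of(d, struct assigned_device, dev);
}
```

One consequence worth keeping in mind: when the member offset is not zero, container_of() applied to a NULL pointer does not yield NULL, so any NULL check belongs on the embedded pointer before the conversion.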
* [PATCH 04/16] Make device assignment depend on libpci
2009-03-12 13:36 [PATCH 0/16 v5] Device assignment improvement in userspace Sheng Yang
` (2 preceding siblings ...)
2009-03-12 13:36 ` [PATCH 03/16] kvm: Replace force type convert with container_of() Sheng Yang
@ 2009-03-12 13:36 ` Sheng Yang
2009-03-12 13:36 ` [PATCH 05/16] Figure out device capability Sheng Yang
` (12 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Sheng Yang @ 2009-03-12 13:36 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tosatti, Anthony Liguori; +Cc: kvm, Sheng Yang
libpci is used later for capability detection.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
qemu/Makefile.target | 1 +
qemu/configure | 20 ++++++++++++++++++++
qemu/hw/pci.h | 8 ++++++++
3 files changed, 29 insertions(+), 0 deletions(-)
diff --git a/qemu/Makefile.target b/qemu/Makefile.target
index 460a3a8..21432e1 100644
--- a/qemu/Makefile.target
+++ b/qemu/Makefile.target
@@ -643,6 +643,7 @@ OBJS += msmouse.o
ifeq ($(USE_KVM_DEVICE_ASSIGNMENT), 1)
OBJS+= device-assignment.o
+LIBS+=-lpci
endif
ifeq ($(TARGET_BASE_ARCH), i386)
diff --git a/qemu/configure b/qemu/configure
index 88d3988..c0d61fc 100755
--- a/qemu/configure
+++ b/qemu/configure
@@ -806,6 +806,26 @@ EOF
fi
fi
+# libpci probe for kvm_cap_device_assignment
+if test $kvm_cap_device_assignment = "yes" ; then
+cat > $TMPC << EOF
+#include <pci/pci.h>
+#ifndef PCI_VENDOR_ID
+#error NO LIBPCI
+#endif
+int main(void) { return 0; }
+EOF
+ if $cc $ARCH_CFLAGS -o $TMPE ${OS_CFLAGS} $TMPC 2>/dev/null ; then
+ :
+ else
+ echo
+ echo "Error: libpci check failed"
+ echo "Disable KVM Device Assignment capability."
+ echo
+ kvm_cap_device_assignment="no"
+ fi
+fi
+
##########################################
# zlib check
diff --git a/qemu/hw/pci.h b/qemu/hw/pci.h
index 543c87a..2327215 100644
--- a/qemu/hw/pci.h
+++ b/qemu/hw/pci.h
@@ -173,9 +173,17 @@ typedef struct PCIIORegion {
#define PCI_STATUS_RESERVED1 0x007
#define PCI_STATUS_INT_STATUS 0x008
#define PCI_STATUS_CAPABILITIES 0x010
+
+#ifndef PCI_STATUS_66MHZ
#define PCI_STATUS_66MHZ 0x020
+#endif
+
#define PCI_STATUS_RESERVED2 0x040
+
+#ifndef PCI_STATUS_FAST_BACK
#define PCI_STATUS_FAST_BACK 0x080
+#endif
+
#define PCI_STATUS_DEVSEL 0x600
#define PCI_STATUS_RESERVED_MASK_LO (PCI_STATUS_RESERVED1 | \
--
1.5.4.5
* [PATCH 05/16] Figure out device capability
2009-03-12 13:36 [PATCH 0/16 v5] Device assignment improvement in userspace Sheng Yang
` (3 preceding siblings ...)
2009-03-12 13:36 ` [PATCH 04/16] Make device assignment depend on libpci Sheng Yang
@ 2009-03-12 13:36 ` Sheng Yang
2009-03-12 13:36 ` [PATCH 06/16] Support for " Sheng Yang
` (11 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Sheng Yang @ 2009-03-12 13:36 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tosatti, Anthony Liguori; +Cc: kvm, Sheng Yang, Allen Kay
Try to figure out device capabilities in update_dev_cap(). For now we only care
about the MSI capability.
The function pci_find_cap_offset() was originally written by Allen for Xen.
Note that the function needs root privilege to work, and it depends on libpci.
Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
qemu/hw/device-assignment.c | 29 +++++++++++++++++++++++++++++
qemu/hw/device-assignment.h | 1 +
2 files changed, 30 insertions(+), 0 deletions(-)
diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index e8a69ba..a354681 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -219,6 +219,35 @@ static void assigned_dev_ioport_map(PCIDevice *pci_dev, int region_num,
(r_dev->v_addrs + region_num));
}
+static uint8_t pci_find_cap_offset(struct pci_dev *pci_dev, uint8_t cap)
+{
+ int id;
+ int max_cap = 48;
+ int pos = PCI_CAPABILITY_LIST;
+ int status;
+
+ status = pci_read_byte(pci_dev, PCI_STATUS);
+ if ((status & PCI_STATUS_CAP_LIST) == 0)
+ return 0;
+
+ while (max_cap--) {
+ pos = pci_read_byte(pci_dev, pos);
+ if (pos < 0x40)
+ break;
+
+ pos &= ~3;
+ id = pci_read_byte(pci_dev, pos + PCI_CAP_LIST_ID);
+
+ if (id == 0xff)
+ break;
+ if (id == cap)
+ return pos;
+
+ pos += PCI_CAP_LIST_NEXT;
+ }
+ return 0;
+}
+
static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address,
uint32_t val, int len)
{
diff --git a/qemu/hw/device-assignment.h b/qemu/hw/device-assignment.h
index da775d7..0fd78de 100644
--- a/qemu/hw/device-assignment.h
+++ b/qemu/hw/device-assignment.h
@@ -29,6 +29,7 @@
#define __DEVICE_ASSIGNMENT_H__
#include <sys/mman.h>
+#include <pci/pci.h>
#include "qemu-common.h"
#include "sys-queue.h"
#include "pci.h"
--
1.5.4.5
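
The capability lookup in this patch walks the standard PCI capability list: check the capability-list bit in the status register, start from the capabilities pointer at 0x34, and follow the "next" bytes until the wanted ID appears. A self-contained sketch of the same walk, assuming the config space is already available as a 256-byte array instead of being read through libpci (find_cap_offset and the CFG_* names are illustrative, not part of the patch):

```c
#include <stdint.h>

#define CFG_STATUS          0x06  /* status register; low byte holds the cap-list bit */
#define CFG_STATUS_CAP_LIST 0x10
#define CFG_CAP_PTR         0x34  /* pointer to the first capability */
#define CAP_ID_MSI          0x05

/* Return the offset of capability cap_id, or 0 if the device does not have it. */
static uint8_t find_cap_offset(const uint8_t cfg[256], uint8_t cap_id)
{
    int ttl = 48;                 /* bound the walk so a looping list terminates */
    uint8_t pos;

    if (!(cfg[CFG_STATUS] & CFG_STATUS_CAP_LIST))
        return 0;

    /* The low two bits of every capability pointer are reserved. */
    pos = (uint8_t)(cfg[CFG_CAP_PTR] & 0xfc);

    while (ttl-- > 0 && pos >= 0x40) {
        uint8_t id = cfg[pos];    /* byte 0: capability ID */

        if (id == 0xff)           /* all ones: nothing responding at this offset */
            break;
        if (id == cap_id)
            return pos;

        pos = (uint8_t)(cfg[pos + 1] & 0xfc);   /* byte 1: next pointer */
    }
    return 0;
}
```

Calling find_cap_offset(cfg, CAP_ID_MSI) on such a snapshot gives the offset of the MSI capability, or 0 when the device has none.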
* [PATCH 06/16] Support for device capability
2009-03-12 13:36 [PATCH 0/16 v5] Device assignment improvement in userspace Sheng Yang
` (4 preceding siblings ...)
2009-03-12 13:36 ` [PATCH 05/16] Figure out device capability Sheng Yang
@ 2009-03-12 13:36 ` Sheng Yang
2009-03-12 13:36 ` [PATCH 07/16] kvm: user interface for MSI type irq routing Sheng Yang
` (10 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Sheng Yang @ 2009-03-12 13:36 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tosatti, Anthony Liguori; +Cc: kvm, Sheng Yang
This framework can easily be extended to support device capabilities such as
MSI/MSI-X.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
qemu/hw/pci.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
qemu/hw/pci.h | 29 +++++++++++++++++++++
2 files changed, 104 insertions(+), 2 deletions(-)
diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c
index 821646c..eca0517 100644
--- a/qemu/hw/pci.c
+++ b/qemu/hw/pci.c
@@ -427,8 +427,8 @@ static void pci_update_mappings(PCIDevice *d)
}
}
-uint32_t pci_default_read_config(PCIDevice *d,
- uint32_t address, int len)
+static uint32_t pci_read_config(PCIDevice *d,
+ uint32_t address, int len)
{
uint32_t val;
@@ -453,6 +453,45 @@ uint32_t pci_default_read_config(PCIDevice *d,
return val;
}
+static void pci_write_config(PCIDevice *pci_dev,
+ uint32_t address, uint32_t val, int len)
+{
+ int i;
+ for (i = 0; i < len; i++) {
+ pci_dev->config[address + i] = val & 0xff;
+ val >>= 8;
+ }
+}
+
+int pci_access_cap_config(PCIDevice *pci_dev, uint32_t address, int len)
+{
+ if (pci_dev->cap.supported && address >= pci_dev->cap.start &&
+ (address + len) < pci_dev->cap.start + pci_dev->cap.length)
+ return 1;
+ return 0;
+}
+
+uint32_t pci_default_cap_read_config(PCIDevice *pci_dev,
+ uint32_t address, int len)
+{
+ return pci_read_config(pci_dev, address, len);
+}
+
+void pci_default_cap_write_config(PCIDevice *pci_dev,
+ uint32_t address, uint32_t val, int len)
+{
+ pci_write_config(pci_dev, address, val, len);
+}
+
+uint32_t pci_default_read_config(PCIDevice *d,
+ uint32_t address, int len)
+{
+ if (pci_access_cap_config(d, address, len))
+ return d->cap.config_read(d, address, len);
+
+ return pci_read_config(d, address, len);
+}
+
void pci_default_write_config(PCIDevice *d,
uint32_t address, uint32_t val, int len)
{
@@ -485,6 +524,11 @@ void pci_default_write_config(PCIDevice *d,
return;
}
default_config:
+ if (pci_access_cap_config(d, address, len)) {
+ d->cap.config_write(d, address, val, len);
+ return;
+ }
+
/* not efficient, but simple */
addr = address;
for(i = 0; i < len; i++) {
@@ -905,3 +949,32 @@ PCIBus *pci_bridge_init(PCIBus *bus, int devfn, uint16_t vid, uint16_t did,
s->bus = pci_register_secondary_bus(&s->dev, map_irq);
return s->bus;
}
+
+int pci_enable_capability_support(PCIDevice *pci_dev,
+ uint32_t config_start,
+ PCICapConfigReadFunc *config_read,
+ PCICapConfigWriteFunc *config_write,
+ PCICapConfigInitFunc *config_init)
+{
+ if (!pci_dev)
+ return -ENODEV;
+
+ if (config_start == 0)
+ pci_dev->cap.start = PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR;
+ else if (config_start >= 0x40 && config_start < 0xff)
+ pci_dev->cap.start = config_start;
+ else
+ return -EINVAL;
+
+ if (config_read)
+ pci_dev->cap.config_read = config_read;
+ else
+ pci_dev->cap.config_read = pci_default_cap_read_config;
+ if (config_write)
+ pci_dev->cap.config_write = config_write;
+ else
+ pci_dev->cap.config_write = pci_default_cap_write_config;
+ pci_dev->cap.supported = 1;
+ pci_dev->config[PCI_CAPABILITY_LIST] = pci_dev->cap.start;
+ return config_init(pci_dev);
+}
diff --git a/qemu/hw/pci.h b/qemu/hw/pci.h
index 2327215..127dbed 100644
--- a/qemu/hw/pci.h
+++ b/qemu/hw/pci.h
@@ -139,6 +139,12 @@ typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int region_num,
uint32_t addr, uint32_t size, int type);
typedef int PCIUnregisterFunc(PCIDevice *pci_dev);
+typedef void PCICapConfigWriteFunc(PCIDevice *pci_dev,
+ uint32_t address, uint32_t val, int len);
+typedef uint32_t PCICapConfigReadFunc(PCIDevice *pci_dev,
+ uint32_t address, int len);
+typedef int PCICapConfigInitFunc(PCIDevice *pci_dev);
+
#define PCI_ADDRESS_SPACE_MEM 0x00
#define PCI_ADDRESS_SPACE_IO 0x01
#define PCI_ADDRESS_SPACE_MEM_PREFETCH 0x08
@@ -197,6 +203,10 @@ typedef struct PCIIORegion {
#define PCI_COMMAND_RESERVED_MASK_HI (PCI_COMMAND_RESERVED >> 8)
+#define PCI_CAPABILITY_CONFIG_MAX_LENGTH 0x60
+#define PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR 0x40
+#define PCI_CAPABILITY_CONFIG_MSI_LENGTH 0x10
+
struct PCIDevice {
/* PCI config space */
uint8_t config[256];
@@ -219,6 +229,14 @@ struct PCIDevice {
/* Current IRQ levels. Used internally by the generic PCI code. */
int irq_state[4];
+
+ /* Device capability configuration space */
+ struct {
+ int supported;
+ unsigned int start, length;
+ PCICapConfigReadFunc *config_read;
+ PCICapConfigWriteFunc *config_write;
+ } cap;
};
PCIDevice *pci_register_device(PCIBus *bus, const char *name,
@@ -231,6 +249,12 @@ void pci_register_io_region(PCIDevice *pci_dev, int region_num,
uint32_t size, int type,
PCIMapIORegionFunc *map_func);
+int pci_enable_capability_support(PCIDevice *pci_dev,
+ uint32_t config_start,
+ PCICapConfigReadFunc *config_read,
+ PCICapConfigWriteFunc *config_write,
+ PCICapConfigInitFunc *config_init);
+
int pci_map_irq(PCIDevice *pci_dev, int pin);
uint32_t pci_default_read_config(PCIDevice *d,
uint32_t address, int len);
@@ -238,6 +262,11 @@ void pci_default_write_config(PCIDevice *d,
uint32_t address, uint32_t val, int len);
void pci_device_save(PCIDevice *s, QEMUFile *f);
int pci_device_load(PCIDevice *s, QEMUFile *f);
+uint32_t pci_default_cap_read_config(PCIDevice *pci_dev,
+ uint32_t address, int len);
+void pci_default_cap_write_config(PCIDevice *pci_dev,
+ uint32_t address, uint32_t val, int len);
+int pci_access_cap_config(PCIDevice *pci_dev, uint32_t address, int len);
typedef void (*pci_set_irq_fn)(qemu_irq *pic, int irq_num, int level);
typedef int (*pci_map_irq_fn)(PCIDevice *pci_dev, int irq_num);
--
1.5.4.5
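
The framework routes a config-space access to the capability handlers when it falls inside a window described by a start offset and a length, and the default write handler stores the value one byte at a time in little-endian order. A simplified sketch of those two pieces, assuming a stand-alone cap_window structure rather than the PCIDevice field (the names are illustrative):

```c
#include <stdint.h>

/* Illustrative stand-in for the 'cap' member added to PCIDevice. */
struct cap_window {
    int supported;
    unsigned int start, length;
};

/*
 * Return 1 when [address, address + len) lies entirely inside the window.
 * Assumption of this sketch: the window is treated as half-open, so an
 * access ending exactly on the window's last byte counts as inside. The
 * comparison in the patch is stricter at that boundary.
 */
static int in_cap_window(const struct cap_window *w, uint32_t address, int len)
{
    return w->supported && address >= w->start &&
           address + (uint32_t)len <= w->start + w->length;
}

/* Store val least-significant byte first, as the patch's pci_write_config() does. */
static void cfg_write(uint8_t *config, uint32_t address, uint32_t val, int len)
{
    for (int i = 0; i < len; i++) {
        config[address + i] = val & 0xff;
        val >>= 8;
    }
}
```

A caller would test in_cap_window() first and only then hand the access to a capability-specific read or write handler, falling back to the generic config-space path otherwise.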
* [PATCH 07/16] kvm: user interface for MSI type irq routing
2009-03-12 13:36 [PATCH 0/16 v5] Device assignment improvement in userspace Sheng Yang
` (5 preceding siblings ...)
2009-03-12 13:36 ` [PATCH 06/16] Support for " Sheng Yang
@ 2009-03-12 13:36 ` Sheng Yang
2009-03-12 13:36 ` [PATCH 08/16] kvm: libkvm: allocate unused gsi for " Sheng Yang
` (9 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Sheng Yang @ 2009-03-12 13:36 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tosatti, Anthony Liguori; +Cc: kvm, Sheng Yang
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
libkvm/libkvm.c | 98 ++++++++++++++++++++++++++++++++++++++++++++-----------
libkvm/libkvm.h | 22 ++++++++++++
2 files changed, 101 insertions(+), 19 deletions(-)
diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index 80a0481..e9bae23 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -1265,11 +1265,12 @@ int kvm_clear_gsi_routes(kvm_context_t kvm)
#endif
}
-int kvm_add_irq_route(kvm_context_t kvm, int gsi, int irqchip, int pin)
+int kvm_add_routing_entry(kvm_context_t kvm,
+ struct kvm_irq_routing_entry* entry)
{
#ifdef KVM_CAP_IRQ_ROUTING
struct kvm_irq_routing *z;
- struct kvm_irq_routing_entry *e;
+ struct kvm_irq_routing_entry *new;
int n, size;
if (kvm->irq_routes->nr == kvm->nr_allocated_irq_routes) {
@@ -1277,7 +1278,7 @@ int kvm_add_irq_route(kvm_context_t kvm, int gsi, int irqchip, int pin)
if (n < 64)
n = 64;
size = sizeof(struct kvm_irq_routing);
- size += n * sizeof(*e);
+ size += n * sizeof(*new);
z = realloc(kvm->irq_routes, size);
if (!z)
return -ENOMEM;
@@ -1285,34 +1286,77 @@ int kvm_add_irq_route(kvm_context_t kvm, int gsi, int irqchip, int pin)
kvm->irq_routes = z;
}
n = kvm->irq_routes->nr++;
- e = &kvm->irq_routes->entries[n];
- memset(e, 0, sizeof(*e));
- e->gsi = gsi;
- e->type = KVM_IRQ_ROUTING_IRQCHIP;
- e->flags = 0;
- e->u.irqchip.irqchip = irqchip;
- e->u.irqchip.pin = pin;
+ new = &kvm->irq_routes->entries[n];
+ memset(new, 0, sizeof(*new));
+ new->gsi = entry->gsi;
+ new->type = entry->type;
+ new->flags = entry->flags;
+ new->u = entry->u;
return 0;
#else
return -ENOSYS;
#endif
}
-int kvm_del_irq_route(kvm_context_t kvm, int gsi, int irqchip, int pin)
+int kvm_add_irq_route(kvm_context_t kvm, int gsi, int irqchip, int pin)
+{
+#ifdef KVM_CAP_IRQ_ROUTING
+ struct kvm_irq_routing_entry e;
+
+ e.gsi = gsi;
+ e.type = KVM_IRQ_ROUTING_IRQCHIP;
+ e.flags = 0;
+ e.u.irqchip.irqchip = irqchip;
+ e.u.irqchip.pin = pin;
+ return kvm_add_routing_entry(kvm, &e);
+#else
+ return -ENOSYS;
+#endif
+}
+
+int kvm_del_routing_entry(kvm_context_t kvm,
+ struct kvm_irq_routing_entry* entry)
{
#ifdef KVM_CAP_IRQ_ROUTING
struct kvm_irq_routing_entry *e, *p;
- int i;
+ int i, found = 0;
for (i = 0; i < kvm->irq_routes->nr; ++i) {
e = &kvm->irq_routes->entries[i];
- if (e->type == KVM_IRQ_ROUTING_IRQCHIP
- && e->gsi == gsi
- && e->u.irqchip.irqchip == irqchip
- && e->u.irqchip.pin == pin) {
- p = &kvm->irq_routes->entries[--kvm->irq_routes->nr];
- *e = *p;
- return 0;
+ if (e->type == entry->type
+ && e->gsi == entry->gsi) {
+ switch (e->type)
+ {
+ case KVM_IRQ_ROUTING_IRQCHIP: {
+ if (e->u.irqchip.irqchip ==
+ entry->u.irqchip.irqchip
+ && e->u.irqchip.pin ==
+ entry->u.irqchip.pin) {
+ p = &kvm->irq_routes->
+ entries[--kvm->irq_routes->nr];
+ *e = *p;
+ found = 1;
+ }
+ break;
+ }
+ case KVM_IRQ_ROUTING_MSI: {
+ if (e->u.msi.address_lo ==
+ entry->u.msi.address_lo
+ && e->u.msi.address_hi ==
+ entry->u.msi.address_hi
+ && e->u.msi.data == entry->u.msi.data) {
+ p = &kvm->irq_routes->
+ entries[--kvm->irq_routes->nr];
+ *e = *p;
+ found = 1;
+ }
+ break;
+ }
+ default:
+ break;
+ }
+ if (found)
+ return 0;
}
}
return -ESRCH;
@@ -1321,6 +1365,22 @@ int kvm_del_irq_route(kvm_context_t kvm, int gsi, int irqchip, int pin)
#endif
}
+int kvm_del_irq_route(kvm_context_t kvm, int gsi, int irqchip, int pin)
+{
+#ifdef KVM_CAP_IRQ_ROUTING
+ struct kvm_irq_routing_entry e;
+
+ e.gsi = gsi;
+ e.type = KVM_IRQ_ROUTING_IRQCHIP;
+ e.flags = 0;
+ e.u.irqchip.irqchip = irqchip;
+ e.u.irqchip.pin = pin;
+ return kvm_del_routing_entry(kvm, &e);
+#else
+ return -ENOSYS;
+#endif
+}
+
int kvm_commit_irq_routes(kvm_context_t kvm)
{
#ifdef KVM_CAP_IRQ_ROUTING
diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h
index 3e5efe0..51f4d08 100644
--- a/libkvm/libkvm.h
+++ b/libkvm/libkvm.h
@@ -816,6 +816,28 @@ int kvm_add_irq_route(kvm_context_t kvm, int gsi, int irqchip, int pin);
int kvm_del_irq_route(kvm_context_t kvm, int gsi, int irqchip, int pin);
/*!
+ * \brief Adds a routing entry to the temporary irq routing table
+ *
+ * Adds a filled routing entry to the temporary irq routing table. Nothing is
+ * committed to the running VM.
+ *
+ * \param kvm Pointer to the current kvm_context
+ */
+int kvm_add_routing_entry(kvm_context_t kvm,
+ struct kvm_irq_routing_entry* entry);
+
+/*!
+ * \brief Removes a routing entry from the temporary irq routing table
+ *
+ * Removes a routing entry from the temporary irq routing table. Nothing is
+ * committed to the running VM.
+ *
+ * \param kvm Pointer to the current kvm_context
+ */
+int kvm_del_routing_entry(kvm_context_t kvm,
+ struct kvm_irq_routing_entry* entry);
+
+/*!
* \brief Commit the temporary irq routing table
*
* Commit the temporary irq routing table to the running VM.
--
1.5.4.5
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 08/16] kvm: libkvm: allocate unused gsi for irq routing
2009-03-12 13:36 [PATCH 0/16 v5] Device assignment improvement in userspace Sheng Yang
` (6 preceding siblings ...)
2009-03-12 13:36 ` [PATCH 07/16] kvm: user interface for MSI type irq routing Sheng Yang
@ 2009-03-12 13:36 ` Sheng Yang
2009-03-16 8:31 ` Sheng Yang
2009-03-12 13:36 ` [PATCH 09/16] kvm: expose MSI capability to guest Sheng Yang
` (8 subsequent siblings)
16 siblings, 1 reply; 32+ messages in thread
From: Sheng Yang @ 2009-03-12 13:36 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tosatti, Anthony Liguori; +Cc: kvm, Sheng Yang
Note that this is a simple solution that can be replaced later.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
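The allocation policy this patch introduces can be sketched in isolation as follows. This is a minimal illustration, not the libkvm code: struct gsi_alloc, gsi_alloc_note_used(), gsi_alloc_next() and SKETCH_IOAPIC_NUM_PINS are hypothetical names, and the pin count of 24 is an assumed value. The sketch also assumes valid GSIs run from 0 to gsi_count - 1, which is a stricter upper bound than the comparison used in the diff below.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for KVM_IOAPIC_NUM_PINS; the real value comes
 * from the kernel headers. */
#define SKETCH_IOAPIC_NUM_PINS 24

struct gsi_alloc {
    int max_used_gsi;  /* highest GSI recorded by a routing entry so far */
    int gsi_count;     /* total GSIs available; assumed to exceed the pin count */
};

/* Record that a routing entry now uses `gsi`. */
static void gsi_alloc_note_used(struct gsi_alloc *a, int gsi)
{
    if (gsi > a->max_used_gsi)
        a->max_used_gsi = gsi;
}

/* Return the next unused GSI above the IOAPIC pin range, or -ENOSPC.
 * Freed GSIs are never reused, which is what keeps the scheme simple. */
static int gsi_alloc_next(const struct gsi_alloc *a)
{
    int next;

    assert(a->gsi_count > SKETCH_IOAPIC_NUM_PINS);

    next = a->max_used_gsi + 1;
    if (next < SKETCH_IOAPIC_NUM_PINS)
        next = SKETCH_IOAPIC_NUM_PINS;  /* skip the GSIs wired to IOAPIC pins */
    if (next >= a->gsi_count)
        return -ENOSPC;
    return next;
}
```

A caller would ask gsi_alloc_next() for a number, fill in a routing entry with it, and then call gsi_alloc_note_used() once the entry has been added.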
libkvm/kvm-common.h | 1 +
libkvm/libkvm.c | 15 +++++++++++++++
libkvm/libkvm.h | 8 ++++++++
3 files changed, 24 insertions(+), 0 deletions(-)
diff --git a/libkvm/kvm-common.h b/libkvm/kvm-common.h
index de1ada2..70a95c2 100644
--- a/libkvm/kvm-common.h
+++ b/libkvm/kvm-common.h
@@ -66,6 +66,7 @@ struct kvm_context {
#ifdef KVM_CAP_IRQ_ROUTING
struct kvm_irq_routing *irq_routes;
int nr_allocated_irq_routes;
+ int max_used_gsi;
#endif
};
diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index e9bae23..405b0bf 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -1292,6 +1292,9 @@ int kvm_add_routing_entry(kvm_context_t kvm,
new->type = entry->type;
new->flags = entry->flags;
new->u = entry->u;
+
+ if (entry->gsi > kvm->max_used_gsi)
+ kvm->max_used_gsi = entry->gsi;
return 0;
#else
return -ENOSYS;
@@ -1395,3 +1398,15 @@ int kvm_commit_irq_routes(kvm_context_t kvm)
return -ENOSYS;
#endif
}
+
+int kvm_get_irq_route_gsi(kvm_context_t kvm)
+{
+ if (kvm->max_used_gsi >= KVM_IOAPIC_NUM_PINS) {
+ if (kvm->max_used_gsi <= kvm_get_gsi_count(kvm))
+ return kvm->max_used_gsi + 1;
+ else
+ return -ENOSPC;
+ } else
+ return KVM_IOAPIC_NUM_PINS;
+}
+
diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h
index 51f4d08..9a7cbc6 100644
--- a/libkvm/libkvm.h
+++ b/libkvm/libkvm.h
@@ -846,4 +846,12 @@ int kvm_del_routing_entry(kvm_context_t kvm,
*/
int kvm_commit_irq_routes(kvm_context_t kvm);
+/*!
+ * \brief Get unused GSI number for irq routing table
+ *
+ * Get unused GSI number for irq routing table
+ *
+ * \param kvm Pointer to the current kvm_context
+ */
+int kvm_get_irq_route_gsi(kvm_context_t kvm);
#endif
--
1.5.4.5
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 08/16] kvm: libkvm: allocate unused gsi for irq routing
2009-03-12 13:36 ` [PATCH 08/16] kvm: libkvm: allocate unused gsi for " Sheng Yang
@ 2009-03-16 8:31 ` Sheng Yang
0 siblings, 0 replies; 32+ messages in thread
From: Sheng Yang @ 2009-03-16 8:31 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tosatti, Anthony Liguori; +Cc: kvm, Sheng Yang
Note that this is a simple solution that can be replaced later.
(update: fix kvm_check_extension() so that it returns the value reported by
the ioctl instead of only 0 or 1, which was inconsistent with
kvm_get_gsi_count())
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
libkvm/kvm-common.h | 1 +
libkvm/libkvm.c | 17 ++++++++++++++++-
libkvm/libkvm.h | 8 ++++++++
3 files changed, 25 insertions(+), 1 deletions(-)
diff --git a/libkvm/kvm-common.h b/libkvm/kvm-common.h
index de1ada2..70a95c2 100644
--- a/libkvm/kvm-common.h
+++ b/libkvm/kvm-common.h
@@ -66,6 +66,7 @@ struct kvm_context {
#ifdef KVM_CAP_IRQ_ROUTING
struct kvm_irq_routing *irq_routes;
int nr_allocated_irq_routes;
+ int max_used_gsi;
#endif
};
diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index e9bae23..d38ca00 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -421,7 +421,7 @@ int kvm_check_extension(kvm_context_t kvm, int ext)
ret = ioctl(kvm->fd, KVM_CHECK_EXTENSION, ext);
if (ret > 0)
- return 1;
+ return ret;
return 0;
}
@@ -1292,6 +1292,9 @@ int kvm_add_routing_entry(kvm_context_t kvm,
new->type = entry->type;
new->flags = entry->flags;
new->u = entry->u;
+
+ if (entry->gsi > kvm->max_used_gsi)
+ kvm->max_used_gsi = entry->gsi;
return 0;
#else
return -ENOSYS;
@@ -1395,3 +1398,15 @@ int kvm_commit_irq_routes(kvm_context_t kvm)
return -ENOSYS;
#endif
}
+
+int kvm_get_irq_route_gsi(kvm_context_t kvm)
+{
+ if (kvm->max_used_gsi >= KVM_IOAPIC_NUM_PINS) {
+ if (kvm->max_used_gsi <= kvm_get_gsi_count(kvm))
+ return kvm->max_used_gsi + 1;
+ else
+ return -ENOSPC;
+ } else
+ return KVM_IOAPIC_NUM_PINS;
+}
+
diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h
index 51f4d08..9a7cbc6 100644
--- a/libkvm/libkvm.h
+++ b/libkvm/libkvm.h
@@ -846,4 +846,12 @@ int kvm_del_routing_entry(kvm_context_t kvm,
*/
int kvm_commit_irq_routes(kvm_context_t kvm);
+/*!
+ * \brief Get unused GSI number for irq routing table
+ *
+ * Get unused GSI number for irq routing table
+ *
+ * \param kvm Pointer to the current kvm_context
+ */
+int kvm_get_irq_route_gsi(kvm_context_t kvm);
#endif
--
1.5.4.5
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 09/16] kvm: expose MSI capability to guest
2009-03-12 13:36 [PATCH 0/16 v5] Device assignment improvement in userspace Sheng Yang
` (7 preceding siblings ...)
2009-03-12 13:36 ` [PATCH 08/16] kvm: libkvm: allocate unused gsi for " Sheng Yang
@ 2009-03-12 13:36 ` Sheng Yang
2009-03-12 13:36 ` [PATCH 10/16] kvm: Support MSI convert to INTx in device assignment Sheng Yang
` (7 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Sheng Yang @ 2009-03-12 13:36 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tosatti, Anthony Liguori
Cc: kvm, Sheng Yang, Alexander Duyck
(Alex: correct libpci usage)
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
qemu/hw/device-assignment.c | 140 ++++++++++++++++++++++++++++++++++++++++--
qemu/hw/device-assignment.h | 9 +++
2 files changed, 142 insertions(+), 7 deletions(-)
diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index a354681..bda0e95 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -265,7 +265,8 @@ static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address,
}
if ((address >= 0x10 && address <= 0x24) || address == 0x34 ||
- address == 0x3c || address == 0x3d) {
+ address == 0x3c || address == 0x3d ||
+ pci_access_cap_config(d, address, len)) {
/* used for update-mappings (BAR emulation) */
pci_default_write_config(d, address, val, len);
return;
@@ -299,7 +300,8 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address,
AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
if ((address >= 0x10 && address <= 0x24) || address == 0x34 ||
- address == 0x3c || address == 0x3d) {
+ address == 0x3c || address == 0x3d ||
+ pci_access_cap_config(d, address, len)) {
val = pci_default_read_config(d, address, len);
DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
(d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
@@ -328,11 +330,13 @@ do_log:
DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
(d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
- /* kill the special capabilities */
- if (address == 4 && len == 4)
- val &= ~0x100000;
- else if (address == 6)
- val &= ~0x10;
+ if (!pci_dev->cap.available) {
+ /* kill the special capabilities */
+ if (address == 4 && len == 4)
+ val &= ~0x100000;
+ else if (address == 6)
+ val &= ~0x10;
+ }
return val;
}
@@ -474,6 +478,19 @@ again:
static LIST_HEAD(, AssignedDevInfo) adev_head;
+#ifdef KVM_CAP_IRQ_ROUTING
+static void free_dev_irq_entries(AssignedDevice *dev)
+{
+ int i;
+
+ for (i = 0; i < dev->irq_entries_nr; i++)
+ kvm_del_routing_entry(kvm_context, &dev->entry[i]);
+ free(dev->entry);
+ dev->entry = NULL;
+ dev->irq_entries_nr = 0;
+}
+#endif
+
static void free_assigned_device(AssignedDevInfo *adev)
{
AssignedDevice *dev = adev->assigned_dev;
@@ -506,6 +523,9 @@ static void free_assigned_device(AssignedDevInfo *adev)
}
pci_unregister_device(&dev->dev);
+#ifdef KVM_CAP_IRQ_ROUTING
+ free_dev_irq_entries(dev);
+#endif
adev->assigned_dev = dev = NULL;
}
@@ -645,11 +665,108 @@ void assigned_dev_update_irqs()
}
}
+#if defined(KVM_CAP_DEVICE_MSI) && defined (KVM_CAP_IRQ_ROUTING)
+static void assigned_dev_update_msi(PCIDevice *pci_dev, unsigned int ctrl_pos)
+{
+ struct kvm_assigned_irq assigned_irq_data;
+ AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev);
+ uint8_t ctrl_byte = pci_dev->config[ctrl_pos];
+ int r;
+
+ memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
+ assigned_irq_data.assigned_dev_id =
+ calc_assigned_dev_id(assigned_dev->h_busnr,
+ (uint8_t)assigned_dev->h_devfn);
+
+ assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_MSI;
+
+ free_dev_irq_entries(assigned_dev);
+ r = kvm_deassign_irq(kvm_context, &assigned_irq_data);
+ /* -ENXIO means no assigned irq */
+ if (r && r != -ENXIO)
+ perror("assigned_dev_update_msi: deassign irq");
+
+ if (ctrl_byte & PCI_MSI_FLAGS_ENABLE) {
+ assigned_dev->entry = calloc(1, sizeof(struct kvm_irq_routing_entry));
+ if (!assigned_dev->entry) {
+ perror("assigned_dev_update_msi: ");
+ return;
+ }
+ assigned_dev->entry->u.msi.address_lo =
+ *(uint32_t *)(pci_dev->config + pci_dev->cap.start +
+ PCI_MSI_ADDRESS_LO);
+ assigned_dev->entry->u.msi.address_hi = 0;
+ assigned_dev->entry->u.msi.data = *(uint16_t *)(pci_dev->config +
+ pci_dev->cap.start + PCI_MSI_DATA_32);
+ assigned_dev->entry->type = KVM_IRQ_ROUTING_MSI;
+ assigned_dev->entry->gsi = kvm_get_irq_route_gsi(kvm_context);
+ if (assigned_dev->entry->gsi < 0) {
+ perror("assigned_dev_update_msi: kvm_get_irq_route_gsi");
+ return;
+ }
+
+ kvm_add_routing_entry(kvm_context, assigned_dev->entry);
+ if (kvm_commit_irq_routes(kvm_context) < 0) {
+ perror("assigned_dev_update_msi: kvm_commit_irq_routes");
+ assigned_dev->cap.state &= ~ASSIGNED_DEVICE_MSI_ENABLED;
+ return;
+ }
+ assigned_irq_data.guest_irq = assigned_dev->entry->gsi;
+ }
+
+ if (ctrl_byte & PCI_MSI_FLAGS_ENABLE)
+ if (kvm_assign_irq(kvm_context, &assigned_irq_data) < 0)
+ perror("assigned_dev_enable_msi: assign irq");
+}
+#endif
+
+static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t address,
+ uint32_t val, int len)
+{
+ AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev);
+ unsigned int pos = pci_dev->cap.start, ctrl_pos;
+
+ pci_default_cap_write_config(pci_dev, address, val, len);
+#if defined(KVM_CAP_DEVICE_MSI) && defined (KVM_CAP_IRQ_ROUTING)
+ if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
+ ctrl_pos = pos + PCI_MSI_FLAGS;
+ if (address <= ctrl_pos && address + len > ctrl_pos)
+ assigned_dev_update_msi(pci_dev, ctrl_pos);
+ pos += PCI_CAPABILITY_CONFIG_MSI_LENGTH;
+ }
+#endif
+ return;
+}
+
+static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
+{
+ AssignedDevice *dev = container_of(pci_dev, AssignedDevice, dev);
+ int next_cap_pt = 0;
+
+ pci_dev->cap.length = 0;
+#if defined(KVM_CAP_DEVICE_MSI) && defined (KVM_CAP_IRQ_ROUTING)
+ /* Expose MSI capability
+ * MSI capability is the 1st capability in capability config */
+ if (pci_find_cap_offset(dev->pdev, PCI_CAP_ID_MSI)) {
+ dev->cap.available |= ASSIGNED_DEVICE_CAP_MSI;
+ memset(&pci_dev->config[pci_dev->cap.start + pci_dev->cap.length],
+ 0, PCI_CAPABILITY_CONFIG_MSI_LENGTH);
+ pci_dev->config[pci_dev->cap.start + pci_dev->cap.length] =
+ PCI_CAP_ID_MSI;
+ pci_dev->cap.length += PCI_CAPABILITY_CONFIG_MSI_LENGTH;
+ next_cap_pt = 1;
+ }
+#endif
+
+ return 0;
+}
+
struct PCIDevice *init_assigned_device(AssignedDevInfo *adev, PCIBus *bus)
{
int r;
AssignedDevice *dev;
PCIDevice *pci_dev;
+ struct pci_access *pacc;
uint8_t e_device, e_intx;
DEBUG("Registering real physical device %s (bus=%x dev=%x func=%x)\n",
@@ -689,6 +806,10 @@ struct PCIDevice *init_assigned_device(AssignedDevInfo *adev, PCIBus *bus)
dev->h_busnr = adev->bus;
dev->h_devfn = PCI_DEVFN(adev->dev, adev->func);
+ pacc = pci_alloc();
+ pci_init(pacc);
+ dev->pdev = pci_get_dev(pacc, 0, adev->bus, adev->dev, adev->func);
+
/* assign device to guest */
r = assign_device(adev);
if (r < 0)
@@ -699,6 +820,11 @@ struct PCIDevice *init_assigned_device(AssignedDevInfo *adev, PCIBus *bus)
if (r < 0)
goto assigned_out;
+ if (pci_enable_capability_support(pci_dev, 0, NULL,
+ assigned_device_pci_cap_write_config,
+ assigned_device_pci_cap_init) < 0)
+ goto assigned_out;
+
return &dev->dev;
assigned_out:
diff --git a/qemu/hw/device-assignment.h b/qemu/hw/device-assignment.h
index 0fd78de..b1f2156 100644
--- a/qemu/hw/device-assignment.h
+++ b/qemu/hw/device-assignment.h
@@ -81,6 +81,15 @@ typedef struct {
unsigned char h_busnr;
unsigned int h_devfn;
int bound;
+ struct pci_dev *pdev;
+ struct {
+#define ASSIGNED_DEVICE_CAP_MSI (1 << 0)
+ uint32_t available;
+#define ASSIGNED_DEVICE_MSI_ENABLED (1 << 0)
+ uint32_t state;
+ } cap;
+ int irq_entries_nr;
+ struct kvm_irq_routing_entry *entry;
} AssignedDevice;
typedef struct AssignedDevInfo AssignedDevInfo;
--
1.5.4.5
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 10/16] kvm: Support MSI convert to INTx in device assignment
2009-03-12 13:36 [PATCH 0/16 v5] Device assignment improvement in userspace Sheng Yang
` (8 preceding siblings ...)
2009-03-12 13:36 ` [PATCH 09/16] kvm: expose MSI capability to guest Sheng Yang
@ 2009-03-12 13:36 ` Sheng Yang
2009-03-12 13:36 ` [PATCH 11/16] Add MSI-X related macro to pci.c Sheng Yang
` (6 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Sheng Yang @ 2009-03-12 13:36 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tosatti, Anthony Liguori; +Cc: kvm, Sheng Yang
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
qemu/hw/device-assignment.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)
diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index bda0e95..01485d7 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -588,7 +588,11 @@ static int assign_irq(AssignedDevInfo *adev)
assigned_irq_data.guest_irq = irq;
assigned_irq_data.host_irq = dev->real_device.irq;
#ifdef KVM_CAP_ASSIGN_DEV_IRQ
- assigned_irq_data.flags = KVM_DEV_IRQ_HOST_INTX | KVM_DEV_IRQ_GUEST_INTX;
+ if (dev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
+ assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX;
+ else
+ assigned_irq_data.flags = KVM_DEV_IRQ_HOST_INTX | KVM_DEV_IRQ_GUEST_INTX;
+
r = kvm_deassign_irq(kvm_context, &assigned_irq_data);
/* -ENXIO means no assigned irq */
if (r && r != -ENXIO)
--
1.5.4.5
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 11/16] Add MSI-X related macro to pci.c
2009-03-12 13:36 [PATCH 0/16 v5] Device assignment improvement in userspace Sheng Yang
` (9 preceding siblings ...)
2009-03-12 13:36 ` [PATCH 10/16] kvm: Support MSI convert to INTx in device assignment Sheng Yang
@ 2009-03-12 13:36 ` Sheng Yang
2009-03-12 13:36 ` [PATCH 12/16] kvm: add ioctl KVM_SET_MSIX_ENTRY_NR and KVM_SET_MSIX_ENTRY Sheng Yang
` (5 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Sheng Yang @ 2009-03-12 13:36 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tosatti, Anthony Liguori; +Cc: kvm, Sheng Yang
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
qemu/hw/pci.h | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/qemu/hw/pci.h b/qemu/hw/pci.h
index 127dbed..1392626 100644
--- a/qemu/hw/pci.h
+++ b/qemu/hw/pci.h
@@ -206,6 +206,7 @@ typedef struct PCIIORegion {
#define PCI_CAPABILITY_CONFIG_MAX_LENGTH 0x60
#define PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR 0x40
#define PCI_CAPABILITY_CONFIG_MSI_LENGTH 0x10
+#define PCI_CAPABILITY_CONFIG_MSIX_LENGTH 0x10
struct PCIDevice {
/* PCI config space */
--
1.5.4.5
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 12/16] kvm: add ioctl KVM_SET_MSIX_ENTRY_NR and KVM_SET_MSIX_ENTRY
2009-03-12 13:36 [PATCH 0/16 v5] Device assignment improvement in userspace Sheng Yang
` (10 preceding siblings ...)
2009-03-12 13:36 ` [PATCH 11/16] Add MSI-X related macro to pci.c Sheng Yang
@ 2009-03-12 13:36 ` Sheng Yang
2009-03-12 13:36 ` [PATCH 13/16] kvm: enable MSI-X capability for assigned device Sheng Yang
` (4 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Sheng Yang @ 2009-03-12 13:36 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tosatti, Anthony Liguori; +Cc: kvm, Sheng Yang
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
libkvm/libkvm.c | 25 +++++++++++++++++++++++++
libkvm/libkvm.h | 7 +++++++
2 files changed, 32 insertions(+), 0 deletions(-)
diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index 405b0bf..f8129a4 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -1410,3 +1410,28 @@ int kvm_get_irq_route_gsi(kvm_context_t kvm)
return KVM_IOAPIC_NUM_PINS;
}
+#ifdef KVM_CAP_DEVICE_MSIX
+int kvm_assign_set_msix_nr(kvm_context_t kvm,
+ struct kvm_assigned_msix_nr *msix_nr)
+{
+ int ret;
+
+ ret = ioctl(kvm->vm_fd, KVM_ASSIGN_SET_MSIX_NR, msix_nr);
+ if (ret < 0)
+ return -errno;
+
+ return ret;
+}
+
+int kvm_assign_set_msix_entry(kvm_context_t kvm,
+ struct kvm_assigned_msix_entry *entry)
+{
+ int ret;
+
+ ret = ioctl(kvm->vm_fd, KVM_ASSIGN_SET_MSIX_ENTRY, entry);
+ if (ret < 0)
+ return -errno;
+
+ return ret;
+}
+#endif
diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h
index 9a7cbc6..d3e431a 100644
--- a/libkvm/libkvm.h
+++ b/libkvm/libkvm.h
@@ -854,4 +854,11 @@ int kvm_commit_irq_routes(kvm_context_t kvm);
* \param kvm Pointer to the current kvm_context
*/
int kvm_get_irq_route_gsi(kvm_context_t kvm);
+
+#ifdef KVM_CAP_DEVICE_MSIX
+int kvm_assign_set_msix_nr(kvm_context_t kvm,
+ struct kvm_assigned_msix_nr *msix_nr);
+int kvm_assign_set_msix_entry(kvm_context_t kvm,
+ struct kvm_assigned_msix_entry *entry);
+#endif
#endif
--
1.5.4.5
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 13/16] kvm: enable MSI-X capability for assigned device
2009-03-12 13:36 [PATCH 0/16 v5] Device assignment improvement in userspace Sheng Yang
` (11 preceding siblings ...)
2009-03-12 13:36 ` [PATCH 12/16] kvm: add ioctl KVM_SET_MSIX_ENTRY_NR and KVM_SET_MSIX_ENTRY Sheng Yang
@ 2009-03-12 13:36 ` Sheng Yang
2009-03-16 8:32 ` Sheng Yang
2009-03-12 13:36 ` [PATCH 14/16] kvm: fix irq 0 assignment Sheng Yang
` (3 subsequent siblings)
16 siblings, 1 reply; 32+ messages in thread
From: Sheng Yang @ 2009-03-12 13:36 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tosatti, Anthony Liguori; +Cc: kvm, Sheng Yang
The most important part here is that we emulate one page of the MMIO region
with a page of ordinary memory. The MSI-X table lives in that region, so guest
accesses to it have to be intercepted.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
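Because the MSI-X table is shadowed in an ordinary page, userspace can walk it with plain memory reads. The sketch below illustrates how a 16-byte table entry could be decoded from such a page and how usable vectors could be counted, using the same rule as this patch (entry unmasked and message data non-zero). The names struct msix_entry_view, msix_read_entry(), msix_entry_usable(), msix_count_usable() and the SKETCH_* constants are hypothetical and do not appear in the patch; the 4096-byte page size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE        4096  /* assumed size of the shadow page */
#define SKETCH_MSIX_ENTRY_SIZE  16    /* each MSI-X table entry is 16 bytes */
#define SKETCH_MSIX_ADDR_LO     0     /* message address, low 32 bits */
#define SKETCH_MSIX_ADDR_HI     4     /* message address, high 32 bits */
#define SKETCH_MSIX_DATA        8     /* message data */
#define SKETCH_MSIX_CTRL        12    /* vector control */
#define SKETCH_MSIX_CTRL_MASK   0x1u  /* bit 0 of vector control: per-vector mask */

struct msix_entry_view {
    uint32_t addr_lo, addr_hi, data, ctrl;
};

/* Copy entry `idx` out of the shadow page; memcpy avoids unaligned access. */
static void msix_read_entry(const void *page, size_t idx,
                            struct msix_entry_view *out)
{
    const uint8_t *p = (const uint8_t *)page + idx * SKETCH_MSIX_ENTRY_SIZE;

    assert((idx + 1) * SKETCH_MSIX_ENTRY_SIZE <= SKETCH_PAGE_SIZE);
    memcpy(&out->addr_lo, p + SKETCH_MSIX_ADDR_LO, 4);
    memcpy(&out->addr_hi, p + SKETCH_MSIX_ADDR_HI, 4);
    memcpy(&out->data, p + SKETCH_MSIX_DATA, 4);
    memcpy(&out->ctrl, p + SKETCH_MSIX_CTRL, 4);
}

/* An entry is treated as usable when it is unmasked and has non-zero data. */
static int msix_entry_usable(const struct msix_entry_view *e)
{
    return !(e->ctrl & SKETCH_MSIX_CTRL_MASK) && e->data != 0;
}

/* Count the usable entries among the first `nr` entries of the table. */
static unsigned int msix_count_usable(const void *page, size_t nr)
{
    struct msix_entry_view e;
    unsigned int usable = 0;
    size_t i;

    for (i = 0; i < nr; i++) {
        msix_read_entry(page, i, &e);
        if (msix_entry_usable(&e))
            usable++;
    }
    return usable;
}
```

The count obtained this way corresponds to the number of routing entries that would need to be allocated before each usable entry is turned into an MSI route.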
qemu/hw/device-assignment.c | 287 ++++++++++++++++++++++++++++++++++++++++++-
qemu/hw/device-assignment.h | 6 +
2 files changed, 288 insertions(+), 5 deletions(-)
diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index 01485d7..1cd4cf7 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -146,6 +146,7 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
{
AssignedDevice *r_dev = container_of(pci_dev, AssignedDevice, dev);
AssignedDevRegion *region = &r_dev->v_addrs[region_num];
+ PCIRegion *real_region = &r_dev->real_device.regions[region_num];
uint32_t old_ephys = region->e_physbase;
uint32_t old_esize = region->e_size;
int first_map = (region->e_size == 0);
@@ -161,10 +162,27 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
kvm_destroy_phys_mem(kvm_context, old_ephys,
TARGET_PAGE_ALIGN(old_esize));
- if (e_size > 0)
+ if (e_size > 0) {
+ /* deal with MSI-X MMIO page */
+ if (real_region->base_addr <= r_dev->msix_table_addr &&
+ real_region->base_addr + real_region->size >=
+ r_dev->msix_table_addr) {
+ int offset = r_dev->msix_table_addr - real_region->base_addr;
+ ret = munmap(region->u.r_virtbase + offset, TARGET_PAGE_SIZE);
+ if (ret == 0)
+ DEBUG("munmap done, virt_base 0x%p\n",
+ region->u.r_virtbase + offset);
+ else {
+ fprintf(stderr, "%s: fail munmap msix table!\n", __func__);
+ exit(1);
+ }
+ cpu_register_physical_memory(e_phys + offset,
+ TARGET_PAGE_SIZE, r_dev->mmio_index);
+ }
ret = kvm_register_phys_mem(kvm_context, e_phys,
region->u.r_virtbase,
TARGET_PAGE_ALIGN(e_size), 0);
+ }
if (ret != 0) {
fprintf(stderr, "%s: Error: create new mapping failed\n", __func__);
@@ -669,7 +687,9 @@ void assigned_dev_update_irqs()
}
}
-#if defined(KVM_CAP_DEVICE_MSI) && defined (KVM_CAP_IRQ_ROUTING)
+#ifdef KVM_CAP_IRQ_ROUTING
+
+#ifdef KVM_CAP_DEVICE_MSI
static void assigned_dev_update_msi(PCIDevice *pci_dev, unsigned int ctrl_pos)
{
struct kvm_assigned_irq assigned_irq_data;
@@ -724,14 +744,147 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev, unsigned int ctrl_pos)
}
#endif
+#ifdef KVM_CAP_DEVICE_MSIX
+static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
+{
+ AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
+ u16 entries_nr = 0, entries_max_nr;
+ int pos = 0, i, r = 0;
+ u32 msg_addr, msg_upper_addr, msg_data, msg_ctrl;
+ struct kvm_assigned_msix_nr msix_nr;
+ struct kvm_assigned_msix_entry msix_entry;
+ void *va = adev->msix_table_page;
+
+ if (adev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
+ pos = pci_dev->cap.start + PCI_CAPABILITY_CONFIG_MSI_LENGTH;
+ entries_max_nr = pci_dev->config[pos + 2];
+ entries_max_nr &= PCI_MSIX_TABSIZE;
+
+ /* Get the usable entry number for allocating */
+ for (i = 0; i < entries_max_nr; i++) {
+ memcpy(&msg_ctrl, va + i * 16 + 12, 4);
+ /* 0x1 is mask bit for per vector */
+ if (msg_ctrl & 0x1)
+ continue;
+ memcpy(&msg_data, va + i * 16 + 8, 4);
+ /* Ignore unused entry even if it's unmasked */
+ if (msg_data == 0)
+ continue;
+ entries_nr ++;
+ }
+
+ if (entries_nr == 0) {
+ fprintf(stderr, "MSI-X entry number is zero!\n");
+ return -EINVAL;
+ }
+ msix_nr.assigned_dev_id = calc_assigned_dev_id(adev->h_busnr,
+ (uint8_t)adev->h_devfn);
+ msix_nr.entry_nr = entries_nr;
+ r = kvm_assign_set_msix_nr(kvm_context, &msix_nr);
+ if (r != 0) {
+ fprintf(stderr, "fail to set MSI-X entry number for MSIX! %s\n",
+ strerror(-r));
+ return r;
+ }
+
+ free_dev_irq_entries(adev);
+ adev->irq_entries_nr = entries_nr;
+ adev->entry = calloc(entries_nr, sizeof(struct kvm_irq_routing_entry));
+ if (!adev->entry) {
+ perror("assigned_dev_update_msix_mmio: ");
+ return -errno;
+ }
+
+ msix_entry.assigned_dev_id = msix_nr.assigned_dev_id;
+ entries_nr = 0;
+ for (i = 0; i < entries_max_nr; i++) {
+ if (entries_nr >= msix_nr.entry_nr)
+ break;
+ memcpy(&msg_ctrl, va + i * 16 + 12, 4);
+ if (msg_ctrl & 0x1)
+ continue;
+ memcpy(&msg_data, va + i * 16 + 8, 4);
+ if (msg_data == 0)
+ continue;
+
+ memcpy(&msg_addr, va + i * 16, 4);
+ memcpy(&msg_upper_addr, va + i * 16 + 4, 4);
+
+ r = kvm_get_irq_route_gsi(kvm_context);
+ if (r < 0)
+ return r;
+
+ adev->entry[entries_nr].gsi = r;
+ adev->entry[entries_nr].type = KVM_IRQ_ROUTING_MSI;
+ adev->entry[entries_nr].flags = 0;
+ adev->entry[entries_nr].u.msi.address_lo = msg_addr;
+ adev->entry[entries_nr].u.msi.address_hi = msg_upper_addr;
+ adev->entry[entries_nr].u.msi.data = msg_data;
+ DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x!\n", msg_data, msg_addr);
+ kvm_add_routing_entry(kvm_context, &adev->entry[entries_nr]);
+
+ msix_entry.gsi = adev->entry[entries_nr].gsi;
+ msix_entry.entry = i;
+ r = kvm_assign_set_msix_entry(kvm_context, &msix_entry);
+ if (r) {
+ fprintf(stderr, "fail to set MSI-X entry! %s\n", strerror(-r));
+ break;
+ }
+ DEBUG("MSI-X entry gsi 0x%x, entry %d!\n",
+ msix_entry.gsi, msix_entry.entry);
+ entries_nr ++;
+ }
+
+ if (r == 0 && kvm_commit_irq_routes(kvm_context) < 0) {
+ perror("assigned_dev_update_msix_mmio: kvm_commit_irq_routes");
+ return -EINVAL;
+ }
+
+ return r;
+}
+
+static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
+{
+ struct kvm_assigned_irq assigned_irq_data;
+ AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev);
+ uint16_t *ctrl_word = (uint16_t *)(pci_dev->config + ctrl_pos);
+ int r;
+
+ memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
+ assigned_irq_data.assigned_dev_id =
+ calc_assigned_dev_id(assigned_dev->h_busnr,
+ (uint8_t)assigned_dev->h_devfn);
+
+ free_dev_irq_entries(assigned_dev);
+ assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSIX | KVM_DEV_IRQ_GUEST_MSIX;
+ r = kvm_deassign_irq(kvm_context, &assigned_irq_data);
+ /* -ENXIO means no assigned irq */
+ if (r && r != -ENXIO)
+ perror("assigned_dev_update_msix: deassign irq");
+
+ if (*ctrl_word & PCI_MSIX_ENABLE) {
+ if (assigned_dev_update_msix_mmio(pci_dev) < 0) {
+ perror("assigned_dev_update_msix_mmio");
+ return;
+ }
+ if (kvm_assign_irq(kvm_context, &assigned_irq_data) < 0) {
+ perror("assigned_dev_enable_msix: assign irq");
+ return;
+ }
+ }
+}
+#endif
+#endif
+
static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t address,
- uint32_t val, int len)
+ uint32_t val, int len)
{
AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev);
unsigned int pos = pci_dev->cap.start, ctrl_pos;
pci_default_cap_write_config(pci_dev, address, val, len);
-#if defined(KVM_CAP_DEVICE_MSI) && defined (KVM_CAP_IRQ_ROUTING)
+#ifdef KVM_CAP_IRQ_ROUTING
+#ifdef KVM_CAP_DEVICE_MSI
if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
ctrl_pos = pos + PCI_MSI_FLAGS;
if (address <= ctrl_pos && address + len > ctrl_pos)
@@ -739,16 +892,29 @@ static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t ad
pos += PCI_CAPABILITY_CONFIG_MSI_LENGTH;
}
#endif
+#ifdef KVM_CAP_DEVICE_MSIX
+ if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) {
+ ctrl_pos = pos + 3;
+ if (address <= ctrl_pos && address + len > ctrl_pos) {
+ ctrl_pos--; /* control is word long */
+ assigned_dev_update_msix(pci_dev, ctrl_pos);
+ }
+ pos += PCI_CAPABILITY_CONFIG_MSIX_LENGTH;
+ }
+#endif
+#endif
return;
}
static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
{
AssignedDevice *dev = container_of(pci_dev, AssignedDevice, dev);
+ PCIRegion *pci_region = dev->real_device.regions;
int next_cap_pt = 0;
pci_dev->cap.length = 0;
-#if defined(KVM_CAP_DEVICE_MSI) && defined (KVM_CAP_IRQ_ROUTING)
+#ifdef KVM_CAP_IRQ_ROUTING
+#ifdef KVM_CAP_DEVICE_MSI
/* Expose MSI capability
* MSI capability is the 1st capability in capability config */
if (pci_find_cap_offset(dev->pdev, PCI_CAP_ID_MSI)) {
@@ -761,7 +927,113 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
next_cap_pt = 1;
}
#endif
+#ifdef KVM_CAP_DEVICE_MSIX
+ /* Expose MSI-X capability */
+ if (pci_find_cap_offset(dev->pdev, PCI_CAP_ID_MSIX)) {
+ int pos, entry_nr, bar_nr;
+ u32 msix_table_entry;
+ dev->cap.available |= ASSIGNED_DEVICE_CAP_MSIX;
+ memset(&pci_dev->config[pci_dev->cap.start + pci_dev->cap.length],
+ 0, PCI_CAPABILITY_CONFIG_MSIX_LENGTH);
+ pos = pci_find_cap_offset(dev->pdev, PCI_CAP_ID_MSIX);
+ entry_nr = pci_read_word(dev->pdev, pos + 2) & PCI_MSIX_TABSIZE;
+ pci_dev->config[pci_dev->cap.start + pci_dev->cap.length] = 0x11;
+ pci_dev->config[pci_dev->cap.start +
+ pci_dev->cap.length + 2] = entry_nr;
+ msix_table_entry = pci_read_long(dev->pdev, pos + PCI_MSIX_TABLE);
+ *(uint32_t *)(pci_dev->config + pci_dev->cap.start +
+ pci_dev->cap.length + PCI_MSIX_TABLE) = msix_table_entry;
+ *(uint32_t *)(pci_dev->config + pci_dev->cap.start +
+ pci_dev->cap.length + PCI_MSIX_PBA) =
+ pci_read_long(dev->pdev, pos + PCI_MSIX_PBA);
+ bar_nr = msix_table_entry & PCI_MSIX_BIR;
+ msix_table_entry &= ~PCI_MSIX_BIR;
+ dev->msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry;
+ if (next_cap_pt != 0) {
+ pci_dev->config[pci_dev->cap.start + next_cap_pt] =
+ pci_dev->cap.start + pci_dev->cap.length;
+ next_cap_pt += PCI_CAPABILITY_CONFIG_MSI_LENGTH;
+ } else
+ next_cap_pt = 1;
+ pci_dev->cap.length += PCI_CAPABILITY_CONFIG_MSIX_LENGTH;
+ }
+#endif
+#endif
+
+ return 0;
+}
+
+static uint32_t msix_mmio_readl(void *opaque, target_phys_addr_t addr)
+{
+ AssignedDevice *adev = opaque;
+ unsigned int offset = addr & 0xfff;
+ void *page = adev->msix_table_page;
+ uint32_t val = 0;
+
+ memcpy(&val, (void *)((char *)page + offset), 4);
+ return val;
+}
+
+static uint32_t msix_mmio_readb(void *opaque, target_phys_addr_t addr)
+{
+ return ((msix_mmio_readl(opaque, addr & ~3)) >>
+ (8 * (addr & 3))) & 0xff;
+}
+
+static uint32_t msix_mmio_readw(void *opaque, target_phys_addr_t addr)
+{
+ return ((msix_mmio_readl(opaque, addr & ~3)) >>
+ (8 * (addr & 3))) & 0xffff;
+}
+
+static void msix_mmio_writel(void *opaque,
+ target_phys_addr_t addr, uint32_t val)
+{
+ AssignedDevice *adev = opaque;
+ unsigned int offset = addr & 0xfff;
+ void *page = adev->msix_table_page;
+
+ DEBUG("write to MSI-X entry table mmio offset 0x%lx, val 0x%lx\n",
+ addr, val);
+ memcpy((void *)((char *)page + offset), &val, 4);
+}
+
+static void msix_mmio_writew(void *opaque,
+ target_phys_addr_t addr, uint32_t val)
+{
+ msix_mmio_writel(opaque, addr & ~3,
+ (val & 0xffff) << (8*(addr & 3)));
+}
+
+static void msix_mmio_writeb(void *opaque,
+ target_phys_addr_t addr, uint32_t val)
+{
+ msix_mmio_writel(opaque, addr & ~3,
+ (val & 0xff) << (8*(addr & 3)));
+}
+
+static CPUWriteMemoryFunc *msix_mmio_write[] = {
+ msix_mmio_writeb, msix_mmio_writew, msix_mmio_writel
+};
+
+static CPUReadMemoryFunc *msix_mmio_read[] = {
+ msix_mmio_readb, msix_mmio_readw, msix_mmio_readl
+};
+
+static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
+{
+ dev->msix_table_page = mmap(NULL, 0x1000,
+ PROT_READ|PROT_WRITE,
+ MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
+ if (dev->msix_table_page == MAP_FAILED) {
+ fprintf(stderr, "fail allocate msix_table_page! %s\n",
+ strerror(errno));
+ return -EFAULT;
+ }
+ memset(dev->msix_table_page, 0, 0x1000);
+ dev->mmio_index = cpu_register_io_memory(0,
+ msix_mmio_read, msix_mmio_write, dev);
return 0;
}
@@ -829,6 +1101,11 @@ struct PCIDevice *init_assigned_device(AssignedDevInfo *adev, PCIBus *bus)
assigned_device_pci_cap_init) < 0)
goto assigned_out;
+ /* intercept MSI-X entry page in the MMIO */
+ if (dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX)
+ if (assigned_dev_register_msix_mmio(dev))
+ return NULL;
+
return &dev->dev;
assigned_out:
diff --git a/qemu/hw/device-assignment.h b/qemu/hw/device-assignment.h
index b1f2156..69d549d 100644
--- a/qemu/hw/device-assignment.h
+++ b/qemu/hw/device-assignment.h
@@ -84,12 +84,18 @@ typedef struct {
struct pci_dev *pdev;
struct {
#define ASSIGNED_DEVICE_CAP_MSI (1 << 0)
+#define ASSIGNED_DEVICE_CAP_MSIX (1 << 1)
uint32_t available;
#define ASSIGNED_DEVICE_MSI_ENABLED (1 << 0)
+#define ASSIGNED_DEVICE_MSIX_ENABLED (1 << 1)
+#define ASSIGNED_DEVICE_MSIX_MASKED (1 << 2)
uint32_t state;
} cap;
int irq_entries_nr;
struct kvm_irq_routing_entry *entry;
+ void *msix_table_page;
+ target_phys_addr_t msix_table_addr;
+ int mmio_index;
} AssignedDevice;
typedef struct AssignedDevInfo AssignedDevInfo;
--
1.5.4.5
* [PATCH 13/16] kvm: enable MSI-X capabilty for assigned device
2009-03-12 13:36 ` [PATCH 13/16] kvm: enable MSI-X capabilty for assigned device Sheng Yang
@ 2009-03-16 8:32 ` Sheng Yang
0 siblings, 0 replies; 32+ messages in thread
From: Sheng Yang @ 2009-03-16 8:32 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tosatti, Anthony Liguori
Cc: kvm, Sheng Yang, Yunbiao (Ben) Lin
The most important part here is that we emulate a page of the MMIO region
using a page of memory. That's because the MSI-X table is located in that
region and we have to intercept accesses to it.
(Thanks to Ben for finding a bug in assigned_dev_update_msix_mmio.)
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Yunbiao (Ben) Lin <ben.y.lin@intel.com>
---
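Not part of the patch, just an illustration of the idea in the commit
message: one page of ordinary memory stands in for the MSI-X table page, and
guest accesses of 1, 2 or 4 bytes are served from it. This is a minimal
sketch under assumptions: table_page_read(), table_page_write() and
MSIX_PAGE_SIZE are hypothetical names, not QEMU API, and the byte handling
assumes a little-endian host. Unlike forwarding a shifted value to a 32-bit
store, a narrow store here only touches the bytes being written.

```c
#include <stdint.h>
#include <string.h>

#define MSIX_PAGE_SIZE 0x1000u  /* hypothetical name: one 4 KiB table page */

/* Read len (1, 2 or 4) bytes at addr from the backing page.
 * Assumes a little-endian host, so the low bytes of val come first. */
static uint32_t table_page_read(const void *page, uint64_t addr, unsigned len)
{
    unsigned offset = (unsigned)(addr & (MSIX_PAGE_SIZE - 1));
    uint32_t val = 0;

    if (len > 4 || offset + len > MSIX_PAGE_SIZE)
        return 0;
    memcpy(&val, (const char *)page + offset, len);
    return val;
}

/* Store the low len bytes of val at addr; neighbouring bytes are untouched. */
static void table_page_write(void *page, uint64_t addr, uint32_t val,
                             unsigned len)
{
    unsigned offset = (unsigned)(addr & (MSIX_PAGE_SIZE - 1));

    if (len > 4 || offset + len > MSIX_PAGE_SIZE)
        return;
    memcpy((char *)page + offset, &val, len);
}
```

With this shape, a one-byte guest write into the middle of a table entry
leaves the other three bytes of that dword as they were.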
qemu/hw/device-assignment.c | 290 ++++++++++++++++++++++++++++++++++++++++++-
qemu/hw/device-assignment.h | 6 +
2 files changed, 291 insertions(+), 5 deletions(-)
diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index 01485d7..1e35525 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -146,6 +146,7 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
{
AssignedDevice *r_dev = container_of(pci_dev, AssignedDevice, dev);
AssignedDevRegion *region = &r_dev->v_addrs[region_num];
+ PCIRegion *real_region = &r_dev->real_device.regions[region_num];
uint32_t old_ephys = region->e_physbase;
uint32_t old_esize = region->e_size;
int first_map = (region->e_size == 0);
@@ -161,10 +162,27 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
kvm_destroy_phys_mem(kvm_context, old_ephys,
TARGET_PAGE_ALIGN(old_esize));
- if (e_size > 0)
+ if (e_size > 0) {
+ /* deal with MSI-X MMIO page */
+ if (real_region->base_addr <= r_dev->msix_table_addr &&
+ real_region->base_addr + real_region->size >
+ r_dev->msix_table_addr) {
+ int offset = r_dev->msix_table_addr - real_region->base_addr;
+ ret = munmap(region->u.r_virtbase + offset, TARGET_PAGE_SIZE);
+ if (ret == 0)
+ DEBUG("munmap done, virt_base 0x%p\n",
+ region->u.r_virtbase + offset);
+ else {
+ fprintf(stderr, "%s: fail munmap msix table!\n", __func__);
+ exit(1);
+ }
+ cpu_register_physical_memory(e_phys + offset,
+ TARGET_PAGE_SIZE, r_dev->mmio_index);
+ }
ret = kvm_register_phys_mem(kvm_context, e_phys,
region->u.r_virtbase,
TARGET_PAGE_ALIGN(e_size), 0);
+ }
if (ret != 0) {
fprintf(stderr, "%s: Error: create new mapping failed\n", __func__);
@@ -669,7 +687,9 @@ void assigned_dev_update_irqs()
}
}
-#if defined(KVM_CAP_DEVICE_MSI) && defined (KVM_CAP_IRQ_ROUTING)
+#ifdef KVM_CAP_IRQ_ROUTING
+
+#ifdef KVM_CAP_DEVICE_MSI
static void assigned_dev_update_msi(PCIDevice *pci_dev, unsigned int ctrl_pos)
{
struct kvm_assigned_irq assigned_irq_data;
@@ -724,14 +744,150 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev, unsigned int ctrl_pos)
}
#endif
+#ifdef KVM_CAP_DEVICE_MSIX
+static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
+{
+ AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
+ u16 entries_nr = 0, entries_max_nr;
+ int pos = 0, i, r = 0;
+ u32 msg_addr, msg_upper_addr, msg_data, msg_ctrl;
+ struct kvm_assigned_msix_nr msix_nr;
+ struct kvm_assigned_msix_entry msix_entry;
+ void *va = adev->msix_table_page;
+
+ if (adev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
+ pos = pci_dev->cap.start + PCI_CAPABILITY_CONFIG_MSI_LENGTH;
+ else
+ pos = pci_dev->cap.start;
+
+ entries_max_nr = pci_dev->config[pos + 2];
+ entries_max_nr &= PCI_MSIX_TABSIZE;
+
+ /* Get the usable entry number for allocating */
+ for (i = 0; i < entries_max_nr; i++) {
+ memcpy(&msg_ctrl, va + i * 16 + 12, 4);
+ /* 0x1 is mask bit for per vector */
+ if (msg_ctrl & 0x1)
+ continue;
+ memcpy(&msg_data, va + i * 16 + 8, 4);
+ /* Ignore unused entry even it's unmasked */
+ if (msg_data == 0)
+ continue;
+ entries_nr ++;
+ }
+
+ if (entries_nr == 0) {
+ fprintf(stderr, "MSI-X entry number is zero!\n");
+ return -EINVAL;
+ }
+ msix_nr.assigned_dev_id = calc_assigned_dev_id(adev->h_busnr,
+ (uint8_t)adev->h_devfn);
+ msix_nr.entry_nr = entries_nr;
+ r = kvm_assign_set_msix_nr(kvm_context, &msix_nr);
+ if (r != 0) {
+ fprintf(stderr, "fail to set MSI-X entry number for MSIX! %s\n",
+ strerror(-r));
+ return r;
+ }
+
+ free_dev_irq_entries(adev);
+ adev->irq_entries_nr = entries_nr;
+ adev->entry = calloc(entries_nr, sizeof(struct kvm_irq_routing_entry));
+ if (!adev->entry) {
+ perror("assigned_dev_update_msix_mmio: ");
+ return -errno;
+ }
+
+ msix_entry.assigned_dev_id = msix_nr.assigned_dev_id;
+ entries_nr = 0;
+ for (i = 0; i < entries_max_nr; i++) {
+ if (entries_nr >= msix_nr.entry_nr)
+ break;
+ memcpy(&msg_ctrl, va + i * 16 + 12, 4);
+ if (msg_ctrl & 0x1)
+ continue;
+ memcpy(&msg_data, va + i * 16 + 8, 4);
+ if (msg_data == 0)
+ continue;
+
+ memcpy(&msg_addr, va + i * 16, 4);
+ memcpy(&msg_upper_addr, va + i * 16 + 4, 4);
+
+ r = kvm_get_irq_route_gsi(kvm_context);
+ if (r < 0)
+ return r;
+
+ adev->entry[entries_nr].gsi = r;
+ adev->entry[entries_nr].type = KVM_IRQ_ROUTING_MSI;
+ adev->entry[entries_nr].flags = 0;
+ adev->entry[entries_nr].u.msi.address_lo = msg_addr;
+ adev->entry[entries_nr].u.msi.address_hi = msg_upper_addr;
+ adev->entry[entries_nr].u.msi.data = msg_data;
+ DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n", msg_data, msg_addr);
+ kvm_add_routing_entry(kvm_context, &adev->entry[entries_nr]);
+
+ msix_entry.gsi = adev->entry[entries_nr].gsi;
+ msix_entry.entry = i;
+ r = kvm_assign_set_msix_entry(kvm_context, &msix_entry);
+ if (r) {
+ fprintf(stderr, "fail to set MSI-X entry! %s\n", strerror(-r));
+ break;
+ }
+ DEBUG("MSI-X entry gsi 0x%x, entry %d\n",
+ msix_entry.gsi, msix_entry.entry);
+ entries_nr ++;
+ }
+
+ if (r == 0 && kvm_commit_irq_routes(kvm_context) < 0) {
+ perror("assigned_dev_update_msix_mmio: kvm_commit_irq_routes");
+ return -EINVAL;
+ }
+
+ return r;
+}
+
+static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
+{
+ struct kvm_assigned_irq assigned_irq_data;
+ AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev);
+ uint16_t *ctrl_word = (uint16_t *)(pci_dev->config + ctrl_pos);
+ int r;
+
+ memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
+ assigned_irq_data.assigned_dev_id =
+ calc_assigned_dev_id(assigned_dev->h_busnr,
+ (uint8_t)assigned_dev->h_devfn);
+
+ free_dev_irq_entries(assigned_dev);
+ assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSIX | KVM_DEV_IRQ_GUEST_MSIX;
+ r = kvm_deassign_irq(kvm_context, &assigned_irq_data);
+ /* -ENXIO means no assigned irq */
+ if (r && r != -ENXIO)
+ perror("assigned_dev_update_msix: deassign irq");
+
+ if (*ctrl_word & PCI_MSIX_ENABLE) {
+ if (assigned_dev_update_msix_mmio(pci_dev) < 0) {
+ perror("assigned_dev_update_msix_mmio");
+ return;
+ }
+ if (kvm_assign_irq(kvm_context, &assigned_irq_data) < 0) {
+ perror("assigned_dev_enable_msix: assign irq");
+ return;
+ }
+ }
+}
+#endif
+#endif
+
static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t address,
- uint32_t val, int len)
+ uint32_t val, int len)
{
AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev);
unsigned int pos = pci_dev->cap.start, ctrl_pos;
pci_default_cap_write_config(pci_dev, address, val, len);
-#if defined(KVM_CAP_DEVICE_MSI) && defined (KVM_CAP_IRQ_ROUTING)
+#ifdef KVM_CAP_IRQ_ROUTING
+#ifdef KVM_CAP_DEVICE_MSI
if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
ctrl_pos = pos + PCI_MSI_FLAGS;
if (address <= ctrl_pos && address + len > ctrl_pos)
@@ -739,16 +895,29 @@ static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t ad
pos += PCI_CAPABILITY_CONFIG_MSI_LENGTH;
}
#endif
+#ifdef KVM_CAP_DEVICE_MSIX
+ if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) {
+ ctrl_pos = pos + 3;
+ if (address <= ctrl_pos && address + len > ctrl_pos) {
+ ctrl_pos--; /* control is word long */
+ assigned_dev_update_msix(pci_dev, ctrl_pos);
+ }
+ pos += PCI_CAPABILITY_CONFIG_MSIX_LENGTH;
+ }
+#endif
+#endif
return;
}
static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
{
AssignedDevice *dev = container_of(pci_dev, AssignedDevice, dev);
+ PCIRegion *pci_region = dev->real_device.regions;
int next_cap_pt = 0;
pci_dev->cap.length = 0;
-#if defined(KVM_CAP_DEVICE_MSI) && defined (KVM_CAP_IRQ_ROUTING)
+#ifdef KVM_CAP_IRQ_ROUTING
+#ifdef KVM_CAP_DEVICE_MSI
/* Expose MSI capability
* MSI capability is the 1st capability in capability config */
if (pci_find_cap_offset(dev->pdev, PCI_CAP_ID_MSI)) {
@@ -761,10 +930,116 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
next_cap_pt = 1;
}
#endif
+#ifdef KVM_CAP_DEVICE_MSIX
+ /* Expose MSI-X capability */
+ if (pci_find_cap_offset(dev->pdev, PCI_CAP_ID_MSIX)) {
+ int pos, entry_nr, bar_nr;
+ u32 msix_table_entry;
+ dev->cap.available |= ASSIGNED_DEVICE_CAP_MSIX;
+ memset(&pci_dev->config[pci_dev->cap.start + pci_dev->cap.length],
+ 0, PCI_CAPABILITY_CONFIG_MSIX_LENGTH);
+ pos = pci_find_cap_offset(dev->pdev, PCI_CAP_ID_MSIX);
+ entry_nr = pci_read_word(dev->pdev, pos + 2) & PCI_MSIX_TABSIZE;
+ pci_dev->config[pci_dev->cap.start + pci_dev->cap.length] = 0x11;
+ pci_dev->config[pci_dev->cap.start +
+ pci_dev->cap.length + 2] = entry_nr;
+ msix_table_entry = pci_read_long(dev->pdev, pos + PCI_MSIX_TABLE);
+ *(uint32_t *)(pci_dev->config + pci_dev->cap.start +
+ pci_dev->cap.length + PCI_MSIX_TABLE) = msix_table_entry;
+ *(uint32_t *)(pci_dev->config + pci_dev->cap.start +
+ pci_dev->cap.length + PCI_MSIX_PBA) =
+ pci_read_long(dev->pdev, pos + PCI_MSIX_PBA);
+ bar_nr = msix_table_entry & PCI_MSIX_BIR;
+ msix_table_entry &= ~PCI_MSIX_BIR;
+ dev->msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry;
+ if (next_cap_pt != 0) {
+ pci_dev->config[pci_dev->cap.start + next_cap_pt] =
+ pci_dev->cap.start + pci_dev->cap.length;
+ next_cap_pt += PCI_CAPABILITY_CONFIG_MSI_LENGTH;
+ } else
+ next_cap_pt = 1;
+ pci_dev->cap.length += PCI_CAPABILITY_CONFIG_MSIX_LENGTH;
+ }
+#endif
+#endif
return 0;
}
+static uint32_t msix_mmio_readl(void *opaque, target_phys_addr_t addr)
+{
+ AssignedDevice *adev = opaque;
+ unsigned int offset = addr & 0xfff;
+ void *page = adev->msix_table_page;
+ uint32_t val = 0;
+
+ memcpy(&val, (void *)((char *)page + offset), 4);
+
+ return val;
+}
+
+static uint32_t msix_mmio_readb(void *opaque, target_phys_addr_t addr)
+{
+ return ((msix_mmio_readl(opaque, addr & ~3)) >>
+ (8 * (addr & 3))) & 0xff;
+}
+
+static uint32_t msix_mmio_readw(void *opaque, target_phys_addr_t addr)
+{
+ return ((msix_mmio_readl(opaque, addr & ~3)) >>
+ (8 * (addr & 3))) & 0xffff;
+}
+
+static void msix_mmio_writel(void *opaque,
+ target_phys_addr_t addr, uint32_t val)
+{
+ AssignedDevice *adev = opaque;
+ unsigned int offset = addr & 0xfff;
+ void *page = adev->msix_table_page;
+
+ DEBUG("write to MSI-X entry table mmio offset 0x%lx, val 0x%lx\n",
+ addr, val);
+ memcpy((void *)((char *)page + offset), &val, 4);
+}
+
+static void msix_mmio_writew(void *opaque,
+ target_phys_addr_t addr, uint32_t val)
+{
+ msix_mmio_writel(opaque, addr & ~3,
+ (val & 0xffff) << (8*(addr & 3)));
+}
+
+static void msix_mmio_writeb(void *opaque,
+ target_phys_addr_t addr, uint32_t val)
+{
+ msix_mmio_writel(opaque, addr & ~3,
+ (val & 0xff) << (8*(addr & 3)));
+}
+
+static CPUWriteMemoryFunc *msix_mmio_write[] = {
+ msix_mmio_writeb, msix_mmio_writew, msix_mmio_writel
+};
+
+static CPUReadMemoryFunc *msix_mmio_read[] = {
+ msix_mmio_readb, msix_mmio_readw, msix_mmio_readl
+};
+
+static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
+{
+ dev->msix_table_page = mmap(NULL, 0x1000,
+ PROT_READ|PROT_WRITE,
+ MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
+ if (dev->msix_table_page == MAP_FAILED) {
+ fprintf(stderr, "fail allocate msix_table_page! %s\n",
+ strerror(errno));
+ return -EFAULT;
+ }
+ memset(dev->msix_table_page, 0, 0x1000);
+ dev->mmio_index = cpu_register_io_memory(0,
+ msix_mmio_read, msix_mmio_write, dev);
+ return 0;
+}
+
struct PCIDevice *init_assigned_device(AssignedDevInfo *adev, PCIBus *bus)
{
int r;
@@ -829,6 +1104,11 @@ struct PCIDevice *init_assigned_device(AssignedDevInfo *adev, PCIBus *bus)
assigned_device_pci_cap_init) < 0)
goto assigned_out;
+ /* intercept MSI-X entry page in the MMIO */
+ if (dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX)
+ if (assigned_dev_register_msix_mmio(dev))
+ return NULL;
+
return &dev->dev;
assigned_out:
diff --git a/qemu/hw/device-assignment.h b/qemu/hw/device-assignment.h
index b1f2156..69d549d 100644
--- a/qemu/hw/device-assignment.h
+++ b/qemu/hw/device-assignment.h
@@ -84,12 +84,18 @@ typedef struct {
struct pci_dev *pdev;
struct {
#define ASSIGNED_DEVICE_CAP_MSI (1 << 0)
+#define ASSIGNED_DEVICE_CAP_MSIX (1 << 1)
uint32_t available;
#define ASSIGNED_DEVICE_MSI_ENABLED (1 << 0)
+#define ASSIGNED_DEVICE_MSIX_ENABLED (1 << 1)
+#define ASSIGNED_DEVICE_MSIX_MASKED (1 << 2)
uint32_t state;
} cap;
int irq_entries_nr;
struct kvm_irq_routing_entry *entry;
+ void *msix_table_page;
+ target_phys_addr_t msix_table_addr;
+ int mmio_index;
} AssignedDevice;
typedef struct AssignedDevInfo AssignedDevInfo;
--
1.5.4.5
* [PATCH 14/16] kvm: fix irq 0 assignment
2009-03-12 13:36 [PATCH 0/16 v5] Device assignment improvement in userspace Sheng Yang
` (12 preceding siblings ...)
2009-03-12 13:36 ` [PATCH 13/16] kvm: enable MSI-X capabilty for assigned device Sheng Yang
@ 2009-03-12 13:36 ` Sheng Yang
2009-03-12 13:36 ` [PATCH 15/16] KVM: Fill config with correct VID/DID Sheng Yang
` (2 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Sheng Yang @ 2009-03-12 13:36 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tosatti, Anthony Liguori; +Cc: kvm, Sheng Yang
We shouldn't update the assigned irq if the device doesn't use INTx, i.e. its
interrupt pin field in config space is 0 (uninitialized or INTx unsupported).
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
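For reference, a small sketch of the check described above, operating on a
plain copy of config space instead of pci_read_byte(). The function name
device_uses_intx() and the macro CFG_INTERRUPT_PIN are made up for this
illustration; the register offset (0x3d) and the meaning of its values
(0 = no INTx, 1..4 = INTA#..INTD#) follow the PCI configuration header.

```c
#include <stdint.h>

#define CFG_INTERRUPT_PIN 0x3d  /* Interrupt Pin register in the PCI header */

/* Returns 1 if the device advertises a legacy INTx pin, 0 otherwise.
 * config must point to at least 64 bytes of configuration space. */
static int device_uses_intx(const uint8_t *config)
{
    uint8_t pin = config[CFG_INTERRUPT_PIN];

    /* 0 means "no INTx"; 1..4 map to INTA#..INTD#. */
    return pin >= 1 && pin <= 4;
}
```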
qemu/hw/device-assignment.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index 1cd4cf7..69f8e3a 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -590,6 +590,10 @@ static int assign_irq(AssignedDevInfo *adev)
AssignedDevice *dev = adev->assigned_dev;
int irq, r = 0;
+ /* IRQ PIN 0 means not use INTx */
+ if (pci_read_byte(dev->pdev, PCI_INTERRUPT_PIN) == 0)
+ return 0;
+
irq = pci_map_irq(&dev->dev, dev->intpin);
irq = piix_get_irq(irq);
--
1.5.4.5
* [PATCH 15/16] KVM: Fill config with correct VID/DID
2009-03-12 13:36 [PATCH 0/16 v5] Device assignment improvement in userspace Sheng Yang
` (13 preceding siblings ...)
2009-03-12 13:36 ` [PATCH 14/16] kvm: fix irq 0 assignment Sheng Yang
@ 2009-03-12 13:36 ` Sheng Yang
2009-03-12 13:36 ` [PATCH 16/16] kvm: emulate command register for SRIOV virtual function Sheng Yang
2009-03-16 9:10 ` [PATCH 0/16 v5] Device assignment improvement in userspace Avi Kivity
16 siblings, 0 replies; 32+ messages in thread
From: Sheng Yang @ 2009-03-12 13:36 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tosatti, Anthony Liguori; +Cc: kvm, Sheng Yang
An SR-IOV virtual function doesn't show the correct Vendor ID/Device ID in its
config space, so we have to fill them in manually from the vendor/device files
in sysfs.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
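To make the byte layout explicit, here is a standalone sketch of the same
idea: parse the text found in the sysfs "vendor" and "device" files (for
example "0x8086\n") and store the IDs little-endian in the first four bytes
of config space. parse_pci_id() and fill_vid_did() are hypothetical helper
names, and the sketch uses strtoul() where the patch uses fscanf().

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a sysfs-style ID string such as "0x8086\n".
 * Returns 0 on success, -1 if the text is not a 16-bit number. */
static int parse_pci_id(const char *text, uint16_t *id)
{
    char *end;
    unsigned long v;

    errno = 0;
    v = strtoul(text, &end, 0);   /* base 0 accepts the "0x" prefix */
    if (end == text || errno != 0 || v > 0xffff)
        return -1;
    *id = (uint16_t)v;
    return 0;
}

/* Vendor ID lives at config offsets 0..1, Device ID at 2..3,
 * both stored least significant byte first. */
static void fill_vid_did(uint8_t *config, uint16_t vendor, uint16_t device)
{
    config[0] = vendor & 0xff;
    config[1] = vendor >> 8;
    config[2] = device & 0xff;
    config[3] = device >> 8;
}
```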
qemu/hw/device-assignment.c | 31 ++++++++++++++++++++++++++++++-
1 files changed, 30 insertions(+), 1 deletions(-)
diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index 69f8e3a..ea67ce9 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -317,7 +317,8 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address,
ssize_t ret;
AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
- if ((address >= 0x10 && address <= 0x24) || address == 0x34 ||
+ if (address < 0x4 ||
+ (address >= 0x10 && address <= 0x24) || address == 0x34 ||
address == 0x3c || address == 0x3d ||
pci_access_cap_config(d, address, len)) {
val = pci_default_read_config(d, address, len);
@@ -429,6 +430,7 @@ static int get_real_device(AssignedDevice *pci_dev, uint8_t r_bus,
int fd, r = 0;
FILE *f;
unsigned long long start, end, size, flags;
+ unsigned long id;
PCIRegion *rp;
PCIDevRegions *dev = &pci_dev->real_device;
@@ -488,6 +490,33 @@ again:
DEBUG("region %d size %d start 0x%llx type %d resource_fd %d\n",
r, rp->size, start, rp->type, rp->resource_fd);
}
+
+ fclose(f);
+
+ /* read and fill vendor ID */
+ snprintf(name, sizeof(name), "%svendor", dir);
+ f = fopen(name, "r");
+ if (f == NULL) {
+ fprintf(stderr, "%s: %s: %m\n", __func__, name);
+ return 1;
+ }
+ if (fscanf(f, "%li\n", &id) == 1) {
+ pci_dev->dev.config[0] = id & 0xff;
+ pci_dev->dev.config[1] = (id & 0xff00) >> 8;
+ }
+ fclose(f);
+
+ /* read and fill device ID */
+ snprintf(name, sizeof(name), "%sdevice", dir);
+ f = fopen(name, "r");
+ if (f == NULL) {
+ fprintf(stderr, "%s: %s: %m\n", __func__, name);
+ return 1;
+ }
+ if (fscanf(f, "%li\n", &id) == 1) {
+ pci_dev->dev.config[2] = id & 0xff;
+ pci_dev->dev.config[3] = (id & 0xff00) >> 8;
+ }
fclose(f);
dev->region_number = r;
--
1.5.4.5
* [PATCH 16/16] kvm: emulate command register for SRIOV virtual function
2009-03-12 13:36 [PATCH 0/16 v5] Device assignment improvement in userspace Sheng Yang
` (14 preceding siblings ...)
2009-03-12 13:36 ` [PATCH 15/16] KVM: Fill config with correct VID/DID Sheng Yang
@ 2009-03-12 13:36 ` Sheng Yang
2009-03-16 9:10 ` [PATCH 0/16 v5] Device assignment improvement in userspace Avi Kivity
16 siblings, 0 replies; 32+ messages in thread
From: Sheng Yang @ 2009-03-12 13:36 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tosatti, Anthony Liguori; +Cc: kvm, Sheng Yang
The MMIO enable bit is checked when enabling a virtual function, but in fact
the virtual function's whole command register is hard-wired to zero. So when
the guest reads the command register it only gets 0, in particular for the
MMIO enable bit. As a result, if the guest driver does a RMW on the command
register, it always reads 0 and overrides the former setting (e.g. unmapping
MMIO by setting the corresponding bit to zero).
So we rely on QEmu to provide reasonable command register content to the guest.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
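The failure mode is easier to see in a reduced model. The sketch below is
only an illustration with made-up names (struct cmd_shadow, cmd_read(),
cmd_write(), is_virtual_function()); it assumes guest writes are kept in a
shadow copy and that reads return that copy when emulation is on. The
physfn check mirrors what the patch does with stat().

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

#define CMD_MEMORY 0x0002  /* memory space enable bit of the command register */
#define CMD_MASTER 0x0004  /* bus master enable bit */

struct cmd_shadow {
    int emulate;      /* 1 for a virtual function */
    uint16_t value;   /* last value written by the guest */
};

/* A virtual function has a "physfn" link in its sysfs directory.
 * sysfs_dir is expected to end with '/'. */
static int is_virtual_function(const char *sysfs_dir)
{
    char name[512];
    struct stat st;
    int n = snprintf(name, sizeof(name), "%sphysfn/", sysfs_dir);

    if (n < 0 || (size_t)n >= sizeof(name))
        return 0;
    return stat(name, &st) == 0;
}

/* hw_value is what the real device returns (always 0 on a VF). */
static uint16_t cmd_read(const struct cmd_shadow *s, uint16_t hw_value)
{
    return s->emulate ? s->value : hw_value;
}

static void cmd_write(struct cmd_shadow *s, uint16_t val)
{
    s->value = val;
}
```

With emulation enabled, a guest that sets the memory enable bit and later
ORs in the bus master bit ends up with both bits set; without it, the second
write would be based on a read of 0 and drop the memory enable bit.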
qemu/hw/device-assignment.c | 13 ++++++++++++-
qemu/hw/device-assignment.h | 1 +
2 files changed, 13 insertions(+), 1 deletions(-)
diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index ea67ce9..299c8ea 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -26,7 +26,10 @@
* Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
*/
#include <stdio.h>
+#include <unistd.h>
#include <sys/io.h>
+#include <sys/types.h>
+#include <sys/stat.h>
#include "qemu-kvm.h"
#include "hw.h"
#include "pc.h"
@@ -317,7 +320,7 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address,
ssize_t ret;
AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
- if (address < 0x4 ||
+ if (address < 0x4 || (pci_dev->need_emulate_cmd && address == 0x4) ||
(address >= 0x10 && address <= 0x24) || address == 0x34 ||
address == 0x3c || address == 0x3d ||
pci_access_cap_config(d, address, len)) {
@@ -431,6 +434,7 @@ static int get_real_device(AssignedDevice *pci_dev, uint8_t r_bus,
FILE *f;
unsigned long long start, end, size, flags;
unsigned long id;
+ struct stat statbuf;
PCIRegion *rp;
PCIDevRegions *dev = &pci_dev->real_device;
@@ -519,6 +523,13 @@ again:
}
fclose(f);
+ /* dealing with virtual function device */
+ snprintf(name, sizeof(name), "%sphysfn/", dir);
+ if (!stat(name, &statbuf))
+ pci_dev->need_emulate_cmd = 1;
+ else
+ pci_dev->need_emulate_cmd = 0;
+
dev->region_number = r;
return 0;
}
diff --git a/qemu/hw/device-assignment.h b/qemu/hw/device-assignment.h
index 69d549d..1e5a84f 100644
--- a/qemu/hw/device-assignment.h
+++ b/qemu/hw/device-assignment.h
@@ -96,6 +96,7 @@ typedef struct {
void *msix_table_page;
target_phys_addr_t msix_table_addr;
int mmio_index;
+ int need_emulate_cmd;
} AssignedDevice;
typedef struct AssignedDevInfo AssignedDevInfo;
--
1.5.4.5
* Re: [PATCH 0/16 v5] Device assignment improvement in userspace
2009-03-12 13:36 [PATCH 0/16 v5] Device assignment improvement in userspace Sheng Yang
` (15 preceding siblings ...)
2009-03-12 13:36 ` [PATCH 16/16] kvm: emulate command register for SRIOV virtual function Sheng Yang
@ 2009-03-16 9:10 ` Avi Kivity
2009-03-16 18:12 ` Marcelo Tosatti
16 siblings, 1 reply; 32+ messages in thread
From: Avi Kivity @ 2009-03-16 9:10 UTC (permalink / raw)
To: Sheng Yang; +Cc: Marcelo Tosatti, Anthony Liguori, kvm
Sheng Yang wrote:
> Patch 1 and 2 are new ones, all the others had been sent before.
>
> This (huge) patchset, contained:
>
> Patch 1..2 are new interface after reworked device assignment kernel part.
>
> Patch 3..6 are generic capability support mechanism. These may can be adopted
> by QEmu upstream as well.
>
> Patch 7..10 enable MSI with device assignment on KVM. Also due to reworked
> device assignment kernel part discard MSI convert to INTx mechanism, patch 10
> enable it again in userspace.
>
> Patch 11..13 enable MSI-X with device assignment on KVM.
>
> And Patch 14..16 enable SR-IOV with KVM.
>
> Update from latest series:
>
> 1. Convert to the new ioctl interface.
> 2. Merge capability configuration space with PCIDevice one.
> 3. Support of deassign IRQ(unload driver) with MSI/MSI-X better.
> 4. Not assume IRQ0 means no INTx any longer, but check interrupt pin field in
> configuration space for the judgment.
>
> Please help to review! Thanks!
>
This looks ready to apply. I'd like Marcelo to look it over, though,
before.
--
error compiling committee.c: too many arguments to function
* Re: [PATCH 0/16 v5] Device assignment improvement in userspace
2009-03-16 9:10 ` [PATCH 0/16 v5] Device assignment improvement in userspace Avi Kivity
@ 2009-03-16 18:12 ` Marcelo Tosatti
2009-03-17 3:43 ` Sheng Yang
2009-03-17 9:40 ` Avi Kivity
0 siblings, 2 replies; 32+ messages in thread
From: Marcelo Tosatti @ 2009-03-16 18:12 UTC (permalink / raw)
To: Avi Kivity; +Cc: Sheng Yang, Anthony Liguori, kvm
On Mon, Mar 16, 2009 at 11:10:47AM +0200, Avi Kivity wrote:
> Sheng Yang wrote:
>> Patch 1 and 2 are new ones, all the others had been sent before.
>>
>> This (huge) patchset, contained:
>>
>> Patch 1..2 are new interface after reworked device assignment kernel part.
>>
>> Patch 3..6 are generic capability support mechanism. These may can be adopted
>> by QEmu upstream as well.
>>
>> Patch 7..10 enable MSI with device assignment on KVM. Also due to reworked
>> device assignment kernel part discard MSI convert to INTx mechanism, patch 10
>> enable it again in userspace.
>>
>> Patch 11..13 enable MSI-X with device assignment on KVM.
>>
>> And Patch 14..16 enable SR-IOV with KVM.
>>
>> Update from latest series:
>>
>> 1. Convert to the new ioctl interface.
>> 2. Merge capability configuration space with PCIDevice one.
>> 3. Support of deassign IRQ(unload driver) with MSI/MSI-X better.
>> 4. Not assume IRQ0 means no INTx any longer, but check interrupt pin field in
>> configuration space for the judgment.
>>
>> Please help to review! Thanks!
>>
>
> This looks ready to apply. I'd like Marcelo to look it over, though,
> before.
Looks good to me, ready to be applied.
There is one pending detail in the ioctl interface. It's a minor issue,
but might become troublesome later (and can be fixed after the patchset
has been applied).
The unassign ioctl takes "struct kvm_assigned_irq" and parses its flags
to decide what to do, in this way:
- If any bit is set in the guest mask (GUEST_INTX, GUEST_MSI,
GUEST_MSIX), we disable guest-side interrupt.
- Likewise for host, disabling host-side interrupt.
host_irq_type = irq_requested_type & KVM_DEV_IRQ_HOST_MASK;
guest_irq_type = irq_requested_type & KVM_DEV_IRQ_GUEST_MASK;
if (host_irq_type)
deassign_host_irq(kvm, assigned_dev);
if (guest_irq_type)
deassign_guest_irq(kvm, assigned_dev);
This is a little confusing. If we simply want to disable
_whatever is assigned_ in either guest or host side, we want a
UNASSIGN_GUEST/UNASSIGN_HOST pair of flags (this is how the ioctl
behaves, but we pass more flags and don't use them effectively).
Or, if the unassign ioctl continues to receive guest/host flags with
interrupt type detail, it should error out if userspace passed a type
that does not match what is currently assigned.
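To make that second option concrete, a rough sketch of a strict check; this
is not the kernel code. The XDEV_IRQ_* names and values are placeholders
chosen for the example (the real flags live in the kernel's kvm.h), and
returning -EINVAL on a mismatch is an assumption.

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder flag layout for the example only. */
#define XDEV_IRQ_HOST_INTX   (1u << 0)
#define XDEV_IRQ_HOST_MSI    (1u << 1)
#define XDEV_IRQ_HOST_MSIX   (1u << 2)
#define XDEV_IRQ_GUEST_INTX  (1u << 8)
#define XDEV_IRQ_GUEST_MSI   (1u << 9)
#define XDEV_IRQ_GUEST_MSIX  (1u << 10)

/* Deassign only what is currently assigned: any requested type bit that
 * is not set in *assigned makes the whole request fail. */
static int deassign_strict(uint32_t *assigned, uint32_t requested)
{
    if (requested == 0 || (requested & ~*assigned) != 0)
        return -EINVAL;
    *assigned &= ~requested;
    return 0;
}
```

With MSI-X assigned on both sides, a request naming MSI would fail, while
a request naming only the guest MSI-X bit would clear just that side.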
The current behaviour is simpler for userspace, but then we wouldn't need
to pass "struct kvm_assigned_irq".
Sheng, what do you say?
* Re: [PATCH 0/16 v5] Device assignment improvement in userspace
2009-03-16 18:12 ` Marcelo Tosatti
@ 2009-03-17 3:43 ` Sheng Yang
2009-03-17 13:55 ` Marcelo Tosatti
2009-03-17 9:40 ` Avi Kivity
1 sibling, 1 reply; 32+ messages in thread
From: Sheng Yang @ 2009-03-17 3:43 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Avi Kivity, Anthony Liguori, kvm
On Tuesday 17 March 2009 02:12:11 Marcelo Tosatti wrote:
> On Mon, Mar 16, 2009 at 11:10:47AM +0200, Avi Kivity wrote:
> > Sheng Yang wrote:
> >> Patch 1 and 2 are new ones, all the others had been sent before.
> >>
> >> This (huge) patchset, contained:
> >>
> >> Patch 1..2 are new interface after reworked device assignment kernel
> >> part.
> >>
> >> Patch 3..6 are generic capability support mechanism. These may can be
> >> adopted by QEmu upstream as well.
> >>
> >> Patch 7..10 enable MSI with device assignment on KVM. Also due to
> >> reworked device assignment kernel part discard MSI convert to INTx
> >> mechanism, patch 10 enable it again in userspace.
> >>
> >> Patch 11..13 enable MSI-X with device assignment on KVM.
> >>
> >> And Patch 14..16 enable SR-IOV with KVM.
> >>
> >> Update from latest series:
> >>
> >> 1. Convert to the new ioctl interface.
> >> 2. Merge capability configuration space with PCIDevice one.
> >> 3. Support of deassign IRQ(unload driver) with MSI/MSI-X better.
> >> 4. Not assume IRQ0 means no INTx any longer, but check interrupt pin
> >> field in configuration space for the judgment.
> >>
> >> Please help to review! Thanks!
> >
> > This looks ready to apply. I'd like Marcelo to look it over, though,
> > before.
>
> Looks good to me, ready to be applied.
>
> There is one pending detail in the ioctl interface. It's a minor issue,
> but might become troublesome later (and can be fixed after the patchset
> has been applied).
>
> The unassign ioctl takes "struct kvm_assigned_irq" and parses its flags
> to decide what to do, in this way:
>
> - If any bit is set in the guest mask (GUEST_INTX, GUEST_MSI,
> GUEST_MSIX), we disable guest-side interrupt.
> - Likewise for host, disabling host-side interrupt.
>
> host_irq_type = irq_requested_type & KVM_DEV_IRQ_HOST_MASK;
> guest_irq_type = irq_requested_type & KVM_DEV_IRQ_GUEST_MASK;
>
> if (host_irq_type)
> deassign_host_irq(kvm, assigned_dev);
> if (guest_irq_type)
> deassign_guest_irq(kvm, assigned_dev);
>
> This is a little confusing. If we simply want to disable
> _whatever is assigned_ in either guest or host side, we want a
> UNASSIGN_GUEST/UNASSIGN_HOST pair of flags (this is how the ioctl
> behaves, but we pass more flags and don't use them effectively).
>
> Or, if the unassign ioctl continues to receive guest/host flags with
> interrupt type detail, it should error out if userspace passed a type
> that does not match what is currently assigned.
>
> The current behaviour is simpler for userspace, but then we wouldn't need
> to pass "struct kvm_assigned_irq".
>
> Sheng, what do you say?
Yeah, it's an ambiguous point.
I think we have three questions here:
1. Do we need to verify the guest's "qualification" before deassigning the IRQ?
I think it's unnecessary, because even a "malicious" userspace can simply
try different combinations until one of them works...
2. Do we need to keep the flexibility? (e.g. "struct kvm_assigned_irq" and the
split of guest and host IRQ deassign)
I am not sure. I think we can. Since it is already there (upstream), I
think just keeping it is OK. It shouldn't affect much, and maybe we can use
it in the future.
3. How do we clarify the ambiguity of the flags in kvm_assigned_irq?
I've updated the patchset to add an irq_requested_type field for assigned_dev
in userspace. At least it's more precise semantically. This can also be used
to implement a "deassign guest irq only" operation in the future (if it's
necessary, though I still worry about what happens if the device raises an
interrupt between deassigning the guest irq and assigning the new guest irq).
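
A minimal sketch of the userspace bookkeeping described in point 3, assuming
a per-device record that remembers the flags passed at assign time so the
same flags can be handed back on deassign. All names here (sketch_dev_state,
sketch_irq_request, and the helper functions) are hypothetical and are not
taken from the patchset; the request struct only loosely mirrors the idea of
"struct kvm_assigned_irq".

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the ioctl argument; not the real kernel struct. */
struct sketch_irq_request {
    uint32_t assigned_dev_id;
    uint32_t flags;
};

/* Hypothetical per-device state kept by userspace. */
struct sketch_dev_state {
    uint32_t dev_id;
    uint32_t irq_requested_type; /* flags used for the last successful assign */
};

/* Remember exactly what was requested once an assign has succeeded. */
static void sketch_record_assign(struct sketch_dev_state *dev, uint32_t flags)
{
    dev->irq_requested_type = flags;
}

/*
 * Fill in a deassign request from the recorded state.
 * Returns 1 if a request was built, 0 if nothing is currently assigned.
 */
static int sketch_build_deassign(const struct sketch_dev_state *dev,
                                 struct sketch_irq_request *req)
{
    if (dev->irq_requested_type == 0)
        return 0;

    memset(req, 0, sizeof(*req));
    req->assigned_dev_id = dev->dev_id;
    req->flags = dev->irq_requested_type;
    return 1;
}

/* Clear the record only after the deassign call has succeeded. */
static void sketch_record_deassign(struct sketch_dev_state *dev)
{
    dev->irq_requested_type = 0;
}
```

In this sketch the caller would pass the built request to the deassign call
and invoke sketch_record_deassign() only on success, so the record never
claims an interrupt type that is no longer assigned.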
Marcelo, what do you think?
--
regards
Yang, Sheng
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 0/16 v5] Device assignment improvement in userspace
2009-03-17 3:43 ` Sheng Yang
@ 2009-03-17 13:55 ` Marcelo Tosatti
2009-03-17 20:19 ` Marcelo Tosatti
0 siblings, 1 reply; 32+ messages in thread
From: Marcelo Tosatti @ 2009-03-17 13:55 UTC (permalink / raw)
To: Sheng Yang; +Cc: Avi Kivity, Anthony Liguori, kvm
On Tue, Mar 17, 2009 at 11:43:10AM +0800, Sheng Yang wrote:
> On Tuesday 17 March 2009 02:12:11 Marcelo Tosatti wrote:
> > On Mon, Mar 16, 2009 at 11:10:47AM +0200, Avi Kivity wrote:
> > > Sheng Yang wrote:
> > >> Patch 1 and 2 are new ones, all the others had been sent before.
> > >>
> > >> This (huge) patchset, contained:
> > >>
> > >> Patch 1..2 are new interface after reworked device assignment kernel
> > >> part.
> > >>
> > >> Patch 3..6 are generic capability support mechanism. These may can be
> > >> adopted by QEmu upstream as well.
> > >>
> > >> Patch 7..10 enable MSI with device assignment on KVM. Also due to
> > >> reworked device assignment kernel part discard MSI convert to INTx
> > >> mechanism, patch 10 enable it again in userspace.
> > >>
> > >> Patch 11..13 enable MSI-X with device assignment on KVM.
> > >>
> > >> And Patch 14..16 enable SR-IOV with KVM.
> > >>
> > >> Update from latest series:
> > >>
> > >> 1. Convert to the new ioctl interface.
> > >> 2. Merge capability configuration space with PCIDevice one.
> > >> 3. Support of deassign IRQ(unload driver) with MSI/MSI-X better.
> > >> 4. Not assume IRQ0 means no INTx any longer, but check interrupt pin
> > >> field in configuration space for the judgment.
> > >>
> > >> Please help to review! Thanks!
> > >
> > > This looks ready to apply. I'd like Marcelo to look it over, though,
> > > before.
> >
> > Looks good to me, ready to be applied.
> >
> > There is one pending detail in the ioctl interface. Its a minor issue,
> > but might become troublesome later (and can be fixed after the patchset
> > has been applied).
> >
> > The unassign ioctl takes "struct kvm_assigned_irq" and parses its flags
> > to decide what to do, in this way:
> >
> > - If any bit is set in the guest mask (GUEST_INTX, GUEST_MSI,
> > GUEST_MSIX), we disable guest-side interrupt.
> > - Likewise for host, disabling host-side interrupt.
> >
> > host_irq_type = irq_requested_type & KVM_DEV_IRQ_HOST_MASK;
> > guest_irq_type = irq_requested_type & KVM_DEV_IRQ_GUEST_MASK;
> >
> > if (host_irq_type)
> > deassign_host_irq(kvm, assigned_dev);
> > if (guest_irq_type)
> > deassign_guest_irq(kvm, assigned_dev);
> >
> > This is a little confusing. If we simply want to disable
> > _whatever is assigned_ in either guest or host side, we want a
> > UNASSIGN_GUEST/UNASSIGN_HOST pair of flags (this is how the ioctl
> > behaves, but we pass more flags and don't use them effectively).
> >
> > Or, if the unassign ioctl continues to receive guest/host flags with
> > interrupt type detail, it should error out if userspace passed a type
> > that does not match what is currently assigned.
> >
> > The current behaviour is simpler for userspace, but then we'd need not
> > to pass "struct kvm_assigned_irq".
> >
> > Sheng, what do you say?
>
> Yeah, it's a ambiguous point.
>
> I think we have three questions here:
>
> 1. Do we need to verify guest's "qualification" before deassign the IRQ?
>
> I think it's unnecessary, because even if we got a "malicious" userspace, it
> can try different combination and finally got it...
By requiring the type to be deassigned we force userspace to keep
correct accounting, which is not bad.
> 2. Do we need to keep the flexibility? (e.g. "struct kvm_assigned_irq" and the
> split of guest and host IRQ deassign)
>
> I am not sure. I think we can. And for it have been there(upstream) already, I
> think just keep it there is OK. It shouldn't affect much, and maybe we can use
> it in the future.
Right.
> 3. How to clarify the ambiguous of flags of kvm_assigned_irq?
>
> I've updated the patchset, add one irq_requested_type for assigned_dev in
> userspace. At least, it's more precise in semantic. This can also be used to
> implement "deassign guest irq" only in the future(if it's necessary. But I am
> still worry about what if device have interrupt between deassign guest irq and
> assign new guest irq).
Yes, it needs to be done carefully.
> Marcelo, how do you think?
I think that enforcing the correct type on deassign is alright.
Avi, I'm good with this patchset, we can force the correct type on
deassign ioctl as a separate kernel patch.
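
A rough sketch of what "forcing the correct type on deassign" could look
like, assuming one interrupt type is recorded per side (host and guest) and
that a mismatching request is rejected with -EINVAL. The SKETCH_* flag
values, the struct, and the function name are hypothetical placeholders for
illustration, not the actual kernel definitions or the eventual patch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag layout: low byte = host types, next byte = guest types. */
#define SKETCH_HOST_INTX   (1u << 0)
#define SKETCH_HOST_MSI    (1u << 1)
#define SKETCH_HOST_MSIX   (1u << 2)
#define SKETCH_GUEST_INTX  (1u << 8)
#define SKETCH_GUEST_MSI   (1u << 9)
#define SKETCH_GUEST_MSIX  (1u << 10)
#define SKETCH_HOST_MASK   0x00ffu
#define SKETCH_GUEST_MASK  0xff00u

/* Hypothetical device record holding what is currently assigned. */
struct sketch_assigned_dev {
    uint32_t irq_requested_type;
};

/*
 * Deassign the host and/or guest side, but only if the type named in
 * 'flags' matches what is currently assigned on that side.
 */
static int sketch_deassign_irq(struct sketch_assigned_dev *dev, uint32_t flags)
{
    uint32_t host = flags & SKETCH_HOST_MASK;
    uint32_t guest = flags & SKETCH_GUEST_MASK;

    /* A request that names neither side is meaningless. */
    if (!host && !guest)
        return -EINVAL;

    /* Validate both sides before changing any state. */
    if (host && host != (dev->irq_requested_type & SKETCH_HOST_MASK))
        return -EINVAL;
    if (guest && guest != (dev->irq_requested_type & SKETCH_GUEST_MASK))
        return -EINVAL;

    if (host)
        dev->irq_requested_type &= ~SKETCH_HOST_MASK;
    if (guest)
        dev->irq_requested_type &= ~SKETCH_GUEST_MASK;

    return 0;
}
```

Checking both sides before touching the record keeps a rejected request from
leaving the device half deassigned.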
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 0/16 v5] Device assignment improvement in userspace
2009-03-17 13:55 ` Marcelo Tosatti
@ 2009-03-17 20:19 ` Marcelo Tosatti
0 siblings, 0 replies; 32+ messages in thread
From: Marcelo Tosatti @ 2009-03-17 20:19 UTC (permalink / raw)
To: Sheng Yang; +Cc: Avi Kivity, Anthony Liguori, kvm
On Tue, Mar 17, 2009 at 10:55:48AM -0300, Marcelo Tosatti wrote:
> I think that enforcing the correct type on deassign is alright.
>
> Avi, I'm good with this patchset, we can force the correct type on
> deassign ioctl as a separate kernel patch.
Err, I meant the v6 patchset.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 0/16 v5] Device assignment improvement in userspace
2009-03-16 18:12 ` Marcelo Tosatti
2009-03-17 3:43 ` Sheng Yang
@ 2009-03-17 9:40 ` Avi Kivity
2009-03-17 14:50 ` Marcelo Tosatti
1 sibling, 1 reply; 32+ messages in thread
From: Avi Kivity @ 2009-03-17 9:40 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Sheng Yang, Anthony Liguori, kvm
Marcelo Tosatti wrote:
> Looks good to me, ready to be applied.
>
> There is one pending detail in the ioctl interface. Its a minor issue,
> but might become troublesome later (and can be fixed after the patchset
> has been applied).
>
> The unassign ioctl takes "struct kvm_assigned_irq" and parses its flags
> to decide what to do, in this way:
>
> - If any bit is set in the guest mask (GUEST_INTX, GUEST_MSI,
> GUEST_MSIX), we disable guest-side interrupt.
> - Likewise for host, disabling host-side interrupt.
>
> host_irq_type = irq_requested_type & KVM_DEV_IRQ_HOST_MASK;
> guest_irq_type = irq_requested_type & KVM_DEV_IRQ_GUEST_MASK;
>
> if (host_irq_type)
> deassign_host_irq(kvm, assigned_dev);
> if (guest_irq_type)
> deassign_guest_irq(kvm, assigned_dev);
>
> This is a little confusing. If we simply want to disable
> _whatever is assigned_ in either guest or host side, we want a
> UNASSIGN_GUEST/UNASSIGN_HOST pair of flags (this is how the ioctl
> behaves, but we pass more flags and don't use them effectively).
>
> Or, if the unassign ioctl continues to receive guest/host flags with
> interrupt type detail, it should error out if userspace passed a type
> that does not match what is currently assigned.
>
> The current behaviour is simpler for userspace, but then we'd need not
> to pass "struct kvm_assigned_irq".
>
> Sheng, what do you say?
>
>
Maybe we want different ioctl pairs for guest and host?
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 32+ messages in thread