* [PATCH 0/6 v5] MSI-X mask support for assigned device
@ 2010-11-15  9:15 Sheng Yang
  2010-11-15  9:15 ` [PATCH 1/6] PCI: MSI: Move MSI-X entry definition to pci_regs.h Sheng Yang
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Sheng Yang @ 2010-11-15  9:15 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, Michael S. Tsirkin, kvm, Sheng Yang

Change from v4:
1. Rebased on latest KVM.
2. Addressed minor review comments.
3. Dropped the big-endian patch, since its correctness couldn't be guaranteed; added a TODO for it.

Change from v3:
1. Redesigned the userspace API.
2. Added big-endian support for msix_mmio_read()/write() (untested!).

Change from v2:
1. Moved all mask handling into the kernel; userspace now has to access it through an API.
2. Dropped the userspace mask bit operation API.

Sheng Yang (6):
  PCI: MSI: Move MSI-X entry definition to pci_regs.h
  PCI: Add mask bit definition for MSI-X table
  KVM: Move struct kvm_io_device to kvm_host.h
  KVM: Add kvm_get_irq_routing_entry() func
  KVM: assigned dev: Clean up assigned_device's flag
  KVM: assigned dev: MSI-X mask support

 arch/x86/kvm/x86.c       |    1 +
 drivers/pci/msi.c        |    5 +-
 drivers/pci/msi.h        |    6 -
 include/linux/kvm.h      |   32 +++++
 include/linux/kvm_host.h |   31 +++++
 include/linux/pci_regs.h |    8 +
 virt/kvm/assigned-dev.c  |  325 +++++++++++++++++++++++++++++++++++++++++++++-
 virt/kvm/iodev.h         |   25 +----
 virt/kvm/irq_comm.c      |   20 +++
 9 files changed, 417 insertions(+), 36 deletions(-)


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/6] PCI: MSI: Move MSI-X entry definition to pci_regs.h
  2010-11-15  9:15 [PATCH 0/6 v5] MSI-X mask support for assigned device Sheng Yang
@ 2010-11-15  9:15 ` Sheng Yang
  2010-11-15  9:15 ` [PATCH 2/6] PCI: Add mask bit definition for MSI-X table Sheng Yang
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Sheng Yang @ 2010-11-15  9:15 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Marcelo Tosatti, Michael S. Tsirkin, kvm, Sheng Yang, linux-pci

Then it can be used by others.

Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 drivers/pci/msi.h        |    6 ------
 include/linux/pci_regs.h |    7 +++++++
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/msi.h b/drivers/pci/msi.h
index feff3be..65c42f8 100644
--- a/drivers/pci/msi.h
+++ b/drivers/pci/msi.h
@@ -6,12 +6,6 @@
 #ifndef MSI_H
 #define MSI_H
 
-#define PCI_MSIX_ENTRY_SIZE		16
-#define  PCI_MSIX_ENTRY_LOWER_ADDR	0
-#define  PCI_MSIX_ENTRY_UPPER_ADDR	4
-#define  PCI_MSIX_ENTRY_DATA		8
-#define  PCI_MSIX_ENTRY_VECTOR_CTRL	12
-
 #define msi_control_reg(base)		(base + PCI_MSI_FLAGS)
 #define msi_lower_address_reg(base)	(base + PCI_MSI_ADDRESS_LO)
 #define msi_upper_address_reg(base)	(base + PCI_MSI_ADDRESS_HI)
diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
index af83076..b21d33e 100644
--- a/include/linux/pci_regs.h
+++ b/include/linux/pci_regs.h
@@ -309,6 +309,13 @@
 #define PCI_MSIX_PBA		8
 #define  PCI_MSIX_FLAGS_BIRMASK	(7 << 0)
 
+/* MSI-X entry's format */
+#define PCI_MSIX_ENTRY_SIZE		16
+#define  PCI_MSIX_ENTRY_LOWER_ADDR	0
+#define  PCI_MSIX_ENTRY_UPPER_ADDR	4
+#define  PCI_MSIX_ENTRY_DATA		8
+#define  PCI_MSIX_ENTRY_VECTOR_CTRL	12
+
 /* CompactPCI Hotswap Register */
 
 #define PCI_CHSWP_CSR		2	/* Control and Status Register */
-- 
1.7.0.1



* [PATCH 2/6] PCI: Add mask bit definition for MSI-X table
  2010-11-15  9:15 [PATCH 0/6 v5] MSI-X mask support for assigned device Sheng Yang
  2010-11-15  9:15 ` [PATCH 1/6] PCI: MSI: Move MSI-X entry definition to pci_regs.h Sheng Yang
@ 2010-11-15  9:15 ` Sheng Yang
  2010-11-15  9:15 ` [PATCH 3/6] KVM: Move struct kvm_io_device to kvm_host.h Sheng Yang
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Sheng Yang @ 2010-11-15  9:15 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Marcelo Tosatti, Michael S. Tsirkin, kvm, Sheng Yang,
	Matthew Wilcox, linux-pci

Then we can use it instead of the magic number 1.

Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 drivers/pci/msi.c        |    5 +++--
 include/linux/pci_regs.h |    1 +
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 7c24dce..44b0aee 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -168,8 +168,9 @@ static u32 __msix_mask_irq(struct msi_desc *desc, u32 flag)
 	u32 mask_bits = desc->masked;
 	unsigned offset = desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
 						PCI_MSIX_ENTRY_VECTOR_CTRL;
-	mask_bits &= ~1;
-	mask_bits |= flag;
+	mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
+	if (flag)
+		mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
 	writel(mask_bits, desc->mask_base + offset);
 
 	return mask_bits;
diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
index b21d33e..d4f2c80 100644
--- a/include/linux/pci_regs.h
+++ b/include/linux/pci_regs.h
@@ -315,6 +315,7 @@
 #define  PCI_MSIX_ENTRY_UPPER_ADDR	4
 #define  PCI_MSIX_ENTRY_DATA		8
 #define  PCI_MSIX_ENTRY_VECTOR_CTRL	12
+#define   PCI_MSIX_ENTRY_CTRL_MASKBIT	1
 
 /* CompactPCI Hotswap Register */
 
-- 
1.7.0.1



* [PATCH 3/6] KVM: Move struct kvm_io_device to kvm_host.h
  2010-11-15  9:15 [PATCH 0/6 v5] MSI-X mask support for assigned device Sheng Yang
  2010-11-15  9:15 ` [PATCH 1/6] PCI: MSI: Move MSI-X entry definition to pci_regs.h Sheng Yang
  2010-11-15  9:15 ` [PATCH 2/6] PCI: Add mask bit definition for MSI-X table Sheng Yang
@ 2010-11-15  9:15 ` Sheng Yang
  2010-11-15  9:15 ` [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func Sheng Yang
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Sheng Yang @ 2010-11-15  9:15 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, Michael S. Tsirkin, kvm, Sheng Yang

Then it can be used in struct kvm_assigned_dev_kernel later.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 include/linux/kvm_host.h |   23 +++++++++++++++++++++++
 virt/kvm/iodev.h         |   25 +------------------------
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2d63f2c..9da2f1a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -97,6 +97,29 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif
 
+struct kvm_io_device;
+
+/**
+ * kvm_io_device_ops are called under kvm slots_lock.
+ * read and write handlers return 0 if the transaction has been handled,
+ * or non-zero to have it passed to the next device.
+ **/
+struct kvm_io_device_ops {
+	int (*read)(struct kvm_io_device *this,
+		    gpa_t addr,
+		    int len,
+		    void *val);
+	int (*write)(struct kvm_io_device *this,
+		     gpa_t addr,
+		     int len,
+		     const void *val);
+	void (*destructor)(struct kvm_io_device *this);
+};
+
+struct kvm_io_device {
+	const struct kvm_io_device_ops *ops;
+};
+
 struct kvm_vcpu {
 	struct kvm *kvm;
 #ifdef CONFIG_PREEMPT_NOTIFIERS
diff --git a/virt/kvm/iodev.h b/virt/kvm/iodev.h
index 12fd3ca..d1f5651 100644
--- a/virt/kvm/iodev.h
+++ b/virt/kvm/iodev.h
@@ -17,32 +17,9 @@
 #define __KVM_IODEV_H__
 
 #include <linux/kvm_types.h>
+#include <linux/kvm_host.h>
 #include <asm/errno.h>
 
-struct kvm_io_device;
-
-/**
- * kvm_io_device_ops are called under kvm slots_lock.
- * read and write handlers return 0 if the transaction has been handled,
- * or non-zero to have it passed to the next device.
- **/
-struct kvm_io_device_ops {
-	int (*read)(struct kvm_io_device *this,
-		    gpa_t addr,
-		    int len,
-		    void *val);
-	int (*write)(struct kvm_io_device *this,
-		     gpa_t addr,
-		     int len,
-		     const void *val);
-	void (*destructor)(struct kvm_io_device *this);
-};
-
-
-struct kvm_io_device {
-	const struct kvm_io_device_ops *ops;
-};
-
 static inline void kvm_iodevice_init(struct kvm_io_device *dev,
 				     const struct kvm_io_device_ops *ops)
 {
-- 
1.7.0.1



* [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func
  2010-11-15  9:15 [PATCH 0/6 v5] MSI-X mask support for assigned device Sheng Yang
                   ` (2 preceding siblings ...)
  2010-11-15  9:15 ` [PATCH 3/6] KVM: Move struct kvm_io_device to kvm_host.h Sheng Yang
@ 2010-11-15  9:15 ` Sheng Yang
  2010-11-17 14:01   ` Avi Kivity
  2010-11-15  9:15 ` [PATCH 5/6] KVM: assigned dev: Clean up assigned_device's flag Sheng Yang
  2010-11-15  9:15 ` [PATCH 6/6] KVM: assigned dev: MSI-X mask support Sheng Yang
  5 siblings, 1 reply; 26+ messages in thread
From: Sheng Yang @ 2010-11-15  9:15 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, Michael S. Tsirkin, kvm, Sheng Yang

We need to query the entry later.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 include/linux/kvm_host.h |    2 ++
 virt/kvm/irq_comm.c      |   20 ++++++++++++++++++++
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9da2f1a..274655b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -669,6 +669,8 @@ int kvm_set_irq_routing(struct kvm *kvm,
 			const struct kvm_irq_routing_entry *entries,
 			unsigned nr,
 			unsigned flags);
+int kvm_get_irq_routing_entry(struct kvm *kvm, int gsi,
+		struct kvm_kernel_irq_routing_entry *entry);
 void kvm_free_irq_routing(struct kvm *kvm);
 
 #else
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 8edca91..ae1dc7c 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -421,6 +421,26 @@ out:
 	return r;
 }
 
+int kvm_get_irq_routing_entry(struct kvm *kvm, int gsi,
+		struct kvm_kernel_irq_routing_entry *entry)
+{
+	int count = 0;
+	struct kvm_kernel_irq_routing_entry *ei = NULL;
+	struct kvm_irq_routing_table *irq_rt;
+	struct hlist_node *n;
+
+	rcu_read_lock();
+	irq_rt = rcu_dereference(kvm->irq_routing);
+	if (gsi < irq_rt->nr_rt_entries)
+		hlist_for_each_entry(ei, n, &irq_rt->map[gsi], link)
+			count++;
+	if (count == 1)
+		*entry = *ei;
+	rcu_read_unlock();
+
+	return (count != 1);
+}
+
 #define IOAPIC_ROUTING_ENTRY(irq) \
 	{ .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP,	\
 	  .u.irqchip.irqchip = KVM_IRQCHIP_IOAPIC, .u.irqchip.pin = (irq) }
-- 
1.7.0.1



* [PATCH 5/6] KVM: assigned dev: Clean up assigned_device's flag
  2010-11-15  9:15 [PATCH 0/6 v5] MSI-X mask support for assigned device Sheng Yang
                   ` (3 preceding siblings ...)
  2010-11-15  9:15 ` [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func Sheng Yang
@ 2010-11-15  9:15 ` Sheng Yang
  2010-11-15  9:15 ` [PATCH 6/6] KVM: assigned dev: MSI-X mask support Sheng Yang
  5 siblings, 0 replies; 26+ messages in thread
From: Sheng Yang @ 2010-11-15  9:15 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, Michael S. Tsirkin, kvm, Sheng Yang

Reusing KVM_DEV_ASSIGN_ENABLE_IOMMU for an in-kernel struct didn't make much
sense; introduce a dedicated in-kernel flag instead.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 include/linux/kvm_host.h |    1 +
 virt/kvm/assigned-dev.c  |    7 ++++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 274655b..f09db87 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -500,6 +500,7 @@ struct kvm_guest_msix_entry {
 	u16 flags;
 };
 
+#define KVM_ASSIGNED_ENABLED_IOMMU		(1 << 0)
 struct kvm_assigned_dev_kernel {
 	struct kvm_irq_ack_notifier ack_notifier;
 	struct work_struct interrupt_work;
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 7c98928..5c6b96d 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -552,7 +552,8 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 	match->host_segnr = assigned_dev->segnr;
 	match->host_busnr = assigned_dev->busnr;
 	match->host_devfn = assigned_dev->devfn;
-	match->flags = assigned_dev->flags;
+	if (assigned_dev->flags & KVM_DEV_ASSIGN_ENABLE_IOMMU)
+		match->flags |= KVM_ASSIGNED_ENABLED_IOMMU;
 	match->dev = dev;
 	spin_lock_init(&match->assigned_dev_lock);
 	match->irq_source_id = -1;
@@ -563,7 +564,7 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 
 	list_add(&match->list, &kvm->arch.assigned_dev_head);
 
-	if (assigned_dev->flags & KVM_DEV_ASSIGN_ENABLE_IOMMU) {
+	if (assigned_dev->flags & KVM_ASSIGNED_ENABLED_IOMMU) {
 		if (!kvm->arch.iommu_domain) {
 			r = kvm_iommu_map_guest(kvm);
 			if (r)
@@ -609,7 +610,7 @@ static int kvm_vm_ioctl_deassign_device(struct kvm *kvm,
 		goto out;
 	}
 
-	if (match->flags & KVM_DEV_ASSIGN_ENABLE_IOMMU)
+	if (match->flags & KVM_ASSIGNED_ENABLED_IOMMU)
 		kvm_deassign_device(kvm, match);
 
 	kvm_free_assigned_device(kvm, match);
-- 
1.7.0.1



* [PATCH 6/6] KVM: assigned dev: MSI-X mask support
  2010-11-15  9:15 [PATCH 0/6 v5] MSI-X mask support for assigned device Sheng Yang
                   ` (4 preceding siblings ...)
  2010-11-15  9:15 ` [PATCH 5/6] KVM: assigned dev: Clean up assigned_device's flag Sheng Yang
@ 2010-11-15  9:15 ` Sheng Yang
  2010-11-15  9:27   ` [PATCH 6/6 v5 updated] " Sheng Yang
                     ` (2 more replies)
  5 siblings, 3 replies; 26+ messages in thread
From: Sheng Yang @ 2010-11-15  9:15 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, Michael S. Tsirkin, kvm, Sheng Yang

This patch enables per-vector masking for assigned devices using MSI-X.

It provides two new APIs: one for userspace to specify the device's MSI-X
table address in guest MMIO space, and the other for userspace to query the
state of the mask bits.

All mask bit operations are kept in the kernel in order to accelerate them.
Userspace shouldn't access the device MMIO directly for this information;
instead it should use the provided API.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 arch/x86/kvm/x86.c       |    1 +
 include/linux/kvm.h      |   32 +++++
 include/linux/kvm_host.h |    5 +
 virt/kvm/assigned-dev.c  |  318 +++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 355 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fc29223..37602e2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_X86_ROBUST_SINGLESTEP:
 	case KVM_CAP_XSAVE:
 	case KVM_CAP_ASYNC_PF:
+	case KVM_CAP_MSIX_MASK:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ea2dc1a..b3e5ffe 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -541,6 +541,9 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_GET_PVINFO 57
 #define KVM_CAP_PPC_IRQ_LEVEL 58
 #define KVM_CAP_ASYNC_PF 59
+#ifdef __KVM_HAVE_MSIX
+#define KVM_CAP_MSIX_MASK 60
+#endif
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -672,6 +675,9 @@ struct kvm_clock_data {
 #define KVM_XEN_HVM_CONFIG        _IOW(KVMIO,  0x7a, struct kvm_xen_hvm_config)
 #define KVM_SET_CLOCK             _IOW(KVMIO,  0x7b, struct kvm_clock_data)
 #define KVM_GET_CLOCK             _IOR(KVMIO,  0x7c, struct kvm_clock_data)
+/* Available with KVM_CAP_MSIX_MASK */
+#define KVM_GET_MSIX_ENTRY        _IOWR(KVMIO,  0x7d, struct kvm_msix_entry)
+#define KVM_UPDATE_MSIX_MMIO      _IOW(KVMIO,  0x7e, struct kvm_msix_mmio)
 /* Available with KVM_CAP_PIT_STATE2 */
 #define KVM_GET_PIT2              _IOR(KVMIO,  0x9f, struct kvm_pit_state2)
 #define KVM_SET_PIT2              _IOW(KVMIO,  0xa0, struct kvm_pit_state2)
@@ -795,4 +801,30 @@ struct kvm_assigned_msix_entry {
 	__u16 padding[3];
 };
 
+#define KVM_MSIX_TYPE_ASSIGNED_DEV	1
+
+#define KVM_MSIX_FLAG_MASKBIT		(1 << 0)
+#define KVM_MSIX_FLAG_QUERY_MASKBIT	(1 << 0)
+
+struct kvm_msix_entry {
+	__u32 id;
+	__u32 type;
+	__u32 entry; /* The index of entry in the MSI-X table */
+	__u32 flags;
+	__u32 query_flags;
+	__u32 reserved[5];
+};
+
+#define KVM_MSIX_MMIO_FLAG_REGISTER	(1 << 0)
+#define KVM_MSIX_MMIO_FLAG_UNREGISTER	(1 << 1)
+
+struct kvm_msix_mmio {
+	__u32 id;
+	__u32 type;
+	__u64 base_addr;
+	__u32 max_entries_nr;
+	__u32 flags;
+	__u32 reserved[6];
+};
+
 #endif /* __LINUX_KVM_H */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f09db87..57a437a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -501,6 +501,7 @@ struct kvm_guest_msix_entry {
 };
 
 #define KVM_ASSIGNED_ENABLED_IOMMU		(1 << 0)
+#define KVM_ASSIGNED_ENABLED_MSIX_MMIO		(1 << 1)
 struct kvm_assigned_dev_kernel {
 	struct kvm_irq_ack_notifier ack_notifier;
 	struct work_struct interrupt_work;
@@ -521,6 +522,10 @@ struct kvm_assigned_dev_kernel {
 	struct pci_dev *dev;
 	struct kvm *kvm;
 	spinlock_t assigned_dev_lock;
+	DECLARE_BITMAP(msix_mask_bitmap, KVM_MAX_MSIX_PER_DEV);
+	gpa_t msix_mmio_base;
+	struct kvm_io_device msix_mmio_dev;
+	int msix_max_entries_nr;
 };
 
 struct kvm_irq_mask_notifier {
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 5c6b96d..76a1f12 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -226,12 +226,27 @@ static void kvm_free_assigned_irq(struct kvm *kvm,
 	kvm_deassign_irq(kvm, assigned_dev, assigned_dev->irq_requested_type);
 }
 
+static void unregister_msix_mmio(struct kvm *kvm,
+				 struct kvm_assigned_dev_kernel *adev)
+{
+	if (adev->flags & KVM_ASSIGNED_ENABLED_MSIX_MMIO) {
+		mutex_lock(&kvm->slots_lock);
+		kvm_io_bus_unregister_dev(kvm, KVM_MMIO_BUS,
+				&adev->msix_mmio_dev);
+		mutex_unlock(&kvm->slots_lock);
+		adev->flags &= ~KVM_ASSIGNED_ENABLED_MSIX_MMIO;
+	}
+}
+
 static void kvm_free_assigned_device(struct kvm *kvm,
 				     struct kvm_assigned_dev_kernel
 				     *assigned_dev)
 {
 	kvm_free_assigned_irq(kvm, assigned_dev);
 
+#ifdef __KVM_HAVE_MSIX
+	unregister_msix_mmio(kvm, assigned_dev);
+#endif
 	pci_reset_function(assigned_dev->dev);
 
 	pci_release_regions(assigned_dev->dev);
@@ -504,7 +519,7 @@ out:
 static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 				      struct kvm_assigned_pci_dev *assigned_dev)
 {
-	int r = 0, idx;
+	int r = 0, idx, i;
 	struct kvm_assigned_dev_kernel *match;
 	struct pci_dev *dev;
 
@@ -564,6 +579,10 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 
 	list_add(&match->list, &kvm->arch.assigned_dev_head);
 
+	/* The state after reset of MSI-X table is all masked */
+	for (i = 0; i < KVM_MAX_MSIX_PER_DEV; i++)
+		set_bit(i, match->msix_mask_bitmap);
+
 	if (assigned_dev->flags & KVM_ASSIGNED_ENABLED_IOMMU) {
 		if (!kvm->arch.iommu_domain) {
 			r = kvm_iommu_map_guest(kvm);
@@ -667,6 +686,43 @@ msix_nr_out:
 	return r;
 }
 
+static void update_msix_mask(struct kvm_assigned_dev_kernel *adev,
+			     int idx, bool new_mask_flag)
+{
+	int irq;
+	bool old_mask_flag, need_flush = false;
+
+	spin_lock_irq(&adev->assigned_dev_lock);
+
+	if (!adev->dev->msix_enabled ||
+	    !(adev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX))
+		goto out;
+
+	old_mask_flag = test_bit(adev->guest_msix_entries[idx].entry,
+			adev->msix_mask_bitmap);
+	if (old_mask_flag == new_mask_flag)
+		goto out;
+
+	irq = adev->host_msix_entries[idx].vector;
+	BUG_ON(irq == 0);
+
+	if (new_mask_flag) {
+		set_bit(adev->guest_msix_entries[idx].entry,
+				adev->msix_mask_bitmap);
+		disable_irq_nosync(irq);
+		need_flush = true;
+	} else {
+		clear_bit(adev->guest_msix_entries[idx].entry,
+				adev->msix_mask_bitmap);
+		enable_irq(irq);
+	}
+out:
+	spin_unlock_irq(&adev->assigned_dev_lock);
+
+	if (need_flush)
+		flush_work(&adev->interrupt_work);
+}
+
 static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
 				       struct kvm_assigned_msix_entry *entry)
 {
@@ -701,6 +757,235 @@ msix_entry_out:
 
 	return r;
 }
+
+static int kvm_vm_ioctl_get_msix_entry(struct kvm *kvm,
+				       struct kvm_msix_entry *entry)
+{
+	int r = 0;
+	struct kvm_assigned_dev_kernel *adev;
+
+	if (entry->type != KVM_MSIX_TYPE_ASSIGNED_DEV)
+		return -EINVAL;
+
+	if (!entry->query_flags)
+		return -EINVAL;
+
+	mutex_lock(&kvm->lock);
+
+	adev = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
+				      entry->id);
+
+	if (!adev) {
+		r = -EINVAL;
+		goto out;
+	}
+
+	if (entry->entry >= adev->msix_max_entries_nr) {
+		r = -ENOSPC;
+		goto out;
+	}
+
+	if (entry->query_flags & KVM_MSIX_FLAG_QUERY_MASKBIT) {
+		if (test_bit(entry->entry, adev->msix_mask_bitmap))
+			entry->flags |= KVM_MSIX_FLAG_MASKBIT;
+		else
+			entry->flags &= ~KVM_MSIX_FLAG_MASKBIT;
+	}
+
+out:
+	mutex_unlock(&kvm->lock);
+
+	return r;
+}
+
+static bool msix_mmio_in_range(struct kvm_assigned_dev_kernel *adev,
+			      gpa_t addr, int len)
+{
+	gpa_t start, end;
+
+	BUG_ON(!(adev->flags & KVM_ASSIGNED_ENABLED_MSIX_MMIO));
+	start = adev->msix_mmio_base;
+	end = adev->msix_mmio_base + PCI_MSIX_ENTRY_SIZE *
+		adev->msix_max_entries_nr;
+	if (addr >= start && addr + len <= end)
+		return true;
+
+	return false;
+}
+
+static int msix_get_enabled_idx(struct kvm_assigned_dev_kernel *adev,
+			      gpa_t addr, int len)
+{
+	int i, index = (addr - adev->msix_mmio_base) / PCI_MSIX_ENTRY_SIZE;
+
+	for (i = 0; i < adev->entries_nr; i++)
+		if (adev->guest_msix_entries[i].entry == index)
+			return i;
+
+	return -EINVAL;
+}
+
+static int msix_mmio_read(struct kvm_io_device *this, gpa_t addr, int len,
+			  void *val)
+{
+	struct kvm_assigned_dev_kernel *adev =
+			container_of(this, struct kvm_assigned_dev_kernel,
+				     msix_mmio_dev);
+	int idx, r = 0;
+	u32 entry[4];
+	struct kvm_kernel_irq_routing_entry e;
+
+	/* TODO: Get big-endian machine work */
+	mutex_lock(&adev->kvm->lock);
+	if (!msix_mmio_in_range(adev, addr, len)) {
+		r = -EOPNOTSUPP;
+		goto out;
+	}
+	if ((addr & 0x3) || len != 4)
+		goto out;
+
+	idx = msix_get_enabled_idx(adev, addr, len);
+	if (idx < 0) {
+		idx = (addr - adev->msix_mmio_base) / PCI_MSIX_ENTRY_SIZE;
+		if ((addr % PCI_MSIX_ENTRY_SIZE) ==
+				PCI_MSIX_ENTRY_VECTOR_CTRL)
+			*(unsigned long *)val =
+				test_bit(idx, adev->msix_mask_bitmap) ?
+				PCI_MSIX_ENTRY_CTRL_MASKBIT : 0;
+		else
+			r = -EOPNOTSUPP;
+		goto out;
+	}
+
+	r = kvm_get_irq_routing_entry(adev->kvm,
+			adev->guest_msix_entries[idx].vector, &e);
+	if (r || e.type != KVM_IRQ_ROUTING_MSI) {
+		r = -EOPNOTSUPP;
+		goto out;
+	}
+	entry[0] = e.msi.address_lo;
+	entry[1] = e.msi.address_hi;
+	entry[2] = e.msi.data;
+	entry[3] = test_bit(adev->guest_msix_entries[idx].entry,
+			adev->msix_mask_bitmap);
+	memcpy(val, &entry[addr % PCI_MSIX_ENTRY_SIZE / sizeof *entry], len);
+
+out:
+	mutex_unlock(&adev->kvm->lock);
+	return r;
+}
+
+static int msix_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
+			   const void *val)
+{
+	struct kvm_assigned_dev_kernel *adev =
+			container_of(this, struct kvm_assigned_dev_kernel,
+				     msix_mmio_dev);
+	int idx, r = 0;
+	unsigned long new_val = *(unsigned long *)val;
+
+	/* TODO: Get big-endian machine work */
+	mutex_lock(&adev->kvm->lock);
+	if (!msix_mmio_in_range(adev, addr, len)) {
+		r = -EOPNOTSUPP;
+		goto out;
+	}
+	if ((addr & 0x3) || len != 4)
+		goto out;
+
+	idx = msix_get_enabled_idx(adev, addr, len);
+	if (idx < 0) {
+		idx = (addr - adev->msix_mmio_base) / PCI_MSIX_ENTRY_SIZE;
+		if (((addr % PCI_MSIX_ENTRY_SIZE) ==
+				PCI_MSIX_ENTRY_VECTOR_CTRL)) {
+			if (new_val & ~PCI_MSIX_ENTRY_CTRL_MASKBIT)
+				goto out;
+			if (new_val & PCI_MSIX_ENTRY_CTRL_MASKBIT)
+				set_bit(idx, adev->msix_mask_bitmap);
+			else
+				clear_bit(idx, adev->msix_mask_bitmap);
+			/* It's possible that we need re-enable MSI-X, so go
+			 * back to userspace */
+		}
+		/* Userspace would handle other MMIO writing */
+		r = -EOPNOTSUPP;
+		goto out;
+	}
+	if (addr % PCI_MSIX_ENTRY_SIZE != PCI_MSIX_ENTRY_VECTOR_CTRL) {
+		r = -EOPNOTSUPP;
+		goto out;
+	}
+	if (new_val & ~PCI_MSIX_ENTRY_CTRL_MASKBIT)
+		goto out;
+	update_msix_mask(adev, idx, !!(new_val & PCI_MSIX_ENTRY_CTRL_MASKBIT));
+out:
+	mutex_unlock(&adev->kvm->lock);
+
+	return r;
+}
+
+static const struct kvm_io_device_ops msix_mmio_ops = {
+	.read     = msix_mmio_read,
+	.write    = msix_mmio_write,
+};
+
+static int kvm_vm_ioctl_update_msix_mmio(struct kvm *kvm,
+				struct kvm_msix_mmio *msix_mmio)
+{
+	int r = 0;
+	struct kvm_assigned_dev_kernel *adev;
+
+	if (msix_mmio->type != KVM_MSIX_TYPE_ASSIGNED_DEV)
+		return -EINVAL;
+
+	if (!msix_mmio->flags)
+		return -EINVAL;
+
+	mutex_lock(&kvm->lock);
+	adev = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
+				      msix_mmio->id);
+	if (!adev) {
+		r = -EINVAL;
+		goto out;
+	}
+	if (msix_mmio->base_addr == 0) {
+		r = -EINVAL;
+		goto out;
+	}
+	if (msix_mmio->max_entries_nr == 0 ||
+			msix_mmio->max_entries_nr > KVM_MAX_MSIX_PER_DEV) {
+		r = -EINVAL;
+		goto out;
+	}
+
+	if ((msix_mmio->flags & KVM_MSIX_MMIO_FLAG_REGISTER) &&
+			(msix_mmio->flags & KVM_MSIX_MMIO_FLAG_UNREGISTER)) {
+		r = -EINVAL;
+		goto out;
+	}
+
+	if (msix_mmio->flags & KVM_MSIX_MMIO_FLAG_REGISTER) {
+		if (!(adev->flags & KVM_ASSIGNED_ENABLED_MSIX_MMIO)) {
+			mutex_lock(&kvm->slots_lock);
+			kvm_iodevice_init(&adev->msix_mmio_dev,
+					&msix_mmio_ops);
+			r = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS,
+					&adev->msix_mmio_dev);
+			if (!r)
+				adev->flags |= KVM_ASSIGNED_ENABLED_MSIX_MMIO;
+			mutex_unlock(&kvm->slots_lock);
+		}
+		if (!r) {
+			adev->msix_mmio_base = msix_mmio->base_addr;
+			adev->msix_max_entries_nr = msix_mmio->max_entries_nr;
+		}
+	} else if (msix_mmio->flags & KVM_MSIX_MMIO_FLAG_UNREGISTER)
+		unregister_msix_mmio(kvm, adev);
+out:
+	mutex_unlock(&kvm->lock);
+
+	return r;
+}
 #endif
 
 long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl,
@@ -813,6 +1098,37 @@ long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl,
 			goto out;
 		break;
 	}
+	case KVM_GET_MSIX_ENTRY: {
+		struct kvm_msix_entry entry;
+		r = -EFAULT;
+		if (copy_from_user(&entry, argp, sizeof entry))
+			goto out;
+		r = kvm_vm_ioctl_get_msix_entry(kvm, &entry);
+		if (r)
+			goto out;
+		r = -EFAULT;
+		if (copy_to_user(argp, &entry, sizeof entry))
+			goto out;
+		r = 0;
+		break;
+	}
+	case KVM_UPDATE_MSIX_MMIO: {
+		struct kvm_msix_mmio msix_mmio;
+
+		r = -EFAULT;
+		if (copy_from_user(&msix_mmio, argp, sizeof(msix_mmio)))
+			goto out;
+
+		r = -EINVAL;
+		if (find_first_bit((unsigned long *)msix_mmio.reserved,
+		    sizeof(msix_mmio.reserved)) < sizeof(msix_mmio.reserved))
+			goto out;
+
+		r = kvm_vm_ioctl_update_msix_mmio(kvm, &msix_mmio);
+		if (r)
+			goto out;
+		break;
+	}
 #endif
 	}
 out:
-- 
1.7.0.1



* [PATCH 6/6 v5 updated] KVM: assigned dev: MSI-X mask support
  2010-11-15  9:15 ` [PATCH 6/6] KVM: assigned dev: MSI-X mask support Sheng Yang
@ 2010-11-15  9:27   ` Sheng Yang
  2010-11-16 19:45   ` [PATCH 6/6] " Marcelo Tosatti
  2010-11-17 13:58   ` Avi Kivity
  2 siblings, 0 replies; 26+ messages in thread
From: Sheng Yang @ 2010-11-15  9:27 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, Michael S. Tsirkin, kvm, Sheng Yang

This patch enables per-vector masking for assigned devices using MSI-X.

It provides two new APIs: one for userspace to specify the device's MSI-X
table address in guest MMIO space, and the other for userspace to query the
state of the mask bits.

All mask bit operations are kept in the kernel in order to accelerate them.
Userspace shouldn't access the device MMIO directly for this information;
instead it should use the provided API.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 arch/x86/kvm/x86.c       |    1 +
 include/linux/kvm.h      |   32 +++++
 include/linux/kvm_host.h |    5 +
 virt/kvm/assigned-dev.c  |  323 +++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 360 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fc29223..37602e2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_X86_ROBUST_SINGLESTEP:
 	case KVM_CAP_XSAVE:
 	case KVM_CAP_ASYNC_PF:
+	case KVM_CAP_MSIX_MASK:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ea2dc1a..b3e5ffe 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -541,6 +541,9 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_GET_PVINFO 57
 #define KVM_CAP_PPC_IRQ_LEVEL 58
 #define KVM_CAP_ASYNC_PF 59
+#ifdef __KVM_HAVE_MSIX
+#define KVM_CAP_MSIX_MASK 60
+#endif
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -672,6 +675,9 @@ struct kvm_clock_data {
 #define KVM_XEN_HVM_CONFIG        _IOW(KVMIO,  0x7a, struct kvm_xen_hvm_config)
 #define KVM_SET_CLOCK             _IOW(KVMIO,  0x7b, struct kvm_clock_data)
 #define KVM_GET_CLOCK             _IOR(KVMIO,  0x7c, struct kvm_clock_data)
+/* Available with KVM_CAP_MSIX_MASK */
+#define KVM_GET_MSIX_ENTRY        _IOWR(KVMIO,  0x7d, struct kvm_msix_entry)
+#define KVM_UPDATE_MSIX_MMIO      _IOW(KVMIO,  0x7e, struct kvm_msix_mmio)
 /* Available with KVM_CAP_PIT_STATE2 */
 #define KVM_GET_PIT2              _IOR(KVMIO,  0x9f, struct kvm_pit_state2)
 #define KVM_SET_PIT2              _IOW(KVMIO,  0xa0, struct kvm_pit_state2)
@@ -795,4 +801,30 @@ struct kvm_assigned_msix_entry {
 	__u16 padding[3];
 };
 
+#define KVM_MSIX_TYPE_ASSIGNED_DEV	1
+
+#define KVM_MSIX_FLAG_MASKBIT		(1 << 0)
+#define KVM_MSIX_FLAG_QUERY_MASKBIT	(1 << 0)
+
+struct kvm_msix_entry {
+	__u32 id;
+	__u32 type;
+	__u32 entry; /* The index of entry in the MSI-X table */
+	__u32 flags;
+	__u32 query_flags;
+	__u32 reserved[5];
+};
+
+#define KVM_MSIX_MMIO_FLAG_REGISTER	(1 << 0)
+#define KVM_MSIX_MMIO_FLAG_UNREGISTER	(1 << 1)
+
+struct kvm_msix_mmio {
+	__u32 id;
+	__u32 type;
+	__u64 base_addr;
+	__u32 max_entries_nr;
+	__u32 flags;
+	__u32 reserved[6];
+};
+
 #endif /* __LINUX_KVM_H */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f09db87..57a437a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -501,6 +501,7 @@ struct kvm_guest_msix_entry {
 };
 
 #define KVM_ASSIGNED_ENABLED_IOMMU		(1 << 0)
+#define KVM_ASSIGNED_ENABLED_MSIX_MMIO		(1 << 1)
 struct kvm_assigned_dev_kernel {
 	struct kvm_irq_ack_notifier ack_notifier;
 	struct work_struct interrupt_work;
@@ -521,6 +522,10 @@ struct kvm_assigned_dev_kernel {
 	struct pci_dev *dev;
 	struct kvm *kvm;
 	spinlock_t assigned_dev_lock;
+	DECLARE_BITMAP(msix_mask_bitmap, KVM_MAX_MSIX_PER_DEV);
+	gpa_t msix_mmio_base;
+	struct kvm_io_device msix_mmio_dev;
+	int msix_max_entries_nr;
 };
 
 struct kvm_irq_mask_notifier {
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 5c6b96d..a96a74d 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -226,12 +226,27 @@ static void kvm_free_assigned_irq(struct kvm *kvm,
 	kvm_deassign_irq(kvm, assigned_dev, assigned_dev->irq_requested_type);
 }
 
+static void unregister_msix_mmio(struct kvm *kvm,
+				 struct kvm_assigned_dev_kernel *adev)
+{
+	if (adev->flags & KVM_ASSIGNED_ENABLED_MSIX_MMIO) {
+		mutex_lock(&kvm->slots_lock);
+		kvm_io_bus_unregister_dev(kvm, KVM_MMIO_BUS,
+				&adev->msix_mmio_dev);
+		mutex_unlock(&kvm->slots_lock);
+		adev->flags &= ~KVM_ASSIGNED_ENABLED_MSIX_MMIO;
+	}
+}
+
 static void kvm_free_assigned_device(struct kvm *kvm,
 				     struct kvm_assigned_dev_kernel
 				     *assigned_dev)
 {
 	kvm_free_assigned_irq(kvm, assigned_dev);
 
+#ifdef __KVM_HAVE_MSIX
+	unregister_msix_mmio(kvm, assigned_dev);
+#endif
 	pci_reset_function(assigned_dev->dev);
 
 	pci_release_regions(assigned_dev->dev);
@@ -504,7 +519,7 @@ out:
 static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 				      struct kvm_assigned_pci_dev *assigned_dev)
 {
-	int r = 0, idx;
+	int r = 0, idx, i;
 	struct kvm_assigned_dev_kernel *match;
 	struct pci_dev *dev;
 
@@ -564,6 +579,10 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 
 	list_add(&match->list, &kvm->arch.assigned_dev_head);
 
+	/* After reset, all entries in the MSI-X table are masked */
+	for (i = 0; i < KVM_MAX_MSIX_PER_DEV; i++)
+		set_bit(i, match->msix_mask_bitmap);
+
 	if (assigned_dev->flags & KVM_ASSIGNED_ENABLED_IOMMU) {
 		if (!kvm->arch.iommu_domain) {
 			r = kvm_iommu_map_guest(kvm);
@@ -667,6 +686,43 @@ msix_nr_out:
 	return r;
 }
 
+static void update_msix_mask(struct kvm_assigned_dev_kernel *adev,
+			     int idx, bool new_mask_flag)
+{
+	int irq;
+	bool old_mask_flag, need_flush = false;
+
+	spin_lock_irq(&adev->assigned_dev_lock);
+
+	if (!adev->dev->msix_enabled ||
+	    !(adev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX))
+		goto out;
+
+	old_mask_flag = test_bit(adev->guest_msix_entries[idx].entry,
+			adev->msix_mask_bitmap);
+	if (old_mask_flag == new_mask_flag)
+		goto out;
+
+	irq = adev->host_msix_entries[idx].vector;
+	BUG_ON(irq == 0);
+
+	if (new_mask_flag) {
+		set_bit(adev->guest_msix_entries[idx].entry,
+				adev->msix_mask_bitmap);
+		disable_irq_nosync(irq);
+		need_flush = true;
+	} else {
+		clear_bit(adev->guest_msix_entries[idx].entry,
+				adev->msix_mask_bitmap);
+		enable_irq(irq);
+	}
+out:
+	spin_unlock_irq(&adev->assigned_dev_lock);
+
+	if (need_flush)
+		flush_work(&adev->interrupt_work);
+}
+
 static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
 				       struct kvm_assigned_msix_entry *entry)
 {
@@ -701,6 +757,240 @@ msix_entry_out:
 
 	return r;
 }
+
+static int kvm_vm_ioctl_get_msix_entry(struct kvm *kvm,
+				       struct kvm_msix_entry *entry)
+{
+	int r = 0;
+	struct kvm_assigned_dev_kernel *adev;
+
+	if (entry->type != KVM_MSIX_TYPE_ASSIGNED_DEV)
+		return -EINVAL;
+
+	if (!entry->query_flags)
+		return -EINVAL;
+
+	mutex_lock(&kvm->lock);
+
+	adev = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
+				      entry->id);
+
+	if (!adev) {
+		r = -EINVAL;
+		goto out;
+	}
+
+	if (entry->entry >= adev->msix_max_entries_nr) {
+		r = -ENOSPC;
+		goto out;
+	}
+
+	if (entry->query_flags & KVM_MSIX_FLAG_QUERY_MASKBIT) {
+		if (test_bit(entry->entry, adev->msix_mask_bitmap))
+			entry->flags |= KVM_MSIX_FLAG_MASKBIT;
+		else
+			entry->flags &= ~KVM_MSIX_FLAG_MASKBIT;
+	}
+
+out:
+	mutex_unlock(&kvm->lock);
+
+	return r;
+}
+
+static bool msix_mmio_in_range(struct kvm_assigned_dev_kernel *adev,
+			      gpa_t addr, int len)
+{
+	gpa_t start, end;
+
+	BUG_ON(!(adev->flags & KVM_ASSIGNED_ENABLED_MSIX_MMIO));
+	start = adev->msix_mmio_base;
+	end = adev->msix_mmio_base + PCI_MSIX_ENTRY_SIZE *
+		adev->msix_max_entries_nr;
+	if (addr >= start && addr + len <= end)
+		return true;
+
+	return false;
+}
+
+static int msix_get_enabled_idx(struct kvm_assigned_dev_kernel *adev,
+			      gpa_t addr, int len)
+{
+	int i, index = (addr - adev->msix_mmio_base) / PCI_MSIX_ENTRY_SIZE;
+
+	for (i = 0; i < adev->entries_nr; i++)
+		if (adev->guest_msix_entries[i].entry == index)
+			return i;
+
+	return -EINVAL;
+}
+
+static int msix_mmio_read(struct kvm_io_device *this, gpa_t addr, int len,
+			  void *val)
+{
+	struct kvm_assigned_dev_kernel *adev =
+			container_of(this, struct kvm_assigned_dev_kernel,
+				     msix_mmio_dev);
+	int idx, r = 0;
+	u32 entry[4];
+	struct kvm_kernel_irq_routing_entry e;
+
+	/* TODO: Make this work on big-endian machines */
+	mutex_lock(&adev->kvm->lock);
+	if (!msix_mmio_in_range(adev, addr, len)) {
+		r = -EOPNOTSUPP;
+		goto out;
+	}
+	if ((addr & 0x3) || len != 4) {
+		r = -EOPNOTSUPP;
+		goto out;
+	}
+
+	idx = msix_get_enabled_idx(adev, addr, len);
+	if (idx < 0) {
+		idx = (addr - adev->msix_mmio_base) / PCI_MSIX_ENTRY_SIZE;
+		if ((addr % PCI_MSIX_ENTRY_SIZE) ==
+				PCI_MSIX_ENTRY_VECTOR_CTRL)
+			*(unsigned long *)val =
+				test_bit(idx, adev->msix_mask_bitmap) ?
+				PCI_MSIX_ENTRY_CTRL_MASKBIT : 0;
+		else
+			r = -EOPNOTSUPP;
+		goto out;
+	}
+
+	r = kvm_get_irq_routing_entry(adev->kvm,
+			adev->guest_msix_entries[idx].vector, &e);
+	if (r || e.type != KVM_IRQ_ROUTING_MSI) {
+		r = -EOPNOTSUPP;
+		goto out;
+	}
+	entry[0] = e.msi.address_lo;
+	entry[1] = e.msi.address_hi;
+	entry[2] = e.msi.data;
+	entry[3] = test_bit(adev->guest_msix_entries[idx].entry,
+			adev->msix_mask_bitmap);
+	memcpy(val, &entry[addr % PCI_MSIX_ENTRY_SIZE / sizeof *entry], len);
+
+out:
+	mutex_unlock(&adev->kvm->lock);
+	return r;
+}
+
+static int msix_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
+			   const void *val)
+{
+	struct kvm_assigned_dev_kernel *adev =
+			container_of(this, struct kvm_assigned_dev_kernel,
+				     msix_mmio_dev);
+	int idx, r = 0;
+	unsigned long new_val;
+
+	/* TODO: Make this work on big-endian machines */
+	mutex_lock(&adev->kvm->lock);
+	if (!msix_mmio_in_range(adev, addr, len)) {
+		r = -EOPNOTSUPP;
+		goto out;
+	}
+	if ((addr & 0x3) || len != 4) {
+		r = -EOPNOTSUPP;
+		goto out;
+	}
+
+	new_val = *(unsigned long *)val;
+	idx = msix_get_enabled_idx(adev, addr, len);
+	if (idx < 0) {
+		idx = (addr - adev->msix_mmio_base) / PCI_MSIX_ENTRY_SIZE;
+		if (((addr % PCI_MSIX_ENTRY_SIZE) ==
+				PCI_MSIX_ENTRY_VECTOR_CTRL)) {
+			if (new_val & ~PCI_MSIX_ENTRY_CTRL_MASKBIT)
+				goto out;
+			if (new_val & PCI_MSIX_ENTRY_CTRL_MASKBIT)
+				set_bit(idx, adev->msix_mask_bitmap);
+			else
+				clear_bit(idx, adev->msix_mask_bitmap);
+			/* It's possible that we need to re-enable MSI-X,
+			 * so go back to userspace */
+		}
+		/* Userspace will handle other MMIO writes */
+		r = -EOPNOTSUPP;
+		goto out;
+	}
+	if (addr % PCI_MSIX_ENTRY_SIZE != PCI_MSIX_ENTRY_VECTOR_CTRL) {
+		r = -EOPNOTSUPP;
+		goto out;
+	}
+	if (new_val & ~PCI_MSIX_ENTRY_CTRL_MASKBIT)
+		goto out;
+	update_msix_mask(adev, idx, !!(new_val & PCI_MSIX_ENTRY_CTRL_MASKBIT));
+out:
+	mutex_unlock(&adev->kvm->lock);
+
+	return r;
+}
+
+static const struct kvm_io_device_ops msix_mmio_ops = {
+	.read     = msix_mmio_read,
+	.write    = msix_mmio_write,
+};
+
+static int kvm_vm_ioctl_update_msix_mmio(struct kvm *kvm,
+				struct kvm_msix_mmio *msix_mmio)
+{
+	int r = 0;
+	struct kvm_assigned_dev_kernel *adev;
+
+	if (msix_mmio->type != KVM_MSIX_TYPE_ASSIGNED_DEV)
+		return -EINVAL;
+
+	if (!msix_mmio->flags)
+		return -EINVAL;
+
+	mutex_lock(&kvm->lock);
+	adev = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
+				      msix_mmio->id);
+	if (!adev) {
+		r = -EINVAL;
+		goto out;
+	}
+	if (msix_mmio->base_addr == 0) {
+		r = -EINVAL;
+		goto out;
+	}
+	if (msix_mmio->max_entries_nr == 0 ||
+			msix_mmio->max_entries_nr > KVM_MAX_MSIX_PER_DEV) {
+		r = -EINVAL;
+		goto out;
+	}
+
+	if ((msix_mmio->flags & KVM_MSIX_MMIO_FLAG_REGISTER) &&
+			(msix_mmio->flags & KVM_MSIX_MMIO_FLAG_UNREGISTER)) {
+		r = -EINVAL;
+		goto out;
+	}
+
+	if (msix_mmio->flags & KVM_MSIX_MMIO_FLAG_REGISTER) {
+		if (!(adev->flags & KVM_ASSIGNED_ENABLED_MSIX_MMIO)) {
+			mutex_lock(&kvm->slots_lock);
+			kvm_iodevice_init(&adev->msix_mmio_dev,
+					&msix_mmio_ops);
+			r = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS,
+					&adev->msix_mmio_dev);
+			if (!r)
+				adev->flags |= KVM_ASSIGNED_ENABLED_MSIX_MMIO;
+			mutex_unlock(&kvm->slots_lock);
+		}
+		if (!r) {
+			adev->msix_mmio_base = msix_mmio->base_addr;
+			adev->msix_max_entries_nr = msix_mmio->max_entries_nr;
+		}
+	} else if (msix_mmio->flags & KVM_MSIX_MMIO_FLAG_UNREGISTER)
+		unregister_msix_mmio(kvm, adev);
+out:
+	mutex_unlock(&kvm->lock);
+
+	return r;
+}
 #endif
 
 long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl,
@@ -813,6 +1103,37 @@ long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl,
 			goto out;
 		break;
 	}
+	case KVM_GET_MSIX_ENTRY: {
+		struct kvm_msix_entry entry;
+		r = -EFAULT;
+		if (copy_from_user(&entry, argp, sizeof entry))
+			goto out;
+		r = kvm_vm_ioctl_get_msix_entry(kvm, &entry);
+		if (r)
+			goto out;
+		r = -EFAULT;
+		if (copy_to_user(argp, &entry, sizeof entry))
+			goto out;
+		r = 0;
+		break;
+	}
+	case KVM_UPDATE_MSIX_MMIO: {
+		struct kvm_msix_mmio msix_mmio;
+
+		r = -EFAULT;
+		if (copy_from_user(&msix_mmio, argp, sizeof(msix_mmio)))
+			goto out;
+
+		r = -EINVAL;
+		if (find_first_bit((unsigned long *)msix_mmio.reserved,
+		    BITS_PER_BYTE * sizeof(msix_mmio.reserved)) <
+		    BITS_PER_BYTE * sizeof(msix_mmio.reserved))
+			goto out;
+
+		r = kvm_vm_ioctl_update_msix_mmio(kvm, &msix_mmio);
+		if (r)
+			goto out;
+		break;
+	}
 #endif
 	}
 out:
-- 
1.7.0.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support
  2010-11-15  9:15 ` [PATCH 6/6] KVM: assigned dev: MSI-X mask support Sheng Yang
  2010-11-15  9:27   ` [PATCH 6/6 v5 updated] " Sheng Yang
@ 2010-11-16 19:45   ` Marcelo Tosatti
  2010-11-17  1:29     ` Sheng Yang
  2010-11-17 13:58   ` Avi Kivity
  2 siblings, 1 reply; 26+ messages in thread
From: Marcelo Tosatti @ 2010-11-16 19:45 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Avi Kivity, Michael S. Tsirkin, kvm

On Mon, Nov 15, 2010 at 05:15:32PM +0800, Sheng Yang wrote:
> This patch enables per-vector masking for assigned devices using MSI-X.
> 
> It provides two new APIs: one to tell the kernel the guest MMIO address of the
> device's MSI-X table, the other for userspace to query the state of the mask
> bits.
> 
> All mask bit operations are kept in the kernel to keep them fast. Userspace
> shouldn't access the device MMIO directly for this information; instead it
> should use the provided APIs.
> 
> Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> ---
>  arch/x86/kvm/x86.c       |    1 +
>  include/linux/kvm.h      |   32 +++++
>  include/linux/kvm_host.h |    5 +
>  virt/kvm/assigned-dev.c  |  318 +++++++++++++++++++++++++++++++++++++++++++++-
>  4 files changed, 355 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fc29223..37602e2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>  	case KVM_CAP_X86_ROBUST_SINGLESTEP:
>  	case KVM_CAP_XSAVE:
>  	case KVM_CAP_ASYNC_PF:
> +	case KVM_CAP_MSIX_MASK:
>  		r = 1;
>  		break;
>  	case KVM_CAP_COALESCED_MMIO:
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index ea2dc1a..b3e5ffe 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -541,6 +541,9 @@ struct kvm_ppc_pvinfo {
>  #define KVM_CAP_PPC_GET_PVINFO 57
>  #define KVM_CAP_PPC_IRQ_LEVEL 58
>  #define KVM_CAP_ASYNC_PF 59
> +#ifdef __KVM_HAVE_MSIX
> +#define KVM_CAP_MSIX_MASK 60
> +#endif
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> @@ -672,6 +675,9 @@ struct kvm_clock_data {
>  #define KVM_XEN_HVM_CONFIG        _IOW(KVMIO,  0x7a, struct kvm_xen_hvm_config)
>  #define KVM_SET_CLOCK             _IOW(KVMIO,  0x7b, struct kvm_clock_data)
>  #define KVM_GET_CLOCK             _IOR(KVMIO,  0x7c, struct kvm_clock_data)
> +/* Available with KVM_CAP_MSIX_MASK */
> +#define KVM_GET_MSIX_ENTRY        _IOWR(KVMIO,  0x7d, struct kvm_msix_entry)
> +#define KVM_UPDATE_MSIX_MMIO      _IOW(KVMIO,  0x7e, struct kvm_msix_mmio)
>  /* Available with KVM_CAP_PIT_STATE2 */
>  #define KVM_GET_PIT2              _IOR(KVMIO,  0x9f, struct kvm_pit_state2)
>  #define KVM_SET_PIT2              _IOW(KVMIO,  0xa0, struct kvm_pit_state2)
> @@ -795,4 +801,30 @@ struct kvm_assigned_msix_entry {
>  	__u16 padding[3];
>  };
>  
> +#define KVM_MSIX_TYPE_ASSIGNED_DEV	1
> +
> +#define KVM_MSIX_FLAG_MASKBIT		(1 << 0)
> +#define KVM_MSIX_FLAG_QUERY_MASKBIT	(1 << 0)
> +
> +struct kvm_msix_entry {
> +	__u32 id;
> +	__u32 type;

Is type really necessary? Will it ever differ from
KVM_MSIX_TYPE_ASSIGNED_DEV?

> +	__u32 entry; /* The index of entry in the MSI-X table */
> +	__u32 flags;
> +	__u32 query_flags;
> +	__u32 reserved[5];
> +};
> [...]
> +static int msix_mmio_read(struct kvm_io_device *this, gpa_t addr, int len,
> +			  void *val)
> +{
> +	struct kvm_assigned_dev_kernel *adev =
> +			container_of(this, struct kvm_assigned_dev_kernel,
> +				     msix_mmio_dev);
> +	int idx, r = 0;
> +	u32 entry[4];
> +	struct kvm_kernel_irq_routing_entry e;
> +
> +	/* TODO: Get big-endian machine work */
> +	mutex_lock(&adev->kvm->lock);
> +	if (!msix_mmio_in_range(adev, addr, len)) {
> +		r = -EOPNOTSUPP;
> +		goto out;
> +	}

The unregister path does:

mutex_lock(kvm->lock)
kvm_io_bus_unregister_dev()
synchronize_srcu()

If an instance of msix_mmio_read/msix_mmio_write is waiting on
kvm->lock, synchronize_srcu will never complete.

You should use a separate lock for the in range check (and have it mind
that reads/writes can trigger after kvm_io_bus_register_dev, so all
state accessible in the r/w handlers should be complete by that time).

> +	if ((addr & 0x3) || len != 4)
> +		goto out;
> +
> +	idx = msix_get_enabled_idx(adev, addr, len);
> +	if (idx < 0) {
> +		idx = (addr - adev->msix_mmio_base) / PCI_MSIX_ENTRY_SIZE;
> +		if ((addr % PCI_MSIX_ENTRY_SIZE) ==
> +				PCI_MSIX_ENTRY_VECTOR_CTRL)
> +			*(unsigned long *)val =
> +				test_bit(idx, adev->msix_mask_bitmap) ?
> +				PCI_MSIX_ENTRY_CTRL_MASKBIT : 0;
> +		else
> +			r = -EOPNOTSUPP;
> +		goto out;
> +	}
> +
> +	r = kvm_get_irq_routing_entry(adev->kvm,
> +			adev->guest_msix_entries[idx].vector, &e);
> +	if (r || e.type != KVM_IRQ_ROUTING_MSI) {
> +		r = -EOPNOTSUPP;
> +		goto out;
> +	}
> +	entry[0] = e.msi.address_lo;
> +	entry[1] = e.msi.address_hi;
> +	entry[2] = e.msi.data;
> +	entry[3] = test_bit(adev->guest_msix_entries[idx].entry,
> +			adev->msix_mask_bitmap);
> +	memcpy(val, &entry[addr % PCI_MSIX_ENTRY_SIZE / sizeof *entry], len);

Division by zero?



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support
  2010-11-16 19:45   ` [PATCH 6/6] " Marcelo Tosatti
@ 2010-11-17  1:29     ` Sheng Yang
  2010-11-17 13:35       ` Marcelo Tosatti
  2010-11-18  9:43       ` Michael S. Tsirkin
  0 siblings, 2 replies; 26+ messages in thread
From: Sheng Yang @ 2010-11-17  1:29 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, Michael S. Tsirkin, kvm

On Wednesday 17 November 2010 03:45:22 Marcelo Tosatti wrote:
> On Mon, Nov 15, 2010 at 05:15:32PM +0800, Sheng Yang wrote:
> > This patch enables per-vector masking for assigned devices using MSI-X.
> > 
> > It provides two new APIs: one to tell the kernel the guest MMIO address
> > of the device's MSI-X table, the other for userspace to query the state
> > of the mask bits.
> > 
> > All mask bit operations are kept in the kernel to keep them fast.
> > Userspace shouldn't access the device MMIO directly for this
> > information; instead it should use the provided APIs.
> > 
> > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> > [...]
> > +#define KVM_MSIX_TYPE_ASSIGNED_DEV	1
> > +
> > +#define KVM_MSIX_FLAG_MASKBIT		(1 << 0)
> > +#define KVM_MSIX_FLAG_QUERY_MASKBIT	(1 << 0)
> > +
> > +struct kvm_msix_entry {
> > +	__u32 id;
> > +	__u32 type;
> 
> Is type really necessary? Will it ever differ from
> KVM_MSIX_TYPE_ASSIGNED_DEV?

This was Michael's suggestion. He wants it to be reusable by emulated/pv
devices, so I added the type field here.
> 
> > [...]
> > +static int msix_mmio_read(struct kvm_io_device *this, gpa_t addr, int len,
> > +			  void *val)
> > +{
> > +	struct kvm_assigned_dev_kernel *adev =
> > +			container_of(this, struct kvm_assigned_dev_kernel,
> > +				     msix_mmio_dev);
> > +	int idx, r = 0;
> > +	u32 entry[4];
> > +	struct kvm_kernel_irq_routing_entry e;
> > +
> > +	/* TODO: Get big-endian machine work */
> > +	mutex_lock(&adev->kvm->lock);
> > +	if (!msix_mmio_in_range(adev, addr, len)) {
> > +		r = -EOPNOTSUPP;
> > +		goto out;
> > +	}
> 
> The unregister path does:
> 
> mutex_lock(kvm->lock)
> kvm_io_bus_unregister_dev()
> synchronize_srcu()
> 
> If an instance of msix_mmio_read/msix_mmio_write is waiting on
> kvm->lock, synchronize_srcu will never complete.
> 
> You should use a separate lock for the in range check (and have it mind
> that reads/writes can trigger after kvm_io_bus_register_dev, so all
> state accessible in the r/w handlers should be complete by that time).

Good point! Will update it.
> 
> > [...]
> > +	entry[0] = e.msi.address_lo;
> > +	entry[1] = e.msi.address_hi;
> > +	entry[2] = e.msi.data;
> > +	entry[3] = test_bit(adev->guest_msix_entries[idx].entry,
> > +			adev->msix_mask_bitmap);
> > +	memcpy(val, &entry[addr % PCI_MSIX_ENTRY_SIZE / sizeof *entry], len);
> 
> Division by zero?

I don't quite understand. Do you mean sizeof *entry or PCI_MSIX_ENTRY_SIZE? Both
of them should be positive integers, I think... Maybe I should use sizeof(u32) here?

--
regards
Yang, Sheng

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support
  2010-11-17  1:29     ` Sheng Yang
@ 2010-11-17 13:35       ` Marcelo Tosatti
  2010-11-18  9:43       ` Michael S. Tsirkin
  1 sibling, 0 replies; 26+ messages in thread
From: Marcelo Tosatti @ 2010-11-17 13:35 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Avi Kivity, Michael S. Tsirkin, kvm

On Wed, Nov 17, 2010 at 09:29:22AM +0800, Sheng Yang wrote:
> > > +			adev->msix_mask_bitmap);
> > > +	memcpy(val, &entry[addr % PCI_MSIX_ENTRY_SIZE / sizeof *entry], len);
> > 
> > Division by zero?
> 
> Not quite understand. You mean sizeof *entry or PCI_MSIX_ENTRY_SIZE? Both of them 
> should be positive integer I think... Maybe I should use sizeof u32 here?

You're right, nevermind.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support
  2010-11-15  9:15 ` [PATCH 6/6] KVM: assigned dev: MSI-X mask support Sheng Yang
  2010-11-15  9:27   ` [PATCH 6/6 v5 updated] " Sheng Yang
  2010-11-16 19:45   ` [PATCH 6/6] " Marcelo Tosatti
@ 2010-11-17 13:58   ` Avi Kivity
  2010-11-18  1:58     ` Sheng Yang
  2 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2010-11-17 13:58 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Marcelo Tosatti, Michael S. Tsirkin, kvm, Alex Williamson

On 11/15/2010 11:15 AM, Sheng Yang wrote:
> This patch enable per-vector mask for assigned devices using MSI-X.
>
> This patch provided two new APIs: one is for guest to specific device's MSI-X
> table address in MMIO, the other is for userspace to get information about mask
> bit.
>
> All the mask bit operation are kept in kernel, in order to accelerate.
> Userspace shouldn't access the device MMIO directly for the information,
> instead it should uses provided API to do so.
>
> Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> ---
>   arch/x86/kvm/x86.c       |    1 +
>   include/linux/kvm.h      |   32 +++++
>   include/linux/kvm_host.h |    5 +
>   virt/kvm/assigned-dev.c  |  318 +++++++++++++++++++++++++++++++++++++++++++++-
>   4 files changed, 355 insertions(+), 1 deletions(-)
>

Documentation?

> +static int msix_mmio_read(struct kvm_io_device *this, gpa_t addr, int len,
> +			  void *val)
> +{
> +	struct kvm_assigned_dev_kernel *adev =
> +			container_of(this, struct kvm_assigned_dev_kernel,
> +				     msix_mmio_dev);
> +	int idx, r = 0;
> +	u32 entry[4];
> +	struct kvm_kernel_irq_routing_entry e;
> +
> +	/* TODO: Get big-endian machine work */
> +	mutex_lock(&adev->kvm->lock);
> +	if (!msix_mmio_in_range(adev, addr, len)) {
> +		r = -EOPNOTSUPP;
> +		goto out;
> +	}
> > +	if ((addr & 0x3) || len != 4)
> +		goto out;
> +
> +	idx = msix_get_enabled_idx(adev, addr, len);
> > +	if (idx < 0) {
> +		idx = (addr - adev->msix_mmio_base) / PCI_MSIX_ENTRY_SIZE;
> +		if ((addr % PCI_MSIX_ENTRY_SIZE) ==
> +				PCI_MSIX_ENTRY_VECTOR_CTRL)
> +			*(unsigned long *)val =
> +				test_bit(idx, adev->msix_mask_bitmap) ?
> +				PCI_MSIX_ENTRY_CTRL_MASKBIT : 0;
> +		else
> +			r = -EOPNOTSUPP;
> +		goto out;
> +	}
> +
> +	r = kvm_get_irq_routing_entry(adev->kvm,
> > +			adev->guest_msix_entries[idx].vector, &e);
> +	if (r || e.type != KVM_IRQ_ROUTING_MSI) {
> +		r = -EOPNOTSUPP;
> +		goto out;
> +	}
> +	entry[0] = e.msi.address_lo;
> +	entry[1] = e.msi.address_hi;
> +	entry[2] = e.msi.data;
> +	entry[3] = test_bit(adev->guest_msix_entries[idx].entry,
> +			adev->msix_mask_bitmap);
> > +	memcpy(val, &entry[addr % PCI_MSIX_ENTRY_SIZE / sizeof *entry], len);
> +
> +out:
> +	mutex_unlock(&adev->kvm->lock);
> +	return r;
> +}
> +
> +static int msix_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
> +			   const void *val)
> +{
> +	struct kvm_assigned_dev_kernel *adev =
> +			container_of(this, struct kvm_assigned_dev_kernel,
> +				     msix_mmio_dev);
> +	int idx, r = 0;
> +	unsigned long new_val = *(unsigned long *)val;

What if it's a 64-bit write on a 32-bit host?

Are we sure the trailing bytes of val are zero?

> +
> +	/* TODO: Get big-endian machine work */

BUILD_BUG_ON(something)

> +	mutex_lock(&adev->kvm->lock);
> +	if (!msix_mmio_in_range(adev, addr, len)) {
> +		r = -EOPNOTSUPP;
> +		goto out;
> +	}

Why is this needed?  Didn't the iodev check already do this?

> > +	if ((addr & 0x3) || len != 4)
> +		goto out;

What if len == 8?  I think mst said it was legal.

> +
> +	idx = msix_get_enabled_idx(adev, addr, len);
> > +	if (idx < 0) {
> +		idx = (addr - adev->msix_mmio_base) / PCI_MSIX_ENTRY_SIZE;
> +		if (((addr % PCI_MSIX_ENTRY_SIZE) ==
> +				PCI_MSIX_ENTRY_VECTOR_CTRL)) {
> > +			if (new_val & ~PCI_MSIX_ENTRY_CTRL_MASKBIT)
> > +				goto out;
> > +			if (new_val & PCI_MSIX_ENTRY_CTRL_MASKBIT)
> +				set_bit(idx, adev->msix_mask_bitmap);
> +			else
> +				clear_bit(idx, adev->msix_mask_bitmap);
> +			/* It's possible that we need re-enable MSI-X, so go
> +			 * back to userspace */
> +		}
> +		/* Userspace would handle other MMIO writing */
> +		r = -EOPNOTSUPP;

That's not very good.  We should do the entire thing in the kernel or in 
userspace.  We can have a new EXIT_REASON to let userspace know an msix 
entry changed, and it should read it from the kernel.

> +		goto out;
> +	}
> +	if (addr % PCI_MSIX_ENTRY_SIZE != PCI_MSIX_ENTRY_VECTOR_CTRL) {
> +		r = -EOPNOTSUPP;
> +		goto out;
> +	}
> > +	if (new_val & ~PCI_MSIX_ENTRY_CTRL_MASKBIT)
> > +		goto out;
> > +	update_msix_mask(adev, idx, !!(new_val & PCI_MSIX_ENTRY_CTRL_MASKBIT));
> +out:
> +	mutex_unlock(&adev->kvm->lock);
> +
> +	return r;
> +}
> +
> +static int kvm_vm_ioctl_update_msix_mmio(struct kvm *kvm,
> +				struct kvm_msix_mmio *msix_mmio)
> +{
> +	int r = 0;
> +	struct kvm_assigned_dev_kernel *adev;
> +
> +	if (msix_mmio->type != KVM_MSIX_TYPE_ASSIGNED_DEV)
> +		return -EINVAL;
> +
> +	if (!msix_mmio->flags)
> +		return -EINVAL;

Need to check flags for undefined bits too.

> +
> +	mutex_lock(&kvm->lock);
> +	adev = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
> +				      msix_mmio->id);
> +	if (!adev) {
> +		r = -EINVAL;
> +		goto out;
> +	}
> +	if (msix_mmio->base_addr == 0) {
> +		r = -EINVAL;
> +		goto out;
> +	}
> +	if (msix_mmio->max_entries_nr == 0 ||
> > +			msix_mmio->max_entries_nr > KVM_MAX_MSIX_PER_DEV) {
> +		r = -EINVAL;
> +		goto out;
> +	}
> +
> > +	if ((msix_mmio->flags & KVM_MSIX_MMIO_FLAG_REGISTER) &&
> > +			(msix_mmio->flags & KVM_MSIX_MMIO_FLAG_UNREGISTER)) {
> +		r = -EINVAL;
> +		goto out;
> +	}
> +
> > +	if (msix_mmio->flags & KVM_MSIX_MMIO_FLAG_REGISTER) {
> > +		if (!(adev->flags & KVM_ASSIGNED_ENABLED_MSIX_MMIO)) {
> +			mutex_lock(&kvm->slots_lock);
> +			kvm_iodevice_init(&adev->msix_mmio_dev,
> +					&msix_mmio_ops);
> +			r = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS,
> +					&adev->msix_mmio_dev);
> +			if (!r)
> +				adev->flags |= KVM_ASSIGNED_ENABLED_MSIX_MMIO;
> +			mutex_unlock(&kvm->slots_lock);
> +		}
> +		if (!r) {
> +			adev->msix_mmio_base = msix_mmio->base_addr;
> +			adev->msix_max_entries_nr = msix_mmio->max_entries_nr;
> +		}
> > +	} else if (msix_mmio->flags & KVM_MSIX_MMIO_FLAG_UNREGISTER)
> +		unregister_msix_mmio(kvm, adev);
> +out:
> +	mutex_unlock(&kvm->lock);
> +
> +	return r;
> +}
>   #endif

Question: how do we do this with vfio?  One idea is to have

    ioctl(vmfd, KVM_IO_PIPE, { KVM_IO_MMIO /* or KVM_IO_PIO */, range, 
pipefd1, pipefd2 })

and have kvm convert mmio or pio in that range to commands and responses 
sent over that pipe.  We could have qemu threads, other userspace 
processes, or kernel components like vfio implement the other side of 
the channel.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func
  2010-11-15  9:15 ` [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func Sheng Yang
@ 2010-11-17 14:01   ` Avi Kivity
  2010-11-18  2:22     ` Sheng Yang
  0 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2010-11-17 14:01 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Marcelo Tosatti, Michael S. Tsirkin, kvm

On 11/15/2010 11:15 AM, Sheng Yang wrote:
> We need to query the entry later.
>
> +int kvm_get_irq_routing_entry(struct kvm *kvm, int gsi,
> +		struct kvm_kernel_irq_routing_entry *entry)
> +{
> +	int count = 0;
> +	struct kvm_kernel_irq_routing_entry *ei = NULL;
> +	struct kvm_irq_routing_table *irq_rt;
> +	struct hlist_node *n;
> +
> +	rcu_read_lock();
> +	irq_rt = rcu_dereference(kvm->irq_routing);
> > +	if (gsi < irq_rt->nr_rt_entries)
> > +		hlist_for_each_entry(ei, n, &irq_rt->map[gsi], link)
> +			count++;
> +	if (count == 1)
> +		*entry = *ei;
> +	rcu_read_unlock();
> +
> +	return (count != 1);
> +}
> +

Not good form to rely on ei being valid after the loop.

I guess this is only useful for msi?  Need to document it.

*entry may be stale after rcu_read_unlock().  Is this a problem?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support
  2010-11-17 13:58   ` Avi Kivity
@ 2010-11-18  1:58     ` Sheng Yang
  2010-11-18  6:21       ` Michael S. Tsirkin
  2010-11-18  9:28       ` Avi Kivity
  0 siblings, 2 replies; 26+ messages in thread
From: Sheng Yang @ 2010-11-18  1:58 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, Michael S. Tsirkin, kvm, Alex Williamson

On Wednesday 17 November 2010 21:58:00 Avi Kivity wrote:
> On 11/15/2010 11:15 AM, Sheng Yang wrote:
> > This patch enable per-vector mask for assigned devices using MSI-X.
> > 
> > This patch provided two new APIs: one is for guest to specific device's
> > MSI-X table address in MMIO, the other is for userspace to get
> > information about mask bit.
> > 
> > All the mask bit operation are kept in kernel, in order to accelerate.
> > Userspace shouldn't access the device MMIO directly for the information,
> > instead it should uses provided API to do so.
> > 
> > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> > ---
> > 
> >   arch/x86/kvm/x86.c       |    1 +
> >   include/linux/kvm.h      |   32 +++++
> >   include/linux/kvm_host.h |    5 +
> >   virt/kvm/assigned-dev.c  |  318 +++++++++++++++++++++++++++++++++++++++++++++-
> >   4 files changed, 355 insertions(+), 1 deletions(-)
> 
> Documentation?

Since we have kept changing the API over the last several versions, I'd like to settle 
the API first. I'll bring back the documentation once the API is agreed on.
> 
> > +static int msix_mmio_read(struct kvm_io_device *this, gpa_t addr, int len,
> > +			  void *val)
> > +{
> > +	struct kvm_assigned_dev_kernel *adev =
> > +			container_of(this, struct kvm_assigned_dev_kernel,
> > +				     msix_mmio_dev);
> > +	int idx, r = 0;
> > +	u32 entry[4];
> > +	struct kvm_kernel_irq_routing_entry e;
> > +
> > +	/* TODO: Get big-endian machine work */
> > +	mutex_lock(&adev->kvm->lock);
> > +	if (!msix_mmio_in_range(adev, addr, len)) {
> > +		r = -EOPNOTSUPP;
> > +		goto out;
> > +	}
> > +	if ((addr & 0x3) || len != 4)
> > +		goto out;
> > +
> > +	idx = msix_get_enabled_idx(adev, addr, len);
> > +	if (idx < 0) {
> > +		idx = (addr - adev->msix_mmio_base) / PCI_MSIX_ENTRY_SIZE;
> > +		if ((addr % PCI_MSIX_ENTRY_SIZE) ==
> > +				PCI_MSIX_ENTRY_VECTOR_CTRL)
> > +			*(unsigned long *)val =
> > +				test_bit(idx, adev->msix_mask_bitmap) ?
> > +				PCI_MSIX_ENTRY_CTRL_MASKBIT : 0;
> > +		else
> > +			r = -EOPNOTSUPP;
> > +		goto out;
> > +	}
> > +
> > +	r = kvm_get_irq_routing_entry(adev->kvm,
> > +			adev->guest_msix_entries[idx].vector, &e);
> > +	if (r || e.type != KVM_IRQ_ROUTING_MSI) {
> > +		r = -EOPNOTSUPP;
> > +		goto out;
> > +	}
> > +	entry[0] = e.msi.address_lo;
> > +	entry[1] = e.msi.address_hi;
> > +	entry[2] = e.msi.data;
> > +	entry[3] = test_bit(adev->guest_msix_entries[idx].entry,
> > +			adev->msix_mask_bitmap);
> > +	memcpy(val, &entry[addr % PCI_MSIX_ENTRY_SIZE / sizeof *entry], len);
> > +
> > +out:
> > +	mutex_unlock(&adev->kvm->lock);
> > +	return r;
> > +}
> > +
> > +static int msix_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
> > +			   const void *val)
> > +{
> > +	struct kvm_assigned_dev_kernel *adev =
> > +			container_of(this, struct kvm_assigned_dev_kernel,
> > +				     msix_mmio_dev);
> > +	int idx, r = 0;
> > +	unsigned long new_val = *(unsigned long *)val;
> 
> What if it's a 64-bit write on a 32-bit host?

In fact we don't support QWORD (64-bit) accesses yet. The reason is that we haven't 
seen any OS use it this way, so I think we can leave it for later.

Also, it seems QEMU has no way to handle 64-bit MMIO.
> 
> Are we sure the trailing bytes of val are zero?
> 
> > +
> > +	/* TODO: Get big-endian machine work */
> 
> BUILD_BUG_ON(something)

Good idea!
> 
> > +	mutex_lock(&adev->kvm->lock);
> > +	if (!msix_mmio_in_range(adev, addr, len)) {
> > +		r = -EOPNOTSUPP;
> > +		goto out;
> > +	}
> 
> Why is this needed?  Didn't the iodev check already do this?

Well, kvm_io_device_ops hasn't got an "in_range" callback yet...
> 
> > +	if ((addr & 0x3) || len != 4)
> > +		goto out;
> 
> What if len == 8?  I think mst said it was legal.

Since we haven't seen anyone use it this way, I think we can leave it for later.
> 
> > +
> > +	idx = msix_get_enabled_idx(adev, addr, len);
> > +	if (idx < 0) {
> > +		idx = (addr - adev->msix_mmio_base) / PCI_MSIX_ENTRY_SIZE;
> > +		if (((addr % PCI_MSIX_ENTRY_SIZE) ==
> > +				PCI_MSIX_ENTRY_VECTOR_CTRL)) {
> > +			if (new_val & ~PCI_MSIX_ENTRY_CTRL_MASKBIT)
> > +				goto out;
> > +			if (new_val & PCI_MSIX_ENTRY_CTRL_MASKBIT)
> > +				set_bit(idx, adev->msix_mask_bitmap);
> > +			else
> > +				clear_bit(idx, adev->msix_mask_bitmap);
> > +			/* It's possible that we need re-enable MSI-X, so go
> > +			 * back to userspace */
> > +		}
> > +		/* Userspace would handle other MMIO writing */
> > +		r = -EOPNOTSUPP;
> 
> That's not very good.  We should do the entire thing in the kernel or in
> userspace.  We can have a new EXIT_REASON to let userspace know an msix
> entry changed, and it should read it from the kernel.

If you look at it this way:
1. The mask bit is owned by the kernel.
2. The routing is owned by userspace.
3. Reading the routing in the kernel is a speed-up for normal operation, because the 
kernel can read it directly.

So I think the logic here is easy to understand.

But if we modified the routing in the kernel, it would raise some sync issues, because 
both the kernel and userspace would own the routing. So maybe the better solution is to 
move the routing into the kernel.

We can do something like that later, because this patch won't need much change.
> 
> > +		goto out;
> > +	}
> > +	if (addr % PCI_MSIX_ENTRY_SIZE != PCI_MSIX_ENTRY_VECTOR_CTRL) {
> > +		r = -EOPNOTSUPP;
> > +		goto out;
> > +	}
> > +	if (new_val & ~PCI_MSIX_ENTRY_CTRL_MASKBIT)
> > +		goto out;
> > +	update_msix_mask(adev, idx, !!(new_val & PCI_MSIX_ENTRY_CTRL_MASKBIT));
> > +out:
> > +	mutex_unlock(&adev->kvm->lock);
> > +
> > +	return r;
> > +}
> > +
> > +static int kvm_vm_ioctl_update_msix_mmio(struct kvm *kvm,
> > +				struct kvm_msix_mmio *msix_mmio)
> > +{
> > +	int r = 0;
> > +	struct kvm_assigned_dev_kernel *adev;
> > +
> > +	if (msix_mmio->type != KVM_MSIX_TYPE_ASSIGNED_DEV)
> > +		return -EINVAL;
> > +
> > +	if (!msix_mmio->flags)
> > +		return -EINVAL;
> 
> Need to check flags for undefined bits too.

There is a check later; I can move it earlier.

--
regards
Yang, Sheng

> 
> > +
> > +	mutex_lock(&kvm->lock);
> > +	adev = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
> > +				      msix_mmio->id);
> > +	if (!adev) {
> > +		r = -EINVAL;
> > +		goto out;
> > +	}
> > +	if (msix_mmio->base_addr == 0) {
> > +		r = -EINVAL;
> > +		goto out;
> > +	}
> > +	if (msix_mmio->max_entries_nr == 0 ||
> > +			msix_mmio->max_entries_nr > KVM_MAX_MSIX_PER_DEV) {
> > +		r = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	if ((msix_mmio->flags & KVM_MSIX_MMIO_FLAG_REGISTER) &&
> > +			(msix_mmio->flags & KVM_MSIX_MMIO_FLAG_UNREGISTER)) {
> > +		r = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	if (msix_mmio->flags & KVM_MSIX_MMIO_FLAG_REGISTER) {
> > +		if (!(adev->flags & KVM_ASSIGNED_ENABLED_MSIX_MMIO)) {
> > +			mutex_lock(&kvm->slots_lock);
> > +			kvm_iodevice_init(&adev->msix_mmio_dev,
> > +					&msix_mmio_ops);
> > +			r = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS,
> > +					&adev->msix_mmio_dev);
> > +			if (!r)
> > +				adev->flags |= KVM_ASSIGNED_ENABLED_MSIX_MMIO;
> > +			mutex_unlock(&kvm->slots_lock);
> > +		}
> > +		if (!r) {
> > +			adev->msix_mmio_base = msix_mmio->base_addr;
> > +			adev->msix_max_entries_nr = msix_mmio->max_entries_nr;
> > +		}
> > +	} else if (msix_mmio->flags & KVM_MSIX_MMIO_FLAG_UNREGISTER)
> > +		unregister_msix_mmio(kvm, adev);
> > +out:
> > +	mutex_unlock(&kvm->lock);
> > +
> > +	return r;
> > +}
> > 
> >   #endif
> 
> Question: how do we do this with vfio?  One idea is to have
> 
>     ioctl(vmfd, KVM_IO_PIPE, { KVM_IO_MMIO /* or KVM_IO_PIO */, range,
> pipefd1, pipefd2 })
> 
> and have kvm convert mmio or pio in that range to commands and responses
> sent over that pipe.  We could have qemu threads, other userspace
> processes, or kernel components like vfio implement the other side of
> the channel.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func
  2010-11-17 14:01   ` Avi Kivity
@ 2010-11-18  2:22     ` Sheng Yang
  2010-11-18  9:30       ` Avi Kivity
  0 siblings, 1 reply; 26+ messages in thread
From: Sheng Yang @ 2010-11-18  2:22 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, Michael S. Tsirkin, kvm

On Wednesday 17 November 2010 22:01:41 Avi Kivity wrote:
> On 11/15/2010 11:15 AM, Sheng Yang wrote:
> > We need to query the entry later.
> > 
> > +int kvm_get_irq_routing_entry(struct kvm *kvm, int gsi,
> > +		struct kvm_kernel_irq_routing_entry *entry)
> > +{
> > +	int count = 0;
> > +	struct kvm_kernel_irq_routing_entry *ei = NULL;
> > +	struct kvm_irq_routing_table *irq_rt;
> > +	struct hlist_node *n;
> > +
> > +	rcu_read_lock();
> > +	irq_rt = rcu_dereference(kvm->irq_routing);
> > +	if (gsi < irq_rt->nr_rt_entries)
> > +		hlist_for_each_entry(ei, n, &irq_rt->map[gsi], link)
> > +			count++;
> > +	if (count == 1)
> > +		*entry = *ei;
> > +	rcu_read_unlock();
> > +
> > +	return (count != 1);
> > +}
> > +
> 
> Not good form to rely on ei being valid after the loop.
> 
> I guess this is only useful for msi?  Need to document it.

It may be usable for others later; it's somewhat generic. Where should I document 
it?
> 
> *entry may be stale after rcu_read_unlock().  Is this a problem?

I suppose not. All MSI-X MMIO accesses would be executed without delay, so no 
reordering issue would happen. If the guest is reading and writing the field at the 
same time (from two CPUs), it should have some kind of sync method of its own - or it 
may not care about the read result (like the one after msix_mask_irq()). 

--
regards
Yang, Sheng

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support
  2010-11-18  1:58     ` Sheng Yang
@ 2010-11-18  6:21       ` Michael S. Tsirkin
  2010-11-18  6:39         ` Sheng Yang
  2010-11-18  9:28       ` Avi Kivity
  1 sibling, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2010-11-18  6:21 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

On Thu, Nov 18, 2010 at 09:58:55AM +0800, Sheng Yang wrote:
> > > +static int msix_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
> > > +			   const void *val)
> > > +{
> > > +	struct kvm_assigned_dev_kernel *adev =
> > > +			container_of(this, struct kvm_assigned_dev_kernel,
> > > +				     msix_mmio_dev);
> > > +	int idx, r = 0;
> > > +	unsigned long new_val = *(unsigned long *)val;
> > 
> > What if it's a 64-bit write on a 32-bit host?
> 
> In fact we haven't support QWORD(64bit) accessing now. The reason is we haven't 
> seen any OS is using it in this way now, so I think we can leave it later.
> 
> Also seems QEmu doesn't got the way to handle 64bit MMIO.

I think it does.  I think it simply splits these into 32-bit transactions
and handles them as such. That seems to be spec-compliant.  I wouldn't want us
to regress.

-- 
MST

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support
  2010-11-18  6:21       ` Michael S. Tsirkin
@ 2010-11-18  6:39         ` Sheng Yang
  0 siblings, 0 replies; 26+ messages in thread
From: Sheng Yang @ 2010-11-18  6:39 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm, Alex Williamson

On Thursday 18 November 2010 14:21:40 Michael S. Tsirkin wrote:
> On Thu, Nov 18, 2010 at 09:58:55AM +0800, Sheng Yang wrote:
> > > > +static int msix_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
> > > > +			   const void *val)
> > > > +{
> > > > +	struct kvm_assigned_dev_kernel *adev =
> > > > +			container_of(this, struct kvm_assigned_dev_kernel,
> > > > +				     msix_mmio_dev);
> > > > +	int idx, r = 0;
> > > > +	unsigned long new_val = *(unsigned long *)val;
> > > 
> > > What if it's a 64-bit write on a 32-bit host?
> > 
> > In fact we haven't support QWORD(64bit) accessing now. The reason is we
> > haven't seen any OS is using it in this way now, so I think we can leave
> > it later.
> > 
> > Also seems QEmu doesn't got the way to handle 64bit MMIO.
> 
> I think it does.  I think it simply splits these to 32-bit transactions
> and handles them as such. That seems to be spec-compliant.  I wouldn't want us
> to regress.

Yes, you're right...

I think I have to add it. :shrug:

--
regards
Yang, Sheng

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support
  2010-11-18  1:58     ` Sheng Yang
  2010-11-18  6:21       ` Michael S. Tsirkin
@ 2010-11-18  9:28       ` Avi Kivity
  2010-11-18  9:37         ` Michael S. Tsirkin
  2010-11-18 12:08         ` Sheng Yang
  1 sibling, 2 replies; 26+ messages in thread
From: Avi Kivity @ 2010-11-18  9:28 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Marcelo Tosatti, Michael S. Tsirkin, kvm, Alex Williamson

On 11/18/2010 03:58 AM, Sheng Yang wrote:
> On Wednesday 17 November 2010 21:58:00 Avi Kivity wrote:
> >  On 11/15/2010 11:15 AM, Sheng Yang wrote:
> >  >  This patch enable per-vector mask for assigned devices using MSI-X.
> >  >
> >  >  This patch provided two new APIs: one is for guest to specific device's
> >  >  MSI-X table address in MMIO, the other is for userspace to get
> >  >  information about mask bit.
> >  >
> >  >  All the mask bit operation are kept in kernel, in order to accelerate.
> >  >  Userspace shouldn't access the device MMIO directly for the information,
> >  >  instead it should uses provided API to do so.
> >  >
> >  >  Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> >  >  ---
> >  >
> >  >    arch/x86/kvm/x86.c       |    1 +
> >  >    include/linux/kvm.h      |   32 +++++
> >  >    include/linux/kvm_host.h |    5 +
> >  >    virt/kvm/assigned-dev.c  |  318 +++++++++++++++++++++++++++++++++++++++++++++-
> >  >    4 files changed, 355 insertions(+), 1 deletions(-)
> >
> >  Documentation?
>
> For we are keeping changing the API for last several versions, I'd like to settle
> down the API first. Would bring back the document after API was agreed.

Maybe for APIs we should start with only the documentation patch, agree 
on that, and move on to the implementation.

> >  What if it's a 64-bit write on a 32-bit host?
>
> In fact we haven't support QWORD(64bit) accessing now. The reason is we haven't
> seen any OS is using it in this way now, so I think we can leave it later.
>
> Also seems QEmu doesn't got the way to handle 64bit MMIO.

There's a difference, if the API doesn't support it, we can't add it 
later without changing both kernel and userspace.

> >
> >  That's not very good.  We should do the entire thing in the kernel or in
> >  userspace.  We can have a new EXIT_REASON to let userspace know an msix
> >  entry changed, and it should read it from the kernel.
>
> If you look it in this way:
> 1. Mask bit owned by kernel.
> 2. Routing owned by userspace.
> 3. Read the routing in kernel is an speed up for normal operation - because kernel
> can read from them.
>
> So I think the logic here is clear to understand.

Still, it's complicated and the state is split across multiple components.

> But if we can modify the routing in kernel, it would be raise some sync issues due
> to both kernel and userspace own routing. So maybe the better solution is move the
> routing to kernel.

That may work, but I don't think we can do this for vfio.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func
  2010-11-18  2:22     ` Sheng Yang
@ 2010-11-18  9:30       ` Avi Kivity
  2010-11-18  9:41         ` Michael S. Tsirkin
  0 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2010-11-18  9:30 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Marcelo Tosatti, Michael S. Tsirkin, kvm

On 11/18/2010 04:22 AM, Sheng Yang wrote:
> On Wednesday 17 November 2010 22:01:41 Avi Kivity wrote:
> >  On 11/15/2010 11:15 AM, Sheng Yang wrote:
> >  >  We need to query the entry later.
> >  >
> >  >  +int kvm_get_irq_routing_entry(struct kvm *kvm, int gsi,
> >  >  +		struct kvm_kernel_irq_routing_entry *entry)
> >  >  +{
> >  >  +	int count = 0;
> >  >  +	struct kvm_kernel_irq_routing_entry *ei = NULL;
> >  >  +	struct kvm_irq_routing_table *irq_rt;
> >  >  +	struct hlist_node *n;
> >  >  +
> >  >  +	rcu_read_lock();
> >  >  +	irq_rt = rcu_dereference(kvm->irq_routing);
> >  >  +	if (gsi < irq_rt->nr_rt_entries)
> >  >  +		hlist_for_each_entry(ei, n, &irq_rt->map[gsi], link)
> >  >  +			count++;
> >  >  +	if (count == 1)
> >  >  +		*entry = *ei;
> >  >  +	rcu_read_unlock();
> >  >  +
> >  >  +	return (count != 1);
> >  >  +}
> >  >  +
> >
> >  Not good form to rely on ei being valid after the loop.
> >
> >  I guess this is only useful for msi?  Need to document it.
>
> May can be used for others later, it's somehow generic. Where should I document
> it?

Non-msi interrupts (wires) can be wired to more than one interrupt line 
(and often are - pic/ioapic).

You can document it by adding _msi to the name.

> >
> >  *entry may be stale after rcu_read_unlock().  Is this a problem?
>
> I suppose not. All MSI-X MMIO accessing would be executed without delay, so no re-
> order issue would happen. If the guest is reading and writing the field at the same
> time(from two cpus), it should got some kinds of sync method for itself - or it
> may not care what's the reading result(like the one after msix_mask_irq()).

I guess so.  Michael/Alex?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support
  2010-11-18  9:28       ` Avi Kivity
@ 2010-11-18  9:37         ` Michael S. Tsirkin
  2010-11-18 12:08         ` Sheng Yang
  1 sibling, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2010-11-18  9:37 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Sheng Yang, Marcelo Tosatti, kvm, Alex Williamson

On Thu, Nov 18, 2010 at 11:28:02AM +0200, Avi Kivity wrote:
> On 11/18/2010 03:58 AM, Sheng Yang wrote:
> >On Wednesday 17 November 2010 21:58:00 Avi Kivity wrote:
> >>  On 11/15/2010 11:15 AM, Sheng Yang wrote:
> >>  >  This patch enable per-vector mask for assigned devices using MSI-X.
> >>  >
> >>  >  This patch provided two new APIs: one is for guest to specific device's
> >>  >  MSI-X table address in MMIO, the other is for userspace to get
> >>  >  information about mask bit.
> >>  >
> >>  >  All the mask bit operation are kept in kernel, in order to accelerate.
> >>  >  Userspace shouldn't access the device MMIO directly for the information,
> >>  >  instead it should uses provided API to do so.
> >>  >
> >>  >  Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> >>  >  ---
> >>  >
> >>  >    arch/x86/kvm/x86.c       |    1 +
> >>  >    include/linux/kvm.h      |   32 +++++
> >>  >    include/linux/kvm_host.h |    5 +
> >>  >    virt/kvm/assigned-dev.c  |  318 +++++++++++++++++++++++++++++++++++++++++++++-
> >>  >    4 files changed, 355 insertions(+), 1 deletions(-)
> >>
> >>  Documentation?
> >
> >For we are keeping changing the API for last several versions, I'd like to settle
> >down the API first. Would bring back the document after API was agreed.
> 
> Maybe for APIs we should start with only the documentation patch,
> agree on that, and move on to the implementation.
> 
> >>  What if it's a 64-bit write on a 32-bit host?
> >
> >In fact we haven't support QWORD(64bit) accessing now. The reason is we haven't
> >seen any OS is using it in this way now, so I think we can leave it later.
> >
> >Also seems QEmu doesn't got the way to handle 64bit MMIO.
> 
> There's a difference, if the API doesn't support it, we can't add it
> later without changing both kernel and userspace.
> 
> >>
> >>  That's not very good.  We should do the entire thing in the kernel or in
> >>  userspace.  We can have a new EXIT_REASON to let userspace know an msix
> >>  entry changed, and it should read it from the kernel.
> >
> >If you look it in this way:
> >1. Mask bit owned by kernel.
> >2. Routing owned by userspace.
> >3. Read the routing in kernel is an speed up for normal operation - because kernel
> >can read from them.
> >
> >So I think the logic here is clear to understand.
> 
> Still, it's complicated and the state is split across multiple components.
> 
> >But if we can modify the routing in kernel, it would be raise some sync issues due
> >to both kernel and userspace own routing. So maybe the better solution is move the
> >routing to kernel.
> 
> That may work, but I don't think we can do this for vfio.

Actually, if done right it might work for VFIO: we would need
2 eventfds to notify it that it has to mask/unmask entries.
The interface would need to be careful to keep programming of the guest
side emulation and the masking in the backend device completely
separate.

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func
  2010-11-18  9:30       ` Avi Kivity
@ 2010-11-18  9:41         ` Michael S. Tsirkin
  2010-11-18 11:59           ` Sheng Yang
  0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2010-11-18  9:41 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Sheng Yang, Marcelo Tosatti, kvm

On Thu, Nov 18, 2010 at 11:30:47AM +0200, Avi Kivity wrote:
> >>
> >>  *entry may be stale after rcu_read_unlock().  Is this a problem?
> >
> >I suppose not. All MSI-X MMIO accessing would be executed without delay, so no re-
> >order issue would happen. If the guest is reading and writing the field at the same
> >time(from two cpus), it should got some kinds of sync method for itself - or it
> >may not care what's the reading result(like the one after msix_mask_irq()).
> 
> I guess so.  Michael/Alex?

This is kvm_get_irq_routing_entry which is used for table reads,
correct?  Actually, the pci read *is* the sync method that guests use,
they rely on reads to flush out all previous writes.

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support
  2010-11-17  1:29     ` Sheng Yang
  2010-11-17 13:35       ` Marcelo Tosatti
@ 2010-11-18  9:43       ` Michael S. Tsirkin
  1 sibling, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2010-11-18  9:43 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Marcelo Tosatti, Avi Kivity, kvm

On Wed, Nov 17, 2010 at 09:29:22AM +0800, Sheng Yang wrote:
> > > +#define KVM_MSIX_TYPE_ASSIGNED_DEV	1
> > > +
> > > +#define KVM_MSIX_FLAG_MASKBIT		(1 << 0)
> > > +#define KVM_MSIX_FLAG_QUERY_MASKBIT	(1 << 0)
> > > +
> > > +struct kvm_msix_entry {
> > > +	__u32 id;
> > > +	__u32 type;
> > 
> > Is type really necessary? Will it ever differ from
> > KVM_MSIX_TYPE_ASSIGNED_DEV?
> 
This is the suggestion from Michael. He wants it to be reused by emulated/pv 
devices, so I added the type field here.

Maybe id field can be reused for this somehow.

-- 
MST

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func
  2010-11-18  9:41         ` Michael S. Tsirkin
@ 2010-11-18 11:59           ` Sheng Yang
  2010-11-18 12:33             ` Michael S. Tsirkin
  0 siblings, 1 reply; 26+ messages in thread
From: Sheng Yang @ 2010-11-18 11:59 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, Sheng Yang, Marcelo Tosatti, kvm

On Thu, Nov 18, 2010 at 5:41 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Nov 18, 2010 at 11:30:47AM +0200, Avi Kivity wrote:
>> >>
>> >>  *entry may be stale after rcu_read_unlock().  Is this a problem?
>> >
>> >I suppose not. All MSI-X MMIO accesses would be executed without delay, so no
>> >reordering issue would happen. If the guest is reading and writing the field
>> >at the same time (from two CPUs), it should have some kind of synchronization
>> >method of its own - or it may not care what the read result is (like the one
>> >after msix_mask_irq()).
>>
>> I guess so.  Michael/Alex?
>
> This is kvm_get_irq_routing_entry which is used for table reads,
> correct?  Actually, the pci read *is* the sync method that guests use,
> they rely on reads to flush out all previous writes.

Michael, I think the *sync* you are talking about is not the one I
meant. I was talking about the two-CPU case: one CPU is reading while
the other is writing, and the order can't be determined if the guest
doesn't use a lock or some other synchronization method. You're
talking about flushing out all previous writes of a single CPU...
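A toy C11 illustration of that two-CPU point: with a relaxed concurrent write and read and no lock, either value is a legal result for the reader, so the emulation only has to guarantee that some coherent value is returned. The names are made up; `ctrl` stands in for a vector-control word in the MSI-X table.

```c
#include <stdatomic.h>

/* Stand-in for one vector-control word of the MSI-X table. */
static atomic_uint ctrl;

/* One vCPU sets the mask bit... */
static void vcpu_write_mask(void)
{
    atomic_store_explicit(&ctrl, 1, memory_order_relaxed);
}

/* ...while another reads the word concurrently. Without a guest-side
 * lock, either 0 or 1 is a legal result; only the guest's own
 * synchronization can pin down which one it observes. */
static unsigned vcpu_read_mask(void)
{
    return atomic_load_explicit(&ctrl, memory_order_relaxed);
}
```

Run the two functions from two threads and the only assertion that holds is `result == 0 || result == 1`; run them sequentially and the read is deterministic.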

-- 
regards,
Yang, Sheng

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support
  2010-11-18  9:28       ` Avi Kivity
  2010-11-18  9:37         ` Michael S. Tsirkin
@ 2010-11-18 12:08         ` Sheng Yang
  1 sibling, 0 replies; 26+ messages in thread
From: Sheng Yang @ 2010-11-18 12:08 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Sheng Yang, Marcelo Tosatti, Michael S. Tsirkin, kvm, Alex Williamson

On Thu, Nov 18, 2010 at 5:28 PM, Avi Kivity <avi@redhat.com> wrote:
> On 11/18/2010 03:58 AM, Sheng Yang wrote:
>>
>> On Wednesday 17 November 2010 21:58:00 Avi Kivity wrote:
>> >  On 11/15/2010 11:15 AM, Sheng Yang wrote:
>> >  >  This patch enables per-vector masking for assigned devices using MSI-X.
>> >  >
>> >  >  This patch provides two new APIs: one for the guest to specify the
>> >  >  device's MSI-X table address in MMIO, the other for userspace to get
>> >  >  information about the mask bit.
>> >  >
>> >  >  All mask bit operations are kept in the kernel, in order to accelerate
>> >  >  them. Userspace shouldn't access the device MMIO directly for this
>> >  >  information; instead it should use the provided API.
>> >  >
>> >  >  Signed-off-by: Sheng Yang<sheng@linux.intel.com>
>> >  >  ---
>> >  >
>> >  >    arch/x86/kvm/x86.c       |    1 +
>> >  >    include/linux/kvm.h      |   32 +++++
>> >  >    include/linux/kvm_host.h |    5 +
>> >  >    virt/kvm/assigned-dev.c  |  318 +++++++++++++++++++++++++++++++++++++++++++++-
>> >  >    4 files changed, 355 insertions(+), 1 deletions(-)
>> >
>> >  Documentation?
>>
>> Since we have kept changing the API over the last several versions, I'd like
>> to settle the API first. I would bring back the documentation after the API
>> is agreed.
>
> Maybe for APIs we should start with only the documentation patch, agree on
> that, and move on to the implementation.

Yes, I will follow that next time. And I will bring back the documentation
in the next edition, since Michael and I have reached agreement on the API.
>
>> >  What if it's a 64-bit write on a 32-bit host?
>>
>> In fact we don't support QWORD (64-bit) accesses now. The reason is that we
>> haven't seen any OS using it this way, so I think we can leave it for later.
>>
>> Also, QEmu doesn't seem to have a way to handle 64-bit MMIO.
>
> There's a difference, if the API doesn't support it, we can't add it later
> without changing both kernel and userspace.

Um... which API are you talking about? I think the userspace API (set MSI-X
MMIO, and get mask bit status) is unrelated here?
>
>> >
>> >  That's not very good.  We should do the entire thing in the kernel or
>> > in
>> >  userspace.  We can have a new EXIT_REASON to let userspace know an msix
>> >  entry changed, and it should read it from the kernel.
>>
>> If you look at it this way:
>> 1. The mask bit is owned by the kernel.
>> 2. The routing is owned by userspace.
>> 3. Reading the routing in the kernel is a speed-up for the normal
>> operation, because the kernel can read it directly.
>>
>> So I think the logic here is easy to understand.
>
> Still, it's complicated and the state is split across multiple components.

So how about removing the read-acceleration part from the patch
temporarily? The kernel owns the mask bit and userspace owns the rest.
That should be better. I can add the reading part back later when we
find an elegant way to do it.
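A hypothetical sketch of that ownership split: the kernel consumes only writes to the vector-control dword (the mask bit), and everything else (the address/data words, i.e. the routing) exits to userspace. The function and array names are made up for illustration; only the PCI_MSIX_* offsets reflect the real table layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_MSIX_ENTRY_SIZE          16
#define PCI_MSIX_ENTRY_VECTOR_CTRL   12
#define PCI_MSIX_ENTRY_CTRL_MASKBIT  1
#define KVM_MAX_MSIX_ENTRIES         32   /* assumed limit */

/* Kernel-owned state: one mask bit per MSI-X entry. */
static uint32_t msix_mask[KVM_MAX_MSIX_ENTRIES];

/* Returns true if the write was consumed in the kernel (mask bit),
 * false if the access must exit to userspace, which owns the routing
 * (address/data) part of the entry. */
static bool msix_table_write(uint64_t offset, uint32_t val)
{
    unsigned entry = offset / PCI_MSIX_ENTRY_SIZE;
    unsigned field = offset % PCI_MSIX_ENTRY_SIZE;

    if (entry >= KVM_MAX_MSIX_ENTRIES || field != PCI_MSIX_ENTRY_VECTOR_CTRL)
        return false;   /* addr/data words: userspace territory */

    msix_mask[entry] = val & PCI_MSIX_ENTRY_CTRL_MASKBIT;
    return true;        /* mask bit: handled entirely in the kernel */
}
```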

>
>> But if we could modify the routing in the kernel, it would raise some sync
>> issues, since both the kernel and userspace would own the routing. So maybe
>> the better solution is to move the routing into the kernel.
>
> That may work, but I don't think we can do this for vfio.

-- 
regards,
Yang, Sheng
>
> --
> error compiling committee.c: too many arguments to function
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func
  2010-11-18 11:59           ` Sheng Yang
@ 2010-11-18 12:33             ` Michael S. Tsirkin
  2010-11-18 12:40               ` Sheng Yang
  0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2010-11-18 12:33 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Avi Kivity, Sheng Yang, Marcelo Tosatti, kvm

On Thu, Nov 18, 2010 at 07:59:10PM +0800, Sheng Yang wrote:
> On Thu, Nov 18, 2010 at 5:41 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Thu, Nov 18, 2010 at 11:30:47AM +0200, Avi Kivity wrote:
> >> >>
> >> >>  *entry may be stale after rcu_read_unlock().  Is this a problem?
> >> >
> >> >I suppose not. All MSI-X MMIO accesses would be executed without delay, so no
> >> >reordering issue would happen. If the guest is reading and writing the field
> >> >at the same time (from two CPUs), it should have some kind of synchronization
> >> >method of its own - or it may not care what the read result is (like the one
> >> >after msix_mask_irq()).
> >>
> >> I guess so.  Michael/Alex?
> >
> > This is kvm_get_irq_routing_entry which is used for table reads,
> > correct?  Actually, the pci read *is* the sync method that guests use,
> > they rely on reads to flush out all previous writes.
> 
> Michael, I think the *sync* you are talking about is not the one I
> meant. I was talking about the two-CPU case: one CPU is reading while
> the other is writing, and the order can't be determined if the guest
> doesn't use a lock or some other synchronization method. You're
> talking about flushing out all previous writes of a single CPU...

Yes, but you don't seem to flush out writes on a read, either.

> -- 
> regards,
> Yang, Sheng

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func
  2010-11-18 12:33             ` Michael S. Tsirkin
@ 2010-11-18 12:40               ` Sheng Yang
  0 siblings, 0 replies; 26+ messages in thread
From: Sheng Yang @ 2010-11-18 12:40 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, Sheng Yang, Marcelo Tosatti, kvm

On Thu, Nov 18, 2010 at 8:33 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Nov 18, 2010 at 07:59:10PM +0800, Sheng Yang wrote:
>> On Thu, Nov 18, 2010 at 5:41 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > On Thu, Nov 18, 2010 at 11:30:47AM +0200, Avi Kivity wrote:
>> >> >>
>> >> >>  *entry may be stale after rcu_read_unlock().  Is this a problem?
>> >> >
>> >> >I suppose not. All MSI-X MMIO accesses would be executed without delay, so no
>> >> >reordering issue would happen. If the guest is reading and writing the field
>> >> >at the same time (from two CPUs), it should have some kind of synchronization
>> >> >method of its own - or it may not care what the read result is (like the one
>> >> >after msix_mask_irq()).
>> >>
>> >> I guess so.  Michael/Alex?
>> >
>> > This is kvm_get_irq_routing_entry which is used for table reads,
>> > correct?  Actually, the pci read *is* the sync method that guests use,
>> > they rely on reads to flush out all previous writes.
>>
>> Michael, I think the *sync* you are talking about is not the one I
>> meant. I was talking about the two-CPU case: one CPU is reading while
>> the other is writing, and the order can't be determined if the guest
>> doesn't use a lock or some other synchronization method. You're
>> talking about flushing out all previous writes of a single CPU...
>
> Yes, but you don't seem to flush out writes on a read, either.

... I don't understand... We are emulating the write operation in
software and making it take effect immediately... What are we supposed
to do with this "flush"?

-- 
regards,
Yang, Sheng

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2010-11-18 12:40 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-15  9:15 [PATCH 0/6 v5] MSI-X mask support for assigned device Sheng Yang
2010-11-15  9:15 ` [PATCH 1/6] PCI: MSI: Move MSI-X entry definition to pci_regs.h Sheng Yang
2010-11-15  9:15 ` [PATCH 2/6] PCI: Add mask bit definition for MSI-X table Sheng Yang
2010-11-15  9:15 ` [PATCH 3/6] KVM: Move struct kvm_io_device to kvm_host.h Sheng Yang
2010-11-15  9:15 ` [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func Sheng Yang
2010-11-17 14:01   ` Avi Kivity
2010-11-18  2:22     ` Sheng Yang
2010-11-18  9:30       ` Avi Kivity
2010-11-18  9:41         ` Michael S. Tsirkin
2010-11-18 11:59           ` Sheng Yang
2010-11-18 12:33             ` Michael S. Tsirkin
2010-11-18 12:40               ` Sheng Yang
2010-11-15  9:15 ` [PATCH 5/6] KVM: assigned dev: Clean up assigned_device's flag Sheng Yang
2010-11-15  9:15 ` [PATCH 6/6] KVM: assigned dev: MSI-X mask support Sheng Yang
2010-11-15  9:27   ` [PATCH 6/6 v5 updated] " Sheng Yang
2010-11-16 19:45   ` [PATCH 6/6] " Marcelo Tosatti
2010-11-17  1:29     ` Sheng Yang
2010-11-17 13:35       ` Marcelo Tosatti
2010-11-18  9:43       ` Michael S. Tsirkin
2010-11-17 13:58   ` Avi Kivity
2010-11-18  1:58     ` Sheng Yang
2010-11-18  6:21       ` Michael S. Tsirkin
2010-11-18  6:39         ` Sheng Yang
2010-11-18  9:28       ` Avi Kivity
2010-11-18  9:37         ` Michael S. Tsirkin
2010-11-18 12:08         ` Sheng Yang
