linux-kernel.vger.kernel.org archive mirror
* [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series
@ 2015-09-18 14:29 Feng Wu
  2015-09-18 14:29 ` [PATCH v9 01/18] virt: IRQ bypass manager Feng Wu
                   ` (19 more replies)
  0 siblings, 20 replies; 56+ messages in thread
From: Feng Wu @ 2015-09-18 14:29 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

VT-d Posted-Interrupts is an enhancement to CPU-side Posted-Interrupts.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when the guest is running in non-root mode.

You can find the VT-d Posted-Interrupts spec at the following URL:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

v9:
- Include the whole series:
[01/18]: irq bypass manager
[02/18] - [06/18]: Common non-architecture part for VT-d PI and ARM side forwarded irq
[07/18] - [18/18]: VT-d PI part

v8:
refer to the changelog in each patch

v7:
* Define two weak irq bypass callbacks:
  - kvm_arch_irq_bypass_start()
  - kvm_arch_irq_bypass_stop()
* Remove the x86 dummy implementation of the above two functions.
* Print some useful information instead of WARN_ON() when the
  irq bypass consumer unregistration fails.
* Fix an issue when calling pi_pre_block and pi_post_block.

v6:
* Rebase on 4.2.0-rc6
* Rebase on https://lkml.org/lkml/2015/8/6/526 and http://www.gossamer-threads.com/lists/linux/kernel/2235623
* Make the add_consumer and del_consumer callbacks static
* Remove pointless INIT_LIST_HEAD to 'vdev->ctx[vector].producer.node)'
* Use dev_info instead of WARN_ON() when irq_bypass_register_producer fails
* Remove optional dummy callbacks for irq producer

v4:
* For lowest-priority interrupts, only single-CPU destination
interrupts are supported at the current stage; more common
lowest-priority support will be added later.
* According to Marcelo's suggestion, when the vCPU is blocked, we handle
the posted-interrupts in the HLT emulation path.
* Some small changes (coding style, typos, added code comments)

v3:
* Adjust the Posted-interrupts Descriptor updating logic when vCPU is
  preempted or blocked.
* KVM_DEV_VFIO_DEVICE_POSTING_IRQ --> KVM_DEV_VFIO_DEVICE_POST_IRQ
* __KVM_HAVE_ARCH_KVM_VFIO_POSTING --> __KVM_HAVE_ARCH_KVM_VFIO_POST
* Add KVM_DEV_VFIO_DEVICE_UNPOST_IRQ attribute for VFIO irq, which
  can be used to change back to remapping mode.
* Fix typo

v2:
* Use the VFIO framework to enable this feature; the VFIO part of this series is
  based on Eric's patch "[PATCH v3 0/8] KVM-VFIO IRQ forward control"
* Rebase this patchset on git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git,
  then revise some irq logic based on the new hierarchy irqdomain patches provided
  by Jiang Liu <jiang.liu@linux.intel.com>



Alex Williamson (1):
  virt: IRQ bypass manager

Eric Auger (4):
  KVM: arm/arm64: select IRQ_BYPASS_MANAGER
  KVM: create kvm_irqfd.h
  KVM: introduce kvm_arch functions for IRQ bypass
  KVM: eventfd: add irq bypass consumer management

Feng Wu (13):
  KVM: x86: select IRQ_BYPASS_MANAGER
  KVM: Extend struct pi_desc for VT-d Posted-Interrupts
  KVM: Add some helper functions for Posted-Interrupts
  KVM: Define a new interface kvm_intr_is_single_vcpu()
  KVM: Make struct kvm_irq_routing_table accessible
  KVM: make kvm_set_msi_irq() public
  vfio: Register/unregister irq_bypass_producer
  KVM: x86: Update IRTE for posted-interrupts
  KVM: Implement IRQ bypass consumer callbacks for x86
  KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'
  KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
  KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  iommu/vt-d: Add a command line parameter for VT-d posted-interrupts

 Documentation/kernel-parameters.txt   |   1 +
 Documentation/virtual/kvm/locking.txt |  12 ++
 MAINTAINERS                           |   7 +
 arch/arm/kvm/Kconfig                  |   2 +
 arch/arm/kvm/Makefile                 |   1 +
 arch/arm64/kvm/Kconfig                |   2 +
 arch/arm64/kvm/Makefile               |   1 +
 arch/x86/include/asm/kvm_host.h       |  24 +++
 arch/x86/kvm/Kconfig                  |   3 +
 arch/x86/kvm/Makefile                 |   3 +
 arch/x86/kvm/irq_comm.c               |  32 ++-
 arch/x86/kvm/lapic.c                  |  59 ++++++
 arch/x86/kvm/lapic.h                  |   2 +
 arch/x86/kvm/trace.h                  |  33 ++++
 arch/x86/kvm/vmx.c                    | 361 +++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c                    | 108 +++++++++-
 drivers/iommu/irq_remapping.c         |  12 +-
 drivers/vfio/pci/Kconfig              |   1 +
 drivers/vfio/pci/vfio_pci_intrs.c     |   9 +
 drivers/vfio/pci/vfio_pci_private.h   |   2 +
 include/linux/irqbypass.h             |  90 +++++++++
 include/linux/kvm_host.h              |  29 +++
 include/linux/kvm_irqfd.h             |  71 +++++++
 virt/kvm/Kconfig                      |   3 +
 virt/kvm/eventfd.c                    | 142 +++++++------
 virt/kvm/irqchip.c                    |  10 -
 virt/kvm/kvm_main.c                   |   3 +
 virt/lib/Kconfig                      |   2 +
 virt/lib/Makefile                     |   1 +
 virt/lib/irqbypass.c                  | 257 ++++++++++++++++++++++++
 30 files changed, 1182 insertions(+), 101 deletions(-)
 create mode 100644 include/linux/irqbypass.h
 create mode 100644 include/linux/kvm_irqfd.h
 create mode 100644 virt/lib/Kconfig
 create mode 100644 virt/lib/Makefile
 create mode 100644 virt/lib/irqbypass.c

-- 
2.1.0



* [PATCH v9 01/18] virt: IRQ bypass manager
  2015-09-18 14:29 [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Feng Wu
@ 2015-09-18 14:29 ` Feng Wu
  2015-09-18 15:34   ` Wu, Feng
  2015-09-18 14:29 ` [PATCH v9 02/18] KVM: x86: select IRQ_BYPASS_MANAGER Feng Wu
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 56+ messages in thread
From: Feng Wu @ 2015-09-18 14:29 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel

From: Alex Williamson <alex.williamson@redhat.com>

When a physical I/O device is assigned to a virtual machine through
facilities like VFIO and KVM, the interrupt for the device generally
bounces through the host system before being injected into the VM.
However, hardware technologies exist that often allow the host to be
bypassed for some of these scenarios.  Intel Posted Interrupts allow
the specified physical edge interrupts to be directly injected into a
guest when delivered to a physical processor while the vCPU is
running.  ARM IRQ Forwarding allows forwarded physical interrupts to
be directly deactivated by the guest.

The IRQ bypass manager here is meant to provide the shim to connect
interrupt producers, generally the host physical device driver, with
interrupt consumers, generally the hypervisor, in order to configure
these bypass mechanisms.  To do this, we base the connection on a
shared, opaque token.  For KVM-VFIO this is expected to be an
eventfd_ctx since this is the connection we already use to connect an
eventfd to an irqfd on the in-kernel path.  When a producer and a
consumer with matching tokens are found, callbacks via both registered
participants allow the bypass facilities to be automatically enabled.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Feng Wu <feng.wu@intel.com>
---
v4: All producer callbacks are optional, as with Intel PI, it's
    possible for the producer to be blissfully unaware of the bypass.
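
For readers new to the series, the token matching described in the
commit message can be pictured with a small userspace sketch.  This is
an illustration only, not the code in this patch: every sketch_* name,
the fixed-size arrays and the peer pointers are assumptions made for the
example, and locking, module refcounting and the optional stop/start
callbacks are left out.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX 8	/* hypothetical capacity, for illustration only */

struct sketch_consumer;

struct sketch_producer {
	void *token;			/* opaque match key */
	struct sketch_consumer *peer;	/* non-NULL while connected */
};

struct sketch_consumer {
	void *token;
	struct sketch_producer *peer;
	/* mandatory, in the spirit of add_producer in the patch */
	int (*add_producer)(struct sketch_consumer *, struct sketch_producer *);
};

static struct sketch_producer *producers[SKETCH_MAX];
static struct sketch_consumer *consumers[SKETCH_MAX];

/* Trivial callback that always accepts the pairing. */
int sketch_accept(struct sketch_consumer *c, struct sketch_producer *p)
{
	(void)c;
	(void)p;
	return 0;
}

/* Pair the two sides only if the consumer callback succeeds. */
static int sketch_connect(struct sketch_producer *p, struct sketch_consumer *c)
{
	int ret = c->add_producer(c, p);

	if (ret)
		return ret;
	p->peer = c;
	c->peer = p;
	return 0;
}

int sketch_register_producer(struct sketch_producer *p)
{
	int i, ret, slot = -1;

	for (i = 0; i < SKETCH_MAX; i++) {
		if (!producers[i]) {
			if (slot < 0)
				slot = i;
		} else if (producers[i]->token == p->token) {
			return -EBUSY;	/* tokens are unique per side */
		}
	}
	if (slot < 0)
		return -ENOSPC;

	/* Connect to a consumer that already registered this token. */
	for (i = 0; i < SKETCH_MAX; i++) {
		if (consumers[i] && consumers[i]->token == p->token) {
			ret = sketch_connect(p, consumers[i]);
			if (ret)
				return ret;
			break;
		}
	}
	producers[slot] = p;
	return 0;
}

int sketch_register_consumer(struct sketch_consumer *c)
{
	int i, ret, slot = -1;

	if (!c->add_producer)
		return -EINVAL;

	for (i = 0; i < SKETCH_MAX; i++) {
		if (!consumers[i]) {
			if (slot < 0)
				slot = i;
		} else if (consumers[i]->token == c->token) {
			return -EBUSY;
		}
	}
	if (slot < 0)
		return -ENOSPC;

	/* Connect to a producer that already registered this token. */
	for (i = 0; i < SKETCH_MAX; i++) {
		if (producers[i] && producers[i]->token == c->token) {
			ret = sketch_connect(producers[i], c);
			if (ret)
				return ret;
			break;
		}
	}
	consumers[slot] = c;
	return 0;
}
```

Either side may register first; the pairing happens when the second
side shows up with the same token pointer, and a second registration
with an already-used token on the same side is refused.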

 MAINTAINERS               |   7 ++
 include/linux/irqbypass.h |  90 ++++++++++++++++
 virt/lib/Kconfig          |   2 +
 virt/lib/Makefile         |   1 +
 virt/lib/irqbypass.c      | 257 ++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 357 insertions(+)
 create mode 100644 include/linux/irqbypass.h
 create mode 100644 virt/lib/Kconfig
 create mode 100644 virt/lib/Makefile
 create mode 100644 virt/lib/irqbypass.c

diff --git a/MAINTAINERS b/MAINTAINERS
index a9ae6c1..10c8b2f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10963,6 +10963,13 @@ L:	netdev@vger.kernel.org
 S:	Maintained
 F:	drivers/net/ethernet/via/via-velocity.*
 
+VIRT LIB
+M:	Alex Williamson <alex.williamson@redhat.com>
+M:	Paolo Bonzini <pbonzini@redhat.com>
+L:	kvm@vger.kernel.org
+S:	Supported
+F:	virt/lib/
+
 VIVID VIRTUAL VIDEO DRIVER
 M:	Hans Verkuil <hverkuil@xs4all.nl>
 L:	linux-media@vger.kernel.org
diff --git a/include/linux/irqbypass.h b/include/linux/irqbypass.h
new file mode 100644
index 0000000..1551b5b
--- /dev/null
+++ b/include/linux/irqbypass.h
@@ -0,0 +1,90 @@
+/*
+ * IRQ offload/bypass manager
+ *
+ * Copyright (C) 2015 Red Hat, Inc.
+ * Copyright (c) 2015 Linaro Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef IRQBYPASS_H
+#define IRQBYPASS_H
+
+#include <linux/list.h>
+
+struct irq_bypass_consumer;
+
+/*
+ * Theory of operation
+ *
+ * The IRQ bypass manager is a simple set of lists and callbacks that allows
+ * IRQ producers (ex. physical interrupt sources) to be matched to IRQ
+ * consumers (ex. virtualization hardware that allows IRQ bypass or offload)
+ * via a shared token (ex. eventfd_ctx).  Producers and consumers register
+ * independently.  When a token match is found, the optional @stop callback
+ * will be called for each participant.  The pair will then be connected via
+ * the @add_* callbacks, and finally the optional @start callback will allow
+ * any final coordination.  When either participant is unregistered, the
+ * process is repeated using the @del_* callbacks in place of the @add_*
+ * callbacks.  Match tokens must be unique per producer/consumer, 1:N pairings
+ * are not supported.
+ */
+
+/**
+ * struct irq_bypass_producer - IRQ bypass producer definition
+ * @node: IRQ bypass manager private list management
+ * @token: opaque token to match between producer and consumer
+ * @irq: Linux IRQ number for the producer device
+ * @add_consumer: Connect the IRQ producer to an IRQ consumer (optional)
+ * @del_consumer: Disconnect the IRQ producer from an IRQ consumer (optional)
+ * @stop: Perform any quiesce operations necessary prior to add/del (optional)
+ * @start: Perform any startup operations necessary after add/del (optional)
+ *
+ * The IRQ bypass producer structure represents an interrupt source for
+ * participation in possible host bypass, for instance an interrupt vector
+ * for a physical device assigned to a VM.
+ */
+struct irq_bypass_producer {
+	struct list_head node;
+	void *token;
+	int irq;
+	int (*add_consumer)(struct irq_bypass_producer *,
+			    struct irq_bypass_consumer *);
+	void (*del_consumer)(struct irq_bypass_producer *,
+			     struct irq_bypass_consumer *);
+	void (*stop)(struct irq_bypass_producer *);
+	void (*start)(struct irq_bypass_producer *);
+};
+
+/**
+ * struct irq_bypass_consumer - IRQ bypass consumer definition
+ * @node: IRQ bypass manager private list management
+ * @token: opaque token to match between producer and consumer
+ * @add_producer: Connect the IRQ consumer to an IRQ producer
+ * @del_producer: Disconnect the IRQ consumer from an IRQ producer
+ * @stop: Perform any quiesce operations necessary prior to add/del (optional)
+ * @start: Perform any startup operations necessary after add/del (optional)
+ *
+ * The IRQ bypass consumer structure represents an interrupt sink for
+ * participation in possible host bypass, for instance a hypervisor may
+ * support offloads to allow bypassing the host entirely or offload
+ * portions of the interrupt handling to the VM.
+ */
+struct irq_bypass_consumer {
+	struct list_head node;
+	void *token;
+	int (*add_producer)(struct irq_bypass_consumer *,
+			    struct irq_bypass_producer *);
+	void (*del_producer)(struct irq_bypass_consumer *,
+			     struct irq_bypass_producer *);
+	void (*stop)(struct irq_bypass_consumer *);
+	void (*start)(struct irq_bypass_consumer *);
+};
+
+int irq_bypass_register_producer(struct irq_bypass_producer *);
+void irq_bypass_unregister_producer(struct irq_bypass_producer *);
+int irq_bypass_register_consumer(struct irq_bypass_consumer *);
+void irq_bypass_unregister_consumer(struct irq_bypass_consumer *);
+
+#endif /* IRQBYPASS_H */
diff --git a/virt/lib/Kconfig b/virt/lib/Kconfig
new file mode 100644
index 0000000..89a414f
--- /dev/null
+++ b/virt/lib/Kconfig
@@ -0,0 +1,2 @@
+config IRQ_BYPASS_MANAGER
+	tristate
diff --git a/virt/lib/Makefile b/virt/lib/Makefile
new file mode 100644
index 0000000..901228d
--- /dev/null
+++ b/virt/lib/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_IRQ_BYPASS_MANAGER) += irqbypass.o
diff --git a/virt/lib/irqbypass.c b/virt/lib/irqbypass.c
new file mode 100644
index 0000000..09a03b5
--- /dev/null
+++ b/virt/lib/irqbypass.c
@@ -0,0 +1,257 @@
+/*
+ * IRQ offload/bypass manager
+ *
+ * Copyright (C) 2015 Red Hat, Inc.
+ * Copyright (c) 2015 Linaro Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Various virtualization hardware acceleration techniques allow bypassing or
+ * offloading interrupts received from devices around the host kernel.  Posted
+ * Interrupts on Intel VT-d systems can allow interrupts to be received
+ * directly by a virtual machine.  ARM IRQ Forwarding allows forwarded physical
+ * interrupts to be directly deactivated by the guest.  This manager allows
+ * interrupt producers and consumers to find each other to enable this sort of
+ * bypass.
+ */
+
+#include <linux/irqbypass.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("IRQ bypass manager utility module");
+
+static LIST_HEAD(producers);
+static LIST_HEAD(consumers);
+static DEFINE_MUTEX(lock);
+
+/* @lock must be held when calling connect */
+static int __connect(struct irq_bypass_producer *prod,
+		     struct irq_bypass_consumer *cons)
+{
+	int ret = 0;
+
+	if (prod->stop)
+		prod->stop(prod);
+	if (cons->stop)
+		cons->stop(cons);
+
+	if (prod->add_consumer)
+		ret = prod->add_consumer(prod, cons);
+
+	if (!ret) {
+		ret = cons->add_producer(cons, prod);
+		if (ret && prod->del_consumer)
+			prod->del_consumer(prod, cons);
+	}
+
+	if (cons->start)
+		cons->start(cons);
+	if (prod->start)
+		prod->start(prod);
+
+	return ret;
+}
+
+/* @lock must be held when calling disconnect */
+static void __disconnect(struct irq_bypass_producer *prod,
+			 struct irq_bypass_consumer *cons)
+{
+	if (prod->stop)
+		prod->stop(prod);
+	if (cons->stop)
+		cons->stop(cons);
+
+	cons->del_producer(cons, prod);
+
+	if (prod->del_consumer)
+		prod->del_consumer(prod, cons);
+
+	if (cons->start)
+		cons->start(cons);
+	if (prod->start)
+		prod->start(prod);
+}
+
+/**
+ * irq_bypass_register_producer - register IRQ bypass producer
+ * @producer: pointer to producer structure
+ *
+ * Add the provided IRQ producer to the list of producers and connect
+ * with any matching token found on the IRQ consumers list.
+ */
+int irq_bypass_register_producer(struct irq_bypass_producer *producer)
+{
+	struct irq_bypass_producer *tmp;
+	struct irq_bypass_consumer *consumer;
+
+	might_sleep();
+
+	if (!try_module_get(THIS_MODULE))
+		return -ENODEV;
+
+	mutex_lock(&lock);
+
+	list_for_each_entry(tmp, &producers, node) {
+		if (tmp->token == producer->token) {
+			mutex_unlock(&lock);
+			module_put(THIS_MODULE);
+			return -EBUSY;
+		}
+	}
+
+	list_for_each_entry(consumer, &consumers, node) {
+		if (consumer->token == producer->token) {
+			int ret = __connect(producer, consumer);
+			if (ret) {
+				mutex_unlock(&lock);
+				module_put(THIS_MODULE);
+				return ret;
+			}
+			break;
+		}
+	}
+
+	list_add(&producer->node, &producers);
+
+	mutex_unlock(&lock);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(irq_bypass_register_producer);
+
+/**
+ * irq_bypass_unregister_producer - unregister IRQ bypass producer
+ * @producer: pointer to producer structure
+ *
+ * Remove a previously registered IRQ producer from the list of producers
+ * and disconnect it from any connected IRQ consumer.
+ */
+void irq_bypass_unregister_producer(struct irq_bypass_producer *producer)
+{
+	struct irq_bypass_producer *tmp;
+	struct irq_bypass_consumer *consumer;
+
+	might_sleep();
+
+	if (!try_module_get(THIS_MODULE))
+		return; /* nothing in the list anyway */
+
+	mutex_lock(&lock);
+
+	list_for_each_entry(tmp, &producers, node) {
+		if (tmp->token != producer->token)
+			continue;
+
+		list_for_each_entry(consumer, &consumers, node) {
+			if (consumer->token == producer->token) {
+				__disconnect(producer, consumer);
+				break;
+			}
+		}
+
+		list_del(&producer->node);
+		module_put(THIS_MODULE);
+		break;
+	}
+
+	mutex_unlock(&lock);
+
+	module_put(THIS_MODULE);
+}
+EXPORT_SYMBOL_GPL(irq_bypass_unregister_producer);
+
+/**
+ * irq_bypass_register_consumer - register IRQ bypass consumer
+ * @consumer: pointer to consumer structure
+ *
+ * Add the provided IRQ consumer to the list of consumers and connect
+ * with any matching token found on the IRQ producer list.
+ */
+int irq_bypass_register_consumer(struct irq_bypass_consumer *consumer)
+{
+	struct irq_bypass_consumer *tmp;
+	struct irq_bypass_producer *producer;
+
+	if (!consumer->add_producer || !consumer->del_producer)
+		return -EINVAL;
+
+	might_sleep();
+
+	if (!try_module_get(THIS_MODULE))
+		return -ENODEV;
+
+	mutex_lock(&lock);
+
+	list_for_each_entry(tmp, &consumers, node) {
+		if (tmp->token == consumer->token) {
+			mutex_unlock(&lock);
+			module_put(THIS_MODULE);
+			return -EBUSY;
+		}
+	}
+
+	list_for_each_entry(producer, &producers, node) {
+		if (producer->token == consumer->token) {
+			int ret = __connect(producer, consumer);
+			if (ret) {
+				mutex_unlock(&lock);
+				module_put(THIS_MODULE);
+				return ret;
+			}
+			break;
+		}
+	}
+
+	list_add(&consumer->node, &consumers);
+
+	mutex_unlock(&lock);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(irq_bypass_register_consumer);
+
+/**
+ * irq_bypass_unregister_consumer - unregister IRQ bypass consumer
+ * @consumer: pointer to consumer structure
+ *
+ * Remove a previously registered IRQ consumer from the list of consumers
+ * and disconnect it from any connected IRQ producer.
+ */
+void irq_bypass_unregister_consumer(struct irq_bypass_consumer *consumer)
+{
+	struct irq_bypass_consumer *tmp;
+	struct irq_bypass_producer *producer;
+
+	might_sleep();
+
+	if (!try_module_get(THIS_MODULE))
+		return; /* nothing in the list anyway */
+
+	mutex_lock(&lock);
+
+	list_for_each_entry(tmp, &consumers, node) {
+		if (tmp->token != consumer->token)
+			continue;
+
+		list_for_each_entry(producer, &producers, node) {
+			if (producer->token == consumer->token) {
+				__disconnect(producer, consumer);
+				break;
+			}
+		}
+
+		list_del(&consumer->node);
+		module_put(THIS_MODULE);
+		break;
+	}
+
+	mutex_unlock(&lock);
+
+	module_put(THIS_MODULE);
+}
+EXPORT_SYMBOL_GPL(irq_bypass_unregister_consumer);
-- 
2.1.0



* [PATCH v9 02/18] KVM: x86: select IRQ_BYPASS_MANAGER
  2015-09-18 14:29 [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Feng Wu
  2015-09-18 14:29 ` [PATCH v9 01/18] virt: IRQ bypass manager Feng Wu
@ 2015-09-18 14:29 ` Feng Wu
  2015-09-18 14:29 ` [PATCH v9 03/18] KVM: arm/arm64: " Feng Wu
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 56+ messages in thread
From: Feng Wu @ 2015-09-18 14:29 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

Select IRQ_BYPASS_MANAGER for x86 when CONFIG_KVM is set.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 arch/x86/kvm/Kconfig  | 2 ++
 arch/x86/kvm/Makefile | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index d8a1d56..c951d44 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -3,6 +3,7 @@
 #
 
 source "virt/kvm/Kconfig"
+source "virt/lib/Kconfig"
 
 menuconfig VIRTUALIZATION
 	bool "Virtualization"
@@ -28,6 +29,7 @@ config KVM
 	select ANON_INODES
 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_IRQFD
+	select IRQ_BYPASS_MANAGER
 	select HAVE_KVM_IRQ_ROUTING
 	select HAVE_KVM_EVENTFD
 	select KVM_APIC_ARCHITECTURE
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 67d215c..05cc2d7 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -6,6 +6,9 @@ CFLAGS_svm.o := -I.
 CFLAGS_vmx.o := -I.
 
 KVM := ../../../virt/kvm
+LIB := ../../../virt/lib
+
+obj-$(CONFIG_IRQ_BYPASS_MANAGER)	+= $(LIB)/
 
 kvm-y			+= $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
 				$(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o
-- 
2.1.0



* [PATCH v9 03/18] KVM: arm/arm64: select IRQ_BYPASS_MANAGER
  2015-09-18 14:29 [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Feng Wu
  2015-09-18 14:29 ` [PATCH v9 01/18] virt: IRQ bypass manager Feng Wu
  2015-09-18 14:29 ` [PATCH v9 02/18] KVM: x86: select IRQ_BYPASS_MANAGER Feng Wu
@ 2015-09-18 14:29 ` Feng Wu
  2015-09-21 19:32   ` Eric Auger
  2015-09-18 14:29 ` [PATCH v9 04/18] KVM: create kvm_irqfd.h Feng Wu
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 56+ messages in thread
From: Feng Wu @ 2015-09-18 14:29 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

From: Eric Auger <eric.auger@linaro.org>

Select IRQ_BYPASS_MANAGER when CONFIG_KVM is set.
Also add compilation of virt/lib.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v3 -> v4:
- add compilation of virt/lib in arm/arm64 KVM

v2 -> v3:
- [Feng Wu] Correct a typo in 'arch/arm64/kvm/Kconfig'

v1 -> v2:
- also set IRQ_BYPASS_MANAGER for arm64

 arch/arm/kvm/Kconfig    | 2 ++
 arch/arm/kvm/Makefile   | 1 +
 arch/arm64/kvm/Kconfig  | 2 ++
 arch/arm64/kvm/Makefile | 1 +
 4 files changed, 6 insertions(+)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index bfb915d..3c565b9 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -3,6 +3,7 @@
 #
 
 source "virt/kvm/Kconfig"
+source "virt/lib/Kconfig"
 
 menuconfig VIRTUALIZATION
 	bool "Virtualization"
@@ -31,6 +32,7 @@ config KVM
 	select KVM_VFIO
 	select HAVE_KVM_EVENTFD
 	select HAVE_KVM_IRQFD
+	select IRQ_BYPASS_MANAGER
 	depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER
 	---help---
 	  Support hosting virtualized guest machines.
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index c5eef02c..a6a41dd 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -24,3 +24,4 @@ obj-y += $(KVM)/arm/vgic.o
 obj-y += $(KVM)/arm/vgic-v2.o
 obj-y += $(KVM)/arm/vgic-v2-emul.o
 obj-y += $(KVM)/arm/arch_timer.o
+obj-y += ../../../virt/lib/
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index bfffe8f..2509539 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -3,6 +3,7 @@
 #
 
 source "virt/kvm/Kconfig"
+source "virt/lib/Kconfig"
 
 menuconfig VIRTUALIZATION
 	bool "Virtualization"
@@ -31,6 +32,7 @@ config KVM
 	select KVM_VFIO
 	select HAVE_KVM_EVENTFD
 	select HAVE_KVM_IRQFD
+	select IRQ_BYPASS_MANAGER
 	---help---
 	  Support hosting virtualized guest machines.
 
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index f90f4aa..55eec69 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -27,3 +27,4 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v3.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v3-emul.o
 kvm-$(CONFIG_KVM_ARM_HOST) += vgic-v3-switch.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
+kvm-$(CONFIG_KVM_ARM_HOST) += ../../../virt/lib/
-- 
2.1.0



* [PATCH v9 04/18] KVM: create kvm_irqfd.h
  2015-09-18 14:29 [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Feng Wu
                   ` (2 preceding siblings ...)
  2015-09-18 14:29 ` [PATCH v9 03/18] KVM: arm/arm64: " Feng Wu
@ 2015-09-18 14:29 ` Feng Wu
  2015-09-18 15:35   ` Wu, Feng
  2015-09-18 14:29 ` [PATCH v9 05/18] KVM: introduce kvm_arch functions for IRQ bypass Feng Wu
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 56+ messages in thread
From: Feng Wu @ 2015-09-18 14:29 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel

From: Eric Auger <eric.auger@linaro.org>

Move the _irqfd_resampler and _irqfd struct declarations into a new
public header, kvm_irqfd.h. They are renamed to
kvm_kernel_irqfd_resampler and kvm_kernel_irqfd, respectively. These
datatypes will be used by architecture-specific code in the context of
IRQ bypass manager integration.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 include/linux/kvm_irqfd.h | 69 ++++++++++++++++++++++++++++++++++
 virt/kvm/eventfd.c        | 95 ++++++++++++-----------------------------------
 2 files changed, 92 insertions(+), 72 deletions(-)
 create mode 100644 include/linux/kvm_irqfd.h

diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
new file mode 100644
index 0000000..f926b39
--- /dev/null
+++ b/include/linux/kvm_irqfd.h
@@ -0,0 +1,69 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * irqfd: Allows an fd to be used to inject an interrupt to the guest
+ * Credit goes to Avi Kivity for the original idea.
+ */
+
+#ifndef __LINUX_KVM_IRQFD_H
+#define __LINUX_KVM_IRQFD_H
+
+#include <linux/kvm_host.h>
+#include <linux/poll.h>
+
+/*
+ * Resampling irqfds are a special variety of irqfds used to emulate
+ * level triggered interrupts.  The interrupt is asserted on eventfd
+ * trigger.  On acknowledgment through the irq ack notifier, the
+ * interrupt is de-asserted and userspace is notified through the
+ * resamplefd.  All resamplers on the same gsi are de-asserted
+ * together, so we don't need to track the state of each individual
+ * user.  We can also therefore share the same irq source ID.
+ */
+struct kvm_kernel_irqfd_resampler {
+	struct kvm *kvm;
+	/*
+	 * List of resampling struct _irqfd objects sharing this gsi.
+	 * RCU list modified under kvm->irqfds.resampler_lock
+	 */
+	struct list_head list;
+	struct kvm_irq_ack_notifier notifier;
+	/*
+	 * Entry in list of kvm->irqfd.resampler_list.  Use for sharing
+	 * resamplers among irqfds on the same gsi.
+	 * Accessed and modified under kvm->irqfds.resampler_lock
+	 */
+	struct list_head link;
+};
+
+struct kvm_kernel_irqfd {
+	/* Used for MSI fast-path */
+	struct kvm *kvm;
+	wait_queue_t wait;
+	/* Update side is protected by irqfds.lock */
+	struct kvm_kernel_irq_routing_entry irq_entry;
+	seqcount_t irq_entry_sc;
+	/* Used for level IRQ fast-path */
+	int gsi;
+	struct work_struct inject;
+	/* The resampler used by this irqfd (resampler-only) */
+	struct kvm_kernel_irqfd_resampler *resampler;
+	/* Eventfd notified on resample (resampler-only) */
+	struct eventfd_ctx *resamplefd;
+	/* Entry in list of irqfds for a resampler (resampler-only) */
+	struct list_head resampler_link;
+	/* Used for setup/shutdown */
+	struct eventfd_ctx *eventfd;
+	struct list_head list;
+	poll_table pt;
+	struct work_struct shutdown;
+};
+
+#endif /* __LINUX_KVM_IRQFD_H */
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 9ff4193..647ffb8 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -23,6 +23,7 @@
 
 #include <linux/kvm_host.h>
 #include <linux/kvm.h>
+#include <linux/kvm_irqfd.h>
 #include <linux/workqueue.h>
 #include <linux/syscalls.h>
 #include <linux/wait.h>
@@ -39,68 +40,14 @@
 #include <kvm/iodev.h>
 
 #ifdef CONFIG_HAVE_KVM_IRQFD
-/*
- * --------------------------------------------------------------------
- * irqfd: Allows an fd to be used to inject an interrupt to the guest
- *
- * Credit goes to Avi Kivity for the original idea.
- * --------------------------------------------------------------------
- */
-
-/*
- * Resampling irqfds are a special variety of irqfds used to emulate
- * level triggered interrupts.  The interrupt is asserted on eventfd
- * trigger.  On acknowledgement through the irq ack notifier, the
- * interrupt is de-asserted and userspace is notified through the
- * resamplefd.  All resamplers on the same gsi are de-asserted
- * together, so we don't need to track the state of each individual
- * user.  We can also therefore share the same irq source ID.
- */
-struct _irqfd_resampler {
-	struct kvm *kvm;
-	/*
-	 * List of resampling struct _irqfd objects sharing this gsi.
-	 * RCU list modified under kvm->irqfds.resampler_lock
-	 */
-	struct list_head list;
-	struct kvm_irq_ack_notifier notifier;
-	/*
-	 * Entry in list of kvm->irqfd.resampler_list.  Use for sharing
-	 * resamplers among irqfds on the same gsi.
-	 * Accessed and modified under kvm->irqfds.resampler_lock
-	 */
-	struct list_head link;
-};
-
-struct _irqfd {
-	/* Used for MSI fast-path */
-	struct kvm *kvm;
-	wait_queue_t wait;
-	/* Update side is protected by irqfds.lock */
-	struct kvm_kernel_irq_routing_entry irq_entry;
-	seqcount_t irq_entry_sc;
-	/* Used for level IRQ fast-path */
-	int gsi;
-	struct work_struct inject;
-	/* The resampler used by this irqfd (resampler-only) */
-	struct _irqfd_resampler *resampler;
-	/* Eventfd notified on resample (resampler-only) */
-	struct eventfd_ctx *resamplefd;
-	/* Entry in list of irqfds for a resampler (resampler-only) */
-	struct list_head resampler_link;
-	/* Used for setup/shutdown */
-	struct eventfd_ctx *eventfd;
-	struct list_head list;
-	poll_table pt;
-	struct work_struct shutdown;
-};
 
 static struct workqueue_struct *irqfd_cleanup_wq;
 
 static void
 irqfd_inject(struct work_struct *work)
 {
-	struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
+	struct kvm_kernel_irqfd *irqfd =
+		container_of(work, struct kvm_kernel_irqfd, inject);
 	struct kvm *kvm = irqfd->kvm;
 
 	if (!irqfd->resampler) {
@@ -121,12 +68,13 @@ irqfd_inject(struct work_struct *work)
 static void
 irqfd_resampler_ack(struct kvm_irq_ack_notifier *kian)
 {
-	struct _irqfd_resampler *resampler;
+	struct kvm_kernel_irqfd_resampler *resampler;
 	struct kvm *kvm;
-	struct _irqfd *irqfd;
+	struct kvm_kernel_irqfd *irqfd;
 	int idx;
 
-	resampler = container_of(kian, struct _irqfd_resampler, notifier);
+	resampler = container_of(kian,
+			struct kvm_kernel_irqfd_resampler, notifier);
 	kvm = resampler->kvm;
 
 	kvm_set_irq(kvm, KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID,
@@ -141,9 +89,9 @@ irqfd_resampler_ack(struct kvm_irq_ack_notifier *kian)
 }
 
 static void
-irqfd_resampler_shutdown(struct _irqfd *irqfd)
+irqfd_resampler_shutdown(struct kvm_kernel_irqfd *irqfd)
 {
-	struct _irqfd_resampler *resampler = irqfd->resampler;
+	struct kvm_kernel_irqfd_resampler *resampler = irqfd->resampler;
 	struct kvm *kvm = resampler->kvm;
 
 	mutex_lock(&kvm->irqfds.resampler_lock);
@@ -168,7 +116,8 @@ irqfd_resampler_shutdown(struct _irqfd *irqfd)
 static void
 irqfd_shutdown(struct work_struct *work)
 {
-	struct _irqfd *irqfd = container_of(work, struct _irqfd, shutdown);
+	struct kvm_kernel_irqfd *irqfd =
+		container_of(work, struct kvm_kernel_irqfd, shutdown);
 	u64 cnt;
 
 	/*
@@ -198,7 +147,7 @@ irqfd_shutdown(struct work_struct *work)
 
 /* assumes kvm->irqfds.lock is held */
 static bool
-irqfd_is_active(struct _irqfd *irqfd)
+irqfd_is_active(struct kvm_kernel_irqfd *irqfd)
 {
 	return list_empty(&irqfd->list) ? false : true;
 }
@@ -209,7 +158,7 @@ irqfd_is_active(struct _irqfd *irqfd)
  * assumes kvm->irqfds.lock is held
  */
 static void
-irqfd_deactivate(struct _irqfd *irqfd)
+irqfd_deactivate(struct kvm_kernel_irqfd *irqfd)
 {
 	BUG_ON(!irqfd_is_active(irqfd));
 
@@ -224,7 +173,8 @@ irqfd_deactivate(struct _irqfd *irqfd)
 static int
 irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
 {
-	struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
+	struct kvm_kernel_irqfd *irqfd =
+		container_of(wait, struct kvm_kernel_irqfd, wait);
 	unsigned long flags = (unsigned long)key;
 	struct kvm_kernel_irq_routing_entry irq;
 	struct kvm *kvm = irqfd->kvm;
@@ -274,12 +224,13 @@ static void
 irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
 			poll_table *pt)
 {
-	struct _irqfd *irqfd = container_of(pt, struct _irqfd, pt);
+	struct kvm_kernel_irqfd *irqfd =
+		container_of(pt, struct kvm_kernel_irqfd, pt);
 	add_wait_queue(wqh, &irqfd->wait);
 }
 
 /* Must be called under irqfds.lock */
-static void irqfd_update(struct kvm *kvm, struct _irqfd *irqfd)
+static void irqfd_update(struct kvm *kvm, struct kvm_kernel_irqfd *irqfd)
 {
 	struct kvm_kernel_irq_routing_entry *e;
 	struct kvm_kernel_irq_routing_entry entries[KVM_NR_IRQCHIPS];
@@ -304,7 +255,7 @@ static void irqfd_update(struct kvm *kvm, struct _irqfd *irqfd)
 static int
 kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 {
-	struct _irqfd *irqfd, *tmp;
+	struct kvm_kernel_irqfd *irqfd, *tmp;
 	struct fd f;
 	struct eventfd_ctx *eventfd = NULL, *resamplefd = NULL;
 	int ret;
@@ -340,7 +291,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	irqfd->eventfd = eventfd;
 
 	if (args->flags & KVM_IRQFD_FLAG_RESAMPLE) {
-		struct _irqfd_resampler *resampler;
+		struct kvm_kernel_irqfd_resampler *resampler;
 
 		resamplefd = eventfd_ctx_fdget(args->resamplefd);
 		if (IS_ERR(resamplefd)) {
@@ -525,7 +476,7 @@ kvm_eventfd_init(struct kvm *kvm)
 static int
 kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
 {
-	struct _irqfd *irqfd, *tmp;
+	struct kvm_kernel_irqfd *irqfd, *tmp;
 	struct eventfd_ctx *eventfd;
 
 	eventfd = eventfd_ctx_fdget(args->fd);
@@ -581,7 +532,7 @@ kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
 void
 kvm_irqfd_release(struct kvm *kvm)
 {
-	struct _irqfd *irqfd, *tmp;
+	struct kvm_kernel_irqfd *irqfd, *tmp;
 
 	spin_lock_irq(&kvm->irqfds.lock);
 
@@ -604,7 +555,7 @@ kvm_irqfd_release(struct kvm *kvm)
  */
 void kvm_irq_routing_update(struct kvm *kvm)
 {
-	struct _irqfd *irqfd;
+	struct kvm_kernel_irqfd *irqfd;
 
 	spin_lock_irq(&kvm->irqfds.lock);
 
-- 
2.1.0


* [PATCH v9 05/18] KVM: introduce kvm_arch functions for IRQ bypass
  2015-09-18 14:29 [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Feng Wu
                   ` (3 preceding siblings ...)
  2015-09-18 14:29 ` [PATCH v9 04/18] KVM: create kvm_irqfd.h Feng Wu
@ 2015-09-18 14:29 ` Feng Wu
  2015-09-18 14:29 ` [PATCH v9 06/18] KVM: eventfd: add irq bypass consumer management Feng Wu
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 56+ messages in thread
From: Feng Wu @ 2015-09-18 14:29 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

From: Eric Auger <eric.auger@linaro.org>

This patch introduces
- kvm_arch_irq_bypass_add_producer
- kvm_arch_irq_bypass_del_producer
- kvm_arch_irq_bypass_stop
- kvm_arch_irq_bypass_start

They make it possible to specialize the KVM IRQ bypass consumer
when CONFIG_HAVE_KVM_IRQ_BYPASS is set.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v4 -> v5:
- remove static inline stub functions

v2 -> v3 (Feng Wu):
- use 'kvm_arch_irq_bypass_start' instead of 'kvm_arch_irq_bypass_resume'
- Remove 'kvm_arch_irq_bypass_update', which does not need to be
  an irqbypass callback per Alex's comments.
- Make kvm_arch_irq_bypass_add_producer return 'int'

v1 -> v2:
- use CONFIG_KVM_HAVE_IRQ_BYPASS instead of CONFIG_IRQ_BYPASS_MANAGER
- rename all functions according to Paolo's proposal
- add kvm_arch_irq_bypass_update according to Feng's need

 include/linux/kvm_host.h | 10 ++++++++++
 virt/kvm/Kconfig         |  3 +++
 2 files changed, 13 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 05e99b8..5ac8d21 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -24,6 +24,7 @@
 #include <linux/err.h>
 #include <linux/irqflags.h>
 #include <linux/context_tracking.h>
+#include <linux/irqbypass.h>
 #include <asm/signal.h>
 
 #include <linux/kvm.h>
@@ -1151,5 +1152,14 @@ static inline void kvm_vcpu_set_dy_eligible(struct kvm_vcpu *vcpu, bool val)
 {
 }
 #endif /* CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT */
+
+#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
+int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *,
+			   struct irq_bypass_producer *);
+void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *,
+			   struct irq_bypass_producer *);
+void kvm_arch_irq_bypass_stop(struct irq_bypass_consumer *);
+void kvm_arch_irq_bypass_start(struct irq_bypass_consumer *);
+#endif /* CONFIG_HAVE_KVM_IRQ_BYPASS */
 #endif
 
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index e2c876d..9f8014d 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -47,3 +47,6 @@ config KVM_GENERIC_DIRTYLOG_READ_PROTECT
 config KVM_COMPAT
        def_bool y
        depends on COMPAT && !S390
+
+config HAVE_KVM_IRQ_BYPASS
+       bool
-- 
2.1.0


* [PATCH v9 06/18] KVM: eventfd: add irq bypass consumer management
  2015-09-18 14:29 [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Feng Wu
                   ` (4 preceding siblings ...)
  2015-09-18 14:29 ` [PATCH v9 05/18] KVM: introduce kvm_arch functions for IRQ bypass Feng Wu
@ 2015-09-18 14:29 ` Feng Wu
  2015-09-18 14:29 ` [PATCH v9 07/18] KVM: Extend struct pi_desc for VT-d Posted-Interrupts Feng Wu
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 56+ messages in thread
From: Feng Wu @ 2015-09-18 14:29 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

From: Eric Auger <eric.auger@linaro.org>

This patch adds the registration/unregistration of an
irq_bypass_consumer on irqfd assignment/deassignment.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v4 -> v5:
- due to removal of static inline stubs, add
  #ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
  around consumer registration/unregistration
- add pr_info when registration fails

v2 -> v3 (Feng Wu):
- Use kvm_arch_irq_bypass_start
- Remove kvm_arch_irq_bypass_update
- Add member 'struct irq_bypass_producer *producer' in
  'struct kvm_kernel_irqfd', it is needed by posted interrupt.
- Remove 'irq_bypass_unregister_consumer' in kvm_irqfd_deassign()

v1 -> v2:
- population of kvm and gsi removed
- unregister the consumer on irqfd_shutdown

 include/linux/kvm_irqfd.h |  2 ++
 virt/kvm/eventfd.c        | 15 +++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
index f926b39..0c1de05 100644
--- a/include/linux/kvm_irqfd.h
+++ b/include/linux/kvm_irqfd.h
@@ -64,6 +64,8 @@ struct kvm_kernel_irqfd {
 	struct list_head list;
 	poll_table pt;
 	struct work_struct shutdown;
+	struct irq_bypass_consumer consumer;
+	struct irq_bypass_producer *producer;
 };
 
 #endif /* __LINUX_KVM_IRQFD_H */
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 647ffb8..d7a230f 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -35,6 +35,7 @@
 #include <linux/srcu.h>
 #include <linux/slab.h>
 #include <linux/seqlock.h>
+#include <linux/irqbypass.h>
 #include <trace/events/kvm.h>
 
 #include <kvm/iodev.h>
@@ -140,6 +141,9 @@ irqfd_shutdown(struct work_struct *work)
 	/*
 	 * It is now safe to release the object's resources
 	 */
+#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
+	irq_bypass_unregister_consumer(&irqfd->consumer);
+#endif
 	eventfd_ctx_put(irqfd->eventfd);
 	kfree(irqfd);
 }
@@ -379,6 +383,17 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	 * we might race against the POLLHUP
 	 */
 	fdput(f);
+#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
+	irqfd->consumer.token = (void *)irqfd->eventfd;
+	irqfd->consumer.add_producer = kvm_arch_irq_bypass_add_producer;
+	irqfd->consumer.del_producer = kvm_arch_irq_bypass_del_producer;
+	irqfd->consumer.stop = kvm_arch_irq_bypass_stop;
+	irqfd->consumer.start = kvm_arch_irq_bypass_start;
+	ret = irq_bypass_register_consumer(&irqfd->consumer);
+	if (ret)
+		pr_info("irq bypass consumer (token %p) registration fails: %d\n",
+				irqfd->consumer.token, ret);
+#endif
 
 	return 0;
 
-- 
2.1.0


* [PATCH v9 07/18] KVM: Extend struct pi_desc for VT-d Posted-Interrupts
  2015-09-18 14:29 [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Feng Wu
                   ` (5 preceding siblings ...)
  2015-09-18 14:29 ` [PATCH v9 06/18] KVM: eventfd: add irq bypass consumer management Feng Wu
@ 2015-09-18 14:29 ` Feng Wu
  2015-09-18 14:29 ` [PATCH v9 08/18] KVM: Add some helper functions for Posted-Interrupts Feng Wu
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 56+ messages in thread
From: Feng Wu @ 2015-09-18 14:29 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

Extend struct pi_desc for VT-d Posted-Interrupts.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 arch/x86/kvm/vmx.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 83b7b5c..271dd70 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -446,8 +446,24 @@ struct nested_vmx {
 /* Posted-Interrupt Descriptor */
 struct pi_desc {
 	u32 pir[8];     /* Posted interrupt requested */
-	u32 control;	/* bit 0 of control is outstanding notification bit */
-	u32 rsvd[7];
+	union {
+		struct {
+				/* bit 256 - Outstanding Notification */
+			u16	on	: 1,
+				/* bit 257 - Suppress Notification */
+				sn	: 1,
+				/* bit 271:258 - Reserved */
+				rsvd_1	: 14;
+				/* bit 279:272 - Notification Vector */
+			u8	nv;
+				/* bit 287:280 - Reserved */
+			u8	rsvd_2;
+				/* bit 319:288 - Notification Destination */
+			u32	ndst;
+		};
+		u64 control;
+	};
+	u32 rsvd[6];
 } __aligned(64);
 
 static bool pi_test_and_set_on(struct pi_desc *pi_desc)
-- 
2.1.0
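
Editorial note (not part of the patch): the layout added above can be
checked in userspace with a small mirror of the structure. This is only
a sketch; the names pi_desc_sketch and pi_control_word are hypothetical,
and the bit positions of the bit-fields assume a little-endian target
with GCC/Clang bit-field allocation (as on x86).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of the extended descriptor; not the kernel definition. */
struct pi_desc_sketch {
	uint32_t pir[8];			/* bits 255:0 - posted requests */
	union {
		struct {
			uint16_t on     : 1,	/* bit 256 */
				 sn     : 1,	/* bit 257 */
				 rsvd_1 : 14;	/* bits 271:258 */
			uint8_t  nv;		/* bits 279:272 */
			uint8_t  rsvd_2;	/* bits 287:280 */
			uint32_t ndst;		/* bits 319:288 */
		};
		uint64_t control;
	};
	uint32_t rsvd[6];
} __attribute__((aligned(64)));

/* Compile-time checks that the union did not change the overall layout. */
static_assert(sizeof(struct pi_desc_sketch) == 64, "descriptor is 64 bytes");
static_assert(offsetof(struct pi_desc_sketch, control) == 32, "control follows pir");
static_assert(offsetof(struct pi_desc_sketch, nv) == 34, "nv is byte 2 of control");
static_assert(offsetof(struct pi_desc_sketch, ndst) == 36, "ndst is the upper half");

/* Fill the named fields and return the raw 64-bit view of the same bytes. */
static uint64_t pi_control_word(unsigned int sn, uint8_t nv, uint32_t ndst)
{
	struct pi_desc_sketch d;

	memset(&d, 0, sizeof(d));
	d.sn = sn & 1;
	d.nv = nv;
	d.ndst = ndst;
	return d.control;
}
```

The union is what lets later patches in the series use bit operations on
'control' while still naming 'nv' and 'ndst' directly; rsvd[] shrinks
from 7 to 6 words because 'control' grew from 32 to 64 bits.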


* [PATCH v9 08/18] KVM: Add some helper functions for Posted-Interrupts
  2015-09-18 14:29 [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Feng Wu
                   ` (6 preceding siblings ...)
  2015-09-18 14:29 ` [PATCH v9 07/18] KVM: Extend struct pi_desc for VT-d Posted-Interrupts Feng Wu
@ 2015-09-18 14:29 ` Feng Wu
  2015-09-18 14:29 ` [PATCH v9 09/18] KVM: Define a new interface kvm_intr_is_single_vcpu() Feng Wu
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 56+ messages in thread
From: Feng Wu @ 2015-09-18 14:29 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

This patch adds some helper functions to manipulate the
Posted-Interrupts Descriptor.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 271dd70..316f9bf 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -443,6 +443,8 @@ struct nested_vmx {
 };
 
 #define POSTED_INTR_ON  0
+#define POSTED_INTR_SN  1
+
 /* Posted-Interrupt Descriptor */
 struct pi_desc {
 	u32 pir[8];     /* Posted interrupt requested */
@@ -483,6 +485,30 @@ static int pi_test_and_set_pir(int vector, struct pi_desc *pi_desc)
 	return test_and_set_bit(vector, (unsigned long *)pi_desc->pir);
 }
 
+static void pi_clear_sn(struct pi_desc *pi_desc)
+{
+	return clear_bit(POSTED_INTR_SN,
+			(unsigned long *)&pi_desc->control);
+}
+
+static void pi_set_sn(struct pi_desc *pi_desc)
+{
+	return set_bit(POSTED_INTR_SN,
+			(unsigned long *)&pi_desc->control);
+}
+
+static int pi_test_on(struct pi_desc *pi_desc)
+{
+	return test_bit(POSTED_INTR_ON,
+			(unsigned long *)&pi_desc->control);
+}
+
+static int pi_test_sn(struct pi_desc *pi_desc)
+{
+	return test_bit(POSTED_INTR_SN,
+			(unsigned long *)&pi_desc->control);
+}
+
 struct vcpu_vmx {
 	struct kvm_vcpu       vcpu;
 	unsigned long         host_rsp;
-- 
2.1.0
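
Editorial note (not part of the patch): the helpers above are atomic
single-bit operations on the descriptor's control word. A userspace
approximation, using C11 atomics in place of the kernel's set_bit(),
clear_bit() and test_bit(), might look as follows. All sketch_* names
and struct pi_control_sketch are hypothetical, and only the control
word is modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define POSTED_INTR_ON 0	/* Outstanding Notification */
#define POSTED_INTR_SN 1	/* Suppress Notification */

/* Only the 64-bit control word of the descriptor is modelled here. */
struct pi_control_sketch {
	_Atomic uint64_t control;
};

static void sketch_set_sn(struct pi_control_sketch *pi)
{
	atomic_fetch_or(&pi->control, UINT64_C(1) << POSTED_INTR_SN);
}

static void sketch_clear_sn(struct pi_control_sketch *pi)
{
	atomic_fetch_and(&pi->control, ~(UINT64_C(1) << POSTED_INTR_SN));
}

static bool sketch_test_on(struct pi_control_sketch *pi)
{
	return atomic_load(&pi->control) & (UINT64_C(1) << POSTED_INTR_ON);
}

static bool sketch_test_sn(struct pi_control_sketch *pi)
{
	return atomic_load(&pi->control) & (UINT64_C(1) << POSTED_INTR_SN);
}

/* Returns the previous value of ON, like a test-and-set bit operation. */
static bool sketch_test_and_set_on(struct pi_control_sketch *pi)
{
	uint64_t old = atomic_fetch_or(&pi->control,
				       UINT64_C(1) << POSTED_INTR_ON);

	return old & (UINT64_C(1) << POSTED_INTR_ON);
}
```

Atomicity matters because the descriptor is shared with hardware, which
may set ON or PIR bits while software is toggling SN.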


* [PATCH v9 09/18] KVM: Define a new interface kvm_intr_is_single_vcpu()
  2015-09-18 14:29 [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Feng Wu
                   ` (7 preceding siblings ...)
  2015-09-18 14:29 ` [PATCH v9 08/18] KVM: Add some helper functions for Posted-Interrupts Feng Wu
@ 2015-09-18 14:29 ` Feng Wu
  2015-09-18 14:29 ` [PATCH v9 10/18] KVM: Make struct kvm_irq_routing_table accessible Feng Wu
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 56+ messages in thread
From: Feng Wu @ 2015-09-18 14:29 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

This patch defines a new interface, kvm_intr_is_single_vcpu(),
which returns whether the interrupt targets a single vCPU.

It is used by VT-d PI, which currently supports only single-CPU
interrupts. For lowest-priority interrupts, if the user configures
them via /proc/irq or uses irqbalance to make them single-CPU, we
can use PI to deliver them. Full lowest-priority support will be
added later.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v9:
- Move kvm_intr_is_single_vcpu_fast() to lapic.c
- Remove incorrect WARN_ON_ONCE()

v8:
- Some optimizations in kvm_intr_is_single_vcpu().
- Expose kvm_intr_is_single_vcpu() so we can use it in vmx code.
- Add kvm_intr_is_single_vcpu_fast() as the fast path to find
  the target vCPU for the single-destination interrupt

 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/irq_comm.c         | 27 +++++++++++++++++++
 arch/x86/kvm/lapic.c            | 59 +++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/lapic.h            |  2 ++
 4 files changed, 91 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 49ec903..af11bca 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1204,4 +1204,7 @@ int __x86_set_memory_region(struct kvm *kvm,
 int x86_set_memory_region(struct kvm *kvm,
 			  const struct kvm_userspace_memory_region *mem);
 
+bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+			     struct kvm_vcpu **dest_vcpu);
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 9efff9e..f86a0da 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -297,6 +297,33 @@ out:
 	return r;
 }
 
+bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+			     struct kvm_vcpu **dest_vcpu)
+{
+	int i, r = 0;
+	struct kvm_vcpu *vcpu;
+
+	if (kvm_intr_is_single_vcpu_fast(kvm, irq, dest_vcpu))
+		return true;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!kvm_apic_present(vcpu))
+			continue;
+
+		if (!kvm_apic_match_dest(vcpu, NULL, irq->shorthand,
+					irq->dest_id, irq->dest_mode))
+			continue;
+
+		if (++r == 2)
+			return false;
+
+		*dest_vcpu = vcpu;
+	}
+
+	return r == 1;
+}
+EXPORT_SYMBOL_GPL(kvm_intr_is_single_vcpu);
+
 #define IOAPIC_ROUTING_ENTRY(irq) \
 	{ .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP,	\
 	  .u.irqchip = { .irqchip = KVM_IRQCHIP_IOAPIC, .pin = (irq) } }
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 2a5ca97..3c8fc71 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -764,6 +764,65 @@ out:
 	return ret;
 }
 
+bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
+			struct kvm_vcpu **dest_vcpu)
+{
+	struct kvm_apic_map *map;
+	bool ret = false;
+	struct kvm_lapic *dst = NULL;
+
+	if (irq->shorthand)
+		return false;
+
+	rcu_read_lock();
+	map = rcu_dereference(kvm->arch.apic_map);
+
+	if (!map)
+		goto out;
+
+	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
+		if (irq->dest_id == 0xFF)
+			goto out;
+
+		if (irq->dest_id >= ARRAY_SIZE(map->phys_map))
+			goto out;
+
+		dst = map->phys_map[irq->dest_id];
+		if (dst && kvm_apic_present(dst->vcpu))
+			*dest_vcpu = dst->vcpu;
+		else
+			goto out;
+	} else {
+		u16 cid;
+		unsigned long bitmap = 1;
+		int i, r = 0;
+
+		if (!kvm_apic_logical_map_valid(map))
+			goto out;
+
+		apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap);
+
+		if (cid >= ARRAY_SIZE(map->logical_map))
+			goto out;
+
+		for_each_set_bit(i, &bitmap, 16) {
+			dst = map->logical_map[cid][i];
+			if (++r == 2)
+				goto out;
+		}
+
+		if (dst && kvm_apic_present(dst->vcpu))
+			*dest_vcpu = dst->vcpu;
+		else
+			goto out;
+	}
+
+	ret = true;
+out:
+	rcu_read_unlock();
+	return ret;
+}
+
 /*
  * Add a pending IRQ into lapic.
  * Return 1 if successfully added and 0 if discarded.
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 7195274..032fe2d 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -169,4 +169,6 @@ bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
 
 void wait_lapic_expire(struct kvm_vcpu *vcpu);
 
+bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
+			struct kvm_vcpu **dest_vcpu);
 #endif
-- 
2.1.0
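
Editorial note (not part of the patch): the slow path above counts
matching vCPUs and gives up as soon as a second match is found. The
same idea can be sketched in userspace as below. The types and the
matching rule are simplified assumptions (flat logical mode only, no
shorthands, 0xff as physical broadcast); vcpu_sketch, match_dest and
intr_is_single_vcpu are hypothetical names, not KVM interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct vcpu_sketch {
	bool apic_present;
	uint8_t apic_id;	/* physical APIC ID */
	uint8_t logical_id;	/* flat-mode logical ID bitmask */
};

enum dest_mode_sketch { DEST_PHYSICAL, DEST_LOGICAL };

/* Simplified stand-in for the destination match done by the local APIC. */
static bool match_dest(const struct vcpu_sketch *v,
		       enum dest_mode_sketch mode, uint8_t dest_id)
{
	if (mode == DEST_PHYSICAL)
		return dest_id == 0xff || v->apic_id == dest_id;
	return (v->logical_id & dest_id) != 0;
}

/*
 * Returns true and sets *dest only if exactly one present vCPU matches.
 * The scan stops early once a second match is seen.
 */
static bool intr_is_single_vcpu(struct vcpu_sketch *vcpus, size_t n,
				enum dest_mode_sketch mode, uint8_t dest_id,
				struct vcpu_sketch **dest)
{
	size_t i;
	int matches = 0;

	for (i = 0; i < n; i++) {
		if (!vcpus[i].apic_present)
			continue;
		if (!match_dest(&vcpus[i], mode, dest_id))
			continue;
		if (++matches == 2)
			return false;
		*dest = &vcpus[i];
	}

	return matches == 1;
}
```

As in the patch, *dest may have been written even when the function
returns false, so callers should only use it on a true return.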


* [PATCH v9 10/18] KVM: Make struct kvm_irq_routing_table accessible
  2015-09-18 14:29 [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Feng Wu
                   ` (8 preceding siblings ...)
  2015-09-18 14:29 ` [PATCH v9 09/18] KVM: Define a new interface kvm_intr_is_single_vcpu() Feng Wu
@ 2015-09-18 14:29 ` Feng Wu
  2015-09-18 14:29 ` [PATCH v9 11/18] KVM: make kvm_set_msi_irq() public Feng Wu
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 56+ messages in thread
From: Feng Wu @ 2015-09-18 14:29 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

Move struct kvm_irq_routing_table from irqchip.c to kvm_host.h,
so we can use it outside of irqchip.c.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/linux/kvm_host.h | 14 ++++++++++++++
 virt/kvm/irqchip.c       | 10 ----------
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5ac8d21..5f183fb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -328,6 +328,20 @@ struct kvm_kernel_irq_routing_entry {
 	struct hlist_node link;
 };
 
+#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
+
+struct kvm_irq_routing_table {
+	int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
+	u32 nr_rt_entries;
+	/*
+	 * Array indexed by gsi. Each entry contains list of irq chips
+	 * the gsi is connected to.
+	 */
+	struct hlist_head map[0];
+};
+
+#endif
+
 #ifndef KVM_PRIVATE_MEM_SLOTS
 #define KVM_PRIVATE_MEM_SLOTS 0
 #endif
diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index 21c1424..2cf45d3 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -31,16 +31,6 @@
 #include <trace/events/kvm.h>
 #include "irq.h"
 
-struct kvm_irq_routing_table {
-	int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
-	u32 nr_rt_entries;
-	/*
-	 * Array indexed by gsi. Each entry contains list of irq chips
-	 * the gsi is connected to.
-	 */
-	struct hlist_head map[0];
-};
-
 int kvm_irq_map_gsi(struct kvm *kvm,
 		    struct kvm_kernel_irq_routing_entry *entries, int gsi)
 {
-- 
2.1.0


* [PATCH v9 11/18] KVM: make kvm_set_msi_irq() public
  2015-09-18 14:29 [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Feng Wu
                   ` (9 preceding siblings ...)
  2015-09-18 14:29 ` [PATCH v9 10/18] KVM: Make struct kvm_irq_routing_table accessible Feng Wu
@ 2015-09-18 14:29 ` Feng Wu
  2015-09-18 14:29 ` [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer Feng Wu
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 56+ messages in thread
From: Feng Wu @ 2015-09-18 14:29 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

Make kvm_set_msi_irq() public so that it can be used outside irq_comm.c.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
v8:
- Export kvm_set_msi_irq() so we can use it in vmx code

 arch/x86/include/asm/kvm_host.h | 4 ++++
 arch/x86/kvm/irq_comm.c         | 5 +++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index af11bca..daa6126 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -175,6 +175,8 @@ enum {
  */
 #define KVM_APIC_PV_EOI_PENDING	1
 
+struct kvm_kernel_irq_routing_entry;
+
 /*
  * We don't want allocation failures within the mmu code, so we preallocate
  * enough memory for a single page fault in a cache.
@@ -1207,4 +1209,6 @@ int x86_set_memory_region(struct kvm *kvm,
 bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
 			     struct kvm_vcpu **dest_vcpu);
 
+void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+		     struct kvm_lapic_irq *irq);
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index f86a0da..4f6fa67 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -91,8 +91,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 	return r;
 }
 
-static inline void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
-				   struct kvm_lapic_irq *irq)
+void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+		     struct kvm_lapic_irq *irq)
 {
 	trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
 
@@ -108,6 +108,7 @@ static inline void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
 	irq->level = 1;
 	irq->shorthand = 0;
 }
+EXPORT_SYMBOL_GPL(kvm_set_msi_irq);
 
 int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
 		struct kvm *kvm, int irq_source_id, int level, bool line_status)
-- 
2.1.0


* [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer
  2015-09-18 14:29 [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Feng Wu
                   ` (10 preceding siblings ...)
  2015-09-18 14:29 ` [PATCH v9 11/18] KVM: make kvm_set_msi_irq() public Feng Wu
@ 2015-09-18 14:29 ` Feng Wu
  2015-09-18 17:19   ` Alex Williamson
                     ` (2 more replies)
  2015-09-18 14:29 ` [PATCH v9 13/18] KVM: x86: Update IRTE for posted-interrupts Feng Wu
                   ` (7 subsequent siblings)
  19 siblings, 3 replies; 56+ messages in thread
From: Feng Wu @ 2015-09-18 14:29 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

This patch adds the registration/unregistration of an
irq_bypass_producer for MSI/MSI-X on vfio pci devices.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v8:
- Merge "[PATCH v7 08/17] vfio: Select IRQ_BYPASS_MANAGER for vfio PCI devices"
  into this patch.

v6:
- Make the add_consumer and del_consumer callbacks static
- Remove pointless INIT_LIST_HEAD to 'vdev->ctx[vector].producer.node)'
- Use dev_info instead of WARN_ON() when irq_bypass_register_producer fails
- Remove optional dummy callbacks for irq producer

 drivers/vfio/pci/Kconfig            | 1 +
 drivers/vfio/pci/vfio_pci_intrs.c   | 9 +++++++++
 drivers/vfio/pci/vfio_pci_private.h | 2 ++
 3 files changed, 12 insertions(+)

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 579d83b..02912f1 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -2,6 +2,7 @@ config VFIO_PCI
 	tristate "VFIO support for PCI devices"
 	depends on VFIO && PCI && EVENTFD
 	select VFIO_VIRQFD
+	select IRQ_BYPASS_MANAGER
 	help
 	  Support for the PCI VFIO bus driver.  This is required to make
 	  use of PCI drivers using the VFIO framework.
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 1f577b4..c65299d 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -319,6 +319,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 
 	if (vdev->ctx[vector].trigger) {
 		free_irq(irq, vdev->ctx[vector].trigger);
+		irq_bypass_unregister_producer(&vdev->ctx[vector].producer);
 		kfree(vdev->ctx[vector].name);
 		eventfd_ctx_put(vdev->ctx[vector].trigger);
 		vdev->ctx[vector].trigger = NULL;
@@ -360,6 +361,14 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 		return ret;
 	}
 
+	vdev->ctx[vector].producer.token = trigger;
+	vdev->ctx[vector].producer.irq = irq;
+	ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
+	if (unlikely(ret))
+		dev_info(&pdev->dev,
+		"irq bypass producer (token %p) registration fails: %d\n",
+		vdev->ctx[vector].producer.token, ret);
+
 	vdev->ctx[vector].trigger = trigger;
 
 	return 0;
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index ae0e1b4..0e7394f 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -13,6 +13,7 @@
 
 #include <linux/mutex.h>
 #include <linux/pci.h>
+#include <linux/irqbypass.h>
 
 #ifndef VFIO_PCI_PRIVATE_H
 #define VFIO_PCI_PRIVATE_H
@@ -29,6 +30,7 @@ struct vfio_pci_irq_ctx {
 	struct virqfd		*mask;
 	char			*name;
 	bool			masked;
+	struct irq_bypass_producer	producer;
 };
 
 struct vfio_pci_device {
-- 
2.1.0
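
Editorial note (not part of the patch): the producer registered here
and the consumer registered by the KVM irqfd code both use the eventfd
context as their token, which is how the two sides find each other. The
pairing idea can be sketched in userspace as below. This is a toy model
under stated assumptions, not the kernel's IRQ bypass manager: the
fixed-size tables, the sketch_* names and the error codes chosen are
all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ENDPOINTS 8

struct consumer_sketch;

struct producer_sketch {
	void *token;			/* shared key, e.g. an eventfd context */
	int irq;			/* host IRQ behind this producer */
	struct consumer_sketch *peer;	/* set once a consumer matches */
};

struct consumer_sketch {
	void *token;
	struct producer_sketch *peer;	/* set once a producer matches */
};

static struct producer_sketch *producers[MAX_ENDPOINTS];
static struct consumer_sketch *consumers[MAX_ENDPOINTS];

/* Register a producer and connect it to a consumer with the same token. */
static int sketch_register_producer(struct producer_sketch *p)
{
	int i, slot = -1;

	for (i = 0; i < MAX_ENDPOINTS; i++) {
		if (!producers[i]) {
			if (slot < 0)
				slot = i;
		} else if (producers[i]->token == p->token) {
			return -EBUSY;	/* one producer per token */
		}
	}
	if (slot < 0)
		return -ENOMEM;

	producers[slot] = p;
	p->peer = NULL;
	for (i = 0; i < MAX_ENDPOINTS; i++) {
		if (consumers[i] && consumers[i]->token == p->token) {
			p->peer = consumers[i];
			consumers[i]->peer = p;
			break;
		}
	}
	return 0;
}

/* Register a consumer and connect it to a producer with the same token. */
static int sketch_register_consumer(struct consumer_sketch *c)
{
	int i, slot = -1;

	for (i = 0; i < MAX_ENDPOINTS; i++) {
		if (!consumers[i]) {
			if (slot < 0)
				slot = i;
		} else if (consumers[i]->token == c->token) {
			return -EBUSY;	/* one consumer per token */
		}
	}
	if (slot < 0)
		return -ENOMEM;

	consumers[slot] = c;
	c->peer = NULL;
	for (i = 0; i < MAX_ENDPOINTS; i++) {
		if (producers[i] && producers[i]->token == c->token) {
			c->peer = producers[i];
			producers[i]->peer = c;
			break;
		}
	}
	return 0;
}
```

Registration order does not matter in this model: whichever side
arrives second completes the pairing. In the patch, a registration
failure is only logged, so the interrupt keeps working through the
normal eventfd path without bypass.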


* [PATCH v9 13/18] KVM: x86: Update IRTE for posted-interrupts
  2015-09-18 14:29 [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Feng Wu
                   ` (11 preceding siblings ...)
  2015-09-18 14:29 ` [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer Feng Wu
@ 2015-09-18 14:29 ` Feng Wu
  2015-09-18 14:29 ` [PATCH v9 14/18] KVM: Implement IRQ bypass consumer callbacks for x86 Feng Wu
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 56+ messages in thread
From: Feng Wu @ 2015-09-18 14:29 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

This patch adds a routine that updates the IRTE for posted interrupts
when the guest changes the interrupt configuration.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v9:
- Check !kvm_arch_has_assigned_device(kvm) first then
  !irq_remapping_cap(IRQ_POSTING_CAP)

v8:
- Move 'kvm_arch_update_pi_irte' to vmx.c as a callback
- Only update the PI irte when VM has assigned devices
- Add a trace point for VT-d posted-interrupts when we update
  or disable it for a specific irq.

 arch/x86/include/asm/kvm_host.h |  3 ++
 arch/x86/kvm/trace.h            | 33 ++++++++++++++++
 arch/x86/kvm/vmx.c              | 83 +++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c              |  2 +
 4 files changed, 121 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index daa6126..8c44286 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -862,6 +862,9 @@ struct kvm_x86_ops {
 					   gfn_t offset, unsigned long mask);
 	/* pmu operations of sub-arch */
 	const struct kvm_pmu_ops *pmu_ops;
+
+	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
+			      uint32_t guest_irq, bool set);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 4eae7c3..539a9e4 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -974,6 +974,39 @@ TRACE_EVENT(kvm_enter_smm,
 		  __entry->smbase)
 );
 
+/*
+ * Tracepoint for VT-d posted-interrupts.
+ */
+TRACE_EVENT(kvm_pi_irte_update,
+	TP_PROTO(unsigned int vcpu_id, unsigned int gsi,
+		 unsigned int gvec, u64 pi_desc_addr, bool set),
+	TP_ARGS(vcpu_id, gsi, gvec, pi_desc_addr, set),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	vcpu_id		)
+		__field(	unsigned int,	gsi		)
+		__field(	unsigned int,	gvec		)
+		__field(	u64,		pi_desc_addr	)
+		__field(	bool,		set		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id	= vcpu_id;
+		__entry->gsi		= gsi;
+		__entry->gvec		= gvec;
+		__entry->pi_desc_addr	= pi_desc_addr;
+		__entry->set		= set;
+	),
+
+	TP_printk("VT-d PI is %s for this irq, vcpu %u, gsi: 0x%x, "
+		  "gvec: 0x%x, pi_desc_addr: 0x%llx",
+		  __entry->set ? "enabled and being updated" : "disabled",
+		  __entry->vcpu_id,
+		  __entry->gsi,
+		  __entry->gvec,
+		  __entry->pi_desc_addr)
+);
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 316f9bf..11bda72 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -45,6 +45,7 @@
 #include <asm/debugreg.h>
 #include <asm/kexec.h>
 #include <asm/apic.h>
+#include <asm/irq_remapping.h>
 
 #include "trace.h"
 #include "pmu.h"
@@ -605,6 +606,11 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
 	return container_of(vcpu, struct vcpu_vmx, vcpu);
 }
 
+struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
+{
+	return &(to_vmx(vcpu)->pi_desc);
+}
+
 #define VMCS12_OFFSET(x) offsetof(struct vmcs12, x)
 #define FIELD(number, name)	[number] = VMCS12_OFFSET(name)
 #define FIELD64(number, name)	[number] = VMCS12_OFFSET(name), \
@@ -10344,6 +10350,81 @@ static void vmx_enable_log_dirty_pt_masked(struct kvm *kvm,
 	kvm_mmu_clear_dirty_pt_masked(kvm, memslot, offset, mask);
 }
 
+/*
+ * vmx_update_pi_irte - set IRTE for Posted-Interrupts
+ *
+ * @kvm: kvm
+ * @host_irq: host irq of the interrupt
+ * @guest_irq: gsi of the interrupt
+ * @set: set or unset PI
+ * returns 0 on success, < 0 on failure
+ */
+int vmx_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
+		       uint32_t guest_irq, bool set)
+{
+	struct kvm_kernel_irq_routing_entry *e;
+	struct kvm_irq_routing_table *irq_rt;
+	struct kvm_lapic_irq irq;
+	struct kvm_vcpu *vcpu;
+	struct vcpu_data vcpu_info;
+	int idx, ret = -EINVAL;
+
+	if (!kvm_arch_has_assigned_device(kvm) ||
+		!irq_remapping_cap(IRQ_POSTING_CAP))
+		return 0;
+
+	idx = srcu_read_lock(&kvm->irq_srcu);
+	irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
+	BUG_ON(guest_irq >= irq_rt->nr_rt_entries);
+
+	hlist_for_each_entry(e, &irq_rt->map[guest_irq], link) {
+		if (e->type != KVM_IRQ_ROUTING_MSI)
+			continue;
+		/*
+		 * VT-d PI cannot support posting multicast/broadcast
+		 * interrupts to a vCPU, we still use interrupt remapping
+		 * for these kind of interrupts.
+		 *
+		 * For lowest-priority interrupts, we only support
+		 * those with single CPU as the destination, e.g. user
+		 * configures the interrupts via /proc/irq or uses
+		 * irqbalance to make the interrupts single-CPU.
+		 *
+		 * We will support full lowest-priority interrupt later.
+		 */
+
+		kvm_set_msi_irq(e, &irq);
+		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
+			continue;
+
+		vcpu_info.pi_desc_addr = __pa(vcpu_to_pi_desc(vcpu));
+		vcpu_info.vector = irq.vector;
+
+		trace_kvm_pi_irte_update(vcpu->vcpu_id, e->gsi,
+				vcpu_info.vector, vcpu_info.pi_desc_addr, set);
+
+		if (set)
+			ret = irq_set_vcpu_affinity(host_irq, &vcpu_info);
+		else {
+			/* suppress notification event before unposting */
+			pi_set_sn(vcpu_to_pi_desc(vcpu));
+			ret = irq_set_vcpu_affinity(host_irq, NULL);
+			pi_clear_sn(vcpu_to_pi_desc(vcpu));
+		}
+
+		if (ret < 0) {
+			printk(KERN_INFO "%s: failed to update PI IRTE\n",
+					__func__);
+			goto out;
+		}
+	}
+
+	ret = 0;
+out:
+	srcu_read_unlock(&kvm->irq_srcu, idx);
+	return ret;
+}
+
 static struct kvm_x86_ops vmx_x86_ops = {
 	.cpu_has_kvm_support = cpu_has_kvm_support,
 	.disabled_by_bios = vmx_disabled_by_bios,
@@ -10461,6 +10542,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,
 
 	.pmu_ops = &intel_pmu_ops,
+
+	.update_pi_irte = vmx_update_pi_irte,
 };
 
 static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5ef2560..9dcd501 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -63,6 +63,7 @@
 #include <asm/fpu/internal.h> /* Ugh! */
 #include <asm/pvclock.h>
 #include <asm/div64.h>
+#include <asm/irq_remapping.h>
 
 #define MAX_IO_MSRS 256
 #define KVM_MAX_MCE_BANKS 32
@@ -8263,3 +8264,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intercepts);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_write_tsc_offset);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_ple_window);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pml_full);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pi_irte_update);
-- 
2.1.0



* [PATCH v9 14/18] KVM: Implement IRQ bypass consumer callbacks for x86
  2015-09-18 14:29 [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Feng Wu
                   ` (12 preceding siblings ...)
  2015-09-18 14:29 ` [PATCH v9 13/18] KVM: x86: Update IRTE for posted-interrupts Feng Wu
@ 2015-09-18 14:29 ` Feng Wu
  2015-09-18 14:29 ` [PATCH v9 15/18] KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd' Feng Wu
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 56+ messages in thread
From: Feng Wu @ 2015-09-18 14:29 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

Implement the following callbacks for x86:

- kvm_arch_irq_bypass_add_producer
- kvm_arch_irq_bypass_del_producer
- kvm_arch_irq_bypass_stop: dummy callback
- kvm_arch_irq_bypass_start: dummy callback

and set CONFIG_HAVE_KVM_IRQ_BYPASS for x86.
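
For illustration only (not part of the patch): a user-space sketch of how
the add_producer callback recovers its irqfd from the embedded consumer and
defers to the vendor hook. The struct layouts, the fake_ops table and
stub_update_pi_irte() are simplified, hypothetical stand-ins, not the kernel
definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumptions). */
struct irq_bypass_producer { int irq; };
struct irq_bypass_consumer { void *token; };

struct kvm_kernel_irqfd {
	int gsi;
	struct irq_bypass_consumer consumer;
	struct irq_bypass_producer *producer;
};

/* User-space equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical ops table; the hook is NULL when PI is unsupported. */
struct fake_x86_ops {
	int (*update_pi_irte)(int host_irq, int guest_irq, int set);
};

struct fake_x86_ops fake_ops;

/* Records the last request so a caller can inspect it. */
int last_host_irq = -1, last_guest_irq = -1, last_set = -1;

int stub_update_pi_irte(int host_irq, int guest_irq, int set)
{
	last_host_irq = host_irq;
	last_guest_irq = guest_irq;
	last_set = set;
	return 0;
}

int add_producer(struct irq_bypass_consumer *cons,
		 struct irq_bypass_producer *prod)
{
	/* The consumer is embedded in the irqfd, so walk back to it. */
	struct kvm_kernel_irqfd *irqfd =
		container_of(cons, struct kvm_kernel_irqfd, consumer);

	if (!fake_ops.update_pi_irte)
		return -EINVAL;

	irqfd->producer = prod;
	return fake_ops.update_pi_irte(prod->irq, irqfd->gsi, 1);
}
```

With no hook installed the sketch returns -EINVAL and leaves the irqfd
untouched; once a hook is installed, the producer is remembered and the
hook is asked to switch the interrupt to posted mode.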

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v8:
- Move the weak irq bypass stop and irq bypass start to this patch.
- Call kvm_x86_ops->update_pi_irte() instead of kvm_arch_update_pi_irte().

 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/Kconfig            |  1 +
 arch/x86/kvm/x86.c              | 44 +++++++++++++++++++++++++++++++++++++++++
 virt/kvm/eventfd.c              | 12 +++++++++++
 4 files changed, 58 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8c44286..0ddd353 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -24,6 +24,7 @@
 #include <linux/perf_event.h>
 #include <linux/pvclock_gtod.h>
 #include <linux/clocksource.h>
+#include <linux/irqbypass.h>
 
 #include <asm/pvclock-abi.h>
 #include <asm/desc.h>
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index c951d44..b90776f 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -30,6 +30,7 @@ config KVM
 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_IRQFD
 	select IRQ_BYPASS_MANAGER
+	select HAVE_KVM_IRQ_BYPASS
 	select HAVE_KVM_IRQ_ROUTING
 	select HAVE_KVM_EVENTFD
 	select KVM_APIC_ARCHITECTURE
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9dcd501..79dac02 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -50,6 +50,8 @@
 #include <linux/pci.h>
 #include <linux/timekeeper_internal.h>
 #include <linux/pvclock_gtod.h>
+#include <linux/kvm_irqfd.h>
+#include <linux/irqbypass.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -8249,6 +8251,48 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
 }
 EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);
 
+int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
+				      struct irq_bypass_producer *prod)
+{
+	struct kvm_kernel_irqfd *irqfd =
+		container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+	if (kvm_x86_ops->update_pi_irte) {
+		irqfd->producer = prod;
+		return kvm_x86_ops->update_pi_irte(irqfd->kvm,
+				prod->irq, irqfd->gsi, 1);
+	}
+
+	return -EINVAL;
+}
+
+void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
+				      struct irq_bypass_producer *prod)
+{
+	int ret;
+	struct kvm_kernel_irqfd *irqfd =
+		container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+	if (!kvm_x86_ops->update_pi_irte) {
+		WARN_ON(irqfd->producer != NULL);
+		return;
+	}
+
+	WARN_ON(irqfd->producer != prod);
+	irqfd->producer = NULL;
+
+	/*
+	 * When the producer of a consumer is unregistered, we change back
+	 * to remapped mode, so we can re-use the current implementation
+	 * when the irq is masked/disabled or the consumer side (KVM
+	 * in this case) doesn't want to receive the interrupts.
+	 */
+	ret = kvm_x86_ops->update_pi_irte(irqfd->kvm, prod->irq, irqfd->gsi, 0);
+	if (ret)
+		printk(KERN_INFO "irq bypass consumer (token %p) unregistration"
+		       " fails: %d\n", irqfd->consumer.token, ret);
+}
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index d7a230f..c0a56a1 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -256,6 +256,18 @@ static void irqfd_update(struct kvm *kvm, struct kvm_kernel_irqfd *irqfd)
 	write_seqcount_end(&irqfd->irq_entry_sc);
 }
 
+#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
+void __attribute__((weak)) kvm_arch_irq_bypass_stop(
+				struct irq_bypass_consumer *cons)
+{
+}
+
+void __attribute__((weak)) kvm_arch_irq_bypass_start(
+				struct irq_bypass_consumer *cons)
+{
+}
+#endif
+
 static int
 kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 {
-- 
2.1.0



* [PATCH v9 15/18] KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'
  2015-09-18 14:29 [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Feng Wu
                   ` (13 preceding siblings ...)
  2015-09-18 14:29 ` [PATCH v9 14/18] KVM: Implement IRQ bypass consumer callbacks for x86 Feng Wu
@ 2015-09-18 14:29 ` Feng Wu
  2015-09-18 14:29 ` [PATCH v9 16/18] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted Feng Wu
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 56+ messages in thread
From: Feng Wu @ 2015-09-18 14:29 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

This patch adds an arch-specific hook, kvm_arch_update_irqfd_routing(),
which is called for each 'struct kvm_kernel_irqfd' that has a producer
when the irq routing is updated. On the Intel side, it is used to
update the IRTE when VT-d posted-interrupts is used.
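
For illustration only (not part of the patch): a user-space sketch of the
routing-update walk. Every irqfd that has a producer attached gets the arch
hook called again so the posted-interrupt entry follows the new route. The
struct names, the array in place of the kernel list, and the always-succeeding
hook are assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins (assumptions), not the kernel types. */
struct producer { unsigned int irq; };
struct irqfd {
	unsigned int gsi;
	struct producer *producer;	/* NULL when no bypass is active */
};

int hook_calls;

/* Stand-in for the arch hook; counts calls and always succeeds. */
int arch_update_irqfd_routing(unsigned int host_irq,
			      unsigned int guest_irq, int set)
{
	(void)host_irq;
	(void)guest_irq;
	(void)set;
	hook_calls++;
	return 0;
}

/* Returns the number of irqfds whose update failed. */
int routing_update(struct irqfd *fds, size_t n)
{
	int failures = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* irqfds without a producer stay in remapped mode. */
		if (!fds[i].producer)
			continue;

		if (arch_update_irqfd_routing(fds[i].producer->irq,
					      fds[i].gsi, 1))
			failures++;
	}

	return failures;
}
```

The real code holds kvm->irqfds.lock around the walk and warns on failure;
the sketch only counts failures so that the caller can decide what to do.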

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v9:
- Use 'if' instead of "? :" in kvm_arch_update_irqfd_routing()
- coding style

v8:
- Remove callback .arch_update()
- Remove kvm_arch_irqfd_init()
- Call kvm_arch_update_irqfd_routing() instead.

 arch/x86/kvm/x86.c       |  9 +++++++++
 include/linux/kvm_host.h |  2 ++
 virt/kvm/eventfd.c       | 20 +++++++++++++++++++-
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 79dac02..58688aa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8293,6 +8293,15 @@ void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
 		       " fails: %d\n", irqfd->consumer.token, ret);
 }
 
+int kvm_arch_update_irqfd_routing(struct kvm *kvm, unsigned int host_irq,
+				   uint32_t guest_irq, bool set)
+{
+	if (!kvm_x86_ops->update_pi_irte)
+		return -EINVAL;
+
+	return kvm_x86_ops->update_pi_irte(kvm, host_irq, guest_irq, set);
+}
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5f183fb..feba1fb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1174,6 +1174,8 @@ void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *,
 			   struct irq_bypass_producer *);
 void kvm_arch_irq_bypass_stop(struct irq_bypass_consumer *);
 void kvm_arch_irq_bypass_start(struct irq_bypass_consumer *);
+int kvm_arch_update_irqfd_routing(struct kvm *kvm, unsigned int host_irq,
+				  uint32_t guest_irq, bool set);
 #endif /* CONFIG_HAVE_KVM_IRQ_BYPASS */
 #endif
 
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index c0a56a1..94306a3 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -266,6 +266,13 @@ void __attribute__((weak)) kvm_arch_irq_bypass_start(
 				struct irq_bypass_consumer *cons)
 {
 }
+
+int  __attribute__((weak)) kvm_arch_update_irqfd_routing(
+				struct kvm *kvm, unsigned int host_irq,
+				uint32_t guest_irq, bool set)
+{
+	return 0;
+}
 #endif
 
 static int
@@ -582,13 +589,24 @@ kvm_irqfd_release(struct kvm *kvm)
  */
 void kvm_irq_routing_update(struct kvm *kvm)
 {
+	int ret;
 	struct kvm_kernel_irqfd *irqfd;
 
 	spin_lock_irq(&kvm->irqfds.lock);
 
-	list_for_each_entry(irqfd, &kvm->irqfds.items, list)
+	list_for_each_entry(irqfd, &kvm->irqfds.items, list) {
 		irqfd_update(kvm, irqfd);
 
+#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
+		if (irqfd->producer) {
+			ret = kvm_arch_update_irqfd_routing(
+					irqfd->kvm, irqfd->producer->irq,
+					irqfd->gsi, 1);
+			WARN_ON(ret);
+		}
+#endif
+	}
+
 	spin_unlock_irq(&kvm->irqfds.lock);
 }
 
-- 
2.1.0



* [PATCH v9 16/18] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
  2015-09-18 14:29 [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Feng Wu
                   ` (14 preceding siblings ...)
  2015-09-18 14:29 ` [PATCH v9 15/18] KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd' Feng Wu
@ 2015-09-18 14:29 ` Feng Wu
  2015-09-18 14:29 ` [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked Feng Wu
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 56+ messages in thread
From: Feng Wu @ 2015-09-18 14:29 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

This patch updates the Posted-Interrupts Descriptor when vCPU
is preempted.

sched out:
- Set 'SN' to suppress future non-urgent interrupts posted for
the vCPU.

sched in:
- Clear 'SN'
- Change NDST if vCPU is scheduled to a different CPU
- Set 'NV' to POSTED_INTR_VECTOR
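
For illustration only (not part of the patch): a user-space sketch of the
sched-out/sched-in updates using C11 atomics in place of the kernel's
cmpxchg(). The bit positions of the control word (ON bit 0, SN bit 1, NV bits
16-23, NDST bits 32-63) and the two vector numbers are assumptions made for
the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout of the 64-bit control word. */
#define PI_ON		(UINT64_C(1) << 0)
#define PI_SN		(UINT64_C(1) << 1)
#define PI_NV_SHIFT	16
#define PI_NV_MASK	(UINT64_C(0xff) << PI_NV_SHIFT)
#define PI_NDST_SHIFT	32
#define PI_NDST_MASK	(UINT64_C(0xffffffff) << PI_NDST_SHIFT)

/* Illustrative vector numbers (assumptions). */
#define NOTIFY_VECTOR	0xf2
#define WAKEUP_VECTOR	0xf1

uint64_t pi_get_nv(uint64_t c)
{
	return (c & PI_NV_MASK) >> PI_NV_SHIFT;
}

uint64_t pi_get_ndst(uint64_t c)
{
	return (c & PI_NDST_MASK) >> PI_NDST_SHIFT;
}

/* sched out: suppress further non-urgent notifications. */
void pi_sched_out(_Atomic uint64_t *control)
{
	atomic_fetch_or(control, PI_SN);
}

/* sched in: retarget NDST if needed, restore NV, clear SN. */
void pi_sched_in(_Atomic uint64_t *control, int migrated,
		 uint32_t dest, int x2apic)
{
	uint64_t old = atomic_load(control);
	uint64_t next;

	do {
		next = old;

		/* A vCPU that is mid-block keeps its wakeup setup. */
		if (pi_get_nv(old) != WAKEUP_VECTOR) {
			if (migrated) {
				uint32_t ndst = x2apic ?
					dest : (dest << 8) & 0xFF00;

				next = (next & ~PI_NDST_MASK) |
				       ((uint64_t)ndst << PI_NDST_SHIFT);
			}
			next = (next & ~PI_NV_MASK) |
			       ((uint64_t)NOTIFY_VECTOR << PI_NV_SHIFT);
		}

		next &= ~PI_SN;
		/* On failure, 'old' is reloaded and the loop retries. */
	} while (!atomic_compare_exchange_weak(control, &old, next));
}
```

The retry loop matters because the hardware may set 'ON' concurrently; every
field change is recomputed from the freshly observed value, so a racing
update is not lost.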

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v9:
- Check !kvm_arch_has_assigned_device(vcpu->kvm) first, then
  !irq_remapping_cap(IRQ_POSTING_CAP)

v8:
- Add two wrapper functions vmx_vcpu_pi_load() and vmx_vcpu_pi_put().
- Only handle VT-d PI related logic when the VM has assigned devices.

 arch/x86/kvm/vmx.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 11bda72..902a67d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1943,6 +1943,52 @@ static void vmx_load_host_state(struct vcpu_vmx *vmx)
 	preempt_enable();
 }
 
+static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
+{
+	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+	struct pi_desc old, new;
+	unsigned int dest;
+
+	if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
+		!irq_remapping_cap(IRQ_POSTING_CAP))
+		return;
+
+	do {
+		old.control = new.control = pi_desc->control;
+
+		/*
+		 * If 'nv' field is POSTED_INTR_WAKEUP_VECTOR, there
+		 * are two possible cases:
+		 * 1. After running 'pre_block', a context switch
+		 *    happened. In this case, 'sn' was set in
+		 *    vmx_vcpu_put(), so we need to clear it here.
+		 * 2. After running 'pre_block', we were blocked
+		 *    and then woken up by another party. In this
+		 *    case we don't need to do anything, since
+		 *    'post_block' will do everything for us.
+		 *    However, we cannot tell case #1 from case #2
+		 *    here, so we clear 'sn' in both cases, which
+		 *    is harmless.
+		 */
+		if (pi_desc->nv != POSTED_INTR_WAKEUP_VECTOR) {
+			if (vcpu->cpu != cpu) {
+				dest = cpu_physical_id(cpu);
+
+				if (x2apic_enabled())
+					new.ndst = dest;
+				else
+					new.ndst = (dest << 8) & 0xFF00;
+			}
+
+			/* set 'NV' to 'notification vector' */
+			new.nv = POSTED_INTR_VECTOR;
+		}
+
+		/* Allow posting non-urgent interrupts */
+		new.sn = 0;
+	} while (cmpxchg(&pi_desc->control, old.control,
+			new.control) != old.control);
+}
 /*
  * Switches to specified vcpu, until a matching vcpu_put(), but assumes
  * vcpu mutex is already taken.
@@ -1993,10 +2039,27 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
 		vmx->loaded_vmcs->cpu = cpu;
 	}
+
+	vmx_vcpu_pi_load(vcpu, cpu);
+}
+
+static void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
+{
+	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+	if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
+		!irq_remapping_cap(IRQ_POSTING_CAP))
+		return;
+
+	/* Set SN when the vCPU is preempted */
+	if (vcpu->preempted)
+		pi_set_sn(pi_desc);
 }
 
 static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	vmx_vcpu_pi_put(vcpu);
+
 	__vmx_load_host_state(to_vmx(vcpu));
 	if (!vmm_exclusive) {
 		__loaded_vmcs_clear(to_vmx(vcpu)->loaded_vmcs);
@@ -4426,6 +4489,22 @@ static inline bool kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu)
 {
 #ifdef CONFIG_SMP
 	if (vcpu->mode == IN_GUEST_MODE) {
+		struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+		/*
+		 * Currently, we don't support urgent interrupts;
+		 * all interrupts are treated as non-urgent
+		 * interrupts, so we cannot post interrupts when
+		 * 'SN' is set.
+		 *
+		 * If the vcpu is in guest mode, it is running
+		 * rather than scheduled out and waiting in the
+		 * run queue. Being scheduled out is currently
+		 * the only case in which 'SN' is set, so warn
+		 * if 'SN' is set here.
+		 */
+		WARN_ON_ONCE(pi_test_sn(&vmx->pi_desc));
+
 		apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
 				POSTED_INTR_VECTOR);
 		return true;
-- 
2.1.0



* [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-09-18 14:29 [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Feng Wu
                   ` (15 preceding siblings ...)
  2015-09-18 14:29 ` [PATCH v9 16/18] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted Feng Wu
@ 2015-09-18 14:29 ` Feng Wu
  2015-09-18 16:06   ` Paolo Bonzini
  2015-10-14 23:41   ` David Matlack
  2015-09-18 14:29 ` [PATCH v9 18/18] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts Feng Wu
                   ` (2 subsequent siblings)
  19 siblings, 2 replies; 56+ messages in thread
From: Feng Wu @ 2015-09-18 14:29 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

This patch updates the Posted-Interrupts Descriptor when vCPU
is blocked.

pre-block:
- Add the vCPU to the blocked per-CPU list
- Set 'NV' to POSTED_INTR_WAKEUP_VECTOR

post-block:
- Remove the vCPU from the per-CPU list
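
For illustration only (not part of the patch): a user-space sketch of the
pre-block/post-block handshake. The per-CPU list is reduced to the pre_pcpu
marker, NDST handling is left out, and the bit layout and vector numbers are
assumptions made for the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout: ON is bit 0, NV occupies bits 16-23. */
#define PI_ON		(UINT64_C(1) << 0)
#define PI_NV_SHIFT	16
#define PI_NV_MASK	(UINT64_C(0xff) << PI_NV_SHIFT)

/* Illustrative vector numbers (assumptions). */
#define NOTIFY_VECTOR	0xf2
#define WAKEUP_VECTOR	0xf1

struct fake_vcpu {
	_Atomic uint64_t pi_control;
	int cpu;
	int pre_pcpu;	/* -1 when not on any blocked list */
};

uint64_t get_nv(uint64_t c)
{
	return (c & PI_NV_MASK) >> PI_NV_SHIFT;
}

uint64_t set_nv(uint64_t c, uint64_t vector)
{
	return (c & ~PI_NV_MASK) | (vector << PI_NV_SHIFT);
}

/* Returns 1 if the vCPU must not block, 0 if it may. */
int pre_block(struct fake_vcpu *v)
{
	uint64_t old, next;

	/* Stands in for adding the vCPU to the per-CPU list. */
	v->pre_pcpu = v->cpu;

	old = atomic_load(&v->pi_control);
	do {
		/* An interrupt is already posted: undo and bail out. */
		if (old & PI_ON) {
			v->pre_pcpu = -1;
			return 1;
		}
		next = set_nv(old, WAKEUP_VECTOR);
	} while (!atomic_compare_exchange_weak(&v->pi_control, &old, next));

	return 0;
}

void post_block(struct fake_vcpu *v)
{
	uint64_t old = atomic_load(&v->pi_control);
	uint64_t next;

	do {
		next = set_nv(old, NOTIFY_VECTOR);
	} while (!atomic_compare_exchange_weak(&v->pi_control, &old, next));

	/* Stands in for removing the vCPU from the per-CPU list. */
	v->pre_pcpu = -1;
}
```

Checking 'ON' inside the compare-and-swap loop is what closes the race: if
the hardware posts an interrupt between the list insertion and the NV
switch, the exchange fails, the loop re-reads the word and sees 'ON'.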

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v9:
- Add description for blocked_vcpu_on_cpu_lock in Documentation/virtual/kvm/locking.txt
- Check !kvm_arch_has_assigned_device(vcpu->kvm) first, then
  !irq_remapping_cap(IRQ_POSTING_CAP)

v8:
- Rename 'pi_pre_block' to 'pre_block'
- Rename 'pi_post_block' to 'post_block'
- Change some comments
- Only add the vCPU to the blocking list when the VM has assigned devices.

 Documentation/virtual/kvm/locking.txt |  12 +++
 arch/x86/include/asm/kvm_host.h       |  13 +++
 arch/x86/kvm/vmx.c                    | 153 ++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c                    |  53 +++++++++---
 include/linux/kvm_host.h              |   3 +
 virt/kvm/kvm_main.c                   |   3 +
 6 files changed, 227 insertions(+), 10 deletions(-)

diff --git a/Documentation/virtual/kvm/locking.txt b/Documentation/virtual/kvm/locking.txt
index d68af4d..19f94a6 100644
--- a/Documentation/virtual/kvm/locking.txt
+++ b/Documentation/virtual/kvm/locking.txt
@@ -166,3 +166,15 @@ Comment:	The srcu read lock must be held while accessing memslots (e.g.
 		MMIO/PIO address->device structure mapping (kvm->buses).
 		The srcu index can be stored in kvm_vcpu->srcu_idx per vcpu
 		if it is needed by multiple functions.
+
+Name:		blocked_vcpu_on_cpu_lock
+Type:		spinlock_t
+Arch:		x86
+Protects:	blocked_vcpu_on_cpu
+Comment:	This is a per-CPU lock used for VT-d posted-interrupts.
+		When VT-d posted-interrupts is supported and the VM has assigned
+		devices, we put the blocked vCPU on the per-CPU list
+		blocked_vcpu_on_cpu, protected by blocked_vcpu_on_cpu_lock.
+		When VT-d hardware issues a wakeup notification event because
+		an external interrupt from an assigned device has arrived, we
+		find the vCPU to wake up on this list.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0ddd353..304fbb5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -552,6 +552,8 @@ struct kvm_vcpu_arch {
 	 */
 	bool write_fault_to_shadow_pgtable;
 
+	bool halted;
+
 	/* set at EPT violation at this point */
 	unsigned long exit_qualification;
 
@@ -864,6 +866,17 @@ struct kvm_x86_ops {
 	/* pmu operations of sub-arch */
 	const struct kvm_pmu_ops *pmu_ops;
 
+	/*
+	 * Architecture specific hooks for vCPU blocking due to
+	 * HLT instruction.
+	 * Returns for .pre_block():
+	 *    - 0 means continue to block the vCPU.
+	 *    - 1 means we cannot block the vCPU since some event
+	 *        happens during this period, such as, 'ON' bit in
+	 *        posted-interrupts descriptor is set.
+	 */
+	int (*pre_block)(struct kvm_vcpu *vcpu);
+	void (*post_block)(struct kvm_vcpu *vcpu);
 	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
 			      uint32_t guest_irq, bool set);
 };
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 902a67d..9968896 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -879,6 +879,13 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
 static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
 static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
 
+/*
+ * We maintain a per-CPU linked list of blocked vCPUs, so in wakeup_handler()
+ * we can find which vCPU should be woken up.
+ */
+static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
+static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
+
 static unsigned long *vmx_io_bitmap_a;
 static unsigned long *vmx_io_bitmap_b;
 static unsigned long *vmx_msr_bitmap_legacy;
@@ -2985,6 +2992,8 @@ static int hardware_enable(void)
 		return -EBUSY;
 
 	INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
+	INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
+	spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
 
 	/*
 	 * Now we can enable the vmclear operation in kdump
@@ -6121,6 +6130,25 @@ static void update_ple_window_actual_max(void)
 			                    ple_window_grow, INT_MIN);
 }
 
+/*
+ * Handler for POSTED_INTR_WAKEUP_VECTOR.
+ */
+static void wakeup_handler(void)
+{
+	struct kvm_vcpu *vcpu;
+	int cpu = smp_processor_id();
+
+	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
+	list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
+			blocked_vcpu_list) {
+		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+		if (pi_test_on(pi_desc) == 1)
+			kvm_vcpu_kick(vcpu);
+	}
+	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
+}
+
 static __init int hardware_setup(void)
 {
 	int r = -ENOMEM, i, msr;
@@ -6305,6 +6333,8 @@ static __init int hardware_setup(void)
 		kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
 	}
 
+	kvm_set_posted_intr_wakeup_handler(wakeup_handler);
+
 	return alloc_kvm_area();
 
 out8:
@@ -10430,6 +10460,126 @@ static void vmx_enable_log_dirty_pt_masked(struct kvm *kvm,
 }
 
 /*
+ * This routine does the following things for vCPU which is going
+ * to be blocked if VT-d PI is enabled.
+ * - Store the vCPU to the wakeup list, so when interrupts happen
+ *   we can find the right vCPU to wake up.
+ * - Change the Posted-interrupt descriptor as below:
+ *      'NDST' <-- vcpu->pre_pcpu
+ *      'NV' <-- POSTED_INTR_WAKEUP_VECTOR
+ * - If 'ON' is set during this process, which means at least one
+ *   interrupt is posted for this vCPU, we cannot block it, in
+ *   this case, return 1, otherwise, return 0.
+ *
+ */
+static int vmx_pre_block(struct kvm_vcpu *vcpu)
+{
+	unsigned long flags;
+	unsigned int dest;
+	struct pi_desc old, new;
+	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+	if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
+		!irq_remapping_cap(IRQ_POSTING_CAP))
+		return 0;
+
+	vcpu->pre_pcpu = vcpu->cpu;
+	spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
+			  vcpu->pre_pcpu), flags);
+	list_add_tail(&vcpu->blocked_vcpu_list,
+		      &per_cpu(blocked_vcpu_on_cpu,
+		      vcpu->pre_pcpu));
+	spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
+			       vcpu->pre_pcpu), flags);
+
+	do {
+		old.control = new.control = pi_desc->control;
+
+		/*
+		 * We should not block the vCPU if
+		 * an interrupt is posted for it.
+		 */
+		if (pi_test_on(pi_desc) == 1) {
+			spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
+					  vcpu->pre_pcpu), flags);
+			list_del(&vcpu->blocked_vcpu_list);
+			spin_unlock_irqrestore(
+					&per_cpu(blocked_vcpu_on_cpu_lock,
+					vcpu->pre_pcpu), flags);
+			vcpu->pre_pcpu = -1;
+
+			return 1;
+		}
+
+		WARN((pi_desc->sn == 1),
+		     "Warning: SN field of posted-interrupts "
+		     "is set before blocking\n");
+
+		/*
+		 * Since the vCPU can be preempted during this process,
+		 * vcpu->cpu could be different from pre_pcpu. We
+		 * need to set pre_pcpu as the destination of the wakeup
+		 * notification event, so that we can find the right vCPU
+		 * to wake up in the wakeup handler if interrupts arrive
+		 * while the vCPU is in blocked state.
+		 */
+		dest = cpu_physical_id(vcpu->pre_pcpu);
+
+		if (x2apic_enabled())
+			new.ndst = dest;
+		else
+			new.ndst = (dest << 8) & 0xFF00;
+
+		/* set 'NV' to 'wakeup vector' */
+		new.nv = POSTED_INTR_WAKEUP_VECTOR;
+	} while (cmpxchg(&pi_desc->control, old.control,
+			new.control) != old.control);
+
+	return 0;
+}
+
+static void vmx_post_block(struct kvm_vcpu *vcpu)
+{
+	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+	struct pi_desc old, new;
+	unsigned int dest;
+	unsigned long flags;
+
+	if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
+		!irq_remapping_cap(IRQ_POSTING_CAP))
+		return;
+
+	do {
+		old.control = new.control = pi_desc->control;
+
+		dest = cpu_physical_id(vcpu->cpu);
+
+		if (x2apic_enabled())
+			new.ndst = dest;
+		else
+			new.ndst = (dest << 8) & 0xFF00;
+
+		/* Allow posting non-urgent interrupts */
+		new.sn = 0;
+
+		/* set 'NV' to 'notification vector' */
+		new.nv = POSTED_INTR_VECTOR;
+	} while (cmpxchg(&pi_desc->control, old.control,
+			new.control) != old.control);
+
+	if (vcpu->pre_pcpu != -1) {
+		spin_lock_irqsave(
+			&per_cpu(blocked_vcpu_on_cpu_lock,
+			vcpu->pre_pcpu), flags);
+		list_del(&vcpu->blocked_vcpu_list);
+		spin_unlock_irqrestore(
+			&per_cpu(blocked_vcpu_on_cpu_lock,
+			vcpu->pre_pcpu), flags);
+		vcpu->pre_pcpu = -1;
+	}
+}
+
+/*
  * vmx_update_pi_irte - set IRTE for Posted-Interrupts
  *
  * @kvm: kvm
@@ -10620,6 +10770,9 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.flush_log_dirty = vmx_flush_log_dirty,
 	.enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,
 
+	.pre_block = vmx_pre_block,
+	.post_block = vmx_post_block,
+
 	.pmu_ops = &intel_pmu_ops,
 
 	.update_pi_irte = vmx_update_pi_irte,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 58688aa..46f55b2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5869,7 +5869,12 @@ int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
 {
 	++vcpu->stat.halt_exits;
 	if (irqchip_in_kernel(vcpu->kvm)) {
-		vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
+		/* Handle posted-interrupt when vCPU is to be halted */
+		if (!kvm_x86_ops->pre_block ||
+				kvm_x86_ops->pre_block(vcpu) == 0) {
+			vcpu->arch.halted = true;
+			vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
+		}
 		return 1;
 	} else {
 		vcpu->run->exit_reason = KVM_EXIT_HLT;
@@ -6518,6 +6523,20 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_vcpu_reload_apic_access_page(vcpu);
 	}
 
+	/*
+	 * KVM_REQ_EVENT is not set when posted interrupts are set by
+	 * VT-d hardware, so we have to update RVI unconditionally.
+	 */
+	if (kvm_lapic_enabled(vcpu)) {
+		/*
+		 * Update architecture specific hints for APIC
+		 * virtual interrupt delivery.
+		 */
+		if (kvm_x86_ops->hwapic_irr_update)
+			kvm_x86_ops->hwapic_irr_update(vcpu,
+				kvm_lapic_find_highest_irr(vcpu));
+	}
+
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
 		kvm_apic_accept_events(vcpu);
 		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
@@ -6534,13 +6553,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_x86_ops->enable_irq_window(vcpu);
 
 		if (kvm_lapic_enabled(vcpu)) {
-			/*
-			 * Update architecture specific hints for APIC
-			 * virtual interrupt delivery.
-			 */
-			if (kvm_x86_ops->hwapic_irr_update)
-				kvm_x86_ops->hwapic_irr_update(vcpu,
-					kvm_lapic_find_highest_irr(vcpu));
 			update_cr8_intercept(vcpu);
 			kvm_lapic_sync_to_vapic(vcpu);
 		}
@@ -6711,10 +6723,31 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
 
 	for (;;) {
 		if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
-		    !vcpu->arch.apf.halted)
+		    !vcpu->arch.apf.halted) {
+			/*
+			 * For some cases, we can get here with
+			 * vcpu->arch.halted being true.
+			 */
+			if (kvm_x86_ops->post_block && vcpu->arch.halted) {
+				kvm_x86_ops->post_block(vcpu);
+				vcpu->arch.halted = false;
+			}
+
 			r = vcpu_enter_guest(vcpu);
-		else
+		} else {
 			r = vcpu_block(kvm, vcpu);
+
+			/*
+			 * post_block() must be called after
+			 * pre_block() which is called in
+			 * kvm_vcpu_halt().
+			 */
+			if (kvm_x86_ops->post_block && vcpu->arch.halted) {
+				kvm_x86_ops->post_block(vcpu);
+				vcpu->arch.halted = false;
+			}
+		}
+
 		if (r <= 0)
 			break;
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index feba1fb..bf462e7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -231,6 +231,9 @@ struct kvm_vcpu {
 	unsigned long requests;
 	unsigned long guest_debug;
 
+	int pre_pcpu;
+	struct list_head blocked_vcpu_list;
+
 	struct mutex mutex;
 	struct kvm_run *run;
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8b8a444..191c7eb 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -220,6 +220,9 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 	init_waitqueue_head(&vcpu->wq);
 	kvm_async_pf_vcpu_init(vcpu);
 
+	vcpu->pre_pcpu = -1;
+	INIT_LIST_HEAD(&vcpu->blocked_vcpu_list);
+
 	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
 	if (!page) {
 		r = -ENOMEM;
-- 
2.1.0



* [PATCH v9 18/18] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts
  2015-09-18 14:29 [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Feng Wu
                   ` (16 preceding siblings ...)
  2015-09-18 14:29 ` [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked Feng Wu
@ 2015-09-18 14:29 ` Feng Wu
  2015-09-21 13:46   ` Joerg Roedel
  2015-09-18 14:58 ` [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Paolo Bonzini
  2015-09-25  1:49 ` Wu, Feng
  19 siblings, 1 reply; 56+ messages in thread
From: Feng Wu @ 2015-09-18 14:29 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

Enable VT-d Posted-Interrupts and add a command line
parameter for it.
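
For illustration only (not part of the patch): a user-space sketch of the
comma-separated option parsing after this change. The struct and the
function name parse_irqremap_options() are hypothetical; the option names
and the prefix matching follow the diff below.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container for the flags the kernel keeps as globals. */
struct irqremap_opts {
	int disable_irq_remap;
	int disable_irq_post;
	int disable_sourceid_checking;
	int no_x2apic_optout;
};

void parse_irqremap_options(const char *str, struct irqremap_opts *o)
{
	while (*str) {
		if (!strncmp(str, "on", 2)) {
			o->disable_irq_remap = 0;
			o->disable_irq_post = 0;
		} else if (!strncmp(str, "off", 3)) {
			/* Posting depends on remapping, so both go off. */
			o->disable_irq_remap = 1;
			o->disable_irq_post = 1;
		} else if (!strncmp(str, "nosid", 5)) {
			o->disable_sourceid_checking = 1;
		} else if (!strncmp(str, "no_x2apic_optout", 16)) {
			o->no_x2apic_optout = 1;
		} else if (!strncmp(str, "nopost", 6)) {
			o->disable_irq_post = 1;
		}

		/* Skip to the next comma-separated token. */
		str += strcspn(str, ",");
		while (*str == ',')
			str++;
	}
}
```

Options are applied left to right, so a later "on" re-enables what an
earlier "off" disabled, while "nopost" on its own leaves remapping enabled.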

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/kernel-parameters.txt |  1 +
 drivers/iommu/irq_remapping.c       | 12 ++++++++----
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1d6f045..52aca36 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1547,6 +1547,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			nosid	disable Source ID checking
 			no_x2apic_optout
 				BIOS x2APIC opt-out request will be ignored
+			nopost	disable Interrupt Posting
 
 	iomem=		Disable strict checking of access to MMIO memory
 		strict	regions from userspace.
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 2d99930..d8c3997 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -22,7 +22,7 @@ int irq_remap_broken;
 int disable_sourceid_checking;
 int no_x2apic_optout;
 
-int disable_irq_post = 1;
+int disable_irq_post = 0;
 
 static int disable_irq_remap;
 static struct irq_remap_ops *remap_ops;
@@ -58,14 +58,18 @@ static __init int setup_irqremap(char *str)
 		return -EINVAL;
 
 	while (*str) {
-		if (!strncmp(str, "on", 2))
+		if (!strncmp(str, "on", 2)) {
 			disable_irq_remap = 0;
-		else if (!strncmp(str, "off", 3))
+			disable_irq_post = 0;
+		} else if (!strncmp(str, "off", 3)) {
 			disable_irq_remap = 1;
-		else if (!strncmp(str, "nosid", 5))
+			disable_irq_post = 1;
+		} else if (!strncmp(str, "nosid", 5))
 			disable_sourceid_checking = 1;
 		else if (!strncmp(str, "no_x2apic_optout", 16))
 			no_x2apic_optout = 1;
+		else if (!strncmp(str, "nopost", 6))
+			disable_irq_post = 1;
 
 		str += strcspn(str, ",");
 		while (*str == ',')
-- 
2.1.0



* Re: [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series
  2015-09-18 14:29 [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Feng Wu
                   ` (17 preceding siblings ...)
  2015-09-18 14:29 ` [PATCH v9 18/18] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts Feng Wu
@ 2015-09-18 14:58 ` Paolo Bonzini
  2015-09-18 15:08   ` Wu, Feng
  2015-09-18 17:57   ` Alex Williamson
  2015-09-25  1:49 ` Wu, Feng
  19 siblings, 2 replies; 56+ messages in thread
From: Paolo Bonzini @ 2015-09-18 14:58 UTC (permalink / raw)
  To: Feng Wu, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel



On 18/09/2015 16:29, Feng Wu wrote:
> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> With VT-d Posted-Interrupts enabled, external interrupts from
> direct-assigned devices can be delivered to guests without VMM
> intervention when guest is running in non-root mode.
> 
> You can find the VT-d Posted-Interrtups Spec. in the following URL:
> http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

Thanks.  I will squash patches 2 and 14 together, and drop patch 3.

Signed-off-bys are missing in patch 1 and 4.  The patches exist
elsewhere in the mailing list archives, so not a big deal.  Or just
reply to them with the S-o-b line.

Alex, can you ack the series and review patch 12?

Joerg, can you ack patch 18?

Paolo

> v9:
> - Include the whole series:
> [01/18]: irq bypass manager
> [02/18] - [06/18]: Common non-architecture part for VT-d PI and ARM side forwarded irq
> [07/18] - [18/18]: VT-d PI part
> 
> v8:
> refer to the changelog in each patch
> 
> v7:
> * Define two weak irq bypass callbacks:
>   - kvm_arch_irq_bypass_start()
>   - kvm_arch_irq_bypass_stop()
> * Remove the x86 dummy implementation of the above two functions.
> * Print some useful information instead of WARN_ON() when the
>   irq bypass consumer unregistration fails.
> * Fix an issue when calling pi_pre_block and pi_post_block.
> 
> v6:
> * Rebase on 4.2.0-rc6
> * Rebase on https://lkml.org/lkml/2015/8/6/526 and http://www.gossamer-threads.com/lists/linux/kernel/2235623
> * Make the add_consumer and del_consumer callbacks static
> * Remove pointless INIT_LIST_HEAD to 'vdev->ctx[vector].producer.node)'
> * Use dev_info instead of WARN_ON() when irq_bypass_register_producer fails
> * Remove optional dummy callbacks for irq producer
> 
> v4:
> * For lowest-priority interrupt, only support single-CPU destination
> interrupts at the current stage, more common lowest priority support
> will be added later.
> * According to Marcelo's suggestion, when vCPU is blocked, we handle
> the posted-interrupts in the HLT emulation path.
> * Some small changes (coding style, typo, add some code comments)
> 
> v3:
> * Adjust the Posted-interrupts Descriptor updating logic when vCPU is
>   preempted or blocked.
> * KVM_DEV_VFIO_DEVICE_POSTING_IRQ --> KVM_DEV_VFIO_DEVICE_POST_IRQ
> * __KVM_HAVE_ARCH_KVM_VFIO_POSTING --> __KVM_HAVE_ARCH_KVM_VFIO_POST
> * Add KVM_DEV_VFIO_DEVICE_UNPOST_IRQ attribute for VFIO irq, which
>   can be used to change back to remapping mode.
> * Fix typo
> 
> v2:
> * Use VFIO framework to enable this feature, the VFIO part of this series is
>   based on Eric's patch "[PATCH v3 0/8] KVM-VFIO IRQ forward control"
> * Rebase this patchset on git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git,
>   then revise some irq logic based on the new hierarchy irqdomain patches provided
>   by Jiang Liu <jiang.liu@linux.intel.com>
> 
> 
> *** BLURB HERE ***
> 
> Alex Williamson (1):
>   virt: IRQ bypass manager
> 
> Eric Auger (4):
>   KVM: arm/arm64: select IRQ_BYPASS_MANAGER
>   KVM: create kvm_irqfd.h
>   KVM: introduce kvm_arch functions for IRQ bypass
>   KVM: eventfd: add irq bypass consumer management
> 
> Feng Wu (13):
>   KVM: x86: select IRQ_BYPASS_MANAGER
>   KVM: Extend struct pi_desc for VT-d Posted-Interrupts
>   KVM: Add some helper functions for Posted-Interrupts
>   KVM: Define a new interface kvm_intr_is_single_vcpu()
>   KVM: Make struct kvm_irq_routing_table accessible
>   KVM: make kvm_set_msi_irq() public
>   vfio: Register/unregister irq_bypass_producer
>   KVM: x86: Update IRTE for posted-interrupts
>   KVM: Implement IRQ bypass consumer callbacks for x86
>   KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'
>   KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
>   KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
>   iommu/vt-d: Add a command line parameter for VT-d posted-interrupts
> 
>  Documentation/kernel-parameters.txt   |   1 +
>  Documentation/virtual/kvm/locking.txt |  12 ++
>  MAINTAINERS                           |   7 +
>  arch/arm/kvm/Kconfig                  |   2 +
>  arch/arm/kvm/Makefile                 |   1 +
>  arch/arm64/kvm/Kconfig                |   2 +
>  arch/arm64/kvm/Makefile               |   1 +
>  arch/x86/include/asm/kvm_host.h       |  24 +++
>  arch/x86/kvm/Kconfig                  |   3 +
>  arch/x86/kvm/Makefile                 |   3 +
>  arch/x86/kvm/irq_comm.c               |  32 ++-
>  arch/x86/kvm/lapic.c                  |  59 ++++++
>  arch/x86/kvm/lapic.h                  |   2 +
>  arch/x86/kvm/trace.h                  |  33 ++++
>  arch/x86/kvm/vmx.c                    | 361 +++++++++++++++++++++++++++++++++-
>  arch/x86/kvm/x86.c                    | 108 +++++++++-
>  drivers/iommu/irq_remapping.c         |  12 +-
>  drivers/vfio/pci/Kconfig              |   1 +
>  drivers/vfio/pci/vfio_pci_intrs.c     |   9 +
>  drivers/vfio/pci/vfio_pci_private.h   |   2 +
>  include/linux/irqbypass.h             |  90 +++++++++
>  include/linux/kvm_host.h              |  29 +++
>  include/linux/kvm_irqfd.h             |  71 +++++++
>  virt/kvm/Kconfig                      |   3 +
>  virt/kvm/eventfd.c                    | 142 +++++++------
>  virt/kvm/irqchip.c                    |  10 -
>  virt/kvm/kvm_main.c                   |   3 +
>  virt/lib/Kconfig                      |   2 +
>  virt/lib/Makefile                     |   1 +
>  virt/lib/irqbypass.c                  | 257 ++++++++++++++++++++++++
>  30 files changed, 1182 insertions(+), 101 deletions(-)
>  create mode 100644 include/linux/irqbypass.h
>  create mode 100644 include/linux/kvm_irqfd.h
>  create mode 100644 virt/lib/Kconfig
>  create mode 100644 virt/lib/Makefile
>  create mode 100644 virt/lib/irqbypass.c
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series
  2015-09-18 14:58 ` [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Paolo Bonzini
@ 2015-09-18 15:08   ` Wu, Feng
  2015-09-18 15:21     ` Paolo Bonzini
  2015-09-18 17:57   ` Alex Williamson
  1 sibling, 1 reply; 56+ messages in thread
From: Wu, Feng @ 2015-09-18 15:08 UTC (permalink / raw)
  To: Paolo Bonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Friday, September 18, 2015 10:59 PM
> To: Wu, Feng; alex.williamson@redhat.com; joro@8bytes.org;
> mtosatti@redhat.com
> Cc: eric.auger@linaro.org; kvm@vger.kernel.org;
> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including
> prerequisite series
> 
> 
> 
> On 18/09/2015 16:29, Feng Wu wrote:
> > VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> > With VT-d Posted-Interrupts enabled, external interrupts from
> > direct-assigned devices can be delivered to guests without VMM
> > intervention when guest is running in non-root mode.
> >
> > You can find the VT-d Posted-Interrupts Spec. at the following URL:
> > http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
> 
> Thanks.  I will squash patches 2 and 14 together, and drop patch 3.
> 
> Signed-off-bys are missing in patch 1 and 4.  The patches exist
> elsewhere in the mailing list archives, so not a big deal.  Or just
> reply to them with the S-o-b line.
> 

Thanks for your quick response, Paolo! I didn't change the code
in patches 1 and 4. Do I still need to add my s-o-b? If so, I can
reply to those patches with it.

Thanks,
Feng


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series
  2015-09-18 15:08   ` Wu, Feng
@ 2015-09-18 15:21     ` Paolo Bonzini
  2015-09-18 15:38       ` Wu, Feng
  0 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2015-09-18 15:21 UTC (permalink / raw)
  To: Wu, Feng, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel



On 18/09/2015 17:08, Wu, Feng wrote:
> 
> 
>> -----Original Message-----
>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
>> Sent: Friday, September 18, 2015 10:59 PM
>> To: Wu, Feng; alex.williamson@redhat.com; joro@8bytes.org;
>> mtosatti@redhat.com
>> Cc: eric.auger@linaro.org; kvm@vger.kernel.org;
>> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
>> Subject: Re: [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including
>> prerequisite series
>>
>>
>>
>> On 18/09/2015 16:29, Feng Wu wrote:
>>> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
>>> With VT-d Posted-Interrupts enabled, external interrupts from
>>> direct-assigned devices can be delivered to guests without VMM
>>> intervention when guest is running in non-root mode.
>>>
>>> You can find the VT-d Posted-Interrupts Spec. at the following URL:
>>> http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
>>
>> Thanks.  I will squash patches 2 and 14 together, and drop patch 3.
>>
>> Signed-off-bys are missing in patch 1 and 4.  The patches exist
>> elsewhere in the mailing list archives, so not a big deal.  Or just
>> reply to them with the S-o-b line.
>>
> 
> Thanks for your quick response, Paolo! I didn't change the code
> in patches 1 and 4. Do I still need to add my s-o-b? If so, I can
> reply to those patches with it.

Yes, the s-o-b just means that the code passed through your hands.

Note that I replied to patch 17, but no need to resend that one
either---just mailing list discussion is enough.

Paolo


^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v9 01/18] virt: IRQ bypass manager
  2015-09-18 14:29 ` [PATCH v9 01/18] virt: IRQ bypass manager Feng Wu
@ 2015-09-18 15:34   ` Wu, Feng
  0 siblings, 0 replies; 56+ messages in thread
From: Wu, Feng @ 2015-09-18 15:34 UTC (permalink / raw)
  To: Wu, Feng, pbonzini, alex.williamson, joro, mtosatti
  Cc: iommu, linux-kernel, kvm, eric.auger, Wu, Feng

Signed-off-by: Feng Wu <feng.wu@intel.com>

> -----Original Message-----
> From: iommu-bounces@lists.linux-foundation.org
> [mailto:iommu-bounces@lists.linux-foundation.org] On Behalf Of Feng Wu
> Sent: Friday, September 18, 2015 10:30 PM
> To: pbonzini@redhat.com; alex.williamson@redhat.com; joro@8bytes.org;
> mtosatti@redhat.com
> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> kvm@vger.kernel.org; eric.auger@linaro.org
> Subject: [PATCH v9 01/18] virt: IRQ bypass manager
> 
> From: Alex Williamson <alex.williamson@redhat.com>
> 
> When a physical I/O device is assigned to a virtual machine through
> facilities like VFIO and KVM, the interrupt for the device generally
> bounces through the host system before being injected into the VM.
> However, hardware technologies exist that often allow the host to be
> bypassed for some of these scenarios.  Intel Posted Interrupts allow
> the specified physical edge interrupts to be directly injected into a
> guest when delivered to a physical processor while the vCPU is
> running.  ARM IRQ Forwarding allows forwarded physical interrupts to
> be directly deactivated by the guest.
> 
> The IRQ bypass manager here is meant to provide the shim to connect
> interrupt producers, generally the host physical device driver, with
> interrupt consumers, generally the hypervisor, in order to configure
> these bypass mechanisms.  To do this, we base the connection on a
> shared, opaque token.  For KVM-VFIO this is expected to be an
> eventfd_ctx since this is the connection we already use to connect an
> eventfd to an irqfd on the in-kernel path.  When a producer and
> consumer with matching tokens are found, callbacks via both registered
> participants allow the bypass facilities to be automatically enabled.
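
The token matching described above can be sketched in userspace roughly
as follows. This is not the code from the patch: the fixed-size arrays,
the MAX_ENTRIES limit, the "connected" field, and the note_producer()
callback are assumptions made to keep the example small, and locking as
well as the stop/start/del callbacks are left out. It only illustrates
that each side registers independently, that a duplicate token on the
same side is rejected with -EBUSY, and that a pair is connected as soon
as both sides hold the same opaque pointer.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_ENTRIES 8	/* hypothetical limit; the patch uses linked lists */

struct consumer;

struct producer {
	void *token;	/* opaque match key, e.g. an eventfd context */
	int (*add_consumer)(struct producer *, struct consumer *); /* optional */
};

struct consumer {
	void *token;
	int connected;	/* illustration only: set once a producer is attached */
	int (*add_producer)(struct consumer *, struct producer *); /* required */
};

static struct producer *producers[MAX_ENTRIES];
static struct consumer *consumers[MAX_ENTRIES];
static size_t nr_producers, nr_consumers;

/* Trivial consumer callback that just records the connection. */
int note_producer(struct consumer *c, struct producer *p)
{
	(void)p;
	c->connected = 1;
	return 0;
}

/* Connect a matched pair: producer side first if present, then consumer. */
static int connect_pair(struct producer *p, struct consumer *c)
{
	int ret = 0;

	if (p->add_consumer)
		ret = p->add_consumer(p, c);
	if (!ret)
		ret = c->add_producer(c, p);
	return ret;
}

int register_producer(struct producer *p)
{
	size_t i;

	if (nr_producers == MAX_ENTRIES)
		return -ENOSPC;
	for (i = 0; i < nr_producers; i++)
		if (producers[i]->token == p->token)
			return -EBUSY;	/* token already used by a producer */
	for (i = 0; i < nr_consumers; i++) {
		if (consumers[i]->token == p->token) {
			int ret = connect_pair(p, consumers[i]);

			if (ret)
				return ret;
			break;
		}
	}
	producers[nr_producers++] = p;
	return 0;
}

int register_consumer(struct consumer *c)
{
	size_t i;

	if (!c->add_producer)
		return -EINVAL;	/* consumer callbacks are mandatory */
	if (nr_consumers == MAX_ENTRIES)
		return -ENOSPC;
	for (i = 0; i < nr_consumers; i++)
		if (consumers[i]->token == c->token)
			return -EBUSY;	/* token already used by a consumer */
	for (i = 0; i < nr_producers; i++) {
		if (producers[i]->token == c->token) {
			int ret = connect_pair(producers[i], c);

			if (ret)
				return ret;
			break;
		}
	}
	consumers[nr_consumers++] = c;
	return 0;
}
```

Registration order does not matter in this sketch: whichever side shows
up second finds its partner on the other list and triggers the connect.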
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> Reviewed-by: Eric Auger <eric.auger@linaro.org>
> Tested-by: Eric Auger <eric.auger@linaro.org>
> Tested-by: Feng Wu <feng.wu@intel.com>
> ---
> v4: All producer callbacks are optional, as with Intel PI, it's
>     possible for the producer to be blissfully unaware of the bypass.
> 
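
To make the "producer callbacks are optional" point concrete, here is a
small userspace sketch of the connect sequence with a producer that
supplies no callbacks at all. The struct names, the trace buffer, and
the cons_* demo callbacks are invented for this example; the ordering
(stop, add, start, with the producer's add undone if the consumer's add
fails) follows __connect() in the patch below.

```c
#include <stddef.h>
#include <string.h>

struct cons;

struct prod {	/* every producer callback may be NULL */
	int (*add_consumer)(struct prod *, struct cons *);
	void (*del_consumer)(struct prod *, struct cons *);
	void (*stop)(struct prod *);
	void (*start)(struct prod *);
};

struct cons {
	int (*add_producer)(struct cons *, struct prod *);	/* required */
	void (*del_producer)(struct cons *, struct prod *);	/* required */
	void (*stop)(struct cons *);				/* optional */
	void (*start)(struct cons *);				/* optional */
};

/* Records callback invocations, one character per call. */
char trace[16];

static void mark(char ch)
{
	size_t n = strlen(trace);

	if (n + 1 < sizeof(trace))
		trace[n] = ch;
}

/* Demo consumer callbacks that only log the fact they were called. */
void cons_stop(struct cons *c)  { (void)c; mark('s'); }
void cons_start(struct cons *c) { (void)c; mark('S'); }
void cons_del(struct cons *c, struct prod *p) { (void)c; (void)p; mark('d'); }
int cons_add(struct cons *c, struct prod *p)
{
	(void)c;
	(void)p;
	mark('a');
	return 0;
}

/* Same ordering as __connect(): quiesce, add both sides, restart. */
int connect_in_order(struct prod *p, struct cons *c)
{
	int ret = 0;

	if (p->stop)
		p->stop(p);
	if (c->stop)
		c->stop(c);

	if (p->add_consumer)
		ret = p->add_consumer(p, c);
	if (!ret) {
		ret = c->add_producer(c, p);
		if (ret && p->del_consumer)
			p->del_consumer(p, c);	/* undo the producer side */
	}

	if (c->start)
		c->start(c);
	if (p->start)
		p->start(p);

	return ret;
}
```

A producer with all four pointers left NULL still connects; only the
consumer's stop, add_producer, and start callbacks run, in that order.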
>  MAINTAINERS               |   7 ++
>  include/linux/irqbypass.h |  90 ++++++++++++++++
>  virt/lib/Kconfig          |   2 +
>  virt/lib/Makefile         |   1 +
>  virt/lib/irqbypass.c      | 257 ++++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 357 insertions(+)
>  create mode 100644 include/linux/irqbypass.h
>  create mode 100644 virt/lib/Kconfig
>  create mode 100644 virt/lib/Makefile
>  create mode 100644 virt/lib/irqbypass.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a9ae6c1..10c8b2f 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -10963,6 +10963,13 @@ L:	netdev@vger.kernel.org
>  S:	Maintained
>  F:	drivers/net/ethernet/via/via-velocity.*
> 
> +VIRT LIB
> +M:	Alex Williamson <alex.williamson@redhat.com>
> +M:	Paolo Bonzini <pbonzini@redhat.com>
> +L:	kvm@vger.kernel.org
> +S:	Supported
> +F:	virt/lib/
> +
>  VIVID VIRTUAL VIDEO DRIVER
>  M:	Hans Verkuil <hverkuil@xs4all.nl>
>  L:	linux-media@vger.kernel.org
> diff --git a/include/linux/irqbypass.h b/include/linux/irqbypass.h
> new file mode 100644
> index 0000000..1551b5b
> --- /dev/null
> +++ b/include/linux/irqbypass.h
> @@ -0,0 +1,90 @@
> +/*
> + * IRQ offload/bypass manager
> + *
> + * Copyright (C) 2015 Red Hat, Inc.
> + * Copyright (c) 2015 Linaro Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +#ifndef IRQBYPASS_H
> +#define IRQBYPASS_H
> +
> +#include <linux/list.h>
> +
> +struct irq_bypass_consumer;
> +
> +/*
> + * Theory of operation
> + *
> + * The IRQ bypass manager is a simple set of lists and callbacks that allows
> + * IRQ producers (ex. physical interrupt sources) to be matched to IRQ
> + * consumers (ex. virtualization hardware that allows IRQ bypass or offload)
> + * via a shared token (ex. eventfd_ctx).  Producers and consumers register
> + * independently.  When a token match is found, the optional @stop callback
> + * will be called for each participant.  The pair will then be connected via
> + * the @add_* callbacks, and finally the optional @start callback will allow
> + * any final coordination.  When either participant is unregistered, the
> + * process is repeated using the @del_* callbacks in place of the @add_*
> + * callbacks.  Match tokens must be unique per producer/consumer, 1:N pairings
> + * are not supported.
> + */
> +
> +/**
> + * struct irq_bypass_producer - IRQ bypass producer definition
> + * @node: IRQ bypass manager private list management
> + * @token: opaque token to match between producer and consumer
> + * @irq: Linux IRQ number for the producer device
> + * @add_consumer: Connect the IRQ producer to an IRQ consumer (optional)
> + * @del_consumer: Disconnect the IRQ producer from an IRQ consumer (optional)
> + * @stop: Perform any quiesce operations necessary prior to add/del (optional)
> + * @start: Perform any startup operations necessary after add/del (optional)
> + *
> + * The IRQ bypass producer structure represents an interrupt source for
> + * participation in possible host bypass, for instance an interrupt vector
> + * for a physical device assigned to a VM.
> + */
> +struct irq_bypass_producer {
> +	struct list_head node;
> +	void *token;
> +	int irq;
> +	int (*add_consumer)(struct irq_bypass_producer *,
> +			    struct irq_bypass_consumer *);
> +	void (*del_consumer)(struct irq_bypass_producer *,
> +			     struct irq_bypass_consumer *);
> +	void (*stop)(struct irq_bypass_producer *);
> +	void (*start)(struct irq_bypass_producer *);
> +};
> +
> +/**
> + * struct irq_bypass_consumer - IRQ bypass consumer definition
> + * @node: IRQ bypass manager private list management
> + * @token: opaque token to match between producer and consumer
> + * @add_producer: Connect the IRQ consumer to an IRQ producer
> + * @del_producer: Disconnect the IRQ consumer from an IRQ producer
> + * @stop: Perform any quiesce operations necessary prior to add/del (optional)
> + * @start: Perform any startup operations necessary after add/del (optional)
> + *
> + * The IRQ bypass consumer structure represents an interrupt sink for
> + * participation in possible host bypass, for instance a hypervisor may
> + * support offloads to allow bypassing the host entirely or offload
> + * portions of the interrupt handling to the VM.
> + */
> +struct irq_bypass_consumer {
> +	struct list_head node;
> +	void *token;
> +	int (*add_producer)(struct irq_bypass_consumer *,
> +			    struct irq_bypass_producer *);
> +	void (*del_producer)(struct irq_bypass_consumer *,
> +			     struct irq_bypass_producer *);
> +	void (*stop)(struct irq_bypass_consumer *);
> +	void (*start)(struct irq_bypass_consumer *);
> +};
> +
> +int irq_bypass_register_producer(struct irq_bypass_producer *);
> +void irq_bypass_unregister_producer(struct irq_bypass_producer *);
> +int irq_bypass_register_consumer(struct irq_bypass_consumer *);
> +void irq_bypass_unregister_consumer(struct irq_bypass_consumer *);
> +
> +#endif /* IRQBYPASS_H */
> diff --git a/virt/lib/Kconfig b/virt/lib/Kconfig
> new file mode 100644
> index 0000000..89a414f
> --- /dev/null
> +++ b/virt/lib/Kconfig
> @@ -0,0 +1,2 @@
> +config IRQ_BYPASS_MANAGER
> +	tristate
> diff --git a/virt/lib/Makefile b/virt/lib/Makefile
> new file mode 100644
> index 0000000..901228d
> --- /dev/null
> +++ b/virt/lib/Makefile
> @@ -0,0 +1 @@
> +obj-$(CONFIG_IRQ_BYPASS_MANAGER) += irqbypass.o
> diff --git a/virt/lib/irqbypass.c b/virt/lib/irqbypass.c
> new file mode 100644
> index 0000000..09a03b5
> --- /dev/null
> +++ b/virt/lib/irqbypass.c
> @@ -0,0 +1,257 @@
> +/*
> + * IRQ offload/bypass manager
> + *
> + * Copyright (C) 2015 Red Hat, Inc.
> + * Copyright (c) 2015 Linaro Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * Various virtualization hardware acceleration techniques allow bypassing or
> + * offloading interrupts received from devices around the host kernel.  Posted
> + * Interrupts on Intel VT-d systems can allow interrupts to be received
> + * directly by a virtual machine.  ARM IRQ Forwarding allows forwarded physical
> + * interrupts to be directly deactivated by the guest.  This manager allows
> + * interrupt producers and consumers to find each other to enable this sort of
> + * bypass.
> + */
> +
> +#include <linux/irqbypass.h>
> +#include <linux/list.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +
> +MODULE_LICENSE("GPL v2");
> +MODULE_DESCRIPTION("IRQ bypass manager utility module");
> +
> +static LIST_HEAD(producers);
> +static LIST_HEAD(consumers);
> +static DEFINE_MUTEX(lock);
> +
> +/* @lock must be held when calling connect */
> +static int __connect(struct irq_bypass_producer *prod,
> +		     struct irq_bypass_consumer *cons)
> +{
> +	int ret = 0;
> +
> +	if (prod->stop)
> +		prod->stop(prod);
> +	if (cons->stop)
> +		cons->stop(cons);
> +
> +	if (prod->add_consumer)
> +		ret = prod->add_consumer(prod, cons);
> +
> +	if (!ret) {
> +		ret = cons->add_producer(cons, prod);
> +		if (ret && prod->del_consumer)
> +			prod->del_consumer(prod, cons);
> +	}
> +
> +	if (cons->start)
> +		cons->start(cons);
> +	if (prod->start)
> +		prod->start(prod);
> +
> +	return ret;
> +}
> +
> +/* @lock must be held when calling disconnect */
> +static void __disconnect(struct irq_bypass_producer *prod,
> +			 struct irq_bypass_consumer *cons)
> +{
> +	if (prod->stop)
> +		prod->stop(prod);
> +	if (cons->stop)
> +		cons->stop(cons);
> +
> +	cons->del_producer(cons, prod);
> +
> +	if (prod->del_consumer)
> +		prod->del_consumer(prod, cons);
> +
> +	if (cons->start)
> +		cons->start(cons);
> +	if (prod->start)
> +		prod->start(prod);
> +}
> +
> +/**
> + * irq_bypass_register_producer - register IRQ bypass producer
> + * @producer: pointer to producer structure
> + *
> + * Add the provided IRQ producer to the list of producers and connect
> + * with any matching token found on the IRQ consumers list.
> + */
> +int irq_bypass_register_producer(struct irq_bypass_producer *producer)
> +{
> +	struct irq_bypass_producer *tmp;
> +	struct irq_bypass_consumer *consumer;
> +
> +	might_sleep();
> +
> +	if (!try_module_get(THIS_MODULE))
> +		return -ENODEV;
> +
> +	mutex_lock(&lock);
> +
> +	list_for_each_entry(tmp, &producers, node) {
> +		if (tmp->token == producer->token) {
> +			mutex_unlock(&lock);
> +			module_put(THIS_MODULE);
> +			return -EBUSY;
> +		}
> +	}
> +
> +	list_for_each_entry(consumer, &consumers, node) {
> +		if (consumer->token == producer->token) {
> +			int ret = __connect(producer, consumer);
> +			if (ret) {
> +				mutex_unlock(&lock);
> +				module_put(THIS_MODULE);
> +				return ret;
> +			}
> +			break;
> +		}
> +	}
> +
> +	list_add(&producer->node, &producers);
> +
> +	mutex_unlock(&lock);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(irq_bypass_register_producer);
> +
> +/**
> + * irq_bypass_unregister_producer - unregister IRQ bypass producer
> + * @producer: pointer to producer structure
> + *
> + * Remove a previously registered IRQ producer from the list of producers
> + * and disconnect it from any connected IRQ consumer.
> + */
> +void irq_bypass_unregister_producer(struct irq_bypass_producer *producer)
> +{
> +	struct irq_bypass_producer *tmp;
> +	struct irq_bypass_consumer *consumer;
> +
> +	might_sleep();
> +
> +	if (!try_module_get(THIS_MODULE))
> +		return; /* nothing in the list anyway */
> +
> +	mutex_lock(&lock);
> +
> +	list_for_each_entry(tmp, &producers, node) {
> +		if (tmp->token != producer->token)
> +			continue;
> +
> +		list_for_each_entry(consumer, &consumers, node) {
> +			if (consumer->token == producer->token) {
> +				__disconnect(producer, consumer);
> +				break;
> +			}
> +		}
> +
> +		list_del(&producer->node);
> +		module_put(THIS_MODULE);
> +		break;
> +	}
> +
> +	mutex_unlock(&lock);
> +
> +	module_put(THIS_MODULE);
> +}
> +EXPORT_SYMBOL_GPL(irq_bypass_unregister_producer);
> +
> +/**
> + * irq_bypass_register_consumer - register IRQ bypass consumer
> + * @consumer: pointer to consumer structure
> + *
> + * Add the provided IRQ consumer to the list of consumers and connect
> + * with any matching token found on the IRQ producer list.
> + */
> +int irq_bypass_register_consumer(struct irq_bypass_consumer *consumer)
> +{
> +	struct irq_bypass_consumer *tmp;
> +	struct irq_bypass_producer *producer;
> +
> +	if (!consumer->add_producer || !consumer->del_producer)
> +		return -EINVAL;
> +
> +	might_sleep();
> +
> +	if (!try_module_get(THIS_MODULE))
> +		return -ENODEV;
> +
> +	mutex_lock(&lock);
> +
> +	list_for_each_entry(tmp, &consumers, node) {
> +		if (tmp->token == consumer->token) {
> +			mutex_unlock(&lock);
> +			module_put(THIS_MODULE);
> +			return -EBUSY;
> +		}
> +	}
> +
> +	list_for_each_entry(producer, &producers, node) {
> +		if (producer->token == consumer->token) {
> +			int ret = __connect(producer, consumer);
> +			if (ret) {
> +				mutex_unlock(&lock);
> +				module_put(THIS_MODULE);
> +				return ret;
> +			}
> +			break;
> +		}
> +	}
> +
> +	list_add(&consumer->node, &consumers);
> +
> +	mutex_unlock(&lock);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(irq_bypass_register_consumer);
> +
> +/**
> + * irq_bypass_unregister_consumer - unregister IRQ bypass consumer
> + * @consumer: pointer to consumer structure
> + *
> + * Remove a previously registered IRQ consumer from the list of consumers
> + * and disconnect it from any connected IRQ producer.
> + */
> +void irq_bypass_unregister_consumer(struct irq_bypass_consumer *consumer)
> +{
> +	struct irq_bypass_consumer *tmp;
> +	struct irq_bypass_producer *producer;
> +
> +	might_sleep();
> +
> +	if (!try_module_get(THIS_MODULE))
> +		return; /* nothing in the list anyway */
> +
> +	mutex_lock(&lock);
> +
> +	list_for_each_entry(tmp, &consumers, node) {
> +		if (tmp->token != consumer->token)
> +			continue;
> +
> +		list_for_each_entry(producer, &producers, node) {
> +			if (producer->token == consumer->token) {
> +				__disconnect(producer, consumer);
> +				break;
> +			}
> +		}
> +
> +		list_del(&consumer->node);
> +		module_put(THIS_MODULE);
> +		break;
> +	}
> +
> +	mutex_unlock(&lock);
> +
> +	module_put(THIS_MODULE);
> +}
> +EXPORT_SYMBOL_GPL(irq_bypass_unregister_consumer);
> --
> 2.1.0
> 
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
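
For readers following the registration logic quoted above, here is a minimal
userspace sketch of the token-matching idea, assuming a simplified model: one
registry list per side, an opaque token as the match key, and at most one peer
per endpoint. The names (struct endpoint, register_endpoint,
unregister_endpoint) are hypothetical and are not part of the irqbypass API;
locking, module refcounting and the start/stop callbacks are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; names are illustrative, not the kernel API. */
struct endpoint {
	void *token;            /* opaque match key, e.g. an eventfd context */
	struct endpoint *peer;  /* connected counterpart, or NULL */
	struct endpoint *next;  /* singly linked registry list */
};

static struct endpoint *producers;
static struct endpoint *consumers;

/* Return the first entry on the list carrying this token, or NULL. */
static struct endpoint *find_token(struct endpoint *head, void *token)
{
	for (; head; head = head->next)
		if (head->token == token)
			return head;
	return NULL;
}

/*
 * Register ep on its own list and connect it to a matching entry on the
 * other list, if any. Returns 0 on success, -1 if the token is already
 * registered on this side (the -EBUSY case in the patch).
 */
static int register_endpoint(struct endpoint **own, struct endpoint *other,
			     struct endpoint *ep)
{
	struct endpoint *match;

	if (find_token(*own, ep->token))
		return -1;

	match = find_token(other, ep->token);
	ep->peer = match;
	if (match)
		match->peer = ep;

	ep->next = *own;
	*own = ep;
	return 0;
}

/* Disconnect ep from its peer (if connected) and unlink it from its list. */
static void unregister_endpoint(struct endpoint **own, struct endpoint *ep)
{
	struct endpoint **pp;

	for (pp = own; *pp; pp = &(*pp)->next) {
		if (*pp != ep)
			continue;
		if (ep->peer)
			ep->peer->peer = NULL;
		ep->peer = NULL;
		*pp = ep->next;
		ep->next = NULL;
		return;
	}
}
```

A producer would be registered with register_endpoint(&producers, consumers,
&p) and a consumer with the two lists swapped. Either side may register first;
the connection is made when the second one arrives.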

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v9 04/18] KVM: create kvm_irqfd.h
  2015-09-18 14:29 ` [PATCH v9 04/18] KVM: create kvm_irqfd.h Feng Wu
@ 2015-09-18 15:35   ` Wu, Feng
  0 siblings, 0 replies; 56+ messages in thread
From: Wu, Feng @ 2015-09-18 15:35 UTC (permalink / raw)
  To: Wu, Feng, pbonzini, alex.williamson, joro, mtosatti
  Cc: iommu, linux-kernel, kvm, eric.auger, Wu, Feng

Signed-off-by: Feng Wu <feng.wu@intel.com>

> -----Original Message-----
> From: iommu-bounces@lists.linux-foundation.org
> [mailto:iommu-bounces@lists.linux-foundation.org] On Behalf Of Feng Wu
> Sent: Friday, September 18, 2015 10:30 PM
> To: pbonzini@redhat.com; alex.williamson@redhat.com; joro@8bytes.org;
> mtosatti@redhat.com
> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> kvm@vger.kernel.org; eric.auger@linaro.org
> Subject: [PATCH v9 04/18] KVM: create kvm_irqfd.h
> 
> From: Eric Auger <eric.auger@linaro.org>
> 
> Move the _irqfd_resampler and _irqfd struct declarations into a new
> public header: kvm_irqfd.h. They are renamed to
> kvm_kernel_irqfd_resampler and kvm_kernel_irqfd, respectively. Those
> datatypes will be used by architecture-specific code, in the context
> of IRQ bypass manager integration.
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> ---
>  include/linux/kvm_irqfd.h | 69 ++++++++++++++++++++++++++++++++++
>  virt/kvm/eventfd.c        | 95 ++++++++++++-----------------------------------
>  2 files changed, 92 insertions(+), 72 deletions(-)
>  create mode 100644 include/linux/kvm_irqfd.h
> 
> diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
> new file mode 100644
> index 0000000..f926b39
> --- /dev/null
> +++ b/include/linux/kvm_irqfd.h
> @@ -0,0 +1,69 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * irqfd: Allows an fd to be used to inject an interrupt to the guest
> + * Credit goes to Avi Kivity for the original idea.
> + */
> +
> +#ifndef __LINUX_KVM_IRQFD_H
> +#define __LINUX_KVM_IRQFD_H
> +
> +#include <linux/kvm_host.h>
> +#include <linux/poll.h>
> +
> +/*
> + * Resampling irqfds are a special variety of irqfds used to emulate
> + * level triggered interrupts.  The interrupt is asserted on eventfd
> + * trigger.  On acknowledgment through the irq ack notifier, the
> + * interrupt is de-asserted and userspace is notified through the
> + * resamplefd.  All resamplers on the same gsi are de-asserted
> + * together, so we don't need to track the state of each individual
> + * user.  We can also therefore share the same irq source ID.
> + */
> +struct kvm_kernel_irqfd_resampler {
> +	struct kvm *kvm;
> +	/*
> +	 * List of resampling struct kvm_kernel_irqfd objects sharing this gsi.
> +	 * RCU list modified under kvm->irqfds.resampler_lock
> +	 */
> +	struct list_head list;
> +	struct kvm_irq_ack_notifier notifier;
> +	/*
> +	 * Entry in list of kvm->irqfds.resampler_list.  Used for sharing
> +	 * resamplers among irqfds on the same gsi.
> +	 * Accessed and modified under kvm->irqfds.resampler_lock
> +	 */
> +	struct list_head link;
> +};
> +
> +struct kvm_kernel_irqfd {
> +	/* Used for MSI fast-path */
> +	struct kvm *kvm;
> +	wait_queue_t wait;
> +	/* Update side is protected by irqfds.lock */
> +	struct kvm_kernel_irq_routing_entry irq_entry;
> +	seqcount_t irq_entry_sc;
> +	/* Used for level IRQ fast-path */
> +	int gsi;
> +	struct work_struct inject;
> +	/* The resampler used by this irqfd (resampler-only) */
> +	struct kvm_kernel_irqfd_resampler *resampler;
> +	/* Eventfd notified on resample (resampler-only) */
> +	struct eventfd_ctx *resamplefd;
> +	/* Entry in list of irqfds for a resampler (resampler-only) */
> +	struct list_head resampler_link;
> +	/* Used for setup/shutdown */
> +	struct eventfd_ctx *eventfd;
> +	struct list_head list;
> +	poll_table pt;
> +	struct work_struct shutdown;
> +};
> +
> +#endif /* __LINUX_KVM_IRQFD_H */
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 9ff4193..647ffb8 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -23,6 +23,7 @@
> 
>  #include <linux/kvm_host.h>
>  #include <linux/kvm.h>
> +#include <linux/kvm_irqfd.h>
>  #include <linux/workqueue.h>
>  #include <linux/syscalls.h>
>  #include <linux/wait.h>
> @@ -39,68 +40,14 @@
>  #include <kvm/iodev.h>
> 
>  #ifdef CONFIG_HAVE_KVM_IRQFD
> -/*
> - * --------------------------------------------------------------------
> - * irqfd: Allows an fd to be used to inject an interrupt to the guest
> - *
> - * Credit goes to Avi Kivity for the original idea.
> - * --------------------------------------------------------------------
> - */
> -
> -/*
> - * Resampling irqfds are a special variety of irqfds used to emulate
> - * level triggered interrupts.  The interrupt is asserted on eventfd
> - * trigger.  On acknowledgement through the irq ack notifier, the
> - * interrupt is de-asserted and userspace is notified through the
> - * resamplefd.  All resamplers on the same gsi are de-asserted
> - * together, so we don't need to track the state of each individual
> - * user.  We can also therefore share the same irq source ID.
> - */
> -struct _irqfd_resampler {
> -	struct kvm *kvm;
> -	/*
> -	 * List of resampling struct _irqfd objects sharing this gsi.
> -	 * RCU list modified under kvm->irqfds.resampler_lock
> -	 */
> -	struct list_head list;
> -	struct kvm_irq_ack_notifier notifier;
> -	/*
> -	 * Entry in list of kvm->irqfd.resampler_list.  Use for sharing
> -	 * resamplers among irqfds on the same gsi.
> -	 * Accessed and modified under kvm->irqfds.resampler_lock
> -	 */
> -	struct list_head link;
> -};
> -
> -struct _irqfd {
> -	/* Used for MSI fast-path */
> -	struct kvm *kvm;
> -	wait_queue_t wait;
> -	/* Update side is protected by irqfds.lock */
> -	struct kvm_kernel_irq_routing_entry irq_entry;
> -	seqcount_t irq_entry_sc;
> -	/* Used for level IRQ fast-path */
> -	int gsi;
> -	struct work_struct inject;
> -	/* The resampler used by this irqfd (resampler-only) */
> -	struct _irqfd_resampler *resampler;
> -	/* Eventfd notified on resample (resampler-only) */
> -	struct eventfd_ctx *resamplefd;
> -	/* Entry in list of irqfds for a resampler (resampler-only) */
> -	struct list_head resampler_link;
> -	/* Used for setup/shutdown */
> -	struct eventfd_ctx *eventfd;
> -	struct list_head list;
> -	poll_table pt;
> -	struct work_struct shutdown;
> -};
> 
>  static struct workqueue_struct *irqfd_cleanup_wq;
> 
>  static void
>  irqfd_inject(struct work_struct *work)
>  {
> -	struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> +	struct kvm_kernel_irqfd *irqfd =
> +		container_of(work, struct kvm_kernel_irqfd, inject);
>  	struct kvm *kvm = irqfd->kvm;
> 
>  	if (!irqfd->resampler) {
> @@ -121,12 +68,13 @@ irqfd_inject(struct work_struct *work)
>  static void
>  irqfd_resampler_ack(struct kvm_irq_ack_notifier *kian)
>  {
> -	struct _irqfd_resampler *resampler;
> +	struct kvm_kernel_irqfd_resampler *resampler;
>  	struct kvm *kvm;
> -	struct _irqfd *irqfd;
> +	struct kvm_kernel_irqfd *irqfd;
>  	int idx;
> 
> -	resampler = container_of(kian, struct _irqfd_resampler, notifier);
> +	resampler = container_of(kian,
> +			struct kvm_kernel_irqfd_resampler, notifier);
>  	kvm = resampler->kvm;
> 
>  	kvm_set_irq(kvm, KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID,
> @@ -141,9 +89,9 @@ irqfd_resampler_ack(struct kvm_irq_ack_notifier *kian)
>  }
> 
>  static void
> -irqfd_resampler_shutdown(struct _irqfd *irqfd)
> +irqfd_resampler_shutdown(struct kvm_kernel_irqfd *irqfd)
>  {
> -	struct _irqfd_resampler *resampler = irqfd->resampler;
> +	struct kvm_kernel_irqfd_resampler *resampler = irqfd->resampler;
>  	struct kvm *kvm = resampler->kvm;
> 
>  	mutex_lock(&kvm->irqfds.resampler_lock);
> @@ -168,7 +116,8 @@ irqfd_resampler_shutdown(struct _irqfd *irqfd)
>  static void
>  irqfd_shutdown(struct work_struct *work)
>  {
> -	struct _irqfd *irqfd = container_of(work, struct _irqfd, shutdown);
> +	struct kvm_kernel_irqfd *irqfd =
> +		container_of(work, struct kvm_kernel_irqfd, shutdown);
>  	u64 cnt;
> 
>  	/*
> @@ -198,7 +147,7 @@ irqfd_shutdown(struct work_struct *work)
> 
>  /* assumes kvm->irqfds.lock is held */
>  static bool
> -irqfd_is_active(struct _irqfd *irqfd)
> +irqfd_is_active(struct kvm_kernel_irqfd *irqfd)
>  {
>  	return list_empty(&irqfd->list) ? false : true;
>  }
> @@ -209,7 +158,7 @@ irqfd_is_active(struct _irqfd *irqfd)
>   * assumes kvm->irqfds.lock is held
>   */
>  static void
> -irqfd_deactivate(struct _irqfd *irqfd)
> +irqfd_deactivate(struct kvm_kernel_irqfd *irqfd)
>  {
>  	BUG_ON(!irqfd_is_active(irqfd));
> 
> @@ -224,7 +173,8 @@ irqfd_deactivate(struct _irqfd *irqfd)
>  static int
>  irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
>  {
> -	struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
> +	struct kvm_kernel_irqfd *irqfd =
> +		container_of(wait, struct kvm_kernel_irqfd, wait);
>  	unsigned long flags = (unsigned long)key;
>  	struct kvm_kernel_irq_routing_entry irq;
>  	struct kvm *kvm = irqfd->kvm;
> @@ -274,12 +224,13 @@ static void
>  irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
>  			poll_table *pt)
>  {
> -	struct _irqfd *irqfd = container_of(pt, struct _irqfd, pt);
> +	struct kvm_kernel_irqfd *irqfd =
> +		container_of(pt, struct kvm_kernel_irqfd, pt);
>  	add_wait_queue(wqh, &irqfd->wait);
>  }
> 
>  /* Must be called under irqfds.lock */
> -static void irqfd_update(struct kvm *kvm, struct _irqfd *irqfd)
> +static void irqfd_update(struct kvm *kvm, struct kvm_kernel_irqfd *irqfd)
>  {
>  	struct kvm_kernel_irq_routing_entry *e;
>  	struct kvm_kernel_irq_routing_entry entries[KVM_NR_IRQCHIPS];
> @@ -304,7 +255,7 @@ static void irqfd_update(struct kvm *kvm, struct _irqfd *irqfd)
>  static int
>  kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
>  {
> -	struct _irqfd *irqfd, *tmp;
> +	struct kvm_kernel_irqfd *irqfd, *tmp;
>  	struct fd f;
>  	struct eventfd_ctx *eventfd = NULL, *resamplefd = NULL;
>  	int ret;
> @@ -340,7 +291,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
>  	irqfd->eventfd = eventfd;
> 
>  	if (args->flags & KVM_IRQFD_FLAG_RESAMPLE) {
> -		struct _irqfd_resampler *resampler;
> +		struct kvm_kernel_irqfd_resampler *resampler;
> 
>  		resamplefd = eventfd_ctx_fdget(args->resamplefd);
>  		if (IS_ERR(resamplefd)) {
> @@ -525,7 +476,7 @@ kvm_eventfd_init(struct kvm *kvm)
>  static int
>  kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
>  {
> -	struct _irqfd *irqfd, *tmp;
> +	struct kvm_kernel_irqfd *irqfd, *tmp;
>  	struct eventfd_ctx *eventfd;
> 
>  	eventfd = eventfd_ctx_fdget(args->fd);
> @@ -581,7 +532,7 @@ kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
>  void
>  kvm_irqfd_release(struct kvm *kvm)
>  {
> -	struct _irqfd *irqfd, *tmp;
> +	struct kvm_kernel_irqfd *irqfd, *tmp;
> 
>  	spin_lock_irq(&kvm->irqfds.lock);
> 
> @@ -604,7 +555,7 @@ kvm_irqfd_release(struct kvm *kvm)
>   */
>  void kvm_irq_routing_update(struct kvm *kvm)
>  {
> -	struct _irqfd *irqfd;
> +	struct kvm_kernel_irqfd *irqfd;
> 
>  	spin_lock_irq(&kvm->irqfds.lock);
> 
> --
> 2.1.0
> 
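
The renamed structs are recovered from their embedded members with
container_of() throughout eventfd.c. Below is a minimal userspace sketch of
that pattern, assuming simplified stand-in types: struct work_item and struct
demo_irqfd are hypothetical, and the macro is a reduced form of the kernel's
container_of() without its type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of; the kernel's version adds type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct work_struct. */
struct work_item {
	int pending;
};

/* Hypothetical stand-in for struct kvm_kernel_irqfd. */
struct demo_irqfd {
	int gsi;
	struct work_item inject;
	struct work_item shutdown;
};

/* Recover the enclosing irqfd from a pointer to its 'inject' member. */
static struct demo_irqfd *irqfd_from_inject(struct work_item *work)
{
	return container_of(work, struct demo_irqfd, inject);
}

/* Same idea for the 'shutdown' member, which sits at a different offset. */
static struct demo_irqfd *irqfd_from_shutdown(struct work_item *work)
{
	return container_of(work, struct demo_irqfd, shutdown);
}
```

Because the member name is part of the lookup, a callback handed a pointer to
'inject' must name 'inject' and not 'shutdown'; mixing them up yields a wrong
base address without any compiler diagnostic in this simplified form.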

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series
  2015-09-18 15:21     ` Paolo Bonzini
@ 2015-09-18 15:38       ` Wu, Feng
  0 siblings, 0 replies; 56+ messages in thread
From: Wu, Feng @ 2015-09-18 15:38 UTC (permalink / raw)
  To: Paolo Bonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Friday, September 18, 2015 11:21 PM
> To: Wu, Feng; alex.williamson@redhat.com; joro@8bytes.org;
> mtosatti@redhat.com
> Cc: eric.auger@linaro.org; kvm@vger.kernel.org;
> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including
> prerequisite series
> 
> 
> 
> On 18/09/2015 17:08, Wu, Feng wrote:
> >
> >
> >> -----Original Message-----
> >> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> >> Sent: Friday, September 18, 2015 10:59 PM
> >> To: Wu, Feng; alex.williamson@redhat.com; joro@8bytes.org;
> >> mtosatti@redhat.com
> >> Cc: eric.auger@linaro.org; kvm@vger.kernel.org;
> >> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
> >> Subject: Re: [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including
> >> prerequisite series
> >>
> >>
> >>
> >> On 18/09/2015 16:29, Feng Wu wrote:
> >>> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> >>> With VT-d Posted-Interrupts enabled, external interrupts from
> >>> direct-assigned devices can be delivered to guests without VMM
> >>> intervention when guest is running in non-root mode.
> >>>
> >>> You can find the VT-d Posted-Interrupts Spec. at the following URL:
> >>> http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
> >>
> >> Thanks.  I will squash patches 2 and 14 together, and drop patch 3.
> >>
> >> Signed-off-bys are missing in patch 1 and 4.  The patches exist
> >> elsewhere in the mailing list archives, so not a big deal.  Or just
> >> reply to them with the S-o-b line.
> >>
> >
> > Thanks for your quick response, Paolo! I didn't change the code
> > in patches 1 and 4. Do I need to add an s-o-b? If so, I can reply
> > to the patches.
> 
> Yes, the s-o-b just means that the code passed through your hands.

Done.
> 
> Note that I replied to patch 17, but no need to resend that one
> either---just mailing list discussion is enough.

Do you mean you replied to patch 17 just now? I can't find your reply
on the mailing list.

Thanks,
Feng

> 
> Paolo
> 
> > Thanks,
> > Feng
> >
> >> Alex, can you ack the series and review patch 12?
> >>
> >> Joerg, can you ack patch 18?
> >>
> >> Paolo
> >>
> >>> v9:
> >>> - Include the whole series:
> >>> [01/18]: irq bypasser manager
> >>> [02/18] - [06/18]: Common non-architecture part for VT-d PI and ARM side forwarded irq
> >>> [07/18] - [18/18]: VT-d PI part
> >>>
> >>> v8:
> >>> refer to the changelog in each patch
> >>>
> >>> v7:
> >>> * Define two weak irq bypass callbacks:
> >>>   - kvm_arch_irq_bypass_start()
> >>>   - kvm_arch_irq_bypass_stop()
> >>> * Remove the x86 dummy implementation of the above two functions.
> >>> * Print some useful information instead of WARN_ON() when the
> >>>   irq bypass consumer unregistration fails.
> >>> * Fix an issue when calling pi_pre_block and pi_post_block.
> >>>
> >>> v6:
> >>> * Rebase on 4.2.0-rc6
> >>> * Rebase on https://lkml.org/lkml/2015/8/6/526 and http://www.gossamer-threads.com/lists/linux/kernel/2235623
> >>> * Make the add_consumer and del_consumer callbacks static
> >>> * Remove pointless INIT_LIST_HEAD to 'vdev->ctx[vector].producer.node)'
> >>> * Use dev_info instead of WARN_ON() when irq_bypass_register_producer fails
> >>> * Remove optional dummy callbacks for irq producer
> >>>
> >>> v4:
> >>> * For lowest-priority interrupt, only support single-CPU destination
> >>> interrupts at the current stage, more common lowest priority support
> >>> will be added later.
> >>> * According to Marcelo's suggestion, when vCPU is blocked, we handle
> >>> the posted-interrupts in the HLT emulation path.
> >>> * Some small changes (coding style, typo, add some code comments)
> >>>
> >>> v3:
> >>> * Adjust the Posted-interrupts Descriptor updating logic when vCPU is
> >>>   preempted or blocked.
> >>> * KVM_DEV_VFIO_DEVICE_POSTING_IRQ --> KVM_DEV_VFIO_DEVICE_POST_IRQ
> >>> * __KVM_HAVE_ARCH_KVM_VFIO_POSTING --> __KVM_HAVE_ARCH_KVM_VFIO_POST
> >>> * Add KVM_DEV_VFIO_DEVICE_UNPOST_IRQ attribute for VFIO irq, which
> >>>   can be used to change back to remapping mode.
> >>> * Fix typo
> >>>
> >>> v2:
> >>> * Use VFIO framework to enable this feature, the VFIO part of this series is
> >>>   based on Eric's patch "[PATCH v3 0/8] KVM-VFIO IRQ forward control"
> >>> * Rebase this patchset on git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git,
> >>>   then revise some irq logic based on the new hierarchy irqdomain patches provided
> >>>   by Jiang Liu <jiang.liu@linux.intel.com>
> >>>
> >>>
> >>> *** BLURB HERE ***
> >>>
> >>> Alex Williamson (1):
> >>>   virt: IRQ bypass manager
> >>>
> >>> Eric Auger (4):
> >>>   KVM: arm/arm64: select IRQ_BYPASS_MANAGER
> >>>   KVM: create kvm_irqfd.h
> >>>   KVM: introduce kvm_arch functions for IRQ bypass
> >>>   KVM: eventfd: add irq bypass consumer management
> >>>
> >>> Feng Wu (13):
> >>>   KVM: x86: select IRQ_BYPASS_MANAGER
> >>>   KVM: Extend struct pi_desc for VT-d Posted-Interrupts
> >>>   KVM: Add some helper functions for Posted-Interrupts
> >>>   KVM: Define a new interface kvm_intr_is_single_vcpu()
> >>>   KVM: Make struct kvm_irq_routing_table accessible
> >>>   KVM: make kvm_set_msi_irq() public
> >>>   vfio: Register/unregister irq_bypass_producer
> >>>   KVM: x86: Update IRTE for posted-interrupts
> >>>   KVM: Implement IRQ bypass consumer callbacks for x86
> >>>   KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'
> >>>   KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
> >>>   KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
> >>>   iommu/vt-d: Add a command line parameter for VT-d posted-interrupts
> >>>
> >>>  Documentation/kernel-parameters.txt   |   1 +
> >>>  Documentation/virtual/kvm/locking.txt |  12 ++
> >>>  MAINTAINERS                           |   7 +
> >>>  arch/arm/kvm/Kconfig                  |   2 +
> >>>  arch/arm/kvm/Makefile                 |   1 +
> >>>  arch/arm64/kvm/Kconfig                |   2 +
> >>>  arch/arm64/kvm/Makefile               |   1 +
> >>>  arch/x86/include/asm/kvm_host.h       |  24 +++
> >>>  arch/x86/kvm/Kconfig                  |   3 +
> >>>  arch/x86/kvm/Makefile                 |   3 +
> >>>  arch/x86/kvm/irq_comm.c               |  32 ++-
> >>>  arch/x86/kvm/lapic.c                  |  59 ++++++
> >>>  arch/x86/kvm/lapic.h                  |   2 +
> >>>  arch/x86/kvm/trace.h                  |  33 ++++
> >>>  arch/x86/kvm/vmx.c                    | 361 +++++++++++++++++++++++++++++++++-
> >>>  arch/x86/kvm/x86.c                    | 108 +++++++++-
> >>>  drivers/iommu/irq_remapping.c         |  12 +-
> >>>  drivers/vfio/pci/Kconfig              |   1 +
> >>>  drivers/vfio/pci/vfio_pci_intrs.c     |   9 +
> >>>  drivers/vfio/pci/vfio_pci_private.h   |   2 +
> >>>  include/linux/irqbypass.h             |  90 +++++++++
> >>>  include/linux/kvm_host.h              |  29 +++
> >>>  include/linux/kvm_irqfd.h             |  71 +++++++
> >>>  virt/kvm/Kconfig                      |   3 +
> >>>  virt/kvm/eventfd.c                    | 142 +++++++------
> >>>  virt/kvm/irqchip.c                    |  10 -
> >>>  virt/kvm/kvm_main.c                   |   3 +
> >>>  virt/lib/Kconfig                      |   2 +
> >>>  virt/lib/Makefile                     |   1 +
> >>>  virt/lib/irqbypass.c                  | 257 ++++++++++++++++++++++++
> >>>  30 files changed, 1182 insertions(+), 101 deletions(-)
> >>>  create mode 100644 include/linux/irqbypass.h
> >>>  create mode 100644 include/linux/kvm_irqfd.h
> >>>  create mode 100644 virt/lib/Kconfig
> >>>  create mode 100644 virt/lib/Makefile
> >>>  create mode 100644 virt/lib/irqbypass.c
> >>>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-09-18 14:29 ` [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked Feng Wu
@ 2015-09-18 16:06   ` Paolo Bonzini
  2015-09-19  7:11     ` Wu, Feng
  2015-09-21  2:16     ` Wu, Feng
  2015-10-14 23:41   ` David Matlack
  1 sibling, 2 replies; 56+ messages in thread
From: Paolo Bonzini @ 2015-09-18 16:06 UTC (permalink / raw)
  To: Wu, Feng, Alex Williamson, joro, Marcelo Tosatti
  Cc: iommu, linux-kernel, KVM list, Eric Auger



On 18/09/2015 16:29, Feng Wu wrote:
> This patch updates the Posted-Interrupts Descriptor when vCPU
> is blocked.
> 
> pre-block:
> - Add the vCPU to the blocked per-CPU list
> - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> 
> post-block:
> - Remove the vCPU from the per-CPU list
> 
> Signed-off-by: Feng Wu <feng.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> ---
> v9:
> - Add description for blocked_vcpu_on_cpu_lock in Documentation/virtual/kvm/locking.txt
> - Check !kvm_arch_has_assigned_device(vcpu->kvm) first, then
>   !irq_remapping_cap(IRQ_POSTING_CAP)
> 
> v8:
> - Rename 'pi_pre_block' to 'pre_block'
> - Rename 'pi_post_block' to 'post_block'
> - Change some comments
> - Only add the vCPU to the blocking list when the VM has assigned devices.
> 
>  Documentation/virtual/kvm/locking.txt |  12 +++
>  arch/x86/include/asm/kvm_host.h       |  13 +++
>  arch/x86/kvm/vmx.c                    | 153 ++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c                    |  53 +++++++++---
>  include/linux/kvm_host.h              |   3 +
>  virt/kvm/kvm_main.c                   |   3 +
>  6 files changed, 227 insertions(+), 10 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/locking.txt b/Documentation/virtual/kvm/locking.txt
> index d68af4d..19f94a6 100644
> --- a/Documentation/virtual/kvm/locking.txt
> +++ b/Documentation/virtual/kvm/locking.txt
> @@ -166,3 +166,15 @@ Comment:	The srcu read lock must be held while accessing memslots (e.g.
>  		MMIO/PIO address->device structure mapping (kvm->buses).
>  		The srcu index can be stored in kvm_vcpu->srcu_idx per vcpu
>  		if it is needed by multiple functions.
> +
> +Name:		blocked_vcpu_on_cpu_lock
> +Type:		spinlock_t
> +Arch:		x86
> +Protects:	blocked_vcpu_on_cpu
> +Comment:	This is a per-CPU lock and it is used for VT-d posted-interrupts.
> +		When VT-d posted-interrupts is supported and the VM has assigned
> +		devices, we put the blocked vCPU on the list blocked_vcpu_on_cpu
> +		protected by blocked_vcpu_on_cpu_lock. When the VT-d hardware
> +		issues a wakeup notification event because an external interrupt
> +		from an assigned device has arrived, we find the vCPU on the
> +		list and wake it up.
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 0ddd353..304fbb5 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -552,6 +552,8 @@ struct kvm_vcpu_arch {
>  	 */
>  	bool write_fault_to_shadow_pgtable;
>  
> +	bool halted;
> +
>  	/* set at EPT violation at this point */
>  	unsigned long exit_qualification;
>  
> @@ -864,6 +866,17 @@ struct kvm_x86_ops {
>  	/* pmu operations of sub-arch */
>  	const struct kvm_pmu_ops *pmu_ops;
>  
> +	/*
> +	 * Architecture specific hooks for vCPU blocking due to
> +	 * HLT instruction.
> +	 * Returns for .pre_block():
> +	 *    - 0 means continue to block the vCPU.
> +	 *    - 1 means we cannot block the vCPU since some event
> +	 *        happens during this period, such as, 'ON' bit in
> +	 *        posted-interrupts descriptor is set.
> +	 */
> +	int (*pre_block)(struct kvm_vcpu *vcpu);
> +	void (*post_block)(struct kvm_vcpu *vcpu);
>  	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
>  			      uint32_t guest_irq, bool set);
>  };
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 902a67d..9968896 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -879,6 +879,13 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
>  static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
>  static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
>  
> +/*
> + * We maintain a per-CPU linked-list of vCPUs, so in wakeup_handler() we
> + * can find which vCPU should be woken up.
> + */
> +static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
> +static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> +
>  static unsigned long *vmx_io_bitmap_a;
>  static unsigned long *vmx_io_bitmap_b;
>  static unsigned long *vmx_msr_bitmap_legacy;
> @@ -2985,6 +2992,8 @@ static int hardware_enable(void)
>  		return -EBUSY;
>  
>  	INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
> +	INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
> +	spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
>  
>  	/*
>  	 * Now we can enable the vmclear operation in kdump
> @@ -6121,6 +6130,25 @@ static void update_ple_window_actual_max(void)
>  			                    ple_window_grow, INT_MIN);
>  }
>  
> +/*
> + * Handler for POSTED_INTR_WAKEUP_VECTOR.
> + */
> +static void wakeup_handler(void)
> +{
> +	struct kvm_vcpu *vcpu;
> +	int cpu = smp_processor_id();
> +
> +	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> +	list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
> +			blocked_vcpu_list) {
> +		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> +
> +		if (pi_test_on(pi_desc) == 1)
> +			kvm_vcpu_kick(vcpu);
> +	}
> +	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> +}
> +
>  static __init int hardware_setup(void)
>  {
>  	int r = -ENOMEM, i, msr;
> @@ -6305,6 +6333,8 @@ static __init int hardware_setup(void)
>  		kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
>  	}
>  
> +	kvm_set_posted_intr_wakeup_handler(wakeup_handler);
> +
>  	return alloc_kvm_area();
>  
>  out8:
> @@ -10430,6 +10460,126 @@ static void vmx_enable_log_dirty_pt_masked(struct kvm *kvm,
>  }
>  
>  /*
> + * This routine does the following things for vCPU which is going
> + * to be blocked if VT-d PI is enabled.
> + * - Store the vCPU to the wakeup list, so when interrupts happen
> + *   we can find the right vCPU to wake up.
> + * - Change the Posted-interrupt descriptor as below:
> + *      'NDST' <-- vcpu->pre_pcpu
> + *      'NV' <-- POSTED_INTR_WAKEUP_VECTOR
> + * - If 'ON' is set during this process, which means at least one
> + *   interrupt is posted for this vCPU, we cannot block it, in
> + *   this case, return 1, otherwise, return 0.
> + *
> + */
> +static int vmx_pre_block(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long flags;
> +	unsigned int dest;
> +	struct pi_desc old, new;
> +	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> +
> +	if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
> +		!irq_remapping_cap(IRQ_POSTING_CAP))
> +		return 0;
> +
> +	vcpu->pre_pcpu = vcpu->cpu;
> +	spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> +			  vcpu->pre_pcpu), flags);
> +	list_add_tail(&vcpu->blocked_vcpu_list,
> +		      &per_cpu(blocked_vcpu_on_cpu,
> +		      vcpu->pre_pcpu));
> +	spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> +			       vcpu->pre_pcpu), flags);
> +
> +	do {
> +		old.control = new.control = pi_desc->control;
> +
> +		/*
> +		 * We should not block the vCPU if
> +		 * an interrupt is posted for it.
> +		 */
> +		if (pi_test_on(pi_desc) == 1) {
> +			spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> +					  vcpu->pre_pcpu), flags);
> +			list_del(&vcpu->blocked_vcpu_list);
> +			spin_unlock_irqrestore(
> +					&per_cpu(blocked_vcpu_on_cpu_lock,
> +					vcpu->pre_pcpu), flags);
> +			vcpu->pre_pcpu = -1;
> +
> +			return 1;
> +		}
> +
> +		WARN((pi_desc->sn == 1),
> +		     "Warning: SN field of posted-interrupts "
> +		     "is set before blocking\n");
> +
> +		/*
> +		 * Since the vCPU can be preempted during this process,
> +		 * vcpu->cpu could be different from pre_pcpu, so we
> +		 * need to set pre_pcpu as the destination of the wakeup
> +		 * notification event. Then we can find the right vCPU
> +		 * to wake up in the wakeup handler if interrupts happen
> +		 * when the vCPU is in blocked state.
> +		 */
> +		dest = cpu_physical_id(vcpu->pre_pcpu);
> +
> +		if (x2apic_enabled())
> +			new.ndst = dest;
> +		else
> +			new.ndst = (dest << 8) & 0xFF00;
> +
> +		/* set 'NV' to 'wakeup vector' */
> +		new.nv = POSTED_INTR_WAKEUP_VECTOR;
> +	} while (cmpxchg(&pi_desc->control, old.control,
> +			new.control) != old.control);
> +
> +	return 0;
> +}
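
The descriptor update in vmx_pre_block() is a compare-and-swap retry loop on
the 64-bit control word. Here is a minimal userspace sketch of that loop using
C11 atomics, under stated assumptions: the bit positions (ON at bit 0, NV at
bits 16..23, NDST at bits 32..63) are assumed for illustration, and the names
encode_ndst and retarget_unless_pending are hypothetical. Unlike the patch, the
sketch tests ON on the snapshot it is about to swap rather than re-reading the
descriptor.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed layout, for illustration only: ON = bit 0, NV = bits 16..23,
 * NDST = bits 32..63 of a 64-bit control word.
 */
#define CTL_ON          ((uint64_t)1 << 0)
#define CTL_NV_SHIFT    16
#define CTL_NV_MASK     ((uint64_t)0xff << CTL_NV_SHIFT)
#define CTL_NDST_SHIFT  32
#define CTL_NDST_MASK   ((uint64_t)0xffffffff << CTL_NDST_SHIFT)

/*
 * Mirrors the patch: in xAPIC mode the APIC ID goes in bits 15:8 of NDST,
 * in x2APIC mode the full 32-bit ID is used as-is.
 */
static uint32_t encode_ndst(uint32_t apic_id, bool x2apic)
{
	return x2apic ? apic_id : (apic_id << 8) & 0xff00;
}

/*
 * Retarget notifications to (vector, ndst) unless ON is already set.
 * Returns 1 if an interrupt is pending (caller must not block), else 0.
 */
static int retarget_unless_pending(_Atomic uint64_t *control,
				   uint8_t vector, uint32_t ndst)
{
	uint64_t old = atomic_load(control);
	uint64_t new;

	do {
		if (old & CTL_ON)
			return 1;
		new = old & ~(CTL_NV_MASK | CTL_NDST_MASK);
		new |= (uint64_t)vector << CTL_NV_SHIFT;
		new |= (uint64_t)ndst << CTL_NDST_SHIFT;
		/* On failure, 'old' is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(control, &old, new));

	return 0;
}
```

The loop never overwrites a descriptor in which ON has been set: if the bit is
set between the load and the swap, the compare-exchange fails, 'old' is
refreshed, and the next iteration returns 1.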
> +
> +static void vmx_post_block(struct kvm_vcpu *vcpu)
> +{
> +	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> +	struct pi_desc old, new;
> +	unsigned int dest;
> +	unsigned long flags;
> +
> +	if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
> +		!irq_remapping_cap(IRQ_POSTING_CAP))
> +		return;
> +
> +	do {
> +		old.control = new.control = pi_desc->control;
> +
> +		dest = cpu_physical_id(vcpu->cpu);
> +
> +		if (x2apic_enabled())
> +			new.ndst = dest;
> +		else
> +			new.ndst = (dest << 8) & 0xFF00;
> +
> +		/* Allow posting non-urgent interrupts */
> +		new.sn = 0;
> +
> +		/* set 'NV' to 'notification vector' */
> +		new.nv = POSTED_INTR_VECTOR;
> +	} while (cmpxchg(&pi_desc->control, old.control,
> +			new.control) != old.control);
> +
> +	if(vcpu->pre_pcpu != -1) {
> +		spin_lock_irqsave(
> +			&per_cpu(blocked_vcpu_on_cpu_lock,
> +			vcpu->pre_pcpu), flags);
> +		list_del(&vcpu->blocked_vcpu_list);
> +		spin_unlock_irqrestore(
> +			&per_cpu(blocked_vcpu_on_cpu_lock,
> +			vcpu->pre_pcpu), flags);
> +		vcpu->pre_pcpu = -1;
> +	}
> +}
> +
> +/*
>   * vmx_update_pi_irte - set IRTE for Posted-Interrupts
>   *
>   * @kvm: kvm
> @@ -10620,6 +10770,9 @@ static struct kvm_x86_ops vmx_x86_ops = {
>  	.flush_log_dirty = vmx_flush_log_dirty,
>  	.enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,
>  
> +	.pre_block = vmx_pre_block,
> +	.post_block = vmx_post_block,
> +
>  	.pmu_ops = &intel_pmu_ops,
>  
>  	.update_pi_irte = vmx_update_pi_irte,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 58688aa..46f55b2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5869,7 +5869,12 @@ int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
>  {
>  	++vcpu->stat.halt_exits;
>  	if (irqchip_in_kernel(vcpu->kvm)) {
> -		vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
> +		/* Handle posted-interrupt when vCPU is to be halted */
> +		if (!kvm_x86_ops->pre_block ||
> +				kvm_x86_ops->pre_block(vcpu) == 0) {
> +			vcpu->arch.halted = true;
> +			vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
> +		}
>  		return 1;
>  	} else {
>  		vcpu->run->exit_reason = KVM_EXIT_HLT;
> @@ -6518,6 +6523,20 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  			kvm_vcpu_reload_apic_access_page(vcpu);
>  	}
>  
> +	/*
> +	 * KVM_REQ_EVENT is not set when posted interrupts are set by
> +	 * VT-d hardware, so we have to update RVI unconditionally.
> +	 */
> +	if (kvm_lapic_enabled(vcpu)) {
> +		/*
> +		 * Update architecture specific hints for APIC
> +		 * virtual interrupt delivery.
> +		 */
> +		if (kvm_x86_ops->hwapic_irr_update)
> +			kvm_x86_ops->hwapic_irr_update(vcpu,
> +				kvm_lapic_find_highest_irr(vcpu));
> +	}
> +
>  	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>  		kvm_apic_accept_events(vcpu);
>  		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> @@ -6534,13 +6553,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  			kvm_x86_ops->enable_irq_window(vcpu);
>  
>  		if (kvm_lapic_enabled(vcpu)) {
> -			/*
> -			 * Update architecture specific hints for APIC
> -			 * virtual interrupt delivery.
> -			 */
> -			if (kvm_x86_ops->hwapic_irr_update)
> -				kvm_x86_ops->hwapic_irr_update(vcpu,
> -					kvm_lapic_find_highest_irr(vcpu));
>  			update_cr8_intercept(vcpu);
>  			kvm_lapic_sync_to_vapic(vcpu);
>  		}
> @@ -6711,10 +6723,31 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>  
>  	for (;;) {
>  		if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
> -		    !vcpu->arch.apf.halted)
> +		    !vcpu->arch.apf.halted) {
> +			/*
> +			 * For some cases, we can get here with
> +			 * vcpu->arch.halted being true.
> +			 */


Feng,

can you explain the reasons for this?

Perhaps pre_block and post_block should be handled entirely in
vcpu_block, like this:

 static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
 {
-	if (!kvm_arch_vcpu_runnable(vcpu)) {
+	if (!kvm_arch_vcpu_runnable(vcpu) &&
+	    (!kvm_x86_ops->pre_block || kvm_x86_ops->pre_block(vcpu) == 0)) {
 		srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
 		kvm_vcpu_block(vcpu);
 		vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
+
+		if (kvm_x86_ops->post_block)
+			kvm_x86_ops->post_block(vcpu);
+
 		if (!kvm_check_request(KVM_REQ_UNHALT, vcpu))
 			return 1;
 	}

and removing all of your changes in kvm_vcpu_halt and vcpu_run?
(Full patch below my signature, only compile-tested).

It should not matter if mp_state goes briefly to KVM_MP_STATE_HALTED and
from there back to KVM_MP_STATE_RUNNABLE.
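
A minimal userspace sketch of the pairing proposed above, to make the
control flow concrete. The struct layouts, the stub hooks and the name
vcpu_block_sketch() are hypothetical stand-ins, not kernel code. The tail
of the real vcpu_block() is not shown in this message, so the
HALTED -> RUNNABLE step at the end is an assumption based on the remark
about mp_state.

```c
#include <stdbool.h>
#include <stddef.h>

enum mp_state { MP_RUNNABLE, MP_HALTED };

/* Hypothetical stand-in for struct kvm_vcpu; only what the sketch needs. */
struct vcpu {
	enum mp_state mp_state;
	bool runnable;          /* models kvm_arch_vcpu_runnable() */
	bool interrupt_posted;  /* models the 'ON' bit seen by pre_block */
	bool unhalt_request;    /* models KVM_REQ_UNHALT */
	int pre_calls, post_calls, block_calls;
};

/* Hypothetical stand-in for the two optional kvm_x86_ops hooks. */
struct ops {
	int  (*pre_block)(struct vcpu *v);
	void (*post_block)(struct vcpu *v);
};

/* Stub hook: returns 1 when an interrupt is already posted (do not block). */
static int pre_block_stub(struct vcpu *v)
{
	v->pre_calls++;
	return v->interrupt_posted ? 1 : 0;
}

static void post_block_stub(struct vcpu *v)
{
	v->post_calls++;
}

static int vcpu_block_sketch(const struct ops *ops, struct vcpu *v)
{
	/* post_block runs only on the path where pre_block returned 0. */
	if (!v->runnable &&
	    (!ops->pre_block || ops->pre_block(v) == 0)) {
		v->block_calls++;       /* kvm_vcpu_block() would sleep here */

		if (ops->post_block)
			ops->post_block(v);

		if (!v->unhalt_request)
			return 1;
		v->unhalt_request = false;  /* request is consumed when checked */
	}

	/* Assumption: the unshown tail moves a halted vCPU back to runnable. */
	if (v->mp_state == MP_HALTED)
		v->mp_state = MP_RUNNABLE;
	return 1;
}
```

With this shape every post_block call is preceded by a pre_block call that
returned 0 in the same invocation, so no separate "halted" flag is needed
to remember that pre_block ran.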

Paolo

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4a0b1a800317..5265a5522458 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -568,8 +568,6 @@ struct kvm_vcpu_arch {
 	 */
 	bool write_fault_to_shadow_pgtable;
 
-	bool halted;
-
 	/* set at EPT violation at this point */
 	unsigned long exit_qualification;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a2a070541318..44092cfa1c0a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5674,12 +5674,7 @@ int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
 {
 	++vcpu->stat.halt_exits;
 	if (irqchip_in_kernel(vcpu->kvm)) {
-		/* Handle posted-interrupt when vCPU is to be halted */
-		if (!kvm_x86_ops->pre_block ||
-				kvm_x86_ops->pre_block(vcpu) == 0) {
-			vcpu->arch.halted = true;
-			vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
-		}
+		vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
 		return 1;
 	} else {
 		vcpu->run->exit_reason = KVM_EXIT_HLT;
@@ -6446,10 +6441,15 @@ out:
 
 static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
 {
-	if (!kvm_arch_vcpu_runnable(vcpu)) {
+	if (!kvm_arch_vcpu_runnable(vcpu) &&
+	    (!kvm_x86_ops->pre_block || kvm_x86_ops->pre_block(vcpu) == 0)) {
 		srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
 		kvm_vcpu_block(vcpu);
 		vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
+
+		if (kvm_x86_ops->post_block)
+			kvm_x86_ops->post_block(vcpu);
+
 		if (!kvm_check_request(KVM_REQ_UNHALT, vcpu))
 			return 1;
 	}
@@ -6482,28 +6482,9 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
 	for (;;) {
 		if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
 		    !vcpu->arch.apf.halted) {
-			/*
-			 * For some cases, we can get here with
-			 * vcpu->arch.halted being true.
-			 */
-			if (kvm_x86_ops->post_block && vcpu->arch.halted) {
-				kvm_x86_ops->post_block(vcpu);
-				vcpu->arch.halted = false;
-			}
-
 			r = vcpu_enter_guest(vcpu);
 		} else {
 			r = vcpu_block(kvm, vcpu);
-
-			/*
-			 * post_block() must be called after
-			 * pre_block() which is called in
-			 * kvm_vcpu_halt().
-			 */
-			if (kvm_x86_ops->post_block && vcpu->arch.halted) {
-				kvm_x86_ops->post_block(vcpu);
-				vcpu->arch.halted = false;
-			}
 		}
 
 		if (r <= 0)
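
The pre_block and post_block hooks that vcpu_block() calls in the patch
above both rewrite the posted-interrupt descriptor's control word in a
cmpxchg() retry loop. A minimal userspace sketch of that pattern follows,
using C11 atomics in place of the kernel's cmpxchg(). The bit positions,
the WAKEUP_VECTOR value and the function names are assumptions made for
illustration; struct pi_desc itself is not shown in this message.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the 64-bit control word (illustrative only). */
#define PI_ON          (1ULL << 0)               /* outstanding notification */
#define PI_NV_SHIFT    16
#define PI_NV_MASK     (0xFFULL << PI_NV_SHIFT)  /* notification vector */
#define PI_NDST_SHIFT  32
#define PI_NDST_MASK   (0xFFFFFFFFULL << PI_NDST_SHIFT)

#define WAKEUP_VECTOR  0xf1u  /* placeholder for POSTED_INTR_WAKEUP_VECTOR */

/* Same encoding as the patch: xAPIC keeps the APIC ID in bits 15:8. */
static uint32_t encode_ndst(uint32_t apic_id, bool x2apic)
{
	return x2apic ? apic_id : (apic_id << 8) & 0xFF00;
}

/*
 * Returns 1 if ON is already set (the caller must not block), or 0 once
 * NV and NDST have been switched to the wakeup vector and target CPU.
 */
static int pre_block_update(_Atomic uint64_t *control, uint32_t apic_id,
			    bool x2apic)
{
	uint64_t cur = atomic_load(control);
	uint64_t next;

	do {
		if (cur & PI_ON)
			return 1;

		next = cur & ~(PI_NV_MASK | PI_NDST_MASK);
		next |= (uint64_t)WAKEUP_VECTOR << PI_NV_SHIFT;
		next |= (uint64_t)encode_ndst(apic_id, x2apic) << PI_NDST_SHIFT;
		/* On failure, cur is reloaded with the current value and we retry. */
	} while (!atomic_compare_exchange_weak(control, &cur, next));

	return 0;
}
```

Unlike the patch, which tests ON on the live descriptor, the sketch tests
the snapshot that the compare-and-swap then validates, so a concurrent
posting between the test and the swap forces a retry.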


* Re: [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer
  2015-09-18 14:29 ` [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer Feng Wu
@ 2015-09-18 17:19   ` Alex Williamson
  2015-09-21  8:56   ` Wu, Feng
  2016-04-26 20:08   ` Alex Williamson
  2 siblings, 0 replies; 56+ messages in thread
From: Alex Williamson @ 2015-09-18 17:19 UTC (permalink / raw)
  To: Feng Wu; +Cc: pbonzini, joro, mtosatti, eric.auger, kvm, iommu, linux-kernel

On Fri, 2015-09-18 at 22:29 +0800, Feng Wu wrote:
> This patch adds the registration/unregistration of an
> irq_bypass_producer for MSI/MSIx on vfio pci devices.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>

One nit: Paolo, could you please fix the spelling of "registration" in the
dev_info? Otherwise:

Acked-by: Alex Williamson <alex.williamson@redhat.com>


> ---
> v8:
> - Merge "[PATCH v7 08/17] vfio: Select IRQ_BYPASS_MANAGER for vfio PCI devices"
>   into this patch.
> 
> v6:
> - Make the add_consumer and del_consumer callbacks static
> - Remove pointless INIT_LIST_HEAD to 'vdev->ctx[vector].producer.node)'
> - Use dev_info instead of WARN_ON() when irq_bypass_register_producer fails
> - Remove optional dummy callbacks for irq producer
> 
>  drivers/vfio/pci/Kconfig            | 1 +
>  drivers/vfio/pci/vfio_pci_intrs.c   | 9 +++++++++
>  drivers/vfio/pci/vfio_pci_private.h | 2 ++
>  3 files changed, 12 insertions(+)
> 
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 579d83b..02912f1 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -2,6 +2,7 @@ config VFIO_PCI
>  	tristate "VFIO support for PCI devices"
>  	depends on VFIO && PCI && EVENTFD
>  	select VFIO_VIRQFD
> +	select IRQ_BYPASS_MANAGER
>  	help
>  	  Support for the PCI VFIO bus driver.  This is required to make
>  	  use of PCI drivers using the VFIO framework.
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 1f577b4..c65299d 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -319,6 +319,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>  
>  	if (vdev->ctx[vector].trigger) {
>  		free_irq(irq, vdev->ctx[vector].trigger);
> +		irq_bypass_unregister_producer(&vdev->ctx[vector].producer);
>  		kfree(vdev->ctx[vector].name);
>  		eventfd_ctx_put(vdev->ctx[vector].trigger);
>  		vdev->ctx[vector].trigger = NULL;
> @@ -360,6 +361,14 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>  		return ret;
>  	}
>  
> +	vdev->ctx[vector].producer.token = trigger;
> +	vdev->ctx[vector].producer.irq = irq;
> +	ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
> +	if (unlikely(ret))
> +		dev_info(&pdev->dev,
> +		"irq bypass producer (token %p) registeration fails: %d\n",
> +		vdev->ctx[vector].producer.token, ret);
> +
>  	vdev->ctx[vector].trigger = trigger;
>  
>  	return 0;
> diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
> index ae0e1b4..0e7394f 100644
> --- a/drivers/vfio/pci/vfio_pci_private.h
> +++ b/drivers/vfio/pci/vfio_pci_private.h
> @@ -13,6 +13,7 @@
>  
>  #include <linux/mutex.h>
>  #include <linux/pci.h>
> +#include <linux/irqbypass.h>
>  
>  #ifndef VFIO_PCI_PRIVATE_H
>  #define VFIO_PCI_PRIVATE_H
> @@ -29,6 +30,7 @@ struct vfio_pci_irq_ctx {
>  	struct virqfd		*mask;
>  	char			*name;
>  	bool			masked;
> +	struct irq_bypass_producer	producer;
>  };
>  
>  struct vfio_pci_device {





* Re: [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series
  2015-09-18 14:58 ` [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Paolo Bonzini
  2015-09-18 15:08   ` Wu, Feng
@ 2015-09-18 17:57   ` Alex Williamson
  1 sibling, 0 replies; 56+ messages in thread
From: Alex Williamson @ 2015-09-18 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Feng Wu, joro, mtosatti, eric.auger, kvm, iommu, linux-kernel

On Fri, 2015-09-18 at 16:58 +0200, Paolo Bonzini wrote:
> 
> On 18/09/2015 16:29, Feng Wu wrote:
> > VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> > With VT-d Posted-Interrupts enabled, external interrupts from
> > direct-assigned devices can be delivered to guests without VMM
> > intervention when guest is running in non-root mode.
> > 
> > You can find the VT-d Posted-Interrtups Spec. in the following URL:
> > http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
> 
> Thanks.  I will squash patches 2 and 14 together, and drop patch 3.
> 
> Signed-off-bys are missing in patch 1 and 4.  The patches exist
> elsewhere in the mailing list archives, so not a big deal.  Or just
> reply to them with the S-o-b line.
> 
> Alex, can you ack the series and review patch 12?

I sent an ack for 12 separately. I got a bit lost in 16 & 17, but for
all the others that don't already have some tag from me,

Reviewed-by: Alex Williamson <alex.williamson@redhat.com>

> 
> Joerg, can you ack patch 18?
> 
> Paolo
> 
> > v9:
> > - Include the whole series:
> > [01/18]: irq bypasser manager
> > [02/18] - [06/18]: Common non-architecture part for VT-d PI and ARM side forwarded irq
> > [07/18] - [18/18]: VT-d PI part
> > 
> > v8:
> > refer to the changelog in each patch
> > 
> > v7:
> > * Define two weak irq bypass callbacks:
> >   - kvm_arch_irq_bypass_start()
> >   - kvm_arch_irq_bypass_stop()
> > * Remove the x86 dummy implementation of the above two functions.
> > * Print some useful information instead of WARN_ON() when the
> >   irq bypass consumer unregistration fails.
> > * Fix an issue when calling pi_pre_block and pi_post_block.
> > 
> > v6:
> > * Rebase on 4.2.0-rc6
> > * Rebase on https://lkml.org/lkml/2015/8/6/526 and http://www.gossamer-threads.com/lists/linux/kernel/2235623
> > * Make the add_consumer and del_consumer callbacks static
> > * Remove pointless INIT_LIST_HEAD to 'vdev->ctx[vector].producer.node)'
> > * Use dev_info instead of WARN_ON() when irq_bypass_register_producer fails
> > * Remove optional dummy callbacks for irq producer
> > 
> > v4:
> > * For lowest-priority interrupt, only support single-CPU destination
> > interrupts at the current stage, more common lowest priority support
> > will be added later.
> > * Accoring to Marcelo's suggestion, when vCPU is blocked, we handle
> > the posted-interrupts in the HLT emulation path.
> > * Some small changes (coding style, typo, add some code comments)
> > 
> > v3:
> > * Adjust the Posted-interrupts Descriptor updating logic when vCPU is
> >   preempted or blocked.
> > * KVM_DEV_VFIO_DEVICE_POSTING_IRQ --> KVM_DEV_VFIO_DEVICE_POST_IRQ
> > * __KVM_HAVE_ARCH_KVM_VFIO_POSTING --> __KVM_HAVE_ARCH_KVM_VFIO_POST
> > * Add KVM_DEV_VFIO_DEVICE_UNPOST_IRQ attribute for VFIO irq, which
> >   can be used to change back to remapping mode.
> > * Fix typo
> > 
> > v2:
> > * Use VFIO framework to enable this feature, the VFIO part of this series is
> >   base on Eric's patch "[PATCH v3 0/8] KVM-VFIO IRQ forward control"
> > * Rebase this patchset on git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git,
> >   then revise some irq logic based on the new hierarchy irqdomain patches provided
> >   by Jiang Liu <jiang.liu@linux.intel.com>
> > 
> > 
> > *** BLURB HERE ***
> > 
> > Alex Williamson (1):
> >   virt: IRQ bypass manager
> > 
> > Eric Auger (4):
> >   KVM: arm/arm64: select IRQ_BYPASS_MANAGER
> >   KVM: create kvm_irqfd.h
> >   KVM: introduce kvm_arch functions for IRQ bypass
> >   KVM: eventfd: add irq bypass consumer management
> > 
> > Feng Wu (13):
> >   KVM: x86: select IRQ_BYPASS_MANAGER
> >   KVM: Extend struct pi_desc for VT-d Posted-Interrupts
> >   KVM: Add some helper functions for Posted-Interrupts
> >   KVM: Define a new interface kvm_intr_is_single_vcpu()
> >   KVM: Make struct kvm_irq_routing_table accessible
> >   KVM: make kvm_set_msi_irq() public
> >   vfio: Register/unregister irq_bypass_producer
> >   KVM: x86: Update IRTE for posted-interrupts
> >   KVM: Implement IRQ bypass consumer callbacks for x86
> >   KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'
> >   KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
> >   KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
> >   iommu/vt-d: Add a command line parameter for VT-d posted-interrupts
> > 
> >  Documentation/kernel-parameters.txt   |   1 +
> >  Documentation/virtual/kvm/locking.txt |  12 ++
> >  MAINTAINERS                           |   7 +
> >  arch/arm/kvm/Kconfig                  |   2 +
> >  arch/arm/kvm/Makefile                 |   1 +
> >  arch/arm64/kvm/Kconfig                |   2 +
> >  arch/arm64/kvm/Makefile               |   1 +
> >  arch/x86/include/asm/kvm_host.h       |  24 +++
> >  arch/x86/kvm/Kconfig                  |   3 +
> >  arch/x86/kvm/Makefile                 |   3 +
> >  arch/x86/kvm/irq_comm.c               |  32 ++-
> >  arch/x86/kvm/lapic.c                  |  59 ++++++
> >  arch/x86/kvm/lapic.h                  |   2 +
> >  arch/x86/kvm/trace.h                  |  33 ++++
> >  arch/x86/kvm/vmx.c                    | 361 +++++++++++++++++++++++++++++++++-
> >  arch/x86/kvm/x86.c                    | 108 +++++++++-
> >  drivers/iommu/irq_remapping.c         |  12 +-
> >  drivers/vfio/pci/Kconfig              |   1 +
> >  drivers/vfio/pci/vfio_pci_intrs.c     |   9 +
> >  drivers/vfio/pci/vfio_pci_private.h   |   2 +
> >  include/linux/irqbypass.h             |  90 +++++++++
> >  include/linux/kvm_host.h              |  29 +++
> >  include/linux/kvm_irqfd.h             |  71 +++++++
> >  virt/kvm/Kconfig                      |   3 +
> >  virt/kvm/eventfd.c                    | 142 +++++++------
> >  virt/kvm/irqchip.c                    |  10 -
> >  virt/kvm/kvm_main.c                   |   3 +
> >  virt/lib/Kconfig                      |   2 +
> >  virt/lib/Makefile                     |   1 +
> >  virt/lib/irqbypass.c                  | 257 ++++++++++++++++++++++++
> >  30 files changed, 1182 insertions(+), 101 deletions(-)
> >  create mode 100644 include/linux/irqbypass.h
> >  create mode 100644 include/linux/kvm_irqfd.h
> >  create mode 100644 virt/lib/Kconfig
> >  create mode 100644 virt/lib/Makefile
> >  create mode 100644 virt/lib/irqbypass.c
> > 





* RE: [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-09-18 16:06   ` Paolo Bonzini
@ 2015-09-19  7:11     ` Wu, Feng
  2015-09-21  2:16     ` Wu, Feng
  1 sibling, 0 replies; 56+ messages in thread
From: Wu, Feng @ 2015-09-19  7:11 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Williamson, joro, Marcelo Tosatti
  Cc: iommu, linux-kernel, KVM list, Eric Auger, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo Bonzini
> Sent: Saturday, September 19, 2015 12:07 AM
> To: Wu, Feng; Alex Williamson; joro@8bytes.org; Marcelo Tosatti
> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org; KVM list; Eric Auger
> Subject: Re: [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
> 
> 
> 
> On 18/09/2015 16:29, Feng Wu wrote:
> > This patch updates the Posted-Interrupts Descriptor when vCPU
> > is blocked.
> >
> > pre-block:
> > - Add the vCPU to the blocked per-CPU list
> > - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> >
> > post-block:
> > - Remove the vCPU from the per-CPU list
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> > v9:
> > - Add description for blocked_vcpu_on_cpu_lock in Documentation/virtual/kvm/locking.txt
> > - Check !kvm_arch_has_assigned_device(vcpu->kvm) first, then
> >   !irq_remapping_cap(IRQ_POSTING_CAP)
> >
> > v8:
> > - Rename 'pi_pre_block' to 'pre_block'
> > - Rename 'pi_post_block' to 'post_block'
> > - Change some comments
> > - Only add the vCPU to the blocking list when the VM has assigned devices.
> >
> >  Documentation/virtual/kvm/locking.txt |  12 +++
> >  arch/x86/include/asm/kvm_host.h       |  13 +++
> >  arch/x86/kvm/vmx.c                    | 153 ++++++++++++++++++++++++++++++++++
> >  arch/x86/kvm/x86.c                    |  53 +++++++++---
> >  include/linux/kvm_host.h              |   3 +
> >  virt/kvm/kvm_main.c                   |   3 +
> >  6 files changed, 227 insertions(+), 10 deletions(-)
> >
> > diff --git a/Documentation/virtual/kvm/locking.txt b/Documentation/virtual/kvm/locking.txt
> > index d68af4d..19f94a6 100644
> > --- a/Documentation/virtual/kvm/locking.txt
> > +++ b/Documentation/virtual/kvm/locking.txt
> > @@ -166,3 +166,15 @@ Comment:	The srcu read lock must be held while accessing memslots (e.g.
> >  		MMIO/PIO address->device structure mapping (kvm->buses).
> >  		The srcu index can be stored in kvm_vcpu->srcu_idx per vcpu
> >  		if it is needed by multiple functions.
> > +
> > +Name:		blocked_vcpu_on_cpu_lock
> > +Type:		spinlock_t
> > +Arch:		x86
> > +Protects:	blocked_vcpu_on_cpu
> > +Comment:	This is a per-CPU lock and it is used for VT-d posted-interrupts.
> > +		When VT-d posted-interrupts is supported and the VM has assigned
> > +		devices, we put the blocked vCPU on the list blocked_vcpu_on_cpu
> > +		protected by blocked_vcpu_on_cpu_lock, when VT-d hardware issues
> > +		wakeup notification event since external interrupts from the
> > +		assigned devices happens, we will find the vCPU on the list to
> > +		wakeup.
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 0ddd353..304fbb5 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -552,6 +552,8 @@ struct kvm_vcpu_arch {
> >  	 */
> >  	bool write_fault_to_shadow_pgtable;
> >
> > +	bool halted;
> > +
> >  	/* set at EPT violation at this point */
> >  	unsigned long exit_qualification;
> >
> > @@ -864,6 +866,17 @@ struct kvm_x86_ops {
> >  	/* pmu operations of sub-arch */
> >  	const struct kvm_pmu_ops *pmu_ops;
> >
> > +	/*
> > +	 * Architecture specific hooks for vCPU blocking due to
> > +	 * HLT instruction.
> > +	 * Returns for .pre_block():
> > +	 *    - 0 means continue to block the vCPU.
> > +	 *    - 1 means we cannot block the vCPU since some event
> > +	 *        happens during this period, such as, 'ON' bit in
> > +	 *        posted-interrupts descriptor is set.
> > +	 */
> > +	int (*pre_block)(struct kvm_vcpu *vcpu);
> > +	void (*post_block)(struct kvm_vcpu *vcpu);
> >  	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
> >  			      uint32_t guest_irq, bool set);
> >  };
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 902a67d..9968896 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -879,6 +879,13 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
> >  static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
> >  static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
> >
> > +/*
> > + * We maintian a per-CPU linked-list of vCPU, so in wakeup_handler() we
> > + * can find which vCPU should be waken up.
> > + */
> > +static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
> > +static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> > +
> >  static unsigned long *vmx_io_bitmap_a;
> >  static unsigned long *vmx_io_bitmap_b;
> >  static unsigned long *vmx_msr_bitmap_legacy;
> > @@ -2985,6 +2992,8 @@ static int hardware_enable(void)
> >  		return -EBUSY;
> >
> >  	INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
> > +	INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
> > +	spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> >
> >  	/*
> >  	 * Now we can enable the vmclear operation in kdump
> > @@ -6121,6 +6130,25 @@ static void update_ple_window_actual_max(void)
> >  			                    ple_window_grow, INT_MIN);
> >  }
> >
> > +/*
> > + * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
> > + */
> > +static void wakeup_handler(void)
> > +{
> > +	struct kvm_vcpu *vcpu;
> > +	int cpu = smp_processor_id();
> > +
> > +	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > +	list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
> > +			blocked_vcpu_list) {
> > +		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > +
> > +		if (pi_test_on(pi_desc) == 1)
> > +			kvm_vcpu_kick(vcpu);
> > +	}
> > +	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > +}
> > +
> >  static __init int hardware_setup(void)
> >  {
> >  	int r = -ENOMEM, i, msr;
> > @@ -6305,6 +6333,8 @@ static __init int hardware_setup(void)
> >  		kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
> >  	}
> >
> > +	kvm_set_posted_intr_wakeup_handler(wakeup_handler);
> > +
> >  	return alloc_kvm_area();
> >
> >  out8:
> > @@ -10430,6 +10460,126 @@ static void vmx_enable_log_dirty_pt_masked(struct kvm *kvm,
> >  }
> >
> >  /*
> > + * This routine does the following things for vCPU which is going
> > + * to be blocked if VT-d PI is enabled.
> > + * - Store the vCPU to the wakeup list, so when interrupts happen
> > + *   we can find the right vCPU to wake up.
> > + * - Change the Posted-interrupt descriptor as below:
> > + *      'NDST' <-- vcpu->pre_pcpu
> > + *      'NV' <-- POSTED_INTR_WAKEUP_VECTOR
> > + * - If 'ON' is set during this process, which means at least one
> > + *   interrupt is posted for this vCPU, we cannot block it, in
> > + *   this case, return 1, otherwise, return 0.
> > + *
> > + */
> > +static int vmx_pre_block(struct kvm_vcpu *vcpu)
> > +{
> > +	unsigned long flags;
> > +	unsigned int dest;
> > +	struct pi_desc old, new;
> > +	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > +
> > +	if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
> > +		!irq_remapping_cap(IRQ_POSTING_CAP))
> > +		return 0;
> > +
> > +	vcpu->pre_pcpu = vcpu->cpu;
> > +	spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > +			  vcpu->pre_pcpu), flags);
> > +	list_add_tail(&vcpu->blocked_vcpu_list,
> > +		      &per_cpu(blocked_vcpu_on_cpu,
> > +		      vcpu->pre_pcpu));
> > +	spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > +			       vcpu->pre_pcpu), flags);
> > +
> > +	do {
> > +		old.control = new.control = pi_desc->control;
> > +
> > +		/*
> > +		 * We should not block the vCPU if
> > +		 * an interrupt is posted for it.
> > +		 */
> > +		if (pi_test_on(pi_desc) == 1) {
> > +			spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > +					  vcpu->pre_pcpu), flags);
> > +			list_del(&vcpu->blocked_vcpu_list);
> > +			spin_unlock_irqrestore(
> > +					&per_cpu(blocked_vcpu_on_cpu_lock,
> > +					vcpu->pre_pcpu), flags);
> > +			vcpu->pre_pcpu = -1;
> > +
> > +			return 1;
> > +		}
> > +
> > +		WARN((pi_desc->sn == 1),
> > +		     "Warning: SN field of posted-interrupts "
> > +		     "is set before blocking\n");
> > +
> > +		/*
> > +		 * Since vCPU can be preempted during this process,
> > +		 * vcpu->cpu could be different with pre_pcpu, we
> > +		 * need to set pre_pcpu as the destination of wakeup
> > +		 * notification event, then we can find the right vCPU
> > +		 * to wakeup in wakeup handler if interrupts happen
> > +		 * when the vCPU is in blocked state.
> > +		 */
> > +		dest = cpu_physical_id(vcpu->pre_pcpu);
> > +
> > +		if (x2apic_enabled())
> > +			new.ndst = dest;
> > +		else
> > +			new.ndst = (dest << 8) & 0xFF00;
> > +
> > +		/* set 'NV' to 'wakeup vector' */
> > +		new.nv = POSTED_INTR_WAKEUP_VECTOR;
> > +	} while (cmpxchg(&pi_desc->control, old.control,
> > +			new.control) != old.control);
> > +
> > +	return 0;
> > +}
> > +
> > +static void vmx_post_block(struct kvm_vcpu *vcpu)
> > +{
> > +	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > +	struct pi_desc old, new;
> > +	unsigned int dest;
> > +	unsigned long flags;
> > +
> > +	if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
> > +		!irq_remapping_cap(IRQ_POSTING_CAP))
> > +		return;
> > +
> > +	do {
> > +		old.control = new.control = pi_desc->control;
> > +
> > +		dest = cpu_physical_id(vcpu->cpu);
> > +
> > +		if (x2apic_enabled())
> > +			new.ndst = dest;
> > +		else
> > +			new.ndst = (dest << 8) & 0xFF00;
> > +
> > +		/* Allow posting non-urgent interrupts */
> > +		new.sn = 0;
> > +
> > +		/* set 'NV' to 'notification vector' */
> > +		new.nv = POSTED_INTR_VECTOR;
> > +	} while (cmpxchg(&pi_desc->control, old.control,
> > +			new.control) != old.control);
> > +
> > +	if(vcpu->pre_pcpu != -1) {
> > +		spin_lock_irqsave(
> > +			&per_cpu(blocked_vcpu_on_cpu_lock,
> > +			vcpu->pre_pcpu), flags);
> > +		list_del(&vcpu->blocked_vcpu_list);
> > +		spin_unlock_irqrestore(
> > +			&per_cpu(blocked_vcpu_on_cpu_lock,
> > +			vcpu->pre_pcpu), flags);
> > +		vcpu->pre_pcpu = -1;
> > +	}
> > +}
> > +
> > +/*
> >   * vmx_update_pi_irte - set IRTE for Posted-Interrupts
> >   *
> >   * @kvm: kvm
> > @@ -10620,6 +10770,9 @@ static struct kvm_x86_ops vmx_x86_ops = {
> >  	.flush_log_dirty = vmx_flush_log_dirty,
> >  	.enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,
> >
> > +	.pre_block = vmx_pre_block,
> > +	.post_block = vmx_post_block,
> > +
> >  	.pmu_ops = &intel_pmu_ops,
> >
> >  	.update_pi_irte = vmx_update_pi_irte,
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 58688aa..46f55b2 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5869,7 +5869,12 @@ int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
> >  {
> >  	++vcpu->stat.halt_exits;
> >  	if (irqchip_in_kernel(vcpu->kvm)) {
> > -		vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
> > +		/* Handle posted-interrupt when vCPU is to be halted */
> > +		if (!kvm_x86_ops->pre_block ||
> > +				kvm_x86_ops->pre_block(vcpu) == 0) {
> > +			vcpu->arch.halted = true;
> > +			vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
> > +		}
> >  		return 1;
> >  	} else {
> >  		vcpu->run->exit_reason = KVM_EXIT_HLT;
> > @@ -6518,6 +6523,20 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >  			kvm_vcpu_reload_apic_access_page(vcpu);
> >  	}
> >
> > +	/*
> > +	 * KVM_REQ_EVENT is not set when posted interrupts are set by
> > +	 * VT-d hardware, so we have to update RVI unconditionally.
> > +	 */
> > +	if (kvm_lapic_enabled(vcpu)) {
> > +		/*
> > +		 * Update architecture specific hints for APIC
> > +		 * virtual interrupt delivery.
> > +		 */
> > +		if (kvm_x86_ops->hwapic_irr_update)
> > +			kvm_x86_ops->hwapic_irr_update(vcpu,
> > +				kvm_lapic_find_highest_irr(vcpu));
> > +	}
> > +
> >  	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> >  		kvm_apic_accept_events(vcpu);
> >  		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> > @@ -6534,13 +6553,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >  			kvm_x86_ops->enable_irq_window(vcpu);
> >
> >  		if (kvm_lapic_enabled(vcpu)) {
> > -			/*
> > -			 * Update architecture specific hints for APIC
> > -			 * virtual interrupt delivery.
> > -			 */
> > -			if (kvm_x86_ops->hwapic_irr_update)
> > -				kvm_x86_ops->hwapic_irr_update(vcpu,
> > -					kvm_lapic_find_highest_irr(vcpu));
> >  			update_cr8_intercept(vcpu);
> >  			kvm_lapic_sync_to_vapic(vcpu);
> >  		}
> > @@ -6711,10 +6723,31 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
> >
> >  	for (;;) {
> >  		if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
> > -		    !vcpu->arch.apf.halted)
> > +		    !vcpu->arch.apf.halted) {
> > +			/*
> > +			 * For some cases, we can get here with
> > +			 * vcpu->arch.halted being true.
> > +			 */
> 
> 
> Feng,
> 
> can you explain what are the reasons for this?
> 
> Perhaps pre_block and post_block should be handled entirely in
> vcpu_block, like this:

Indeed, I think your suggestion below is a clearer way to implement
it. I will test it and get back to you. Thanks for the suggestion, Paolo!

Thanks,
Feng

> 
>  static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
>  {
> -	if (!kvm_arch_vcpu_runnable(vcpu)) {
> +	if (!kvm_arch_vcpu_runnable(vcpu) &&
> +	    (!kvm_x86_ops->pre_block || kvm_x86_ops->pre_block(vcpu) == 0)) {
>  		srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
>  		kvm_vcpu_block(vcpu);
>  		vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
> +
> +		if (kvm_x86_ops->post_block)
> +			kvm_x86_ops->post_block(vcpu);
> +
>  		if (!kvm_check_request(KVM_REQ_UNHALT, vcpu))
>  			return 1;
>  	}
> 
> and removing all of your changes in kvm_vcpu_halt and vcpu_run?
> (Full patch below my signature, only compile-tested).
> 
> It should not matter if mp_state goes briefly to KVM_MP_STATE_HALTED and
> from there back to KVM_MP_STATE_RUNNABLE.
> 
> Paolo
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 4a0b1a800317..5265a5522458 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -568,8 +568,6 @@ struct kvm_vcpu_arch {
>  	 */
>  	bool write_fault_to_shadow_pgtable;
> 
> -	bool halted;
> -
>  	/* set at EPT violation at this point */
>  	unsigned long exit_qualification;
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a2a070541318..44092cfa1c0a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5674,12 +5674,7 @@ int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
>  {
>  	++vcpu->stat.halt_exits;
>  	if (irqchip_in_kernel(vcpu->kvm)) {
> -		/* Handle posted-interrupt when vCPU is to be halted */
> -		if (!kvm_x86_ops->pre_block ||
> -				kvm_x86_ops->pre_block(vcpu) == 0) {
> -			vcpu->arch.halted = true;
> -			vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
> -		}
> +		vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
>  		return 1;
>  	} else {
>  		vcpu->run->exit_reason = KVM_EXIT_HLT;
> @@ -6446,10 +6441,15 @@ out:
> 
>  static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
>  {
> -	if (!kvm_arch_vcpu_runnable(vcpu)) {
> +	if (!kvm_arch_vcpu_runnable(vcpu) &&
> +	    (!kvm_x86_ops->pre_block || kvm_x86_ops->pre_block(vcpu) == 0)) {
>  		srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
>  		kvm_vcpu_block(vcpu);
>  		vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
> +
> +		if (kvm_x86_ops->post_block)
> +			kvm_x86_ops->post_block(vcpu);
> +
>  		if (!kvm_check_request(KVM_REQ_UNHALT, vcpu))
>  			return 1;
>  	}
> @@ -6482,28 +6482,9 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>  	for (;;) {
>  		if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
>  		    !vcpu->arch.apf.halted) {
> -			/*
> -			 * For some cases, we can get here with
> -			 * vcpu->arch.halted being true.
> -			 */
> -			if (kvm_x86_ops->post_block && vcpu->arch.halted) {
> -				kvm_x86_ops->post_block(vcpu);
> -				vcpu->arch.halted = false;
> -			}
> -
>  			r = vcpu_enter_guest(vcpu);
>  		} else {
>  			r = vcpu_block(kvm, vcpu);
> -
> -			/*
> -			 * post_block() must be called after
> -			 * pre_block() which is called in
> -			 * kvm_vcpu_halt().
> -			 */
> -			if (kvm_x86_ops->post_block && vcpu->arch.halted) {
> -				kvm_x86_ops->post_block(vcpu);
> -				vcpu->arch.halted = false;
> -			}
>  		}
> 
>  		if (r <= 0)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-09-18 16:06   ` Paolo Bonzini
  2015-09-19  7:11     ` Wu, Feng
@ 2015-09-21  2:16     ` Wu, Feng
  2015-09-21  5:32       ` Paolo Bonzini
  1 sibling, 1 reply; 56+ messages in thread
From: Wu, Feng @ 2015-09-21  2:16 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Williamson, joro, Marcelo Tosatti
  Cc: iommu, linux-kernel, KVM list, Eric Auger, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo
> Bonzini
> Sent: Saturday, September 19, 2015 12:07 AM
> To: Wu, Feng; Alex Williamson; joro@8bytes.org; Marcelo Tosatti
> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org; KVM list;
> Eric Auger
> Subject: Re: [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when
> vCPU is blocked
> 
> 
> 
> On 18/09/2015 16:29, Feng Wu wrote:
> > This patch updates the Posted-Interrupts Descriptor when vCPU
> > is blocked.
> >
> > pre-block:
> > - Add the vCPU to the blocked per-CPU list
> > - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> >
> > post-block:
> > - Remove the vCPU from the per-CPU list
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> > v9:
> > - Add description for blocked_vcpu_on_cpu_lock in
> >   Documentation/virtual/kvm/locking.txt
> > - Check !kvm_arch_has_assigned_device(vcpu->kvm) first, then
> >   !irq_remapping_cap(IRQ_POSTING_CAP)
> >
> > v8:
> > - Rename 'pi_pre_block' to 'pre_block'
> > - Rename 'pi_post_block' to 'post_block'
> > - Change some comments
> > - Only add the vCPU to the blocking list when the VM has assigned devices.
> >
> >  Documentation/virtual/kvm/locking.txt |  12 +++
> >  arch/x86/include/asm/kvm_host.h       |  13 +++
> >  arch/x86/kvm/vmx.c                    | 153 ++++++++++++++++++++++++++++++++++
> >  arch/x86/kvm/x86.c                    |  53 +++++++++---
> >  include/linux/kvm_host.h              |   3 +
> >  virt/kvm/kvm_main.c                   |   3 +
> >  6 files changed, 227 insertions(+), 10 deletions(-)
> >
> > diff --git a/Documentation/virtual/kvm/locking.txt b/Documentation/virtual/kvm/locking.txt
> > index d68af4d..19f94a6 100644
> > --- a/Documentation/virtual/kvm/locking.txt
> > +++ b/Documentation/virtual/kvm/locking.txt
> > @@ -166,3 +166,15 @@ Comment:	The srcu read lock must be held while accessing memslots (e.g.
> >  		MMIO/PIO address->device structure mapping (kvm->buses).
> >  		The srcu index can be stored in kvm_vcpu->srcu_idx per vcpu
> >  		if it is needed by multiple functions.
> > +
> > +Name:		blocked_vcpu_on_cpu_lock
> > +Type:		spinlock_t
> > +Arch:		x86
> > +Protects:	blocked_vcpu_on_cpu
> > +Comment:	This is a per-CPU lock and it is used for VT-d posted-interrupts.
> > +		When VT-d posted-interrupts is supported and the VM has assigned
> > +		devices, we put the blocked vCPU on the list blocked_vcpu_on_cpu
> > +		protected by blocked_vcpu_on_cpu_lock. When VT-d hardware issues
> > +		a wakeup notification event because an external interrupt from
> > +		an assigned device has arrived, we find the vCPU on the list and
> > +		wake it up.
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 0ddd353..304fbb5 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -552,6 +552,8 @@ struct kvm_vcpu_arch {
> >  	 */
> >  	bool write_fault_to_shadow_pgtable;
> >
> > +	bool halted;
> > +
> >  	/* set at EPT violation at this point */
> >  	unsigned long exit_qualification;
> >
> > @@ -864,6 +866,17 @@ struct kvm_x86_ops {
> >  	/* pmu operations of sub-arch */
> >  	const struct kvm_pmu_ops *pmu_ops;
> >
> > +	/*
> > +	 * Architecture specific hooks for vCPU blocking due to
> > +	 * HLT instruction.
> > +	 * Returns for .pre_block():
> > +	 *    - 0 means continue to block the vCPU.
> > +	 *    - 1 means we cannot block the vCPU since some event
> > +	 *        happened during this period, such as the 'ON' bit in
> > +	 *        the posted-interrupts descriptor being set.
> > +	 */
> > +	int (*pre_block)(struct kvm_vcpu *vcpu);
> > +	void (*post_block)(struct kvm_vcpu *vcpu);
> >  	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
> >  			      uint32_t guest_irq, bool set);
> >  };
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 902a67d..9968896 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -879,6 +879,13 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
> >  static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
> >  static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
> >
> > +/*
> > + * We maintain a per-CPU linked list of vCPUs, so in wakeup_handler() we
> > + * can find which vCPU should be woken up.
> > + */
> > +static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
> > +static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> > +
> >  static unsigned long *vmx_io_bitmap_a;
> >  static unsigned long *vmx_io_bitmap_b;
> >  static unsigned long *vmx_msr_bitmap_legacy;
> > @@ -2985,6 +2992,8 @@ static int hardware_enable(void)
> >  		return -EBUSY;
> >
> >  	INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
> > +	INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
> > +	spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> >
> >  	/*
> >  	 * Now we can enable the vmclear operation in kdump
> > @@ -6121,6 +6130,25 @@ static void update_ple_window_actual_max(void)
> >  			                    ple_window_grow, INT_MIN);
> >  }
> >
> > +/*
> > + * Handler for POSTED_INTR_WAKEUP_VECTOR.
> > + */
> > +static void wakeup_handler(void)
> > +{
> > +	struct kvm_vcpu *vcpu;
> > +	int cpu = smp_processor_id();
> > +
> > +	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > +	list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
> > +			blocked_vcpu_list) {
> > +		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > +
> > +		if (pi_test_on(pi_desc) == 1)
> > +			kvm_vcpu_kick(vcpu);
> > +	}
> > +	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > +}
> > +
> >  static __init int hardware_setup(void)
> >  {
> >  	int r = -ENOMEM, i, msr;
> > @@ -6305,6 +6333,8 @@ static __init int hardware_setup(void)
> >  		kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
> >  	}
> >
> > +	kvm_set_posted_intr_wakeup_handler(wakeup_handler);
> > +
> >  	return alloc_kvm_area();
> >
> >  out8:
> > @@ -10430,6 +10460,126 @@ static void vmx_enable_log_dirty_pt_masked(struct kvm *kvm,
> >  }
> >
> >  /*
> > + * This routine does the following things for vCPU which is going
> > + * to be blocked if VT-d PI is enabled.
> > + * - Store the vCPU to the wakeup list, so when interrupts happen
> > + *   we can find the right vCPU to wake up.
> > + * - Change the Posted-interrupt descriptor as below:
> > + *      'NDST' <-- vcpu->pre_pcpu
> > + *      'NV' <-- POSTED_INTR_WAKEUP_VECTOR
> > + * - If 'ON' is set during this process, which means at least one
> > + *   interrupt is posted for this vCPU, we cannot block it, in
> > + *   this case, return 1, otherwise, return 0.
> > + *
> > + */
> > +static int vmx_pre_block(struct kvm_vcpu *vcpu)
> > +{
> > +	unsigned long flags;
> > +	unsigned int dest;
> > +	struct pi_desc old, new;
> > +	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > +
> > +	if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
> > +		!irq_remapping_cap(IRQ_POSTING_CAP))
> > +		return 0;
> > +
> > +	vcpu->pre_pcpu = vcpu->cpu;
> > +	spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > +			  vcpu->pre_pcpu), flags);
> > +	list_add_tail(&vcpu->blocked_vcpu_list,
> > +		      &per_cpu(blocked_vcpu_on_cpu,
> > +		      vcpu->pre_pcpu));
> > +	spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > +			       vcpu->pre_pcpu), flags);
> > +
> > +	do {
> > +		old.control = new.control = pi_desc->control;
> > +
> > +		/*
> > +		 * We should not block the vCPU if
> > +		 * an interrupt is posted for it.
> > +		 */
> > +		if (pi_test_on(pi_desc) == 1) {
> > +			spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > +					  vcpu->pre_pcpu), flags);
> > +			list_del(&vcpu->blocked_vcpu_list);
> > +			spin_unlock_irqrestore(
> > +					&per_cpu(blocked_vcpu_on_cpu_lock,
> > +					vcpu->pre_pcpu), flags);
> > +			vcpu->pre_pcpu = -1;
> > +
> > +			return 1;
> > +		}
> > +
> > +		WARN((pi_desc->sn == 1),
> > +		     "Warning: SN field of posted-interrupts "
> > +		     "is set before blocking\n");
> > +
> > +		/*
> > +		 * Since vCPU can be preempted during this process,
> > +		 * vcpu->cpu could be different from pre_pcpu, so we
> > +		 * need to set pre_pcpu as the destination of the wakeup
> > +		 * notification event; then we can find the right vCPU
> > +		 * to wake up in the wakeup handler if interrupts happen
> > +		 * when the vCPU is in blocked state.
> > +		 */
> > +		dest = cpu_physical_id(vcpu->pre_pcpu);
> > +
> > +		if (x2apic_enabled())
> > +			new.ndst = dest;
> > +		else
> > +			new.ndst = (dest << 8) & 0xFF00;
> > +
> > +		/* set 'NV' to 'wakeup vector' */
> > +		new.nv = POSTED_INTR_WAKEUP_VECTOR;
> > +	} while (cmpxchg(&pi_desc->control, old.control,
> > +			new.control) != old.control);
> > +
> > +	return 0;
> > +}
> > +
> > +static void vmx_post_block(struct kvm_vcpu *vcpu)
> > +{
> > +	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > +	struct pi_desc old, new;
> > +	unsigned int dest;
> > +	unsigned long flags;
> > +
> > +	if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
> > +		!irq_remapping_cap(IRQ_POSTING_CAP))
> > +		return;
> > +
> > +	do {
> > +		old.control = new.control = pi_desc->control;
> > +
> > +		dest = cpu_physical_id(vcpu->cpu);
> > +
> > +		if (x2apic_enabled())
> > +			new.ndst = dest;
> > +		else
> > +			new.ndst = (dest << 8) & 0xFF00;
> > +
> > +		/* Allow posting non-urgent interrupts */
> > +		new.sn = 0;
> > +
> > +		/* set 'NV' to 'notification vector' */
> > +		new.nv = POSTED_INTR_VECTOR;
> > +	} while (cmpxchg(&pi_desc->control, old.control,
> > +			new.control) != old.control);
> > +
> > +	if (vcpu->pre_pcpu != -1) {
> > +		spin_lock_irqsave(
> > +			&per_cpu(blocked_vcpu_on_cpu_lock,
> > +			vcpu->pre_pcpu), flags);
> > +		list_del(&vcpu->blocked_vcpu_list);
> > +		spin_unlock_irqrestore(
> > +			&per_cpu(blocked_vcpu_on_cpu_lock,
> > +			vcpu->pre_pcpu), flags);
> > +		vcpu->pre_pcpu = -1;
> > +	}
> > +}
> > +
> > +/*
> >   * vmx_update_pi_irte - set IRTE for Posted-Interrupts
> >   *
> >   * @kvm: kvm
> > @@ -10620,6 +10770,9 @@ static struct kvm_x86_ops vmx_x86_ops = {
> >  	.flush_log_dirty = vmx_flush_log_dirty,
> >  	.enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,
> >
> > +	.pre_block = vmx_pre_block,
> > +	.post_block = vmx_post_block,
> > +
> >  	.pmu_ops = &intel_pmu_ops,
> >
> >  	.update_pi_irte = vmx_update_pi_irte,
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 58688aa..46f55b2 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5869,7 +5869,12 @@ int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
> >  {
> >  	++vcpu->stat.halt_exits;
> >  	if (irqchip_in_kernel(vcpu->kvm)) {
> > -		vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
> > +		/* Handle posted-interrupt when vCPU is to be halted */
> > +		if (!kvm_x86_ops->pre_block ||
> > +				kvm_x86_ops->pre_block(vcpu) == 0) {
> > +			vcpu->arch.halted = true;
> > +			vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
> > +		}
> >  		return 1;
> >  	} else {
> >  		vcpu->run->exit_reason = KVM_EXIT_HLT;
> > @@ -6518,6 +6523,20 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >  			kvm_vcpu_reload_apic_access_page(vcpu);
> >  	}
> >
> > +	/*
> > +	 * KVM_REQ_EVENT is not set when posted interrupts are set by
> > +	 * VT-d hardware, so we have to update RVI unconditionally.
> > +	 */
> > +	if (kvm_lapic_enabled(vcpu)) {
> > +		/*
> > +		 * Update architecture specific hints for APIC
> > +		 * virtual interrupt delivery.
> > +		 */
> > +		if (kvm_x86_ops->hwapic_irr_update)
> > +			kvm_x86_ops->hwapic_irr_update(vcpu,
> > +				kvm_lapic_find_highest_irr(vcpu));
> > +	}
> > +
> >  	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> >  		kvm_apic_accept_events(vcpu);
> >  		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> > @@ -6534,13 +6553,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >  			kvm_x86_ops->enable_irq_window(vcpu);
> >
> >  		if (kvm_lapic_enabled(vcpu)) {
> > -			/*
> > -			 * Update architecture specific hints for APIC
> > -			 * virtual interrupt delivery.
> > -			 */
> > -			if (kvm_x86_ops->hwapic_irr_update)
> > -				kvm_x86_ops->hwapic_irr_update(vcpu,
> > -					kvm_lapic_find_highest_irr(vcpu));
> >  			update_cr8_intercept(vcpu);
> >  			kvm_lapic_sync_to_vapic(vcpu);
> >  		}
> > @@ -6711,10 +6723,31 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
> >
> >  	for (;;) {
> >  		if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
> > -		    !vcpu->arch.apf.halted)
> > +		    !vcpu->arch.apf.halted) {
> > +			/*
> > +			 * For some cases, we can get here with
> > +			 * vcpu->arch.halted being true.
> > +			 */
> 
> 
> Feng,
> 
> can you explain the reasons for this?
> 
> Perhaps pre_block and post_block should be handled entirely in
> vcpu_block, like this:
> 
>  static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
>  {
> -	if (!kvm_arch_vcpu_runnable(vcpu)) {
> +	if (!kvm_arch_vcpu_runnable(vcpu) &&
> +	    (!kvm_x86_ops->pre_block || kvm_x86_ops->pre_block(vcpu) == 0)) {
>  		srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
>  		kvm_vcpu_block(vcpu);
>  		vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
> +
> +		if (kvm_x86_ops->post_block)
> +			kvm_x86_ops->post_block(vcpu);
> +
>  		if (!kvm_check_request(KVM_REQ_UNHALT, vcpu))
>  			return 1;
>  	}
> 
> and removing all of your changes in kvm_vcpu_halt and vcpu_run?
> (Full patch below my signature, only compile-tested).
> 
> It should not matter if mp_state goes briefly to KVM_MP_STATE_HALTED and
> from there back to KVM_MP_STATE_RUNNABLE.
> 
> Paolo
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 4a0b1a800317..5265a5522458 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -568,8 +568,6 @@ struct kvm_vcpu_arch {
>  	 */
>  	bool write_fault_to_shadow_pgtable;
> 
> -	bool halted;
> -
>  	/* set at EPT violation at this point */
>  	unsigned long exit_qualification;
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a2a070541318..44092cfa1c0a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5674,12 +5674,7 @@ int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
>  {
>  	++vcpu->stat.halt_exits;
>  	if (irqchip_in_kernel(vcpu->kvm)) {
> -		/* Handle posted-interrupt when vCPU is to be halted */
> -		if (!kvm_x86_ops->pre_block ||
> -				kvm_x86_ops->pre_block(vcpu) == 0) {
> -			vcpu->arch.halted = true;
> -			vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
> -		}
> +		vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
>  		return 1;
>  	} else {
>  		vcpu->run->exit_reason = KVM_EXIT_HLT;
> @@ -6446,10 +6441,15 @@ out:
> 
>  static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
>  {
> -	if (!kvm_arch_vcpu_runnable(vcpu)) {
> +	if (!kvm_arch_vcpu_runnable(vcpu) &&
> +	    (!kvm_x86_ops->pre_block || kvm_x86_ops->pre_block(vcpu) == 0)) {
>  		srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
>  		kvm_vcpu_block(vcpu);
>  		vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
> +
> +		if (kvm_x86_ops->post_block)
> +			kvm_x86_ops->post_block(vcpu);
> +
>  		if (!kvm_check_request(KVM_REQ_UNHALT, vcpu))
>  			return 1;
>  	}
> @@ -6482,28 +6482,9 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>  	for (;;) {
>  		if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
>  		    !vcpu->arch.apf.halted) {
> -			/*
> -			 * For some cases, we can get here with
> -			 * vcpu->arch.halted being true.
> -			 */
> -			if (kvm_x86_ops->post_block && vcpu->arch.halted) {
> -				kvm_x86_ops->post_block(vcpu);
> -				vcpu->arch.halted = false;
> -			}
> -
>  			r = vcpu_enter_guest(vcpu);
>  		} else {
>  			r = vcpu_block(kvm, vcpu);
> -
> -			/*
> -			 * post_block() must be called after
> -			 * pre_block() which is called in
> -			 * kvm_vcpu_halt().
> -			 */
> -			if (kvm_x86_ops->post_block && vcpu->arch.halted) {
> -				kvm_x86_ops->post_block(vcpu);
> -				vcpu->arch.halted = false;
> -			}
>  		}
> 
>  		if (r <= 0)

I tested the patch you suggested above, and it works fine. Thank you! So
do I need to resend a new version, or can you handle it in your tree?

Thanks,
Feng
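
For readers following the thread, here is a minimal userspace sketch of
the hook pairing that the tested patch sets up in vcpu_block(). It is not
the kernel code: struct vcpu, struct arch_ops, demo_block() and the other
demo_* names are hypothetical stand-ins for struct kvm_vcpu, kvm_x86_ops
and kvm_vcpu_block(), and the interrupt_posted flag only models the 'ON'
bit. What it tries to show is that post_block runs only when pre_block
allowed the block, so the two calls always stay paired.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm_vcpu. */
struct vcpu {
	bool runnable;
	bool interrupt_posted;	/* models the 'ON' bit of the PI descriptor */
	int pre_calls;
	int block_calls;
	int post_calls;
};

/* Hypothetical stand-in for the optional hooks in kvm_x86_ops. */
struct arch_ops {
	int (*pre_block)(struct vcpu *v);	/* 0: may block, 1: must not */
	void (*post_block)(struct vcpu *v);
};

int demo_pre_block(struct vcpu *v)
{
	v->pre_calls++;
	return v->interrupt_posted ? 1 : 0;
}

void demo_post_block(struct vcpu *v)
{
	v->post_calls++;
}

/* Stand-in for kvm_vcpu_block(): pretend we slept and were woken up. */
void demo_block(struct vcpu *v)
{
	v->block_calls++;
	v->runnable = true;
}

const struct arch_ops demo_ops = {
	.pre_block = demo_pre_block,
	.post_block = demo_post_block,
};

/* Returns 1 if the vCPU actually blocked, 0 otherwise. */
int vcpu_block_sketch(const struct arch_ops *ops, struct vcpu *v)
{
	assert(ops != NULL && v != NULL);

	/* Both hooks are optional; a missing pre_block means "ok to block". */
	if (!v->runnable && (!ops->pre_block || ops->pre_block(v) == 0)) {
		demo_block(v);

		/* Only reached when pre_block succeeded, so calls are paired. */
		if (ops->post_block)
			ops->post_block(v);
		return 1;
	}
	return 0;
}
```

Under these assumptions, a vCPU with a pending posted interrupt never
reaches demo_block() or demo_post_block(), which is why the separate
'halted' flag from the v9 patch is no longer needed in this arrangement.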


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-09-21  2:16     ` Wu, Feng
@ 2015-09-21  5:32       ` Paolo Bonzini
  2015-09-21  5:45         ` Wu, Feng
  0 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2015-09-21  5:32 UTC (permalink / raw)
  To: Wu, Feng, Alex Williamson, joro, Marcelo Tosatti
  Cc: iommu, linux-kernel, KVM list, Eric Auger



On 21/09/2015 04:16, Wu, Feng wrote:
> I tested the patch you suggested above, and it works fine. Thank you! So
> do I need to resend a new version, or can you handle it in your tree?

I will handle it.

Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-09-21  5:32       ` Paolo Bonzini
@ 2015-09-21  5:45         ` Wu, Feng
  0 siblings, 0 replies; 56+ messages in thread
From: Wu, Feng @ 2015-09-21  5:45 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Williamson, joro, Marcelo Tosatti
  Cc: iommu, linux-kernel, KVM list, Eric Auger, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo
> Bonzini
> Sent: Monday, September 21, 2015 1:33 PM
> To: Wu, Feng; Alex Williamson; joro@8bytes.org; Marcelo Tosatti
> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org; KVM list;
> Eric Auger
> Subject: Re: [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when
> vCPU is blocked
> 
> 
> 
> On 21/09/2015 04:16, Wu, Feng wrote:
> > I tested the patch you suggested above, and it works fine. Thank you! So
> > do I need to resend a new version, or can you handle it in your tree?
> 
> I will handle it.

Thanks a lot for your review on this series!

Thanks,
Feng

> 
> Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer
  2015-09-18 14:29 ` [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer Feng Wu
  2015-09-18 17:19   ` Alex Williamson
@ 2015-09-21  8:56   ` Wu, Feng
  2015-09-21  9:32     ` Paolo Bonzini
  2016-04-26 20:08   ` Alex Williamson
  2 siblings, 1 reply; 56+ messages in thread
From: Wu, Feng @ 2015-09-21  8:56 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Wu, Feng

Hi Paolo & Alex,

I found a build error in the following two cases:
- KVM is configured as 'M' and VFIO as 'Y'
The reason is that the build of the irqbypass manager is triggered in
arch/x86/kvm/Makefile, and VFIO is built before KVM, hence
it cannot find the symbols in the irqbypass manager.

- Disable KVM and enable VFIO in .config
The reason is similar to the one above: the irqbypass manager
is not built since KVM is not configured.

I think the point is that we cannot trigger the build of the irqbypass
manager inside KVM or VFIO; we need to trigger the build at a higher
level, and it should be built before VFIO and KVM. Any ideas?

Thanks,
Feng

> -----Original Message-----
> From: Wu, Feng
> Sent: Friday, September 18, 2015 10:30 PM
> To: pbonzini@redhat.com; alex.williamson@redhat.com; joro@8bytes.org;
> mtosatti@redhat.com
> Cc: eric.auger@linaro.org; kvm@vger.kernel.org;
> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org; Wu, Feng
> Subject: [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer
> 
> This patch adds the registration/unregistration of an
> irq_bypass_producer for MSI/MSI-X on vfio pci devices.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
> v8:
> - Merge "[PATCH v7 08/17] vfio: Select IRQ_BYPASS_MANAGER for vfio PCI
> devices"
>   into this patch.
> 
> v6:
> - Make the add_consumer and del_consumer callbacks static
> - Remove pointless INIT_LIST_HEAD to 'vdev->ctx[vector].producer.node'
> - Use dev_info instead of WARN_ON() when irq_bypass_register_producer fails
> - Remove optional dummy callbacks for irq producer
> 
>  drivers/vfio/pci/Kconfig            | 1 +
>  drivers/vfio/pci/vfio_pci_intrs.c   | 9 +++++++++
>  drivers/vfio/pci/vfio_pci_private.h | 2 ++
>  3 files changed, 12 insertions(+)
> 
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 579d83b..02912f1 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -2,6 +2,7 @@ config VFIO_PCI
>  	tristate "VFIO support for PCI devices"
>  	depends on VFIO && PCI && EVENTFD
>  	select VFIO_VIRQFD
> +	select IRQ_BYPASS_MANAGER
>  	help
>  	  Support for the PCI VFIO bus driver.  This is required to make
>  	  use of PCI drivers using the VFIO framework.
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 1f577b4..c65299d 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -319,6 +319,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> 
>  	if (vdev->ctx[vector].trigger) {
>  		free_irq(irq, vdev->ctx[vector].trigger);
> +		irq_bypass_unregister_producer(&vdev->ctx[vector].producer);
>  		kfree(vdev->ctx[vector].name);
>  		eventfd_ctx_put(vdev->ctx[vector].trigger);
>  		vdev->ctx[vector].trigger = NULL;
> @@ -360,6 +361,14 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>  		return ret;
>  	}
> 
> +	vdev->ctx[vector].producer.token = trigger;
> +	vdev->ctx[vector].producer.irq = irq;
> +	ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
> +	if (unlikely(ret))
> +		dev_info(&pdev->dev,
> +		"irq bypass producer (token %p) registration fails: %d\n",
> +		vdev->ctx[vector].producer.token, ret);
> +
>  	vdev->ctx[vector].trigger = trigger;
> 
>  	return 0;
> diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
> index ae0e1b4..0e7394f 100644
> --- a/drivers/vfio/pci/vfio_pci_private.h
> +++ b/drivers/vfio/pci/vfio_pci_private.h
> @@ -13,6 +13,7 @@
> 
>  #include <linux/mutex.h>
>  #include <linux/pci.h>
> +#include <linux/irqbypass.h>
> 
>  #ifndef VFIO_PCI_PRIVATE_H
>  #define VFIO_PCI_PRIVATE_H
> @@ -29,6 +30,7 @@ struct vfio_pci_irq_ctx {
>  	struct virqfd		*mask;
>  	char			*name;
>  	bool			masked;
> +	struct irq_bypass_producer	producer;
>  };
> 
>  struct vfio_pci_device {
> --
> 2.1.0


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer
  2015-09-21  8:56   ` Wu, Feng
@ 2015-09-21  9:32     ` Paolo Bonzini
  2015-09-21 11:35       ` Wu, Feng
  2015-09-21 12:53       ` Wu, Feng
  0 siblings, 2 replies; 56+ messages in thread
From: Paolo Bonzini @ 2015-09-21  9:32 UTC (permalink / raw)
  To: Wu, Feng, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel



On 21/09/2015 10:56, Wu, Feng wrote:
> Hi Paolo & Alex,
> 
> I found a build error in the following two cases:
> - KVM is configured as 'M' and VFIO as 'Y'
> The reason is that the build of the irqbypass manager is triggered in
> arch/x86/kvm/Makefile, and VFIO is built before KVM, hence
> it cannot find the symbols in the irqbypass manager.
> 
> - Disable KVM and enable VFIO in .config
> The reason is similar to the one above: the irqbypass manager
> is not built since KVM is not configured.
> 
> I think the point is that we cannot trigger the build of the irqbypass
> manager inside KVM or VFIO; we need to trigger the build at a higher
> level, and it should be built before VFIO and KVM. Any ideas?

We can add virt/Makefile and build virt/lib/ directly, not through
arch/x86/kvm.

Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer
  2015-09-21  9:32     ` Paolo Bonzini
@ 2015-09-21 11:35       ` Wu, Feng
  2015-09-21 12:06         ` Paolo Bonzini
  2015-09-21 12:53       ` Wu, Feng
  1 sibling, 1 reply; 56+ messages in thread
From: Wu, Feng @ 2015-09-21 11:35 UTC (permalink / raw)
  To: Paolo Bonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Monday, September 21, 2015 5:32 PM
> To: Wu, Feng; alex.williamson@redhat.com; joro@8bytes.org;
> mtosatti@redhat.com
> Cc: eric.auger@linaro.org; kvm@vger.kernel.org;
> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer
> 
> 
> 
> On 21/09/2015 10:56, Wu, Feng wrote:
> > Hi Paolo & Alex,
> >
> > I found a build error in the following two cases:
> > - KVM is configured as 'M' and VFIO as 'Y'
> > The reason is that the build of the irqbypass manager is triggered in
> > arch/x86/kvm/Makefile, and VFIO is built before KVM, hence
> > it cannot find the symbols in the irqbypass manager.
> >
> > - Disable KVM and enable VFIO in .config
> > The reason is similar to the one above: the irqbypass manager
> > is not built since KVM is not configured.
> >
> > I think the point is that we cannot trigger the build of the irqbypass
> > manager inside KVM or VFIO; we need to trigger the build at a higher
> > level, and it should be built before VFIO and KVM. Any ideas?
> 
> We can add virt/Makefile and build virt/lib/ directly, not through
> arch/x86/kvm.

Yes, that can solve the build error. Should I send a new version?

Thanks,
Feng

> 
> Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer
  2015-09-21 11:35       ` Wu, Feng
@ 2015-09-21 12:06         ` Paolo Bonzini
  2015-09-21 12:08           ` Wu, Feng
  0 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2015-09-21 12:06 UTC (permalink / raw)
  To: Wu, Feng, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel



On 21/09/2015 13:35, Wu, Feng wrote:
>>> > > I think the point is that we cannot trigger the build of the irqbypass
>>> > > manager inside KVM or VFIO; we need to trigger the build at a higher
>>> > > level, and it should be built before VFIO and KVM. Any ideas?
>> > 
>> > We can add virt/Makefile and build virt/lib/ directly, not through
>> > arch/x86/kvm.
> Yes, that can solve the build error. Should I send a new version?

You can send a separate patch on top of this v9.

Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer
  2015-09-21 12:06         ` Paolo Bonzini
@ 2015-09-21 12:08           ` Wu, Feng
  0 siblings, 0 replies; 56+ messages in thread
From: Wu, Feng @ 2015-09-21 12:08 UTC (permalink / raw)
  To: Paolo Bonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Wu, Feng


> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Monday, September 21, 2015 8:07 PM
> To: Wu, Feng; alex.williamson@redhat.com; joro@8bytes.org;
> mtosatti@redhat.com
> Cc: eric.auger@linaro.org; kvm@vger.kernel.org;
> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer
> 
> 
> 
> On 21/09/2015 13:35, Wu, Feng wrote:
> >>> > > I think the point is that we cannot trigger the build of the irqbypass
> >>> > > manager inside KVM or VFIO; we need to trigger the build at a higher
> >>> > > level, and it should be built before VFIO and KVM. Any ideas?
> >> >
> >> > We can add virt/Makefile and build virt/lib/ directly, not through
> >> > arch/x86/kvm.
> > Yes, that can solve the build error. Should I send a new version?
> 
> You can send a separate patch on top of this v9.

Sure, will do this soon!

Thanks,
Feng

> 
> Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer
  2015-09-21  9:32     ` Paolo Bonzini
  2015-09-21 11:35       ` Wu, Feng
@ 2015-09-21 12:53       ` Wu, Feng
  2015-09-21 13:02         ` Paolo Bonzini
  1 sibling, 1 reply; 56+ messages in thread
From: Wu, Feng @ 2015-09-21 12:53 UTC (permalink / raw)
  To: Paolo Bonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Monday, September 21, 2015 5:32 PM
> To: Wu, Feng; alex.williamson@redhat.com; joro@8bytes.org;
> mtosatti@redhat.com
> Cc: eric.auger@linaro.org; kvm@vger.kernel.org;
> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer
> 
> 
> 
> On 21/09/2015 10:56, Wu, Feng wrote:
> > Hi Paolo & Alex,
> >
> > I found a build error in the following two cases:
> > - KVM is configured as 'M' and VFIO as 'Y'
> > The reason is that the build of the irqbypass manager is triggered in
> > arch/x86/kvm/Makefile, and VFIO is built before KVM, hence
> > it cannot find the symbols in the irqbypass manager.
> >
> > - Disable KVM and enable VFIO in .config
> > The reason is similar to the one above: the irqbypass manager
> > is not built since KVM is not configured.
> >
> > I think the point is that we cannot trigger the build of the irqbypass
> > manager inside KVM or VFIO; we need to trigger the build at a higher
> > level, and it should be built before VFIO and KVM. Any ideas?
> 
> We can add virt/Makefile and build virt/lib/ directly, not through
> arch/x86/kvm.

Thinking about this more, does that mean we need to add the virt directory
in the top-level Makefile of the Linux tree?

Thanks,
Feng

> 
> Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer
  2015-09-21 12:53       ` Wu, Feng
@ 2015-09-21 13:02         ` Paolo Bonzini
  2015-09-21 19:46           ` Eric Auger
  0 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2015-09-21 13:02 UTC (permalink / raw)
  To: Wu, Feng, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel



On 21/09/2015 14:53, Wu, Feng wrote:
>>> > > I think the point is that we cannot trigger the build of irqbypass
>>> > > manager inside KVM or VFIO, we need trigger the build at a high
>>> > > level and it should be built before VFIO and KVM. Any ideas?
>> > 
>> > We can add virt/Makefile and build virt/lib/ directly, not through
>> > arch/x86/kvm.
> Thinking about this more, does that mean we need to add the virt directory
> in the top Makefile in Linux tree?

Yes, it does.

Paolo
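
For reference, the layout discussed above could look roughly like the
fragments below. This is an untested sketch, not the patch that was
eventually applied: the "virt-y" variable name and how the top-level
Makefile picks it up are assumptions, and the virt/lib/Makefile rule is
inferred from the IRQ_BYPASS_MANAGER symbol selected elsewhere in this
series.

```makefile
# virt/Makefile (new file): always descend into virt/lib/, so the
# irqbypass manager no longer depends on arch/*/kvm/Makefile being visited.
obj-y += lib/

# virt/lib/Makefile: build irqbypass.o as built-in or as a module,
# following CONFIG_IRQ_BYPASS_MANAGER (selected by KVM in this series).
obj-$(CONFIG_IRQ_BYPASS_MANAGER) += irqbypass.o

# Top-level Makefile: make kbuild visit virt/ like the other top-level
# directories. "virt-y" is a hypothetical name used only in this sketch;
# it would still have to be added to the list of directories kbuild walks.
virt-y := virt/
```

With something like this in place, the "../../../virt/lib/" entries in the
arch KVM Makefiles would no longer be needed, which is what removes the
KVM=m / VFIO=y and KVM=n / VFIO=y build failures described earlier.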

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v9 18/18] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts
  2015-09-18 14:29 ` [PATCH v9 18/18] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts Feng Wu
@ 2015-09-21 13:46   ` Joerg Roedel
  0 siblings, 0 replies; 56+ messages in thread
From: Joerg Roedel @ 2015-09-21 13:46 UTC (permalink / raw)
  To: Feng Wu
  Cc: pbonzini, alex.williamson, mtosatti, eric.auger, kvm, iommu,
	linux-kernel

On Fri, Sep 18, 2015 at 10:29:56PM +0800, Feng Wu wrote:
> Enable VT-d Posted-Interrupts and add a command line
> parameter for it.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  Documentation/kernel-parameters.txt |  1 +
>  drivers/iommu/irq_remapping.c       | 12 ++++++++----
>  2 files changed, 9 insertions(+), 4 deletions(-)

Acked-by: Joerg Roedel <jroedel@suse.de>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v9 03/18] KVM: arm/arm64: select IRQ_BYPASS_MANAGER
  2015-09-18 14:29 ` [PATCH v9 03/18] KVM: arm/arm64: " Feng Wu
@ 2015-09-21 19:32   ` Eric Auger
  0 siblings, 0 replies; 56+ messages in thread
From: Eric Auger @ 2015-09-21 19:32 UTC (permalink / raw)
  To: Feng Wu, pbonzini, alex.williamson, joro, mtosatti
  Cc: kvm, iommu, linux-kernel

Hi Feng,

There is a compilation issue for arm64 that I need to fix here. Shall I
resend the prerequisite series, or would you prefer to remove that patch
from this series? It would then be included later, when the arm irq
forwarding series gets ready.

Best Regards

Eric

On 09/18/2015 04:29 PM, Feng Wu wrote:
> From: Eric Auger <eric.auger@linaro.org>
> 
> Select IRQ_BYPASS_MANAGER when CONFIG_KVM is set
> Also add compilation of virt/lib.
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
> v3 -> v4:
> - add compilation of virt/lib in arm/arm64 KVM
> 
> v2 -> v3:
> - [Feng Wu] Correct a typo in 'arch/arm64/kvm/Kconfig'
> 
> v1 -> v2:
> - also set IRQ_BYPASS_MANAGER for arm64
> 
>  arch/arm/kvm/Kconfig    | 2 ++
>  arch/arm/kvm/Makefile   | 1 +
>  arch/arm64/kvm/Kconfig  | 2 ++
>  arch/arm64/kvm/Makefile | 1 +
>  4 files changed, 6 insertions(+)
> 
> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
> index bfb915d..3c565b9 100644
> --- a/arch/arm/kvm/Kconfig
> +++ b/arch/arm/kvm/Kconfig
> @@ -3,6 +3,7 @@
>  #
>  
>  source "virt/kvm/Kconfig"
> +source "virt/lib/Kconfig"
>  
>  menuconfig VIRTUALIZATION
>  	bool "Virtualization"
> @@ -31,6 +32,7 @@ config KVM
>  	select KVM_VFIO
>  	select HAVE_KVM_EVENTFD
>  	select HAVE_KVM_IRQFD
> +	select IRQ_BYPASS_MANAGER
>  	depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER
>  	---help---
>  	  Support hosting virtualized guest machines.
> diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
> index c5eef02c..a6a41dd 100644
> --- a/arch/arm/kvm/Makefile
> +++ b/arch/arm/kvm/Makefile
> @@ -24,3 +24,4 @@ obj-y += $(KVM)/arm/vgic.o
>  obj-y += $(KVM)/arm/vgic-v2.o
>  obj-y += $(KVM)/arm/vgic-v2-emul.o
>  obj-y += $(KVM)/arm/arch_timer.o
> +obj-y += ../../../virt/lib/
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index bfffe8f..2509539 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -3,6 +3,7 @@
>  #
>  
>  source "virt/kvm/Kconfig"
> +source "virt/lib/Kconfig"
>  
>  menuconfig VIRTUALIZATION
>  	bool "Virtualization"
> @@ -31,6 +32,7 @@ config KVM
>  	select KVM_VFIO
>  	select HAVE_KVM_EVENTFD
>  	select HAVE_KVM_IRQFD
> +	select IRQ_BYPASS_MANAGER
>  	---help---
>  	  Support hosting virtualized guest machines.
>  
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index f90f4aa..55eec69 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -27,3 +27,4 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v3.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v3-emul.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += vgic-v3-switch.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
> +kvm-$(CONFIG_KVM_ARM_HOST) += ../../../virt/lib/
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer
  2015-09-21 13:02         ` Paolo Bonzini
@ 2015-09-21 19:46           ` Eric Auger
  0 siblings, 0 replies; 56+ messages in thread
From: Eric Auger @ 2015-09-21 19:46 UTC (permalink / raw)
  To: Paolo Bonzini, Wu, Feng, alex.williamson, joro, mtosatti
  Cc: kvm, iommu, linux-kernel

Hi,
On 09/21/2015 03:02 PM, Paolo Bonzini wrote:
> 
> 
> On 21/09/2015 14:53, Wu, Feng wrote:
>>>>>> I think the point is that we cannot trigger the build of irqbypass
>>>>>> manager inside KVM or VFIO, we need trigger the build at a high
>>>>>> level and it should be built before VFIO and KVM. Any ideas?
>>>>
>>>> We can add virt/Makefile and build virt/lib/ directly, not through
>>>> arch/x86/kvm.
>> Thinking about this more, does that mean we need to add the virt directory
>> in the top Makefile in Linux tree?
> 
> Yes, it does.
So I understand this will replace patches 2 & 3 and will also fix the
arm64 issue.

Thanks

Eric

> 
> Paolo
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series
  2015-09-18 14:29 [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Feng Wu
                   ` (18 preceding siblings ...)
  2015-09-18 14:58 ` [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Paolo Bonzini
@ 2015-09-25  1:49 ` Wu, Feng
  2015-09-25 11:14   ` Paolo Bonzini
  19 siblings, 1 reply; 56+ messages in thread
From: Wu, Feng @ 2015-09-25  1:49 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Wu, Feng

Hi Paolo,

Thanks for your review on this series! I'd like to confirm this series (plus
the patch fixing the compilation error) is okay to you and I don't need to
do extra things for it, right?

Thanks,
Feng

> -----Original Message-----
> From: Wu, Feng
> Sent: Friday, September 18, 2015 10:30 PM
> To: pbonzini@redhat.com; alex.williamson@redhat.com; joro@8bytes.org;
> mtosatti@redhat.com
> Cc: eric.auger@linaro.org; kvm@vger.kernel.org;
> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org; Wu, Feng
> Subject: [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including
> prerequisite series
> 
> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> With VT-d Posted-Interrupts enabled, external interrupts from
> direct-assigned devices can be delivered to guests without VMM
> intervention when guest is running in non-root mode.
> 
> You can find the VT-d Posted-Interrupts Spec. in the following URL:
> http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
> 
> v9:
> - Include the whole series:
> [01/18]: irq bypasser manager
> [02/18] - [06/18]: Common non-architecture part for VT-d PI and ARM side
> forwarded irq
> [07/18] - [18/18]: VT-d PI part
> 
> v8:
> refer to the changelog in each patch
> 
> v7:
> * Define two weak irq bypass callbacks:
>   - kvm_arch_irq_bypass_start()
>   - kvm_arch_irq_bypass_stop()
> * Remove the x86 dummy implementation of the above two functions.
> * Print some useful information instead of WARN_ON() when the
>   irq bypass consumer unregistration fails.
> * Fix an issue when calling pi_pre_block and pi_post_block.
> 
> v6:
> * Rebase on 4.2.0-rc6
> * Rebase on https://lkml.org/lkml/2015/8/6/526 and
> http://www.gossamer-threads.com/lists/linux/kernel/2235623
> * Make the add_consumer and del_consumer callbacks static
> * Remove pointless INIT_LIST_HEAD to 'vdev->ctx[vector].producer.node)'
> * Use dev_info instead of WARN_ON() when irq_bypass_register_producer fails
> * Remove optional dummy callbacks for irq producer
> 
> v4:
> * For lowest-priority interrupt, only support single-CPU destination
> interrupts at the current stage, more common lowest priority support
> will be added later.
> * According to Marcelo's suggestion, when vCPU is blocked, we handle
> the posted-interrupts in the HLT emulation path.
> * Some small changes (coding style, typo, add some code comments)
> 
> v3:
> * Adjust the Posted-interrupts Descriptor updating logic when vCPU is
>   preempted or blocked.
> * KVM_DEV_VFIO_DEVICE_POSTING_IRQ -->
> KVM_DEV_VFIO_DEVICE_POST_IRQ
> * __KVM_HAVE_ARCH_KVM_VFIO_POSTING -->
> __KVM_HAVE_ARCH_KVM_VFIO_POST
> * Add KVM_DEV_VFIO_DEVICE_UNPOST_IRQ attribute for VFIO irq, which
>   can be used to change back to remapping mode.
> * Fix typo
> 
> v2:
> * Use VFIO framework to enable this feature, the VFIO part of this series is
>   based on Eric's patch "[PATCH v3 0/8] KVM-VFIO IRQ forward control"
> * Rebase this patchset on
>   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git,
>   then revise some irq logic based on the new hierarchy irqdomain patches
>   provided by Jiang Liu <jiang.liu@linux.intel.com>
> 
> 
> *** BLURB HERE ***
> 
> Alex Williamson (1):
>   virt: IRQ bypass manager
> 
> Eric Auger (4):
>   KVM: arm/arm64: select IRQ_BYPASS_MANAGER
>   KVM: create kvm_irqfd.h
>   KVM: introduce kvm_arch functions for IRQ bypass
>   KVM: eventfd: add irq bypass consumer management
> 
> Feng Wu (13):
>   KVM: x86: select IRQ_BYPASS_MANAGER
>   KVM: Extend struct pi_desc for VT-d Posted-Interrupts
>   KVM: Add some helper functions for Posted-Interrupts
>   KVM: Define a new interface kvm_intr_is_single_vcpu()
>   KVM: Make struct kvm_irq_routing_table accessible
>   KVM: make kvm_set_msi_irq() public
>   vfio: Register/unregister irq_bypass_producer
>   KVM: x86: Update IRTE for posted-interrupts
>   KVM: Implement IRQ bypass consumer callbacks for x86
>   KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'
>   KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
>   KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
>   iommu/vt-d: Add a command line parameter for VT-d posted-interrupts
> 
>  Documentation/kernel-parameters.txt   |   1 +
>  Documentation/virtual/kvm/locking.txt |  12 ++
>  MAINTAINERS                           |   7 +
>  arch/arm/kvm/Kconfig                  |   2 +
>  arch/arm/kvm/Makefile                 |   1 +
>  arch/arm64/kvm/Kconfig                |   2 +
>  arch/arm64/kvm/Makefile               |   1 +
>  arch/x86/include/asm/kvm_host.h       |  24 +++
>  arch/x86/kvm/Kconfig                  |   3 +
>  arch/x86/kvm/Makefile                 |   3 +
>  arch/x86/kvm/irq_comm.c               |  32 ++-
>  arch/x86/kvm/lapic.c                  |  59 ++++++
>  arch/x86/kvm/lapic.h                  |   2 +
>  arch/x86/kvm/trace.h                  |  33 ++++
>  arch/x86/kvm/vmx.c                    | 361 +++++++++++++++++++++++++++++++++-
>  arch/x86/kvm/x86.c                    | 108 +++++++++-
>  drivers/iommu/irq_remapping.c         |  12 +-
>  drivers/vfio/pci/Kconfig              |   1 +
>  drivers/vfio/pci/vfio_pci_intrs.c     |   9 +
>  drivers/vfio/pci/vfio_pci_private.h   |   2 +
>  include/linux/irqbypass.h             |  90 +++++++++
>  include/linux/kvm_host.h              |  29 +++
>  include/linux/kvm_irqfd.h             |  71 +++++++
>  virt/kvm/Kconfig                      |   3 +
>  virt/kvm/eventfd.c                    | 142 +++++++------
>  virt/kvm/irqchip.c                    |  10 -
>  virt/kvm/kvm_main.c                   |   3 +
>  virt/lib/Kconfig                      |   2 +
>  virt/lib/Makefile                     |   1 +
>  virt/lib/irqbypass.c                  | 257 ++++++++++++++++++++++++
>  30 files changed, 1182 insertions(+), 101 deletions(-)
>  create mode 100644 include/linux/irqbypass.h
>  create mode 100644 include/linux/kvm_irqfd.h
>  create mode 100644 virt/lib/Kconfig
>  create mode 100644 virt/lib/Makefile
>  create mode 100644 virt/lib/irqbypass.c
> 
> --
> 2.1.0


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series
  2015-09-25  1:49 ` Wu, Feng
@ 2015-09-25 11:14   ` Paolo Bonzini
  2015-09-28 10:14     ` Wu, Feng
  0 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2015-09-25 11:14 UTC (permalink / raw)
  To: Wu, Feng, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel



On 25/09/2015 03:49, Wu, Feng wrote:
> Hi Paolo,
> 
> Thanks for your review on this series! I'd like to confirm this series (plus
> the patch fixing the compilation error) is okay to you and I don't need to
> do extra things for it, right?

Yes, can you check if branch vtd-pi of
git://git.kernel.org/pub/scm/virt/kvm/kvm.git works for you?  If so I'll
merge it.

Paolo


^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series
  2015-09-25 11:14   ` Paolo Bonzini
@ 2015-09-28 10:14     ` Wu, Feng
  2015-09-28 10:18       ` Paolo Bonzini
  0 siblings, 1 reply; 56+ messages in thread
From: Wu, Feng @ 2015-09-28 10:14 UTC (permalink / raw)
  To: Paolo Bonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Friday, September 25, 2015 7:15 PM
> To: Wu, Feng; alex.williamson@redhat.com; joro@8bytes.org;
> mtosatti@redhat.com
> Cc: eric.auger@linaro.org; kvm@vger.kernel.org;
> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including
> prerequisite series
> 
> 
> 
> On 25/09/2015 03:49, Wu, Feng wrote:
> > Hi Paolo,
> >
> > Thanks for your review on this series! I'd like to confirm this series (plus
> > the patch fixing the compilation error) is okay to you and I don't need to
> > do extra things for it, right?
> 
> Yes, can you check if branch vtd-pi of
> git://git.kernel.org/pub/scm/virt/kvm/kvm.git works for you?  If so I'll
> merge it.

Thanks a lot for creating a branch for VT-d PI. However, I cannot launch guests
with this tree. I encountered the following kernel dump, and I found that the
problematic commit is 2260b1cde0b5472ab70ad0764b10095372e41913:

    KVM: x86: put vcpu_create under kvm->srcu critical section

    This is needed in case vcpu_create wants to access the memslots array.
    Fixes this lockdep splat:

After removing this commit from the tree, my VT-d patch-set works fine.


Kernel dump:
[  221.978182] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  221.986085] IP: [<ffffffffc0368ab0>] kvm_arch_vcpu_create+0x30/0x90 [kvm]
[  221.993102] PGD 0
[  221.995148] Oops: 0000 [#1] SMP
[  221.998440] Modules linked in: bnep rfcomm bluetooth ax88179_178a usbnet intel_rapl mii snd_hda_codec_hdmi iosf_mbi x86_pkg_temp_thermal nouveau intel_powerclamp snd_hda_intel snd_hda_codec coretemp kvm_intel snd_hda_core kvm snd_hwdep snd_pcm crct10dif_pclmul crc32_pclmul snd_seq_midi ghash_clmulni_intel mxm_wmi snd_seq_midi_event snd_rawmidi video snd_seq ttm aesni_intel aes_x86_64 lrw gf128mul drm_kms_helper snd_seq_device binfmt_misc snd_timer glue_helper ablk_helper drm cryptd fb_sys_fops snd syscopyarea sysfillrect sb_edac soundcore sysimgblt mei_me parport_pc edac_core ppdev mei shpchp lp lpc_ich mac_hid parport acpi_power_meter wmi ixgbe igb i2c_algo_bit hid_generic usbhid ptp ahci hid libahci pps_core mdio
[  222.063533] CPU: 4 PID: 3384 Comm: qemu-system-x86 Not tainted 4.3.0-rc1+ #6
[  222.070612] Hardware name: Intel Corp. GRANGEVILLE/GRANTLEY, BIOS GNVDCRB1.86B.0020.V07.1409241147 09/24/2014
[  222.080764] task: ffff88006e7c8000 ti: ffff8800714a8000 task.ti: ffff8800714a8000
[  222.088283] RIP: 0010:[<ffffffffc0368ab0>]  [<ffffffffc0368ab0>] kvm_arch_vcpu_create+0x30/0x90 [kvm]
[  222.097680] RSP: 0018:ffff8800714abde0  EFLAGS: 00010246
[  222.103153] RAX: 0000000000000000 RBX: ffff88016f28c000 RCX: 0000000000000000
[  222.110407] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88016f28c000
[  222.117659] RBP: ffff8800714abdf8 R08: 0000000000000001 R09: 0000000000000040
[  222.124824] R10: ffff880077e86438 R11: ffff880163e06880 R12: ffff88016f28c000
[  222.132150] R13: 0000000000000000 R14: 000000000000ae41 R15: 0000000000000000
[  222.139405] FS:  00007f43fd7ec700(0000) GS:ffff880178700000(0000) knlGS:0000000000000000
[  222.147629] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  222.153471] CR2: 0000000000000000 CR3: 000000017074b000 CR4: 00000000003426e0
[  222.160726] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  222.167979] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  222.175231] Stack:
[  222.177277]  0000000000000000 ffff88016f28c000 0000000000000000 ffff8800714abea0
[  222.184870]  ffffffffc0355b17 0000000000000008 ffff8800714abe28 ffffffff810aba32
[  222.192444]  ffff880178816e40 ffff8800714abe40 ffffffff810a4f44 ffff880178816e40
[  222.200017] Call Trace:
[  222.202522]  [<ffffffffc0355b17>] kvm_vm_ioctl+0x277/0x6e0 [kvm]
[  222.208633]  [<ffffffff810aba32>] ? put_prev_task_fair+0x22/0x40
[  222.214741]  [<ffffffff810a4f44>] ? pick_next_task_idle+0x14/0x30
[  222.220942]  [<ffffffff811fb53a>] do_vfs_ioctl+0x2ba/0x490
[  222.226523]  [<ffffffff8106274a>] ? __do_page_fault+0x1ba/0x410
[  222.232546]  [<ffffffff811fb789>] SyS_ioctl+0x79/0x90
[  222.237684]  [<ffffffff81003ba5>] ? syscall_return_slowpath+0x55/0x150
[  222.244323]  [<ffffffff81791d36>] entry_SYSCALL_64_fastpath+0x16/0x75
[  222.250869] Code: 55 48 89 e5 41 55 41 54 53 41 89 f5 48 89 fb e8 27 61 cb c0 85 c0 74 13 8b 83 f0 09 00 00 85 c0 74 09 80 3d 53 2e 04 00 00 74 40 <48> 8b 04 25 00 00 00 00 48 8d 78 48 e8 7f c4 d6 c0 41 89 c4 48
[  222.270790] RIP  [<ffffffffc0368ab0>] kvm_arch_vcpu_create+0x30/0x90 [kvm]
[  222.277813]  RSP <ffff8800714abde0>
[  222.281359] CR2: 0000000000000000
[  222.290421] ---[ end trace 957f5a39692fe6c7 ]---

Thanks,
Feng

> 
> Paolo


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series
  2015-09-28 10:14     ` Wu, Feng
@ 2015-09-28 10:18       ` Paolo Bonzini
  2015-09-28 10:22         ` Wu, Feng
  0 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2015-09-28 10:18 UTC (permalink / raw)
  To: Wu, Feng, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel



On 28/09/2015 12:14, Wu, Feng wrote:
> Thanks a lot for creating branch for vt-d pi. However, I cannot launch guests
> with this tree. I encountered the following kernel dump, and I find that the
> problematic commit is " 2260b1cde0b5472ab70ad0764b10095372e41913 "
> 
>     KVM: x86: put vcpu_create under kvm->srcu critical section
> 
>     This is needed in case vcpu_create wants to access the memslots array.
>     Fixes this lockdep splat:
> 
> After removing this commit from the tree, my VT-d patch-set works fine.

Great, thanks.  The above commit had already been reverted.

I'm sorting out the kbuild reports, and then will merge VT-d PI.

Paolo

> 
> Kernel dump:
> [  221.978182] BUG: unable to handle kernel NULL pointer dereference at           (null)
> [  221.986085] IP: [<ffffffffc0368ab0>] kvm_arch_vcpu_create+0x30/0x90 [kvm]
> [  221.993102] PGD 0
> [  221.995148] Oops: 0000 [#1] SMP
> [  221.998440] Modules linked in: bnep rfcomm bluetooth ax88179_178a usbnet intel_rapl mii snd_hda_codec_hdmi iosf_mbi x86_pkg_temp_thermal nouveau intel_powerclamp snd_hda_intel snd_hda_codec coretemp kvm_intel snd_hda_core kvm snd_hwdep snd_pcm crct10dif_pclmul crc32_pclmul snd_seq_midi ghash_clmulni_intel mxm_wmi snd_seq_midi_event snd_rawmidi video snd_seq ttm aesni_intel aes_x86_64 lrw gf128mul drm_kms_helper snd_seq_device binfmt_misc snd_timer glue_helper ablk_helper drm cryptd fb_sys_fops snd syscopyarea sysfillrect sb_edac soundcore sysimgblt mei_me parport_pc edac_core ppdev mei shpchp lp lpc_ich mac_hid parport acpi_power_meter wmi ixgbe igb i2c_algo_bit hid_generic usbhid ptp ahci hid libahci pps_core mdio
> [  222.063533] CPU: 4 PID: 3384 Comm: qemu-system-x86 Not tainted 4.3.0-rc1+ #6
> [  222.070612] Hardware name: Intel Corp. GRANGEVILLE/GRANTLEY, BIOS GNVDCRB1.86B.0020.V07.1409241147 09/24/2014
> [  222.080764] task: ffff88006e7c8000 ti: ffff8800714a8000 task.ti: ffff8800714a8000
> [  222.088283] RIP: 0010:[<ffffffffc0368ab0>]  [<ffffffffc0368ab0>] kvm_arch_vcpu_create+0x30/0x90 [kvm]
> [  222.097680] RSP: 0018:ffff8800714abde0  EFLAGS: 00010246
> [  222.103153] RAX: 0000000000000000 RBX: ffff88016f28c000 RCX: 0000000000000000
> [  222.110407] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88016f28c000
> [  222.117659] RBP: ffff8800714abdf8 R08: 0000000000000001 R09: 0000000000000040
> [  222.124824] R10: ffff880077e86438 R11: ffff880163e06880 R12: ffff88016f28c000
> [  222.132150] R13: 0000000000000000 R14: 000000000000ae41 R15: 0000000000000000
> [  222.139405] FS:  00007f43fd7ec700(0000) GS:ffff880178700000(0000) knlGS:0000000000000000
> [  222.147629] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  222.153471] CR2: 0000000000000000 CR3: 000000017074b000 CR4: 00000000003426e0
> [  222.160726] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  222.167979] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  222.175231] Stack:
> [  222.177277]  0000000000000000 ffff88016f28c000 0000000000000000 ffff8800714abea0
> [  222.184870]  ffffffffc0355b17 0000000000000008 ffff8800714abe28 ffffffff810aba32
> [  222.192444]  ffff880178816e40 ffff8800714abe40 ffffffff810a4f44 ffff880178816e40
> [  222.200017] Call Trace:
> [  222.202522]  [<ffffffffc0355b17>] kvm_vm_ioctl+0x277/0x6e0 [kvm]
> [  222.208633]  [<ffffffff810aba32>] ? put_prev_task_fair+0x22/0x40
> [  222.214741]  [<ffffffff810a4f44>] ? pick_next_task_idle+0x14/0x30
> [  222.220942]  [<ffffffff811fb53a>] do_vfs_ioctl+0x2ba/0x490
> [  222.226523]  [<ffffffff8106274a>] ? __do_page_fault+0x1ba/0x410
> [  222.232546]  [<ffffffff811fb789>] SyS_ioctl+0x79/0x90
> [  222.237684]  [<ffffffff81003ba5>] ? syscall_return_slowpath+0x55/0x150
> [  222.244323]  [<ffffffff81791d36>] entry_SYSCALL_64_fastpath+0x16/0x75
> [  222.250869] Code: 55 48 89 e5 41 55 41 54 53 41 89 f5 48 89 fb e8 27 61 cb c0 85 c0 74 13 8b 83 f0 09 00 00 85 c0 74 09 80 3d 53 2e 04 00 00 74 40 <48> 8b 04 25 00 00 00 00 48 8d 78 48 e8 7f c4 d6 c0 41 89 c4 48
> [  222.270790] RIP  [<ffffffffc0368ab0>] kvm_arch_vcpu_create+0x30/0x90 [kvm]
> [  222.277813]  RSP <ffff8800714abde0>
> [  222.281359] CR2: 0000000000000000
> [  222.290421] ---[ end trace 957f5a39692fe6c7 ]---

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series
  2015-09-28 10:18       ` Paolo Bonzini
@ 2015-09-28 10:22         ` Wu, Feng
  0 siblings, 0 replies; 56+ messages in thread
From: Wu, Feng @ 2015-09-28 10:22 UTC (permalink / raw)
  To: Paolo Bonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Monday, September 28, 2015 6:19 PM
> To: Wu, Feng; alex.williamson@redhat.com; joro@8bytes.org;
> mtosatti@redhat.com
> Cc: eric.auger@linaro.org; kvm@vger.kernel.org;
> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including
> prerequisite series
> 
> 
> 
> On 28/09/2015 12:14, Wu, Feng wrote:
> > Thanks a lot for creating branch for vt-d pi. However, I cannot launch guests
> > with this tree. I encountered the following kernel dump, and I find that the
> > problematic commit is " 2260b1cde0b5472ab70ad0764b10095372e41913 "
> >
> >     KVM: x86: put vcpu_create under kvm->srcu critical section
> >
> >     This is needed in case vcpu_create wants to access the memslots array.
> >     Fixes this lockdep splat:
> >
> > After removing this commit from the tree, my VT-d patch-set works fine.
> 
> Great, thanks.  The above commit had already been reverted.
> 
> I'm sorting out the kbuild reports, and then will merge VT-d PI.

Thanks a lot for making this happen!

Thanks,
Feng

> 
> Paolo
> 
> >
> > Kernel dump:
> > [  221.978182] BUG: unable to handle kernel NULL pointer dereference at
> (null)
> > [  221.986085] IP: [<ffffffffc0368ab0>] kvm_arch_vcpu_create+0x30/0x90
> [kvm]
> > [  221.993102] PGD 0
> > [  221.995148] Oops: 0000 [#1] SMP
> > [  221.998440] Modules linked in: bnep rfcomm bluetooth ax88179_178a
> usbnet intel_rapl mii snd_hda_codec_hdmi iosf_mbi x86_pkg_temp_thermal
> nouveau intel_powerclamp snd_hda_intel snd_hda_codec coretemp kvm_intel
> snd_hda_core kvm snd_hwdep snd_pcm crct10dif_pclmul crc32_pclmul
> snd_seq_midi ghash_clmulni_intel mxm_wmi snd_seq_midi_event snd_rawmidi
> video snd_seq ttm aesni_intel aes_x86_64 lrw gf128mul drm_kms_helper
> snd_seq_device binfmt_misc snd_timer glue_helper ablk_helper drm cryptd
> fb_sys_fops snd syscopyarea sysfillrect sb_edac soundcore sysimgblt mei_me
> parport_pc edac_core ppdev mei shpchp lp lpc_ich mac_hid parport
> acpi_power_meter wmi ixgbe igb i2c_algo_bit hid_generic usbhid ptp ahci hid
> libahci pps_core mdio
> > [  222.063533] CPU: 4 PID: 3384 Comm: qemu-system-x86 Not tainted
> 4.3.0-rc1+ #6
> > [  222.070612] Hardware name: Intel Corp. GRANGEVILLE/GRANTLEY, BIOS
> GNVDCRB1.86B.0020.V07.1409241147 09/24/2014
> > [  222.080764] task: ffff88006e7c8000 ti: ffff8800714a8000 task.ti:
> ffff8800714a8000
> > [  222.088283] RIP: 0010:[<ffffffffc0368ab0>]  [<ffffffffc0368ab0>]
> kvm_arch_vcpu_create+0x30/0x90 [kvm]
> > [  222.097680] RSP: 0018:ffff8800714abde0  EFLAGS: 00010246
> > [  222.103153] RAX: 0000000000000000 RBX: ffff88016f28c000 RCX:
> 0000000000000000
> > [  222.110407] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> ffff88016f28c000
> > [  222.117659] RBP: ffff8800714abdf8 R08: 0000000000000001 R09:
> 0000000000000040
> > [  222.124824] R10: ffff880077e86438 R11: ffff880163e06880 R12:
> ffff88016f28c000
> > [  222.132150] R13: 0000000000000000 R14: 000000000000ae41 R15:
> 0000000000000000
> > [  222.139405] FS:  00007f43fd7ec700(0000) GS:ffff880178700000(0000)
> knlGS:0000000000000000
> > [  222.147629] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  222.153471] CR2: 0000000000000000 CR3: 000000017074b000 CR4:
> 00000000003426e0
> > [  222.160726] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> > [  222.167979] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> > [  222.175231] Stack:
> > [  222.177277]  0000000000000000 ffff88016f28c000 0000000000000000
> ffff8800714abea0
> > [  222.184870]  ffffffffc0355b17 0000000000000008 ffff8800714abe28
> ffffffff810aba32
> > [  222.192444]  ffff880178816e40 ffff8800714abe40 ffffffff810a4f44
> ffff880178816e40
> > [  222.200017] Call Trace:
> > [  222.202522]  [<ffffffffc0355b17>] kvm_vm_ioctl+0x277/0x6e0 [kvm]
> > [  222.208633]  [<ffffffff810aba32>] ? put_prev_task_fair+0x22/0x40
> > [  222.214741]  [<ffffffff810a4f44>] ? pick_next_task_idle+0x14/0x30
> > [  222.220942]  [<ffffffff811fb53a>] do_vfs_ioctl+0x2ba/0x490
> > [  222.226523]  [<ffffffff8106274a>] ? __do_page_fault+0x1ba/0x410
> > [  222.232546]  [<ffffffff811fb789>] SyS_ioctl+0x79/0x90
> > [  222.237684]  [<ffffffff81003ba5>] ? syscall_return_slowpath+0x55/0x150
> > [  222.244323]  [<ffffffff81791d36>]
> entry_SYSCALL_64_fastpath+0x16/0x75
> > [  222.250869] Code: 55 48 89 e5 41 55 41 54 53 41 89 f5 48 89 fb e8 27 61
> cb c0 85 c0 74 13 8b 83 f0 09 00 00 85 c0 74 09 80 3d 53 2e 04 00 00 74 40 <48>
> 8b 04 25 00 00 00 00 48 8d 78 48 e8 7f c4 d6 c0 41 89 c4 48
> > [  222.270790] RIP  [<ffffffffc0368ab0>] kvm_arch_vcpu_create+0x30/0x90
> [kvm]
> > [  222.277813]  RSP <ffff8800714abde0>
> > [  222.281359] CR2: 0000000000000000
> > [  222.290421] ---[ end trace 957f5a39692fe6c7 ]---

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-09-18 14:29 ` [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked Feng Wu
  2015-09-18 16:06   ` Paolo Bonzini
@ 2015-10-14 23:41   ` David Matlack
  2015-10-15  1:33     ` Wu, Feng
  1 sibling, 1 reply; 56+ messages in thread
From: David Matlack @ 2015-10-14 23:41 UTC (permalink / raw)
  To: Feng Wu
  Cc: Paolo Bonzini, alex.williamson, Joerg Roedel, Marcelo Tosatti,
	eric.auger, kvm list, iommu, linux-kernel

Hi Feng.

On Fri, Sep 18, 2015 at 7:29 AM, Feng Wu <feng.wu@intel.com> wrote:
> This patch updates the Posted-Interrupts Descriptor when vCPU
> is blocked.
>
> pre-block:
> - Add the vCPU to the blocked per-CPU list
> - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
>
> post-block:
> - Remove the vCPU from the per-CPU list

I'm wondering what happens if a posted interrupt arrives at the IOMMU
after pre-block and before post-block.

In pre_block, NV is set to POSTED_INTR_WAKEUP_VECTOR. IIUC, this means
future posted interrupts will not trigger "Posted-Interrupt Processing"
(PIR will not get copied to VIRR). Instead, the IOMMU will do ON := 1,
PIR |= (1 << vector), and send POSTED_INTR_WAKEUP_VECTOR. PIWV calls
wakeup_handler which does kvm_vcpu_kick. kvm_vcpu_kick does a wait-queue
wakeup and possibly a scheduler ipi.
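
The posting step described above can be sketched as a small userspace
model. This is only an illustration under stated assumptions: the struct
layout, the model_* names and the vector numbers are hypothetical and are
not the kernel's pi_desc or the hardware format, and the handling of the
ON and SN bits is simplified (urgent interrupts are ignored).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of a posted-interrupt descriptor. */
struct pi_desc_model {
	uint32_t pir[8];	/* 256 posted-interrupt request bits */
	unsigned int on;	/* outstanding notification */
	unsigned int sn;	/* suppress notification */
	uint8_t nv;		/* notification vector */
	uint32_t ndst;		/* notification destination (APIC ID) */
};

/* Placeholder vector numbers, chosen only for this sketch. */
#define MODEL_NOTIFICATION_VECTOR	0xf2
#define MODEL_WAKEUP_VECTOR		0xf1

/*
 * Model of the IOMMU posting one interrupt: record it in PIR and, if no
 * notification is already outstanding or suppressed, set ON and report
 * which vector would be sent to NDST. Returns -1 when no notification
 * event is generated.
 */
static int model_post_interrupt(struct pi_desc_model *pd, uint8_t vector)
{
	assert(pd);

	pd->pir[vector / 32] |= UINT32_C(1) << (vector % 32);

	if (pd->on || pd->sn)
		return -1;

	pd->on = 1;
	return pd->nv;
}
```

With nv set to MODEL_WAKEUP_VECTOR, the first posted interrupt reports the
wakeup vector, and later ones only accumulate in PIR until ON is cleared.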

But the VCPU is sitting in kvm_vcpu_block. It spins and/or schedules
(wait queue) until it has a reason to wake up. I couldn't find a code
path from kvm_vcpu_block that leads to checking ON or PIR. How does the
blocked VCPU "receive" the posted interrupt? (And when does Posted-
Interrupt Processing get triggered?)

Thanks!

>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
> v9:
> - Add description for blocked_vcpu_on_cpu_lock in Documentation/virtual/kvm/locking.txt
> - Check !kvm_arch_has_assigned_device(vcpu->kvm) first, then
>   !irq_remapping_cap(IRQ_POSTING_CAP)
>
> v8:
> - Rename 'pi_pre_block' to 'pre_block'
> - Rename 'pi_post_block' to 'post_block'
> - Change some comments
> - Only add the vCPU to the blocking list when the VM has assigned devices.
>
>  Documentation/virtual/kvm/locking.txt |  12 +++
>  arch/x86/include/asm/kvm_host.h       |  13 +++
>  arch/x86/kvm/vmx.c                    | 153 ++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c                    |  53 +++++++++---
>  include/linux/kvm_host.h              |   3 +
>  virt/kvm/kvm_main.c                   |   3 +
>  6 files changed, 227 insertions(+), 10 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/locking.txt b/Documentation/virtual/kvm/locking.txt
> index d68af4d..19f94a6 100644
> --- a/Documentation/virtual/kvm/locking.txt
> +++ b/Documentation/virtual/kvm/locking.txt
> @@ -166,3 +166,15 @@ Comment:   The srcu read lock must be held while accessing memslots (e.g.
>                 MMIO/PIO address->device structure mapping (kvm->buses).
>                 The srcu index can be stored in kvm_vcpu->srcu_idx per vcpu
>                 if it is needed by multiple functions.
> +
> +Name:          blocked_vcpu_on_cpu_lock
> +Type:          spinlock_t
> +Arch:          x86
> +Protects:      blocked_vcpu_on_cpu
> +Comment:       This is a per-CPU lock used for VT-d posted-interrupts.
> +               When VT-d posted-interrupts is supported and the VM has
> +               assigned devices, we put the blocked vCPU on the list
> +               blocked_vcpu_on_cpu, protected by blocked_vcpu_on_cpu_lock.
> +               When VT-d hardware issues a wakeup notification event
> +               because an external interrupt from an assigned device
> +               arrived, we find the vCPU on the list and wake it up.
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 0ddd353..304fbb5 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -552,6 +552,8 @@ struct kvm_vcpu_arch {
>          */
>         bool write_fault_to_shadow_pgtable;
>
> +       bool halted;
> +
>         /* set at EPT violation at this point */
>         unsigned long exit_qualification;
>
> @@ -864,6 +866,17 @@ struct kvm_x86_ops {
>         /* pmu operations of sub-arch */
>         const struct kvm_pmu_ops *pmu_ops;
>
> +       /*
> +        * Architecture specific hooks for vCPU blocking due to
> +        * HLT instruction.
> +        * Returns for .pre_block():
> +        *    - 0 means continue to block the vCPU.
> +        *    - 1 means we cannot block the vCPU since some event
> +        *        happens during this period, such as, 'ON' bit in
> +        *        posted-interrupts descriptor is set.
> +        */
> +       int (*pre_block)(struct kvm_vcpu *vcpu);
> +       void (*post_block)(struct kvm_vcpu *vcpu);
>         int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
>                               uint32_t guest_irq, bool set);
>  };
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 902a67d..9968896 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -879,6 +879,13 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
>  static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
>  static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
>
> +/*
> + * We maintain a per-CPU linked-list of vCPUs, so in wakeup_handler() we
> + * can find which vCPU should be woken up.
> + */
> +static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
> +static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> +
>  static unsigned long *vmx_io_bitmap_a;
>  static unsigned long *vmx_io_bitmap_b;
>  static unsigned long *vmx_msr_bitmap_legacy;
> @@ -2985,6 +2992,8 @@ static int hardware_enable(void)
>                 return -EBUSY;
>
>         INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
> +       INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
> +       spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
>
>         /*
>          * Now we can enable the vmclear operation in kdump
> @@ -6121,6 +6130,25 @@ static void update_ple_window_actual_max(void)
>                                             ple_window_grow, INT_MIN);
>  }
>
> +/*
> + * Handler for POSTED_INTR_WAKEUP_VECTOR.
> + */
> +static void wakeup_handler(void)
> +{
> +       struct kvm_vcpu *vcpu;
> +       int cpu = smp_processor_id();
> +
> +       spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> +       list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
> +                       blocked_vcpu_list) {
> +               struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> +
> +               if (pi_test_on(pi_desc) == 1)
> +                       kvm_vcpu_kick(vcpu);
> +       }
> +       spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> +}
> +
>  static __init int hardware_setup(void)
>  {
>         int r = -ENOMEM, i, msr;
> @@ -6305,6 +6333,8 @@ static __init int hardware_setup(void)
>                 kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
>         }
>
> +       kvm_set_posted_intr_wakeup_handler(wakeup_handler);
> +
>         return alloc_kvm_area();
>
>  out8:
> @@ -10430,6 +10460,126 @@ static void vmx_enable_log_dirty_pt_masked(struct kvm *kvm,
>  }
>
>  /*
> + * This routine does the following things for vCPU which is going
> + * to be blocked if VT-d PI is enabled.
> + * - Store the vCPU to the wakeup list, so when interrupts happen
> + *   we can find the right vCPU to wake up.
> + * - Change the Posted-interrupt descriptor as below:
> + *      'NDST' <-- vcpu->pre_pcpu
> + *      'NV' <-- POSTED_INTR_WAKEUP_VECTOR
> + * - If 'ON' is set during this process, which means at least one
> + *   interrupt is posted for this vCPU, we cannot block it, in
> + *   this case, return 1, otherwise, return 0.
> + *
> + */
> +static int vmx_pre_block(struct kvm_vcpu *vcpu)
> +{
> +       unsigned long flags;
> +       unsigned int dest;
> +       struct pi_desc old, new;
> +       struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> +
> +       if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
> +               !irq_remapping_cap(IRQ_POSTING_CAP))
> +               return 0;
> +
> +       vcpu->pre_pcpu = vcpu->cpu;
> +       spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> +                         vcpu->pre_pcpu), flags);
> +       list_add_tail(&vcpu->blocked_vcpu_list,
> +                     &per_cpu(blocked_vcpu_on_cpu,
> +                     vcpu->pre_pcpu));
> +       spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> +                              vcpu->pre_pcpu), flags);
> +
> +       do {
> +               old.control = new.control = pi_desc->control;
> +
> +               /*
> +                * We should not block the vCPU if
> +                * an interrupt is posted for it.
> +                */
> +               if (pi_test_on(pi_desc) == 1) {
> +                       spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> +                                         vcpu->pre_pcpu), flags);
> +                       list_del(&vcpu->blocked_vcpu_list);
> +                       spin_unlock_irqrestore(
> +                                       &per_cpu(blocked_vcpu_on_cpu_lock,
> +                                       vcpu->pre_pcpu), flags);
> +                       vcpu->pre_pcpu = -1;
> +
> +                       return 1;
> +               }
> +
> +               WARN((pi_desc->sn == 1),
> +                    "Warning: SN field of posted-interrupts "
> +                    "is set before blocking\n");
> +
> +               /*
> +                * Since the vCPU can be preempted during this process,
> +                * vcpu->cpu could be different from pre_pcpu, so we
> +                * need to set pre_pcpu as the destination of the wakeup
> +                * notification event. Then the wakeup handler can find
> +                * the right vCPU to wake up if interrupts arrive while
> +                * the vCPU is in the blocked state.
> +                */
> +               dest = cpu_physical_id(vcpu->pre_pcpu);
> +
> +               if (x2apic_enabled())
> +                       new.ndst = dest;
> +               else
> +                       new.ndst = (dest << 8) & 0xFF00;
> +
> +               /* set 'NV' to 'wakeup vector' */
> +               new.nv = POSTED_INTR_WAKEUP_VECTOR;
> +       } while (cmpxchg(&pi_desc->control, old.control,
> +                       new.control) != old.control);
> +
> +       return 0;
> +}
> +
> +static void vmx_post_block(struct kvm_vcpu *vcpu)
> +{
> +       struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> +       struct pi_desc old, new;
> +       unsigned int dest;
> +       unsigned long flags;
> +
> +       if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
> +               !irq_remapping_cap(IRQ_POSTING_CAP))
> +               return;
> +
> +       do {
> +               old.control = new.control = pi_desc->control;
> +
> +               dest = cpu_physical_id(vcpu->cpu);
> +
> +               if (x2apic_enabled())
> +                       new.ndst = dest;
> +               else
> +                       new.ndst = (dest << 8) & 0xFF00;
> +
> +               /* Allow posting non-urgent interrupts */
> +               new.sn = 0;
> +
> +               /* set 'NV' to 'notification vector' */
> +               new.nv = POSTED_INTR_VECTOR;
> +       } while (cmpxchg(&pi_desc->control, old.control,
> +                       new.control) != old.control);
> +
> +       if (vcpu->pre_pcpu != -1) {
> +               spin_lock_irqsave(
> +                       &per_cpu(blocked_vcpu_on_cpu_lock,
> +                       vcpu->pre_pcpu), flags);
> +               list_del(&vcpu->blocked_vcpu_list);
> +               spin_unlock_irqrestore(
> +                       &per_cpu(blocked_vcpu_on_cpu_lock,
> +                       vcpu->pre_pcpu), flags);
> +               vcpu->pre_pcpu = -1;
> +       }
> +}
> +
> +/*
>   * vmx_update_pi_irte - set IRTE for Posted-Interrupts
>   *
>   * @kvm: kvm
> @@ -10620,6 +10770,9 @@ static struct kvm_x86_ops vmx_x86_ops = {
>         .flush_log_dirty = vmx_flush_log_dirty,
>         .enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,
>
> +       .pre_block = vmx_pre_block,
> +       .post_block = vmx_post_block,
> +
>         .pmu_ops = &intel_pmu_ops,
>
>         .update_pi_irte = vmx_update_pi_irte,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 58688aa..46f55b2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5869,7 +5869,12 @@ int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
>  {
>         ++vcpu->stat.halt_exits;
>         if (irqchip_in_kernel(vcpu->kvm)) {
> -               vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
> +               /* Handle posted-interrupt when vCPU is to be halted */
> +               if (!kvm_x86_ops->pre_block ||
> +                               kvm_x86_ops->pre_block(vcpu) == 0) {
> +                       vcpu->arch.halted = true;
> +                       vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
> +               }
>                 return 1;
>         } else {
>                 vcpu->run->exit_reason = KVM_EXIT_HLT;
> @@ -6518,6 +6523,20 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>                         kvm_vcpu_reload_apic_access_page(vcpu);
>         }
>
> +       /*
> +        * KVM_REQ_EVENT is not set when posted interrupts are set by
> +        * VT-d hardware, so we have to update RVI unconditionally.
> +        */
> +       if (kvm_lapic_enabled(vcpu)) {
> +               /*
> +                * Update architecture specific hints for APIC
> +                * virtual interrupt delivery.
> +                */
> +               if (kvm_x86_ops->hwapic_irr_update)
> +                       kvm_x86_ops->hwapic_irr_update(vcpu,
> +                               kvm_lapic_find_highest_irr(vcpu));
> +       }
> +
>         if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>                 kvm_apic_accept_events(vcpu);
>                 if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> @@ -6534,13 +6553,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>                         kvm_x86_ops->enable_irq_window(vcpu);
>
>                 if (kvm_lapic_enabled(vcpu)) {
> -                       /*
> -                        * Update architecture specific hints for APIC
> -                        * virtual interrupt delivery.
> -                        */
> -                       if (kvm_x86_ops->hwapic_irr_update)
> -                               kvm_x86_ops->hwapic_irr_update(vcpu,
> -                                       kvm_lapic_find_highest_irr(vcpu));
>                         update_cr8_intercept(vcpu);
>                         kvm_lapic_sync_to_vapic(vcpu);
>                 }
> @@ -6711,10 +6723,31 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>
>         for (;;) {
>                 if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
> -                   !vcpu->arch.apf.halted)
> +                   !vcpu->arch.apf.halted) {
> +                       /*
> +                        * For some cases, we can get here with
> +                        * vcpu->arch.halted being true.
> +                        */
> +                       if (kvm_x86_ops->post_block && vcpu->arch.halted) {
> +                               kvm_x86_ops->post_block(vcpu);
> +                               vcpu->arch.halted = false;
> +                       }
> +
>                         r = vcpu_enter_guest(vcpu);
> -               else
> +               } else {
>                         r = vcpu_block(kvm, vcpu);
> +
> +                       /*
> +                        * post_block() must be called after
> +                        * pre_block() which is called in
> +                        * kvm_vcpu_halt().
> +                        */
> +                       if (kvm_x86_ops->post_block && vcpu->arch.halted) {
> +                               kvm_x86_ops->post_block(vcpu);
> +                               vcpu->arch.halted = false;
> +                       }
> +               }
> +
>                 if (r <= 0)
>                         break;
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index feba1fb..bf462e7 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -231,6 +231,9 @@ struct kvm_vcpu {
>         unsigned long requests;
>         unsigned long guest_debug;
>
> +       int pre_pcpu;
> +       struct list_head blocked_vcpu_list;
> +
>         struct mutex mutex;
>         struct kvm_run *run;
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 8b8a444..191c7eb 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -220,6 +220,9 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
>         init_waitqueue_head(&vcpu->wq);
>         kvm_async_pf_vcpu_init(vcpu);
>
> +       vcpu->pre_pcpu = -1;
> +       INIT_LIST_HEAD(&vcpu->blocked_vcpu_list);
> +
>         page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>         if (!page) {
>                 r = -ENOMEM;
> --
> 2.1.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-10-14 23:41   ` David Matlack
@ 2015-10-15  1:33     ` Wu, Feng
  2015-10-15 17:39       ` David Matlack
  0 siblings, 1 reply; 56+ messages in thread
From: Wu, Feng @ 2015-10-15  1:33 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, alex.williamson, Joerg Roedel, Marcelo Tosatti,
	eric.auger, kvm list, iommu, linux-kernel, Wu, Feng




> -----Original Message-----
> From: David Matlack [mailto:dmatlack@google.com]
> Sent: Thursday, October 15, 2015 7:41 AM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>; alex.williamson@redhat.com; Joerg
> Roedel <joro@8bytes.org>; Marcelo Tosatti <mtosatti@redhat.com>;
> eric.auger@linaro.org; kvm list <kvm@vger.kernel.org>;
> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when
> vCPU is blocked
> 
> Hi Feng.
> 
> On Fri, Sep 18, 2015 at 7:29 AM, Feng Wu <feng.wu@intel.com> wrote:
> > This patch updates the Posted-Interrupts Descriptor when vCPU
> > is blocked.
> >
> > pre-block:
> > - Add the vCPU to the blocked per-CPU list
> > - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> >
> > post-block:
> > - Remove the vCPU from the per-CPU list
> 
> I'm wondering what happens if a posted interrupt arrives at the IOMMU
> after pre-block and before post-block.
> 
> In pre_block, NV is set to POSTED_INTR_WAKEUP_VECTOR. IIUC, this means
> future posted interrupts will not trigger "Posted-Interrupt Processing"
> (PIR will not get copied to VIRR). Instead, the IOMMU will do ON := 1,
> PIR |= (1 << vector), and send POSTED_INTR_WAKEUP_VECTOR. PIWV calls
> wakeup_handler which does kvm_vcpu_kick. kvm_vcpu_kick does a wait-queue
> wakeup and possibly a scheduler ipi.
> 
> But the VCPU is sitting in kvm_vcpu_block. It spins and/or schedules
> (wait queue) until it has a reason to wake up. I couldn't find a code
> path from kvm_vcpu_block that leads to checking ON or PIR. How does the
> blocked VCPU "receive" the posted interrupt? (And when does Posted-
> Interrupt Processing get triggered?)

In pre_block, it also changes the 'NDST' field to the pCPU on whose per-CPU
list 'blocked_vcpu_on_cpu' the vCPU is placed, so when a posted interrupt
comes in, the wakeup notification event is sent to that pCPU. The
wakeup_handler running there can then find the vCPU on the per-CPU list,
and kvm_vcpu_kick wakes it up.
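
As a rough userspace illustration of that flow (not the patch itself): the
sketch below keeps one list of blocked vCPUs per pCPU, and the handler for a
given pCPU kicks only the listed vCPUs whose ON bit is set. All model_* names
and the struct fields are hypothetical, the list is a plain singly linked
list rather than the kernel's list_head, and the per-CPU spinlock is left
out because the model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 8

/* Hypothetical stand-in for a vCPU: only the fields this sketch needs. */
struct model_vcpu {
	int pre_pcpu;			/* pCPU whose list holds this vCPU, or -1 */
	bool on;			/* 'ON' bit of its PI descriptor */
	bool kicked;			/* set where the patch calls kvm_vcpu_kick() */
	struct model_vcpu *next;	/* link in the per-CPU blocked list */
};

/* One list of blocked vCPUs per physical CPU. */
static struct model_vcpu *blocked_on_cpu[MODEL_NR_CPUS];

/* pre_block side: remember the pCPU and queue the vCPU on its list. */
static void model_pre_block(struct model_vcpu *vcpu, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	vcpu->pre_pcpu = cpu;
	vcpu->next = blocked_on_cpu[cpu];
	blocked_on_cpu[cpu] = vcpu;
}

/*
 * Wakeup-vector side: runs on 'cpu' (the NDST target) and kicks every
 * vCPU on that CPU's list whose ON bit is set. Returns how many vCPUs
 * were kicked.
 */
static int model_wakeup_handler(int cpu)
{
	int kicked = 0;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	for (struct model_vcpu *v = blocked_on_cpu[cpu]; v; v = v->next) {
		if (v->on) {
			v->kicked = true;
			kicked++;
		}
	}
	return kicked;
}

/* post_block side: unlink the vCPU from the list it was queued on. */
static void model_post_block(struct model_vcpu *vcpu)
{
	struct model_vcpu **pp;

	if (vcpu->pre_pcpu == -1)
		return;

	for (pp = &blocked_on_cpu[vcpu->pre_pcpu]; *pp; pp = &(*pp)->next) {
		if (*pp == vcpu) {
			*pp = vcpu->next;
			break;
		}
	}
	vcpu->next = NULL;
	vcpu->pre_pcpu = -1;
}
```

Because the list is keyed by pre_pcpu rather than by wherever the vCPU
thread later runs, the handler only needs to walk the list of the pCPU that
received the wakeup vector.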

Thanks,
Feng

> 
> Thanks!
> 
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> > v9:
> > - Add description for blocked_vcpu_on_cpu_lock in Documentation/virtual/kvm/locking.txt
> > - Check !kvm_arch_has_assigned_device(vcpu->kvm) first, then
> >   !irq_remapping_cap(IRQ_POSTING_CAP)
> >
> > v8:
> > - Rename 'pi_pre_block' to 'pre_block'
> > - Rename 'pi_post_block' to 'post_block'
> > - Change some comments
> > - Only add the vCPU to the blocking list when the VM has assigned devices.
> >
> >  Documentation/virtual/kvm/locking.txt |  12 +++
> >  arch/x86/include/asm/kvm_host.h       |  13 +++
> >  arch/x86/kvm/vmx.c                    | 153 ++++++++++++++++++++++++++++++++++
> >  arch/x86/kvm/x86.c                    |  53 +++++++++---
> >  include/linux/kvm_host.h              |   3 +
> >  virt/kvm/kvm_main.c                   |   3 +
> >  6 files changed, 227 insertions(+), 10 deletions(-)
> >
> > diff --git a/Documentation/virtual/kvm/locking.txt b/Documentation/virtual/kvm/locking.txt
> > index d68af4d..19f94a6 100644
> > --- a/Documentation/virtual/kvm/locking.txt
> > +++ b/Documentation/virtual/kvm/locking.txt
> > @@ -166,3 +166,15 @@ Comment:   The srcu read lock must be held while accessing memslots (e.g.
> >                 MMIO/PIO address->device structure mapping (kvm->buses).
> >                 The srcu index can be stored in kvm_vcpu->srcu_idx per vcpu
> >                 if it is needed by multiple functions.
> > +
> > +Name:          blocked_vcpu_on_cpu_lock
> > +Type:          spinlock_t
> > +Arch:          x86
> > +Protects:      blocked_vcpu_on_cpu
> > +Comment:       This is a per-CPU lock used for VT-d posted-interrupts.
> > +               When VT-d posted-interrupts is supported and the VM has
> > +               assigned devices, we put the blocked vCPU on the list
> > +               blocked_vcpu_on_cpu, protected by blocked_vcpu_on_cpu_lock.
> > +               When VT-d hardware issues a wakeup notification event
> > +               because an external interrupt from an assigned device
> > +               arrived, we find the vCPU on the list and wake it up.
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 0ddd353..304fbb5 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -552,6 +552,8 @@ struct kvm_vcpu_arch {
> >          */
> >         bool write_fault_to_shadow_pgtable;
> >
> > +       bool halted;
> > +
> >         /* set at EPT violation at this point */
> >         unsigned long exit_qualification;
> >
> > @@ -864,6 +866,17 @@ struct kvm_x86_ops {
> >         /* pmu operations of sub-arch */
> >         const struct kvm_pmu_ops *pmu_ops;
> >
> > +       /*
> > +        * Architecture specific hooks for vCPU blocking due to
> > +        * HLT instruction.
> > +        * Returns for .pre_block():
> > +        *    - 0 means continue to block the vCPU.
> > +        *    - 1 means we cannot block the vCPU since some event
> > +        *        happens during this period, such as, 'ON' bit in
> > +        *        posted-interrupts descriptor is set.
> > +        */
> > +       int (*pre_block)(struct kvm_vcpu *vcpu);
> > +       void (*post_block)(struct kvm_vcpu *vcpu);
> >         int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
> >                               uint32_t guest_irq, bool set);
> >  };
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 902a67d..9968896 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -879,6 +879,13 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
> >  static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
> >  static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
> >
> > +/*
> > + * We maintain a per-CPU linked-list of vCPUs, so in wakeup_handler() we
> > + * can find which vCPU should be woken up.
> > + */
> > +static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
> > +static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> > +
> >  static unsigned long *vmx_io_bitmap_a;
> >  static unsigned long *vmx_io_bitmap_b;
> >  static unsigned long *vmx_msr_bitmap_legacy;
> > @@ -2985,6 +2992,8 @@ static int hardware_enable(void)
> >                 return -EBUSY;
> >
> >         INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
> > +       INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
> > +       spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> >
> >         /*
> >          * Now we can enable the vmclear operation in kdump
> > @@ -6121,6 +6130,25 @@ static void update_ple_window_actual_max(void)
> >                                             ple_window_grow, INT_MIN);
> >  }
> >
> > +/*
> > + * Handler for POSTED_INTR_WAKEUP_VECTOR.
> > + */
> > +static void wakeup_handler(void)
> > +{
> > +       struct kvm_vcpu *vcpu;
> > +       int cpu = smp_processor_id();
> > +
> > +       spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > +       list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
> > +                       blocked_vcpu_list) {
> > +               struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > +
> > +               if (pi_test_on(pi_desc) == 1)
> > +                       kvm_vcpu_kick(vcpu);
> > +       }
> > +       spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > +}
> > +
> >  static __init int hardware_setup(void)
> >  {
> >         int r = -ENOMEM, i, msr;
> > @@ -6305,6 +6333,8 @@ static __init int hardware_setup(void)
> >                 kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
> >         }
> >
> > +       kvm_set_posted_intr_wakeup_handler(wakeup_handler);
> > +
> >         return alloc_kvm_area();
> >
> >  out8:
> > @@ -10430,6 +10460,126 @@ static void vmx_enable_log_dirty_pt_masked(struct kvm *kvm,
> >  }
> >
> >  /*
> > + * This routine does the following things for vCPU which is going
> > + * to be blocked if VT-d PI is enabled.
> > + * - Store the vCPU to the wakeup list, so when interrupts happen
> > + *   we can find the right vCPU to wake up.
> > + * - Change the Posted-interrupt descriptor as below:
> > + *      'NDST' <-- vcpu->pre_pcpu
> > + *      'NV' <-- POSTED_INTR_WAKEUP_VECTOR
> > + * - If 'ON' is set during this process, which means at least one
> > + *   interrupt is posted for this vCPU, we cannot block it, in
> > + *   this case, return 1, otherwise, return 0.
> > + *
> > + */
> > +static int vmx_pre_block(struct kvm_vcpu *vcpu)
> > +{
> > +       unsigned long flags;
> > +       unsigned int dest;
> > +       struct pi_desc old, new;
> > +       struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > +
> > +       if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
> > +               !irq_remapping_cap(IRQ_POSTING_CAP))
> > +               return 0;
> > +
> > +       vcpu->pre_pcpu = vcpu->cpu;
> > +       spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > +                         vcpu->pre_pcpu), flags);
> > +       list_add_tail(&vcpu->blocked_vcpu_list,
> > +                     &per_cpu(blocked_vcpu_on_cpu,
> > +                     vcpu->pre_pcpu));
> > +       spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > +                              vcpu->pre_pcpu), flags);
> > +
> > +       do {
> > +               old.control = new.control = pi_desc->control;
> > +
> > +               /*
> > +                * We should not block the vCPU if
> > +                * an interrupt is posted for it.
> > +                */
> > +               if (pi_test_on(pi_desc) == 1) {
> > +                       spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > +                                         vcpu->pre_pcpu), flags);
> > +                       list_del(&vcpu->blocked_vcpu_list);
> > +                       spin_unlock_irqrestore(
> > +                                       &per_cpu(blocked_vcpu_on_cpu_lock,
> > +                                       vcpu->pre_pcpu), flags);
> > +                       vcpu->pre_pcpu = -1;
> > +
> > +                       return 1;
> > +               }
> > +
> > +               WARN((pi_desc->sn == 1),
> > +                    "Warning: SN field of posted-interrupts "
> > +                    "is set before blocking\n");
> > +
> > +               /*
> > +                * Since vCPU can be preempted during this process,
> > +                * vcpu->cpu could be different with pre_pcpu, we
> > +                * need to set pre_pcpu as the destination of wakeup
> > +                * notification event, then we can find the right vCPU
> > +                * to wakeup in wakeup handler if interrupts happen
> > +                * when the vCPU is in blocked state.
> > +                */
> > +               dest = cpu_physical_id(vcpu->pre_pcpu);
> > +
> > +               if (x2apic_enabled())
> > +                       new.ndst = dest;
> > +               else
> > +                       new.ndst = (dest << 8) & 0xFF00;
> > +
> > +               /* set 'NV' to 'wakeup vector' */
> > +               new.nv = POSTED_INTR_WAKEUP_VECTOR;
> > +       } while (cmpxchg(&pi_desc->control, old.control,
> > +                       new.control) != old.control);
> > +
> > +       return 0;
> > +}
> > +
> > +static void vmx_post_block(struct kvm_vcpu *vcpu)
> > +{
> > +       struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > +       struct pi_desc old, new;
> > +       unsigned int dest;
> > +       unsigned long flags;
> > +
> > +       if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
> > +               !irq_remapping_cap(IRQ_POSTING_CAP))
> > +               return;
> > +
> > +       do {
> > +               old.control = new.control = pi_desc->control;
> > +
> > +               dest = cpu_physical_id(vcpu->cpu);
> > +
> > +               if (x2apic_enabled())
> > +                       new.ndst = dest;
> > +               else
> > +                       new.ndst = (dest << 8) & 0xFF00;
> > +
> > +               /* Allow posting non-urgent interrupts */
> > +               new.sn = 0;
> > +
> > +               /* set 'NV' to 'notification vector' */
> > +               new.nv = POSTED_INTR_VECTOR;
> > +       } while (cmpxchg(&pi_desc->control, old.control,
> > +                       new.control) != old.control);
> > +
> > +       if(vcpu->pre_pcpu != -1) {
> > +               spin_lock_irqsave(
> > +                       &per_cpu(blocked_vcpu_on_cpu_lock,
> > +                       vcpu->pre_pcpu), flags);
> > +               list_del(&vcpu->blocked_vcpu_list);
> > +               spin_unlock_irqrestore(
> > +                       &per_cpu(blocked_vcpu_on_cpu_lock,
> > +                       vcpu->pre_pcpu), flags);
> > +               vcpu->pre_pcpu = -1;
> > +       }
> > +}
> > +
> > +/*
> >   * vmx_update_pi_irte - set IRTE for Posted-Interrupts
> >   *
> >   * @kvm: kvm
> > @@ -10620,6 +10770,9 @@ static struct kvm_x86_ops vmx_x86_ops = {
> >         .flush_log_dirty = vmx_flush_log_dirty,
> >         .enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,
> >
> > +       .pre_block = vmx_pre_block,
> > +       .post_block = vmx_post_block,
> > +
> >         .pmu_ops = &intel_pmu_ops,
> >
> >         .update_pi_irte = vmx_update_pi_irte,
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 58688aa..46f55b2 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5869,7 +5869,12 @@ int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
> >  {
> >         ++vcpu->stat.halt_exits;
> >         if (irqchip_in_kernel(vcpu->kvm)) {
> > -               vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
> > +               /* Handle posted-interrupt when vCPU is to be halted */
> > +               if (!kvm_x86_ops->pre_block ||
> > +                               kvm_x86_ops->pre_block(vcpu) == 0) {
> > +                       vcpu->arch.halted = true;
> > +                       vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
> > +               }
> >                 return 1;
> >         } else {
> >                 vcpu->run->exit_reason = KVM_EXIT_HLT;
> > @@ -6518,6 +6523,20 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >                         kvm_vcpu_reload_apic_access_page(vcpu);
> >         }
> >
> > +       /*
> > +        * KVM_REQ_EVENT is not set when posted interrupts are set by
> > +        * VT-d hardware, so we have to update RVI unconditionally.
> > +        */
> > +       if (kvm_lapic_enabled(vcpu)) {
> > +               /*
> > +                * Update architecture specific hints for APIC
> > +                * virtual interrupt delivery.
> > +                */
> > +               if (kvm_x86_ops->hwapic_irr_update)
> > +                       kvm_x86_ops->hwapic_irr_update(vcpu,
> > +                               kvm_lapic_find_highest_irr(vcpu));
> > +       }
> > +
> >         if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> >                 kvm_apic_accept_events(vcpu);
> >                 if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> > @@ -6534,13 +6553,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >                         kvm_x86_ops->enable_irq_window(vcpu);
> >
> >                 if (kvm_lapic_enabled(vcpu)) {
> > -                       /*
> > -                        * Update architecture specific hints for APIC
> > -                        * virtual interrupt delivery.
> > -                        */
> > -                       if (kvm_x86_ops->hwapic_irr_update)
> > -                               kvm_x86_ops->hwapic_irr_update(vcpu,
> > -                                       kvm_lapic_find_highest_irr(vcpu));
> >                         update_cr8_intercept(vcpu);
> >                         kvm_lapic_sync_to_vapic(vcpu);
> >                 }
> > @@ -6711,10 +6723,31 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
> >
> >         for (;;) {
> >                 if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
> > -                   !vcpu->arch.apf.halted)
> > +                   !vcpu->arch.apf.halted) {
> > +                       /*
> > +                        * For some cases, we can get here with
> > +                        * vcpu->arch.halted being true.
> > +                        */
> > +                       if (kvm_x86_ops->post_block && vcpu->arch.halted) {
> > +                               kvm_x86_ops->post_block(vcpu);
> > +                               vcpu->arch.halted = false;
> > +                       }
> > +
> >                         r = vcpu_enter_guest(vcpu);
> > -               else
> > +               } else {
> >                         r = vcpu_block(kvm, vcpu);
> > +
> > +                       /*
> > +                        * post_block() must be called after
> > +                        * pre_block() which is called in
> > +                        * kvm_vcpu_halt().
> > +                        */
> > +                       if (kvm_x86_ops->post_block && vcpu->arch.halted) {
> > +                               kvm_x86_ops->post_block(vcpu);
> > +                               vcpu->arch.halted = false;
> > +                       }
> > +               }
> > +
> >                 if (r <= 0)
> >                         break;
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index feba1fb..bf462e7 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -231,6 +231,9 @@ struct kvm_vcpu {
> >         unsigned long requests;
> >         unsigned long guest_debug;
> >
> > +       int pre_pcpu;
> > +       struct list_head blocked_vcpu_list;
> > +
> >         struct mutex mutex;
> >         struct kvm_run *run;
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 8b8a444..191c7eb 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -220,6 +220,9 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
> >         init_waitqueue_head(&vcpu->wq);
> >         kvm_async_pf_vcpu_init(vcpu);
> >
> > +       vcpu->pre_pcpu = -1;
> > +       INIT_LIST_HEAD(&vcpu->blocked_vcpu_list);
> > +
> >         page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> >         if (!page) {
> >                 r = -ENOMEM;
> > --
> > 2.1.0
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-10-15  1:33     ` Wu, Feng
@ 2015-10-15 17:39       ` David Matlack
  2015-10-15 18:13         ` Paolo Bonzini
  0 siblings, 1 reply; 56+ messages in thread
From: David Matlack @ 2015-10-15 17:39 UTC (permalink / raw)
  To: Wu, Feng
  Cc: Paolo Bonzini, alex.williamson, Joerg Roedel, Marcelo Tosatti,
	eric.auger, kvm list, iommu, linux-kernel

On Wed, Oct 14, 2015 at 6:33 PM, Wu, Feng <feng.wu@intel.com> wrote:
>
>> -----Original Message-----
>> From: David Matlack [mailto:dmatlack@google.com]
>> Sent: Thursday, October 15, 2015 7:41 AM
>> To: Wu, Feng <feng.wu@intel.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>; alex.williamson@redhat.com; Joerg
>> Roedel <joro@8bytes.org>; Marcelo Tosatti <mtosatti@redhat.com>;
>> eric.auger@linaro.org; kvm list <kvm@vger.kernel.org>; iommu@lists.linux-
>> foundation.org; linux-kernel@vger.kernel.org
>> Subject: Re: [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when
>> vCPU is blocked
>>
>> Hi Feng.
>>
>> On Fri, Sep 18, 2015 at 7:29 AM, Feng Wu <feng.wu@intel.com> wrote:
>> > This patch updates the Posted-Interrupts Descriptor when vCPU
>> > is blocked.
>> >
>> > pre-block:
>> > - Add the vCPU to the blocked per-CPU list
>> > - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
>> >
>> > post-block:
>> > - Remove the vCPU from the per-CPU list
>>
>> I'm wondering what happens if a posted interrupt arrives at the IOMMU
>> after pre-block and before post-block.
>>
>> In pre_block, NV is set to POSTED_INTR_WAKEUP_VECTOR. IIUC, this means
>> future posted interrupts will not trigger "Posted-Interrupt Processing"
>> (PIR will not get copied to VIRR). Instead, the IOMMU will do ON := 1,
>> PIR |= (1 << vector), and send POSTED_INTR_WAKEUP_VECTOR. PIWV calls
>> wakeup_handler which does kvm_vcpu_kick. kvm_vcpu_kick does a wait-queue
>> wakeup and possibly a scheduler ipi.
>>
>> But the VCPU is sitting in kvm_vcpu_block. It spins and/or schedules
>> (wait queue) until it has a reason to wake up. I couldn't find a code
>> path from kvm_vcpu_block that leads to checking ON or PIR. How does the
>> blocked VCPU "receive" the posted interrupt? (And when does Posted-
>> Interrupt Processing get triggered?)
>
> In pre_block, it also changes the 'NDST' field to the pCPU on whose per-CPU
> list 'blocked_vcpu_on_cpu' the vCPU is placed. So when a posted interrupt
> comes in, the wakeup notification event is sent to that pCPU, and
> wakeup_handler can then find the vCPU on the per-CPU list, so
> kvm_vcpu_kick can wake it up.

Thank you for your response. I was actually confused about something
else. After wakeup_handler->kvm_vcpu_kick causes the vcpu to wake up,
that vcpu calls kvm_vcpu_check_block() to check if there are pending
events, otherwise the vcpu goes back to sleep. I had trouble yesterday
finding the code path from kvm_vcpu_check_block() which checks PIR/ON.

But after spending more time reading the source code this morning I
found that kvm_vcpu_check_block() eventually calls into
vmx_sync_pir_to_irr(), which copies PIR to IRR and clears ON. And then
apic_find_highest_irr() detects the pending posted interrupt.
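
The wake-up path described above can be sketched as a small user-space
model. This is only an illustration under stated assumptions, not the KVM
implementation: the struct and function names (pi_desc_model, vapic_model,
model_*) are hypothetical, the bitmaps are plain arrays, and nothing here is
atomic, whereas the real descriptor is updated concurrently by hardware and
needs atomic operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_VECTORS 256
#define WORDS      (NR_VECTORS / 32)

/* Hypothetical, simplified posted-interrupt descriptor. */
struct pi_desc_model {
	uint32_t pir[WORDS];	/* posted interrupt requests, one bit per vector */
	bool on;		/* outstanding notification */
};

/* Hypothetical virtual APIC state: only the IRR bitmap is modelled. */
struct vapic_model {
	uint32_t irr[WORDS];
};

/* Models what the posting side does: set the PIR bit, then set ON. */
static void model_post_interrupt(struct pi_desc_model *pi, int vector)
{
	assert(vector >= 0 && vector < NR_VECTORS);
	pi->pir[vector / 32] |= UINT32_C(1) << (vector % 32);
	pi->on = true;
}

/* Models the sync step: clear ON, then move every PIR bit into IRR. */
static void model_sync_pir_to_irr(struct pi_desc_model *pi,
				  struct vapic_model *apic)
{
	if (!pi->on)
		return;

	pi->on = false;
	for (int i = 0; i < WORDS; i++) {
		apic->irr[i] |= pi->pir[i];
		pi->pir[i] = 0;
	}
}

/* Returns the highest pending vector in IRR, or -1 if none is pending. */
static int model_find_highest_irr(const struct vapic_model *apic)
{
	for (int i = WORDS - 1; i >= 0; i--) {
		if (apic->irr[i] == 0)
			continue;
		for (int bit = 31; bit >= 0; bit--)
			if (apic->irr[i] & (UINT32_C(1) << bit))
				return i * 32 + bit;
	}
	return -1;
}

/* Models the check a woken vCPU performs before going back to sleep. */
static bool model_vcpu_has_pending_irq(struct pi_desc_model *pi,
				       struct vapic_model *apic)
{
	model_sync_pir_to_irr(pi, apic);
	return model_find_highest_irr(apic) >= 0;
}
```

In this model a vCPU that was kicked by the wakeup handler calls
model_vcpu_has_pending_irq(); it stays awake only if the sync step moved at
least one posted vector into IRR.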

>
> Thanks,
> Feng
>
>>
>> Thanks!
>>
>> >
>> > Signed-off-by: Feng Wu <feng.wu@intel.com>
>> > ---
>> > v9:
>> > - Add description for blocked_vcpu_on_cpu_lock in Documentation/virtual/kvm/locking.txt
>> > - Check !kvm_arch_has_assigned_device(vcpu->kvm) first, then
>> >   !irq_remapping_cap(IRQ_POSTING_CAP)
>> >
>> > v8:
>> > - Rename 'pi_pre_block' to 'pre_block'
>> > - Rename 'pi_post_block' to 'post_block'
>> > - Change some comments
>> > - Only add the vCPU to the blocking list when the VM has assigned devices.
>> >
>> >  Documentation/virtual/kvm/locking.txt |  12 +++
>> >  arch/x86/include/asm/kvm_host.h       |  13 +++
>> >  arch/x86/kvm/vmx.c                    | 153 ++++++++++++++++++++++++++++++++++
>> >  arch/x86/kvm/x86.c                    |  53 +++++++++---
>> >  include/linux/kvm_host.h              |   3 +
>> >  virt/kvm/kvm_main.c                   |   3 +
>> >  6 files changed, 227 insertions(+), 10 deletions(-)
>> >
>> > diff --git a/Documentation/virtual/kvm/locking.txt b/Documentation/virtual/kvm/locking.txt
>> > index d68af4d..19f94a6 100644
>> > --- a/Documentation/virtual/kvm/locking.txt
>> > +++ b/Documentation/virtual/kvm/locking.txt
>> > @@ -166,3 +166,15 @@ Comment:   The srcu read lock must be held while accessing memslots (e.g.
>> >                 MMIO/PIO address->device structure mapping (kvm->buses).
>> >                 The srcu index can be stored in kvm_vcpu->srcu_idx per vcpu
>> >                 if it is needed by multiple functions.
>> > +
>> > +Name:          blocked_vcpu_on_cpu_lock
>> > +Type:          spinlock_t
>> > +Arch:          x86
>> > +Protects:      blocked_vcpu_on_cpu
>> > +Comment:       This is a per-CPU lock and it is used for VT-d posted-interrupts.
>> > +               When VT-d posted-interrupts is supported and the VM has assigned
>> > +               devices, we put the blocked vCPU on the list blocked_vcpu_on_cpu
>> > +               protected by blocked_vcpu_on_cpu_lock, when VT-d hardware issues
>> > +               wakeup notification event since external interrupts from the
>> > +               assigned devices happens, we will find the vCPU on the list to
>> > +               wakeup.
>> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> > index 0ddd353..304fbb5 100644
>> > --- a/arch/x86/include/asm/kvm_host.h
>> > +++ b/arch/x86/include/asm/kvm_host.h
>> > @@ -552,6 +552,8 @@ struct kvm_vcpu_arch {
>> >          */
>> >         bool write_fault_to_shadow_pgtable;
>> >
>> > +       bool halted;
>> > +
>> >         /* set at EPT violation at this point */
>> >         unsigned long exit_qualification;
>> >
>> > @@ -864,6 +866,17 @@ struct kvm_x86_ops {
>> >         /* pmu operations of sub-arch */
>> >         const struct kvm_pmu_ops *pmu_ops;
>> >
>> > +       /*
>> > +        * Architecture specific hooks for vCPU blocking due to
>> > +        * HLT instruction.
>> > +        * Returns for .pre_block():
>> > +        *    - 0 means continue to block the vCPU.
>> > +        *    - 1 means we cannot block the vCPU since some event
>> > +        *        happens during this period, such as, 'ON' bit in
>> > +        *        posted-interrupts descriptor is set.
>> > +        */
>> > +       int (*pre_block)(struct kvm_vcpu *vcpu);
>> > +       void (*post_block)(struct kvm_vcpu *vcpu);
>> >         int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
>> >                               uint32_t guest_irq, bool set);
>> >  };
>> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> > index 902a67d..9968896 100644
>> > --- a/arch/x86/kvm/vmx.c
>> > +++ b/arch/x86/kvm/vmx.c
>> > @@ -879,6 +879,13 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
>> >  static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
>> >  static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
>> >
>> > +/*
>> > + * We maintain a per-CPU linked-list of vCPU, so in wakeup_handler() we
>> > + * can find which vCPU should be woken up.
>> > + */
>> > +static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
>> > +static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
>> > +
>> >  static unsigned long *vmx_io_bitmap_a;
>> >  static unsigned long *vmx_io_bitmap_b;
>> >  static unsigned long *vmx_msr_bitmap_legacy;
>> > @@ -2985,6 +2992,8 @@ static int hardware_enable(void)
>> >                 return -EBUSY;
>> >
>> >         INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
>> > +       INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
>> > +       spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
>> >
>> >         /*
>> >          * Now we can enable the vmclear operation in kdump
>> > @@ -6121,6 +6130,25 @@ static void update_ple_window_actual_max(void)
>> >                                             ple_window_grow, INT_MIN);
>> >  }
>> >
>> > +/*
>> > + * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
>> > + */
>> > +static void wakeup_handler(void)
>> > +{
>> > +       struct kvm_vcpu *vcpu;
>> > +       int cpu = smp_processor_id();
>> > +
>> > +       spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
>> > +       list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
>> > +                       blocked_vcpu_list) {
>> > +               struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
>> > +
>> > +               if (pi_test_on(pi_desc) == 1)
>> > +                       kvm_vcpu_kick(vcpu);
>> > +       }
>> > +       spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
>> > +}
>> > +
>> >  static __init int hardware_setup(void)
>> >  {
>> >         int r = -ENOMEM, i, msr;
>> > @@ -6305,6 +6333,8 @@ static __init int hardware_setup(void)
>> >                 kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
>> >         }
>> >
>> > +       kvm_set_posted_intr_wakeup_handler(wakeup_handler);
>> > +
>> >         return alloc_kvm_area();
>> >
>> >  out8:
>> > @@ -10430,6 +10460,126 @@ static void vmx_enable_log_dirty_pt_masked(struct kvm *kvm,
>> >  }
>> >
>> >  /*
>> > + * This routine does the following things for vCPU which is going
>> > + * to be blocked if VT-d PI is enabled.
>> > + * - Store the vCPU to the wakeup list, so when interrupts happen
>> > + *   we can find the right vCPU to wake up.
>> > + * - Change the Posted-interrupt descriptor as below:
>> > + *      'NDST' <-- vcpu->pre_pcpu
>> > + *      'NV' <-- POSTED_INTR_WAKEUP_VECTOR
>> > + * - If 'ON' is set during this process, which means at least one
>> > + *   interrupt is posted for this vCPU, we cannot block it, in
>> > + *   this case, return 1, otherwise, return 0.
>> > + *
>> > + */
>> > +static int vmx_pre_block(struct kvm_vcpu *vcpu)
>> > +{
>> > +       unsigned long flags;
>> > +       unsigned int dest;
>> > +       struct pi_desc old, new;
>> > +       struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
>> > +
>> > +       if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
>> > +               !irq_remapping_cap(IRQ_POSTING_CAP))
>> > +               return 0;
>> > +
>> > +       vcpu->pre_pcpu = vcpu->cpu;
>> > +       spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
>> > +                         vcpu->pre_pcpu), flags);
>> > +       list_add_tail(&vcpu->blocked_vcpu_list,
>> > +                     &per_cpu(blocked_vcpu_on_cpu,
>> > +                     vcpu->pre_pcpu));
>> > +       spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
>> > +                              vcpu->pre_pcpu), flags);
>> > +
>> > +       do {
>> > +               old.control = new.control = pi_desc->control;
>> > +
>> > +               /*
>> > +                * We should not block the vCPU if
>> > +                * an interrupt is posted for it.
>> > +                */
>> > +               if (pi_test_on(pi_desc) == 1) {
>> > +                       spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
>> > +                                         vcpu->pre_pcpu), flags);
>> > +                       list_del(&vcpu->blocked_vcpu_list);
>> > +                       spin_unlock_irqrestore(
>> > +                                       &per_cpu(blocked_vcpu_on_cpu_lock,
>> > +                                       vcpu->pre_pcpu), flags);
>> > +                       vcpu->pre_pcpu = -1;
>> > +
>> > +                       return 1;
>> > +               }
>> > +
>> > +               WARN((pi_desc->sn == 1),
>> > +                    "Warning: SN field of posted-interrupts "
>> > +                    "is set before blocking\n");
>> > +
>> > +               /*
>> > +                * Since vCPU can be preempted during this process,
>> > +                * vcpu->cpu could be different with pre_pcpu, we
>> > +                * need to set pre_pcpu as the destination of wakeup
>> > +                * notification event, then we can find the right vCPU
>> > +                * to wakeup in wakeup handler if interrupts happen
>> > +                * when the vCPU is in blocked state.
>> > +                */
>> > +               dest = cpu_physical_id(vcpu->pre_pcpu);
>> > +
>> > +               if (x2apic_enabled())
>> > +                       new.ndst = dest;
>> > +               else
>> > +                       new.ndst = (dest << 8) & 0xFF00;
>> > +
>> > +               /* set 'NV' to 'wakeup vector' */
>> > +               new.nv = POSTED_INTR_WAKEUP_VECTOR;
>> > +       } while (cmpxchg(&pi_desc->control, old.control,
>> > +                       new.control) != old.control);
>> > +
>> > +       return 0;
>> > +}
>> > +
>> > +static void vmx_post_block(struct kvm_vcpu *vcpu)
>> > +{
>> > +       struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
>> > +       struct pi_desc old, new;
>> > +       unsigned int dest;
>> > +       unsigned long flags;
>> > +
>> > +       if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
>> > +               !irq_remapping_cap(IRQ_POSTING_CAP))
>> > +               return;
>> > +
>> > +       do {
>> > +               old.control = new.control = pi_desc->control;
>> > +
>> > +               dest = cpu_physical_id(vcpu->cpu);
>> > +
>> > +               if (x2apic_enabled())
>> > +                       new.ndst = dest;
>> > +               else
>> > +                       new.ndst = (dest << 8) & 0xFF00;
>> > +
>> > +               /* Allow posting non-urgent interrupts */
>> > +               new.sn = 0;
>> > +
>> > +               /* set 'NV' to 'notification vector' */
>> > +               new.nv = POSTED_INTR_VECTOR;
>> > +       } while (cmpxchg(&pi_desc->control, old.control,
>> > +                       new.control) != old.control);
>> > +
>> > +       if(vcpu->pre_pcpu != -1) {
>> > +               spin_lock_irqsave(
>> > +                       &per_cpu(blocked_vcpu_on_cpu_lock,
>> > +                       vcpu->pre_pcpu), flags);
>> > +               list_del(&vcpu->blocked_vcpu_list);
>> > +               spin_unlock_irqrestore(
>> > +                       &per_cpu(blocked_vcpu_on_cpu_lock,
>> > +                       vcpu->pre_pcpu), flags);
>> > +               vcpu->pre_pcpu = -1;
>> > +       }
>> > +}
>> > +
>> > +/*
>> >   * vmx_update_pi_irte - set IRTE for Posted-Interrupts
>> >   *
>> >   * @kvm: kvm
>> > @@ -10620,6 +10770,9 @@ static struct kvm_x86_ops vmx_x86_ops = {
>> >         .flush_log_dirty = vmx_flush_log_dirty,
>> >         .enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,
>> >
>> > +       .pre_block = vmx_pre_block,
>> > +       .post_block = vmx_post_block,
>> > +
>> >         .pmu_ops = &intel_pmu_ops,
>> >
>> >         .update_pi_irte = vmx_update_pi_irte,
>> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> > index 58688aa..46f55b2 100644
>> > --- a/arch/x86/kvm/x86.c
>> > +++ b/arch/x86/kvm/x86.c
>> > @@ -5869,7 +5869,12 @@ int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
>> >  {
>> >         ++vcpu->stat.halt_exits;
>> >         if (irqchip_in_kernel(vcpu->kvm)) {
>> > -               vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
>> > +               /* Handle posted-interrupt when vCPU is to be halted */
>> > +               if (!kvm_x86_ops->pre_block ||
>> > +                               kvm_x86_ops->pre_block(vcpu) == 0) {
>> > +                       vcpu->arch.halted = true;
>> > +                       vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
>> > +               }
>> >                 return 1;
>> >         } else {
>> >                 vcpu->run->exit_reason = KVM_EXIT_HLT;
>> > @@ -6518,6 +6523,20 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>> >                         kvm_vcpu_reload_apic_access_page(vcpu);
>> >         }
>> >
>> > +       /*
>> > +        * KVM_REQ_EVENT is not set when posted interrupts are set by
>> > +        * VT-d hardware, so we have to update RVI unconditionally.
>> > +        */
>> > +       if (kvm_lapic_enabled(vcpu)) {
>> > +               /*
>> > +                * Update architecture specific hints for APIC
>> > +                * virtual interrupt delivery.
>> > +                */
>> > +               if (kvm_x86_ops->hwapic_irr_update)
>> > +                       kvm_x86_ops->hwapic_irr_update(vcpu,
>> > +                               kvm_lapic_find_highest_irr(vcpu));
>> > +       }
>> > +
>> >         if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>> >                 kvm_apic_accept_events(vcpu);
>> >                 if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
>> > @@ -6534,13 +6553,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>> >                         kvm_x86_ops->enable_irq_window(vcpu);
>> >
>> >                 if (kvm_lapic_enabled(vcpu)) {
>> > -                       /*
>> > -                        * Update architecture specific hints for APIC
>> > -                        * virtual interrupt delivery.
>> > -                        */
>> > -                       if (kvm_x86_ops->hwapic_irr_update)
>> > -                               kvm_x86_ops->hwapic_irr_update(vcpu,
>> > -                                       kvm_lapic_find_highest_irr(vcpu));
>> >                         update_cr8_intercept(vcpu);
>> >                         kvm_lapic_sync_to_vapic(vcpu);
>> >                 }
>> > @@ -6711,10 +6723,31 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>> >
>> >         for (;;) {
>> >                 if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
>> > -                   !vcpu->arch.apf.halted)
>> > +                   !vcpu->arch.apf.halted) {
>> > +                       /*
>> > +                        * For some cases, we can get here with
>> > +                        * vcpu->arch.halted being true.
>> > +                        */
>> > +                       if (kvm_x86_ops->post_block && vcpu->arch.halted) {
>> > +                               kvm_x86_ops->post_block(vcpu);
>> > +                               vcpu->arch.halted = false;
>> > +                       }
>> > +
>> >                         r = vcpu_enter_guest(vcpu);
>> > -               else
>> > +               } else {
>> >                         r = vcpu_block(kvm, vcpu);
>> > +
>> > +                       /*
>> > +                        * post_block() must be called after
>> > +                        * pre_block() which is called in
>> > +                        * kvm_vcpu_halt().
>> > +                        */
>> > +                       if (kvm_x86_ops->post_block && vcpu->arch.halted) {
>> > +                               kvm_x86_ops->post_block(vcpu);
>> > +                               vcpu->arch.halted = false;
>> > +                       }
>> > +               }
>> > +
>> >                 if (r <= 0)
>> >                         break;
>> >
>> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> > index feba1fb..bf462e7 100644
>> > --- a/include/linux/kvm_host.h
>> > +++ b/include/linux/kvm_host.h
>> > @@ -231,6 +231,9 @@ struct kvm_vcpu {
>> >         unsigned long requests;
>> >         unsigned long guest_debug;
>> >
>> > +       int pre_pcpu;
>> > +       struct list_head blocked_vcpu_list;
>> > +
>> >         struct mutex mutex;
>> >         struct kvm_run *run;
>> >
>> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> > index 8b8a444..191c7eb 100644
>> > --- a/virt/kvm/kvm_main.c
>> > +++ b/virt/kvm/kvm_main.c
>> > @@ -220,6 +220,9 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
>> >         init_waitqueue_head(&vcpu->wq);
>> >         kvm_async_pf_vcpu_init(vcpu);
>> >
>> > +       vcpu->pre_pcpu = -1;
>> > +       INIT_LIST_HEAD(&vcpu->blocked_vcpu_list);
>> > +
>> >         page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> >         if (!page) {
>> >                 r = -ENOMEM;
>> > --
>> > 2.1.0
>> >

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-10-15 17:39       ` David Matlack
@ 2015-10-15 18:13         ` Paolo Bonzini
  2015-10-16  1:45           ` Wu, Feng
  0 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2015-10-15 18:13 UTC (permalink / raw)
  To: David Matlack, Wu, Feng
  Cc: alex.williamson, Joerg Roedel, Marcelo Tosatti, eric.auger,
	kvm list, iommu, linux-kernel



On 15/10/2015 19:39, David Matlack wrote:
> But after spending more time reading the source code this morning I
> found that kvm_vcpu_check_block() eventually calls into
> vmx_sync_pir_to_irr(), which copies PIR to IRR and clears ON. And then
> apic_find_highest_irr() detects the pending posted interrupt.
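
For readers following along, the flow David describes can be sketched as a small user-space model. This is only an illustration under stated assumptions: struct pi_desc_model, struct lapic_model, sync_pir_to_irr() and find_highest_irr() are hypothetical names invented here, not the kernel's types or functions, and the model uses plain loads and stores where the real code has to use atomic operations because hardware can post concurrently.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_VECTORS 256
#define NR_WORDS   (NR_VECTORS / 32)

/* Hypothetical stand-in for the posted-interrupt descriptor. */
struct pi_desc_model {
	uint32_t pir[NR_WORDS];	/* posted interrupt requests, one bit per vector */
	bool on;		/* outstanding notification */
};

/* Hypothetical stand-in for the virtual APIC's IRR. */
struct lapic_model {
	uint32_t irr[NR_WORDS];
};

/* Clear ON, then merge every PIR word into IRR and empty the PIR. */
static void sync_pir_to_irr(struct pi_desc_model *pi, struct lapic_model *apic)
{
	if (!pi->on)
		return;

	pi->on = false;
	for (int i = 0; i < NR_WORDS; i++) {
		apic->irr[i] |= pi->pir[i];
		pi->pir[i] = 0;
	}
}

/* Return the highest pending vector in IRR, or -1 if none is pending. */
static int find_highest_irr(const struct lapic_model *apic)
{
	for (int i = NR_WORDS - 1; i >= 0; i--) {
		if (apic->irr[i]) {
			int bit = 31;

			while (!(apic->irr[i] & (1u << bit)))
				bit--;
			return i * 32 + bit;
		}
	}
	return -1;
}
```

In this model a vector posted into pir[] stays invisible to find_highest_irr() until sync_pir_to_irr() has run, which is why the block check has to reach the sync step before it looks at the IRR.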

Right.  And related to this, Feng, can you check if this is still
necessary on kvm/queue:

@@ -6518,6 +6523,20 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_vcpu_reload_apic_access_page(vcpu);
 	}
 
+	/*
+	 * KVM_REQ_EVENT is not set when posted interrupts are set by
+	 * VT-d hardware, so we have to update RVI unconditionally.
+	 */
+	if (kvm_lapic_enabled(vcpu)) {
+		/*
+		 * Update architecture specific hints for APIC
+		 * virtual interrupt delivery.
+		 */
+		if (kvm_x86_ops->hwapic_irr_update)
+			kvm_x86_ops->hwapic_irr_update(vcpu,
+				kvm_lapic_find_highest_irr(vcpu));
+	}
+
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
 		kvm_apic_accept_events(vcpu);
 		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
@@ -6534,13 +6553,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_x86_ops->enable_irq_window(vcpu);
 
 		if (kvm_lapic_enabled(vcpu)) {
-			/*
-			 * Update architecture specific hints for APIC
-			 * virtual interrupt delivery.
-			 */
-			if (kvm_x86_ops->hwapic_irr_update)
-				kvm_x86_ops->hwapic_irr_update(vcpu,
-					kvm_lapic_find_highest_irr(vcpu));
 			update_cr8_intercept(vcpu);
 			kvm_lapic_sync_to_vapic(vcpu);
 		}


It may be obsolete now that we have the patch from Radim to set KVM_REQ_EVENT
in vmx_sync_pir_to_irr (http://permalink.gmane.org/gmane.linux.kernel/2057138).

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-10-15 18:13         ` Paolo Bonzini
@ 2015-10-16  1:45           ` Wu, Feng
  0 siblings, 0 replies; 56+ messages in thread
From: Wu, Feng @ 2015-10-16  1:45 UTC (permalink / raw)
  To: Paolo Bonzini, David Matlack
  Cc: alex.williamson, Joerg Roedel, Marcelo Tosatti, eric.auger,
	kvm list, iommu, linux-kernel, Wu, Feng




> -----Original Message-----
> From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo
> Bonzini
> Sent: Friday, October 16, 2015 2:13 AM
> To: David Matlack <dmatlack@google.com>; Wu, Feng <feng.wu@intel.com>
> Cc: alex.williamson@redhat.com; Joerg Roedel <joro@8bytes.org>; Marcelo
> Tosatti <mtosatti@redhat.com>; eric.auger@linaro.org; kvm list
> <kvm@vger.kernel.org>; iommu@lists.linux-foundation.org; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when
> vCPU is blocked
> 
> 
> 
> On 15/10/2015 19:39, David Matlack wrote:
> > But after spending more time reading the source code this morning I
> > found that kvm_vcpu_check_block() eventually calls into
> > vmx_sync_pir_to_irr(), which copies PIR to IRR and clears ON. And then
> > apic_find_highest_irr() detects the pending posted interrupt.
> 
> Right.  And related to this, Feng, can you check if this is still
> necessary on kvm/queue:
> 
> @@ -6518,6 +6523,20 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  			kvm_vcpu_reload_apic_access_page(vcpu);
>  	}
> 
> +	/*
> +	 * KVM_REQ_EVENT is not set when posted interrupts are set by
> +	 * VT-d hardware, so we have to update RVI unconditionally.
> +	 */
> +	if (kvm_lapic_enabled(vcpu)) {
> +		/*
> +		 * Update architecture specific hints for APIC
> +		 * virtual interrupt delivery.
> +		 */
> +		if (kvm_x86_ops->hwapic_irr_update)
> +			kvm_x86_ops->hwapic_irr_update(vcpu,
> +				kvm_lapic_find_highest_irr(vcpu));
> +	}
> +
>  	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>  		kvm_apic_accept_events(vcpu);
>  		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> @@ -6534,13 +6553,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  			kvm_x86_ops->enable_irq_window(vcpu);
> 
>  		if (kvm_lapic_enabled(vcpu)) {
> -			/*
> -			 * Update architecture specific hints for APIC
> -			 * virtual interrupt delivery.
> -			 */
> -			if (kvm_x86_ops->hwapic_irr_update)
> -				kvm_x86_ops->hwapic_irr_update(vcpu,
> -					kvm_lapic_find_highest_irr(vcpu));
>  			update_cr8_intercept(vcpu);
>  			kvm_lapic_sync_to_vapic(vcpu);
>  		}
> 

I think the above code is still needed. VT-d hardware can issue a
notification event at any time, including before the point where
'KVM_REQ_EVENT' is checked in vcpu_enter_guest(). Consider the
following scenario:

vcpu_run()
{
	......

	for (;;) {
		/* point #1 */
		vcpu_enter_guest();
	}

	/* point #2 */
}

For example, if we receive notification events issued by VT-d hardware at
point #1 and point #2, and then enter vcpu_enter_guest() with 'KVM_REQ_EVENT'
not set, the interrupts cannot be delivered to the guest during _this_ VM-Entry.

The point is that VT-d hardware can issue a notification event at any time,
but it cannot set 'KVM_REQ_EVENT' like software does.

Maybe one thing we can do is to execute the following code unconditionally
only when VT-d PI is enabled:

 +	/*
 +	 * KVM_REQ_EVENT is not set when posted interrupts are set by
 +	 * VT-d hardware, so we have to update RVI unconditionally.
 +	 */
 +	if (kvm_lapic_enabled(vcpu)) {
 +		/*
 +		 * Update architecture specific hints for APIC
 +		 * virtual interrupt delivery.
 +		 */
 +		if (kvm_x86_ops->hwapic_irr_update)
 +			kvm_x86_ops->hwapic_irr_update(vcpu,
 +				kvm_lapic_find_highest_irr(vcpu));
 +	}
 +

And do this inside the KVM_REQ_EVENT check when VT-d PI is not enabled.
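
The placement proposed above can be sketched as a small stand-alone model. It is only an illustration under stated assumptions: struct vcpu_model, its vtd_pi_enabled flag, update_rvi() and enter_guest_model() are hypothetical names invented here, and the model covers only the RVI update, not the rest of vcpu_enter_guest().

```c
#include <stdbool.h>

/* Hypothetical, simplified vCPU state; not the kernel's struct kvm_vcpu. */
struct vcpu_model {
	bool req_event;		/* stands in for KVM_REQ_EVENT */
	bool vtd_pi_enabled;	/* assumed flag: VT-d PI is in use */
	int highest_irr;	/* highest pending vector, -1 if none */
	int rvi;		/* value last handed to the RVI update hook */
};

/* Stand-in for kvm_x86_ops->hwapic_irr_update(). */
static void update_rvi(struct vcpu_model *v)
{
	v->rvi = v->highest_irr;
}

/* Models only the part of vcpu_enter_guest() discussed in this mail. */
static void enter_guest_model(struct vcpu_model *v)
{
	/* With VT-d PI, hardware may have posted without setting the request. */
	if (v->vtd_pi_enabled)
		update_rvi(v);

	if (v->req_event) {
		v->req_event = false;
		/* Without VT-d PI, software always sets the request first. */
		if (!v->vtd_pi_enabled)
			update_rvi(v);
	}
}
```

In this model a vCPU with VT-d PI enabled picks up a newly posted vector even when req_event is clear, while a vCPU without VT-d PI keeps the existing behaviour and pays for the update only when the request is set.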

Thanks,
Feng

> 
> It may be obsolete now that we have the patch from Radim to set
> KVM_REQ_EVENT
> in vmx_sync_pir_to_irr
> (http://permalink.gmane.org/gmane.linux.kernel/2057138).
> 
> Thanks,
> 
> Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer
  2015-09-18 14:29 ` [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer Feng Wu
  2015-09-18 17:19   ` Alex Williamson
  2015-09-21  8:56   ` Wu, Feng
@ 2016-04-26 20:08   ` Alex Williamson
  2016-04-27  1:32     ` Wu, Feng
  2016-04-28 15:35     ` Eric Auger
  2 siblings, 2 replies; 56+ messages in thread
From: Alex Williamson @ 2016-04-26 20:08 UTC (permalink / raw)
  To: Feng Wu; +Cc: pbonzini, joro, mtosatti, eric.auger, kvm, iommu, linux-kernel

On Fri, 18 Sep 2015 22:29:50 +0800
Feng Wu <feng.wu@intel.com> wrote:

> @@ -360,6 +361,14 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>  		return ret;
>  	}
>  
> +	vdev->ctx[vector].producer.token = trigger;
> +	vdev->ctx[vector].producer.irq = irq;
> +	ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
> +	if (unlikely(ret))
> +		dev_info(&pdev->dev,
> +		"irq bypass producer (token %p) registeration fails: %d\n",
> +		vdev->ctx[vector].producer.token, ret);
> +
>  	vdev->ctx[vector].trigger = trigger;
>  
>  	return 0;

Digging back into the IRQ producer/consumer thing, I'm not sure how we
should be handling a failure here, but it turns out that what we have
is pretty sub-optimal.  Any sort of testing on AMD hits this dev_info
because kvm_arch_irq_bypass_add_producer() returns -EINVAL without
kvm_x86_ops->update_pi_irte which is only implemented for vmx.  Clearly
we don't want to spew confusing error messages for a feature that does
not exist.

The easiest option is to simply make this error silent, but should
registering a producer/consumer really fail due to a mismatch on the
other end or should the __connect sequence fail silently, which both
ends would know about (if they care) due to the add/del handshake
between them?  Perhaps for now we simply need a stable suitable fix to
silence the dev_info above, but longer term, registration shouldn't
fail for mismatches like this.  Thoughts?  Thanks,

Alex

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer
  2016-04-26 20:08   ` Alex Williamson
@ 2016-04-27  1:32     ` Wu, Feng
  2016-04-28 16:40       ` Alex Williamson
  2016-04-28 15:35     ` Eric Auger
  1 sibling, 1 reply; 56+ messages in thread
From: Wu, Feng @ 2016-04-27  1:32 UTC (permalink / raw)
  To: Alex Williamson
  Cc: pbonzini, joro, mtosatti, eric.auger, kvm, iommu, linux-kernel, Wu, Feng



> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Wednesday, April 27, 2016 4:08 AM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: pbonzini@redhat.com; joro@8bytes.org; mtosatti@redhat.com;
> eric.auger@linaro.org; kvm@vger.kernel.org; iommu@lists.linux-
> foundation.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer
> 
> On Fri, 18 Sep 2015 22:29:50 +0800
> Feng Wu <feng.wu@intel.com> wrote:
> 
> > @@ -360,6 +361,14 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> >  		return ret;
> >  	}
> >
> > +	vdev->ctx[vector].producer.token = trigger;
> > +	vdev->ctx[vector].producer.irq = irq;
> > +	ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
> > +	if (unlikely(ret))
> > +		dev_info(&pdev->dev,
> > +		"irq bypass producer (token %p) registeration fails: %d\n",
> > +		vdev->ctx[vector].producer.token, ret);
> > +
> >  	vdev->ctx[vector].trigger = trigger;
> >
> >  	return 0;
> 
> Digging back into the IRQ producer/consumer thing, I'm not sure how we
> should be handling a failure here, but it turns out that what we have
> is pretty sub-optimal.  Any sort of testing on AMD hits this dev_info
> because kvm_arch_irq_bypass_add_producer() returns -EINVAL without
> kvm_x86_ops->update_pi_irte which is only implemented for vmx.  Clearly
> we don't want to spew confusing error messages for a feature that does
> not exist.
> 
> The easiest option is to simply make this error silent, but should
> registering a producer/consumer really fail due to a mismatch on the
> other end or should the __connect sequence fail silently, which both
> ends would know about (if they care) due to the add/del handshake
> between them?  Perhaps for now we simply need a stable suitable fix to
> silence the dev_info above, but longer term, registration shouldn't
> fail for mismatches like this.  Thoughts?  Thanks,

Can we just return 0 when kvm_x86_ops->update_pi_irte is NULL in
kvm_arch_irq_bypass_add_producer?
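
A minimal sketch of that idea, as a stand-alone model rather than a patch: struct x86_ops_model, add_producer_model() and the two example hooks are hypothetical names invented here, and the hook's reduced signature is an assumption made for illustration, not the real update_pi_irte prototype.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced ops table; only the one hook matters here. */
struct x86_ops_model {
	int (*update_pi_irte)(unsigned int host_irq, int set);
};

/* A missing hook means "no bypass available", which is not an error. */
static int add_producer_model(const struct x86_ops_model *ops,
			      unsigned int host_irq)
{
	if (!ops->update_pi_irte)
		return 0;	/* nothing to post; the caller stays quiet */

	/* A hook that exists and fails is still reported to the caller. */
	return ops->update_pi_irte(host_irq, 1);
}

/* Example hooks used to exercise the two outcomes. */
static int irte_ok(unsigned int host_irq, int set)
{
	(void)host_irq;
	(void)set;
	return 0;
}

static int irte_fail(unsigned int host_irq, int set)
{
	(void)host_irq;
	(void)set;
	return -EINVAL;
}
```

With a NULL hook the model returns 0, so the dev_info in vfio_msi_set_vector_signal() would not fire on hardware without the feature, while a genuine failure from an implemented hook still propagates.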

Thanks,
Feng

> 
> Alex

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer
  2016-04-26 20:08   ` Alex Williamson
  2016-04-27  1:32     ` Wu, Feng
@ 2016-04-28 15:35     ` Eric Auger
  1 sibling, 0 replies; 56+ messages in thread
From: Eric Auger @ 2016-04-28 15:35 UTC (permalink / raw)
  To: Alex Williamson, Feng Wu
  Cc: pbonzini, joro, mtosatti, kvm, iommu, linux-kernel

Hi Alex,
On 04/26/2016 10:08 PM, Alex Williamson wrote:
> On Fri, 18 Sep 2015 22:29:50 +0800
> Feng Wu <feng.wu@intel.com> wrote:
> 
>> @@ -360,6 +361,14 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>>  		return ret;
>>  	}
>>  
>> +	vdev->ctx[vector].producer.token = trigger;
>> +	vdev->ctx[vector].producer.irq = irq;
>> +	ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
>> +	if (unlikely(ret))
>> +		dev_info(&pdev->dev,
>> +		"irq bypass producer (token %p) registeration fails: %d\n",
>> +		vdev->ctx[vector].producer.token, ret);
>> +
>>  	vdev->ctx[vector].trigger = trigger;
>>  
>>  	return 0;
> 
> Digging back into the IRQ producer/consumer thing, I'm not sure how we
> should be handling a failure here, but it turns out that what we have
> is pretty sub-optimal.  Any sort of testing on AMD hits this dev_info
> because kvm_arch_irq_bypass_add_producer() returns -EINVAL without
> kvm_x86_ops->update_pi_irte which is only implemented for vmx.  Clearly
> we don't want to spew confusing error messages for a feature that does
> not exist.
> 
> The easiest option is to simply make this error silent, but should
> registering a producer/consumer really fail due to a mismatch on the
> other end or should the __connect sequence fail silently, which both
> ends would know about (if they care) due to the add/del handshake
> between them?  Perhaps for now we simply need a stable suitable fix to
> silence the dev_info above, but longer term, registration shouldn't
> fail for mismatches like this.  Thoughts?  Thanks,

Regarding the ARM IRQ forwarding use case, I think it is OK to fail
silently. We would fall back to the standard irqfd mechanism. Anyway,
this series is still waiting for the ARM new-vgic dependency to be
resolved, as discussed with Christoffer and Marc.

Best Regards

Eric
> 
> Alex
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer
  2016-04-27  1:32     ` Wu, Feng
@ 2016-04-28 16:40       ` Alex Williamson
  0 siblings, 0 replies; 56+ messages in thread
From: Alex Williamson @ 2016-04-28 16:40 UTC (permalink / raw)
  To: Wu, Feng; +Cc: pbonzini, joro, mtosatti, eric.auger, kvm, iommu, linux-kernel

On Wed, 27 Apr 2016 01:32:32 +0000
"Wu, Feng" <feng.wu@intel.com> wrote:

> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Wednesday, April 27, 2016 4:08 AM
> > To: Wu, Feng <feng.wu@intel.com>
> > Cc: pbonzini@redhat.com; joro@8bytes.org; mtosatti@redhat.com;
> > eric.auger@linaro.org; kvm@vger.kernel.org; iommu@lists.linux-
> > foundation.org; linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer
> > 
> > On Fri, 18 Sep 2015 22:29:50 +0800
> > Feng Wu <feng.wu@intel.com> wrote:
> > 
> > > @@ -360,6 +361,14 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> > >  		return ret;
> > >  	}
> > >
> > > +	vdev->ctx[vector].producer.token = trigger;
> > > +	vdev->ctx[vector].producer.irq = irq;
> > > +	ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
> > > +	if (unlikely(ret))
> > > +		dev_info(&pdev->dev,
> > > +		"irq bypass producer (token %p) registeration fails: %d\n",
> > > +		vdev->ctx[vector].producer.token, ret);
> > > +
> > >  	vdev->ctx[vector].trigger = trigger;
> > >
> > >  	return 0;  
> > 
> > Digging back into the IRQ producer/consumer thing, I'm not sure how we
> > should be handling a failure here, but it turns out that what we have
> > is pretty sub-optimal.  Any sort of testing on AMD hits this dev_info
> > because kvm_arch_irq_bypass_add_producer() returns -EINVAL without
> > kvm_x86_ops->update_pi_irte which is only implemented for vmx.  Clearly
> > we don't want to spew confusing error messages for a feature that does
> > not exist.
> > 
> > The easiest option is to simply make this error silent, but should
> > registering a producer/consumer really fail due to a mismatch on the
> > other end or should the __connect sequence fail silently, which both
> > ends would know about (if they care) due to the add/del handshake
> > between them?  Perhaps for now we simply need a stable suitable fix to
> > silence the dev_info above, but longer term, registration shouldn't
> > fail for mismatches like this.  Thoughts?  Thanks,  
> 
> Can we just return 0 when kvm_x86_ops->update_pi_irte is NULL in
> kvm_arch_irq_bypass_add_producer?

Yeah, that may be the best way to go, only return error for actual
failures, not for simple lack of a bypass mechanism.  This is
consistent with what update_pi_irte does when running on hardware
or configurations without PI.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2016-04-28 16:40 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-18 14:29 [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Feng Wu
2015-09-18 14:29 ` [PATCH v9 01/18] virt: IRQ bypass manager Feng Wu
2015-09-18 15:34   ` Wu, Feng
2015-09-18 14:29 ` [PATCH v9 02/18] KVM: x86: select IRQ_BYPASS_MANAGER Feng Wu
2015-09-18 14:29 ` [PATCH v9 03/18] KVM: arm/arm64: " Feng Wu
2015-09-21 19:32   ` Eric Auger
2015-09-18 14:29 ` [PATCH v9 04/18] KVM: create kvm_irqfd.h Feng Wu
2015-09-18 15:35   ` Wu, Feng
2015-09-18 14:29 ` [PATCH v9 05/18] KVM: introduce kvm_arch functions for IRQ bypass Feng Wu
2015-09-18 14:29 ` [PATCH v9 06/18] KVM: eventfd: add irq bypass consumer management Feng Wu
2015-09-18 14:29 ` [PATCH v9 07/18] KVM: Extend struct pi_desc for VT-d Posted-Interrupts Feng Wu
2015-09-18 14:29 ` [PATCH v9 08/18] KVM: Add some helper functions for Posted-Interrupts Feng Wu
2015-09-18 14:29 ` [PATCH v9 09/18] KVM: Define a new interface kvm_intr_is_single_vcpu() Feng Wu
2015-09-18 14:29 ` [PATCH v9 10/18] KVM: Make struct kvm_irq_routing_table accessible Feng Wu
2015-09-18 14:29 ` [PATCH v9 11/18] KVM: make kvm_set_msi_irq() public Feng Wu
2015-09-18 14:29 ` [PATCH v9 12/18] vfio: Register/unregister irq_bypass_producer Feng Wu
2015-09-18 17:19   ` Alex Williamson
2015-09-21  8:56   ` Wu, Feng
2015-09-21  9:32     ` Paolo Bonzini
2015-09-21 11:35       ` Wu, Feng
2015-09-21 12:06         ` Paolo Bonzini
2015-09-21 12:08           ` Wu, Feng
2015-09-21 12:53       ` Wu, Feng
2015-09-21 13:02         ` Paolo Bonzini
2015-09-21 19:46           ` Eric Auger
2016-04-26 20:08   ` Alex Williamson
2016-04-27  1:32     ` Wu, Feng
2016-04-28 16:40       ` Alex Williamson
2016-04-28 15:35     ` Eric Auger
2015-09-18 14:29 ` [PATCH v9 13/18] KVM: x86: Update IRTE for posted-interrupts Feng Wu
2015-09-18 14:29 ` [PATCH v9 14/18] KVM: Implement IRQ bypass consumer callbacks for x86 Feng Wu
2015-09-18 14:29 ` [PATCH v9 15/18] KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd' Feng Wu
2015-09-18 14:29 ` [PATCH v9 16/18] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted Feng Wu
2015-09-18 14:29 ` [PATCH v9 17/18] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked Feng Wu
2015-09-18 16:06   ` Paolo Bonzini
2015-09-19  7:11     ` Wu, Feng
2015-09-21  2:16     ` Wu, Feng
2015-09-21  5:32       ` Paolo Bonzini
2015-09-21  5:45         ` Wu, Feng
2015-10-14 23:41   ` David Matlack
2015-10-15  1:33     ` Wu, Feng
2015-10-15 17:39       ` David Matlack
2015-10-15 18:13         ` Paolo Bonzini
2015-10-16  1:45           ` Wu, Feng
2015-09-18 14:29 ` [PATCH v9 18/18] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts Feng Wu
2015-09-21 13:46   ` Joerg Roedel
2015-09-18 14:58 ` [PATCH v9 00/18] Add VT-d Posted-Interrupts support - including prerequisite series Paolo Bonzini
2015-09-18 15:08   ` Wu, Feng
2015-09-18 15:21     ` Paolo Bonzini
2015-09-18 15:38       ` Wu, Feng
2015-09-18 17:57   ` Alex Williamson
2015-09-25  1:49 ` Wu, Feng
2015-09-25 11:14   ` Paolo Bonzini
2015-09-28 10:14     ` Wu, Feng
2015-09-28 10:18       ` Paolo Bonzini
2015-09-28 10:22         ` Wu, Feng
