kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] KVM: Add module for IRQ forwarding
@ 2020-05-11 22:00 Micah Morton
  2020-05-12 17:14 ` Alex Williamson
  0 siblings, 1 reply; 19+ messages in thread
From: Micah Morton @ 2020-05-11 22:00 UTC (permalink / raw)
  To: kvm, pbonzini, alex.williamson, jmattson; +Cc: Micah Morton

Relevant KVM maintainers:
I'm looking for comments on the feasibility of adding a module like
this in KVM for solving the problem described below:


Currently, KVM/VFIO offers no way to forward interrupts into a guest
from devices that are implicitly assigned to the VM by nature of being
downstream from a bus controller PCI device (e.g. I2C controller) that
gets explicitly assigned to the VM. This module allows for forwarding
arbitrary interrupts on the host system into the guest, supporting this
platform-device-behind-PCI-controller scenario.

This code is mostly inspired/taken from the equivalent code in VFIO. It
is not a finished product, but I wanted to check in with the KVM mailing
list in order to assess feasibility before doing any more work on it.

One obvious question would be why not just add this support to VFIO?
See https://www.redhat.com/archives/vfio-users/2019-December/msg00008.html
and the encompassing thread for a discussion as to why this probably
isn't the way to go.

Forwarding arbitrary IRQs to a guest VM does require the VMM to "tell"
the guest about the interrupt (e.g. through ACPI), since such info is
not discoverable by the guest like it is for PCI devices. So separate
logic is needed in the VMM to set this up -- this isn't something done
by the module shared here.

Forwarding platform IRQs can have a big payoff for getting platform
devices to work in a guest, especially when the platform devices sit
behind a PCI bus controller that can be easily passed through to the
guest. On an Intel device I'm using for development, this module allowed
me to get multiple devices (keyboard, touchscreen, touchpad) working in
a VM guest on the device that wouldn't have worked otherwise -- straight
out of the box after passing through the PCI bus controller with
vfio-pci (plus constructing some AML for the guest in the VMM).

NOTE: This code works for forwarding IRQs to a guest (with the VMM
calling the appropriate ioctls with the appropriate args), although it's
missing some code and testing related to shutdown/irq disable/reboot.
Works well enough to demonstrate the feasibility though.

Developed on top of v5.7-rc4.

Signed-off-by: Micah Morton <mortonm@chromium.org>
---
 include/linux/irqfd.h           |  22 +++
 include/linux/miscdevice.h      |   1 +
 include/uapi/linux/irqforward.h |  55 ++++++
 virt/lib/Kconfig                |   3 +
 virt/lib/Makefile               |   1 +
 virt/lib/irqfd.c                | 146 ++++++++++++++++
 virt/lib/irqforward.c           | 289 ++++++++++++++++++++++++++++++++
 7 files changed, 517 insertions(+)
 create mode 100644 include/linux/irqfd.h
 create mode 100644 include/uapi/linux/irqforward.h
 create mode 100644 virt/lib/irqfd.c
 create mode 100644 virt/lib/irqforward.c

diff --git a/include/linux/irqfd.h b/include/linux/irqfd.h
new file mode 100644
index 000000000000..79d2a8c779e1
--- /dev/null
+++ b/include/linux/irqfd.h
@@ -0,0 +1,22 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef IRQFD_H
+#define IRQFD_H
+
+#include <linux/poll.h>
+
+struct irq_forward_irqfd {
+	struct eventfd_ctx	*eventfd;
+	int			(*handler)(void *, void *);
+	void			*data;
+	wait_queue_entry_t	wait;
+	poll_table		pt;
+	struct work_struct	shutdown;
+	struct irq_forward_irqfd		**pirqfd;
+};
+
+int irq_forward_irqfd_enable(int (*handler)(void *, void *), void *data, struct irq_forward_irqfd **pirqfd, int fd);
+#endif /* IRQFD_H */
diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index c7a93002a3c1..f17b37fb8264 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -50,6 +50,7 @@
 #define D7S_MINOR		193
 #define VFIO_MINOR		196
 #define PXA3XX_GCU_MINOR	197
+#define IRQ_FORWARD_MINOR       198
 #define TUN_MINOR		200
 #define CUSE_MINOR		203
 #define MWAVE_MINOR		219	/* ACP/Mwave Modem */
diff --git a/include/uapi/linux/irqforward.h b/include/uapi/linux/irqforward.h
new file mode 100644
index 000000000000..a77aaa4841b5
--- /dev/null
+++ b/include/uapi/linux/irqforward.h
@@ -0,0 +1,55 @@
+/*
+ * API definition for IRQ Forwarding to KVM guests
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef _UAPIIRQFORWARD_H
+#define _UAPIIRQFORWARD_H
+
+#include <linux/ioctl.h>
+
+#define IRQ_FORWARD_API_VERSION	0
+
+#define IRQ_FORWARD_TYPE       (';')
+#define IRQ_FORWARD_BASE       100
+
+struct irq_forward_edge_triggered {
+	struct eventfd_ctx *trigger;
+	uint32_t irq_num;
+	struct list_head list;
+};
+
+struct irq_forward_level_triggered {
+	struct eventfd_ctx *trigger;
+	struct irq_forward_irqfd *unmask;
+	bool is_masked;
+	spinlock_t spinlock;
+	uint32_t irq_num;
+	struct list_head list;
+};
+
+/**
+ *
+ * Set masking and unmasking of interrupts.  Caller provides
+ * struct irq_forward_set with all fields set.
+ *
+ */
+struct irq_forward_set {
+	__u32	argsz;
+	__u32	action_flags;
+#define IRQ_FORWARD_SET_LEVEL_TRIGGER_EVENTFD	(1 << 0)
+#define IRQ_FORWARD_SET_LEVEL_UNMASK_EVENTFD	(1 << 1)
+#define IRQ_FORWARD_SET_EDGE_TRIGGER		(1 << 2)
+	__u32	irq_number_host;
+	__u32	count;
+	__u8	eventfd[];
+};
+
+/* ---- IOCTLs for IRQ Forwarding fd (/dev/irq-forward) ---- */
+#define IRQ_FORWARD_SET _IO(IRQ_FORWARD_TYPE, IRQ_FORWARD_BASE + 0)
+
+/* *********************************************************************** */
+
+#endif /* _UAPIIRQFORWARD_H */
diff --git a/virt/lib/Kconfig b/virt/lib/Kconfig
index 2d9523b7155e..847b06a95c14 100644
--- a/virt/lib/Kconfig
+++ b/virt/lib/Kconfig
@@ -1,3 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 config IRQ_BYPASS_MANAGER
 	tristate
+
+config IRQ_FORWARD
+	tristate "Enable forwarding arbitrary IRQs to guest in KVM"
diff --git a/virt/lib/Makefile b/virt/lib/Makefile
index bd7f9a78bb6b..bd46aad8d426 100644
--- a/virt/lib/Makefile
+++ b/virt/lib/Makefile
@@ -1,2 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-$(CONFIG_IRQ_BYPASS_MANAGER) += irqbypass.o
+obj-$(CONFIG_IRQ_FORWARD) += irqforward.o irqfd.o
diff --git a/virt/lib/irqfd.c b/virt/lib/irqfd.c
new file mode 100644
index 000000000000..4eb47a1c1e6f
--- /dev/null
+++ b/virt/lib/irqfd.c
@@ -0,0 +1,146 @@
+#include <linux/file.h>
+#include <linux/vfio.h>
+#include <linux/eventfd.h>
+#include <linux/slab.h>
+#include <uapi/linux/irqforward.h>
+#include <linux/irqfd.h>
+
+static struct workqueue_struct *irqfd_cleanup_wq;
+static DEFINE_SPINLOCK(irqfd_lock);
+
+static void irqfd_deactivate(struct irq_forward_irqfd *irqfd)
+{
+        queue_work(irqfd_cleanup_wq, &irqfd->shutdown);
+}
+
+static int irqfd_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void *key)
+{
+        struct irq_forward_irqfd *irqfd = container_of(wait, struct irq_forward_irqfd, wait);
+        __poll_t flags = key_to_poll(key);
+
+        if (flags & EPOLLIN) {
+                /* An event has been signaled, call function */
+                if (!irqfd->handler ||
+                     irqfd->handler(NULL, irqfd->data))
+                        printk(KERN_EMERG "handler failed\n");
+        }
+
+        if (flags & EPOLLHUP) {
+                unsigned long flags;
+                spin_lock_irqsave(&irqfd_lock, flags);
+
+                /*
+                 * The eventfd is closing, if the irqfd has not yet been
+                 * queued for release, as determined by testing whether the
+                 * irqfd pointer to it is still valid, queue it now.  As
+                 * with kvm irqfds, we know we won't race against the irqfd
+                 * going away because we hold the lock to get here.
+                 */
+                if (*(irqfd->pirqfd) == irqfd) {
+                        *(irqfd->pirqfd) = NULL;
+                        irqfd_deactivate(irqfd);
+                }
+
+                spin_unlock_irqrestore(&irqfd_lock, flags);
+        }
+
+        return 0;
+}
+
+
+static void irqfd_ptable_queue_proc(struct file *file,
+                                     wait_queue_head_t *wqh, poll_table *pt)
+{
+        struct irq_forward_irqfd *irqfd = container_of(pt, struct irq_forward_irqfd, pt);
+        add_wait_queue(wqh, &irqfd->wait);
+}
+
+static void irqfd_shutdown(struct work_struct *work)
+{
+        struct irq_forward_irqfd *irqfd = container_of(work, struct irq_forward_irqfd, shutdown);
+        u64 cnt;
+
+        eventfd_ctx_remove_wait_queue(irqfd->eventfd, &irqfd->wait, &cnt);
+        eventfd_ctx_put(irqfd->eventfd);
+
+        kfree(irqfd);
+}
+
+int irq_forward_irqfd_enable(int (*handler)(void *, void *), void *data, struct irq_forward_irqfd **pirqfd, int fd)
+{
+        struct fd irqfd;
+        struct eventfd_ctx *ctx;
+        struct irq_forward_irqfd *irqfd_struct;
+        int ret = 0;
+        unsigned int events;
+
+        irqfd_struct = kzalloc(sizeof(*irqfd_struct), GFP_KERNEL);
+        if (!irqfd_struct)
+                return -ENOMEM;
+
+        irqfd_struct->pirqfd = pirqfd;
+        irqfd_struct->handler = handler;
+        irqfd_struct->data = data;
+
+        // shutdown causes crash
+        INIT_WORK(&irqfd_struct->shutdown, irqfd_shutdown);
+
+        irqfd = fdget(fd);
+        if (!irqfd.file) {
+                ret = -EBADF;
+                goto err_fd;
+        }
+
+        ctx = eventfd_ctx_fileget(irqfd.file);
+        if (IS_ERR(ctx)) {
+                ret = PTR_ERR(ctx);
+                goto err_ctx;
+        }
+
+        irqfd_struct->eventfd = ctx;
+
+         // irqfds can be released by closing the eventfd or directly
+         // through ioctl.  These are both done through a workqueue, so
+         // we update the pointer to the irqfd under lock to avoid
+         // pushing multiple jobs to release the same irqfd.
+        spin_lock_irq(&irqfd_lock);
+
+        if (*pirqfd) {
+                printk(KERN_EMERG "pirqfd should be NULL. BUG!\n");
+                spin_unlock_irq(&irqfd_lock);
+                ret = -EBUSY;
+                goto err_busy;
+        }
+        *pirqfd = irqfd_struct;
+
+        spin_unlock_irq(&irqfd_lock);
+
+         // Install our own custom wake-up handling so we are notified via
+         // a callback whenever someone signals the underlying eventfd.
+        init_waitqueue_func_entry(&irqfd_struct->wait, irqfd_wakeup);
+        init_poll_funcptr(&irqfd_struct->pt, irqfd_ptable_queue_proc);
+
+        events = irqfd.file->f_op->poll(irqfd.file, &irqfd_struct->pt);
+
+         // Check if there was an event already pending on the eventfd
+         // before we registered and trigger it as if we didn't miss it.
+        if (events & POLLIN) {
+                if (!handler || handler(NULL, data))
+                        printk(KERN_EMERG "handler failed\n");
+        }
+
+         // Do not drop the file until the irqfd is fully initialized,
+         // otherwise we might race against the POLLHUP.
+        fdput(irqfd);
+
+        return 0;
+err_busy:
+        eventfd_ctx_put(ctx);
+err_ctx:
+        fdput(irqfd);
+err_fd:
+        kfree(irqfd_struct);
+
+        return ret;
+}
+EXPORT_SYMBOL_GPL(irq_forward_irqfd_enable);
diff --git a/virt/lib/irqforward.c b/virt/lib/irqforward.c
new file mode 100644
index 000000000000..1d5030d347aa
--- /dev/null
+++ b/virt/lib/irqforward.c
@@ -0,0 +1,289 @@
+#include <linux/cdev.h>
+#include <linux/compat.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/sched.h>
+//#include <linux/vfio.h>
+#include <linux/eventfd.h>
+#include <linux/delay.h>
+#include <uapi/linux/irqforward.h>
+#include <linux/irqfd.h>
+
+#define VERSION	"0.1"
+#define AUTHOR	"Micah Morton <mortonm@chromium.org>"
+#define DESC	"IRQ Forwarding"
+
+MODULE_VERSION(VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(AUTHOR);
+MODULE_DESCRIPTION(DESC);
+MODULE_ALIAS_MISCDEV(IRQ_FORWARD_MINOR);
+MODULE_ALIAS("devname:irq-forward");
+
+static LIST_HEAD(level_triggered_irqs);
+static LIST_HEAD(edge_triggered_irqs);
+
+
+static int irq_forward_unmask_handler_level(void *opaque, void *level)
+{
+	unsigned long flags;
+	struct irq_forward_level_triggered *l = (struct irq_forward_level_triggered *) level;
+
+	spin_lock_irqsave(&(l->spinlock), flags);
+	if (l->is_masked) {
+		enable_irq(l->irq_num);
+		l->is_masked = false;
+	}
+	spin_unlock_irqrestore(&(l->spinlock), flags);
+	return 0;
+}
+
+
+static irqreturn_t irq_forward_handler_level(int irq, void *level)
+{
+	unsigned long flags;
+	int ret = IRQ_NONE;
+	struct irq_forward_level_triggered *l = (struct irq_forward_level_triggered *) level;
+	spin_lock_irqsave(&(l->spinlock), flags);
+
+	disable_irq_nosync(irq);
+	l->is_masked = true;
+	ret = IRQ_HANDLED;
+
+	spin_unlock_irqrestore(&(l->spinlock), flags);
+
+	if (ret == IRQ_HANDLED)
+	        eventfd_signal(l->trigger, 1);
+
+	return ret;
+}
+
+static irqreturn_t irq_forward_handler_edge(int irq, void *edge)
+{
+        eventfd_signal(((struct irq_forward_edge_triggered*)edge)->trigger, 1);
+
+        return IRQ_HANDLED;
+}
+
+static int irq_forward_set_level_trigger(void *data, uint32_t irq_number_host, struct irq_forward_level_triggered *level)
+{
+	int32_t fd;
+	struct eventfd_ctx *trigger;
+	int ret;
+
+	fd = *(int32_t *)data;
+
+	if (fd < 0) /* Disable only */
+	        return 0;
+
+	trigger = eventfd_ctx_fdget(fd);
+	if (IS_ERR(trigger)) {
+	        return PTR_ERR(trigger);
+	}
+
+	level->trigger = trigger;
+	spin_lock_init(&(level->spinlock));
+
+	ret = request_irq(irq_number_host, irq_forward_handler_level, 0, "level-triggered-irq", level);
+	if (ret) {
+	        level->trigger = NULL;
+	        eventfd_ctx_put(trigger);
+	        return ret;
+	}
+
+	return 0;
+
+}
+
+static int irq_forward_set_level_unmask(void *data, struct irq_forward_level_triggered *level)
+{
+	int32_t fd;
+	fd = *(int32_t *)data;
+
+	if (fd >= 0)
+	        return irq_forward_irqfd_enable(irq_forward_unmask_handler_level, level, &(level->unmask), fd);
+	return -1;
+}
+
+static int irq_forward_set_edge_trigger(void *data, uint32_t irq_number_host, struct irq_forward_edge_triggered *edge)
+{
+	struct eventfd_ctx *trigger;
+    int ret;
+	int32_t fd;
+	fd = *(int32_t *)data;
+
+	if (fd < 0) /* Disable only */
+                return 0;
+
+        trigger = eventfd_ctx_fdget(fd);
+        if (IS_ERR(trigger)) {
+                return PTR_ERR(trigger);
+        }
+
+        edge->trigger = trigger;
+
+        ret = request_irq(irq_number_host, irq_forward_handler_edge, IRQF_SHARED, "edge-triggered-irq", edge);
+        if (ret) {
+                edge->trigger = NULL;
+                eventfd_ctx_put(trigger);
+                return ret;
+        }
+
+        return 0;
+}
+
+
+int set_irqs_ioctl_level_trigger(uint32_t irq_number_host, void *data) {
+
+        struct irq_forward_level_triggered *level_irq = kzalloc(sizeof(struct irq_forward_level_triggered), GFP_KERNEL);
+        if (!level_irq)
+                return -ENOMEM;
+        level_irq->trigger = NULL;
+        level_irq->irq_num = irq_number_host;
+        level_irq->unmask = NULL;
+        level_irq->is_masked = true;
+        list_add(&(level_irq->list), &level_triggered_irqs);
+
+        return irq_forward_set_level_trigger(data, irq_number_host, level_irq);
+}
+
+int set_irqs_ioctl_level_unmask(uint32_t irq_number_host, void *data) {
+
+        struct list_head* position = NULL;
+        struct irq_forward_level_triggered *level_irq = NULL;
+        // We must already have a trigger for the IRQ before we add an unmask
+        list_for_each(position, &level_triggered_irqs) {
+                level_irq = list_entry(position, struct irq_forward_level_triggered, list);
+                if (level_irq->irq_num == irq_number_host)
+                        return irq_forward_set_level_unmask(data, level_irq);
+        }
+
+        return -1;
+}
+
+int set_irqs_ioctl_edge_trigger(uint32_t irq_number_host, void *data) {
+
+        struct irq_forward_edge_triggered *edge_irq = kzalloc(sizeof(struct irq_forward_edge_triggered), GFP_KERNEL);
+        if (!edge_irq)
+                return -ENOMEM;
+        edge_irq->trigger = NULL;
+        edge_irq->irq_num = irq_number_host;
+        list_add(&(edge_irq->list), &edge_triggered_irqs);
+
+        return irq_forward_set_edge_trigger(data, irq_number_host, edge_irq);
+}
+
+int irq_forward_ioctl(void *device_data, unsigned long arg)
+{
+	u8 *data = NULL;
+	unsigned long minsz;
+	struct irq_forward_set hdr;
+
+
+	minsz = offsetofend(struct irq_forward_set, count);
+
+	if (copy_from_user(&hdr, (void __user *)arg, minsz))
+                return -EFAULT;
+
+	data = memdup_user((void __user *)(arg + minsz), sizeof(int32_t));
+	if (IS_ERR(data))
+	        return PTR_ERR(data);
+
+    switch (hdr.action_flags)
+    {
+        case IRQ_FORWARD_SET_LEVEL_TRIGGER_EVENTFD:
+            return set_irqs_ioctl_level_trigger(hdr.irq_number_host, data);
+        case IRQ_FORWARD_SET_LEVEL_UNMASK_EVENTFD:
+            return set_irqs_ioctl_level_unmask(hdr.irq_number_host, data);
+        case IRQ_FORWARD_SET_EDGE_TRIGGER:
+            return set_irqs_ioctl_edge_trigger(hdr.irq_number_host, data);
+        default:
+            return -EINVAL;
+    }
+
+	kfree(data);
+	return 0;
+}
+
+/**
+ * IRQ Forwarding fd, /dev/irq-forward
+ */
+static long irq_forward_fops_unl_ioctl(struct file *filep,
+				unsigned int cmd, unsigned long arg)
+{
+	long ret = -EINVAL;
+
+	switch (cmd) {
+	case IRQ_FORWARD_SET:
+		ret = (long) irq_forward_ioctl(filep, arg);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+#ifdef CONFIG_COMPAT
+static long irq_forward_fops_compat_ioctl(struct file *filep,
+				   unsigned int cmd, unsigned long arg)
+{
+	arg = (unsigned long)compat_ptr(arg);
+	return irq_forward_fops_unl_ioctl(filep, cmd, arg);
+}
+#endif	/* CONFIG_COMPAT */
+
+static int irq_forward_fops_open(struct inode *inode, struct file *filep)
+{
+	return 0;
+}
+
+static int irq_forward_fops_release(struct inode *inode, struct file *filep)
+{
+	return 0;
+}
+
+static const struct file_operations irq_forward_fops = {
+	.owner		= THIS_MODULE,
+	.open		= irq_forward_fops_open,
+	.release	= irq_forward_fops_release,
+	.unlocked_ioctl	= irq_forward_fops_unl_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl	= irq_forward_fops_compat_ioctl,
+#endif
+};
+
+static struct miscdevice irq_forward_dev = {
+	.minor = IRQ_FORWARD_MINOR,
+	.name = "irq-forward",
+	.fops = &irq_forward_fops,
+	.nodename = "irq-forward",
+	.mode = S_IRUGO | S_IWUGO,
+};
+
+static int __init irq_forward_init(void)
+{
+	int ret;
+
+	ret = misc_register(&irq_forward_dev);
+	if (ret) {
+		pr_err("irq-forward: misc device register failed\n");
+		return ret;
+	}
+
+	pr_info(DESC " version: " VERSION "\n");
+
+	return 0;
+}
+
+// TODO: cleanup/free/disconnect stuff
+static void __exit irq_forward_cleanup(void)
+{
+	misc_deregister(&irq_forward_dev);
+}
+
+module_init(irq_forward_init);
+module_exit(irq_forward_cleanup);
-- 
2.26.2.645.ge9eca65c58-goog


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH] KVM: Add module for IRQ forwarding
  2020-05-11 22:00 [RFC PATCH] KVM: Add module for IRQ forwarding Micah Morton
@ 2020-05-12 17:14 ` Alex Williamson
  2020-05-13  7:02   ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: Alex Williamson @ 2020-05-12 17:14 UTC (permalink / raw)
  To: Micah Morton; +Cc: kvm, pbonzini, jmattson

On Mon, 11 May 2020 15:00:46 -0700
Micah Morton <mortonm@chromium.org> wrote:

> Relevant KVM maintainers:
> I'm looking for comments on the feasibility of adding a module like
> this in KVM for solving the problem described below:
> 
> 
> Currently, KVM/VFIO offers no way to forward interrupts into a guest
> from devices that are implicitly assigned to the VM by nature of being
> downstream from a bus controller PCI device (e.g. I2C controller) that
> gets explicitly assigned to the VM. This module allows for forwarding
> arbitrary interrupts on the host system into the guest, supporting this
> platform-device-behind-PCI-controller scenario.
> 
> This code is mostly inspired/taken from the equivalent code in VFIO. It
> is not a finished product, but I wanted to check in with the KVM mailing
> list in order to assess feasibility before doing any more work on it.
> 
> One obvious question would be why not just add this support to VFIO?
> See https://www.redhat.com/archives/vfio-users/2019-December/msg00008.html
> and the encompassing thread for a discussion as to why this probably
> isn't the way to go.
> 
> Forwarding arbitrary IRQs to a guest VM does require the VMM to "tell"
> the guest about the interrupt (e.g. through ACPI), since such info is
> not discoverable by the guest like it is for PCI devices. So separate
> logic is needed in the VMM to set this up -- this isn't something done
> by the module shared here.
> 
> Forwarding platform IRQs can have a big payoff for getting platform
> devices to work in a guest, especially when the platform devices sit
> behind a PCI bus controller that can be easily passed through to the
> guest. On an Intel device I'm using for development, this module allowed
> me to get multiple devices (keyboard, touchscreen, touchpad) working in
> a VM guest on the device that wouldn't have worked otherwise -- straight
> out of the box after passing through the PCI bus controller with
> vfio-pci (plus constructing some AML for the guest in the VMM).

But why not assign the individual platform devices via vfio-platform
rather than assign the i2c controller via vfio-pci and then assembling
the interrupts from those sub-devices with this ad-hoc interface?  An
emulated i2c controller in the guest could provide the same discovery
mechanism as is available in the host.

A few issues I see here:

 - Please don't squat on vfio ioctls (ie. IRQ_FORWARD_BASE/TYPE)

 - This provides only a single /dev/irq-forward device file with default
   user/group/other r/w permissions.  It's not secured by default and
   there's no way to restrict a given IRQ to a given user.  Access to
   this file allows the user to be signaled for any host IRQ the user
   chooses, and in the event of level IRQs, register an exclusive
   interrupt handler, blocking host drivers from legitimately
   registering shared IRQ handlers.  This is fatally poor security, IMO.

 - There's no lock protection for concurrent access to level and edge
   lists.

If we had an interface where arbitrary interrupts are exposed to
userspace, I think at a minimum we'd need a device file per IRQ so that
we can give access to a specific IRQ to a specific user.  However that
puts quite a burden on VM orchestration tools to be able to determine
this hidden IRQ dependency.  Thanks,

Alex

> NOTE: This code works for forwarding IRQs to a guest (with the VMM
> calling the appropriate ioctls with the appropriate args), although it's
> missing some code and testing related to shutdown/irq disable/reboot.
> Works well enough to demonstrate the feasibility though.
> 
> Developed on top of v5.7-rc4.
> 
> Signed-off-by: Micah Morton <mortonm@chromium.org>
> ---
>  include/linux/irqfd.h           |  22 +++
>  include/linux/miscdevice.h      |   1 +
>  include/uapi/linux/irqforward.h |  55 ++++++
>  virt/lib/Kconfig                |   3 +
>  virt/lib/Makefile               |   1 +
>  virt/lib/irqfd.c                | 146 ++++++++++++++++
>  virt/lib/irqforward.c           | 289 ++++++++++++++++++++++++++++++++
>  7 files changed, 517 insertions(+)
>  create mode 100644 include/linux/irqfd.h
>  create mode 100644 include/uapi/linux/irqforward.h
>  create mode 100644 virt/lib/irqfd.c
>  create mode 100644 virt/lib/irqforward.c
> 
> diff --git a/include/linux/irqfd.h b/include/linux/irqfd.h
> new file mode 100644
> index 000000000000..79d2a8c779e1
> --- /dev/null
> +++ b/include/linux/irqfd.h
> @@ -0,0 +1,22 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +#ifndef IRQFD_H
> +#define IRQFD_H
> +
> +#include <linux/poll.h>
> +
> +struct irq_forward_irqfd {
> +	struct eventfd_ctx	*eventfd;
> +	int			(*handler)(void *, void *);
> +	void			*data;
> +	wait_queue_entry_t	wait;
> +	poll_table		pt;
> +	struct work_struct	shutdown;
> +	struct irq_forward_irqfd		**pirqfd;
> +};
> +
> +int irq_forward_irqfd_enable(int (*handler)(void *, void *), void *data, struct irq_forward_irqfd **pirqfd, int fd);
> +#endif /* IRQFD_H */
> diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
> index c7a93002a3c1..f17b37fb8264 100644
> --- a/include/linux/miscdevice.h
> +++ b/include/linux/miscdevice.h
> @@ -50,6 +50,7 @@
>  #define D7S_MINOR		193
>  #define VFIO_MINOR		196
>  #define PXA3XX_GCU_MINOR	197
> +#define IRQ_FORWARD_MINOR       198
>  #define TUN_MINOR		200
>  #define CUSE_MINOR		203
>  #define MWAVE_MINOR		219	/* ACP/Mwave Modem */
> diff --git a/include/uapi/linux/irqforward.h b/include/uapi/linux/irqforward.h
> new file mode 100644
> index 000000000000..a77aaa4841b5
> --- /dev/null
> +++ b/include/uapi/linux/irqforward.h
> @@ -0,0 +1,55 @@
> +/*
> + * API definition for IRQ Forwarding to KVM guests
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +#ifndef _UAPIIRQFORWARD_H
> +#define _UAPIIRQFORWARD_H
> +
> +#include <linux/ioctl.h>
> +
> +#define IRQ_FORWARD_API_VERSION	0
> +
> +#define IRQ_FORWARD_TYPE       (';')
> +#define IRQ_FORWARD_BASE       100
> +
> +struct irq_forward_edge_triggered {
> +	struct eventfd_ctx *trigger;
> +	uint32_t irq_num;
> +	struct list_head list;
> +};
> +
> +struct irq_forward_level_triggered {
> +	struct eventfd_ctx *trigger;
> +	struct irq_forward_irqfd *unmask;
> +	bool is_masked;
> +	spinlock_t spinlock;
> +	uint32_t irq_num;
> +	struct list_head list;
> +};
> +
> +/**
> + *
> + * Set masking and unmasking of interrupts.  Caller provides
> + * struct irq_forward_set with all fields set.
> + *
> + */
> +struct irq_forward_set {
> +	__u32	argsz;
> +	__u32	action_flags;
> +#define IRQ_FORWARD_SET_LEVEL_TRIGGER_EVENTFD	(1 << 0)
> +#define IRQ_FORWARD_SET_LEVEL_UNMASK_EVENTFD	(1 << 1)
> +#define IRQ_FORWARD_SET_EDGE_TRIGGER		(1 << 2)
> +	__u32	irq_number_host;
> +	__u32	count;
> +	__u8	eventfd[];
> +};
> +
> +/* ---- IOCTLs for IRQ Forwarding fd (/dev/irq-forward) ---- */
> +#define IRQ_FORWARD_SET _IO(IRQ_FORWARD_TYPE, IRQ_FORWARD_BASE + 0)
> +
> +/* *********************************************************************** */
> +
> +#endif /* _UAPIIRQFORWARD_H */
> diff --git a/virt/lib/Kconfig b/virt/lib/Kconfig
> index 2d9523b7155e..847b06a95c14 100644
> --- a/virt/lib/Kconfig
> +++ b/virt/lib/Kconfig
> @@ -1,3 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  config IRQ_BYPASS_MANAGER
>  	tristate
> +
> +config IRQ_FORWARD
> +	tristate "Enable forwarding arbitrary IRQs to guest in KVM"
> diff --git a/virt/lib/Makefile b/virt/lib/Makefile
> index bd7f9a78bb6b..bd46aad8d426 100644
> --- a/virt/lib/Makefile
> +++ b/virt/lib/Makefile
> @@ -1,2 +1,3 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  obj-$(CONFIG_IRQ_BYPASS_MANAGER) += irqbypass.o
> +obj-$(CONFIG_IRQ_FORWARD) += irqforward.o irqfd.o
> diff --git a/virt/lib/irqfd.c b/virt/lib/irqfd.c
> new file mode 100644
> index 000000000000..4eb47a1c1e6f
> --- /dev/null
> +++ b/virt/lib/irqfd.c
> @@ -0,0 +1,146 @@
> +#include <linux/file.h>
> +#include <linux/vfio.h>
> +#include <linux/eventfd.h>
> +#include <linux/slab.h>
> +#include <uapi/linux/irqforward.h>
> +#include <linux/irqfd.h>
> +
> +static struct workqueue_struct *irqfd_cleanup_wq;
> +static DEFINE_SPINLOCK(irqfd_lock);
> +
> +static void irqfd_deactivate(struct irq_forward_irqfd *irqfd)
> +{
> +        queue_work(irqfd_cleanup_wq, &irqfd->shutdown);
> +}
> +
> +static int irqfd_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void *key)
> +{
> +        struct irq_forward_irqfd *irqfd = container_of(wait, struct irq_forward_irqfd, wait);
> +        __poll_t flags = key_to_poll(key);
> +
> +        if (flags & EPOLLIN) {
> +                /* An event has been signaled, call function */
> +                if (!irqfd->handler ||
> +                     irqfd->handler(NULL, irqfd->data))
> +                        printk(KERN_EMERG "handler failed\n");
> +        }
> +
> +        if (flags & EPOLLHUP) {
> +                unsigned long flags;
> +                spin_lock_irqsave(&irqfd_lock, flags);
> +
> +                /*
> +                 * The eventfd is closing, if the irqfd has not yet been
> +                 * queued for release, as determined by testing whether the
> +                 * irqfd pointer to it is still valid, queue it now.  As
> +                 * with kvm irqfds, we know we won't race against the irqfd
> +                 * going away because we hold the lock to get here.
> +                 */
> +                if (*(irqfd->pirqfd) == irqfd) {
> +                        *(irqfd->pirqfd) = NULL;
> +                        irqfd_deactivate(irqfd);
> +                }
> +
> +                spin_unlock_irqrestore(&irqfd_lock, flags);
> +        }
> +
> +        return 0;
> +}
> +
> +
> +static void irqfd_ptable_queue_proc(struct file *file,
> +                                     wait_queue_head_t *wqh, poll_table *pt)
> +{
> +        struct irq_forward_irqfd *irqfd = container_of(pt, struct irq_forward_irqfd, pt);
> +        add_wait_queue(wqh, &irqfd->wait);
> +}
> +
> +static void irqfd_shutdown(struct work_struct *work)
> +{
> +        struct irq_forward_irqfd *irqfd = container_of(work, struct irq_forward_irqfd, shutdown);
> +        u64 cnt;
> +
> +        eventfd_ctx_remove_wait_queue(irqfd->eventfd, &irqfd->wait, &cnt);
> +        eventfd_ctx_put(irqfd->eventfd);
> +
> +        kfree(irqfd);
> +}
> +
> +int irq_forward_irqfd_enable(int (*handler)(void *, void *), void *data, struct irq_forward_irqfd **pirqfd, int fd)
> +{
> +        struct fd irqfd;
> +        struct eventfd_ctx *ctx;
> +        struct irq_forward_irqfd *irqfd_struct;
> +        int ret = 0;
> +        unsigned int events;
> +
> +        irqfd_struct = kzalloc(sizeof(*irqfd_struct), GFP_KERNEL);
> +        if (!irqfd_struct)
> +                return -ENOMEM;
> +
> +        irqfd_struct->pirqfd = pirqfd;
> +        irqfd_struct->handler = handler;
> +        irqfd_struct->data = data;
> +
> +        // shutdown causes crash
> +        INIT_WORK(&irqfd_struct->shutdown, irqfd_shutdown);
> +
> +        irqfd = fdget(fd);
> +        if (!irqfd.file) {
> +                ret = -EBADF;
> +                goto err_fd;
> +        }
> +
> +        ctx = eventfd_ctx_fileget(irqfd.file);
> +        if (IS_ERR(ctx)) {
> +                ret = PTR_ERR(ctx);
> +                goto err_ctx;
> +        }
> +
> +        irqfd_struct->eventfd = ctx;
> +
> +         // irqfds can be released by closing the eventfd or directly
> +         // through ioctl.  These are both done through a workqueue, so
> +         // we update the pointer to the irqfd under lock to avoid
> +         // pushing multiple jobs to release the same irqfd.
> +        spin_lock_irq(&irqfd_lock);
> +
> +        if (*pirqfd) {
> +                printk(KERN_EMERG "pirqfd should be NULL. BUG!\n");
> +                spin_unlock_irq(&irqfd_lock);
> +                ret = -EBUSY;
> +                goto err_busy;
> +        }
> +        *pirqfd = irqfd_struct;
> +
> +        spin_unlock_irq(&irqfd_lock);
> +
> +         // Install our own custom wake-up handling so we are notified via
> +         // a callback whenever someone signals the underlying eventfd.
> +        init_waitqueue_func_entry(&irqfd_struct->wait, irqfd_wakeup);
> +        init_poll_funcptr(&irqfd_struct->pt, irqfd_ptable_queue_proc);
> +
> +        events = irqfd.file->f_op->poll(irqfd.file, &irqfd_struct->pt);
> +
> +         // Check if there was an event already pending on the eventfd
> +         // before we registered and trigger it as if we didn't miss it.
> +        if (events & POLLIN) {
> +                if (!handler || handler(NULL, data))
> +                        printk(KERN_EMERG "handler failed\n");
> +        }
> +
> +         // Do not drop the file until the irqfd is fully initialized,
> +         // otherwise we might race against the POLLHUP.
> +        fdput(irqfd);
> +
> +        return 0;
> +err_busy:
> +        eventfd_ctx_put(ctx);
> +err_ctx:
> +        fdput(irqfd);
> +err_fd:
> +        kfree(irqfd_struct);
> +
> +        return ret;
> +}
> +EXPORT_SYMBOL_GPL(irq_forward_irqfd_enable);
> diff --git a/virt/lib/irqforward.c b/virt/lib/irqforward.c
> new file mode 100644
> index 000000000000..1d5030d347aa
> --- /dev/null
> +++ b/virt/lib/irqforward.c
> @@ -0,0 +1,289 @@
> +#include <linux/cdev.h>
> +#include <linux/compat.h>
> +#include <linux/file.h>
> +#include <linux/fs.h>
> +#include <linux/miscdevice.h>
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/sched.h>
> +//#include <linux/vfio.h>
> +#include <linux/eventfd.h>
> +#include <linux/delay.h>
> +#include <uapi/linux/irqforward.h>
> +#include <linux/irqfd.h>
> +
> +#define VERSION	"0.1"
> +#define AUTHOR	"Micah Morton <mortonm@chromium.org>"
> +#define DESC	"IRQ Forwarding"
> +
> +MODULE_VERSION(VERSION);
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR(AUTHOR);
> +MODULE_DESCRIPTION(DESC);
> +MODULE_ALIAS_MISCDEV(IRQ_FORWARD_MINOR);
> +MODULE_ALIAS("devname:irq-forward");
> +
> +static LIST_HEAD(level_triggered_irqs);
> +static LIST_HEAD(edge_triggered_irqs);
> +
> +
> +static int irq_forward_unmask_handler_level(void *opaque, void *level)
> +{
> +	unsigned long flags;
> +	struct irq_forward_level_triggered *l = (struct irq_forward_level_triggered *) level;
> +
> +	spin_lock_irqsave(&(l->spinlock), flags);
> +	if (l->is_masked) {
> +		enable_irq(l->irq_num);
> +		l->is_masked = false;
> +	}
> +	spin_unlock_irqrestore(&(l->spinlock), flags);
> +	return 0;
> +}
> +
> +
> +static irqreturn_t irq_forward_handler_level(int irq, void *level)
> +{
> +	unsigned long flags;
> +	int ret = IRQ_NONE;
> +	struct irq_forward_level_triggered *l = (struct irq_forward_level_triggered *) level;
> +	spin_lock_irqsave(&(l->spinlock), flags);
> +
> +	disable_irq_nosync(irq);
> +	l->is_masked = true;
> +	ret = IRQ_HANDLED;
> +
> +	spin_unlock_irqrestore(&(l->spinlock), flags);
> +
> +	if (ret == IRQ_HANDLED)
> +	        eventfd_signal(l->trigger, 1);
> +
> +	return ret;
> +}
> +
> +static irqreturn_t irq_forward_handler_edge(int irq, void *edge)
> +{
> +        eventfd_signal(((struct irq_forward_edge_triggered*)edge)->trigger, 1);
> +
> +        return IRQ_HANDLED;
> +}
> +
> +static int irq_forward_set_level_trigger(void *data, uint32_t irq_number_host, struct irq_forward_level_triggered *level)
> +{
> +	int32_t fd;
> +	struct eventfd_ctx *trigger;
> +	int ret;
> +
> +	fd = *(int32_t *)data;
> +
> +	if (fd < 0) /* Disable only */
> +	        return 0;
> +
> +	trigger = eventfd_ctx_fdget(fd);
> +	if (IS_ERR(trigger)) {
> +	        return PTR_ERR(trigger);
> +	}
> +
> +	level->trigger = trigger;
> +	spin_lock_init(&(level->spinlock));
> +
> +	ret = request_irq(irq_number_host, irq_forward_handler_level, 0, "level-triggered-irq", level);
> +	if (ret) {
> +	        level->trigger = NULL;
> +	        eventfd_ctx_put(trigger);
> +	        return ret;
> +	}
> +
> +	return 0;
> +
> +}
> +
> +static int irq_forward_set_level_unmask(void *data, struct irq_forward_level_triggered *level)
> +{
> +	int32_t fd;
> +	fd = *(int32_t *)data;
> +
> +	if (fd >= 0)
> +	        return irq_forward_irqfd_enable(irq_forward_unmask_handler_level, level, &(level->unmask), fd);
> +	return -1;
> +}
> +
> +static int irq_forward_set_edge_trigger(void *data, uint32_t irq_number_host, struct irq_forward_edge_triggered *edge)
> +{
> +	struct eventfd_ctx *trigger;
> +    int ret;
> +	int32_t fd;
> +	fd = *(int32_t *)data;
> +
> +	if (fd < 0) /* Disable only */
> +                return 0;
> +
> +        trigger = eventfd_ctx_fdget(fd);
> +        if (IS_ERR(trigger)) {
> +                return PTR_ERR(trigger);
> +        }
> +
> +        edge->trigger = trigger;
> +
> +        ret = request_irq(irq_number_host, irq_forward_handler_edge, IRQF_SHARED, "edge-triggered-irq", edge);
> +        if (ret) {
> +                edge->trigger = NULL;
> +                eventfd_ctx_put(trigger);
> +                return ret;
> +        }
> +
> +        return 0;
> +}
> +
> +
> +int set_irqs_ioctl_level_trigger(uint32_t irq_number_host, void *data) {
> +
> +        struct irq_forward_level_triggered *level_irq = kzalloc(sizeof(struct irq_forward_level_triggered), GFP_KERNEL);
> +        if (!level_irq)
> +                return -ENOMEM;
> +        level_irq->trigger = NULL;
> +        level_irq->irq_num = irq_number_host;
> +        level_irq->unmask = NULL;
> +        level_irq->is_masked = true;
> +        list_add(&(level_irq->list), &level_triggered_irqs);
> +
> +        return irq_forward_set_level_trigger(data, irq_number_host, level_irq);
> +}
> +
> +int set_irqs_ioctl_level_unmask(uint32_t irq_number_host, void *data) {
> +
> +        struct list_head* position = NULL;
> +        struct irq_forward_level_triggered *level_irq = NULL;
> +        // We must already have a trigger for the IRQ before we add an unmask
> +        list_for_each(position, &level_triggered_irqs) {
> +                level_irq = list_entry(position, struct irq_forward_level_triggered, list);
> +                if (level_irq->irq_num == irq_number_host)
> +                        return irq_forward_set_level_unmask(data, level_irq);
> +        }
> +
> +        return -1;
> +}
> +
> +int set_irqs_ioctl_edge_trigger(uint32_t irq_number_host, void *data) {
> +
> +        struct irq_forward_edge_triggered *edge_irq = kzalloc(sizeof(struct irq_forward_edge_triggered), GFP_KERNEL);
> +        if (!edge_irq)
> +                return -ENOMEM;
> +        edge_irq->trigger = NULL;
> +        edge_irq->irq_num = irq_number_host;
> +        list_add(&(edge_irq->list), &edge_triggered_irqs);
> +
> +        return irq_forward_set_edge_trigger(data, irq_number_host, edge_irq);
> +}
> +
> +int irq_forward_ioctl(void *device_data, unsigned long arg)
> +{
> +	u8 *data = NULL;
> +	unsigned long minsz;
> +	struct irq_forward_set hdr;
> +
> +
> +	minsz = offsetofend(struct irq_forward_set, count);
> +
> +	if (copy_from_user(&hdr, (void __user *)arg, minsz))
> +                return -EFAULT;
> +
> +	data = memdup_user((void __user *)(arg + minsz), sizeof(int32_t));
> +	if (IS_ERR(data))
> +	        return PTR_ERR(data);
> +
> +    switch (hdr.action_flags)
> +    {
> +        case IRQ_FORWARD_SET_LEVEL_TRIGGER_EVENTFD:
> +            return set_irqs_ioctl_level_trigger(hdr.irq_number_host, data);
> +        case IRQ_FORWARD_SET_LEVEL_UNMASK_EVENTFD:
> +            return set_irqs_ioctl_level_unmask(hdr.irq_number_host, data);
> +        case IRQ_FORWARD_SET_EDGE_TRIGGER:
> +            return set_irqs_ioctl_edge_trigger(hdr.irq_number_host, data);
> +        default:
> +            return -EINVAL;
> +    }
> +
> +	kfree(data);
> +	return 0;
> +}
> +
> +/**
> + * IRQ Forwarding fd, /dev/irq-forward
> + */
> +static long irq_forward_fops_unl_ioctl(struct file *filep,
> +				unsigned int cmd, unsigned long arg)
> +{
> +	long ret = -EINVAL;
> +
> +	switch (cmd) {
> +	case IRQ_FORWARD_SET:
> +		ret = (long) irq_forward_ioctl(filep, arg);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +	}
> +
> +	return ret;
> +}
> +
> +#ifdef CONFIG_COMPAT
> +static long irq_forward_fops_compat_ioctl(struct file *filep,
> +				   unsigned int cmd, unsigned long arg)
> +{
> +	arg = (unsigned long)compat_ptr(arg);
> +	return irq_forward_fops_unl_ioctl(filep, cmd, arg);
> +}
> +#endif	/* CONFIG_COMPAT */
> +
> +static int irq_forward_fops_open(struct inode *inode, struct file *filep)
> +{
> +	return 0;
> +}
> +
> +static int irq_forward_fops_release(struct inode *inode, struct file *filep)
> +{
> +	return 0;
> +}
> +
> +static const struct file_operations irq_forward_fops = {
> +	.owner		= THIS_MODULE,
> +	.open		= irq_forward_fops_open,
> +	.release	= irq_forward_fops_release,
> +	.unlocked_ioctl	= irq_forward_fops_unl_ioctl,
> +#ifdef CONFIG_COMPAT
> +	.compat_ioctl	= irq_forward_fops_compat_ioctl,
> +#endif
> +};
> +
> +static struct miscdevice irq_forward_dev = {
> +	.minor = IRQ_FORWARD_MINOR,
> +	.name = "irq-forward",
> +	.fops = &irq_forward_fops,
> +	.nodename = "irq-forward",
> +	.mode = S_IRUGO | S_IWUGO,
> +};
> +
> +static int __init irq_forward_init(void)
> +{
> +	int ret;
> +
> +	ret = misc_register(&irq_forward_dev);
> +	if (ret) {
> +		pr_err("irq-forward: misc device register failed\n");
> +		return ret;
> +	}
> +
> +	pr_info(DESC " version: " VERSION "\n");
> +
> +	return 0;
> +}
> +
> +// TODO: cleanup/free/disconnect stuff
> +static void __exit irq_forward_cleanup(void)
> +{
> +	misc_deregister(&irq_forward_dev);
> +}
> +
> +module_init(irq_forward_init);
> +module_exit(irq_forward_cleanup);


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH] KVM: Add module for IRQ forwarding
  2020-05-12 17:14 ` Alex Williamson
@ 2020-05-13  7:02   ` Paolo Bonzini
  2020-05-13 14:34     ` Alex Williamson
  2020-05-13 21:52     ` Micah Morton
  0 siblings, 2 replies; 19+ messages in thread
From: Paolo Bonzini @ 2020-05-13  7:02 UTC (permalink / raw)
  To: Alex Williamson, Micah Morton; +Cc: kvm, jmattson

On 12/05/20 19:14, Alex Williamson wrote:
> But why not assign the individual platform devices via vfio-platform
> rather than assign the i2c controller via vfio-pci and then assembling
> the interrupts from those sub-devices with this ad-hoc interface?  An
> emulated i2c controller in the guest could provide the same discovery
> mechanism as is available in the host.

I agree.  I read the whole discussion, but I still don't understand why
this is not using vfio-platform.

Alternatively, if you assign the i2c controller, I don't understand why
the guest doesn't discover interrupts on its own.  Of course you need to
tell the guest about the devices in the ACPI tables, but why is this new
concept necessary?

(Finally, in the past we were doing device assignment tasks within KVM
and it was a bad idea.  Anything you want to do within KVM with respect
to device assignment, someone else will want to do it from bare metal.
virt/lib/irqbypass.c is a special case because it's an IOMMU feature
that is designed to work in concert with VMX posted interrupts and SVM
AVIC, so in guest mode only).

Paolo


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH] KVM: Add module for IRQ forwarding
  2020-05-13  7:02   ` Paolo Bonzini
@ 2020-05-13 14:34     ` Alex Williamson
  2020-05-13 15:23       ` Paolo Bonzini
  2020-05-13 21:52     ` Micah Morton
  1 sibling, 1 reply; 19+ messages in thread
From: Alex Williamson @ 2020-05-13 14:34 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Micah Morton, kvm, jmattson

On Wed, 13 May 2020 09:02:16 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 12/05/20 19:14, Alex Williamson wrote:
> > But why not assign the individual platform devices via vfio-platform
> > rather than assign the i2c controller via vfio-pci and then assembling
> > the interrupts from those sub-devices with this ad-hoc interface?  An
> > emulated i2c controller in the guest could provide the same discovery
> > mechanism as is available in the host.  
> 
> I agree.  I read the whole discussion, but I still don't understand why
> this is not using vfio-platform.
> 
> Alternatively, if you assign the i2c controller, I don't understand why
> the guest doesn't discover interrupts on its own.  Of course you need to
> tell the guest about the devices in the ACPI tables, but why is this new
> concept necessary?

The i2c controller is a PCI device, it can be assigned with vfio-pci
and we can use it to probe the i2c bus and find the sub-devices.
However the interrupt for this sub-device is unrelated to the PCI
controller device, it's an entirely arbitrary (from our perspective)
relationship described via ACPI.  So the guest needs an ACPI blob to
describe the interrupt and then access to the interrupt.  This is
what's new here.

We could potentially use device specific interrupts to expose this via
the controller device, but then vfio-pci needs to learn how to
essentially become an i2c controller to enumerate the sub-devices and
collect external dependencies.  This is not an approach I've embraced
versus the alternative of the host i2c driver claiming the PCI
controller, enumerating the sub-devices, and binding the resulting
device, complete with built-in interrupt support via vfio-platform.
Thanks,

Alex


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH] KVM: Add module for IRQ forwarding
  2020-05-13 14:34     ` Alex Williamson
@ 2020-05-13 15:23       ` Paolo Bonzini
  2020-05-13 19:10         ` Micah Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2020-05-13 15:23 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Micah Morton, kvm, jmattson

On 13/05/20 16:34, Alex Williamson wrote:
>> Alternatively, if you assign the i2c controller, I don't understand why
>> the guest doesn't discover interrupts on its own.  Of course you need to
>> tell the guest about the devices in the ACPI tables, but why is this new
>> concept necessary?
>
> the interrupt for this sub-device is unrelated to the PCI
> controller device, it's an entirely arbitrary (from our perspective)
> relationship described via ACPI.

Ok, that's what I was missing.

> We could potentially use device specific interrupts to expose this via
> the controller device, but then vfio-pci needs to learn how to
> essentially become an i2c controller to enumerate the sub-devices and
> collect external dependencies.  This is not an approach I've embraced
> versus the alternative of the host i2c driver claiming the PCI
> controller, enumerating the sub-devices, and binding the resulting
> device, complete with built-in interrupt support via vfio-platform.

I agree that's the way to go.

Perhaps adding arbitrary non-PCI interrupts, i.e. neither INTX nor
MSI(-X), to vfio-pci could be acceptable.  However, the device claim
must claim them, and that seems hard to do when you rebind the PCI
device to pci-stub.

Paolo


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH] KVM: Add module for IRQ forwarding
  2020-05-13 15:23       ` Paolo Bonzini
@ 2020-05-13 19:10         ` Micah Morton
  2020-05-13 22:05           ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: Micah Morton @ 2020-05-13 19:10 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Alex Williamson, kvm, jmattson

On Wed, May 13, 2020 at 8:23 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 13/05/20 16:34, Alex Williamson wrote:
> >> Alternatively, if you assign the i2c controller, I don't understand why
> >> the guest doesn't discover interrupts on its own.  Of course you need to
> >> tell the guest about the devices in the ACPI tables, but why is this new
> >> concept necessary?
> >
> > the interrupt for this sub-device is unrelated to the PCI
> > controller device, it's an entirely arbitrary (from our perspective)
> > relationship described via ACPI.
>
> Ok, that's what I was missing.
>
> > We could potentially use device specific interrupts to expose this via
> > the controller device, but then vfio-pci needs to learn how to
> > essentially become an i2c controller to enumerate the sub-devices and
> > collect external dependencies.  This is not an approach I've embraced
> > versus the alternative of the host i2c driver claiming the PCI
> > controller, enumerating the sub-devices, and binding the resulting
> > device, complete with built-in interrupt support via vfio-platform.
>
> I agree that's the way to go.

I think vfio-platform only builds for ARM at the moment
(https://elixir.bootlin.com/linux/v5.7-rc5/source/drivers/vfio/platform/Kconfig#L4),
and AFAICT vfio-platform is only for assigning DMA-capable devices on
ARM that are behind an IOMMU and listed in /sys/kernel/iommu_groups.
I’m not sure how much work it would take to get vfio-platform to
attach to platform devices on x86 and assign them to a guest, even
assuming we had the i2c controller emulation code* (part of the draw
for vfio-pci is to avoid emulating things in the host, so having to
emulate the i2c controller takes away from this advantage of vfio-pci
a bit). This might be a good approach for assigning platform devices
on x86 which are non-DMA-capable in the case you _don’t_ want to
assign all the platform devices on the bus together to the guest. I’m
not particularly interested in that scenario however, I’d rather just
give the whole bus to the guest, which makes vfio-pci very convenient.

* If we only care about the bus controller existing (in an emulated
fashion) enough for the guest to discover the device in question, this
could work. I’m concerned that power management could be an issue here
however. For instance, I have a touchscreen device assigned to the
guest (irq forwarding done with this module) that in response to the
screen being touched prepares the i2c controller for a transaction by
calling into the PM system which end up writing to the PCI config
space** (here https://elixir.bootlin.com/linux/v5.6.12/source/drivers/i2c/busses/i2c-designware-master.c#L435).
It seems like this kind of scenario expands the scope of what would
need to be supported by the emulated i2c controller, which is less
ideal. The way I have it currently working, vfio-pci emulates the PCI
config space so the guest can do power management by accessing that
space.

** Call stack looks like this:

[  +0.000023]  [<ffffffffb53a029f>] dump_stack+0x97/0xdb
[  +0.000012]  [<ffffffffb53e3c0c>] pci_write_config_word+0x3c/0x56
[  +0.000015]  [<ffffffffb53e2cc6>] pci_raw_set_power_state+0x106/0x220
[  +0.000014]  [<ffffffffb53e2b0b>] pci_set_power_state+0x50/0xbd
[  +0.000013]  [<ffffffffb53e8ea7>] pci_restore_standard_config+0x27/0x36
[  +0.000012]  [<ffffffffb53e8d56>] pci_pm_runtime_resume+0x3a/0xa6
[  +0.000013]  [<ffffffffb53e8d1c>] ? pci_pm_runtime_suspend+0x139/0x139
[  +0.000015]  [<ffffffffb557011e>] __rpm_callback+0x7a/0xf4
[  +0.000012]  [<ffffffffb53e8d1c>] ? pci_pm_runtime_suspend+0x139/0x139
[  +0.000013]  [<ffffffffb557004b>] rpm_callback+0x27/0x80
[  +0.000012]  [<ffffffffb57419dd>] rpm_resume+0x53a/0x608
[  +0.000014]  [<ffffffffb51a4b4c>] ? update_process_times+0x5c/0x5c
[  +0.000010]  [<ffffffffb574193a>] rpm_resume+0x497/0x608
[  +0.000016]  [<ffffffffb58dfdb0>] ? __switch_to_asm+0x40/0x70
[  +0.000011]  [<ffffffffb574148b>] __pm_runtime_resume+0x6b/0x83
[  +0.000011]  [<ffffffffb574ac0d>] i2c_dw_xfer+0x30/0x3c1
[  +0.000012]  [<ffffffffb574a13f>] __i2c_transfer+0xa2/0x290
[  +0.000011]  [<ffffffffb565d12a>] i2c_master_recv+0xc3/0x10c
[  +0.000014]  [<ffffffffb56f79e1>] i2c_hid_get_input+0x2f/0x119
[  +0.000012]  [<ffffffffb56f79ab>] i2c_hid_irq+0x1b/0x22


Most important point: Am I missing something about vfio-platform working on x86?

>
> Perhaps adding arbitrary non-PCI interrupts, i.e. neither INTX nor
> MSI(-X), to vfio-pci could be acceptable.  However, the device claim
> must claim them, and that seems hard to do when you rebind the PCI
> device to pci-stub.
>
> Paolo
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH] KVM: Add module for IRQ forwarding
  2020-05-13  7:02   ` Paolo Bonzini
  2020-05-13 14:34     ` Alex Williamson
@ 2020-05-13 21:52     ` Micah Morton
  1 sibling, 0 replies; 19+ messages in thread
From: Micah Morton @ 2020-05-13 21:52 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Alex Williamson, kvm, jmattson

On Wed, May 13, 2020 at 12:02 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 12/05/20 19:14, Alex Williamson wrote:
> > But why not assign the individual platform devices via vfio-platform
> > rather than assign the i2c controller via vfio-pci and then assembling
> > the interrupts from those sub-devices with this ad-hoc interface?  An
> > emulated i2c controller in the guest could provide the same discovery
> > mechanism as is available in the host.
>
> I agree.  I read the whole discussion, but I still don't understand why
> this is not using vfio-platform.
>
> Alternatively, if you assign the i2c controller, I don't understand why
> the guest doesn't discover interrupts on its own.  Of course you need to
> tell the guest about the devices in the ACPI tables, but why is this new
> concept necessary?
>
> (Finally, in the past we were doing device assignment tasks within KVM
> and it was a bad idea.  Anything you want to do within KVM with respect
> to device assignment, someone else will want to do it from bare metal.

Are you saying people would want to use this in non-virtualized
scenarios like running drivers in userspace without any VMM/guest? And
they could do that if this was part of VFIO and not part of KVM?

> virt/lib/irqbypass.c is a special case because it's an IOMMU feature
> that is designed to work in concert with VMX posted interrupts and SVM
> AVIC, so in guest mode only).
>
> Paolo
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH] KVM: Add module for IRQ forwarding
  2020-05-13 19:10         ` Micah Morton
@ 2020-05-13 22:05           ` Paolo Bonzini
  2020-05-14 17:44             ` Micah Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2020-05-13 22:05 UTC (permalink / raw)
  To: Micah Morton; +Cc: Alex Williamson, kvm, jmattson

On 13/05/20 21:10, Micah Morton wrote:
> * If we only care about the bus controller existing (in an emulated
> fashion) enough for the guest to discover the device in question, this
> could work. I’m concerned that power management could be an issue here
> however. For instance, I have a touchscreen device assigned to the
> guest (irq forwarding done with this module) that in response to the
> screen being touched prepares the i2c controller for a transaction by
> calling into the PM system which end up writing to the PCI config
> space** (here https://elixir.bootlin.com/linux/v5.6.12/source/drivers/i2c/busses/i2c-designware-master.c#L435).
> It seems like this kind of scenario expands the scope of what would
> need to be supported by the emulated i2c controller, which is less
> ideal. The way I have it currently working, vfio-pci emulates the PCI
> config space so the guest can do power management by accessing that
> space.

This wouldn't be a problem.  When the emulated i2c controller starts a
transaction on th edevice, it will be performed by the host i2c
controller and this will lead to the same config space write.

I have another question: would it be possible to expose this IRQ through
/dev/i2c-* instead of messing with VFIO?

In fact, adding support for /dev/i2c passthrough to QEMU has long been a
pet idea of mine (my usecase was different though: the idea was to write
programs for a microcontroller on an ARM single board computer and run
them under QEMU in emulation mode).  It's not trivial, because there
could be some impedence mismatch between the guest (which might be
programmed against a low-level controller or might even do bit banging)
and the i2c-dev interface which is more high level.  Also QEMU cannot do
clock stretching right now.  However, it's certainly doable.

>> (Finally, in the past we were doing device assignment tasks within KVM
>> and it was a bad idea.  Anything you want to do within KVM with respect
>> to device assignment, someone else will want to do it from bare metal.
> 
> Are you saying people would want to use this in non-virtualized
> scenarios like running drivers in userspace without any VMM/guest? And
> they could do that if this was part of VFIO and not part of KVM?

Yes, see above for an example.

Paolo


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH] KVM: Add module for IRQ forwarding
  2020-05-13 22:05           ` Paolo Bonzini
@ 2020-05-14 17:44             ` Micah Morton
  2020-05-14 21:17               ` Paolo Bonzini
  2020-05-15 10:11               ` [RFC PATCH] KVM: Add module for IRQ forwarding Auger Eric
  0 siblings, 2 replies; 19+ messages in thread
From: Micah Morton @ 2020-05-14 17:44 UTC (permalink / raw)
  To: Paolo Bonzini, Auger Eric; +Cc: Alex Williamson, kvm, jmattson

On Wed, May 13, 2020 at 3:05 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 13/05/20 21:10, Micah Morton wrote:
> > * If we only care about the bus controller existing (in an emulated
> > fashion) enough for the guest to discover the device in question, this
> > could work. I’m concerned that power management could be an issue here
> > however. For instance, I have a touchscreen device assigned to the
> > guest (irq forwarding done with this module) that in response to the
> > screen being touched prepares the i2c controller for a transaction by
> > calling into the PM system which end up writing to the PCI config
> > space** (here https://elixir.bootlin.com/linux/v5.6.12/source/drivers/i2c/busses/i2c-designware-master.c#L435).
> > It seems like this kind of scenario expands the scope of what would
> > need to be supported by the emulated i2c controller, which is less
> > ideal. The way I have it currently working, vfio-pci emulates the PCI
> > config space so the guest can do power management by accessing that
> > space.
>
> This wouldn't be a problem.  When the emulated i2c controller starts a
> transaction on th edevice, it will be performed by the host i2c
> controller and this will lead to the same config space write.

I guess what you're saying is there would be an i2c controller
(emulated PCI device) in the guest and the i2c device driver would
still call i2c_dw_xfer as above and the execution in the guest would
still continue all the way to pci_write_config_word(). Then when the
guest executes the actual config write it would trap to the host,
which would need to have the logic that the guest is trying to do
runtime PM commands on an emulated PCI device so we need to step in
and reset the actual PCI device on the host that backs that emulated
device. Is this right?

Again, this is assuming we have the infrastructure to pass platform
devices on x86 to the guest with vfio-platform, which I don't think is
the case. +Auger Eric (not sure why gmail puts your name backwards)
would you be able to comment on this based on my previous message?

>
> I have another question: would it be possible to expose this IRQ through
> /dev/i2c-* instead of messing with VFIO?
>
> In fact, adding support for /dev/i2c passthrough to QEMU has long been a
> pet idea of mine (my usecase was different though: the idea was to write
> programs for a microcontroller on an ARM single board computer and run
> them under QEMU in emulation mode).  It's not trivial, because there
> could be some impedence mismatch between the guest (which might be
> programmed against a low-level controller or might even do bit banging)
> and the i2c-dev interface which is more high level.  Also QEMU cannot do
> clock stretching right now.  However, it's certainly doable.

I agree that would be a cool thing to have in QEMU. Unfortunately I am
interested in assigning other PCI bus controllers to a guest VM and
(similar to the i2c example above) in some cases these busses (e.g.
LPC, SPI) have devices with arbitrary interrupts that need to be
forwarded into the guest for things to work.

I realize this may seem like an over-use of VFIO, but I'm actually
coming from the angle of wanting to assign _most_ of the important
hardware on my device to a VM guest, and I'm looking to avoid
emulation wherever possible. Of course there will be devices like the
IOAPIC for which emulation is unavoidable, but I think emulation is
avoidable here for the busses we've mentioned if there is a way to
forward arbitrary interrupts into the guest.

Since all these use cases are so close to working with vfio-pci right
out of the box, I was really hoping to come up with a simple and
generic solution to the arbitrary interrupt problem that can be used
for multiple bus types.

>
> >> (Finally, in the past we were doing device assignment tasks within KVM
> >> and it was a bad idea.  Anything you want to do within KVM with respect
> >> to device assignment, someone else will want to do it from bare metal.
> >
> > Are you saying people would want to use this in non-virtualized
> > scenarios like running drivers in userspace without any VMM/guest? And
> > they could do that if this was part of VFIO and not part of KVM?
>
> Yes, see above for an example.
>
> Paolo
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH] KVM: Add module for IRQ forwarding
  2020-05-14 17:44             ` Micah Morton
@ 2020-05-14 21:17               ` Paolo Bonzini
  2020-05-14 22:43                 ` Alex Williamson
  2020-05-15 10:11               ` [RFC PATCH] KVM: Add module for IRQ forwarding Auger Eric
  1 sibling, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2020-05-14 21:17 UTC (permalink / raw)
  To: Micah Morton, Auger Eric; +Cc: Alex Williamson, kvm, jmattson

On 14/05/20 19:44, Micah Morton wrote:
> I realize this may seem like an over-use of VFIO, but I'm actually
> coming from the angle of wanting to assign _most_ of the important
> hardware on my device to a VM guest, and I'm looking to avoid
> emulation wherever possible. Of course there will be devices like the
> IOAPIC for which emulation is unavoidable, but I think emulation is
> avoidable here for the busses we've mentioned if there is a way to
> forward arbitrary interrupts into the guest.
> 
> Since all these use cases are so close to working with vfio-pci right
> out of the box, I was really hoping to come up with a simple and
> generic solution to the arbitrary interrupt problem that can be used
> for multiple bus types.

I shall defer to Alex on this, but I think the main issue here is that
these interrupts are not visible to Linux as pertaining to the pci-stub
device.  Is this correct?

Paolo


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH] KVM: Add module for IRQ forwarding
  2020-05-14 21:17               ` Paolo Bonzini
@ 2020-05-14 22:43                 ` Alex Williamson
  2020-05-15 20:30                   ` Micah Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Alex Williamson @ 2020-05-14 22:43 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Micah Morton, Auger Eric, kvm, jmattson

On Thu, 14 May 2020 23:17:29 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 14/05/20 19:44, Micah Morton wrote:
> > I realize this may seem like an over-use of VFIO, but I'm actually
> > coming from the angle of wanting to assign _most_ of the important
> > hardware on my device to a VM guest, and I'm looking to avoid
> > emulation wherever possible. Of course there will be devices like the
> > IOAPIC for which emulation is unavoidable, but I think emulation is
> > avoidable here for the busses we've mentioned if there is a way to
> > forward arbitrary interrupts into the guest.
> > 
> > Since all these use cases are so close to working with vfio-pci right
> > out of the box, I was really hoping to come up with a simple and
> > generic solution to the arbitrary interrupt problem that can be used
> > for multiple bus types.  
> 
> I shall defer to Alex on this, but I think the main issue here is that
> these interrupts are not visible to Linux as pertaining to the pci-stub
> device.  Is this correct?

Yes.  Allowing a user to grant themselves access to an arbitrary
interrupt is a non-starter, vfio-pci needs to somehow know that the
user is entitled to that interrupt.  If we could do that, then we could
just add it as a device specific interrupt.  But how do we do that?

The quirk method to this might be to key off of the PCI vendor and
device ID of the PCI i2c controller, lookup DMI information to know if
we're on the platform that has this fixed association, and setup the
extra interrupt.  The more extensible, but potentially bloated solution
might be for vfio-pci to recognize the class code for a i2c controller
and implement a very simple bus walk at device probe time that collects
external dependencies.  I don't really know how the jump is made from
that bus walk to digging the interrupt resource out of ACPI though or
how many LoC would be required to perform the minimum possible
discovery to collect this association.

I notice in this RFC patch that you're using an exclusive interrupt for
level triggered interrupts and therefore masking at the APIC.
Requiring an exclusive interrupt is often a usability issue for PCI
devices that don't support DisINTx and obviously we don't have that for
non-PCI sub-devices.  What type of interrupt do you actually need for
this device?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH] KVM: Add module for IRQ forwarding
  2020-05-14 17:44             ` Micah Morton
  2020-05-14 21:17               ` Paolo Bonzini
@ 2020-05-15 10:11               ` Auger Eric
  2020-05-15 16:39                 ` Micah Morton
  1 sibling, 1 reply; 19+ messages in thread
From: Auger Eric @ 2020-05-15 10:11 UTC (permalink / raw)
  To: Micah Morton, Paolo Bonzini; +Cc: Alex Williamson, kvm, jmattson

Hi Micah,

On 5/14/20 7:44 PM, Micah Morton wrote:
> On Wed, May 13, 2020 at 3:05 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 13/05/20 21:10, Micah Morton wrote:
>>> * If we only care about the bus controller existing (in an emulated
>>> fashion) enough for the guest to discover the device in question, this
>>> could work. I’m concerned that power management could be an issue here
>>> however. For instance, I have a touchscreen device assigned to the
>>> guest (irq forwarding done with this module) that in response to the
>>> screen being touched prepares the i2c controller for a transaction by
>>> calling into the PM system which end up writing to the PCI config
>>> space** (here https://elixir.bootlin.com/linux/v5.6.12/source/drivers/i2c/busses/i2c-designware-master.c#L435).
>>> It seems like this kind of scenario expands the scope of what would
>>> need to be supported by the emulated i2c controller, which is less
>>> ideal. The way I have it currently working, vfio-pci emulates the PCI
>>> config space so the guest can do power management by accessing that
>>> space.
>>
>> This wouldn't be a problem.  When the emulated i2c controller starts a
>> transaction on th edevice, it will be performed by the host i2c
>> controller and this will lead to the same config space write.
> 
> I guess what you're saying is there would be an i2c controller
> (emulated PCI device) in the guest and the i2c device driver would
> still call i2c_dw_xfer as above and the execution in the guest would
> still continue all the way to pci_write_config_word(). Then when the
> guest executes the actual config write it would trap to the host,
> which would need to have the logic that the guest is trying to do
> runtime PM commands on an emulated PCI device so we need to step in
> and reset the actual PCI device on the host that backs that emulated
> device. Is this right?
> 
> Again, this is assuming we have the infrastructure to pass platform
> devices on x86 to the guest with vfio-platform, which I don't think is
> the case. +Auger Eric (not sure why gmail puts your name backwards)
> would you be able to comment on this based on my previous message?

VFIO_PLATFORM only is compiled on ARM today but that's probably not the
main issue here. I don't know if the fact the platform devices you want
to assign are behind this PCI I2C controller does change anything in the
way we would bind the devices to vfio-platform.

Up to now, in QEMU we have only generated DT bindings for the assigned
platform devices. Generating AML code has never been experienced.

What I don't get in your existing POC is how your enumerate the platform
devices resources (regs, IRQs) behing your controller. I understand you
devised a solution to expose the specific IRQ but what about regs? How
are they presented to your guest?

Thanks

Eric


> 
>>
>> I have another question: would it be possible to expose this IRQ through
>> /dev/i2c-* instead of messing with VFIO?
>>
>> In fact, adding support for /dev/i2c passthrough to QEMU has long been a
>> pet idea of mine (my usecase was different though: the idea was to write
>> programs for a microcontroller on an ARM single board computer and run
>> them under QEMU in emulation mode).  It's not trivial, because there
>> could be some impedence mismatch between the guest (which might be
>> programmed against a low-level controller or might even do bit banging)
>> and the i2c-dev interface which is more high level.  Also QEMU cannot do
>> clock stretching right now.  However, it's certainly doable.
> 
> I agree that would be a cool thing to have in QEMU. Unfortunately I am
> interested in assigning other PCI bus controllers to a guest VM and
> (similar to the i2c example above) in some cases these busses (e.g.
> LPC, SPI) have devices with arbitrary interrupts that need to be
> forwarded into the guest for things to work.
> 
> I realize this may seem like an over-use of VFIO, but I'm actually
> coming from the angle of wanting to assign _most_ of the important
> hardware on my device to a VM guest, and I'm looking to avoid
> emulation wherever possible. Of course there will be devices like the
> IOAPIC for which emulation is unavoidable, but I think emulation is
> avoidable here for the busses we've mentioned if there is a way to
> forward arbitrary interrupts into the guest.
> 
> Since all these use cases are so close to working with vfio-pci right
> out of the box, I was really hoping to come up with a simple and
> generic solution to the arbitrary interrupt problem that can be used
> for multiple bus types.
> 
>>
>>>> (Finally, in the past we were doing device assignment tasks within KVM
>>>> and it was a bad idea.  Anything you want to do within KVM with respect
>>>> to device assignment, someone else will want to do it from bare metal.
>>>
>>> Are you saying people would want to use this in non-virtualized
>>> scenarios like running drivers in userspace without any VMM/guest? And
>>> they could do that if this was part of VFIO and not part of KVM?
>>
>> Yes, see above for an example.
>>
>> Paolo
>>
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH] KVM: Add module for IRQ forwarding
  2020-05-15 10:11               ` [RFC PATCH] KVM: Add module for IRQ forwarding Auger Eric
@ 2020-05-15 16:39                 ` Micah Morton
  0 siblings, 0 replies; 19+ messages in thread
From: Micah Morton @ 2020-05-15 16:39 UTC (permalink / raw)
  To: Auger Eric; +Cc: Paolo Bonzini, Alex Williamson, kvm, jmattson

On Fri, May 15, 2020 at 3:11 AM Auger Eric <eric.auger@redhat.com> wrote:
>
> Hi Micah,
>
> On 5/14/20 7:44 PM, Micah Morton wrote:
> > On Wed, May 13, 2020 at 3:05 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>
> >> On 13/05/20 21:10, Micah Morton wrote:
> >>> * If we only care about the bus controller existing (in an emulated
> >>> fashion) enough for the guest to discover the device in question, this
> >>> could work. I’m concerned that power management could be an issue here
> >>> however. For instance, I have a touchscreen device assigned to the
> >>> guest (irq forwarding done with this module) that in response to the
> >>> screen being touched prepares the i2c controller for a transaction by
> >>> calling into the PM system which end up writing to the PCI config
> >>> space** (here https://elixir.bootlin.com/linux/v5.6.12/source/drivers/i2c/busses/i2c-designware-master.c#L435).
> >>> It seems like this kind of scenario expands the scope of what would
> >>> need to be supported by the emulated i2c controller, which is less
> >>> ideal. The way I have it currently working, vfio-pci emulates the PCI
> >>> config space so the guest can do power management by accessing that
> >>> space.
> >>
> >> This wouldn't be a problem.  When the emulated i2c controller starts a
> >> transaction on th edevice, it will be performed by the host i2c
> >> controller and this will lead to the same config space write.
> >
> > I guess what you're saying is there would be an i2c controller
> > (emulated PCI device) in the guest and the i2c device driver would
> > still call i2c_dw_xfer as above and the execution in the guest would
> > still continue all the way to pci_write_config_word(). Then when the
> > guest executes the actual config write it would trap to the host,
> > which would need to have the logic that the guest is trying to do
> > runtime PM commands on an emulated PCI device so we need to step in
> > and reset the actual PCI device on the host that backs that emulated
> > device. Is this right?
> >
> > Again, this is assuming we have the infrastructure to pass platform
> > devices on x86 to the guest with vfio-platform, which I don't think is
> > the case. +Auger Eric (not sure why gmail puts your name backwards)
> > would you be able to comment on this based on my previous message?
>
> VFIO_PLATFORM only is compiled on ARM today but that's probably not the
> main issue here. I don't know if the fact the platform devices you want
> to assign are behind this PCI I2C controller does change anything in the
> way we would bind the devices to vfio-platform.
>
> Up to now, in QEMU we have only generated DT bindings for the assigned
> platform devices. Generating AML code has never been experienced.
>
> What I don't get in your existing POC is how your enumerate the platform
> devices resources (regs, IRQs) behing your controller. I understand you
> devised a solution to expose the specific IRQ but what about regs? How
> are they presented to your guest?

For the most part I haven't needed to present any extra regs to the
guest beyond the I/O ports / MMIO regions exposed to the guest through
VFIO. My experimentation with passthrough of platform devices behind
bus controllers has been limited to i2c and LPC so far. I did have one
case where the EC device on my machine is behind the LPC controller
and I also needed to additionally expose some I/O ports for the EC
that VFIO wasn't aware of (this is easy with KVM on x86, the VMCS has
a bit map of what I/O ports the guest is allowed to access). I think
this is the best example of what you're referencing that I've seen.
It's a good point. The fact that I've seen I/O ports that VFIO doesn't
know about that need to be made accessible to the guest probably means
there are similar cases for MMIOs. Then again I got most of the
hardware on my machine working in a guest without hitting that issue,
but that's a small sample size.

>
> Thanks
>
> Eric
>
>
> >
> >>
> >> I have another question: would it be possible to expose this IRQ through
> >> /dev/i2c-* instead of messing with VFIO?
> >>
> >> In fact, adding support for /dev/i2c passthrough to QEMU has long been a
> >> pet idea of mine (my usecase was different though: the idea was to write
> >> programs for a microcontroller on an ARM single board computer and run
> >> them under QEMU in emulation mode).  It's not trivial, because there
> >> could be some impedence mismatch between the guest (which might be
> >> programmed against a low-level controller or might even do bit banging)
> >> and the i2c-dev interface which is more high level.  Also QEMU cannot do
> >> clock stretching right now.  However, it's certainly doable.
> >
> > I agree that would be a cool thing to have in QEMU. Unfortunately I am
> > interested in assigning other PCI bus controllers to a guest VM and
> > (similar to the i2c example above) in some cases these busses (e.g.
> > LPC, SPI) have devices with arbitrary interrupts that need to be
> > forwarded into the guest for things to work.
> >
> > I realize this may seem like an over-use of VFIO, but I'm actually
> > coming from the angle of wanting to assign _most_ of the important
> > hardware on my device to a VM guest, and I'm looking to avoid
> > emulation wherever possible. Of course there will be devices like the
> > IOAPIC for which emulation is unavoidable, but I think emulation is
> > avoidable here for the busses we've mentioned if there is a way to
> > forward arbitrary interrupts into the guest.
> >
> > Since all these use cases are so close to working with vfio-pci right
> > out of the box, I was really hoping to come up with a simple and
> > generic solution to the arbitrary interrupt problem that can be used
> > for multiple bus types.
> >
> >>
> >>>> (Finally, in the past we were doing device assignment tasks within KVM
> >>>> and it was a bad idea.  Anything you want to do within KVM with respect
> >>>> to device assignment, someone else will want to do it from bare metal.
> >>>
> >>> Are you saying people would want to use this in non-virtualized
> >>> scenarios like running drivers in userspace without any VMM/guest? And
> >>> they could do that if this was part of VFIO and not part of KVM?
> >>
> >> Yes, see above for an example.
> >>
> >> Paolo
> >>
> >
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH] KVM: Add module for IRQ forwarding
  2020-05-14 22:43                 ` Alex Williamson
@ 2020-05-15 20:30                   ` Micah Morton
  2020-06-03 18:23                     ` [PATCH] vfio: PoC patch for printing IRQs used by i2c devices Micah Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Micah Morton @ 2020-05-15 20:30 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Paolo Bonzini, Auger Eric, kvm, jmattson

On Thu, May 14, 2020 at 3:43 PM Alex Williamson
<alex.williamson@redhat.com> wrote:
>
> On Thu, 14 May 2020 23:17:29 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> > On 14/05/20 19:44, Micah Morton wrote:
> > > I realize this may seem like an over-use of VFIO, but I'm actually
> > > coming from the angle of wanting to assign _most_ of the important
> > > hardware on my device to a VM guest, and I'm looking to avoid
> > > emulation wherever possible. Of course there will be devices like the
> > > IOAPIC for which emulation is unavoidable, but I think emulation is
> > > avoidable here for the busses we've mentioned if there is a way to
> > > forward arbitrary interrupts into the guest.
> > >
> > > Since all these use cases are so close to working with vfio-pci right
> > > out of the box, I was really hoping to come up with a simple and
> > > generic solution to the arbitrary interrupt problem that can be used
> > > for multiple bus types.
> >
> > I shall defer to Alex on this, but I think the main issue here is that
> > these interrupts are not visible to Linux as pertaining to the pci-stub
> > device.  Is this correct?
>
> Yes.  Allowing a user to grant themselves access to an arbitrary
> interrupt is a non-starter, vfio-pci needs to somehow know that the
> user is entitled to that interrupt.  If we could do that, then we could
> just add it as a device specific interrupt.  But how do we do that?
>
> The quirk method to this might be to key off of the PCI vendor and
> device ID of the PCI i2c controller, lookup DMI information to know if
> we're on the platform that has this fixed association, and setup the
> extra interrupt.  The more extensible, but potentially bloated solution
> might be for vfio-pci to recognize the class code for a i2c controller
> and implement a very simple bus walk at device probe time that collects
> external dependencies.  I don't really know how the jump is made from
> that bus walk to digging the interrupt resource out of ACPI though or
> how many LoC would be required to perform the minimum possible
> discovery to collect this association.

The quirk method is interesting. I wonder if we have a guarantee for a
given platform and PCI vendor/device which IRQ number/type will be
used. If so that might be an option.

Would you need to do a bus walk? I guess it would be possible to
simply look at ACPI whenever you see a bus controller being assigned
with VFIO. ACPI should tell you about all the sub-devices on that bus
and their IRQ details. This is how vfio-platform (with DT, not ACPI)
knows which IRQs to forward into the guest right? Is there a
fundamental reason this is more difficult on x86 or is the code just
not there since PCI generally precludes the need for this?

>
> I notice in this RFC patch that you're using an exclusive interrupt for
> level triggered interrupts and therefore masking at the APIC.
> Requiring an exclusive interrupt is often a usability issue for PCI
> devices that don't support DisINTx and obviously we don't have that for
> non-PCI sub-devices.  What type of interrupt do you actually need for
> this device?  Thanks,

I don't have any reason to think the interrupts for sub-devices on the
bus would be shared with any other PCI devices or platform devices.
Are you asking if there's any chance the platform IRQs are shared
rather than exclusive or are you saying there's some issue with
presenting the exclusive IRQ to the guest as exclusive through PCI
legacy-style interrupts?

>
> Alex
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH] vfio: PoC patch for printing IRQs used by i2c devices
  2020-05-15 20:30                   ` Micah Morton
@ 2020-06-03 18:23                     ` Micah Morton
  2020-06-09 20:19                       ` Micah Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Micah Morton @ 2020-06-03 18:23 UTC (permalink / raw)
  To: alex.williamson; +Cc: pbonzini, eric.auger, kvm, Micah Morton

This patch accesses the ACPI system from vfio_pci_probe to print out
info if the PCI device being attached to vfio_pci is an i2c adapter with
associated i2c client devices that use platform IRQs. The info printed
out is the IRQ numbers that are associated with the given i2c client
devices. This patch doesn't attempt to forward any additional IRQs into
the guest, but shows how it could be possible.

Signed-off-by: Micah Morton <mortonm@chromium.org>

Change-Id: I5c77d35f246781a4a80703820860631e2c2091cf
---
What do you guys think about having code like this somewhere in
vfio_pci? There would have to be some logic added in vfio_pci to forward
these IRQs when they are found. For reference, below is what is printed
out during vfio_pci_probe on my development machine when I attach some
I2C adapter PCI devices to vfio_pci:

[   48.742699] ACPI i2c client device WCOM50C1:00 uses irq 31
[   53.913295] ACPI i2c client device GOOG0005:00 uses irq 24
[   58.040076] ACPI i2c client device ACPI0C50:00 uses irq 51

Ideally we could add code like this for other bus types (e.g. SPI).

NOTE: developed on v5.4

 drivers/vfio/pci/vfio_pci.c | 158 ++++++++++++++++++++++++++++++++++++
 1 file changed, 158 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 02206162eaa9..9ce3f34aa548 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -28,6 +28,10 @@
 #include <linux/vgaarb.h>
 #include <linux/nospec.h>
 
+#include <linux/acpi.h>
+#include <acpi/actypes.h>
+#include <linux/i2c.h>
+
 #include "vfio_pci_private.h"
 
 #define DRIVER_VERSION  "0.2"
@@ -1289,12 +1293,166 @@ static const struct vfio_device_ops vfio_pci_ops = {
 static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
 static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck);
 
+
+struct i2c_acpi_lookup {
+	struct i2c_board_info *info;
+	acpi_handle adapter_handle;
+	acpi_handle device_handle;
+	acpi_handle search_handle;
+	int n;
+	int index;
+	u32 speed;
+	u32 min_speed;
+	u32 force_speed;
+};
+
+static const struct acpi_device_id i2c_acpi_ignored_device_ids[] = { 
+        /*  
+         * ACPI video acpi_devices, which are handled by the acpi-video driver
+         * sometimes contain a SERIAL_TYPE_I2C ACPI resource, ignore these.
+         */
+        { ACPI_VIDEO_HID, 0 },
+        {}  
+};
+
+static int i2c_acpi_get_info_for_node(struct acpi_resource *ares, void *data)
+{
+        struct i2c_acpi_lookup *lookup = data;
+        struct i2c_board_info *info = lookup->info;
+        struct acpi_resource_i2c_serialbus *sb;
+        acpi_status status;
+
+        if (info->addr || !i2c_acpi_get_i2c_resource(ares, &sb))
+                return 1;
+
+        if (lookup->index != -1 && lookup->n++ != lookup->index)
+                return 1;
+
+        status = acpi_get_handle(lookup->device_handle,
+                                 sb->resource_source.string_ptr,
+                                 &lookup->adapter_handle);
+        if (ACPI_FAILURE(status))
+                return 1;
+
+        info->addr = sb->slave_address;
+
+        return 1;
+}
+
+static int print_irq_info_if_i2c_slave(struct acpi_device *adev,
+                              struct i2c_acpi_lookup *lookup, acpi_handle adapter)
+{
+        struct i2c_board_info *info = lookup->info;
+        struct list_head resource_list, resource_list2;
+	struct resource_entry *entry;
+        int ret;
+
+        if (acpi_bus_get_status(adev) || !adev->status.present)
+                return -EINVAL;
+
+        if (acpi_match_device_ids(adev, i2c_acpi_ignored_device_ids) == 0)
+                return -ENODEV;
+
+        memset(info, 0, sizeof(*info));
+        lookup->device_handle = acpi_device_handle(adev);
+
+        /* Look up for I2cSerialBus resource */
+        INIT_LIST_HEAD(&resource_list);
+        ret = acpi_dev_get_resources(adev, &resource_list,
+                                     i2c_acpi_get_info_for_node, lookup);
+
+
+	if (ret < 0 || !info->addr) {
+		acpi_dev_free_resource_list(&resource_list);
+                return -EINVAL;
+	}
+
+        if (adapter) {
+                /* The adapter must match the one in I2cSerialBus() connector */
+                if (adapter != lookup->adapter_handle)
+                        return -ENODEV;
+	}
+
+        INIT_LIST_HEAD(&resource_list2);
+	ret = acpi_dev_get_resources(adev, &resource_list2, NULL, NULL);
+	if (ret < 0)
+		return -EINVAL;
+
+        resource_list_for_each_entry(entry, &resource_list2) {
+                if (resource_type(entry->res) == IORESOURCE_IRQ) {
+                        printk(KERN_EMERG "ACPI i2c client device %s uses irq %d\n",
+						dev_name(&adev->dev), entry->res->start);
+                        break;
+                }
+        }
+
+        acpi_dev_free_resource_list(&resource_list2);
+
+        return 0;
+}
+
+static int i2c_acpi_get_info(struct acpi_device *adev,
+                             struct i2c_board_info *info,
+                             acpi_handle adapter,
+                             acpi_handle *adapter_handle)
+{
+        struct i2c_acpi_lookup lookup;
+
+
+        memset(&lookup, 0, sizeof(lookup));
+        lookup.info = info;
+        lookup.index = -1;
+
+        if (acpi_device_enumerated(adev))
+                return -EINVAL;
+
+        return print_irq_info_if_i2c_slave(adev, &lookup, adapter);
+}
+
+static acpi_status process_acpi_node(acpi_handle handle, u32 level,
+                                       void *data, void **return_value)
+{
+        acpi_handle adapter = data;
+        struct acpi_device *adev;
+        struct i2c_board_info info;
+
+        if (acpi_bus_get_device(handle, &adev))
+                return AE_OK;
+
+        if (i2c_acpi_get_info(adev, &info, adapter, NULL))
+                return AE_OK;
+
+        return AE_OK;
+}
+
+#define MAX_SCAN_DEPTH 32
+
+void acpi_print_irqs_if_i2c(acpi_handle handle)
+{
+        acpi_status status;
+
+        status = acpi_walk_namespace(ACPI_TYPE_DEVICE, handle,
+                                     MAX_SCAN_DEPTH,
+                                     process_acpi_node, NULL,
+                                     handle, NULL);
+        if (ACPI_FAILURE(status))
+                printk(KERN_EMERG "failed to enumerate ACPI devices\n");
+}
+
+static void print_irqs_if_i2c_adapter(struct device *dev) {
+	acpi_handle handle = ACPI_HANDLE(dev);
+	acpi_print_irqs_if_i2c(handle);
+}
+
 static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	struct vfio_pci_device *vdev;
 	struct iommu_group *group;
 	int ret;
 
+        if (has_acpi_companion(&pdev->dev))
+		print_irqs_if_i2c_adapter(&pdev->dev);
+
 	if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
 		return -EINVAL;
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] vfio: PoC patch for printing IRQs used by i2c devices
  2020-06-03 18:23                     ` [PATCH] vfio: PoC patch for printing IRQs used by i2c devices Micah Morton
@ 2020-06-09 20:19                       ` Micah Morton
  2020-06-19 18:51                         ` Alex Williamson
  0 siblings, 1 reply; 19+ messages in thread
From: Micah Morton @ 2020-06-09 20:19 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Paolo Bonzini, Auger Eric, kvm

On Wed, Jun 3, 2020 at 11:23 AM Micah Morton <mortonm@chromium.org> wrote:
>
> This patch accesses the ACPI system from vfio_pci_probe to print out
> info if the PCI device being attached to vfio_pci is an i2c adapter with
> associated i2c client devices that use platform IRQs. The info printed
> out is the IRQ numbers that are associated with the given i2c client
> devices. This patch doesn't attempt to forward any additional IRQs into
> the guest, but shows how it could be possible.
>
> Signed-off-by: Micah Morton <mortonm@chromium.org>
>
> Change-Id: I5c77d35f246781a4a80703820860631e2c2091cf
> ---
> What do you guys think about having code like this somewhere in
> vfio_pci? There would have to be some logic added in vfio_pci to forward
> these IRQs when they are found. For reference, below is what is printed
> out during vfio_pci_probe on my development machine when I attach some
> I2C adapter PCI devices to vfio_pci:
>
> [   48.742699] ACPI i2c client device WCOM50C1:00 uses irq 31
> [   53.913295] ACPI i2c client device GOOG0005:00 uses irq 24
> [   58.040076] ACPI i2c client device ACPI0C50:00 uses irq 51
>
> Ideally we could add code like this for other bus types (e.g. SPI).
>
> NOTE: developed on v5.4
>
>  drivers/vfio/pci/vfio_pci.c | 158 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 158 insertions(+)
>
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 02206162eaa9..9ce3f34aa548 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -28,6 +28,10 @@
>  #include <linux/vgaarb.h>
>  #include <linux/nospec.h>
>
> +#include <linux/acpi.h>
> +#include <acpi/actypes.h>
> +#include <linux/i2c.h>
> +
>  #include "vfio_pci_private.h"
>
>  #define DRIVER_VERSION  "0.2"
> @@ -1289,12 +1293,166 @@ static const struct vfio_device_ops vfio_pci_ops = {
>  static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
>  static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck);
>
> +
> +struct i2c_acpi_lookup {
> +       struct i2c_board_info *info;
> +       acpi_handle adapter_handle;
> +       acpi_handle device_handle;
> +       acpi_handle search_handle;
> +       int n;
> +       int index;
> +       u32 speed;
> +       u32 min_speed;
> +       u32 force_speed;
> +};
> +
> +static const struct acpi_device_id i2c_acpi_ignored_device_ids[] = {
> +        /*
> +         * ACPI video acpi_devices, which are handled by the acpi-video driver
> +         * sometimes contain a SERIAL_TYPE_I2C ACPI resource, ignore these.
> +         */
> +        { ACPI_VIDEO_HID, 0 },
> +        {}
> +};
> +
> +static int i2c_acpi_get_info_for_node(struct acpi_resource *ares, void *data)
> +{
> +        struct i2c_acpi_lookup *lookup = data;
> +        struct i2c_board_info *info = lookup->info;
> +        struct acpi_resource_i2c_serialbus *sb;
> +        acpi_status status;
> +
> +        if (info->addr || !i2c_acpi_get_i2c_resource(ares, &sb))
> +                return 1;
> +
> +        if (lookup->index != -1 && lookup->n++ != lookup->index)
> +                return 1;
> +
> +        status = acpi_get_handle(lookup->device_handle,
> +                                 sb->resource_source.string_ptr,
> +                                 &lookup->adapter_handle);
> +        if (ACPI_FAILURE(status))
> +                return 1;
> +
> +        info->addr = sb->slave_address;
> +
> +        return 1;
> +}
> +
> +static int print_irq_info_if_i2c_slave(struct acpi_device *adev,
> +                              struct i2c_acpi_lookup *lookup, acpi_handle adapter)
> +{
> +        struct i2c_board_info *info = lookup->info;
> +        struct list_head resource_list, resource_list2;
> +       struct resource_entry *entry;
> +        int ret;
> +
> +        if (acpi_bus_get_status(adev) || !adev->status.present)
> +                return -EINVAL;
> +
> +        if (acpi_match_device_ids(adev, i2c_acpi_ignored_device_ids) == 0)
> +                return -ENODEV;
> +
> +        memset(info, 0, sizeof(*info));
> +        lookup->device_handle = acpi_device_handle(adev);
> +
> +        /* Look up for I2cSerialBus resource */
> +        INIT_LIST_HEAD(&resource_list);
> +        ret = acpi_dev_get_resources(adev, &resource_list,
> +                                     i2c_acpi_get_info_for_node, lookup);
> +
> +
> +       if (ret < 0 || !info->addr) {
> +               acpi_dev_free_resource_list(&resource_list);
> +                return -EINVAL;
> +       }
> +
> +        if (adapter) {
> +                /* The adapter must match the one in I2cSerialBus() connector */
> +                if (adapter != lookup->adapter_handle)
> +                        return -ENODEV;
> +       }
> +
> +        INIT_LIST_HEAD(&resource_list2);
> +       ret = acpi_dev_get_resources(adev, &resource_list2, NULL, NULL);
> +       if (ret < 0)
> +               return -EINVAL;
> +
> +        resource_list_for_each_entry(entry, &resource_list2) {
> +                if (resource_type(entry->res) == IORESOURCE_IRQ) {

In reference to Eric's question, it's also simple to check for other
types of resources here (like IORESOURCE_IO (I/O port) or
IORESOURCE_MEM (MMIO)), which would allow for making those types of
resources for devices behind the bus controller available to the guest
as well. I think MMIO would be simple enough. I/O ports would require
a way for VFIO to tell KVM to set the right bits in the VMCS I/O port
bitmap* when initializing the guest so that the guest can access the
needed I/O ports without trapping to the host. Or I/O ports could just
be trapped and then the read/write carried out in the host.

Alex, does it seem reasonable to consult ACPI in this way and use this
info to make sub-device resources available in the guest? I don't
anticipate a case where any sub-devices are DMA-capable and could
circumvent IOMMU translation to write to host memory. Do you see any
other potential pitfalls? I guess the only drawback I see is needing
code in the VMM to copy chunks of the ACPI tables from host to guest
to tell it about the sub-devices -- but that's pretty doable.

I think this might be Intel specific, would have to check how to do
the equivalent thing on AMD.

> +                        printk(KERN_EMERG "ACPI i2c client device %s uses irq %d\n",
> +                                               dev_name(&adev->dev), entry->res->start);
> +                        break;
> +                }
> +        }
> +
> +        acpi_dev_free_resource_list(&resource_list2);
> +
> +        return 0;
> +}
> +
> +static int i2c_acpi_get_info(struct acpi_device *adev,
> +                             struct i2c_board_info *info,
> +                             acpi_handle adapter,
> +                             acpi_handle *adapter_handle)
> +{
> +        struct i2c_acpi_lookup lookup;
> +
> +
> +        memset(&lookup, 0, sizeof(lookup));
> +        lookup.info = info;
> +        lookup.index = -1;
> +
> +        if (acpi_device_enumerated(adev))
> +                return -EINVAL;
> +
> +        return print_irq_info_if_i2c_slave(adev, &lookup, adapter);
> +}
> +
> +static acpi_status process_acpi_node(acpi_handle handle, u32 level,
> +                                       void *data, void **return_value)
> +{
> +        acpi_handle adapter = data;
> +        struct acpi_device *adev;
> +        struct i2c_board_info info;
> +
> +        if (acpi_bus_get_device(handle, &adev))
> +                return AE_OK;
> +
> +        if (i2c_acpi_get_info(adev, &info, adapter, NULL))
> +                return AE_OK;
> +
> +        return AE_OK;
> +}
> +
> +#define MAX_SCAN_DEPTH 32
> +
> +void acpi_print_irqs_if_i2c(acpi_handle handle)
> +{
> +        acpi_status status;
> +
> +        status = acpi_walk_namespace(ACPI_TYPE_DEVICE, handle,
> +                                     MAX_SCAN_DEPTH,
> +                                     process_acpi_node, NULL,
> +                                     handle, NULL);
> +        if (ACPI_FAILURE(status))
> +                printk(KERN_EMERG "failed to enumerate ACPI devices\n");
> +}
> +
> +static void print_irqs_if_i2c_adapter(struct device *dev) {
> +       acpi_handle handle = ACPI_HANDLE(dev);
> +       acpi_print_irqs_if_i2c(handle);
> +}
> +
>  static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  {
>         struct vfio_pci_device *vdev;
>         struct iommu_group *group;
>         int ret;
>
> +        if (has_acpi_companion(&pdev->dev))
> +               print_irqs_if_i2c_adapter(&pdev->dev);
> +
>         if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
>                 return -EINVAL;
>
> --
> 2.26.2
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] vfio: PoC patch for printing IRQs used by i2c devices
  2020-06-09 20:19                       ` Micah Morton
@ 2020-06-19 18:51                         ` Alex Williamson
  2020-06-19 20:00                           ` Micah Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Alex Williamson @ 2020-06-19 18:51 UTC (permalink / raw)
  To: Micah Morton; +Cc: Paolo Bonzini, Auger Eric, kvm

On Tue, 9 Jun 2020 13:19:19 -0700
Micah Morton <mortonm@chromium.org> wrote:

> On Wed, Jun 3, 2020 at 11:23 AM Micah Morton <mortonm@chromium.org> wrote:
> >
> > This patch accesses the ACPI system from vfio_pci_probe to print out
> > info if the PCI device being attached to vfio_pci is an i2c adapter with
> > associated i2c client devices that use platform IRQs. The info printed
> > out is the IRQ numbers that are associated with the given i2c client
> > devices. This patch doesn't attempt to forward any additional IRQs into
> > the guest, but shows how it could be possible.
> >
> > Signed-off-by: Micah Morton <mortonm@chromium.org>
> >
> > Change-Id: I5c77d35f246781a4a80703820860631e2c2091cf
> > ---
> > What do you guys think about having code like this somewhere in
> > vfio_pci? There would have to be some logic added in vfio_pci to forward
> > these IRQs when they are found. For reference, below is what is printed
> > out during vfio_pci_probe on my development machine when I attach some
> > I2C adapter PCI devices to vfio_pci:
> >
> > [   48.742699] ACPI i2c client device WCOM50C1:00 uses irq 31
> > [   53.913295] ACPI i2c client device GOOG0005:00 uses irq 24
> > [   58.040076] ACPI i2c client device ACPI0C50:00 uses irq 51
> >
> > Ideally we could add code like this for other bus types (e.g. SPI).
> >
> > NOTE: developed on v5.4

Sorry for the delay, this took some time to reach the top of the heap
and process.

> >  drivers/vfio/pci/vfio_pci.c | 158 ++++++++++++++++++++++++++++++++++++

I think we'd want a separate file with Kconfig options to turn it off
if we were to consider adding this sort of thing.  The ACPI
dependencies may not be present on all platforms anyway.

> >  1 file changed, 158 insertions(+)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index 02206162eaa9..9ce3f34aa548 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -28,6 +28,10 @@
> >  #include <linux/vgaarb.h>
> >  #include <linux/nospec.h>
> >
> > +#include <linux/acpi.h>
> > +#include <acpi/actypes.h>
> > +#include <linux/i2c.h>
> > +
> >  #include "vfio_pci_private.h"
> >
> >  #define DRIVER_VERSION  "0.2"
> > @@ -1289,12 +1293,166 @@ static const struct vfio_device_ops vfio_pci_ops = {
> >  static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
> >  static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck);
> >
> > +
> > +struct i2c_acpi_lookup {
> > +       struct i2c_board_info *info;
> > +       acpi_handle adapter_handle;
> > +       acpi_handle device_handle;
> > +       acpi_handle search_handle;
> > +       int n;
> > +       int index;
> > +       u32 speed;
> > +       u32 min_speed;
> > +       u32 force_speed;
> > +};
> > +
> > +static const struct acpi_device_id i2c_acpi_ignored_device_ids[] = {
> > +        /*
> > +         * ACPI video acpi_devices, which are handled by the acpi-video driver
> > +         * sometimes contain a SERIAL_TYPE_I2C ACPI resource, ignore these.
> > +         */
> > +        { ACPI_VIDEO_HID, 0 },

I don't understand this, why are these devices special?  What other
devices might be special?  If the behavior of a host driver that's
independent of the PCI device driver makes it special, that seems like
a general problem with this approach, we can't have userspace and host
drivers competing for the same resources.

> > +        {}
> > +};
> > +
> > +static int i2c_acpi_get_info_for_node(struct acpi_resource *ares, void *data)
> > +{
> > +        struct i2c_acpi_lookup *lookup = data;
> > +        struct i2c_board_info *info = lookup->info;
> > +        struct acpi_resource_i2c_serialbus *sb;
> > +        acpi_status status;
> > +
> > +        if (info->addr || !i2c_acpi_get_i2c_resource(ares, &sb))
> > +                return 1;
> > +
> > +        if (lookup->index != -1 && lookup->n++ != lookup->index)
> > +                return 1;
> > +
> > +        status = acpi_get_handle(lookup->device_handle,
> > +                                 sb->resource_source.string_ptr,
> > +                                 &lookup->adapter_handle);
> > +        if (ACPI_FAILURE(status))
> > +                return 1;
> > +
> > +        info->addr = sb->slave_address;
> > +
> > +        return 1;
> > +}
> > +
> > +static int print_irq_info_if_i2c_slave(struct acpi_device *adev,
> > +                              struct i2c_acpi_lookup *lookup, acpi_handle adapter)
> > +{
> > +        struct i2c_board_info *info = lookup->info;
> > +        struct list_head resource_list, resource_list2;
> > +       struct resource_entry *entry;
> > +        int ret;
> > +
> > +        if (acpi_bus_get_status(adev) || !adev->status.present)
> > +                return -EINVAL;
> > +
> > +        if (acpi_match_device_ids(adev, i2c_acpi_ignored_device_ids) == 0)
> > +                return -ENODEV;
> > +
> > +        memset(info, 0, sizeof(*info));
> > +        lookup->device_handle = acpi_device_handle(adev);
> > +
> > +        /* Look up for I2cSerialBus resource */
> > +        INIT_LIST_HEAD(&resource_list);
> > +        ret = acpi_dev_get_resources(adev, &resource_list,
> > +                                     i2c_acpi_get_info_for_node, lookup);
> > +
> > +
> > +       if (ret < 0 || !info->addr) {
> > +               acpi_dev_free_resource_list(&resource_list);
> > +                return -EINVAL;
> > +       }
> > +
> > +        if (adapter) {
> > +                /* The adapter must match the one in I2cSerialBus() connector */
> > +                if (adapter != lookup->adapter_handle)
> > +                        return -ENODEV;
> > +       }
> > +
> > +        INIT_LIST_HEAD(&resource_list2);
> > +       ret = acpi_dev_get_resources(adev, &resource_list2, NULL, NULL);
> > +       if (ret < 0)
> > +               return -EINVAL;
> > +
> > +        resource_list_for_each_entry(entry, &resource_list2) {

Let me see if I understand what's happening here, first we walk all the
ACPI resources of the device to verify that it does have a
ACPI_RESOURCE_SERIAL_TYPE_I2C and we're able to get a handle for that
pathname.  info->addr is some throw-away data we use to determine
success (seems like we could return -1 on ACPI_FAILURE to stop
iterating at that point as we won't get past the index check for
anything else, and avoid info->addr).  Then once we know the device
includes the necessary i2c type, we collect all the resources (for some
reason using a different list even though our previous callback only
returns 1 so will never have anything added to the list).  Right?

> > +                if (resource_type(entry->res) == IORESOURCE_IRQ) {  
> 
> In reference to Eric's question, it's also simple to check for other
> types of resources here (like IORESOURCE_IO (I/O port) or
> IORESOURCE_MEM (MMIO)), which would allow for making those types of
> resources for devices behind the bus controller available to the guest
> as well. I think MMIO would be simple enough. I/O ports would require
> a way for VFIO to tell KVM to set the right bits in the VMCS I/O port
> bitmap* when initializing the guest so that the guest can access the
> needed I/O ports without trapping to the host. Or I/O ports could just
> be trapped and then the read/write carried out in the host.

The latter is how we support I/O port space, VFIO is a userspace driver
interface, not a KVM device interface.  I/O port access has not been
shown to be worth any further optimization.
 
> Alex, does it seem reasonable to consult ACPI in this way and use this
> info to make sub-device resources available in the guest? I don't
> anticipate a case where any sub-devices are DMA-capable and could
> circumvent IOMMU translation to write to host memory. Do you see any
> other potential pitfalls? I guess the only drawback I see is needing
> code in the VMM to copy chunks of the ACPI tables from host to guest
> to tell it about the sub-devices -- but that's pretty doable.

Is there a concern about shared versus exclusive resources?  For
example an interrupt could be shared among multiple i2c endpoints,
right?  We don't know who it might be shared with and we don't know
enough about our endpoint to know how to determine if our endpoint is
signaling the interrupt or how to mask it from signaling the interrupt
at the device level.  I assume where we handle SET_IRQS for this device
specific interrupt we'd therefore never use IRQF_SHARED, such that we
require an exclusive interrupt.  MMIO and I/O port resources seem
straightforward to expose as device specific regions, but makes me
curious what we're really exposing.  I'm nervous that we're blurring
the line between one device and another, for example if an i2c endpoint
is only accessible via the i2c controller on the PCI device then we can
claim the user owns those sub-devices, but if the device responds to
MMIO or I/O port space independent of the ownership of the PCI i2c
controller itself, how do we know we're not conflicting with an ACPI
based driver that's already attached to this sub-device?  Is this
essentially the concern with the ACPI_VIDEO_HID exclusion above?  Do we
need to take the same approach of requiring an exclusive interrupt by
using request_resource() and error on conflicts?

Seems like it might be prudent to not only have a Kconfig but require
and opt-in for this support at the vfio-pci module level.

Somehow we also need to factor in usability, for example how do we
provide enough context in describing the device specific region/irq to
the user that they can associate the object and perhaps avoid
independently parsing AML or providing a magic firmware blob.

> I think this might be Intel specific, would have to check how to do
> the equivalent thing on AMD.

This is all generic ACPI though, are you only suggesting different
conventions between the firmware implementations between vendors?
 
> > +                        printk(KERN_EMERG "ACPI i2c client device %s uses irq %d\n",
> > +                                               dev_name(&adev->dev), entry->res->start);
> > +                        break;
> > +                }
> > +        }
> > +
> > +        acpi_dev_free_resource_list(&resource_list2);
> > +
> > +        return 0;
> > +}
> > +
> > +static int i2c_acpi_get_info(struct acpi_device *adev,
> > +                             struct i2c_board_info *info,
> > +                             acpi_handle adapter,
> > +                             acpi_handle *adapter_handle)
> > +{
> > +        struct i2c_acpi_lookup lookup;
> > +
> > +
> > +        memset(&lookup, 0, sizeof(lookup));
> > +        lookup.info = info;
> > +        lookup.index = -1;
> > +
> > +        if (acpi_device_enumerated(adev))
> > +                return -EINVAL;
> > +
> > +        return print_irq_info_if_i2c_slave(adev, &lookup, adapter);
> > +}
> > +
> > +static acpi_status process_acpi_node(acpi_handle handle, u32 level,
> > +                                       void *data, void **return_value)
> > +{
> > +        acpi_handle adapter = data;
> > +        struct acpi_device *adev;
> > +        struct i2c_board_info info;
> > +
> > +        if (acpi_bus_get_device(handle, &adev))
> > +                return AE_OK;
> > +
> > +        if (i2c_acpi_get_info(adev, &info, adapter, NULL))
> > +                return AE_OK;
> > +
> > +        return AE_OK;
> > +}
> > +
> > +#define MAX_SCAN_DEPTH 32

Arbitrary?  Thanks,

Alex

> > +
> > +void acpi_print_irqs_if_i2c(acpi_handle handle)
> > +{
> > +        acpi_status status;
> > +
> > +        status = acpi_walk_namespace(ACPI_TYPE_DEVICE, handle,
> > +                                     MAX_SCAN_DEPTH,
> > +                                     process_acpi_node, NULL,
> > +                                     handle, NULL);
> > +        if (ACPI_FAILURE(status))
> > +                printk(KERN_EMERG "failed to enumerate ACPI devices\n");
> > +}
> > +
> > +static void print_irqs_if_i2c_adapter(struct device *dev) {
> > +       acpi_handle handle = ACPI_HANDLE(dev);
> > +       acpi_print_irqs_if_i2c(handle);
> > +}
> > +
> >  static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >  {
> >         struct vfio_pci_device *vdev;
> >         struct iommu_group *group;
> >         int ret;
> >
> > +        if (has_acpi_companion(&pdev->dev))
> > +               print_irqs_if_i2c_adapter(&pdev->dev);
> > +
> >         if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
> >                 return -EINVAL;
> >
> > --
> > 2.26.2
> >  
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] vfio: PoC patch for printing IRQs used by i2c devices
  2020-06-19 18:51                         ` Alex Williamson
@ 2020-06-19 20:00                           ` Micah Morton
  2020-06-22 21:59                             ` Micah Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Micah Morton @ 2020-06-19 20:00 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Paolo Bonzini, Auger Eric, kvm

On Fri, Jun 19, 2020 at 11:51 AM Alex Williamson
<alex.williamson@redhat.com> wrote:
>
> On Tue, 9 Jun 2020 13:19:19 -0700
> Micah Morton <mortonm@chromium.org> wrote:
>
> > On Wed, Jun 3, 2020 at 11:23 AM Micah Morton <mortonm@chromium.org> wrote:
> > >
> > > This patch accesses the ACPI system from vfio_pci_probe to print out
> > > info if the PCI device being attached to vfio_pci is an i2c adapter with
> > > associated i2c client devices that use platform IRQs. The info printed
> > > out is the IRQ numbers that are associated with the given i2c client
> > > devices. This patch doesn't attempt to forward any additional IRQs into
> > > the guest, but shows how it could be possible.
> > >
> > > Signed-off-by: Micah Morton <mortonm@chromium.org>
> > >
> > > Change-Id: I5c77d35f246781a4a80703820860631e2c2091cf
> > > ---
> > > What do you guys think about having code like this somewhere in
> > > vfio_pci? There would have to be some logic added in vfio_pci to forward
> > > these IRQs when they are found. For reference, below is what is printed
> > > out during vfio_pci_probe on my development machine when I attach some
> > > I2C adapter PCI devices to vfio_pci:
> > >
> > > [   48.742699] ACPI i2c client device WCOM50C1:00 uses irq 31
> > > [   53.913295] ACPI i2c client device GOOG0005:00 uses irq 24
> > > [   58.040076] ACPI i2c client device ACPI0C50:00 uses irq 51
> > >
> > > Ideally we could add code like this for other bus types (e.g. SPI).
> > >
> > > NOTE: developed on v5.4
>
> Sorry for the delay, this took some time to reach the top of the heap
> and process.
>
> > >  drivers/vfio/pci/vfio_pci.c | 158 ++++++++++++++++++++++++++++++++++++
>
> I think we'd want a separate file with Kconfig options to turn it off
> if we were to consider adding this sort of thing.  The ACPI
> dependencies may not be present on all platforms anyway.

Agreed.

>
> > >  1 file changed, 158 insertions(+)
> > >
> > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > index 02206162eaa9..9ce3f34aa548 100644
> > > --- a/drivers/vfio/pci/vfio_pci.c
> > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > @@ -28,6 +28,10 @@
> > >  #include <linux/vgaarb.h>
> > >  #include <linux/nospec.h>
> > >
> > > +#include <linux/acpi.h>
> > > +#include <acpi/actypes.h>
> > > +#include <linux/i2c.h>
> > > +
> > >  #include "vfio_pci_private.h"
> > >
> > >  #define DRIVER_VERSION  "0.2"
> > > @@ -1289,12 +1293,166 @@ static const struct vfio_device_ops vfio_pci_ops = {
> > >  static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
> > >  static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck);
> > >
> > > +
> > > +struct i2c_acpi_lookup {
> > > +       struct i2c_board_info *info;
> > > +       acpi_handle adapter_handle;
> > > +       acpi_handle device_handle;
> > > +       acpi_handle search_handle;
> > > +       int n;
> > > +       int index;
> > > +       u32 speed;
> > > +       u32 min_speed;
> > > +       u32 force_speed;
> > > +};
> > > +
> > > +static const struct acpi_device_id i2c_acpi_ignored_device_ids[] = {
> > > +        /*
> > > +         * ACPI video acpi_devices, which are handled by the acpi-video driver
> > > +         * sometimes contain a SERIAL_TYPE_I2C ACPI resource, ignore these.
> > > +         */
> > > +        { ACPI_VIDEO_HID, 0 },
>
> I don't understand this, why are these devices special?  What other
> devices might be special?  If the behavior of a host driver that's
> independent of the PCI device driver makes it special, that seems like
> a general problem with this approach, we can't have userspace and host
> drivers competing for the same resources.

Sorry, I just copied this code from drivers/i2c/i2c-core-acpi.c. It
should have been removed. I think it is irrelevant to our discussion.
In that case the i2c code is looking for i2c clients to register in
the system and wants to avoid registering certain blacklisted devices
and avoid attaching to them. This is not relevant to simply passing
through the irq/MMIO/io port resources for an i2c client device. VFIO
wouldn't actually be trying to attach any driver to the i2c clients in
this case, and would defer to the host to choose what it wants to
attach to.

>
> > > +        {}
> > > +};
> > > +
> > > +static int i2c_acpi_get_info_for_node(struct acpi_resource *ares, void *data)
> > > +{
> > > +        struct i2c_acpi_lookup *lookup = data;
> > > +        struct i2c_board_info *info = lookup->info;
> > > +        struct acpi_resource_i2c_serialbus *sb;
> > > +        acpi_status status;
> > > +
> > > +        if (info->addr || !i2c_acpi_get_i2c_resource(ares, &sb))
> > > +                return 1;
> > > +
> > > +        if (lookup->index != -1 && lookup->n++ != lookup->index)
> > > +                return 1;
> > > +
> > > +        status = acpi_get_handle(lookup->device_handle,
> > > +                                 sb->resource_source.string_ptr,
> > > +                                 &lookup->adapter_handle);
> > > +        if (ACPI_FAILURE(status))
> > > +                return 1;
> > > +
> > > +        info->addr = sb->slave_address;
> > > +
> > > +        return 1;
> > > +}
> > > +
> > > +static int print_irq_info_if_i2c_slave(struct acpi_device *adev,
> > > +                              struct i2c_acpi_lookup *lookup, acpi_handle adapter)
> > > +{
> > > +        struct i2c_board_info *info = lookup->info;
> > > +        struct list_head resource_list, resource_list2;
> > > +       struct resource_entry *entry;
> > > +        int ret;
> > > +
> > > +        if (acpi_bus_get_status(adev) || !adev->status.present)
> > > +                return -EINVAL;
> > > +
> > > +        if (acpi_match_device_ids(adev, i2c_acpi_ignored_device_ids) == 0)
> > > +                return -ENODEV;
> > > +
> > > +        memset(info, 0, sizeof(*info));
> > > +        lookup->device_handle = acpi_device_handle(adev);
> > > +
> > > +        /* Look up for I2cSerialBus resource */
> > > +        INIT_LIST_HEAD(&resource_list);
> > > +        ret = acpi_dev_get_resources(adev, &resource_list,
> > > +                                     i2c_acpi_get_info_for_node, lookup);
> > > +
> > > +
> > > +       if (ret < 0 || !info->addr) {
> > > +               acpi_dev_free_resource_list(&resource_list);
> > > +                return -EINVAL;
> > > +       }
> > > +
> > > +        if (adapter) {
> > > +                /* The adapter must match the one in I2cSerialBus() connector */
> > > +                if (adapter != lookup->adapter_handle)
> > > +                        return -ENODEV;
> > > +       }
> > > +
> > > +        INIT_LIST_HEAD(&resource_list2);
> > > +       ret = acpi_dev_get_resources(adev, &resource_list2, NULL, NULL);
> > > +       if (ret < 0)
> > > +               return -EINVAL;
> > > +
> > > +        resource_list_for_each_entry(entry, &resource_list2) {
>
> Let me see if I understand what's happening here, first we walk all the
> ACPI resources of the device to verify that it does have a
> ACPI_RESOURCE_SERIAL_TYPE_I2C and we're able to get a handle for that
> pathname.  info->addr is some throw-away data we use to determine
> success (seems like we could return -1 on ACPI_FAILURE to stop
> iterating at that point as we won't get past the index check for
> anything else, and avoid info->addr).  Then once we know the device
> includes the necessary i2c type, we collect all the resources (for some
> reason using a different list even though our previous callback only
> returns 1 so will never have anything added to the list).  Right?

Yeah, its sloppy PoC code yanked out of drivers/i2c/i2c-core-acpi.c
for the most part. I tried to cut out the easy parts of the code that
didn't need to be there, but there would obviously be more work to do
in optimizing the code for what we would actually want to happen in
VFIO. Seems like we're on the same page on all the important points
though.

>
> > > +                if (resource_type(entry->res) == IORESOURCE_IRQ) {
> >
> > In reference to Eric's question, it's also simple to check for other
> > types of resources here (like IORESOURCE_IO (I/O port) or
> > IORESOURCE_MEM (MMIO)), which would allow for making those types of
> > resources for devices behind the bus controller available to the guest
> > as well. I think MMIO would be simple enough. I/O ports would require
> > a way for VFIO to tell KVM to set the right bits in the VMCS I/O port
> > bitmap* when initializing the guest so that the guest can access the
> > needed I/O ports without trapping to the host. Or I/O ports could just
> > be trapped and then the read/write carried out in the host.
>
> The latter is how we support I/O port space, VFIO is a userspace driver
> interface, not a KVM device interface.  I/O port access has not been
> shown to be worth any further optimization.

Makes sense, thanks.

>
> > Alex, does it seem reasonable to consult ACPI in this way and use this
> > info to make sub-device resources available in the guest? I don't
> > anticipate a case where any sub-devices are DMA-capable and could
> > circumvent IOMMU translation to write to host memory. Do you see any
> > other potential pitfalls? I guess the only drawback I see is needing
> > code in the VMM to copy chunks of the ACPI tables from host to guest
> > to tell it about the sub-devices -- but that's pretty doable.
>
> Is there a concern about shared versus exclusive resources?  For
> example an interrupt could be shared among multiple i2c endpoints,
> right?  We don't know who it might be shared with and we don't know
> enough about our endpoint to know how to determine if our endpoint is
> signaling the interrupt or how to mask it from signaling the interrupt
> at the device level.  I assume where we handle SET_IRQS for this device
> specific interrupt we'd therefore never use IRQF_SHARED, such that we
> require an exclusive interrupt.  MMIO and I/O port resources seem
> straightforward to expose as device specific regions, but makes me
> curious what we're really exposing.  I'm nervous that we're blurring
> the line between one device and another, for example if an i2c endpoint
> is only accessible via the i2c controller on the PCI device then we can
> claim the user owns those sub-devices, but if the device responds to
> MMIO or I/O port space independent of the ownership of the PCI i2c
> controller itself, how do we know we're not conflicting with an ACPI
> based driver that's already attached to this sub-device?  Is this
> essentially the concern with the ACPI_VIDEO_HID exclusion above?  Do we
> need to take the same approach of requiring an exclusive interrupt by
> using request_resource() and error on conflicts?

I'm going to do some further thinking/research on this and will reply
sometime early next week. Answering all the other questions now
though.

>
> Seems like it might be prudent to not only have a Kconfig but require
> and opt-in for this support at the vfio-pci module level.

Agreed.

>
> Somehow we also need to factor in usability, for example how do we
> provide enough context in describing the device specific region/irq to
> the user that they can associate the object and perhaps avoid
> independently parsing AML or providing a magic firmware blob.
>
> > I think this might be Intel specific, would have to check how to do
> > the equivalent thing on AMD.
>
> This is all generic ACPI though, are you only suggesting different
> conventions between the firmware implementations between vendors?

Sorry, there was supposed to be an asterisk before that line pointing
to the VMCS I/O port bitmap stuff. All I was saying is that the VMCS
stuff may be Intel-specific. Doesn't really matter however as you
pointed out its better just to emulate the I/O port access anyway.

>
> > > +                        printk(KERN_EMERG "ACPI i2c client device %s uses irq %d\n",
> > > +                                               dev_name(&adev->dev), entry->res->start);
> > > +                        break;
> > > +                }
> > > +        }
> > > +
> > > +        acpi_dev_free_resource_list(&resource_list2);
> > > +
> > > +        return 0;
> > > +}
> > > +
> > > +static int i2c_acpi_get_info(struct acpi_device *adev,
> > > +                             struct i2c_board_info *info,
> > > +                             acpi_handle adapter,
> > > +                             acpi_handle *adapter_handle)
> > > +{
> > > +        struct i2c_acpi_lookup lookup;
> > > +
> > > +
> > > +        memset(&lookup, 0, sizeof(lookup));
> > > +        lookup.info = info;
> > > +        lookup.index = -1;
> > > +
> > > +        if (acpi_device_enumerated(adev))
> > > +                return -EINVAL;
> > > +
> > > +        return print_irq_info_if_i2c_slave(adev, &lookup, adapter);
> > > +}
> > > +
> > > +static acpi_status process_acpi_node(acpi_handle handle, u32 level,
> > > +                                       void *data, void **return_value)
> > > +{
> > > +        acpi_handle adapter = data;
> > > +        struct acpi_device *adev;
> > > +        struct i2c_board_info info;
> > > +
> > > +        if (acpi_bus_get_device(handle, &adev))
> > > +                return AE_OK;
> > > +
> > > +        if (i2c_acpi_get_info(adev, &info, adapter, NULL))
> > > +                return AE_OK;
> > > +
> > > +        return AE_OK;
> > > +}
> > > +
> > > +#define MAX_SCAN_DEPTH 32
>
> Arbitrary?  Thanks,

Again just copied what was done in drivers/i2c/i2c-core-acpi.c. Not
really sure how they came up with that number. Seems potentially
arbitrary.

>
> Alex

Thanks for the reply, I'll get back to you with my thoughts on shared
vs exclusive resource stuff.

>
> > > +
> > > +void acpi_print_irqs_if_i2c(acpi_handle handle)
> > > +{
> > > +        acpi_status status;
> > > +
> > > +        status = acpi_walk_namespace(ACPI_TYPE_DEVICE, handle,
> > > +                                     MAX_SCAN_DEPTH,
> > > +                                     process_acpi_node, NULL,
> > > +                                     handle, NULL);
> > > +        if (ACPI_FAILURE(status))
> > > +                printk(KERN_EMERG "failed to enumerate ACPI devices\n");
> > > +}
> > > +
> > > +static void print_irqs_if_i2c_adapter(struct device *dev) {
> > > +       acpi_handle handle = ACPI_HANDLE(dev);
> > > +       acpi_print_irqs_if_i2c(handle);
> > > +}
> > > +
> > >  static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > >  {
> > >         struct vfio_pci_device *vdev;
> > >         struct iommu_group *group;
> > >         int ret;
> > >
> > > +        if (has_acpi_companion(&pdev->dev))
> > > +               print_irqs_if_i2c_adapter(&pdev->dev);
> > > +
> > >         if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
> > >                 return -EINVAL;
> > >
> > > --
> > > 2.26.2
> > >
> >
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] vfio: PoC patch for printing IRQs used by i2c devices
  2020-06-19 20:00                           ` Micah Morton
@ 2020-06-22 21:59                             ` Micah Morton
  0 siblings, 0 replies; 19+ messages in thread
From: Micah Morton @ 2020-06-22 21:59 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Paolo Bonzini, Auger Eric, kvm

On Fri, Jun 19, 2020 at 1:00 PM Micah Morton <mortonm@chromium.org> wrote:
>
> On Fri, Jun 19, 2020 at 11:51 AM Alex Williamson
> <alex.williamson@redhat.com> wrote:
> >
> > On Tue, 9 Jun 2020 13:19:19 -0700
> > Micah Morton <mortonm@chromium.org> wrote:
> >
> > > On Wed, Jun 3, 2020 at 11:23 AM Micah Morton <mortonm@chromium.org> wrote:
> > > >
> > > > This patch accesses the ACPI system from vfio_pci_probe to print out
> > > > info if the PCI device being attached to vfio_pci is an i2c adapter with
> > > > associated i2c client devices that use platform IRQs. The info printed
> > > > out is the IRQ numbers that are associated with the given i2c client
> > > > devices. This patch doesn't attempt to forward any additional IRQs into
> > > > the guest, but shows how it could be possible.
> > > >
> > > > Signed-off-by: Micah Morton <mortonm@chromium.org>
> > > >
> > > > Change-Id: I5c77d35f246781a4a80703820860631e2c2091cf
> > > > ---
> > > > What do you guys think about having code like this somewhere in
> > > > vfio_pci? There would have to be some logic added in vfio_pci to forward
> > > > these IRQs when they are found. For reference, below is what is printed
> > > > out during vfio_pci_probe on my development machine when I attach some
> > > > I2C adapter PCI devices to vfio_pci:
> > > >
> > > > [   48.742699] ACPI i2c client device WCOM50C1:00 uses irq 31
> > > > [   53.913295] ACPI i2c client device GOOG0005:00 uses irq 24
> > > > [   58.040076] ACPI i2c client device ACPI0C50:00 uses irq 51
> > > >
> > > > Ideally we could add code like this for other bus types (e.g. SPI).
> > > >
> > > > NOTE: developed on v5.4
> >
> > Sorry for the delay, this took some time to reach the top of the heap
> > and process.
> >
> > > >  drivers/vfio/pci/vfio_pci.c | 158 ++++++++++++++++++++++++++++++++++++
> >
> > I think we'd want a separate file with Kconfig options to turn it off
> > if we were to consider adding this sort of thing.  The ACPI
> > dependencies may not be present on all platforms anyway.
>
> Agreed.
>
> >
> > > >  1 file changed, 158 insertions(+)
> > > >
> > > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > > index 02206162eaa9..9ce3f34aa548 100644
> > > > --- a/drivers/vfio/pci/vfio_pci.c
> > > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > > @@ -28,6 +28,10 @@
> > > >  #include <linux/vgaarb.h>
> > > >  #include <linux/nospec.h>
> > > >
> > > > +#include <linux/acpi.h>
> > > > +#include <acpi/actypes.h>
> > > > +#include <linux/i2c.h>
> > > > +
> > > >  #include "vfio_pci_private.h"
> > > >
> > > >  #define DRIVER_VERSION  "0.2"
> > > > @@ -1289,12 +1293,166 @@ static const struct vfio_device_ops vfio_pci_ops = {
> > > >  static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
> > > >  static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck);
> > > >
> > > > +
> > > > +struct i2c_acpi_lookup {
> > > > +       struct i2c_board_info *info;
> > > > +       acpi_handle adapter_handle;
> > > > +       acpi_handle device_handle;
> > > > +       acpi_handle search_handle;
> > > > +       int n;
> > > > +       int index;
> > > > +       u32 speed;
> > > > +       u32 min_speed;
> > > > +       u32 force_speed;
> > > > +};
> > > > +
> > > > +static const struct acpi_device_id i2c_acpi_ignored_device_ids[] = {
> > > > +        /*
> > > > +         * ACPI video acpi_devices, which are handled by the acpi-video driver
> > > > +         * sometimes contain a SERIAL_TYPE_I2C ACPI resource, ignore these.
> > > > +         */
> > > > +        { ACPI_VIDEO_HID, 0 },
> >
> > I don't understand this, why are these devices special?  What other
> > devices might be special?  If the behavior of a host driver that's
> > independent of the PCI device driver makes it special, that seems like
> > a general problem with this approach, we can't have userspace and host
> > drivers competing for the same resources.
>
> Sorry, I just copied this code from drivers/i2c/i2c-core-acpi.c. It
> should have been removed. I think it is irrelevant to our discussion.
> In that case the i2c code is looking for i2c clients to register in
> the system and wants to avoid registering certain blacklisted devices
> and avoid attaching to them. This is not relevant to simply passing
> through the irq/MMIO/io port resources for an i2c client device. VFIO
> wouldn't actually be trying to attach any driver to the i2c clients in
> this case, and would defer to the host to choose what it wants to
> attach to.
>
> >
> > > > +        {}
> > > > +};
> > > > +
> > > > +static int i2c_acpi_get_info_for_node(struct acpi_resource *ares, void *data)
> > > > +{
> > > > +        struct i2c_acpi_lookup *lookup = data;
> > > > +        struct i2c_board_info *info = lookup->info;
> > > > +        struct acpi_resource_i2c_serialbus *sb;
> > > > +        acpi_status status;
> > > > +
> > > > +        if (info->addr || !i2c_acpi_get_i2c_resource(ares, &sb))
> > > > +                return 1;
> > > > +
> > > > +        if (lookup->index != -1 && lookup->n++ != lookup->index)
> > > > +                return 1;
> > > > +
> > > > +        status = acpi_get_handle(lookup->device_handle,
> > > > +                                 sb->resource_source.string_ptr,
> > > > +                                 &lookup->adapter_handle);
> > > > +        if (ACPI_FAILURE(status))
> > > > +                return 1;
> > > > +
> > > > +        info->addr = sb->slave_address;
> > > > +
> > > > +        return 1;
> > > > +}
> > > > +
> > > > +static int print_irq_info_if_i2c_slave(struct acpi_device *adev,
> > > > +                              struct i2c_acpi_lookup *lookup, acpi_handle adapter)
> > > > +{
> > > > +        struct i2c_board_info *info = lookup->info;
> > > > +        struct list_head resource_list, resource_list2;
> > > > +       struct resource_entry *entry;
> > > > +        int ret;
> > > > +
> > > > +        if (acpi_bus_get_status(adev) || !adev->status.present)
> > > > +                return -EINVAL;
> > > > +
> > > > +        if (acpi_match_device_ids(adev, i2c_acpi_ignored_device_ids) == 0)
> > > > +                return -ENODEV;
> > > > +
> > > > +        memset(info, 0, sizeof(*info));
> > > > +        lookup->device_handle = acpi_device_handle(adev);
> > > > +
> > > > +        /* Look up for I2cSerialBus resource */
> > > > +        INIT_LIST_HEAD(&resource_list);
> > > > +        ret = acpi_dev_get_resources(adev, &resource_list,
> > > > +                                     i2c_acpi_get_info_for_node, lookup);
> > > > +
> > > > +
> > > > +       if (ret < 0 || !info->addr) {
> > > > +               acpi_dev_free_resource_list(&resource_list);
> > > > +                return -EINVAL;
> > > > +       }
> > > > +
> > > > +        if (adapter) {
> > > > +                /* The adapter must match the one in I2cSerialBus() connector */
> > > > +                if (adapter != lookup->adapter_handle)
> > > > +                        return -ENODEV;
> > > > +       }
> > > > +
> > > > +        INIT_LIST_HEAD(&resource_list2);
> > > > +       ret = acpi_dev_get_resources(adev, &resource_list2, NULL, NULL);
> > > > +       if (ret < 0)
> > > > +               return -EINVAL;
> > > > +
> > > > +        resource_list_for_each_entry(entry, &resource_list2) {
> >
> > Let me see if I understand what's happening here, first we walk all the
> > ACPI resources of the device to verify that it does have a
> > ACPI_RESOURCE_SERIAL_TYPE_I2C and we're able to get a handle for that
> > pathname.  info->addr is some throw-away data we use to determine
> > success (seems like we could return -1 on ACPI_FAILURE to stop
> > iterating at that point as we won't get past the index check for
> > anything else, and avoid info->addr).  Then once we know the device
> > includes the necessary i2c type, we collect all the resources (for some
> > reason using a different list even though our previous callback only
> > returns 1 so will never have anything added to the list).  Right?
>
> Yeah, its sloppy PoC code yanked out of drivers/i2c/i2c-core-acpi.c
> for the most part. I tried to cut out the easy parts of the code that
> didn't need to be there, but there would obviously be more work to do
> in optimizing the code for what we would actually want to happen in
> VFIO. Seems like we're on the same page on all the important points
> though.
>
> >
> > > > +                if (resource_type(entry->res) == IORESOURCE_IRQ) {
> > >
> > > In reference to Eric's question, it's also simple to check for other
> > > types of resources here (like IORESOURCE_IO (I/O port) or
> > > IORESOURCE_MEM (MMIO)), which would allow for making those types of
> > > resources for devices behind the bus controller available to the guest
> > > as well. I think MMIO would be simple enough. I/O ports would require
> > > a way for VFIO to tell KVM to set the right bits in the VMCS I/O port
> > > bitmap* when initializing the guest so that the guest can access the
> > > needed I/O ports without trapping to the host. Or I/O ports could just
> > > be trapped and then the read/write carried out in the host.
> >
> > The latter is how we support I/O port space, VFIO is a userspace driver
> > interface, not a KVM device interface.  I/O port access has not been
> > shown to be worth any further optimization.
>
> Makes sense, thanks.
>
> >
> > > Alex, does it seem reasonable to consult ACPI in this way and use this
> > > info to make sub-device resources available in the guest? I don't
> > > anticipate a case where any sub-devices are DMA-capable and could
> > > circumvent IOMMU translation to write to host memory. Do you see any
> > > other potential pitfalls? I guess the only drawback I see is needing
> > > code in the VMM to copy chunks of the ACPI tables from host to guest
> > > to tell it about the sub-devices -- but that's pretty doable.
> >
> > Is there a concern about shared versus exclusive resources?  For
> > example an interrupt could be shared among multiple i2c endpoints,
> > right?  We don't know who it might be shared with and we don't know
> > enough about our endpoint to know how to determine if our endpoint is
> > signaling the interrupt or how to mask it from signaling the interrupt
> > at the device level.  I assume where we handle SET_IRQS for this device
> > specific interrupt we'd therefore never use IRQF_SHARED, such that we
> > require an exclusive interrupt.  MMIO and I/O port resources seem

I think requiring an exclusive interrupt makes sense, since the
request_irq() in VFIO will fail when any other driver in the host has
a handler registered for that irq (shared or otherwise). This gets
into the territory of the user/VMM potentially needing to unbind an
ACPI driver from a device before assignment of the PCI
controller/adapter to the guest -- I guess that’s not too much to ask
given the similar precedent for PCI devices. In fact, that would
probably be a requirement for this use case anyway to make sure no
drivers in the host are still trying to talk to the MMIOs/ports for a
device that is being assigned to a guest.

The only alternative I can think of to using exclusively exclusive
IRQs would be to have platform-device-specific routines in vfio-pci to
check if the device whose IRQ we want to forward into the guest was
the one that triggered the shared IRQ (sort of like what is done for
reset drivers in vfio-platform:
https://elixir.bootlin.com/linux/latest/source/drivers/vfio/platform/reset).
This quirk-type approach is probably not the most appealing though.

> > straightforward to expose as device specific regions, but makes me
> > curious what we're really exposing.  I'm nervous that we're blurring
> > the line between one device and another, for example if an i2c endpoint
> > is only accessible via the i2c controller on the PCI device then we can
> > claim the user owns those sub-devices, but if the device responds to
> > MMIO or I/O port space independent of the ownership of the PCI i2c
> > controller itself, how do we know we're not conflicting with an ACPI
> > based driver that's already attached to this sub-device?  Is this
> > essentially the concern with the ACPI_VIDEO_HID exclusion above?  Do we
> > need to take the same approach of requiring an exclusive interrupt by
> > using request_resource() and error on conflicts?

When we discover ACPI sub-devices behind a given PCI device we could
just check whether there is any driver attached to that device and
deny binding the adapter to vfio-pci if so? The acpi_device struct we
get from acpi_bus_get_device() has a “driver” field that points to an
acpi_driver struct. I think if we can guarantee there are no drivers
attached to the sub-devices then we are in the clear in terms of
drivers in the host and guest fighting over MMIOs/ports. What do you
think? Maybe additional safeguards like request_resource() make sense
as well.

>
> I'm going to do some further thinking/research on this and will reply
> sometime early next week. Answering all the other questions now
> though.
>
> >
> > Seems like it might be prudent to not only have a Kconfig but require
> > and opt-in for this support at the vfio-pci module level.
>
> Agreed.
>
> >
> > Somehow we also need to factor in usability, for example how do we
> > provide enough context in describing the device specific region/irq to
> > the user that they can associate the object and perhaps avoid
> > independently parsing AML or providing a magic firmware blob.
> >
> > > I think this might be Intel specific, would have to check how to do
> > > the equivalent thing on AMD.
> >
> > This is all generic ACPI though, are you only suggesting different
> > conventions between the firmware implementations between vendors?
>
> Sorry, there was supposed to be an asterisk before that line pointing
> to the VMCS I/O port bitmap stuff. All I was saying is that the VMCS
> stuff may be Intel-specific. Doesn't really matter however as you
> pointed out its better just to emulate the I/O port access anyway.
>
> >
> > > > +                        printk(KERN_EMERG "ACPI i2c client device %s uses irq %d\n",
> > > > +                                               dev_name(&adev->dev), entry->res->start);
> > > > +                        break;
> > > > +                }
> > > > +        }
> > > > +
> > > > +        acpi_dev_free_resource_list(&resource_list2);
> > > > +
> > > > +        return 0;
> > > > +}
> > > > +
> > > > +static int i2c_acpi_get_info(struct acpi_device *adev,
> > > > +                             struct i2c_board_info *info,
> > > > +                             acpi_handle adapter,
> > > > +                             acpi_handle *adapter_handle)
> > > > +{
> > > > +        struct i2c_acpi_lookup lookup;
> > > > +
> > > > +
> > > > +        memset(&lookup, 0, sizeof(lookup));
> > > > +        lookup.info = info;
> > > > +        lookup.index = -1;
> > > > +
> > > > +        if (acpi_device_enumerated(adev))
> > > > +                return -EINVAL;
> > > > +
> > > > +        return print_irq_info_if_i2c_slave(adev, &lookup, adapter);
> > > > +}
> > > > +
> > > > +static acpi_status process_acpi_node(acpi_handle handle, u32 level,
> > > > +                                       void *data, void **return_value)
> > > > +{
> > > > +        acpi_handle adapter = data;
> > > > +        struct acpi_device *adev;
> > > > +        struct i2c_board_info info;
> > > > +
> > > > +        if (acpi_bus_get_device(handle, &adev))
> > > > +                return AE_OK;
> > > > +
> > > > +        if (i2c_acpi_get_info(adev, &info, adapter, NULL))
> > > > +                return AE_OK;
> > > > +
> > > > +        return AE_OK;
> > > > +}
> > > > +
> > > > +#define MAX_SCAN_DEPTH 32
> >
> > Arbitrary?  Thanks,
>
> Again just copied what was done in drivers/i2c/i2c-core-acpi.c. Not
> really sure how they came up with that number. Seems potentially
> arbitrary.
>
> >
> > Alex
>
> Thanks for the reply, I'll get back to you with my thoughts on shared
> vs exclusive resource stuff.
>
> >
> > > > +
> > > > +void acpi_print_irqs_if_i2c(acpi_handle handle)
> > > > +{
> > > > +        acpi_status status;
> > > > +
> > > > +        status = acpi_walk_namespace(ACPI_TYPE_DEVICE, handle,
> > > > +                                     MAX_SCAN_DEPTH,
> > > > +                                     process_acpi_node, NULL,
> > > > +                                     handle, NULL);
> > > > +        if (ACPI_FAILURE(status))
> > > > +                printk(KERN_EMERG "failed to enumerate ACPI devices\n");
> > > > +}
> > > > +
> > > > +static void print_irqs_if_i2c_adapter(struct device *dev) {
> > > > +       acpi_handle handle = ACPI_HANDLE(dev);
> > > > +       acpi_print_irqs_if_i2c(handle);
> > > > +}
> > > > +
> > > >  static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > > >  {
> > > >         struct vfio_pci_device *vdev;
> > > >         struct iommu_group *group;
> > > >         int ret;
> > > >
> > > > +        if (has_acpi_companion(&pdev->dev))
> > > > +               print_irqs_if_i2c_adapter(&pdev->dev);
> > > > +
> > > >         if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
> > > >                 return -EINVAL;
> > > >
> > > > --
> > > > 2.26.2
> > > >
> > >
> >

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-06-22 21:59 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-11 22:00 [RFC PATCH] KVM: Add module for IRQ forwarding Micah Morton
2020-05-12 17:14 ` Alex Williamson
2020-05-13  7:02   ` Paolo Bonzini
2020-05-13 14:34     ` Alex Williamson
2020-05-13 15:23       ` Paolo Bonzini
2020-05-13 19:10         ` Micah Morton
2020-05-13 22:05           ` Paolo Bonzini
2020-05-14 17:44             ` Micah Morton
2020-05-14 21:17               ` Paolo Bonzini
2020-05-14 22:43                 ` Alex Williamson
2020-05-15 20:30                   ` Micah Morton
2020-06-03 18:23                     ` [PATCH] vfio: PoC patch for printing IRQs used by i2c devices Micah Morton
2020-06-09 20:19                       ` Micah Morton
2020-06-19 18:51                         ` Alex Williamson
2020-06-19 20:00                           ` Micah Morton
2020-06-22 21:59                             ` Micah Morton
2020-05-15 10:11               ` [RFC PATCH] KVM: Add module for IRQ forwarding Auger Eric
2020-05-15 16:39                 ` Micah Morton
2020-05-13 21:52     ` Micah Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).