* [RFC 0/3] Introduce Vdmabuf driver
@ 2021-01-19  8:28 Vivek Kasireddy
  2021-01-19  8:28 ` [RFC 1/3] virtio: " Vivek Kasireddy
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Vivek Kasireddy @ 2021-01-19  8:28 UTC (permalink / raw)
  To: virtualization; +Cc: dongwon.kim

The Virtual dmabuf or Virtio-based dmabuf (Vdmabuf) driver can be used
to "transfer" a page-backed dmabuf created in the Guest to the Host
without making any copies. A use-case where this driver would be a good
fit is a multi-GPU system (perhaps one discrete and one integrated)
where one of the GPUs does not have access to the display/connectors/outputs.
This could be an embedded system design decision or a restriction made at
the firmware/BIOS level. When such a GPU is passed through to a Guest OS,
this driver can help transfer the scanout buffer(s) (rendered using the
native rendering stack) to the Host for the purpose of displaying them.

The userspace component running in the Guest that transfers the dmabuf
is referred to as the producer or exporter, and its counterpart running
in the Host is referred to as the importer or consumer. For instance, a
Wayland compositor would potentially be a producer and the Qemu UI would
be a consumer. It is the producer's responsibility not to reuse or
destroy the shared buffer while it is still being used by the consumer.
The consumer sends a release cmd indicating that it is done, after
which the shared buffer can safely be used again by the producer. One
way the producer can prevent accidental re-use of the shared buffer is
to lock the buffer when it exports it and unlock it after it gets a
release cmd. As an example, the GBM API provides a simple way to lock
and unlock a surface's buffers.
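
To make the expected producer flow concrete, below is a rough,
uncompiled sketch written against the uapi header added in patch 1.
The device node path (derived from the miscdevice name) and the error
handling are assumptions:

  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/virtio_vdmabuf.h>

  static int share_and_wait(int dmabuf_fd)
  {
      struct virtio_vdmabuf_export exp = { .fd = dmabuf_fd };
      struct virtio_vdmabuf_e_hdr hdr;
      int vfd;

      vfd = open("/dev/virtio-vdmabuf", O_RDWR);
      if (vfd < 0)
          return -1;

      /* lock the surface first, e.g. gbm_surface_lock_front_buffer() */

      if (ioctl(vfd, VIRTIO_VDMABUF_IOCTL_EXPORT, &exp) < 0) {
          close(vfd);
          return -1;
      }

      /* exp.buf_id now identifies this buffer across the system */

      /* block until the Host sends a release event for the buffer */
      if (read(vfd, &hdr, sizeof(hdr)) >= (ssize_t)sizeof(hdr)) {
          /* compare hdr.buf_id with exp.buf_id, then unlock and
           * reuse the buffer, e.g. gbm_surface_release_buffer()
           */
      }

      close(vfd);
      return 0;
  }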

For each dmabuf that is to be shared with the Host, a 128-bit unique
ID is generated that identifies this buffer across the whole system.
This ID is a combination of the Qemu process ID, a counter and a
randomizer. We could potentially use the UUID API, but we currently use
the above-mentioned combination so that the source of the buffer can be
identified at any given time for bookkeeping.
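
For reference, this is roughly how the ID is put together in patch 1
(a paraphrase of get_buf_id()/NEW_BUF_ID_GEN; vmid is the value the
Host reports for this Guest, i.e. the Qemu process ID mentioned above):

  buf_id.id = ((vmid & 0xFFFFFFFF) << 32) | (counter & 0xFFFFFFFF);
  get_random_bytes(&buf_id.rng_key[0], 8);  /* two 32-bit random keys */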

Vivek Kasireddy (3):
  virtio: Introduce Vdmabuf driver
  vhost: Add Vdmabuf backend
  vfio: Share the KVM instance with Vdmabuf

 drivers/vfio/vfio.c                 |    9 +
 drivers/vhost/Kconfig               |    9 +
 drivers/vhost/Makefile              |    3 +
 drivers/vhost/vdmabuf.c             | 1332 +++++++++++++++++++++++++++
 drivers/virtio/Kconfig              |    8 +
 drivers/virtio/Makefile             |    1 +
 drivers/virtio/virtio_vdmabuf.c     |  973 +++++++++++++++++++
 include/linux/virtio_vdmabuf.h      |  271 ++++++
 include/uapi/linux/virtio_ids.h     |    1 +
 include/uapi/linux/virtio_vdmabuf.h |   99 ++
 10 files changed, 2706 insertions(+)
 create mode 100644 drivers/vhost/vdmabuf.c
 create mode 100644 drivers/virtio/virtio_vdmabuf.c
 create mode 100644 include/linux/virtio_vdmabuf.h
 create mode 100644 include/uapi/linux/virtio_vdmabuf.h

-- 
2.26.2



* [RFC 1/3] virtio: Introduce Vdmabuf driver
  2021-01-19  8:28 [RFC 0/3] Introduce Vdmabuf driver Vivek Kasireddy
@ 2021-01-19  8:28 ` Vivek Kasireddy
  2021-01-19  8:28 ` [RFC 2/3] vhost: Add Vdmabuf backend Vivek Kasireddy
  2021-01-19  8:28 ` [RFC 3/3] vfio: Share the KVM instance with Vdmabuf Vivek Kasireddy
  2 siblings, 0 replies; 10+ messages in thread
From: Vivek Kasireddy @ 2021-01-19  8:28 UTC (permalink / raw)
  To: virtualization; +Cc: dongwon.kim

This driver "transfers" a dmabuf created in the Guest to the Host.
A common use-case for such a transfer is sharing the scanout
buffer created by a display server or a compositor running in the
Guest with the Qemu UI -- running on the Host.

The "transfer" is accomplished by sharing the PFNs of all the pages
associated with the dmabuf and having a new dmabuf created on the
Host that is backed by the pages mapped from the Guest.
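
To make the mechanism concrete, the reference structure built by
virtio_vdmabuf_share_buf() below boils down to the following
(simplified, error handling and casts omitted):

  /* level-2 pages hold the guest physical addresses of the buffer pages */
  for (i = 0; i < nents; i++)
      l2refs[i] = page_to_phys(pages[i]);

  /* a single level-3 page holds the physical addresses of the l2 pages */
  for (i = 0; i < n_l2refs; i++)
      l3refs[i] = virt_to_phys((void *)l2refs + i * PAGE_SIZE);

  /* only the address of the l3 page is sent to the Host in the EXPORT msg */
  ref = virt_to_phys(l3refs);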

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
 drivers/virtio/Kconfig              |   8 +
 drivers/virtio/Makefile             |   1 +
 drivers/virtio/virtio_vdmabuf.c     | 973 ++++++++++++++++++++++++++++
 include/linux/virtio_vdmabuf.h      | 271 ++++++++
 include/uapi/linux/virtio_ids.h     |   1 +
 include/uapi/linux/virtio_vdmabuf.h |  99 +++
 6 files changed, 1353 insertions(+)
 create mode 100644 drivers/virtio/virtio_vdmabuf.c
 create mode 100644 include/linux/virtio_vdmabuf.h
 create mode 100644 include/uapi/linux/virtio_vdmabuf.h

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 7b41130d3f35..e563c12f711e 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -139,4 +139,12 @@ config VIRTIO_DMA_SHARED_BUFFER
 	 This option adds a flavor of dma buffers that are backed by
 	 virtio resources.
 
+config VIRTIO_VDMABUF
+	bool "Enables Vdmabuf driver in guest os"
+	default n
+	depends on VIRTIO
+	help
+	 This driver provides a way to share the dmabufs created in
+	 the Guest with the Host.
+
 endif # VIRTIO_MENU
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 591e6f72aa54..b4bb0738009c 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
 obj-$(CONFIG_VIRTIO_VDPA) += virtio_vdpa.o
 obj-$(CONFIG_VIRTIO_MEM) += virtio_mem.o
 obj-$(CONFIG_VIRTIO_DMA_SHARED_BUFFER) += virtio_dma_buf.o
+obj-$(CONFIG_VIRTIO_VDMABUF) += virtio_vdmabuf.o
diff --git a/drivers/virtio/virtio_vdmabuf.c b/drivers/virtio/virtio_vdmabuf.c
new file mode 100644
index 000000000000..e377114c2a2b
--- /dev/null
+++ b/drivers/virtio/virtio_vdmabuf.c
@@ -0,0 +1,973 @@
+// SPDX-License-Identifier: (MIT OR GPL-2.0)
+
+/*
+ * Copyright © 2021 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Dongwon Kim <dongwon.kim@intel.com>
+ *    Mateusz Polrola <mateusz.polrola@gmail.com>
+ *    Vivek Kasireddy <vivek.kasireddy@intel.com>
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/uaccess.h>
+#include <linux/miscdevice.h>
+#include <linux/delay.h>
+#include <linux/random.h>
+#include <linux/poll.h>
+#include <linux/spinlock.h>
+#include <linux/dma-buf.h>
+#include <linux/virtio.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_vdmabuf.h>
+
+#define VIRTIO_VDMABUF_MAX_ID INT_MAX
+#define REFS_PER_PAGE (PAGE_SIZE/sizeof(long))
+#define NEW_BUF_ID_GEN(vmid, cnt) (((vmid & 0xFFFFFFFF) << 32) | \
+				    ((cnt) & 0xFFFFFFFF))
+
+/* one global drv object */
+static struct virtio_vdmabuf_info *drv_info;
+
+struct virtio_vdmabuf {
+	/* virtio device structure */
+	struct virtio_device *vdev;
+
+	/* virtual queue array */
+	struct virtqueue *vq;
+
+	/* ID of guest OS */
+	u64 vmid;
+
+	/* spin lock that needs to be acquired before accessing
+	 * virtual queue
+	 */
+	spinlock_t vq_lock;
+	struct mutex rx_lock;
+
+	/* workqueue */
+	struct workqueue_struct *wq;
+	struct work_struct rx_work;
+	struct virtio_vdmabuf_event_queue *evq;
+};
+
+static virtio_vdmabuf_buf_id_t get_buf_id(void)
+{
+	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
+	virtio_vdmabuf_buf_id_t buf_id = {0, {0, 0} };
+	static int count = 0;
+
+	count = count < VIRTIO_VDMABUF_MAX_ID ? count + 1 : 0;
+	buf_id.id = NEW_BUF_ID_GEN(vdmabuf->vmid, count);
+
+	/* random data embedded in the id for security */
+	get_random_bytes(&buf_id.rng_key[0], 8);
+
+	return buf_id;
+}
+
+/* sharing pages for original DMABUF with Host */
+static struct virtio_vdmabuf_shared_pages
+*virtio_vdmabuf_share_buf(struct page **pages, int nents,
+			  int first_ofst, int last_len)
+{
+	struct virtio_vdmabuf_shared_pages *pages_info;
+	int i;
+	int n_l2refs = nents/REFS_PER_PAGE +
+		       ((nents % REFS_PER_PAGE) ? 1 : 0);
+
+	pages_info = kvcalloc(1, sizeof(*pages_info), GFP_KERNEL);
+	if (!pages_info)
+		return NULL;
+
+	pages_info->pages = pages;
+	pages_info->nents = nents;
+	pages_info->first_ofst = first_ofst;
+	pages_info->last_len = last_len;
+	pages_info->l3refs = (gpa_t *)__get_free_page(GFP_KERNEL);
+
+	if (!pages_info->l3refs) {
+		kvfree(pages_info);
+		return NULL;
+	}
+
+	pages_info->l2refs = (gpa_t **)__get_free_pages(GFP_KERNEL,
+					get_order(n_l2refs * PAGE_SIZE));
+
+	if (!pages_info->l2refs) {
+		free_page((gpa_t)pages_info->l3refs);
+		kvfree(pages_info);
+		return NULL;
+	}
+
+	/* Share physical address of pages */
+	for (i = 0; i < nents; i++)
+		pages_info->l2refs[i] = (gpa_t *)page_to_phys(pages[i]);
+
+	for (i = 0; i < n_l2refs; i++)
+		pages_info->l3refs[i] =
+			virt_to_phys((void *)pages_info->l2refs +
+				     i * PAGE_SIZE);
+
+	pages_info->ref = (gpa_t)virt_to_phys(pages_info->l3refs);
+
+	return pages_info;
+}
+
+/* stop sharing pages */
+static void
+virtio_vdmabuf_free_buf(struct virtio_vdmabuf_shared_pages *pages_info)
+{
+	int n_l2refs = (pages_info->nents/REFS_PER_PAGE +
+		       ((pages_info->nents % REFS_PER_PAGE) ? 1 : 0));
+
+	free_pages((gpa_t)pages_info->l2refs, get_order(n_l2refs * PAGE_SIZE));
+	free_page((gpa_t)pages_info->l3refs);
+
+	kvfree(pages_info);
+}
+
+static int send_msg_to_host(enum virtio_vdmabuf_cmd cmd, int *op)
+{
+	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
+	struct virtio_vdmabuf_msg *msg;
+	struct scatterlist sg;
+	unsigned long flags;
+	int i, tx_size, ret = 0;
+
+	switch (cmd) {
+	case VIRTIO_VDMABUF_CMD_NEED_VMID:
+		msg = kvcalloc(1, sizeof(struct virtio_vdmabuf_msg),
+			       GFP_KERNEL);
+		if (!msg)
+			return -ENOMEM;
+
+		if (op)
+			for (i = 0; i < 4; i++)
+				msg->op[i] = op[i];
+
+		tx_size = sizeof(struct virtio_vdmabuf_msg);
+		break;
+
+	case VIRTIO_VDMABUF_CMD_EXPORT:
+		msg = kvcalloc(1, sizeof(struct virtio_vdmabuf_msg),
+			       GFP_KERNEL);
+		if (!msg)
+			return -ENOMEM;
+
+		memcpy(&msg->op[0], &op[0], 9 * sizeof(int) + op[9]);
+		tx_size = sizeof(struct virtio_vdmabuf_msg);
+		break;
+
+	default:
+		/* no command found */
+		return -EINVAL;
+	}
+
+	msg->cmd = cmd;
+	sg_init_one(&sg, msg, tx_size);
+
+	spin_lock_irqsave(&vdmabuf->vq_lock, flags);
+
+	ret = virtqueue_add_inbuf(vdmabuf->vq, &sg, 1, msg, GFP_KERNEL);
+	if (ret)
+		goto err;
+
+	ret = virtqueue_kick(vdmabuf->vq);
+err:
+	spin_unlock_irqrestore(&vdmabuf->vq_lock, flags);
+
+	return ret;
+}
+
+static int add_event_buf_rel(struct virtio_vdmabuf_buf *buf_info)
+{
+	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
+	struct virtio_vdmabuf_event *e_oldest, *e_new;
+	struct virtio_vdmabuf_event_queue *eq = vdmabuf->evq;
+	unsigned long irqflags;
+
+	e_new = kvzalloc(sizeof(*e_new), GFP_KERNEL);
+	if (!e_new)
+		return -ENOMEM;
+
+	e_new->e_data.hdr.buf_id = buf_info->buf_id;
+	e_new->e_data.data = (void *)buf_info->priv;
+	e_new->e_data.hdr.size = buf_info->sz_priv;
+
+	spin_lock_irqsave(&eq->e_lock, irqflags);
+
+	/* check current number of events and if it hits the max num (32)
+	 * then remove the oldest event in the list
+	 */
+	if (eq->pending > 31) {
+		e_oldest = list_first_entry(&eq->e_list,
+					    struct virtio_vdmabuf_event, link);
+		list_del(&e_oldest->link);
+		eq->pending--;
+		kvfree(e_oldest);
+	}
+
+	list_add_tail(&e_new->link, &eq->e_list);
+
+	eq->pending++;
+
+	wake_up_interruptible(&eq->e_wait);
+	spin_unlock_irqrestore(&eq->e_lock, irqflags);
+
+	return 0;
+}
+
+static void virtio_vdmabuf_clear_buf(struct virtio_vdmabuf_buf *exp)
+{
+	/* Start cleanup of buffer in reverse order to exporting */
+	virtio_vdmabuf_free_buf(exp->pages_info);
+
+	dma_buf_unmap_attachment(exp->attach, exp->sgt,
+				 DMA_BIDIRECTIONAL);
+
+	if (exp->dma_buf) {
+		dma_buf_detach(exp->dma_buf, exp->attach);
+		/* close connection to dma-buf completely */
+		dma_buf_put(exp->dma_buf);
+		exp->dma_buf = NULL;
+	}
+}
+
+static int remove_buf(struct virtio_vdmabuf *vdmabuf,
+		      struct virtio_vdmabuf_buf *exp)
+{
+	int ret;
+
+	ret = add_event_buf_rel(exp);
+	if (ret)
+		return ret;
+
+	virtio_vdmabuf_clear_buf(exp);
+
+	ret = virtio_vdmabuf_del_buf(drv_info, &exp->buf_id);
+	if (ret)
+		return ret;
+
+	if (exp->sz_priv > 0 && exp->priv)
+		kvfree(exp->priv);
+
+	kvfree(exp);
+	return 0;
+}
+
+static int parse_msg_from_host(struct virtio_vdmabuf *vdmabuf,
+		     	       struct virtio_vdmabuf_msg *msg)
+{
+	struct virtio_vdmabuf_buf *exp;
+	virtio_vdmabuf_buf_id_t buf_id;
+	int ret;
+
+	/* empty message not allowed and ignored */
+	if (!msg->cmd) {
+		dev_err(drv_info->dev, "empty cmd\n");
+		return -EINVAL;
+	}
+
+	memcpy(&buf_id, msg->op, sizeof(buf_id));
+
+	exp = virtio_vdmabuf_find_buf(drv_info, &buf_id);
+	if (!exp) {
+		dev_err(drv_info->dev, "can't find buffer\n");
+		return -EINVAL;
+	}
+
+	switch (msg->cmd) {
+	case VIRTIO_VDMABUF_CMD_DMABUF_REL:
+		ret = remove_buf(vdmabuf, exp);
+		if (ret)
+			return ret;
+
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static void virtio_vdmabuf_rx_work(struct work_struct *work)
+{
+	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
+	struct virtqueue *vq = vdmabuf->vq;
+	struct virtio_vdmabuf_msg *msg;
+	unsigned int sz;
+
+	mutex_lock(&vdmabuf->rx_lock);
+
+	do {
+		virtqueue_disable_cb(vq);
+		msg = virtqueue_get_buf(vq, &sz);
+		if (!msg)
+			break;
+
+		/* valid size */
+		if (sz == sizeof(struct virtio_vdmabuf_msg)) {
+			if (msg->cmd == VIRTIO_VDMABUF_CMD_NEED_VMID) {
+				vdmabuf->vmid = msg->op[0];
+			} else {
+				if (parse_msg_from_host(vdmabuf, msg))
+					dev_err(drv_info->dev,
+						"msg parse error\n");
+			}
+		} else {
+			dev_err(drv_info->dev,
+				"received malformed message\n");
+		}
+	} while (!virtqueue_enable_cb(vq));
+
+	mutex_unlock(&vdmabuf->rx_lock);
+}
+
+static void rx_event(struct virtqueue *vq)
+{
+	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
+
+	queue_work(vdmabuf->wq, &vdmabuf->rx_work);
+}
+
+static int remove_all_bufs(struct virtio_vdmabuf *vdmabuf)
+{
+	struct virtio_vdmabuf_buf *found;
+	struct hlist_node *tmp;
+	int bkt;
+	int ret;
+
+	hash_for_each_safe(drv_info->buf_list, bkt, tmp, found, node) {
+		ret = remove_buf(vdmabuf, found);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int virtio_vdmabuf_open(struct inode *inode, struct file *filp)
+{
+	if (!drv_info) {
+		pr_err("virtio vdmabuf driver is not ready\n");
+		return -EINVAL;
+	}
+
+	filp->private_data = drv_info->priv;
+
+	return 0;
+}
+
+static int virtio_vdmabuf_release(struct inode *inode, struct file *filp)
+{
+	return 0;
+}
+
+/* Notify Host about the new vdmabuf */
+static int export_notify(struct virtio_vdmabuf_buf *exp, struct page **pages)
+{
+	int *op;
+	int ret;
+
+	op = kvcalloc(1, sizeof(int) * 65, GFP_KERNEL);
+	if (!op)
+		return -ENOMEM;
+
+	memcpy(op, &exp->buf_id, sizeof(exp->buf_id));
+
+	/* if new pages are to be shared */
+	if (pages) {
+		op[4] = exp->pages_info->nents;
+		op[5] = exp->pages_info->first_ofst;
+		op[6] = exp->pages_info->last_len;
+
+		memcpy(&op[7], &exp->pages_info->ref, sizeof(long));
+	}
+
+	op[9] = exp->sz_priv;
+
+	/* driver/application specific private info */
+	memcpy(&op[10], exp->priv, op[9]);
+
+	ret = send_msg_to_host(VIRTIO_VDMABUF_CMD_EXPORT, op);
+
+	kvfree(op);
+	return ret;
+}
+
+/* return total number of pages referenced by a sgt
+ * for pre-calculation of # of pages behind a given sgt
+ */
+static int num_pgs(struct sg_table *sgt)
+{
+	struct scatterlist *sgl;
+	int len, i;
+	/* at least one page */
+	int n_pgs = 1;
+
+	sgl = sgt->sgl;
+
+	len = sgl->length - PAGE_SIZE + sgl->offset;
+
+	/* round-up */
+	n_pgs += ((len + PAGE_SIZE - 1)/PAGE_SIZE);
+
+	for (i = 1; i < sgt->nents; i++) {
+		sgl = sg_next(sgl);
+
+		/* round-up */
+		n_pgs += ((sgl->length + PAGE_SIZE - 1) /
+			  PAGE_SIZE); /* round-up */
+	}
+
+	return n_pgs;
+}
+
+/* extract pages referenced by sgt */
+static struct page **extr_pgs(struct sg_table *sgt, int *nents, int *last_len)
+{
+	struct scatterlist *sgl;
+	struct page **pages;
+	struct page **temp_pgs;
+	int i, j;
+	int len;
+
+	*nents = num_pgs(sgt);
+	pages =	kvmalloc_array(*nents, sizeof(struct page *), GFP_KERNEL);
+	if (!pages)
+		return NULL;
+
+	sgl = sgt->sgl;
+
+	temp_pgs = pages;
+	*temp_pgs++ = sg_page(sgl);
+	len = sgl->length - PAGE_SIZE + sgl->offset;
+
+	i = 1;
+	while (len > 0) {
+		*temp_pgs++ = nth_page(sg_page(sgl), i++);
+		len -= PAGE_SIZE;
+	}
+
+	for (i = 1; i < sgt->nents; i++) {
+		sgl = sg_next(sgl);
+		*temp_pgs++ = sg_page(sgl);
+		len = sgl->length - PAGE_SIZE;
+		j = 1;
+
+		while (len > 0) {
+			*temp_pgs++ = nth_page(sg_page(sgl), j++);
+			len -= PAGE_SIZE;
+		}
+	}
+
+	*last_len = len + PAGE_SIZE;
+
+	return pages;
+}
+
+/* ioctl - exporting new vdmabuf
+ *
+ *	 int dmabuf_fd - File handle of original DMABUF
+ *	 virtio_vdmabuf_buf_id_t buf_id - returned vdmabuf ID
+ *	 int sz_priv - size of private data from userspace
+ *	 char *priv - buffer of user private data
+ *
+ */
+static int export_ioctl(struct file *filp, void *data)
+{
+	struct virtio_vdmabuf_export *attr = data;
+	struct dma_buf *dmabuf;
+	struct dma_buf_attachment *attach;
+	struct sg_table *sgt;
+	struct virtio_vdmabuf_buf *exp;
+	struct page **pages;
+	int nents, last_len;
+	virtio_vdmabuf_buf_id_t buf_id;
+	int ret = 0;
+
+	dmabuf = dma_buf_get(attr->fd);
+	if (IS_ERR(dmabuf))
+		return PTR_ERR(dmabuf);
+
+	mutex_lock(&drv_info->g_mutex);
+
+	buf_id = get_buf_id();
+
+	attach = dma_buf_attach(dmabuf, drv_info->dev);
+	if (IS_ERR(attach)) {
+		ret = PTR_ERR(attach);
+		goto fail_attach;
+	}
+
+	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
+	if (IS_ERR(sgt)) {
+		ret = PTR_ERR(sgt);
+		goto fail_map_attachment;
+	}
+
+	/* allocate a new exp */
+	exp = kvcalloc(1, sizeof(*exp), GFP_KERNEL);
+	if (!exp) {
+		ret = -ENOMEM;
+		goto fail_sgt_info_creation;
+	}
+
+	/* possible truncation */
+	if (attr->sz_priv > MAX_SIZE_PRIV_DATA)
+		exp->sz_priv = MAX_SIZE_PRIV_DATA;
+	else
+		exp->sz_priv = attr->sz_priv;
+
+	/* creating buffer for private data */
+	if (exp->sz_priv != 0) {
+		exp->priv = kvcalloc(1, exp->sz_priv, GFP_KERNEL);
+		if (!exp->priv) {
+			ret = -ENOMEM;
+			goto fail_priv_creation;
+		}
+	}
+
+	exp->buf_id = buf_id;
+	exp->attach = attach;
+	exp->sgt = sgt;
+	exp->dma_buf = dmabuf;
+	exp->valid = 1;
+
+	if (exp->sz_priv) {
+		/* copy private data to sgt_info */
+		ret = copy_from_user(exp->priv, attr->priv, exp->sz_priv);
+		if (ret) {
+			ret = -EINVAL;
+			goto fail_exp;
+		}
+	}
+
+	pages = extr_pgs(sgt, &nents, &last_len);
+	if (pages == NULL) {
+		ret = -ENOMEM;
+		goto fail_exp;
+	}
+
+	exp->pages_info = virtio_vdmabuf_share_buf(pages, nents,
+						   sgt->sgl->offset,
+					 	   last_len);
+	if (!exp->pages_info) {
+		ret = -ENOMEM;
+		goto fail_create_pages_info;
+	}
+
+	attr->buf_id = exp->buf_id;
+	ret = export_notify(exp, pages);
+	if (ret < 0)
+		goto fail_send_request;
+
+	/* now register it to the export list */
+	virtio_vdmabuf_add_buf(drv_info, exp);
+
+	exp->filp = filp;
+
+	mutex_unlock(&drv_info->g_mutex);
+
+	return ret;
+
+/* Clean-up if error occurs */
+fail_send_request:
+	virtio_vdmabuf_free_buf(exp->pages_info);
+
+fail_create_pages_info:
+	kvfree(pages);
+
+fail_exp:
+	kvfree(exp->priv);
+
+fail_priv_creation:
+	kvfree(exp);
+
+fail_sgt_info_creation:
+	dma_buf_unmap_attachment(attach, sgt,
+				 DMA_BIDIRECTIONAL);
+
+fail_map_attachment:
+	dma_buf_detach(dmabuf, attach);
+
+fail_attach:
+	dma_buf_put(dmabuf);
+
+	mutex_unlock(&drv_info->g_mutex);
+
+	return ret;
+}
+
+static const struct virtio_vdmabuf_ioctl_desc virtio_vdmabuf_ioctls[] = {
+	VIRTIO_VDMABUF_IOCTL_DEF(VIRTIO_VDMABUF_IOCTL_EXPORT, export_ioctl, 0),
+};
+
+static long virtio_vdmabuf_ioctl(struct file *filp, unsigned int cmd,
+		       		 unsigned long param)
+{
+	const struct virtio_vdmabuf_ioctl_desc *ioctl = NULL;
+	unsigned int nr = _IOC_NR(cmd);
+	int ret;
+	virtio_vdmabuf_ioctl_t func;
+	char *kdata;
+
+	if (nr >= ARRAY_SIZE(virtio_vdmabuf_ioctls)) {
+		dev_err(drv_info->dev, "invalid ioctl\n");
+		return -EINVAL;
+	}
+
+	ioctl = &virtio_vdmabuf_ioctls[nr];
+
+	func = ioctl->func;
+
+	if (unlikely(!func)) {
+		dev_err(drv_info->dev, "no function\n");
+		return -EINVAL;
+	}
+
+	kdata = kvmalloc(_IOC_SIZE(cmd), GFP_KERNEL);
+	if (!kdata)
+		return -ENOMEM;
+
+	if (copy_from_user(kdata, (void __user *)param,
+			   _IOC_SIZE(cmd)) != 0) {
+		dev_err(drv_info->dev,
+			"failed to copy from user arguments\n");
+		ret = -EFAULT;
+		goto ioctl_error;
+	}
+
+	ret = func(filp, kdata);
+
+	if (copy_to_user((void __user *)param, kdata,
+			 _IOC_SIZE(cmd)) != 0) {
+		dev_err(drv_info->dev,
+			"failed to copy to user arguments\n");
+		ret = -EFAULT;
+		goto ioctl_error;
+	}
+
+ioctl_error:
+	kvfree(kdata);
+	return ret;
+}
+
+static unsigned int virtio_vdmabuf_event_poll(struct file *filp,
+			    	    	      struct poll_table_struct *wait)
+{
+	struct virtio_vdmabuf *vdmabuf = filp->private_data;
+
+	poll_wait(filp, &vdmabuf->evq->e_wait, wait);
+
+	if (!list_empty(&vdmabuf->evq->e_list))
+		return POLLIN | POLLRDNORM;
+
+	return 0;
+}
+
+static ssize_t virtio_vdmabuf_event_read(struct file *filp, char __user *buf,
+			       		 size_t cnt, loff_t *ofst)
+{
+	struct virtio_vdmabuf *vdmabuf = filp->private_data;
+	int ret;
+
+	/* make sure user buffer can be written */
+	if (!access_ok(buf, sizeof (*buf))) {
+		dev_err(drv_info->dev, "user buffer can't be written.\n");
+		return -EINVAL;
+	}
+
+	ret = mutex_lock_interruptible(&vdmabuf->evq->e_readlock);
+	if (ret)
+		return ret;
+
+	for (;;) {
+		struct virtio_vdmabuf_event *e = NULL;
+
+		spin_lock_irq(&vdmabuf->evq->e_lock);
+		if (!list_empty(&vdmabuf->evq->e_list)) {
+			e = list_first_entry(&vdmabuf->evq->e_list,
+					     struct virtio_vdmabuf_event, link);
+			list_del(&e->link);
+		}
+		spin_unlock_irq(&vdmabuf->evq->e_lock);
+
+		if (!e) {
+			if (ret)
+				break;
+
+			if (filp->f_flags & O_NONBLOCK) {
+				ret = -EAGAIN;
+				break;
+			}
+
+			mutex_unlock(&vdmabuf->evq->e_readlock);
+			ret = wait_event_interruptible(vdmabuf->evq->e_wait,
+					!list_empty(&vdmabuf->evq->e_list));
+
+			if (ret == 0)
+				ret = mutex_lock_interruptible(
+						&vdmabuf->evq->e_readlock);
+
+			if (ret)
+				return ret;
+		} else {
+			unsigned int len = (sizeof(e->e_data.hdr) +
+					    e->e_data.hdr.size);
+
+			if (len > cnt - ret) {
+put_back_event:
+				spin_lock_irq(&vdmabuf->evq->e_lock);
+				list_add(&e->link, &vdmabuf->evq->e_list);
+				spin_unlock_irq(&vdmabuf->evq->e_lock);
+				break;
+			}
+
+			if (copy_to_user(buf + ret, &e->e_data.hdr,
+					 sizeof(e->e_data.hdr))) {
+				if (ret == 0)
+					ret = -EFAULT;
+
+				goto put_back_event;
+			}
+
+			ret += sizeof(e->e_data.hdr);
+
+			if (copy_to_user(buf + ret, e->e_data.data,
+					 e->e_data.hdr.size)) {
+				/* error while copying void *data */
+
+				struct virtio_vdmabuf_e_hdr dummy_hdr = {0};
+
+				ret -= sizeof(e->e_data.hdr);
+
+				/* nullifying hdr of the event in user buffer */
+				if (copy_to_user(buf + ret, &dummy_hdr,
+						 sizeof(dummy_hdr)))
+					dev_err(drv_info->dev,
+					   "fail to nullify invalid hdr\n");
+
+				ret = -EFAULT;
+
+				goto put_back_event;
+			}
+
+			ret += e->e_data.hdr.size;
+			vdmabuf->evq->pending--;
+			kvfree(e);
+		}
+	}
+
+	mutex_unlock(&vdmabuf->evq->e_readlock);
+
+	return ret;
+}
+
+static const struct file_operations virtio_vdmabuf_fops = {
+	.owner = THIS_MODULE,
+	.open = virtio_vdmabuf_open,
+	.release = virtio_vdmabuf_release,
+	.read = virtio_vdmabuf_event_read,
+	.poll = virtio_vdmabuf_event_poll,
+	.unlocked_ioctl = virtio_vdmabuf_ioctl,
+};
+
+static struct miscdevice virtio_vdmabuf_miscdev = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "virtio-vdmabuf",
+	.fops = &virtio_vdmabuf_fops,
+};
+
+static int virtio_vdmabuf_vdev_probe(struct virtio_device *vdev)
+{
+	vq_callback_t *cb[] = {rx_event};
+	static const char * const name[] = {"virtio_vdmabuf_virtqueue"};
+	struct virtio_vdmabuf *vdmabuf;
+	int ret = 0;
+
+	if (!drv_info)
+		return -EINVAL;
+
+	vdmabuf = drv_info->priv;
+
+	if (!vdmabuf)
+		return -EINVAL;
+
+	vdmabuf->vdev = vdev;
+	vdev->priv = vdmabuf;
+
+	/* initialize spinlock for synchronizing virtqueue accesses */
+	spin_lock_init(&vdmabuf->vq_lock);
+
+	ret = virtio_find_vqs(vdmabuf->vdev, 1, &vdmabuf->vq, cb,
+			      name, NULL);
+	if (ret) {
+		dev_err(drv_info->dev, "Cannot find any vqs\n");
+		return ret;
+	}
+
+	INIT_WORK(&vdmabuf->rx_work, virtio_vdmabuf_rx_work);
+
+	return ret;
+}
+
+static void virtio_vdmabuf_vdev_remove(struct virtio_device *vdev)
+{
+	struct virtio_vdmabuf *vdmabuf;
+
+	if (!drv_info)
+		return;
+
+	vdmabuf = drv_info->priv;
+	flush_work(&vdmabuf->rx_work);
+
+	vdev->config->reset(vdev);
+	vdev->config->del_vqs(vdev);
+}
+
+static void virtio_vdmabuf_vdev_scan(struct virtio_device *vdev)
+{
+	int ret;
+
+	/* Send VIRTIO_VDMABUF_CMD_NEED_VMID request to know vmid
+	 */
+	ret = send_msg_to_host(VIRTIO_VDMABUF_CMD_NEED_VMID, 0);
+	if (ret < 0)
+		dev_err(drv_info->dev, "fail to receive vmid\n");
+
+	return;
+}
+
+static struct virtio_device_id id_table[] = {
+	{ VIRTIO_ID_VDMABUF, VIRTIO_DEV_ANY_ID },
+	{ 0 },
+};
+
+static struct virtio_driver virtio_vdmabuf_vdev_drv = {
+	.driver.name =  KBUILD_MODNAME,
+	.driver.owner = THIS_MODULE,
+	.id_table =     id_table,
+	.probe =        virtio_vdmabuf_vdev_probe,
+	.remove =       virtio_vdmabuf_vdev_remove,
+	.scan =         virtio_vdmabuf_vdev_scan,
+};
+
+static int __init virtio_vdmabuf_init(void)
+{
+	struct virtio_vdmabuf *vdmabuf;
+	int ret = 0;
+
+	drv_info = NULL;
+
+	ret = misc_register(&virtio_vdmabuf_miscdev);
+	if (ret) {
+		pr_err("virtio-vdmabuf misc driver can't be registered\n");
+		return ret;
+	}
+
+	dma_coerce_mask_and_coherent(virtio_vdmabuf_miscdev.this_device,
+				     DMA_BIT_MASK(64));
+
+	drv_info = kvcalloc(1, sizeof(*drv_info), GFP_KERNEL);
+	if (!drv_info) {
+		misc_deregister(&virtio_vdmabuf_miscdev);
+		return -ENOMEM;
+	}
+
+	vdmabuf = kvcalloc(1, sizeof(*vdmabuf), GFP_KERNEL);
+	if (!vdmabuf) {
+		kvfree(drv_info);
+		misc_deregister(&virtio_vdmabuf_miscdev);
+		return -ENOMEM;
+	}
+
+	/* allocate the event queue before it is used below */
+	vdmabuf->evq = kvcalloc(1, sizeof(*vdmabuf->evq), GFP_KERNEL);
+	if (!vdmabuf->evq) {
+		kvfree(vdmabuf);
+		kvfree(drv_info);
+		misc_deregister(&virtio_vdmabuf_miscdev);
+		return -ENOMEM;
+	}
+
+	drv_info->priv = (void *)vdmabuf;
+	drv_info->dev = virtio_vdmabuf_miscdev.this_device;
+
+	mutex_init(&drv_info->g_mutex);
+
+	mutex_init(&vdmabuf->evq->e_readlock);
+	spin_lock_init(&vdmabuf->evq->e_lock);
+
+	INIT_LIST_HEAD(&vdmabuf->evq->e_list);
+	init_waitqueue_head(&vdmabuf->evq->e_wait);
+	hash_init(drv_info->buf_list);
+
+	vdmabuf->evq->pending = 0;
+	vdmabuf->wq = create_workqueue("virtio_vdmabuf_wq");
+
+	ret = register_virtio_driver(&virtio_vdmabuf_vdev_drv);
+	if (ret) {
+		dev_err(drv_info->dev, "vdmabuf driver can't be registered\n");
+		misc_deregister(&virtio_vdmabuf_miscdev);
+		kvfree(vdmabuf->evq);
+		kvfree(vdmabuf);
+		kvfree(drv_info);
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+static void __exit virtio_vdmabuf_deinit(void)
+{
+	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
+	struct virtio_vdmabuf_event *e, *et;
+	unsigned long irqflags;
+
+	misc_deregister(&virtio_vdmabuf_miscdev);
+	unregister_virtio_driver(&virtio_vdmabuf_vdev_drv);
+
+	if (vdmabuf->wq)
+		destroy_workqueue(vdmabuf->wq);
+
+	spin_lock_irqsave(&vdmabuf->evq->e_lock, irqflags);
+
+	list_for_each_entry_safe(e, et, &vdmabuf->evq->e_list,
+				 link) {
+		list_del(&e->link);
+		kvfree(e);
+		vdmabuf->evq->pending--;
+	}
+
+	spin_unlock_irqrestore(&vdmabuf->evq->e_lock, irqflags);
+
+	/* freeing all exported buffers */
+	remove_all_bufs(vdmabuf);
+
+	kvfree(vdmabuf->evq);
+	kvfree(vdmabuf);
+	kvfree(drv_info);
+}
+
+module_init(virtio_vdmabuf_init);
+module_exit(virtio_vdmabuf_deinit);
+
+MODULE_DEVICE_TABLE(virtio, id_table);
+MODULE_DESCRIPTION("Virtio Vdmabuf frontend driver");
+MODULE_LICENSE("GPL and additional rights");
diff --git a/include/linux/virtio_vdmabuf.h b/include/linux/virtio_vdmabuf.h
new file mode 100644
index 000000000000..3b15024230c0
--- /dev/null
+++ b/include/linux/virtio_vdmabuf.h
@@ -0,0 +1,271 @@
+/* SPDX-License-Identifier: (MIT OR GPL-2.0) */
+
+/*
+ * Copyright © 2021 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#ifndef _LINUX_VIRTIO_VDMABUF_H
+#define _LINUX_VIRTIO_VDMABUF_H
+
+#include <uapi/linux/virtio_vdmabuf.h>
+#include <linux/hashtable.h>
+#include <linux/kvm_types.h>
+
+struct virtio_vdmabuf_shared_pages {
+	/* cross-VM ref addr for the buffer */
+	gpa_t ref;
+
+	/* page array */
+	struct page **pages;
+	gpa_t **l2refs;
+	gpa_t *l3refs;
+
+	/* data offset in the first page
+	 * and data length in the last page
+	 */
+	int first_ofst;
+	int last_len;
+
+	/* number of shared pages */
+	int nents;
+};
+
+struct virtio_vdmabuf_buf {
+	virtio_vdmabuf_buf_id_t buf_id;
+
+	struct dma_buf_attachment *attach;
+	struct dma_buf *dma_buf;
+	struct sg_table *sgt;
+	struct virtio_vdmabuf_shared_pages *pages_info;
+	int vmid;
+
+	/* validity of the buffer */
+	bool valid;
+
+	/* set if the buffer is imported via import_ioctl */
+	bool imported;
+
+	/* size of private */
+	size_t sz_priv;
+	/* private data associated with the exported buffer */
+	void *priv;
+
+	struct file *filp;
+	struct hlist_node node;
+};
+
+struct virtio_vdmabuf_event {
+	struct virtio_vdmabuf_e_data e_data;
+	struct list_head link;
+};
+
+struct virtio_vdmabuf_event_queue {
+	wait_queue_head_t e_wait;
+	struct list_head e_list;
+
+	spinlock_t e_lock;
+	struct mutex e_readlock;
+
+	/* # of pending events */
+	int pending;
+};
+
+/* driver information */
+struct virtio_vdmabuf_info {
+	struct device *dev;
+
+	struct list_head head_vdmabuf_list;
+	struct list_head kvm_instances;
+
+	DECLARE_HASHTABLE(buf_list, 7);
+
+	void *priv;
+	struct mutex g_mutex;
+};
+
+/* IOCTL definitions
+ */
+typedef int (*virtio_vdmabuf_ioctl_t)(struct file *filp, void *data);
+
+struct virtio_vdmabuf_ioctl_desc {
+	unsigned int cmd;
+	int flags;
+	virtio_vdmabuf_ioctl_t func;
+	const char *name;
+};
+
+#define VIRTIO_VDMABUF_IOCTL_DEF(ioctl, _func, _flags)	\
+	[_IOC_NR(ioctl)] = {			\
+			.cmd = ioctl,		\
+			.func = _func,		\
+			.flags = _flags,	\
+			.name = #ioctl		\
+}
+
+#define VIRTIO_VDMABUF_VMID(buf_id) ((((buf_id).id) >> 32) & 0xFFFFFFFF)
+
+/* Messages between Host and Guest */
+
+/* List of commands from Guest to Host:
+ *
+ * ------------------------------------------------------------------
+ * A. NEED_VMID
+ *
+ *  guest asks the host to provide its vmid
+ *
+ * req:
+ *
+ * cmd: VIRTIO_VDMABUF_NEED_VMID
+ *
+ * ack:
+ *
+ * cmd: same as req
+ * op[0] : vmid of guest
+ *
+ * ------------------------------------------------------------------
+ * B. EXPORT
+ *
+ *  export dmabuf to host
+ *
+ * req:
+ *
+ * cmd: VIRTIO_VDMABUF_CMD_EXPORT
+ * op0~op3 : HDMABUF ID
+ * op4 : number of pages to be shared
+ * op5 : offset of data in the first page
+ * op6 : length of data in the last page
+ * op7 : upper 32 bit of top-level ref of shared buf
+ * op8 : lower 32 bit of top-level ref of shared buf
+ * op9 : size of private data
+ * op10 ~ op64: User private data associated with the buffer
+ *	        (e.g. graphic buffer's meta info)
+ *
+ * ------------------------------------------------------------------
+ *
+ * List of commands from Host to Guest
+ *
+ * ------------------------------------------------------------------
+ * A. RELEASE
+ *
+ *  notifying guest that the shared buffer is released by an importer
+ *
+ * req:
+ *
+ * cmd: VIRTIO_VDMABUF_CMD_DMABUF_REL
+ * op0~op3 : VDMABUF ID
+ *
+ * ------------------------------------------------------------------
+ */
+
+/* msg structures */
+struct virtio_vdmabuf_msg {
+	unsigned int req_id;
+	unsigned int stat;
+	unsigned int cmd;
+	unsigned int op[64];
+};
+
+struct virtio_vdmabuf_txmsg {
+	struct virtio_vdmabuf_msg msg;
+	void __user *msg_ptr;
+	int head;
+};
+
+enum virtio_vdmabuf_cmd {
+	VIRTIO_VDMABUF_CMD_NEED_VMID,
+	VIRTIO_VDMABUF_CMD_EXPORT = 0x10,
+	VIRTIO_VDMABUF_CMD_DMABUF_REL
+};
+
+enum virtio_vdmabuf_ops {
+	VIRTIO_VDMABUF_HDMABUF_ID_ID = 0,
+	VIRTIO_VDMABUF_HDMABUF_ID_RNG_KEY0,
+	VIRTIO_VDMABUF_HDMABUF_ID_RNG_KEY1,
+	VIRTIO_VDMABUF_NUM_PAGES_SHARED,
+	VIRTIO_VDMABUF_FIRST_PAGE_DATA_OFFSET,
+	VIRTIO_VDMABUF_LAST_PAGE_DATA_LENGTH,
+	VIRTIO_VDMABUF_REF_ADDR_UPPER_32BIT,
+	VIRTIO_VDMABUF_REF_ADDR_LOWER_32BIT,
+	VIRTIO_VDMABUF_PRIVATE_DATA_SIZE,
+	VIRTIO_VDMABUF_PRIVATE_DATA_START
+};
+
+/* adding exported/imported vdmabuf info to hash */
+static inline int
+virtio_vdmabuf_add_buf(struct virtio_vdmabuf_info *info,
+                       struct virtio_vdmabuf_buf *new)
+{
+	hash_add(info->buf_list, &new->node, new->buf_id.id);
+	return 0;
+}
+
+/* comparing two vdmabuf IDs */
+static inline bool
+is_same_buf(virtio_vdmabuf_buf_id_t a,
+            virtio_vdmabuf_buf_id_t b)
+{
+	int i;
+
+	if (a.id != b.id)
+		return false;
+
+	/* compare keys */
+	for (i = 0; i < 2; i++) {
+		if (a.rng_key[i] != b.rng_key[i])
+			return false;
+	}
+
+	return true;
+}
+
+/* find buf for given vdmabuf ID */
+static inline struct virtio_vdmabuf_buf
+*virtio_vdmabuf_find_buf(struct virtio_vdmabuf_info *info,
+			 virtio_vdmabuf_buf_id_t *buf_id)
+{
+	struct virtio_vdmabuf_buf *found;
+
+	hash_for_each_possible(info->buf_list, found, node, buf_id->id)
+		if (is_same_buf(found->buf_id, *buf_id))
+			return found;
+
+	return NULL;
+}
+
+/* delete buf from hash */
+static inline int
+virtio_vdmabuf_del_buf(struct virtio_vdmabuf_info *info,
+                       virtio_vdmabuf_buf_id_t *buf_id)
+{
+	struct virtio_vdmabuf_buf *found;
+
+	found = virtio_vdmabuf_find_buf(info, buf_id);
+	if (!found)
+		return -ENOENT;
+
+	hash_del(&found->node);
+
+	return 0;
+}
+
+#endif
diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
index bc1c0621f5ed..f1ea1c2f5cbf 100644
--- a/include/uapi/linux/virtio_ids.h
+++ b/include/uapi/linux/virtio_ids.h
@@ -54,5 +54,6 @@
 #define VIRTIO_ID_FS			26 /* virtio filesystem */
 #define VIRTIO_ID_PMEM			27 /* virtio pmem */
 #define VIRTIO_ID_MAC80211_HWSIM	29 /* virtio mac80211-hwsim */
+#define VIRTIO_ID_VDMABUF          	30 /* virtio vdmabuf */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/include/uapi/linux/virtio_vdmabuf.h b/include/uapi/linux/virtio_vdmabuf.h
new file mode 100644
index 000000000000..7bddaa04ddd6
--- /dev/null
+++ b/include/uapi/linux/virtio_vdmabuf.h
@@ -0,0 +1,99 @@
+/* SPDX-License-Identifier: (MIT OR GPL-2.0) */
+
+/*
+ * Copyright © 2021 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#ifndef _UAPI_LINUX_VIRTIO_VDMABUF_H
+#define _UAPI_LINUX_VIRTIO_VDMABUF_H
+
+#define MAX_SIZE_PRIV_DATA 192
+
+typedef struct {
+	__u64 id;
+	/* 8-byte random number */
+	int rng_key[2];
+} virtio_vdmabuf_buf_id_t;
+
+struct virtio_vdmabuf_e_hdr {
+	/* buf_id of new buf */
+	virtio_vdmabuf_buf_id_t buf_id;
+	/* size of private data */
+	int size;
+};
+
+struct virtio_vdmabuf_e_data {
+	struct virtio_vdmabuf_e_hdr hdr;
+	/* ptr to private data */
+	void __user *data;
+};
+
+#define VIRTIO_VDMABUF_IOCTL_IMPORT \
+_IOC(_IOC_NONE, 'G', 2, sizeof(struct virtio_vdmabuf_import))
+#define VIRTIO_VDMABUF_IOCTL_RELEASE \
+_IOC(_IOC_NONE, 'G', 3, sizeof(struct virtio_vdmabuf_import))
+struct virtio_vdmabuf_import {
+	/* IN parameters */
+	/* vdmabuf buf id to be imported */
+	virtio_vdmabuf_buf_id_t buf_id;
+	/* flags */
+	int flags;
+	/* OUT parameters */
+	/* exported dma buf fd */
+	int fd;
+};
+
+#define VIRTIO_VDMABUF_IOCTL_EXPORT \
+_IOC(_IOC_NONE, 'G', 4, sizeof(struct virtio_vdmabuf_export))
+struct virtio_vdmabuf_export {
+	/* IN parameters */
+	/* DMA buf fd to be exported */
+	int fd;
+	/* exported dma buf id */
+	virtio_vdmabuf_buf_id_t buf_id;
+	int sz_priv;
+	char *priv;
+};
+
+#define VIRTIO_VDMABUF_IOCTL_QUERY \
+_IOC(_IOC_NONE, 'G', 5, sizeof(struct virtio_vdmabuf_query))
+struct virtio_vdmabuf_query {
+	/* IN parameters */
+	/* id of buf to be queried */
+	virtio_vdmabuf_buf_id_t buf_id;
+	/* item to be queried */
+	int item;
+	/* OUT parameters */
+	/* Value of queried item */
+	unsigned long info;
+};
+
+/* DMABUF query */
+enum virtio_vdmabuf_query_cmd {
+	VIRTIO_VDMABUF_QUERY_SIZE = 0x10,
+	VIRTIO_VDMABUF_QUERY_BUSY,
+	VIRTIO_VDMABUF_QUERY_PRIV_INFO_SIZE,
+	VIRTIO_VDMABUF_QUERY_PRIV_INFO,
+};
+
+#endif
-- 
2.26.2



* [RFC 2/3] vhost: Add Vdmabuf backend
  2021-01-19  8:28 [RFC 0/3] Introduce Vdmabuf driver Vivek Kasireddy
  2021-01-19  8:28 ` [RFC 1/3] virtio: " Vivek Kasireddy
@ 2021-01-19  8:28 ` Vivek Kasireddy
  2021-01-19  8:28 ` [RFC 3/3] vfio: Share the KVM instance with Vdmabuf Vivek Kasireddy
  2 siblings, 0 replies; 10+ messages in thread
From: Vivek Kasireddy @ 2021-01-19  8:28 UTC (permalink / raw)
  To: virtualization; +Cc: dongwon.kim

This backend acts as the counterpart to the Vdmabuf Virtio frontend.
When it receives a new export event from the frontend, it raises an
event to alert Qemu UI/userspace. Qemu then "imports" this buffer
using the unique ID.

As part of the import step, a new dmabuf is created on the Host using
the page information obtained from the Guest. The fd associated with
this dmabuf is made available to Qemu UI/userspace, which then creates
a texture from it for display.
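
For illustration, a rough, uncompiled sketch of the importer path in
Qemu UI/userspace follows. The ioctl and the structures come from the
uapi header in patch 1; how the vhost-vdmabuf device fd (vfd) is opened
and the exact event-read semantics of the backend are assumptions:

  #include <poll.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/virtio_vdmabuf.h>

  static int import_one(int vfd)
  {
      char ev[sizeof(struct virtio_vdmabuf_e_hdr) + MAX_SIZE_PRIV_DATA];
      struct virtio_vdmabuf_e_hdr hdr;
      struct virtio_vdmabuf_import imp = { 0 };
      struct pollfd pfd = { .fd = vfd, .events = POLLIN };

      /* wait for the export event raised when the Guest shares a buffer */
      if (poll(&pfd, 1, -1) <= 0 || read(vfd, ev, sizeof(ev)) <= 0)
          return -1;

      memcpy(&hdr, ev, sizeof(hdr));

      /* import the buffer using the unique ID carried by the event */
      imp.buf_id = hdr.buf_id;
      if (ioctl(vfd, VIRTIO_VDMABUF_IOCTL_IMPORT, &imp) < 0)
          return -1;

      /* imp.fd is now a dmabuf fd backed by the Guest's pages; it can
       * be turned into an EGLImage/texture for display and released
       * later with VIRTIO_VDMABUF_IOCTL_RELEASE.
       */
      return imp.fd;
  }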

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
 drivers/vhost/Kconfig   |    9 +
 drivers/vhost/Makefile  |    3 +
 drivers/vhost/vdmabuf.c | 1332 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 1344 insertions(+)
 create mode 100644 drivers/vhost/vdmabuf.c

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 587fbae06182..1f1c51c4499e 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -89,4 +89,13 @@ config VHOST_CROSS_ENDIAN_LEGACY
 
 	  If unsure, say "N".
 
+config VHOST_VDMABUF
+	bool "Vhost backend for the Vdmabuf driver"
+	depends on EVENTFD
+	select VHOST
+	default n
+	help
+	  This driver works in tandem with the Virtio Vdmabuf frontend. It can
+	  be used to create a dmabuf using the pages shared by the Guest.
+
 endif
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index f3e1897cce85..5c2cea4a7eaf 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -17,3 +17,6 @@ obj-$(CONFIG_VHOST)	+= vhost.o
 
 obj-$(CONFIG_VHOST_IOTLB) += vhost_iotlb.o
 vhost_iotlb-y := iotlb.o
+
+obj-$(CONFIG_VHOST_VDMABUF) += vhost_vdmabuf.o
+vhost_vdmabuf-y := vdmabuf.o
diff --git a/drivers/vhost/vdmabuf.c b/drivers/vhost/vdmabuf.c
new file mode 100644
index 000000000000..7e2576fc2c0d
--- /dev/null
+++ b/drivers/vhost/vdmabuf.c
@@ -0,0 +1,1332 @@
+// SPDX-License-Identifier: (MIT OR GPL-2.0)
+
+/*
+ * Copyright © 2021 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Dongwon Kim <dongwon.kim@intel.com>
+ *    Mateusz Polrola <mateusz.polrola@gmail.com>
+ *    Vivek Kasireddy <vivek.kasireddy@intel.com>
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/miscdevice.h>
+#include <linux/workqueue.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/hashtable.h>
+#include <linux/uaccess.h>
+#include <linux/poll.h>
+#include <linux/dma-buf.h>
+#include <linux/vhost.h>
+#include <linux/vfio.h>
+#include <linux/kvm_host.h>
+#include <linux/virtio_vdmabuf.h>
+
+#include "vhost.h"
+
+#define REFS_PER_PAGE (PAGE_SIZE/sizeof(long))
+
+static struct virtio_vdmabuf_info *drv_info;
+
+struct kvm_instance {
+	struct kvm *kvm;
+	struct list_head link;
+};
+
+struct vhost_vdmabuf {
+	struct vhost_dev dev;
+	struct vhost_virtqueue vq;
+	struct vhost_work tx_work;
+	struct virtio_vdmabuf_event_queue *evq;
+	u64 vmid;
+
+	/* synchronization between transmissions */
+	struct mutex tx_mutex;
+	/* synchronization on tx and rx */
+	struct mutex vq_mutex;
+
+	struct virtio_vdmabuf_txmsg next;
+	struct list_head list;
+	struct kvm *kvm;
+};
+
+static inline void vhost_vdmabuf_add(struct vhost_vdmabuf *new)
+{
+	list_add_tail(&new->list, &drv_info->head_vdmabuf_list);
+}
+
+static inline struct vhost_vdmabuf *vhost_vdmabuf_find(u64 vmid)
+{
+	struct vhost_vdmabuf *found;
+
+	list_for_each_entry(found, &drv_info->head_vdmabuf_list, list)
+		if (found->vmid == vmid)
+			return found;
+
+	return NULL;
+}
+
+static inline bool vhost_vdmabuf_del(struct vhost_vdmabuf *vdmabuf)
+{
+	struct vhost_vdmabuf *iter, *temp;
+
+	list_for_each_entry_safe(iter, temp,
+				 &drv_info->head_vdmabuf_list,
+				 list)
+		if (iter == vdmabuf) {
+			list_del(&iter->list);
+			return true;
+		}
+
+	return false;
+}
+
+static inline void vhost_vdmabuf_del_all(void)
+{
+	struct vhost_vdmabuf *iter, *temp;
+
+	list_for_each_entry_safe(iter, temp,
+				 &drv_info->head_vdmabuf_list,
+				 list) {
+		list_del(&iter->list);
+		kfree(iter);
+	}
+}
+
+static void *map_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
+{
+	struct kvm_host_map map;
+	int ret;
+
+	ret = kvm_vcpu_map(vcpu, gpa_to_gfn(gpa), &map);
+	if (ret < 0)
+		return ERR_PTR(ret);
+	else
+		return map.hva;
+}
+
+static void unmap_hva(struct kvm_vcpu *vcpu, gpa_t hva)
+{
+	struct page *page = virt_to_page(hva);
+	struct kvm_host_map map;
+
+	map.hva = (void *)hva;
+	map.page = page;
+
+	kvm_vcpu_unmap(vcpu, &map, true);
+}
+
+/* mapping guest's pages for the vdmabuf */
+static int
+vhost_vdmabuf_map_pages(u64 vmid,
+		        struct virtio_vdmabuf_shared_pages *pages_info)
+{
+	struct vhost_vdmabuf *vdmabuf = vhost_vdmabuf_find(vmid);
+	struct kvm_vcpu *vcpu;
+	void *paddr;
+	int npgs = REFS_PER_PAGE;
+	int last_nents, n_l2refs;
+	int i, j = 0, k = 0;
+
+	if (!vdmabuf || !vdmabuf->kvm || !pages_info || pages_info->pages)
+		return -EINVAL;
+
+	vcpu = kvm_get_vcpu_by_id(vdmabuf->kvm, 0);
+	if (!vcpu)
+		return -EINVAL;
+
+	last_nents = (pages_info->nents - 1) % npgs + 1;
+	n_l2refs = (pages_info->nents / npgs) + ((last_nents > 0) ? 1 : 0) -
+		   (last_nents == npgs);
+
+	pages_info->pages = kcalloc(pages_info->nents, sizeof(struct page *),
+				    GFP_KERNEL);
+	if (!pages_info->pages)
+		goto fail_page_alloc;
+
+	pages_info->l2refs = kcalloc(n_l2refs, sizeof(gpa_t *), GFP_KERNEL);
+	if (!pages_info->l2refs)
+		goto fail_l2refs;
+
+	pages_info->l3refs = (gpa_t *)map_gpa(vcpu, pages_info->ref);
+	if (IS_ERR(pages_info->l3refs))
+		goto fail_l3refs;
+
+	for (i = 0; i < n_l2refs; i++) {
+		pages_info->l2refs[i] = (gpa_t *)map_gpa(vcpu,
+							 pages_info->l3refs[i]);
+
+		if (IS_ERR(pages_info->l2refs[i]))
+			goto fail_mapping_l2;
+
+		/* last level-2 ref */
+		if (i == n_l2refs - 1)
+			npgs = last_nents;
+
+		for (j = 0; j < npgs; j++) {
+			paddr = map_gpa(vcpu, pages_info->l2refs[i][j]);
+			if (IS_ERR(paddr))
+				goto fail_mapping_l1;
+
+			pages_info->pages[k] = virt_to_page(paddr);
+			k++;
+		}
+		unmap_hva(vcpu, pages_info->l3refs[i]);
+	}
+
+	unmap_hva(vcpu, pages_info->ref);
+
+	return 0;
+
+fail_mapping_l1:
+	for (k = 0; k < j; k++)
+		unmap_hva(vcpu, pages_info->l2refs[i][k]);
+
+fail_mapping_l2:
+	for (j = 0; j < i; j++) {
+		for (k = 0; k < REFS_PER_PAGE; k++)
+			unmap_hva(vcpu, pages_info->l2refs[j][k]);
+	}
+
+	unmap_hva(vcpu, pages_info->l3refs[i]);
+	unmap_hva(vcpu, pages_info->ref);
+
+fail_l3refs:
+	kfree(pages_info->l2refs);
+
+fail_l2refs:
+	kfree(pages_info->pages);
+
+fail_page_alloc:
+	return -ENOMEM;
+}
+
+/* unmapping mapped pages */
+static int
+vhost_vdmabuf_unmap_pages(u64 vmid,
+			  struct virtio_vdmabuf_shared_pages *pages_info)
+{
+	struct vhost_vdmabuf *vdmabuf = vhost_vdmabuf_find(vmid);
+	struct kvm_vcpu *vcpu;
+	int last_nents = (pages_info->nents - 1) % REFS_PER_PAGE + 1;
+	int n_l2refs = (pages_info->nents / REFS_PER_PAGE) +
+		       ((last_nents > 0) ? 1 : 0) -
+		       (last_nents == REFS_PER_PAGE);
+	int i, j;
+
+	if (!vdmabuf || !vdmabuf->kvm || !pages_info || !pages_info->pages)
+		return -EINVAL;
+
+	vcpu = kvm_get_vcpu_by_id(vdmabuf->kvm, 0);
+	if (!vcpu)
+		return -EINVAL;
+
+	for (i = 0; i < n_l2refs - 1; i++) {
+		for (j = 0; j < REFS_PER_PAGE; j++)
+			unmap_hva(vcpu, pages_info->l2refs[i][j]);
+	}
+
+	for (j = 0; j < last_nents; j++)
+		unmap_hva(vcpu, pages_info->l2refs[i][j]);
+
+	kfree(pages_info->l2refs);
+	kfree(pages_info->pages);
+	pages_info->pages = NULL;
+
+	return 0;
+}
+
+/* create sg_table with given pages and other parameters */
+static struct sg_table *new_sgt(struct page **pgs,
+				int first_ofst, int last_len,
+				int nents)
+{
+	struct sg_table *sgt;
+	struct scatterlist *sgl;
+	int i, ret;
+
+	sgt = kmalloc(sizeof(struct sg_table), GFP_KERNEL);
+	if (!sgt)
+		return NULL;
+
+	ret = sg_alloc_table(sgt, nents, GFP_KERNEL);
+	if (ret) {
+		kfree(sgt);
+		return NULL;
+	}
+
+	sgl = sgt->sgl;
+	sg_set_page(sgl, pgs[0], PAGE_SIZE-first_ofst, first_ofst);
+
+	for (i = 1; i < nents-1; i++) {
+		sgl = sg_next(sgl);
+		sg_set_page(sgl, pgs[i], PAGE_SIZE, 0);
+	}
+
+	/* more than 1 page */
+	if (nents > 1) {
+		sgl = sg_next(sgl);
+		sg_set_page(sgl, pgs[i], last_len, 0);
+	}
+
+	return sgt;
+}
+
+static struct sg_table
+*vhost_vdmabuf_dmabuf_map(struct dma_buf_attachment *attachment,
+			  enum dma_data_direction dir)
+{
+	struct virtio_vdmabuf_buf *imp;
+
+	if (!attachment->dmabuf || !attachment->dmabuf->priv)
+		return NULL;
+
+	imp = (struct virtio_vdmabuf_buf *)attachment->dmabuf->priv;
+
+	/* if buffer has never been mapped */
+	if (!imp->sgt) {
+		imp->sgt = new_sgt(imp->pages_info->pages,
+				   imp->pages_info->first_ofst,
+				   imp->pages_info->last_len,
+				   imp->pages_info->nents);
+
+		if (!imp->sgt)
+			return NULL;
+	}
+
+	if (!dma_map_sg(attachment->dev, imp->sgt->sgl,
+			imp->sgt->nents, dir)) {
+		sg_free_table(imp->sgt);
+		kfree(imp->sgt);
+		return NULL;
+	}
+
+	return imp->sgt;
+}
+
+static void
+vhost_vdmabuf_dmabuf_unmap(struct dma_buf_attachment *attachment,
+	   	           struct sg_table *sg,
+			   enum dma_data_direction dir)
+{
+	dma_unmap_sg(attachment->dev, sg->sgl, sg->nents, dir);
+}
+
+static int vhost_vdmabuf_dmabuf_mmap(struct dma_buf *dmabuf,
+				     struct vm_area_struct *vma)
+{
+	struct virtio_vdmabuf_buf *imp;
+	u64 uaddr;
+	int i, err;
+
+	if (!dmabuf->priv)
+		return -EINVAL;
+
+	imp = (struct virtio_vdmabuf_buf *)dmabuf->priv;
+
+	if (!imp->pages_info)
+		return -EINVAL;
+
+	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+
+	uaddr = vma->vm_start;
+	for (i = 0; i < imp->pages_info->nents; i++) {
+		err = vm_insert_page(vma, uaddr,
+				     imp->pages_info->pages[i]);
+		if (err)
+			return err;
+
+		uaddr += PAGE_SIZE;
+	}
+
+	return 0;
+}
+
+static int vhost_vdmabuf_dmabuf_vmap(struct dma_buf *dmabuf,
+				     struct dma_buf_map *map)
+{
+	struct virtio_vdmabuf_buf *imp;
+	void *addr;
+
+	if (!dmabuf->priv)
+		return -EINVAL;
+
+	imp = (struct virtio_vdmabuf_buf *)dmabuf->priv;
+
+	if (!imp->pages_info)
+		return -EINVAL;
+
+	addr = vmap(imp->pages_info->pages, imp->pages_info->nents,
+                    0, PAGE_KERNEL);
+	if (!addr)
+		return -ENOMEM;
+
+	dma_buf_map_set_vaddr(map, addr);
+	return 0;
+}
+
+static void vhost_vdmabuf_dmabuf_release(struct dma_buf *dma_buf)
+{
+	struct virtio_vdmabuf_buf *imp;
+
+	if (!dma_buf->priv)
+		return;
+
+	imp = (struct virtio_vdmabuf_buf *)dma_buf->priv;
+	imp->dma_buf = NULL;
+	imp->valid = false;
+
+	vhost_vdmabuf_unmap_pages(imp->vmid, imp->pages_info);
+	virtio_vdmabuf_del_buf(drv_info, &imp->buf_id);
+
+	if (imp->sgt) {
+		sg_free_table(imp->sgt);
+		kfree(imp->sgt);
+		imp->sgt = NULL;
+	}
+
+	kfree(imp->priv);
+	kfree(imp->pages_info);
+	kfree(imp);
+}
+
+static const struct dma_buf_ops vhost_vdmabuf_dmabuf_ops = {
+	.map_dma_buf = vhost_vdmabuf_dmabuf_map,
+	.unmap_dma_buf = vhost_vdmabuf_dmabuf_unmap,
+	.release = vhost_vdmabuf_dmabuf_release,
+	.mmap = vhost_vdmabuf_dmabuf_mmap,
+	.vmap = vhost_vdmabuf_dmabuf_vmap,
+};
+
+/* exporting dmabuf as fd */
+static int vhost_vdmabuf_exp_fd(struct virtio_vdmabuf_buf *imp, int flags)
+{
+	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+
+	exp_info.ops = &vhost_vdmabuf_dmabuf_ops;
+
+	/* multiple of PAGE_SIZE, not considering offset */
+	exp_info.size = imp->pages_info->nents * PAGE_SIZE;
+	exp_info.flags = 0;
+	exp_info.priv = imp;
+
+	if (!imp->dma_buf) {
+		imp->dma_buf = dma_buf_export(&exp_info);
+		if (IS_ERR_OR_NULL(imp->dma_buf)) {
+			imp->dma_buf = NULL;
+			return -EINVAL;
+		}
+	}
+
+	return dma_buf_fd(imp->dma_buf, flags);
+}
+
+static int vhost_vdmabuf_add_event(struct vhost_vdmabuf *vdmabuf,
+				   struct virtio_vdmabuf_buf *buf_info)
+{
+	struct virtio_vdmabuf_event *e_oldest, *e_new;
+	struct virtio_vdmabuf_event_queue *evq = vdmabuf->evq;
+	unsigned long irqflags;
+
+	e_new = kzalloc(sizeof(*e_new), GFP_KERNEL);
+	if (!e_new)
+		return -ENOMEM;
+
+	e_new->e_data.hdr.buf_id = buf_info->buf_id;
+	e_new->e_data.data = (void *)buf_info->priv;
+	e_new->e_data.hdr.size = buf_info->sz_priv;
+
+	spin_lock_irqsave(&evq->e_lock, irqflags);
+
+	/* check current number of events and if it hits the max num (32)
+	 * then remove the oldest event in the list
+	 */
+	if (evq->pending > 31) {
+		e_oldest = list_first_entry(&evq->e_list,
+					    struct virtio_vdmabuf_event, link);
+		list_del(&e_oldest->link);
+		evq->pending--;
+		kfree(e_oldest);
+	}
+
+	list_add_tail(&e_new->link, &evq->e_list);
+
+	evq->pending++;
+
+	wake_up_interruptible(&evq->e_wait);
+	spin_unlock_irqrestore(&evq->e_lock, irqflags);
+
+	return 0;
+}
+
+/* transmitting message */
+static int send_msg_to_guest(u64 vmid, enum virtio_vdmabuf_cmd cmd, int *op)
+{
+	struct virtio_vdmabuf_msg msg;
+	struct vhost_vdmabuf *vdmabuf;
+
+	vdmabuf = vhost_vdmabuf_find(vmid);
+	if (!vdmabuf) {
+		dev_err(drv_info->dev,
+			"can't find vdmabuf for : vmid = %llu\n", vmid);
+		return -EINVAL;
+	}
+
+	switch (cmd) {
+	case VIRTIO_VDMABUF_CMD_DMABUF_REL:
+		memcpy(&msg.op[0], &op[0], 8 * sizeof(int));
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	msg.cmd = cmd;
+
+	mutex_lock(&vdmabuf->tx_mutex);
+	mutex_lock(&vdmabuf->vq_mutex);
+
+	if ((vdmabuf->next.msg_ptr == NULL) ||
+	    (vdmabuf->next.head == -1)) {
+		mutex_unlock(&vdmabuf->vq_mutex);
+		mutex_unlock(&vdmabuf->tx_mutex);
+		return -EBUSY;
+	}
+
+	memcpy(&vdmabuf->next.msg, &msg, sizeof(msg));
+	vhost_work_queue(&vdmabuf->dev, &vdmabuf->tx_work);
+
+	mutex_unlock(&vdmabuf->vq_mutex);
+	mutex_unlock(&vdmabuf->tx_mutex);
+
+	return 0;
+}
+
+static int register_exported(struct vhost_vdmabuf *vdmabuf,
+			     virtio_vdmabuf_buf_id_t *buf_id, int *ops)
+{
+	struct virtio_vdmabuf_buf *imp;
+	int ret;
+
+	imp = kcalloc(1, sizeof(*imp), GFP_KERNEL);
+	if (!imp)
+		return -ENOMEM;
+
+	imp->pages_info = kcalloc(1, sizeof(struct virtio_vdmabuf_shared_pages),
+				  GFP_KERNEL);
+	if (!imp->pages_info) {
+		kfree(imp);
+		return -ENOMEM;
+	}
+
+	imp->sz_priv = ops[VIRTIO_VDMABUF_PRIVATE_DATA_SIZE];
+	if (imp->sz_priv) {
+		imp->priv = kcalloc(1, ops[VIRTIO_VDMABUF_PRIVATE_DATA_SIZE],
+				    GFP_KERNEL);
+		if (!imp->priv) {
+			kfree(imp->pages_info);
+			kfree(imp);
+			return -ENOMEM;
+		}
+	}
+
+	memcpy(&imp->buf_id, buf_id, sizeof(*buf_id));
+
+	imp->pages_info->nents = ops[VIRTIO_VDMABUF_NUM_PAGES_SHARED];
+	imp->pages_info->first_ofst = ops[VIRTIO_VDMABUF_FIRST_PAGE_DATA_OFFSET];
+	imp->pages_info->last_len = ops[VIRTIO_VDMABUF_LAST_PAGE_DATA_LENGTH];
+	imp->pages_info->ref = *(gpa_t *)&ops[VIRTIO_VDMABUF_REF_ADDR_UPPER_32BIT];
+	imp->vmid = vdmabuf->vmid;
+	imp->valid = true;
+
+	virtio_vdmabuf_add_buf(drv_info, imp);
+
+	/* transferring private data */
+	memcpy(imp->priv, &ops[VIRTIO_VDMABUF_PRIVATE_DATA_START],
+	       ops[VIRTIO_VDMABUF_PRIVATE_DATA_SIZE]);
+
+	/* generate import event */
+	ret = vhost_vdmabuf_add_event(vdmabuf, imp);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static void send_to_txq(struct vhost_vdmabuf *vdmabuf,
+			struct virtio_vdmabuf_txmsg *msg_info)
+{
+	int ret;
+
+	ret = __copy_to_user(msg_info->msg_ptr, &msg_info->msg,
+			     sizeof(struct virtio_vdmabuf_msg));
+
+	if (!ret) {
+		vhost_add_used_and_signal(&vdmabuf->dev, &vdmabuf->vq,
+					  msg_info->head,
+					  sizeof(struct virtio_vdmabuf_msg));
+		msg_info->msg_ptr = NULL;
+		msg_info->head = -1;
+	} else {
+		dev_err(drv_info->dev,
+			"fail to copy tx msg\n");
+	}
+}
+
+static void tx_work(struct vhost_work *work)
+{
+	struct vhost_vdmabuf *vdmabuf = container_of(work,
+					             struct vhost_vdmabuf,
+					             tx_work);
+
+	mutex_lock(&vdmabuf->vq_mutex);
+	send_to_txq(vdmabuf, &vdmabuf->next);
+	mutex_unlock(&vdmabuf->vq_mutex);
+}
+
+/* parse incoming message from a guest */
+static int parse_msg(struct vhost_vdmabuf *vdmabuf,
+		     struct virtio_vdmabuf_msg *msg,
+		     void __user *out, int head)
+{
+	struct virtio_vdmabuf_txmsg *msg_info;
+	virtio_vdmabuf_buf_id_t *buf_id = (virtio_vdmabuf_buf_id_t *)msg->op;
+	int ret = 0;
+
+	msg_info = kcalloc(1, sizeof(*msg_info), GFP_KERNEL);
+	if (!msg_info)
+		return -ENOMEM;
+
+	switch (msg->cmd) {
+	case VIRTIO_VDMABUF_CMD_EXPORT:
+		ret = register_exported(vdmabuf, buf_id, msg->op);
+		if (ret)
+			break;
+
+		memcpy(&msg_info->msg, msg, sizeof(struct virtio_vdmabuf_msg));
+		break;
+
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	msg_info->msg_ptr = out;
+	msg_info->head = head;
+
+	kfree(msg_info);
+
+	return ret;
+}
+
+static void rx_work(struct vhost_work *work)
+{
+	struct vhost_virtqueue *vq = container_of(work,
+						  struct vhost_virtqueue,
+						  poll.work);
+	struct vhost_vdmabuf *vdmabuf = container_of(vq->dev,
+					      	     struct vhost_vdmabuf,
+					      	     dev);
+	struct virtio_vdmabuf_msg msg;
+	int head, in, out, in_size;
+	int ret;
+
+	mutex_lock(&vdmabuf->vq_mutex);
+	vhost_disable_notify(&vdmabuf->dev, vq);
+
+	/* Make sure we will process all pending requests */
+	for (;;) {
+		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+					 &out, &in, NULL, NULL);
+
+		if (unlikely(head < 0))
+			break;
+
+		/* Nothing new? Wait for eventfd to tell us they refilled */
+		if (head == vq->num) {
+			if (unlikely(vhost_enable_notify(&vdmabuf->dev, vq))) {
+				vhost_disable_notify(&vdmabuf->dev, vq);
+				continue;
+			}
+			break;
+		}
+
+		if (out)
+			break;
+
+		in_size = iov_length(&vq->iov[0], in);
+
+		if (in_size == sizeof(struct virtio_vdmabuf_msg)) {
+			if (__copy_from_user(&msg, vq->iov[0].iov_base,
+					    in_size)) {
+				dev_err(drv_info->dev,
+					"err: can't get the msg from vq\n");
+				continue;
+			}
+
+			vdmabuf->next.msg_ptr = vq->iov[0].iov_base;
+			vdmabuf->next.head = head;
+
+			if (msg.cmd == VIRTIO_VDMABUF_CMD_NEED_VMID) {
+				struct virtio_vdmabuf_txmsg ack;
+
+				ack.msg.req_id = msg.req_id;
+				ack.msg.cmd = msg.cmd;
+				ack.msg.op[0] = vdmabuf->vmid;
+				ack.msg_ptr = vq->iov[0].iov_base;
+				ack.head = head;
+
+				send_to_txq(vdmabuf, &ack);
+			} else {
+				ret = parse_msg(vdmabuf, &msg,
+						vq->iov[0].iov_base,
+						head);
+				if (ret) {
+					dev_err(drv_info->dev,
+						"msg parse error: %d",
+						ret);
+					dev_err(drv_info->dev,
+						" cmd: %d\n", msg.cmd);
+				}
+			}
+		} else {
+			dev_err(drv_info->dev, "rx msg with wrong size\n");
+
+			/* just throw back the message to the client to
+			 * empty used buffer
+			 */
+			vhost_add_used_and_signal(&vdmabuf->dev, vq, head,
+						  in_size);
+		}
+	}
+
+	vhost_enable_notify(&vdmabuf->dev, vq);
+	mutex_unlock(&vdmabuf->vq_mutex);
+}
+
+void vhost_vdmabuf_get_kvm(unsigned long action, void *data)
+{
+	struct kvm_instance *instance;
+
+	if (action != VFIO_GROUP_NOTIFY_SET_KVM || !data)
+		return;
+
+	instance = kzalloc(sizeof(*instance), GFP_KERNEL);
+	if (!instance)
+		return;
+
+	instance->kvm = data;
+	list_add_tail(&instance->link, &drv_info->kvm_instances);
+}
+EXPORT_SYMBOL_GPL(vhost_vdmabuf_get_kvm);
+
+static struct kvm *find_kvm_instance(u64 vmid)
+{
+	struct kvm_instance *instance, *tmp;
+	struct kvm *kvm = NULL;
+
+	list_for_each_entry_safe(instance, tmp, &drv_info->kvm_instances,
+                                 link) {
+		if (instance->kvm->userspace_pid == vmid) {
+			kvm = instance->kvm;
+
+			list_del(&instance->link);
+			kfree(instance);
+			break;
+		}
+	}
+
+	return kvm;
+}
+
+static int vhost_vdmabuf_open(struct inode *inode, struct file *filp)
+{
+	struct vhost_vdmabuf *vdmabuf;
+	struct vhost_virtqueue **vqs;
+	int ret = 0;
+
+	if (!drv_info) {
+		pr_err("vhost-vdmabuf: can't open misc device\n");
+		return -EINVAL;
+	}
+
+	vqs = kcalloc(1, sizeof(*vqs), GFP_KERNEL);
+	if (!vqs)
+		return -ENOMEM;
+
+	vdmabuf = kvzalloc(sizeof(*vdmabuf), GFP_KERNEL |
+			__GFP_RETRY_MAYFAIL);
+	if (!vdmabuf) {
+		kfree(vqs);
+		return -ENOMEM;
+	}
+
+	vdmabuf->evq = kcalloc(1, sizeof(*vdmabuf->evq), GFP_KERNEL);
+	if (!vdmabuf->evq) {
+		kfree(vdmabuf);
+		kfree(vqs);
+		return -ENOMEM;
+	}
+
+	mutex_lock(&drv_info->g_mutex);
+
+	vqs[0] = &vdmabuf->vq;
+	vdmabuf->vq.handle_kick = rx_work;
+
+	mutex_init(&vdmabuf->vq_mutex);
+	mutex_init(&vdmabuf->tx_mutex);
+	vhost_dev_init(&vdmabuf->dev, vqs, 1, UIO_MAXIOV, 0, 0, true, NULL);
+
+	vhost_work_init(&vdmabuf->tx_work, tx_work);
+	vdmabuf->vmid = task_pid_nr(current);
+	vdmabuf->kvm = find_kvm_instance(vdmabuf->vmid);
+	vhost_vdmabuf_add(vdmabuf);
+
+	mutex_init(&vdmabuf->evq->e_readlock);
+	spin_lock_init(&vdmabuf->evq->e_lock);
+
+	/* Initialize event queue */
+	INIT_LIST_HEAD(&vdmabuf->evq->e_list);
+	init_waitqueue_head(&vdmabuf->evq->e_wait);
+
+	/* resetting number of pending events */
+	vdmabuf->evq->pending = 0;
+	filp->private_data = vdmabuf;
+
+	mutex_unlock(&drv_info->g_mutex);
+
+	return ret;
+}
+
+static int vhost_vdmabuf_release(struct inode *inode, struct file *filp)
+{
+	struct vhost_vdmabuf *vdmabuf = filp->private_data;
+	struct virtio_vdmabuf_event *e, *et;
+
+	if (!vhost_vdmabuf_del(vdmabuf))
+		return -EINVAL;
+
+	mutex_lock(&drv_info->g_mutex);
+
+	list_for_each_entry_safe(e, et, &vdmabuf->evq->e_list,
+				 link) {
+		list_del(&e->link);
+		kfree(e);
+		vdmabuf->evq->pending--;
+	}
+
+	vhost_poll_stop(&vdmabuf->vq.poll);
+	vhost_poll_flush(&vdmabuf->vq.poll);
+	vhost_dev_cleanup(&vdmabuf->dev);
+
+	kfree(vdmabuf->dev.vqs);
+	kfree(vdmabuf->evq);
+	kvfree(vdmabuf);
+
+	filp->private_data = NULL;
+	mutex_unlock(&drv_info->g_mutex);
+
+	return 0;
+}
+
+static unsigned int vhost_vdmabuf_event_poll(struct file *filp,
+				    	     struct poll_table_struct *wait)
+{
+	struct vhost_vdmabuf *vdmabuf = filp->private_data;
+
+	poll_wait(filp, &vdmabuf->evq->e_wait, wait);
+
+	if (!list_empty(&vdmabuf->evq->e_list))
+		return POLLIN | POLLRDNORM;
+
+	return 0;
+}
+
+static ssize_t vhost_vdmabuf_event_read(struct file *filp, char __user *buf,
+			       		size_t cnt, loff_t *ofst)
+{
+	struct vhost_vdmabuf *vdmabuf = filp->private_data;
+	int ret;
+
+	if (task_pid_nr(current) != vdmabuf->vmid) {
+		dev_err(drv_info->dev, "current process cannot read events\n");
+		return -EPERM;
+	}
+
+	/* make sure the whole user buffer can be written */
+	if (!access_ok(buf, cnt)) {
+		dev_err(drv_info->dev, "user buffer can't be written.\n");
+		return -EINVAL;
+	}
+
+	ret = mutex_lock_interruptible(&vdmabuf->evq->e_readlock);
+	if (ret)
+		return ret;
+
+	for (;;) {
+		struct virtio_vdmabuf_event *e = NULL;
+
+		spin_lock_irq(&vdmabuf->evq->e_lock);
+		if (!list_empty(&vdmabuf->evq->e_list)) {
+			e = list_first_entry(&vdmabuf->evq->e_list,
+					     struct virtio_vdmabuf_event, link);
+			list_del(&e->link);
+		}
+		spin_unlock_irq(&vdmabuf->evq->e_lock);
+
+		if (!e) {
+			if (ret)
+				break;
+
+			if (filp->f_flags & O_NONBLOCK) {
+				ret = -EAGAIN;
+				break;
+			}
+
+			mutex_unlock(&vdmabuf->evq->e_readlock);
+			ret = wait_event_interruptible(vdmabuf->evq->e_wait,
+					!list_empty(&vdmabuf->evq->e_list));
+
+			if (ret == 0)
+				ret = mutex_lock_interruptible(
+						&vdmabuf->evq->e_readlock);
+
+			if (ret)
+				return ret;
+		} else {
+			unsigned int len = (sizeof(e->e_data.hdr) +
+					    e->e_data.hdr.size);
+
+			if (len > cnt - ret) {
+put_back_event:
+				spin_lock_irq(&vdmabuf->evq->e_lock);
+				list_add(&e->link, &vdmabuf->evq->e_list);
+				spin_unlock_irq(&vdmabuf->evq->e_lock);
+				break;
+			}
+
+			if (copy_to_user(buf + ret, &e->e_data.hdr,
+					 sizeof(e->e_data.hdr))) {
+				if (ret == 0)
+					ret = -EFAULT;
+
+				goto put_back_event;
+			}
+
+			ret += sizeof(e->e_data.hdr);
+
+			if (copy_to_user(buf + ret, e->e_data.data,
+					 e->e_data.hdr.size)) {
+
+				struct virtio_vdmabuf_e_hdr dummy_hdr = {0};
+
+				ret -= sizeof(e->e_data.hdr);
+
+				/* nullifying hdr of the event in user buffer */
+				if (copy_to_user(buf + ret, &dummy_hdr,
+						 sizeof(dummy_hdr)))
+					dev_err(drv_info->dev,
+					   "fail to nullify invalid hdr\n");
+
+				ret = -EFAULT;
+
+				goto put_back_event;
+			}
+
+			ret += e->e_data.hdr.size;
+
+			spin_lock_irq(&vdmabuf->evq->e_lock);
+			vdmabuf->evq->pending--;
+			spin_unlock_irq(&vdmabuf->evq->e_lock);
+			kfree(e);
+		}
+	}
+
+	mutex_unlock(&vdmabuf->evq->e_readlock);
+
+	return ret;
+}
+
+/* vhost interface owner reset */
+static long vhost_vdmabuf_reset_owner(struct vhost_vdmabuf *vdmabuf)
+{
+	long ret;
+	struct vhost_iotlb *iotlb;
+
+	mutex_lock(&vdmabuf->dev.mutex);
+	ret = vhost_dev_check_owner(&vdmabuf->dev);
+	if (ret)
+		goto err;
+
+	iotlb = vhost_dev_reset_owner_prepare();
+	if (!iotlb) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	vhost_poll_stop(&vdmabuf->vq.poll);
+	vhost_poll_flush(&vdmabuf->vq.poll);
+
+	vhost_dev_reset_owner(&vdmabuf->dev, iotlb);
+err:
+	mutex_unlock(&vdmabuf->dev.mutex);
+	return ret;
+}
+
+/* wrapper ioctl for vhost interface control */
+static int vhost_core_ioctl(struct file *filp, unsigned int cmd,
+			    unsigned long param)
+{
+	struct vhost_vdmabuf *vdmabuf = filp->private_data;
+	int ret;
+
+	switch (cmd) {
+	case VHOST_GET_FEATURES:
+		/* TODO: future implementation */
+		return 0;
+	case VHOST_SET_FEATURES:
+		/* TODO: future implementation */
+		return 0;
+	case VHOST_RESET_OWNER:
+		return vhost_vdmabuf_reset_owner(vdmabuf);
+
+	default:
+		ret = vhost_dev_ioctl(&vdmabuf->dev, cmd, (void __user *)param);
+		if (ret == -ENOIOCTLCMD) {
+			ret = vhost_vring_ioctl(&vdmabuf->dev, cmd,
+						(void __user *)param);
+		} else {
+			vhost_poll_flush(&vdmabuf->vq.poll);
+		}
+	}
+
+	return ret;
+}
+
+/*
+ * ioctl - importing vdmabuf from guest OS
+ *
+ * user parameters:
+ *
+ *	virtio_vdmabuf_buf_id_t buf_id - vdmabuf ID of imported buffer
+ *	int flags - flags
+ *	int fd - file handle of	the imported buffer
+ *
+ */
+static int import_ioctl(struct file *filp, void *data)
+{
+	struct vhost_vdmabuf *vdmabuf = filp->private_data;
+	struct virtio_vdmabuf_import *attr = data;
+	struct virtio_vdmabuf_buf *imp;
+	int ret = 0;
+
+	mutex_lock(&vdmabuf->dev.mutex);
+
+	/* look for dmabuf for the id */
+	imp = virtio_vdmabuf_find_buf(drv_info, &attr->buf_id);
+	if (!imp || !imp->valid) {
+		mutex_unlock(&vdmabuf->dev.mutex);
+		dev_err(drv_info->dev,
+			"no valid buf found with id = %llu\n", attr->buf_id.id);
+		return -ENOENT;
+	}
+
+	/* only if mapped pages are not present */
+	if (!imp->pages_info->pages) {
+		ret = vhost_vdmabuf_map_pages(vdmabuf->vmid, imp->pages_info);
+		if (ret < 0) {
+			dev_err(drv_info->dev,
+				"failed to map guest pages\n");
+			goto fail_map;
+		}
+	}
+
+	attr->fd = vhost_vdmabuf_exp_fd(imp, attr->flags);
+	if (attr->fd < 0) {
+		dev_err(drv_info->dev, "failed to get file descriptor\n");
+		ret = attr->fd;
+		goto fail_import;
+	}
+
+	imp->imported = true;
+
+	mutex_unlock(&vdmabuf->dev.mutex);
+	goto success;
+
+fail_import:
+	/* not imported yet? */
+	if (!imp->imported) {
+		vhost_vdmabuf_unmap_pages(vdmabuf->vmid, imp->pages_info);
+		if (imp->dma_buf)
+			kfree(imp->dma_buf);
+
+		if (imp->sgt) {
+			sg_free_table(imp->sgt);
+			kfree(imp->sgt);
+			imp->sgt = NULL;
+		}
+	}
+
+fail_map:
+	/* Check if buffer is still valid and if not remove it
+	 * from imported list.
+	 */
+	if (!imp->valid && !imp->imported) {
+		virtio_vdmabuf_del_buf(drv_info, &imp->buf_id);
+		kfree(imp->priv);
+		kfree(imp->pages_info);
+		kfree(imp);
+	}
+
+	mutex_unlock(&vdmabuf->dev.mutex);
+
+success:
+	return ret;
+}
+
+static int release_ioctl(struct file *filp, void *data)
+{
+	struct vhost_vdmabuf *vdmabuf = filp->private_data;
+	struct virtio_vdmabuf_import *attr = data;
+	struct virtio_vdmabuf_buf *imp;
+	virtio_vdmabuf_buf_id_t buf_id = attr->buf_id;
+	int *op;
+	int ret = 0;
+
+	op = kcalloc(1, sizeof(int) * 65, GFP_KERNEL);
+	if (!op)
+		return -ENOMEM;
+
+	imp = virtio_vdmabuf_find_buf(drv_info, &buf_id);
+	if (!imp) {
+		kfree(op);
+		return -EINVAL;
+	}
+
+	imp->imported = false;
+
+	memcpy(op, &imp->buf_id, sizeof(imp->buf_id));
+
+	ret = send_msg_to_guest(vdmabuf->vmid, VIRTIO_VDMABUF_CMD_DMABUF_REL,
+				op);
+	kfree(op);
+	if (ret < 0) {
+		dev_err(drv_info->dev, "fail to send release cmd\n");
+		return ret;
+	}
+
+	return 0;
+}
+
+/*
+ * ioctl - querying various information of vdmabuf
+ *
+ * user parameters:
+ *
+ *	virtio_vdmabuf_buf_id_t buf_id - vdmabuf ID of imported buffer
+ *	unsigned long info - returned querying result
+ *
+ */
+static int query_ioctl(struct file *filp, void *data)
+{
+	struct virtio_vdmabuf_query *attr = data;
+	struct virtio_vdmabuf_buf *imp;
+	virtio_vdmabuf_buf_id_t buf_id = attr->buf_id;
+
+	/* query for imported dmabuf */
+	imp = virtio_vdmabuf_find_buf(drv_info, &buf_id);
+	if (!imp)
+		return -EINVAL;
+
+	switch (attr->item) {
+	/* size of dmabuf in byte */
+	case VIRTIO_VDMABUF_QUERY_SIZE:
+		if (imp->dma_buf) {
+			/* if local dma_buf is created (if it's
+			 * ever mapped), retrieve it directly
+			 * from struct dma_buf *
+			 */
+			attr->info = imp->dma_buf->size;
+		} else {
+			/* calculate it from the given nents, first_ofst
+			 * and last_len
+			 */
+			attr->info = ((imp->pages_info->nents)*PAGE_SIZE -
+				     (imp->pages_info->first_ofst) - PAGE_SIZE +
+				     (imp->pages_info->last_len));
+		}
+		break;
+
+	/* whether the buffer is used or not */
+	case VIRTIO_VDMABUF_QUERY_BUSY:
+		/* checks if it's used by importer */
+		attr->info = imp->imported;
+		break;
+
+	/* size of private info attached to buffer */
+	case VIRTIO_VDMABUF_QUERY_PRIV_INFO_SIZE:
+		attr->info = imp->sz_priv;
+		break;
+
+	/* copy private info attached to buffer */
+	case VIRTIO_VDMABUF_QUERY_PRIV_INFO:
+		if (imp->sz_priv > 0) {
+			int n;
+
+			n = copy_to_user((void __user *)attr->info,
+					imp->priv,
+					imp->sz_priv);
+			if (n != 0)
+				return -EINVAL;
+		}
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static const struct virtio_vdmabuf_ioctl_desc vhost_vdmabuf_ioctls[] = {
+	VIRTIO_VDMABUF_IOCTL_DEF(VIRTIO_VDMABUF_IOCTL_IMPORT, import_ioctl, 0),
+	VIRTIO_VDMABUF_IOCTL_DEF(VIRTIO_VDMABUF_IOCTL_RELEASE, release_ioctl, 0),
+	VIRTIO_VDMABUF_IOCTL_DEF(VIRTIO_VDMABUF_IOCTL_QUERY, query_ioctl, 0),
+};
+
+static long vhost_vdmabuf_ioctl(struct file *filp, unsigned int cmd,
+				unsigned long param)
+{
+	const struct virtio_vdmabuf_ioctl_desc *ioctl;
+	virtio_vdmabuf_ioctl_t func;
+	unsigned int nr;
+	int ret;
+	char *kdata;
+
+	/* check if cmd is vhost's */
+	if (_IOC_TYPE(cmd) == VHOST_VIRTIO) {
+		ret = vhost_core_ioctl(filp, cmd, param);
+		return ret;
+	}
+
+	nr = _IOC_NR(cmd);
+
+	if (nr >= ARRAY_SIZE(vhost_vdmabuf_ioctls)) {
+		dev_err(drv_info->dev, "invalid ioctl\n");
+		return -EINVAL;
+	}
+
+	ioctl = &vhost_vdmabuf_ioctls[nr];
+
+	func = ioctl->func;
+
+	if (unlikely(!func)) {
+		dev_err(drv_info->dev, "no function\n");
+		return -EINVAL;
+	}
+
+	kdata = kmalloc(_IOC_SIZE(cmd), GFP_KERNEL);
+	if (!kdata)
+		return -ENOMEM;
+
+	if (copy_from_user(kdata, (void __user *)param,
+			   _IOC_SIZE(cmd)) != 0) {
+		dev_err(drv_info->dev,
+			"failed to copy args from userspace\n");
+		ret = -EFAULT;
+		goto ioctl_error;
+	}
+
+	ret = func(filp, kdata);
+
+	if (copy_to_user((void __user *)param, kdata,
+			 _IOC_SIZE(cmd)) != 0) {
+		dev_err(drv_info->dev,
+			"failed to copy args back to userspace\n");
+		ret = -EFAULT;
+		goto ioctl_error;
+	}
+
+ioctl_error:
+	kfree(kdata);
+	return ret;
+}
+
+static const struct file_operations vhost_vdmabuf_fops = {
+	.owner = THIS_MODULE,
+	.open = vhost_vdmabuf_open,
+	.release = vhost_vdmabuf_release,
+	.read = vhost_vdmabuf_event_read,
+	.poll = vhost_vdmabuf_event_poll,
+	.unlocked_ioctl = vhost_vdmabuf_ioctl,
+};
+
+static struct miscdevice vhost_vdmabuf_miscdev = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "vhost-vdmabuf",
+	.fops = &vhost_vdmabuf_fops,
+};
+
+static int __init vhost_vdmabuf_init(void)
+{
+	int ret = 0;
+
+	ret = misc_register(&vhost_vdmabuf_miscdev);
+	if (ret) {
+		pr_err("vhost-vdmabuf: driver can't be registered\n");
+		return ret;
+	}
+
+	dma_coerce_mask_and_coherent(vhost_vdmabuf_miscdev.this_device,
+				     DMA_BIT_MASK(64));
+
+	drv_info = kcalloc(1, sizeof(*drv_info), GFP_KERNEL);
+	if (!drv_info) {
+		misc_deregister(&vhost_vdmabuf_miscdev);
+		return -ENOMEM;
+	}
+
+	drv_info->dev = vhost_vdmabuf_miscdev.this_device;
+
+	mutex_init(&drv_info->g_mutex);
+	INIT_LIST_HEAD(&drv_info->head_vdmabuf_list);
+	hash_init(drv_info->buf_list);
+
+	return 0;
+}
+
+static void __exit vhost_vdmabuf_deinit(void)
+{
+	misc_deregister(&vhost_vdmabuf_miscdev);
+	vhost_vdmabuf_del_all();
+
+	kfree(drv_info);
+	drv_info = NULL;
+}
+
+module_init(vhost_vdmabuf_init);
+module_exit(vhost_vdmabuf_deinit);
+
+MODULE_DESCRIPTION("Vhost Vdmabuf Driver");
+MODULE_LICENSE("GPL and additional rights");
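
For reference, below is a rough sketch of how a host-side consumer (e.g.
Qemu UI) might drive this interface. It assumes the uapi definitions added
in patch 1 (virtio_vdmabuf_buf_id_t, struct virtio_vdmabuf_e_hdr with
buf_id/size, struct virtio_vdmabuf_import with buf_id/flags/fd and the
VIRTIO_VDMABUF_IOCTL_* numbers); the header path and the 4 KiB priv-data
cap below are illustrative only:

#include <poll.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/virtio_vdmabuf.h>	/* uapi header from patch 1 */

/* Consume one import event and turn the shared buffer into a dmabuf fd */
static int consume_one_export(int vfd)
{
	struct pollfd pfd = { .fd = vfd, .events = POLLIN };
	/* read() returns the header plus hdr.size bytes of priv data in
	 * one call; 4 KiB of priv space is an arbitrary cap for this
	 * example
	 */
	char evbuf[sizeof(struct virtio_vdmabuf_e_hdr) + 4096];
	struct virtio_vdmabuf_e_hdr *hdr = (void *)evbuf;
	struct virtio_vdmabuf_import imp;

	/* wait for the import event queued by register_exported() */
	if (poll(&pfd, 1, -1) <= 0)
		return -1;

	if (read(vfd, evbuf, sizeof(evbuf)) < (ssize_t)sizeof(*hdr))
		return -1;

	/* map the guest pages and get a local dmabuf fd for them */
	memset(&imp, 0, sizeof(imp));
	imp.buf_id = hdr->buf_id;
	if (ioctl(vfd, VIRTIO_VDMABUF_IOCTL_IMPORT, &imp) < 0)
		return -1;

	/* ... hand imp.fd to the display stack, then ... */
	close(imp.fd);

	/* tell the guest side it can safely reuse the buffer */
	return ioctl(vfd, VIRTIO_VDMABUF_IOCTL_RELEASE, &imp);
}

Here vfd would be the fd returned by open("/dev/vhost-vdmabuf", O_RDWR);
the event read path requires the reader to be the process that opened the
device, since vmid is derived from the opener's pid.
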
-- 
2.26.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC 3/3] vfio: Share the KVM instance with Vdmabuf
  2021-01-19  8:28 [RFC 0/3] Introduce Vdmabuf driver Vivek Kasireddy
  2021-01-19  8:28 ` [RFC 1/3] virtio: " Vivek Kasireddy
  2021-01-19  8:28 ` [RFC 2/3] vhost: Add Vdmabuf backend Vivek Kasireddy
@ 2021-01-19  8:28 ` Vivek Kasireddy
  2021-01-19 15:39   ` Alex Williamson
  2 siblings, 1 reply; 10+ messages in thread
From: Vivek Kasireddy @ 2021-01-19  8:28 UTC (permalink / raw)
  To: virtualization; +Cc: dongwon.kim

Getting a copy of the KVM instance is necessary for mapping Guest
pages in the Host.
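
For context, the struct kvm pointer is what lets the vhost side translate
the guest page references shared over the vdmabuf ring into host pages. A
minimal sketch of that translation, using the standard gfn_to_page()
helper (an illustration only, not the actual code in patch 2):

#include <linux/kvm_host.h>

/* Resolve one guest frame number into the host page backing it;
 * gfn_to_page() returns the page with a reference held, so the caller
 * must put it once the dmabuf is torn down.
 */
static struct page *vdmabuf_guest_page(struct kvm *kvm, gfn_t gfn)
{
	return gfn_to_page(kvm, gfn);
}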

TODO: Instead of invoking the symbol directly, there needs to be a
better way of getting a copy of the KVM instance probably by using
other notifiers. However, currently, KVM shares its instance only
with VFIO and therefore we are compelled to bind the passthrough'd
device to vfio-pci.

Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
 drivers/vfio/vfio.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 4ad8a35667a7..9fb11b1ad3cd 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -2213,11 +2213,20 @@ static int vfio_unregister_iommu_notifier(struct vfio_group *group,
 	return ret;
 }
 
+extern void vhost_vdmabuf_get_kvm(unsigned long action, void *data);
 void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
 {
+	void (*fn)(unsigned long, void *);
+
 	group->kvm = kvm;
 	blocking_notifier_call_chain(&group->notifier,
 				VFIO_GROUP_NOTIFY_SET_KVM, kvm);
+
+	fn = symbol_get(vhost_vdmabuf_get_kvm);
+	if (fn) {
+		fn(VFIO_GROUP_NOTIFY_SET_KVM, kvm);
+		symbol_put(vhost_vdmabuf_get_kvm);
+	}
 }
 EXPORT_SYMBOL_GPL(vfio_group_set_kvm);
 
-- 
2.26.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFC 3/3] vfio: Share the KVM instance with Vdmabuf
  2021-01-19  8:28 ` [RFC 3/3] vfio: Share the KVM instance with Vdmabuf Vivek Kasireddy
@ 2021-01-19 15:39   ` Alex Williamson
  2021-01-20  0:14     ` Kasireddy, Vivek
  0 siblings, 1 reply; 10+ messages in thread
From: Alex Williamson @ 2021-01-19 15:39 UTC (permalink / raw)
  To: Vivek Kasireddy; +Cc: dongwon.kim, virtualization

On Tue, 19 Jan 2021 00:28:12 -0800
Vivek Kasireddy <vivek.kasireddy@intel.com> wrote:

> Getting a copy of the KVM instance is necessary for mapping Guest
> pages in the Host.
> 
> TODO: Instead of invoking the symbol directly, there needs to be a
> better way of getting a copy of the KVM instance probably by using
> other notifiers. However, currently, KVM shares its instance only
> with VFIO and therefore we are compelled to bind the passthrough'd
> device to vfio-pci.

Yeah, this is a bad solution, sorry, vfio is not going to gratuitously
call out to vhost to share a kvm pointer.  I'd prefer to get rid of
vfio having any knowledge or visibility of the kvm pointer.  Thanks,

Alex
 
> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> ---
>  drivers/vfio/vfio.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 4ad8a35667a7..9fb11b1ad3cd 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -2213,11 +2213,20 @@ static int vfio_unregister_iommu_notifier(struct vfio_group *group,
>  	return ret;
>  }
>  
> +extern void vhost_vdmabuf_get_kvm(unsigned long action, void *data);
>  void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
>  {
> +	void (*fn)(unsigned long, void *);
> +
>  	group->kvm = kvm;
>  	blocking_notifier_call_chain(&group->notifier,
>  				VFIO_GROUP_NOTIFY_SET_KVM, kvm);
> +
> +	fn = symbol_get(vhost_vdmabuf_get_kvm);
> +	if (fn) {
> +		fn(VFIO_GROUP_NOTIFY_SET_KVM, kvm);
> +		symbol_put(vhost_vdmabuf_get_kvm);
> +	}
>  }
>  EXPORT_SYMBOL_GPL(vfio_group_set_kvm);
>  

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [RFC 3/3] vfio: Share the KVM instance with Vdmabuf
  2021-01-19 15:39   ` Alex Williamson
@ 2021-01-20  0:14     ` Kasireddy, Vivek
  2021-01-20  0:50       ` Alex Williamson
  0 siblings, 1 reply; 10+ messages in thread
From: Kasireddy, Vivek @ 2021-01-20  0:14 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Kim, Dongwon, virtualization

Hi Alex,

> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, January 19, 2021 7:40 AM
> To: Kasireddy, Vivek <vivek.kasireddy@intel.com>
> Cc: virtualization@lists.linux-foundation.org; Kim, Dongwon <dongwon.kim@intel.com>
> Subject: Re: [RFC 3/3] vfio: Share the KVM instance with Vdmabuf
> 
> On Tue, 19 Jan 2021 00:28:12 -0800
> Vivek Kasireddy <vivek.kasireddy@intel.com> wrote:
> 
> > Getting a copy of the KVM instance is necessary for mapping Guest
> > pages in the Host.
> >
> > TODO: Instead of invoking the symbol directly, there needs to be a
> > better way of getting a copy of the KVM instance probably by using
> > other notifiers. However, currently, KVM shares its instance only
> > with VFIO and therefore we are compelled to bind the passthrough'd
> > device to vfio-pci.
> 
> Yeah, this is a bad solution, sorry, vfio is not going to gratuitously
> call out to vhost to share a kvm pointer.  I'd prefer to get rid of
> vfio having any knowledge or visibility of the kvm pointer.  Thanks,

[Kasireddy, Vivek] I agree that this is definitely not ideal, as I acknowledge
in the TODO. However, it looks like VFIO also gets a copy of the KVM
pointer in a similar manner:

virt/kvm/vfio.c

static void kvm_vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
{
        void (*fn)(struct vfio_group *, struct kvm *);

        fn = symbol_get(vfio_group_set_kvm);
        if (!fn)
                return;

        fn(group, kvm);

        symbol_put(vfio_group_set_kvm);
}

With this patch, I am not suggesting that this is a precedent that should be
followed, but there doesn't seem to be a clean and elegant alternative way of
getting a copy of the KVM pointer -- unless I have not looked hard enough. I
guess we could create a notifier chain with callbacks for VFIO and Vhost that
KVM would call, but this would mean modifying KVM.

Also, if I understand correctly, if VFIO does not want to share the KVM pointer with
VFIO groups, then I think it would break stuff like mdev which counts on it. 

Thanks,
Vivek

> Alex
> 
> > Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> > ---
> >  drivers/vfio/vfio.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > index 4ad8a35667a7..9fb11b1ad3cd 100644
> > --- a/drivers/vfio/vfio.c
> > +++ b/drivers/vfio/vfio.c
> > @@ -2213,11 +2213,20 @@ static int vfio_unregister_iommu_notifier(struct vfio_group
> *group,
> >  	return ret;
> >  }
> >
> > +extern void vhost_vdmabuf_get_kvm(unsigned long action, void *data);
> >  void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
> >  {
> > +	void (*fn)(unsigned long, void *);
> > +
> >  	group->kvm = kvm;
> >  	blocking_notifier_call_chain(&group->notifier,
> >  				VFIO_GROUP_NOTIFY_SET_KVM, kvm);
> > +
> > +	fn = symbol_get(vhost_vdmabuf_get_kvm);
> > +	if (fn) {
> > +		fn(VFIO_GROUP_NOTIFY_SET_KVM, kvm);
> > +		symbol_put(vhost_vdmabuf_get_kvm);
> > +	}
> >  }
> >  EXPORT_SYMBOL_GPL(vfio_group_set_kvm);
> >

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC 3/3] vfio: Share the KVM instance with Vdmabuf
  2021-01-20  0:14     ` Kasireddy, Vivek
@ 2021-01-20  0:50       ` Alex Williamson
  2021-01-20  3:05         ` Tian, Kevin
  0 siblings, 1 reply; 10+ messages in thread
From: Alex Williamson @ 2021-01-20  0:50 UTC (permalink / raw)
  To: Kasireddy, Vivek; +Cc: Kim, Dongwon, virtualization

On Wed, 20 Jan 2021 00:14:49 +0000
"Kasireddy, Vivek" <vivek.kasireddy@intel.com> wrote:

> Hi Alex,
> 
> > -----Original Message-----
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Tuesday, January 19, 2021 7:40 AM
> > To: Kasireddy, Vivek <vivek.kasireddy@intel.com>
> > Cc: virtualization@lists.linux-foundation.org; Kim, Dongwon <dongwon.kim@intel.com>
> > Subject: Re: [RFC 3/3] vfio: Share the KVM instance with Vdmabuf
> > 
> > On Tue, 19 Jan 2021 00:28:12 -0800
> > Vivek Kasireddy <vivek.kasireddy@intel.com> wrote:
> >   
> > > Getting a copy of the KVM instance is necessary for mapping Guest
> > > pages in the Host.
> > >
> > > TODO: Instead of invoking the symbol directly, there needs to be a
> > > better way of getting a copy of the KVM instance probably by using
> > > other notifiers. However, currently, KVM shares its instance only
> > > with VFIO and therefore we are compelled to bind the passthrough'd
> > > device to vfio-pci.  
> > 
> > Yeah, this is a bad solution, sorry, vfio is not going to gratuitously
> > call out to vhost to share a kvm pointer.  I'd prefer to get rid of
> > vfio having any knowledge or visibility of the kvm pointer.  Thanks,  
> 
> [Kasireddy, Vivek] I agree that this is definitely not ideal as I recognize it
> in the TODO. However, it looks like VFIO also gets a copy of the KVM 
> pointer in a similar manner:
> 
> virt/kvm/vfio.c
> 
> static void kvm_vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
> {
>         void (*fn)(struct vfio_group *, struct kvm *);
> 
>         fn = symbol_get(vfio_group_set_kvm);
>         if (!fn)
>                 return;
> 
>         fn(group, kvm);
> 
>         symbol_put(vfio_group_set_kvm);
> }

You're equating the mechanism with the architecture.  We use symbols
here to avoid module dependencies between kvm and vfio, but this is
just propagating data that userspace is specifically registering
between kvm and vfio.  vhost doesn't get to piggyback on that channel.

> With this patch, I am not suggesting that this is a precedent that should be followed 
> but it appears there doesn't seem to be an alternative way of getting a copy of the KVM 
> pointer that is clean and elegant -- unless I have not looked hard enough. I guess we
> could create a notifier chain with callbacks for VFIO and Vhost that KVM would call 
> but this would mean modifying KVM.
> 
> Also, if I understand correctly, if VFIO does not want to share the KVM pointer with
> VFIO groups, then I think it would break stuff like mdev which counts on it. 

Only kvmgt requires the kvm pointer and the use case there is pretty
questionable, I wonder if it actually still exists now that we have the
DMA r/w interface through vfio.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [RFC 3/3] vfio: Share the KVM instance with Vdmabuf
  2021-01-20  0:50       ` Alex Williamson
@ 2021-01-20  3:05         ` Tian, Kevin
  2021-01-20  3:36           ` Alex Williamson
  0 siblings, 1 reply; 10+ messages in thread
From: Tian, Kevin @ 2021-01-20  3:05 UTC (permalink / raw)
  To: Alex Williamson, Kasireddy, Vivek
  Cc: Zhao, Yan Y, Kim, Dongwon, virtualization

> From: Alex Williamson
> Sent: Wednesday, January 20, 2021 8:51 AM
> 
> On Wed, 20 Jan 2021 00:14:49 +0000
> "Kasireddy, Vivek" <vivek.kasireddy@intel.com> wrote:
> 
> > Hi Alex,
> >
> > > -----Original Message-----
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Tuesday, January 19, 2021 7:40 AM
> > > To: Kasireddy, Vivek <vivek.kasireddy@intel.com>
> > > Cc: virtualization@lists.linux-foundation.org; Kim, Dongwon
> <dongwon.kim@intel.com>
> > > Subject: Re: [RFC 3/3] vfio: Share the KVM instance with Vdmabuf
> > >
> > > On Tue, 19 Jan 2021 00:28:12 -0800
> > > Vivek Kasireddy <vivek.kasireddy@intel.com> wrote:
> > >
> > > > Getting a copy of the KVM instance is necessary for mapping Guest
> > > > pages in the Host.
> > > >
> > > > TODO: Instead of invoking the symbol directly, there needs to be a
> > > > better way of getting a copy of the KVM instance probably by using
> > > > other notifiers. However, currently, KVM shares its instance only
> > > > with VFIO and therefore we are compelled to bind the passthrough'd
> > > > device to vfio-pci.
> > >
> > > Yeah, this is a bad solution, sorry, vfio is not going to gratuitously
> > > call out to vhost to share a kvm pointer.  I'd prefer to get rid of
> > > vfio having any knowledge or visibility of the kvm pointer.  Thanks,
> >
> > [Kasireddy, Vivek] I agree that this is definitely not ideal as I recognize it
> > in the TODO. However, it looks like VFIO also gets a copy of the KVM
> > pointer in a similar manner:
> >
> > virt/kvm/vfio.c
> >
> > static void kvm_vfio_group_set_kvm(struct vfio_group *group, struct kvm
> *kvm)
> > {
> >         void (*fn)(struct vfio_group *, struct kvm *);
> >
> >         fn = symbol_get(vfio_group_set_kvm);
> >         if (!fn)
> >                 return;
> >
> >         fn(group, kvm);
> >
> >         symbol_put(vfio_group_set_kvm);
> > }
> 
> You're equating the mechanism with the architecture.  We use symbols
> here to avoid module dependencies between kvm and vfio, but this is
> just propagating data that userspace is specifically registering
> between kvm and vfio.  vhost doesn't get to piggyback on that channel.
> 
> > With this patch, I am not suggesting that this is a precedent that should be
> followed
> > but it appears there doesn't seem to be an alternative way of getting a copy
> of the KVM
> > pointer that is clean and elegant -- unless I have not looked hard enough. I
> guess we
> > could create a notifier chain with callbacks for VFIO and Vhost that KVM
> would call
> > but this would mean modifying KVM.
> >
> > Also, if I understand correctly, if VFIO does not want to share the KVM
> pointer with
> > VFIO groups, then I think it would break stuff like mdev which counts on it.
> 
> Only kvmgt requires the kvm pointer and the use case there is pretty
> questionable, I wonder if it actually still exists now that we have the
> DMA r/w interface through vfio.  Thanks,
> 

IIRC, kvmgt still needs the kvm pointer to use the kvm page-tracking interface
for write-protecting guest pgtables.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC 3/3] vfio: Share the KVM instance with Vdmabuf
  2021-01-20  3:05         ` Tian, Kevin
@ 2021-01-20  3:36           ` Alex Williamson
  2021-01-21  3:15             ` Kasireddy, Vivek
  0 siblings, 1 reply; 10+ messages in thread
From: Alex Williamson @ 2021-01-20  3:36 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: virtualization, Zhao, Yan Y, Kim, Dongwon

On Wed, 20 Jan 2021 03:05:49 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Alex Williamson
> > Sent: Wednesday, January 20, 2021 8:51 AM
> > 
> > On Wed, 20 Jan 2021 00:14:49 +0000
> > "Kasireddy, Vivek" <vivek.kasireddy@intel.com> wrote:
> >   
> > > Hi Alex,
> > >  
> > > > -----Original Message-----
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Tuesday, January 19, 2021 7:40 AM
> > > > To: Kasireddy, Vivek <vivek.kasireddy@intel.com>
> > > > Cc: virtualization@lists.linux-foundation.org; Kim, Dongwon  
> > <dongwon.kim@intel.com>  
> > > > Subject: Re: [RFC 3/3] vfio: Share the KVM instance with Vdmabuf
> > > >
> > > > On Tue, 19 Jan 2021 00:28:12 -0800
> > > > Vivek Kasireddy <vivek.kasireddy@intel.com> wrote:
> > > >  
> > > > > Getting a copy of the KVM instance is necessary for mapping Guest
> > > > > pages in the Host.
> > > > >
> > > > > TODO: Instead of invoking the symbol directly, there needs to be a
> > > > > better way of getting a copy of the KVM instance probably by using
> > > > > other notifiers. However, currently, KVM shares its instance only
> > > > > with VFIO and therefore we are compelled to bind the passthrough'd
> > > > > device to vfio-pci.  
> > > >
> > > > Yeah, this is a bad solution, sorry, vfio is not going to gratuitously
> > > > call out to vhost to share a kvm pointer.  I'd prefer to get rid of
> > > > vfio having any knowledge or visibility of the kvm pointer.  Thanks,  
> > >
> > > [Kasireddy, Vivek] I agree that this is definitely not ideal as I recognize it
> > > in the TODO. However, it looks like VFIO also gets a copy of the KVM
> > > pointer in a similar manner:
> > >
> > > virt/kvm/vfio.c
> > >
> > > static void kvm_vfio_group_set_kvm(struct vfio_group *group, struct kvm  
> > *kvm)  
> > > {
> > >         void (*fn)(struct vfio_group *, struct kvm *);
> > >
> > >         fn = symbol_get(vfio_group_set_kvm);
> > >         if (!fn)
> > >                 return;
> > >
> > >         fn(group, kvm);
> > >
> > >         symbol_put(vfio_group_set_kvm);
> > > }  
> > 
> > You're equating the mechanism with the architecture.  We use symbols
> > here to avoid module dependencies between kvm and vfio, but this is
> > just propagating data that userspace is specifically registering
> > between kvm and vfio.  vhost doesn't get to piggyback on that channel.
> >   
> > > With this patch, I am not suggesting that this is a precedent that should be  
> > followed  
> > > but it appears there doesn't seem to be an alternative way of getting a copy  
> > of the KVM  
> > > pointer that is clean and elegant -- unless I have not looked hard enough. I  
> > guess we  
> > > could create a notifier chain with callbacks for VFIO and Vhost that KVM  
> > would call  
> > > but this would mean modifying KVM.
> > >
> > > Also, if I understand correctly, if VFIO does not want to share the KVM  
> > pointer with  
> > > VFIO groups, then I think it would break stuff like mdev which counts on it.  
> > 
> > Only kvmgt requires the kvm pointer and the use case there is pretty
> > questionable, I wonder if it actually still exists now that we have the
> > DMA r/w interface through vfio.  Thanks,
> >   
> 
> IIRC, kvmgt still needs the kvm pointer to use kvm page tracking interface 
> for write-protecting guest pgtable.

Thanks, Kevin.  Either way, a vhost device has no stake in the game wrt
the kvm pointer lifecycle here and no business adding a callout.  I'm
reluctant to add any further use cases even for mdevs as ideally mdevs
should have no dependency on kvm.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [RFC 3/3] vfio: Share the KVM instance with Vdmabuf
  2021-01-20  3:36           ` Alex Williamson
@ 2021-01-21  3:15             ` Kasireddy, Vivek
  0 siblings, 0 replies; 10+ messages in thread
From: Kasireddy, Vivek @ 2021-01-21  3:15 UTC (permalink / raw)
  To: Alex Williamson, Tian, Kevin; +Cc: Zhao, Yan Y, Kim, Dongwon, virtualization

Hi Alex,

> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, January 19, 2021 7:37 PM
> To: Tian, Kevin <kevin.tian@intel.com>
> Cc: Kasireddy, Vivek <vivek.kasireddy@intel.com>; Kim, Dongwon
> <dongwon.kim@intel.com>; virtualization@lists.linux-foundation.org; Zhao, Yan Y
> <yan.y.zhao@intel.com>
> Subject: Re: [RFC 3/3] vfio: Share the KVM instance with Vdmabuf
> 
> On Wed, 20 Jan 2021 03:05:49 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Alex Williamson
> > > Sent: Wednesday, January 20, 2021 8:51 AM
> > >
> > > On Wed, 20 Jan 2021 00:14:49 +0000
> > > "Kasireddy, Vivek" <vivek.kasireddy@intel.com> wrote:
> > >
> > > > Hi Alex,
> > > >
> > > > > -----Original Message-----
> > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > Sent: Tuesday, January 19, 2021 7:40 AM
> > > > > To: Kasireddy, Vivek <vivek.kasireddy@intel.com>
> > > > > Cc: virtualization@lists.linux-foundation.org; Kim, Dongwon
> > > <dongwon.kim@intel.com>
> > > > > Subject: Re: [RFC 3/3] vfio: Share the KVM instance with Vdmabuf
> > > > >
> > > > > On Tue, 19 Jan 2021 00:28:12 -0800
> > > > > Vivek Kasireddy <vivek.kasireddy@intel.com> wrote:
> > > > >
> > > > > > Getting a copy of the KVM instance is necessary for mapping Guest
> > > > > > pages in the Host.
> > > > > >
> > > > > > TODO: Instead of invoking the symbol directly, there needs to be a
> > > > > > better way of getting a copy of the KVM instance probably by using
> > > > > > other notifiers. However, currently, KVM shares its instance only
> > > > > > with VFIO and therefore we are compelled to bind the passthrough'd
> > > > > > device to vfio-pci.
> > > > >
> > > > > Yeah, this is a bad solution, sorry, vfio is not going to gratuitously
> > > > > call out to vhost to share a kvm pointer.  I'd prefer to get rid of
> > > > > vfio having any knowledge or visibility of the kvm pointer.  Thanks,
> > > >
> > > > [Kasireddy, Vivek] I agree that this is definitely not ideal as I recognize it
> > > > in the TODO. However, it looks like VFIO also gets a copy of the KVM
> > > > pointer in a similar manner:
> > > >
> > > > virt/kvm/vfio.c
> > > >
> > > > static void kvm_vfio_group_set_kvm(struct vfio_group *group, struct kvm
> > > *kvm)
> > > > {
> > > >         void (*fn)(struct vfio_group *, struct kvm *);
> > > >
> > > >         fn = symbol_get(vfio_group_set_kvm);
> > > >         if (!fn)
> > > >                 return;
> > > >
> > > >         fn(group, kvm);
> > > >
> > > >         symbol_put(vfio_group_set_kvm);
> > > > }
> > >
> > > You're equating the mechanism with the architecture.  We use symbols
> > > here to avoid module dependencies between kvm and vfio, but this is
> > > just propagating data that userspace is specifically registering
> > > between kvm and vfio.  vhost doesn't get to piggyback on that channel.
> > >
> > > > With this patch, I am not suggesting that this is a precedent that should be
> > > followed
> > > > but it appears there doesn't seem to be an alternative way of getting a copy
> > > of the KVM
> > > > pointer that is clean and elegant -- unless I have not looked hard enough. I
> > > guess we
> > > > could create a notifier chain with callbacks for VFIO and Vhost that KVM
> > > would call
> > > > but this would mean modifying KVM.
> > > >
> > > > Also, if I understand correctly, if VFIO does not want to share the KVM
> > > pointer with
> > > > VFIO groups, then I think it would break stuff like mdev which counts on it.
> > >
> > > Only kvmgt requires the kvm pointer and the use case there is pretty
> > > questionable, I wonder if it actually still exists now that we have the
> > > DMA r/w interface through vfio.  Thanks,
> > >
> >
> > IIRC, kvmgt still needs the kvm pointer to use kvm page tracking interface
> > for write-protecting guest pgtable.
> 
> Thanks, Kevin.  Either way, a vhost device has no stake in the game wrt
> the kvm pointer lifecycle here and no business adding a callout.  I'm
> reluctant to add any further use cases even for mdevs as ideally mdevs
> should have no dependency on kvm.  Thanks,

[Kasireddy, Vivek] All I am trying to do is leverage existing mechanism(s) 
instead of creating new ones. So, if Vhost cannot get the kvm pointer from 
VFIO in any manner, my only option, as it appears, is to add a new 
notifier_block to KVM that gets triggered in kvm_create_vm() and 
kvm_destroy_vm(). However, I am not sure if that would be acceptable to 
the KVM maintainers. Does anyone know if there is another cleaner option 
available rather than having to modify KVM?
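
For reference, the kind of change I have in mind would look roughly like
the sketch below; the names (kvm_register_vm_notifier() and the
KVM_VM_NOTIFY_* actions) are made up for illustration and do not exist in
KVM today:

/* hypothetical KVM side (virt/kvm/kvm_main.c) */
#include <linux/kvm_host.h>
#include <linux/notifier.h>

#define KVM_VM_NOTIFY_CREATE	1
#define KVM_VM_NOTIFY_DESTROY	2

static BLOCKING_NOTIFIER_HEAD(kvm_vm_notifier_list);

int kvm_register_vm_notifier(struct notifier_block *nb)
{
	return blocking_notifier_chain_register(&kvm_vm_notifier_list, nb);
}

/* called from kvm_create_vm()/kvm_destroy_vm() with the struct kvm * */
static void kvm_notify_vm(unsigned long action, struct kvm *kvm)
{
	blocking_notifier_call_chain(&kvm_vm_notifier_list, action, kvm);
}

/* hypothetical vhost-vdmabuf side */
static int vdmabuf_vm_event(struct notifier_block *nb,
			    unsigned long action, void *data)
{
	/* data is the struct kvm *; stash or drop it here the way
	 * vhost_vdmabuf_get_kvm()/find_kvm_instance() do today
	 */
	return NOTIFY_OK;
}

static struct notifier_block vdmabuf_vm_nb = {
	.notifier_call = vdmabuf_vm_event,
};
/* registered with kvm_register_vm_notifier(&vdmabuf_vm_nb) at module init */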

Thanks,
Vivek

> 
> Alex

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-01-21  3:15 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-19  8:28 [RFC 0/3] Introduce Vdmabuf driver Vivek Kasireddy
2021-01-19  8:28 ` [RFC 1/3] virtio: " Vivek Kasireddy
2021-01-19  8:28 ` [RFC 2/3] vhost: Add Vdmabuf backend Vivek Kasireddy
2021-01-19  8:28 ` [RFC 3/3] vfio: Share the KVM instance with Vdmabuf Vivek Kasireddy
2021-01-19 15:39   ` Alex Williamson
2021-01-20  0:14     ` Kasireddy, Vivek
2021-01-20  0:50       ` Alex Williamson
2021-01-20  3:05         ` Tian, Kevin
2021-01-20  3:36           ` Alex Williamson
2021-01-21  3:15             ` Kasireddy, Vivek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).