* [RFC v3 0/3] Introduce Virtio based Dmabuf driver
From: Vivek Kasireddy @ 2021-02-03  7:35 UTC
  To: virtualization, dri-devel
  Cc: kraxel, daniel.vetter, daniel.vetter, dongwon.kim, sumit.semwal,
	christian.koenig, linux-media, Vivek Kasireddy

The Virtual Dmabuf or Virtio based Dmabuf (Vdmabuf) driver can be used
to "transfer" a page-backed dmabuf created in the Guest to the Host
without making any copies. This is accomplished by recreating the
dmabuf on the Host using the PFNs and other metadata shared by the
Guest. A use-case where this driver would be a good fit is a multi-GPU
system (perhaps one discrete and one integrated GPU) where one of the
GPUs does not have access to the display/connectors/outputs. This could
be an embedded system design decision, a restriction made at the
firmware/BIOS level, or a consequence of the device being set up in UPT
(Universal Passthrough) mode. When such a GPU is passed through to a
Guest OS, this driver can help transfer the scanout buffer(s) (rendered
using the native rendering stack) to the Host so that they can be
displayed. Or, quite simply, this driver can be used to transfer a
dmabuf created by an application running in the Guest to another
application running on the Host.

The userspace component running in the Guest that transfers the dmabuf
is referred to as the producer or exporter, and its counterpart running
on the Host is referred to as the importer or consumer. For instance, a
Wayland compositor could be a producer and the Qemu UI a consumer. It
is the producer's responsibility not to reuse or destroy the shared
buffer while it is still being used by the consumer. The consumer sends
a release command to indicate that it is done, after which the shared
buffer can safely be reused by the producer. One way the producer can
prevent accidental reuse of the shared buffer is to lock the buffer
when it exports it and unlock it after it receives the release command.
As an example, the GBM API provides a simple way to lock and unlock a
surface's buffers.
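
As a rough illustration of that flow (not part of this series), a
producer built on top of GBM could do something like the sketch below.
vdmabuf_export() and vdmabuf_wait_for_release() are hypothetical
helpers standing in for the export ioctl and release-event handling
described further down:

  #include <gbm.h>

  /* Hypothetical helpers wrapping the vdmabuf export ioctl and the
   * release-event read; they are placeholders, not part of GBM.
   */
  extern int vdmabuf_export(int dmabuf_fd);
  extern void vdmabuf_wait_for_release(void);

  static void share_front_buffer(struct gbm_surface *surface)
  {
          /* lock the front buffer so it cannot be reused or destroyed */
          struct gbm_bo *bo = gbm_surface_lock_front_buffer(surface);
          int dmabuf_fd = gbm_bo_get_fd(bo);

          vdmabuf_export(dmabuf_fd);       /* share the buffer with the Host */
          vdmabuf_wait_for_release();      /* wait for the release command   */

          /* the consumer is done; safe to reuse the buffer again */
          gbm_surface_release_buffer(surface, bo);
  }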

For each dmabuf that is to be shared with the Host, a unique 128-bit
ID is generated that identifies this buffer across the whole system.
This ID is a combination of the Qemu process ID, a counter and a
randomizer. We could potentially use the UUID API, but the combination
above lets us identify the source of the buffer at any given time,
which helps with bookkeeping.
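
Concretely, patch 2/3 builds the ID with NEW_BUF_ID_GEN() and
get_buf_id(); the sketch below shows the rough layout. The exact uapi
definition lives in include/uapi/linux/virtio_vdmabuf.h, so the field
types here are an assumption:

  typedef struct {
          __u64 id;          /* ((vmid & 0xFFFFFFFF) << 32) | counter  */
          __u32 rng_key[2];  /* 8 random bytes from get_random_bytes() */
  } virtio_vdmabuf_buf_id_t;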

A typical cycle starts with the producer or exporter calling the
export IOCTL to export a dmabuf; a new unique ID is generated for
this buffer and it gets registered with the Host. The Host then
alerts the consumer or importer by raising an event and sharing the ID.
In response, the consumer calls the import IOCTL with the ID and gets
a newly created dmabuf fd in return. After it is done using the dmabuf,
the consumer finally calls the release IOCTL; the Guest is notified and
in turn notifies the producer, letting it know that the buffer is now
safe to reuse.
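
For reference, the Guest-side export step would look roughly like the
sketch below from userspace. The device node name and the
virtio_vdmabuf_export fields are taken from patch 2/3, but treat this
as an illustration rather than the final uapi:

  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/virtio_vdmabuf.h>

  /* Export a dmabuf and return the vdmabuf ID assigned by the driver;
   * "meta" is optional application-specific private data.
   */
  int export_buffer(int dmabuf_fd, void *meta, int meta_size,
                    virtio_vdmabuf_buf_id_t *buf_id)
  {
          struct virtio_vdmabuf_export attr = {
                  .fd = dmabuf_fd,
                  .sz_priv = meta_size,
                  .priv = meta,
          };
          int fd = open("/dev/virtio-vdmabuf", O_RDWR);
          int ret;

          if (fd < 0)
                  return fd;

          ret = ioctl(fd, VIRTIO_VDMABUF_IOCTL_EXPORT, &attr);
          if (!ret)
                  *buf_id = attr.buf_id;

          close(fd);
          return ret;
  }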

v2:
- Added a notifier mechanism for getting the kvm pointer.
- Added start and stop routines in the Vhost backend.
- Augmented the cover letter and made some minor improvements.

v3:
- Refactored the code to make it similar to vsock

Vivek Kasireddy (3):
  kvm: Add a notifier for create and destroy VM events
  virtio: Introduce Vdmabuf driver
  vhost: Add Vdmabuf backend

 drivers/vhost/Kconfig               |    9 +
 drivers/vhost/Makefile              |    3 +
 drivers/vhost/vdmabuf.c             | 1446 +++++++++++++++++++++++++++
 drivers/virtio/Kconfig              |    8 +
 drivers/virtio/Makefile             |    1 +
 drivers/virtio/virtio_vdmabuf.c     | 1090 ++++++++++++++++++++
 include/linux/kvm_host.h            |    5 +
 include/linux/virtio_vdmabuf.h      |  271 +++++
 include/uapi/linux/vhost.h          |    3 +
 include/uapi/linux/virtio_ids.h     |    1 +
 include/uapi/linux/virtio_vdmabuf.h |   99 ++
 virt/kvm/kvm_main.c                 |   20 +-
 12 files changed, 2954 insertions(+), 2 deletions(-)
 create mode 100644 drivers/vhost/vdmabuf.c
 create mode 100644 drivers/virtio/virtio_vdmabuf.c
 create mode 100644 include/linux/virtio_vdmabuf.h
 create mode 100644 include/uapi/linux/virtio_vdmabuf.h

-- 
2.26.2


* [RFC v3 1/3] kvm: Add a notifier for create and destroy VM events
From: Vivek Kasireddy @ 2021-02-03  7:35 UTC
  To: virtualization, dri-devel
  Cc: kraxel, daniel.vetter, daniel.vetter, dongwon.kim, sumit.semwal,
	christian.koenig, linux-media, Vivek Kasireddy

After registering with this notifier, other drivers that depend on
KVM can be notified whenever a VM is created or destroyed. This also
provides a way to share the KVM instance pointer with other drivers.
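
A dependent driver (for example, the vhost Vdmabuf backend in patch
3/3) would use the notifier roughly as in the sketch below; the
function and variable names are made up for illustration:

  #include <linux/kvm_host.h>
  #include <linux/notifier.h>

  static int vdmabuf_kvm_notify(struct notifier_block *nb,
                                unsigned long action, void *data)
  {
          /* "data" is the struct kvm * passed along the chain */
          switch (action) {
          case KVM_EVENT_CREATE_VM:
                  /* stash the kvm pointer for later guest page lookups */
                  break;
          case KVM_EVENT_DESTROY_VM:
                  /* drop any state tied to this VM */
                  break;
          }

          return NOTIFY_OK;
  }

  static struct notifier_block vdmabuf_kvm_nb = {
          .notifier_call = vdmabuf_kvm_notify,
  };

  /* at driver init/exit time (error handling omitted) */
  static int vdmabuf_backend_init(void)
  {
          return kvm_vm_register_notifier(&vdmabuf_kvm_nb);
  }

  static void vdmabuf_backend_exit(void)
  {
          kvm_vm_unregister_notifier(&vdmabuf_kvm_nb);
  }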

Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
 include/linux/kvm_host.h |  5 +++++
 virt/kvm/kvm_main.c      | 20 ++++++++++++++++++--
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f3b1013fb22c..fc1a688301a0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -88,6 +88,9 @@
 #define KVM_PFN_ERR_HWPOISON	(KVM_PFN_ERR_MASK + 1)
 #define KVM_PFN_ERR_RO_FAULT	(KVM_PFN_ERR_MASK + 2)
 
+#define KVM_EVENT_CREATE_VM 0
+#define KVM_EVENT_DESTROY_VM 1
+
 /*
  * error pfns indicate that the gfn is in slot but faild to
  * translate it to pfn on host.
@@ -1494,5 +1497,7 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
 
 /* Max number of entries allowed for each kvm dirty ring */
 #define  KVM_DIRTY_RING_MAX_ENTRIES  65536
+int kvm_vm_register_notifier(struct notifier_block *nb);
+int kvm_vm_unregister_notifier(struct notifier_block *nb);
 
 #endif
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5f260488e999..8a0e8bb02a5f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -101,6 +101,8 @@ DEFINE_MUTEX(kvm_lock);
 static DEFINE_RAW_SPINLOCK(kvm_count_lock);
 LIST_HEAD(vm_list);
 
+static struct blocking_notifier_head kvm_vm_notifier;
+
 static cpumask_var_t cpus_hardware_enabled;
 static int kvm_usage_count;
 static atomic_t hardware_enable_failed;
@@ -148,12 +150,20 @@ static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
 __visible bool kvm_rebooting;
 EXPORT_SYMBOL_GPL(kvm_rebooting);
 
-#define KVM_EVENT_CREATE_VM 0
-#define KVM_EVENT_DESTROY_VM 1
 static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm);
 static unsigned long long kvm_createvm_count;
 static unsigned long long kvm_active_vms;
 
+inline int kvm_vm_register_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&kvm_vm_notifier, nb);
+}
+
+inline int kvm_vm_unregister_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&kvm_vm_notifier, nb);
+}
+
 __weak void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
 						   unsigned long start, unsigned long end)
 {
@@ -808,6 +818,8 @@ static struct kvm *kvm_create_vm(unsigned long type)
 
 	preempt_notifier_inc();
 
+	blocking_notifier_call_chain(&kvm_vm_notifier,
+				     KVM_EVENT_CREATE_VM, kvm);
 	return kvm;
 
 out_err:
@@ -886,6 +898,8 @@ static void kvm_destroy_vm(struct kvm *kvm)
 	preempt_notifier_dec();
 	hardware_disable_all();
 	mmdrop(mm);
+	blocking_notifier_call_chain(&kvm_vm_notifier,
+				     KVM_EVENT_DESTROY_VM, kvm);
 }
 
 void kvm_get_kvm(struct kvm *kvm)
@@ -4968,6 +4982,8 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
 	r = kvm_vfio_ops_init();
 	WARN_ON(r);
 
+	BLOCKING_INIT_NOTIFIER_HEAD(&kvm_vm_notifier);
+
 	return 0;
 
 out_unreg:
-- 
2.26.2


* [RFC v3 2/3] virtio: Introduce Vdmabuf driver
From: Vivek Kasireddy @ 2021-02-03  7:35 UTC
  To: virtualization, dri-devel
  Cc: kraxel, daniel.vetter, daniel.vetter, dongwon.kim, sumit.semwal,
	christian.koenig, linux-media, Vivek Kasireddy

This driver "transfers" a dmabuf created in the Guest to the Host.
A common use-case for such a transfer is sharing the scanout buffer
created by a display server or a compositor running in the Guest with
the Qemu UI running on the Host.

The "transfer" is accomplished by sharing the PFNs of all the pages
associated with the dmabuf and having a new dmabuf created on the
Host that is backed by the pages mapped from the Guest.
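
The references are shared as a small tree of guest physical addresses
(see virtio_vdmabuf_share_buf() below); roughly:

  ref (GPA of the l3refs page)
    -> l3refs: one page holding the GPAs of the l2refs pages
         -> l2refs: pages holding the GPA of every shared page
              -> the pages backing the dmabuf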

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
 drivers/virtio/Kconfig              |    8 +
 drivers/virtio/Makefile             |    1 +
 drivers/virtio/virtio_vdmabuf.c     | 1090 +++++++++++++++++++++++++++
 include/linux/virtio_vdmabuf.h      |  271 +++++++
 include/uapi/linux/virtio_ids.h     |    1 +
 include/uapi/linux/virtio_vdmabuf.h |   99 +++
 6 files changed, 1470 insertions(+)
 create mode 100644 drivers/virtio/virtio_vdmabuf.c
 create mode 100644 include/linux/virtio_vdmabuf.h
 create mode 100644 include/uapi/linux/virtio_vdmabuf.h

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 7b41130d3f35..e563c12f711e 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -139,4 +139,12 @@ config VIRTIO_DMA_SHARED_BUFFER
 	 This option adds a flavor of dma buffers that are backed by
 	 virtio resources.
 
+config VIRTIO_VDMABUF
+	bool "Enables Vdmabuf driver in guest os"
+	default n
+	depends on VIRTIO
+	help
+	 This driver provides a way to share the dmabufs created in
+	 the Guest with the Host.
+
 endif # VIRTIO_MENU
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 591e6f72aa54..b4bb0738009c 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
 obj-$(CONFIG_VIRTIO_VDPA) += virtio_vdpa.o
 obj-$(CONFIG_VIRTIO_MEM) += virtio_mem.o
 obj-$(CONFIG_VIRTIO_DMA_SHARED_BUFFER) += virtio_dma_buf.o
+obj-$(CONFIG_VIRTIO_VDMABUF) += virtio_vdmabuf.o
diff --git a/drivers/virtio/virtio_vdmabuf.c b/drivers/virtio/virtio_vdmabuf.c
new file mode 100644
index 000000000000..c28f144eb126
--- /dev/null
+++ b/drivers/virtio/virtio_vdmabuf.c
@@ -0,0 +1,1090 @@
+// SPDX-License-Identifier: (MIT OR GPL-2.0)
+
+/*
+ * Copyright © 2021 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Dongwon Kim <dongwon.kim@intel.com>
+ *    Mateusz Polrola <mateusz.polrola@gmail.com>
+ *    Vivek Kasireddy <vivek.kasireddy@intel.com>
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/uaccess.h>
+#include <linux/miscdevice.h>
+#include <linux/delay.h>
+#include <linux/random.h>
+#include <linux/poll.h>
+#include <linux/spinlock.h>
+#include <linux/dma-buf.h>
+#include <linux/virtio.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_vdmabuf.h>
+
+#define VIRTIO_VDMABUF_MAX_ID INT_MAX
+#define REFS_PER_PAGE (PAGE_SIZE/sizeof(long))
+#define NEW_BUF_ID_GEN(vmid, cnt) (((vmid & 0xFFFFFFFF) << 32) | \
+				    ((cnt) & 0xFFFFFFFF))
+
+/* one global drv object */
+static struct virtio_vdmabuf_info *drv_info;
+
+struct virtio_vdmabuf {
+	/* virtio device structure */
+	struct virtio_device *vdev;
+
+	/* virtual queue array */
+	struct virtqueue *vqs[VDMABUF_VQ_MAX];
+
+	/* ID of guest OS */
+	u64 vmid;
+
+	/* spin lock that needs to be acquired before accessing
+	 * virtual queue
+	 */
+	spinlock_t vq_lock;
+	struct mutex recv_lock;
+	struct mutex send_lock;
+
+	struct list_head msg_list;
+
+	/* workqueue */
+	struct workqueue_struct *wq;
+	struct work_struct recv_work;
+	struct work_struct send_work;
+	struct work_struct send_msg_work;
+
+	struct virtio_vdmabuf_event_queue *evq;
+};
+
+static virtio_vdmabuf_buf_id_t get_buf_id(struct virtio_vdmabuf *vdmabuf)
+{
+	virtio_vdmabuf_buf_id_t buf_id = {0, {0, 0} };
+	static int count = 0;
+
+	count = count < VIRTIO_VDMABUF_MAX_ID ? count + 1 : 0;
+	buf_id.id = NEW_BUF_ID_GEN(vdmabuf->vmid, count);
+
+	/* random data embedded in the id for security */
+	get_random_bytes(&buf_id.rng_key[0], 8);
+
+	return buf_id;
+}
+
+/* sharing pages for original DMABUF with Host */
+static struct virtio_vdmabuf_shared_pages
+*virtio_vdmabuf_share_buf(struct page **pages, int nents,
+			  int first_ofst, int last_len)
+{
+	struct virtio_vdmabuf_shared_pages *pages_info;
+	int i;
+	int n_l2refs = nents/REFS_PER_PAGE +
+		       ((nents % REFS_PER_PAGE) ? 1 : 0);
+
+	pages_info = kvcalloc(1, sizeof(*pages_info), GFP_KERNEL);
+	if (!pages_info)
+		return NULL;
+
+	pages_info->pages = pages;
+	pages_info->nents = nents;
+	pages_info->first_ofst = first_ofst;
+	pages_info->last_len = last_len;
+	pages_info->l3refs = (gpa_t *)__get_free_page(GFP_KERNEL);
+
+	if (!pages_info->l3refs) {
+		kvfree(pages_info);
+		return NULL;
+	}
+
+	pages_info->l2refs = (gpa_t **)__get_free_pages(GFP_KERNEL,
+					get_order(n_l2refs * PAGE_SIZE));
+
+	if (!pages_info->l2refs) {
+		free_page((gpa_t)pages_info->l3refs);
+		kvfree(pages_info);
+		return NULL;
+	}
+
+	/* Share physical address of pages */
+	for (i = 0; i < nents; i++)
+		pages_info->l2refs[i] = (gpa_t *)page_to_phys(pages[i]);
+
+	for (i = 0; i < n_l2refs; i++)
+		pages_info->l3refs[i] =
+			virt_to_phys((void *)pages_info->l2refs +
+				     i * PAGE_SIZE);
+
+	pages_info->ref = (gpa_t)virt_to_phys(pages_info->l3refs);
+
+	return pages_info;
+}
+
+/* stop sharing pages */
+static void
+virtio_vdmabuf_free_buf(struct virtio_vdmabuf_shared_pages *pages_info)
+{
+	int n_l2refs = (pages_info->nents/REFS_PER_PAGE +
+		       ((pages_info->nents % REFS_PER_PAGE) ? 1 : 0));
+
+	free_pages((gpa_t)pages_info->l2refs, get_order(n_l2refs * PAGE_SIZE));
+	free_page((gpa_t)pages_info->l3refs);
+
+	kvfree(pages_info);
+}
+
+static int send_msg_to_host(enum virtio_vdmabuf_cmd cmd, int *op)
+{
+	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
+	struct virtio_vdmabuf_msg *msg;
+	int i;
+
+	switch (cmd) {
+	case VIRTIO_VDMABUF_CMD_NEED_VMID:
+		msg = kvcalloc(1, sizeof(struct virtio_vdmabuf_msg),
+			       GFP_KERNEL);
+		if (!msg)
+			return -ENOMEM;
+
+		if (op)
+			for (i = 0; i < 4; i++)
+				msg->op[i] = op[i];
+		break;
+
+	case VIRTIO_VDMABUF_CMD_EXPORT:
+		msg = kvcalloc(1, sizeof(struct virtio_vdmabuf_msg),
+			       GFP_KERNEL);
+		if (!msg)
+			return -ENOMEM;
+
+		memcpy(&msg->op[0], &op[0], 9 * sizeof(int) + op[9]);
+		break;
+
+	default:
+		/* no command found */
+		return -EINVAL;
+	}
+
+	msg->cmd = cmd;
+	list_add_tail(&msg->list, &vdmabuf->msg_list);
+	queue_work(vdmabuf->wq, &vdmabuf->send_msg_work);
+
+	return 0;
+}
+
+static int add_event_buf_rel(struct virtio_vdmabuf_buf *buf_info)
+{
+	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
+	struct virtio_vdmabuf_event *e_oldest, *e_new;
+	struct virtio_vdmabuf_event_queue *eq = vdmabuf->evq;
+	unsigned long irqflags;
+
+	e_new = kvzalloc(sizeof(*e_new), GFP_KERNEL);
+	if (!e_new)
+		return -ENOMEM;
+
+	e_new->e_data.hdr.buf_id = buf_info->buf_id;
+	e_new->e_data.data = (void *)buf_info->priv;
+	e_new->e_data.hdr.size = buf_info->sz_priv;
+
+	spin_lock_irqsave(&eq->e_lock, irqflags);
+
+	/* check current number of events and if it hits the max num (32)
+	 * then remove the oldest event in the list
+	 */
+	if (eq->pending > 31) {
+		e_oldest = list_first_entry(&eq->e_list,
+					    struct virtio_vdmabuf_event, link);
+		list_del(&e_oldest->link);
+		eq->pending--;
+		kvfree(e_oldest);
+	}
+
+	list_add_tail(&e_new->link, &eq->e_list);
+
+	eq->pending++;
+
+	wake_up_interruptible(&eq->e_wait);
+	spin_unlock_irqrestore(&eq->e_lock, irqflags);
+
+	return 0;
+}
+
+static void virtio_vdmabuf_clear_buf(struct virtio_vdmabuf_buf *exp)
+{
+	/* Start cleanup of buffer in reverse order to exporting */
+	virtio_vdmabuf_free_buf(exp->pages_info);
+
+	dma_buf_unmap_attachment(exp->attach, exp->sgt,
+				 DMA_BIDIRECTIONAL);
+
+	if (exp->dma_buf) {
+		dma_buf_detach(exp->dma_buf, exp->attach);
+		/* close connection to dma-buf completely */
+		dma_buf_put(exp->dma_buf);
+		exp->dma_buf = NULL;
+	}
+}
+
+static int remove_buf(struct virtio_vdmabuf *vdmabuf,
+		      struct virtio_vdmabuf_buf *exp)
+{
+	int ret;
+
+	ret = add_event_buf_rel(exp);
+	if (ret)
+		return ret;
+
+	virtio_vdmabuf_clear_buf(exp);
+
+	ret = virtio_vdmabuf_del_buf(drv_info, &exp->buf_id);
+	if (ret)
+		return ret;
+
+	if (exp->sz_priv > 0 && !exp->priv)
+		kvfree(exp->priv);
+
+	kvfree(exp);
+	return 0;
+}
+
+static int parse_msg_from_host(struct virtio_vdmabuf *vdmabuf,
+		     	       struct virtio_vdmabuf_msg *msg)
+{
+	struct virtio_vdmabuf_buf *exp;
+	virtio_vdmabuf_buf_id_t buf_id;
+	int ret;
+
+	switch (msg->cmd) {
+	case VIRTIO_VDMABUF_CMD_NEED_VMID:
+		vdmabuf->vmid = msg->op[0];
+
+		break;
+	case VIRTIO_VDMABUF_CMD_DMABUF_REL:
+		memcpy(&buf_id, msg->op, sizeof(buf_id));
+
+		exp = virtio_vdmabuf_find_buf(drv_info, &buf_id);
+		if (!exp) {
+			dev_err(drv_info->dev, "can't find buffer\n");
+			return -EINVAL;
+		}
+
+		ret = remove_buf(vdmabuf, exp);
+		if (ret)
+			return ret;
+
+		break;
+	case VIRTIO_VDMABUF_CMD_EXPORT:
+		break;
+	default:
+		dev_err(drv_info->dev, "empty cmd\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static void virtio_vdmabuf_recv_work(struct work_struct *work)
+{
+	struct virtio_vdmabuf *vdmabuf =
+		container_of(work, struct virtio_vdmabuf, recv_work);
+	struct virtqueue *vq = vdmabuf->vqs[VDMABUF_VQ_RECV];
+	struct virtio_vdmabuf_msg *msg;
+	int sz;
+
+	mutex_lock(&vdmabuf->recv_lock);
+
+	do {
+		virtqueue_disable_cb(vq);
+		for (;;) {
+			msg = virtqueue_get_buf(vq, &sz);
+			if (!msg)
+				break;
+
+			/* valid size */
+			if (sz == sizeof(struct virtio_vdmabuf_msg)) {
+				if (parse_msg_from_host(vdmabuf, msg))
+					dev_err(drv_info->dev,
+						"msg parse error\n");
+
+				kvfree(msg);
+			} else {
+				dev_err(drv_info->dev,
+					"received malformed message\n");
+			}
+		}
+	} while (!virtqueue_enable_cb(vq));
+
+	mutex_unlock(&vdmabuf->recv_lock);
+}
+
+static void virtio_vdmabuf_fill_recv_msg(struct virtio_vdmabuf *vdmabuf)
+{
+	struct virtqueue *vq = vdmabuf->vqs[VDMABUF_VQ_RECV];
+	struct scatterlist sg;
+	struct virtio_vdmabuf_msg *msg;
+	int ret;
+
+	msg = kvzalloc(sizeof(*msg), GFP_KERNEL);
+	if (!msg)
+		return;
+
+	sg_init_one(&sg, msg, sizeof(struct virtio_vdmabuf_msg));
+	ret = virtqueue_add_inbuf(vq, &sg, 1, msg, GFP_KERNEL);
+	if (ret)
+		return;
+
+	virtqueue_kick(vq);
+}
+
+static void virtio_vdmabuf_send_msg_work(struct work_struct *work)
+{
+	struct virtio_vdmabuf *vdmabuf =
+		container_of(work, struct virtio_vdmabuf, send_msg_work);
+	struct virtqueue *vq = vdmabuf->vqs[VDMABUF_VQ_SEND];
+	struct scatterlist sg;
+	struct virtio_vdmabuf_msg *msg;
+	bool added = false;
+	int ret;
+
+	mutex_lock(&vdmabuf->send_lock);
+
+	for (;;) {
+		if (list_empty(&vdmabuf->msg_list))
+			break;
+
+		virtio_vdmabuf_fill_recv_msg(vdmabuf);
+
+		msg = list_first_entry(&vdmabuf->msg_list,
+				       struct virtio_vdmabuf_msg, list);
+		list_del_init(&msg->list);
+
+		sg_init_one(&sg, msg, sizeof(struct virtio_vdmabuf_msg));
+		ret = virtqueue_add_outbuf(vq, &sg, 1, msg, GFP_KERNEL);
+		if (ret < 0) {
+			dev_err(drv_info->dev,
+				"failed to add msg to vq\n");
+			break;
+		}
+
+		added = true;	
+	}
+
+	if (added)
+		virtqueue_kick(vq);
+
+	mutex_unlock(&vdmabuf->send_lock);
+}
+
+static void virtio_vdmabuf_send_work(struct work_struct *work)
+{
+	struct virtio_vdmabuf *vdmabuf =
+		container_of(work, struct virtio_vdmabuf, send_work);
+	struct virtqueue *vq = vdmabuf->vqs[VDMABUF_VQ_SEND];
+	struct virtio_vdmabuf_msg *msg;
+	unsigned int sz;
+	bool added = false;
+
+	mutex_lock(&vdmabuf->send_lock);
+
+	do {
+		virtqueue_disable_cb(vq);
+
+		for (;;) {
+			msg = virtqueue_get_buf(vq, &sz);
+			if (!msg)
+				break;
+
+			if (parse_msg_from_host(vdmabuf, msg))
+				dev_err(drv_info->dev,
+					"msg parse error\n");
+
+			kvfree(msg);
+			added = true;
+		}
+	} while (!virtqueue_enable_cb(vq));
+
+	mutex_unlock(&vdmabuf->send_lock);
+
+	if (added)
+		queue_work(vdmabuf->wq, &vdmabuf->send_msg_work);
+}
+
+static void virtio_vdmabuf_recv_cb(struct virtqueue *vq)
+{
+	struct virtio_vdmabuf *vdmabuf = vq->vdev->priv;
+
+	if (!vdmabuf)
+		return;
+
+	queue_work(vdmabuf->wq, &vdmabuf->recv_work);
+}
+
+static void virtio_vdmabuf_send_cb(struct virtqueue *vq)
+{
+	struct virtio_vdmabuf *vdmabuf = vq->vdev->priv;
+
+	if (!vdmabuf)
+		return;
+
+	queue_work(vdmabuf->wq, &vdmabuf->send_work);
+}
+
+static int remove_all_bufs(struct virtio_vdmabuf *vdmabuf)
+{
+	struct virtio_vdmabuf_buf *found;
+	struct hlist_node *tmp;
+	int bkt;
+	int ret;
+
+	hash_for_each_safe(drv_info->buf_list, bkt, tmp, found, node) {
+		ret = remove_buf(vdmabuf, found);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int virtio_vdmabuf_open(struct inode *inode, struct file *filp)
+{
+	int ret;
+
+	if (!drv_info) {
+		pr_err("virtio vdmabuf driver is not ready\n");
+		return -EINVAL;
+	}
+
+	ret = send_msg_to_host(VIRTIO_VDMABUF_CMD_NEED_VMID, 0);
+	if (ret < 0)
+		dev_err(drv_info->dev, "fail to receive vmid\n");
+
+	filp->private_data = drv_info->priv;
+
+	return 0;
+}
+
+static int virtio_vdmabuf_release(struct inode *inode, struct file *filp)
+{
+	return 0;
+}
+
+/* Notify Host about the new vdmabuf */
+static int export_notify(struct virtio_vdmabuf_buf *exp, struct page **pages)
+{
+	int *op;
+	int ret;
+
+	op = kvcalloc(1, sizeof(int) * 65, GFP_KERNEL);
+	if (!op)
+		return -ENOMEM;
+
+	memcpy(op, &exp->buf_id, sizeof(exp->buf_id));
+
+	/* if new pages are to be shared */
+	if (pages) {
+		op[4] = exp->pages_info->nents;
+		op[5] = exp->pages_info->first_ofst;
+		op[6] = exp->pages_info->last_len;
+
+		memcpy(&op[7], &exp->pages_info->ref, sizeof(gpa_t));
+	}
+
+	op[9] = exp->sz_priv;
+
+	/* driver/application specific private info */
+	memcpy(&op[10], exp->priv, op[9]);
+
+	ret = send_msg_to_host(VIRTIO_VDMABUF_CMD_EXPORT, op);
+
+	kvfree(op);
+	return ret;
+}
+
+/* return total number of pages referenced by a sgt
+ * for pre-calculation of # of pages behind a given sgt
+ */
+static int num_pgs(struct sg_table *sgt)
+{
+	struct scatterlist *sgl;
+	int len, i;
+	/* at least one page */
+	int n_pgs = 1;
+
+	sgl = sgt->sgl;
+
+	len = sgl->length - PAGE_SIZE + sgl->offset;
+
+	/* round-up */
+	n_pgs += ((len + PAGE_SIZE - 1)/PAGE_SIZE);
+
+	for (i = 1; i < sgt->nents; i++) {
+		sgl = sg_next(sgl);
+
+		/* round-up */
+		n_pgs += ((sgl->length + PAGE_SIZE - 1) /
+			  PAGE_SIZE); /* round-up */
+	}
+
+	return n_pgs;
+}
+
+/* extract pages referenced by sgt */
+static struct page **extr_pgs(struct sg_table *sgt, int *nents, int *last_len)
+{
+	struct scatterlist *sgl;
+	struct page **pages;
+	struct page **temp_pgs;
+	int i, j;
+	int len;
+
+	*nents = num_pgs(sgt);
+	pages =	kvmalloc_array(*nents, sizeof(struct page *), GFP_KERNEL);
+	if (!pages)
+		return NULL;
+
+	sgl = sgt->sgl;
+
+	temp_pgs = pages;
+	*temp_pgs++ = sg_page(sgl);
+	len = sgl->length - PAGE_SIZE + sgl->offset;
+
+	i = 1;
+	while (len > 0) {
+		*temp_pgs++ = nth_page(sg_page(sgl), i++);
+		len -= PAGE_SIZE;
+	}
+
+	for (i = 1; i < sgt->nents; i++) {
+		sgl = sg_next(sgl);
+		*temp_pgs++ = sg_page(sgl);
+		len = sgl->length - PAGE_SIZE;
+		j = 1;
+
+		while (len > 0) {
+			*temp_pgs++ = nth_page(sg_page(sgl), j++);
+			len -= PAGE_SIZE;
+		}
+	}
+
+	*last_len = len + PAGE_SIZE;
+
+	return pages;
+}
+
+/* ioctl - exporting new vdmabuf
+ *
+ *	 int dmabuf_fd - File handle of original DMABUF
+ *	 virtio_vdmabuf_buf_id_t buf_id - returned vdmabuf ID
+ *	 int sz_priv - size of private data from userspace
+ *	 char *priv - buffer of user private data
+ *
+ */
+static int export_ioctl(struct file *filp, void *data)
+{
+	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
+	struct virtio_vdmabuf_export *attr = data;
+	struct dma_buf *dmabuf;
+	struct dma_buf_attachment *attach;
+	struct sg_table *sgt;
+	struct virtio_vdmabuf_buf *exp;
+	struct page **pages;
+	int nents, last_len;
+	virtio_vdmabuf_buf_id_t buf_id;
+	int ret = 0;
+
+	if (vdmabuf->vmid <= 0)
+		return -EINVAL;
+
+	dmabuf = dma_buf_get(attr->fd);
+	if (IS_ERR(dmabuf))
+		return PTR_ERR(dmabuf);
+
+	mutex_lock(&drv_info->g_mutex);
+
+	buf_id = get_buf_id(vdmabuf);
+
+	attach = dma_buf_attach(dmabuf, drv_info->dev);
+	if (IS_ERR(attach)) {
+		ret = PTR_ERR(attach);
+		goto fail_attach;
+	}
+
+	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
+	if (IS_ERR(sgt)) {
+		ret = PTR_ERR(sgt);
+		goto fail_map_attachment;
+	}
+
+	/* allocate a new exp */
+	exp = kvcalloc(1, sizeof(*exp), GFP_KERNEL);
+	if (!exp) {
+		ret = -ENOMEM;
+		goto fail_sgt_info_creation;
+	}
+
+	/* possible truncation */
+	if (attr->sz_priv > MAX_SIZE_PRIV_DATA)
+		exp->sz_priv = MAX_SIZE_PRIV_DATA;
+	else
+		exp->sz_priv = attr->sz_priv;
+
+	/* creating buffer for private data */
+	if (exp->sz_priv != 0) {
+		exp->priv = kvcalloc(1, exp->sz_priv, GFP_KERNEL);
+		if (!exp->priv) {
+			ret = -ENOMEM;
+			goto fail_priv_creation;
+		}
+	}
+
+	exp->buf_id = buf_id;
+	exp->attach = attach;
+	exp->sgt = sgt;
+	exp->dma_buf = dmabuf;
+	exp->valid = 1;
+
+	if (exp->sz_priv) {
+		/* copy private data to sgt_info */
+		ret = copy_from_user(exp->priv, attr->priv, exp->sz_priv);
+		if (ret) {
+			ret = -EINVAL;
+			goto fail_exp;
+		}
+	}
+
+	pages = extr_pgs(sgt, &nents, &last_len);
+	if (pages == NULL) {
+		ret = -ENOMEM;
+		goto fail_exp;
+	}
+
+	exp->pages_info = virtio_vdmabuf_share_buf(pages, nents,
+						   sgt->sgl->offset,
+					 	   last_len);
+	if (!exp->pages_info) {
+		ret = -ENOMEM;
+		goto fail_create_pages_info;
+	}
+
+	attr->buf_id = exp->buf_id;
+	ret = export_notify(exp, pages);
+	if (ret < 0)
+		goto fail_send_request;
+
+	/* now register it to the export list */
+	virtio_vdmabuf_add_buf(drv_info, exp);
+
+	exp->filp = filp;
+
+	mutex_unlock(&drv_info->g_mutex);
+
+	return ret;
+
+/* Clean-up if error occurs */
+fail_send_request:
+	virtio_vdmabuf_free_buf(exp->pages_info);
+
+fail_create_pages_info:
+	kvfree(pages);
+
+fail_exp:
+	kvfree(exp->priv);
+
+fail_priv_creation:
+	kvfree(exp);
+
+fail_sgt_info_creation:
+	dma_buf_unmap_attachment(attach, sgt,
+				 DMA_BIDIRECTIONAL);
+
+fail_map_attachment:
+	dma_buf_detach(dmabuf, attach);
+
+fail_attach:
+	dma_buf_put(dmabuf);
+
+	mutex_unlock(&drv_info->g_mutex);
+
+	return ret;
+}
+
+static const struct virtio_vdmabuf_ioctl_desc virtio_vdmabuf_ioctls[] = {
+	VIRTIO_VDMABUF_IOCTL_DEF(VIRTIO_VDMABUF_IOCTL_EXPORT, export_ioctl, 0),
+};
+
+static long virtio_vdmabuf_ioctl(struct file *filp, unsigned int cmd,
+		       		 unsigned long param)
+{
+	const struct virtio_vdmabuf_ioctl_desc *ioctl = NULL;
+	unsigned int nr = _IOC_NR(cmd);
+	int ret;
+	virtio_vdmabuf_ioctl_t func;
+	char *kdata;
+
+	if (nr >= ARRAY_SIZE(virtio_vdmabuf_ioctls)) {
+		dev_err(drv_info->dev, "invalid ioctl\n");
+		return -EINVAL;
+	}
+
+	ioctl = &virtio_vdmabuf_ioctls[nr];
+
+	func = ioctl->func;
+
+	if (unlikely(!func)) {
+		dev_err(drv_info->dev, "no function\n");
+		return -EINVAL;
+	}
+
+	kdata = kvmalloc(_IOC_SIZE(cmd), GFP_KERNEL);
+	if (!kdata)
+		return -ENOMEM;
+
+	if (copy_from_user(kdata, (void __user *)param,
+			   _IOC_SIZE(cmd)) != 0) {
+		dev_err(drv_info->dev,
+			"failed to copy from user arguments\n");
+		ret = -EFAULT;
+		goto ioctl_error;
+	}
+
+	ret = func(filp, kdata);
+
+	if (copy_to_user((void __user *)param, kdata,
+			 _IOC_SIZE(cmd)) != 0) {
+		dev_err(drv_info->dev,
+			"failed to copy to user arguments\n");
+		ret = -EFAULT;
+		goto ioctl_error;
+	}
+
+ioctl_error:
+	kvfree(kdata);
+	return ret;
+}
+
+static unsigned int virtio_vdmabuf_event_poll(struct file *filp,
+			    	    	      struct poll_table_struct *wait)
+{
+	struct virtio_vdmabuf *vdmabuf = filp->private_data;
+
+	poll_wait(filp, &vdmabuf->evq->e_wait, wait);
+
+	if (!list_empty(&vdmabuf->evq->e_list))
+		return POLLIN | POLLRDNORM;
+
+	return 0;
+}
+
+static ssize_t virtio_vdmabuf_event_read(struct file *filp, char __user *buf,
+			       		 size_t cnt, loff_t *ofst)
+{
+	struct virtio_vdmabuf *vdmabuf = filp->private_data;
+	int ret;
+
+	/* make sure user buffer can be written */
+	if (!access_ok(buf, sizeof (*buf))) {
+		dev_err(drv_info->dev, "user buffer can't be written.\n");
+		return -EINVAL;
+	}
+
+	ret = mutex_lock_interruptible(&vdmabuf->evq->e_readlock);
+	if (ret)
+		return ret;
+
+	for (;;) {
+		struct virtio_vdmabuf_event *e = NULL;
+
+		spin_lock_irq(&vdmabuf->evq->e_lock);
+		if (!list_empty(&vdmabuf->evq->e_list)) {
+			e = list_first_entry(&vdmabuf->evq->e_list,
+					     struct virtio_vdmabuf_event, link);
+			list_del(&e->link);
+		}
+		spin_unlock_irq(&vdmabuf->evq->e_lock);
+
+		if (!e) {
+			if (ret)
+				break;
+
+			if (filp->f_flags & O_NONBLOCK) {
+				ret = -EAGAIN;
+				break;
+			}
+
+			mutex_unlock(&vdmabuf->evq->e_readlock);
+			ret = wait_event_interruptible(vdmabuf->evq->e_wait,
+					!list_empty(&vdmabuf->evq->e_list));
+
+			if (ret == 0)
+				ret = mutex_lock_interruptible(
+						&vdmabuf->evq->e_readlock);
+
+			if (ret)
+				return ret;
+		} else {
+			unsigned int len = (sizeof(e->e_data.hdr) +
+					    e->e_data.hdr.size);
+
+			if (len > cnt - ret) {
+put_back_event:
+				spin_lock_irq(&vdmabuf->evq->e_lock);
+				list_add(&e->link, &vdmabuf->evq->e_list);
+				spin_unlock_irq(&vdmabuf->evq->e_lock);
+				break;
+			}
+
+			if (copy_to_user(buf + ret, &e->e_data.hdr,
+					 sizeof(e->e_data.hdr))) {
+				if (ret == 0)
+					ret = -EFAULT;
+
+				goto put_back_event;
+			}
+
+			ret += sizeof(e->e_data.hdr);
+
+			if (copy_to_user(buf + ret, e->e_data.data,
+					 e->e_data.hdr.size)) {
+				/* error while copying void *data */
+
+				struct virtio_vdmabuf_e_hdr dummy_hdr = {0};
+
+				ret -= sizeof(e->e_data.hdr);
+
+				/* nullifying hdr of the event in user buffer */
+				if (copy_to_user(buf + ret, &dummy_hdr,
+						 sizeof(dummy_hdr)))
+					dev_err(drv_info->dev,
+					   "fail to nullify invalid hdr\n");
+
+				ret = -EFAULT;
+
+				goto put_back_event;
+			}
+
+			ret += e->e_data.hdr.size;
+			vdmabuf->evq->pending--;
+			kvfree(e);
+		}
+	}
+
+	mutex_unlock(&vdmabuf->evq->e_readlock);
+
+	return ret;
+}
+
+static const struct file_operations virtio_vdmabuf_fops = {
+	.owner = THIS_MODULE,
+	.open = virtio_vdmabuf_open,
+	.release = virtio_vdmabuf_release,
+	.read = virtio_vdmabuf_event_read,
+	.poll = virtio_vdmabuf_event_poll,
+	.unlocked_ioctl = virtio_vdmabuf_ioctl,
+};
+
+static struct miscdevice virtio_vdmabuf_miscdev = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "virtio-vdmabuf",
+	.fops = &virtio_vdmabuf_fops,
+};
+
+static int virtio_vdmabuf_probe(struct virtio_device *vdev)
+{
+	vq_callback_t *cbs[] = {
+		virtio_vdmabuf_recv_cb,
+		virtio_vdmabuf_send_cb,
+	};
+	static const char *const names[] = {
+		"recv",
+		"send",
+	};
+	struct virtio_vdmabuf *vdmabuf;
+	int ret = 0;
+
+	if (!drv_info)
+		return -EINVAL;
+
+	vdmabuf = drv_info->priv;
+
+	if (!vdmabuf)
+		return -EINVAL;
+
+	vdmabuf->vdev = vdev;
+	vdev->priv = vdmabuf;
+
+	/* initialize spinlock for synchronizing virtqueue accesses */
+	spin_lock_init(&vdmabuf->vq_lock);
+
+	ret = virtio_find_vqs(vdmabuf->vdev, VDMABUF_VQ_MAX, vdmabuf->vqs,
+			      cbs, names, NULL);
+	if (ret) {
+		dev_err(drv_info->dev, "Cannot find any vqs\n");
+		return ret;
+	}
+
+	INIT_LIST_HEAD(&vdmabuf->msg_list);
+	INIT_WORK(&vdmabuf->recv_work, virtio_vdmabuf_recv_work);
+	INIT_WORK(&vdmabuf->send_work, virtio_vdmabuf_send_work);
+	INIT_WORK(&vdmabuf->send_msg_work, virtio_vdmabuf_send_msg_work);
+
+	return ret;
+}
+
+static void virtio_vdmabuf_remove(struct virtio_device *vdev)
+{
+	struct virtio_vdmabuf *vdmabuf;
+
+	if (!drv_info)
+		return;
+
+	vdmabuf = drv_info->priv;
+	flush_work(&vdmabuf->recv_work);
+	flush_work(&vdmabuf->send_work);
+	flush_work(&vdmabuf->send_msg_work);
+
+	vdev->config->reset(vdev);
+	vdev->config->del_vqs(vdev);
+}
+
+static struct virtio_device_id id_table[] = {
+	{ VIRTIO_ID_VDMABUF, VIRTIO_DEV_ANY_ID },
+	{ 0 },
+};
+
+static struct virtio_driver virtio_vdmabuf_vdev_drv = {
+	.driver.name =  KBUILD_MODNAME,
+	.driver.owner = THIS_MODULE,
+	.id_table =     id_table,
+	.probe =        virtio_vdmabuf_probe,
+	.remove =       virtio_vdmabuf_remove,
+};
+
+static int __init virtio_vdmabuf_init(void)
+{
+	struct virtio_vdmabuf *vdmabuf;
+	int ret = 0;
+
+	drv_info = NULL;
+
+	ret = misc_register(&virtio_vdmabuf_miscdev);
+	if (ret) {
+		pr_err("virtio-vdmabuf misc driver can't be registered\n");
+		return ret;
+	}
+
+	ret = dma_set_mask_and_coherent(virtio_vdmabuf_miscdev.this_device,
+					DMA_BIT_MASK(64));
+	if (ret < 0) {
+		misc_deregister(&virtio_vdmabuf_miscdev);
+		return -EINVAL;
+	}
+
+	drv_info = kvcalloc(1, sizeof(*drv_info), GFP_KERNEL);
+	if (!drv_info) {
+		misc_deregister(&virtio_vdmabuf_miscdev);
+		return -ENOMEM;
+	}
+
+	vdmabuf = kvcalloc(1, sizeof(*vdmabuf), GFP_KERNEL);
+	if (!vdmabuf) {
+		kvfree(drv_info);
+		misc_deregister(&virtio_vdmabuf_miscdev);
+		return -ENOMEM;
+	}
+
+	vdmabuf->evq = kvcalloc(1, sizeof(*(vdmabuf->evq)), GFP_KERNEL);
+	if (!vdmabuf->evq) {
+		kvfree(drv_info);
+		kvfree(vdmabuf);
+		misc_deregister(&virtio_vdmabuf_miscdev);
+		return -ENOMEM;
+	}
+
+	drv_info->priv = (void *)vdmabuf;
+	drv_info->dev = virtio_vdmabuf_miscdev.this_device;
+
+	mutex_init(&drv_info->g_mutex);
+
+	mutex_init(&vdmabuf->evq->e_readlock);
+	spin_lock_init(&vdmabuf->evq->e_lock);
+
+	INIT_LIST_HEAD(&vdmabuf->evq->e_list);
+	init_waitqueue_head(&vdmabuf->evq->e_wait);
+	hash_init(drv_info->buf_list);
+
+	vdmabuf->evq->pending = 0;
+	vdmabuf->wq = create_workqueue("virtio_vdmabuf_wq");
+
+	ret = register_virtio_driver(&virtio_vdmabuf_vdev_drv);
+	if (ret) {
+		dev_err(drv_info->dev, "vdmabuf driver can't be registered\n");
+		misc_deregister(&virtio_vdmabuf_miscdev);
+		kvfree(vdmabuf);
+		kvfree(drv_info);
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+static void __exit virtio_vdmabuf_deinit(void)
+{
+	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
+	struct virtio_vdmabuf_event *e, *et;
+	unsigned long irqflags;
+
+	misc_deregister(&virtio_vdmabuf_miscdev);
+	unregister_virtio_driver(&virtio_vdmabuf_vdev_drv);
+
+	if (vdmabuf->wq)
+		destroy_workqueue(vdmabuf->wq);
+
+	spin_lock_irqsave(&vdmabuf->evq->e_lock, irqflags);
+
+	list_for_each_entry_safe(e, et, &vdmabuf->evq->e_list,
+				 link) {
+		list_del(&e->link);
+		kvfree(e);
+		vdmabuf->evq->pending--;
+	}
+
+	spin_unlock_irqrestore(&vdmabuf->evq->e_lock, irqflags);
+
+	/* freeing all exported buffers */
+	remove_all_bufs(vdmabuf);
+
+	kvfree(vdmabuf->evq);
+	kvfree(vdmabuf);
+	kvfree(drv_info);
+}
+
+module_init(virtio_vdmabuf_init);
+module_exit(virtio_vdmabuf_deinit);
+
+MODULE_DEVICE_TABLE(virtio, virtio_vdmabuf_id_table);
+MODULE_DESCRIPTION("Virtio Vdmabuf frontend driver");
+MODULE_LICENSE("GPL and additional rights");
diff --git a/include/linux/virtio_vdmabuf.h b/include/linux/virtio_vdmabuf.h
new file mode 100644
index 000000000000..9500bf4a54ac
--- /dev/null
+++ b/include/linux/virtio_vdmabuf.h
@@ -0,0 +1,271 @@
+/* SPDX-License-Identifier: (MIT OR GPL-2.0) */
+
+/*
+ * Copyright © 2021 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#ifndef _LINUX_VIRTIO_VDMABUF_H 
+#define _LINUX_VIRTIO_VDMABUF_H 
+
+#include <uapi/linux/virtio_vdmabuf.h>
+#include <linux/hashtable.h>
+#include <linux/kvm_types.h>
+
+struct virtio_vdmabuf_shared_pages {
+	/* cross-VM ref addr for the buffer */
+	gpa_t ref;
+
+	/* page array */
+	struct page **pages;
+	gpa_t **l2refs;
+	gpa_t *l3refs;
+
+	/* data offset in the first page
+	 * and data length in the last page
+	 */
+	int first_ofst;
+	int last_len;
+
+	/* number of shared pages */
+	int nents;
+};
+
+struct virtio_vdmabuf_buf {
+	virtio_vdmabuf_buf_id_t buf_id;
+
+	struct dma_buf_attachment *attach;
+	struct dma_buf *dma_buf;
+	struct sg_table *sgt;
+	struct virtio_vdmabuf_shared_pages *pages_info;
+	int vmid;
+
+	/* validity of the buffer */
+	bool valid;
+
+	/* set if the buffer is imported via import_ioctl */
+	bool imported;
+
+	/* size of private */
+	size_t sz_priv;
+	/* private data associated with the exported buffer */
+	void *priv;
+
+	struct file *filp;
+	struct hlist_node node;
+};
+
+struct virtio_vdmabuf_event {
+	struct virtio_vdmabuf_e_data e_data;
+	struct list_head link;
+};
+
+struct virtio_vdmabuf_event_queue {
+	wait_queue_head_t e_wait;
+	struct list_head e_list;
+
+	spinlock_t e_lock;
+	struct mutex e_readlock;
+
+	/* # of pending events */
+	int pending;
+};
+
+/* driver information */
+struct virtio_vdmabuf_info {
+	struct device *dev;
+
+	struct list_head head_vdmabuf_list;
+	struct list_head kvm_instances;
+
+	DECLARE_HASHTABLE(buf_list, 7);
+
+	void *priv;
+	struct mutex g_mutex;
+	struct notifier_block kvm_notifier;
+};
+
+/* IOCTL definitions
+ */
+typedef int (*virtio_vdmabuf_ioctl_t)(struct file *filp, void *data);
+
+struct virtio_vdmabuf_ioctl_desc {
+	unsigned int cmd;
+	int flags;
+	virtio_vdmabuf_ioctl_t func;
+	const char *name;
+};
+
+#define VIRTIO_VDMABUF_IOCTL_DEF(ioctl, _func, _flags)	\
+	[_IOC_NR(ioctl)] = {			\
+			.cmd = ioctl,		\
+			.func = _func,		\
+			.flags = _flags,	\
+			.name = #ioctl		\
+}
+
+#define VIRTIO_VDMABUF_VMID(buf_id) ((((buf_id).id) >> 32) & 0xFFFFFFFF)
+
+/* Messages between Host and Guest */
+
+/* List of commands from Guest to Host:
+ *
+ * ------------------------------------------------------------------
+ * A. NEED_VMID
+ *
+ *  guest asks the host to provide its vmid
+ *
+ * req:
+ *
+ * cmd: VIRTIO_VDMABUF_NEED_VMID
+ *
+ * ack:
+ *
+ * cmd: same as req
+ * op[0] : vmid of guest
+ *
+ * ------------------------------------------------------------------
+ * B. EXPORT
+ *
+ *  export dmabuf to host
+ *
+ * req:
+ *
+ * cmd: VIRTIO_VDMABUF_CMD_EXPORT
+ * op0~op3 : VDMABUF ID
+ * op4 : number of pages to be shared
+ * op5 : offset of data in the first page
+ * op6 : length of data in the last page
+ * op7 : upper 32 bit of top-level ref of shared buf
+ * op8 : lower 32 bit of top-level ref of shared buf
+ * op9 : size of private data
+ * op10 ~ op63: user private data associated with the buffer
+ *	        (e.g. graphic buffer's meta info)
+ *
+ * ------------------------------------------------------------------
+ *
+ * List of commands from Host to Guest
+ *
+ * ------------------------------------------------------------------
+ * A. RELEASE
+ *
+ *  notifying guest that the shared buffer is released by an importer
+ *
+ * req:
+ *
+ * cmd: VIRTIO_VDMABUF_CMD_DMABUF_REL
+ * op0~op3 : VDMABUF ID
+ *
+ * ------------------------------------------------------------------
+ */
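+
+/* Example (illustrative only): an EXPORT request for a two-page buffer
+ * carrying 8 bytes of private data would be laid out as:
+ *
+ *   op0~op3   : 128-bit vdmabuf ID (64-bit id plus two 32-bit random keys)
+ *   op4 = 2   : number of shared pages
+ *   op5 = 0   : offset of data in the first page
+ *   op6 = 4096: length of data in the last page
+ *   op7/op8   : upper/lower 32 bits of the top-level ref of the shared buf
+ *   op9 = 8   : size of private data
+ *   op10~op11 : the 8 bytes of private data
+ */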
+
+/* msg structures */
+struct virtio_vdmabuf_msg {
+	struct list_head list;
+	unsigned int cmd;
+	unsigned int op[64];
+};
+
+enum {
+	VDMABUF_VQ_RECV = 0,
+	VDMABUF_VQ_SEND = 1,
+	VDMABUF_VQ_MAX  = 2,
+};
+
+enum virtio_vdmabuf_cmd {
+	VIRTIO_VDMABUF_CMD_NEED_VMID,
+	VIRTIO_VDMABUF_CMD_EXPORT = 0x10,
+	VIRTIO_VDMABUF_CMD_DMABUF_REL
+};
+
+enum virtio_vdmabuf_ops {
+	VIRTIO_VDMABUF_HDMABUF_ID_ID = 0,
+	VIRTIO_VDMABUF_HDMABUF_ID_RNG_KEY0,
+	VIRTIO_VDMABUF_HDMABUF_ID_RNG_KEY1,
+	VIRTIO_VDMABUF_NUM_PAGES_SHARED = 4,
+	VIRTIO_VDMABUF_FIRST_PAGE_DATA_OFFSET,
+	VIRTIO_VDMABUF_LAST_PAGE_DATA_LENGTH,
+	VIRTIO_VDMABUF_REF_ADDR_UPPER_32BIT,
+	VIRTIO_VDMABUF_REF_ADDR_LOWER_32BIT,
+	VIRTIO_VDMABUF_PRIVATE_DATA_SIZE,
+	VIRTIO_VDMABUF_PRIVATE_DATA_START
+};
+
+/* adding exported/imported vdmabuf info to hash */
+static inline int
+virtio_vdmabuf_add_buf(struct virtio_vdmabuf_info *info,
+                       struct virtio_vdmabuf_buf *new)
+{
+	hash_add(info->buf_list, &new->node, new->buf_id.id);
+	return 0;
+}
+
+/* comparing two vdmabuf IDs */
+static inline bool
+is_same_buf(virtio_vdmabuf_buf_id_t a,
+            virtio_vdmabuf_buf_id_t b)
+{
+	int i;
+
+	if (a.id != b.id)
+		return false;
+
+	/* compare keys */
+	for (i = 0; i < 2; i++) {
+		if (a.rng_key[i] != b.rng_key[i])
+			return false;
+	}
+
+	return true;
+}
+
+/* find buf for given vdmabuf ID */
+static inline struct virtio_vdmabuf_buf
+*virtio_vdmabuf_find_buf(struct virtio_vdmabuf_info *info,
+			 virtio_vdmabuf_buf_id_t *buf_id)
+{
+	struct virtio_vdmabuf_buf *found;
+
+	hash_for_each_possible(info->buf_list, found, node, buf_id->id)
+		if (is_same_buf(found->buf_id, *buf_id))
+			return found;
+
+	return NULL;
+}
+
+/* delete buf from hash */
+static inline int
+virtio_vdmabuf_del_buf(struct virtio_vdmabuf_info *info,
+                       virtio_vdmabuf_buf_id_t *buf_id)
+{
+	struct virtio_vdmabuf_buf *found;
+
+	found = virtio_vdmabuf_find_buf(info, buf_id);
+	if (!found)
+		return -ENOENT;
+
+	hash_del(&found->node);
+
+	return 0;
+}
+
+#endif
diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
index bc1c0621f5ed..39c94637ddee 100644
--- a/include/uapi/linux/virtio_ids.h
+++ b/include/uapi/linux/virtio_ids.h
@@ -54,5 +54,6 @@
 #define VIRTIO_ID_FS			26 /* virtio filesystem */
 #define VIRTIO_ID_PMEM			27 /* virtio pmem */
 #define VIRTIO_ID_MAC80211_HWSIM	29 /* virtio mac80211-hwsim */
+#define VIRTIO_ID_VDMABUF		40 /* virtio vdmabuf */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/include/uapi/linux/virtio_vdmabuf.h b/include/uapi/linux/virtio_vdmabuf.h
new file mode 100644
index 000000000000..7bddaa04ddd6
--- /dev/null
+++ b/include/uapi/linux/virtio_vdmabuf.h
@@ -0,0 +1,99 @@
+/* SPDX-License-Identifier: (MIT OR GPL-2.0) */
+
+/*
+ * Copyright © 2021 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#ifndef _UAPI_LINUX_VIRTIO_VDMABUF_H
+#define _UAPI_LINUX_VIRTIO_VDMABUF_H
+
+#define MAX_SIZE_PRIV_DATA 192
+
+typedef struct {
+	__u64 id;
+	/* 8-byte random number */
+	int rng_key[2];
+} virtio_vdmabuf_buf_id_t;
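+
+/* Note: as generated by the guest driver, the upper 32 bits of 'id' hold
+ * the exporting VM's id and the lower 32 bits a per-export counter, while
+ * rng_key[] carries 8 random bytes so that the 128-bit ID is hard to guess.
+ */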
+
+struct virtio_vdmabuf_e_hdr {
+	/* buf_id of new buf */
+	virtio_vdmabuf_buf_id_t buf_id;
+	/* size of private data */
+	int size;
+};
+
+struct virtio_vdmabuf_e_data {
+	struct virtio_vdmabuf_e_hdr hdr;
+	/* ptr to private data */
+	void __user *data;
+};
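+
+/* Each event read from the vdmabuf device is delivered as a
+ * virtio_vdmabuf_e_hdr immediately followed by hdr.size bytes of the
+ * exporter's private data; the event is left queued if the supplied
+ * read buffer is too small to hold both.
+ */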
+
+#define VIRTIO_VDMABUF_IOCTL_IMPORT \
+_IOC(_IOC_NONE, 'G', 2, sizeof(struct virtio_vdmabuf_import))
+#define VIRTIO_VDMABUF_IOCTL_RELEASE \
+_IOC(_IOC_NONE, 'G', 3, sizeof(struct virtio_vdmabuf_import))
+struct virtio_vdmabuf_import {
+	/* IN parameters */
+	/* vdmabuf buf id to be imported */
+	virtio_vdmabuf_buf_id_t buf_id;
+	/* flags */
+	int flags;
+	/* OUT parameters */
+	/* exported dma buf fd */
+	int fd;
+};
+
+#define VIRTIO_VDMABUF_IOCTL_EXPORT \
+_IOC(_IOC_NONE, 'G', 4, sizeof(struct virtio_vdmabuf_export))
+struct virtio_vdmabuf_export {
+	/* IN parameters */
+	/* DMA buf fd to be exported */
+	int fd;
+	/* exported dma buf id */
+	virtio_vdmabuf_buf_id_t buf_id;
+	int sz_priv;
+	char *priv;
+};
+
+#define VIRTIO_VDMABUF_IOCTL_QUERY \
+_IOC(_IOC_NONE, 'G', 5, sizeof(struct virtio_vdmabuf_query))
+struct virtio_vdmabuf_query {
+	/* IN parameters */
+	/* id of buf to be queried */
+	virtio_vdmabuf_buf_id_t buf_id;
+	/* item to be queried */
+	int item;
+	/* OUT parameters */
+	/* Value of queried item */
+	unsigned long info;
+};
+
+/* DMABUF query */
+enum virtio_vdmabuf_query_cmd {
+	VIRTIO_VDMABUF_QUERY_SIZE = 0x10,
+	VIRTIO_VDMABUF_QUERY_BUSY,
+	VIRTIO_VDMABUF_QUERY_PRIV_INFO_SIZE,
+	VIRTIO_VDMABUF_QUERY_PRIV_INFO,
+};
+
+#endif
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [RFC v3 2/3] virtio: Introduce Vdmabuf driver
@ 2021-02-03  7:35   ` Vivek Kasireddy
  0 siblings, 0 replies; 57+ messages in thread
From: Vivek Kasireddy @ 2021-02-03  7:35 UTC (permalink / raw)
  To: virtualization, dri-devel
  Cc: dongwon.kim, daniel.vetter, Vivek Kasireddy, kraxel,
	daniel.vetter, christian.koenig, linux-media

This driver "transfers" a dmabuf created on the Guest to the Host.
A common use-case for such a transfer is sharing the scanout
buffer created by a display server or a compositor running in the
Guest with the Qemu UI running on the Host.

The "transfer" is accomplished by sharing the PFNs of all the pages
associated with the dmabuf and having a new dmabuf created on the
Host that is backed by the pages mapped from the Guest.
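
For reference, a rough sketch of how a guest-side producer could drive
the uapi introduced below (illustrative only and untested; error
handling, dma-buf creation and compositor integration are elided; the
/dev/virtio-vdmabuf node is the misc device registered by this driver):

  #include <fcntl.h>
  #include <poll.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/virtio_vdmabuf.h>

  static int share_with_host(int dmabuf_fd, void *meta, int meta_len)
  {
          struct virtio_vdmabuf_export exp = { 0 };
          char evt[sizeof(struct virtio_vdmabuf_e_hdr) + MAX_SIZE_PRIV_DATA];
          struct pollfd pfd;
          int fd, ret;

          fd = open("/dev/virtio-vdmabuf", O_RDWR);
          if (fd < 0)
                  return fd;

          exp.fd = dmabuf_fd;
          exp.sz_priv = meta_len;
          exp.priv = meta;

          /* register the dma-buf with the host; exp.buf_id is filled on return */
          ret = ioctl(fd, VIRTIO_VDMABUF_IOCTL_EXPORT, &exp);
          if (ret < 0)
                  goto out;

          /* block until the host-side importer releases the buffer ... */
          pfd.fd = fd;
          pfd.events = POLLIN;
          poll(&pfd, 1, -1);

          /* ... then consume the release event (header + private data) */
          ret = read(fd, evt, sizeof(evt));
  out:
          close(fd);
          return ret < 0 ? ret : 0;
  }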

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
 drivers/virtio/Kconfig              |    8 +
 drivers/virtio/Makefile             |    1 +
 drivers/virtio/virtio_vdmabuf.c     | 1090 +++++++++++++++++++++++++++
 include/linux/virtio_vdmabuf.h      |  271 +++++++
 include/uapi/linux/virtio_ids.h     |    1 +
 include/uapi/linux/virtio_vdmabuf.h |   99 +++
 6 files changed, 1470 insertions(+)
 create mode 100644 drivers/virtio/virtio_vdmabuf.c
 create mode 100644 include/linux/virtio_vdmabuf.h
 create mode 100644 include/uapi/linux/virtio_vdmabuf.h

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 7b41130d3f35..e563c12f711e 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -139,4 +139,12 @@ config VIRTIO_DMA_SHARED_BUFFER
 	 This option adds a flavor of dma buffers that are backed by
 	 virtio resources.
 
+config VIRTIO_VDMABUF
+	bool "Enables Vdmabuf driver in guest os"
+	default n
+	depends on VIRTIO
+	help
+	 This driver provides a way to share the dmabufs created in
+	 the Guest with the Host.
+
 endif # VIRTIO_MENU
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 591e6f72aa54..b4bb0738009c 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
 obj-$(CONFIG_VIRTIO_VDPA) += virtio_vdpa.o
 obj-$(CONFIG_VIRTIO_MEM) += virtio_mem.o
 obj-$(CONFIG_VIRTIO_DMA_SHARED_BUFFER) += virtio_dma_buf.o
+obj-$(CONFIG_VIRTIO_VDMABUF) += virtio_vdmabuf.o
diff --git a/drivers/virtio/virtio_vdmabuf.c b/drivers/virtio/virtio_vdmabuf.c
new file mode 100644
index 000000000000..c28f144eb126
--- /dev/null
+++ b/drivers/virtio/virtio_vdmabuf.c
@@ -0,0 +1,1090 @@
+// SPDX-License-Identifier: (MIT OR GPL-2.0)
+
+/*
+ * Copyright © 2021 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Dongwon Kim <dongwon.kim@intel.com>
+ *    Mateusz Polrola <mateusz.polrola@gmail.com>
+ *    Vivek Kasireddy <vivek.kasireddy@intel.com>
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/uaccess.h>
+#include <linux/miscdevice.h>
+#include <linux/delay.h>
+#include <linux/random.h>
+#include <linux/poll.h>
+#include <linux/spinlock.h>
+#include <linux/dma-buf.h>
+#include <linux/virtio.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_vdmabuf.h>
+
+#define VIRTIO_VDMABUF_MAX_ID INT_MAX
+#define REFS_PER_PAGE (PAGE_SIZE/sizeof(long))
+#define NEW_BUF_ID_GEN(vmid, cnt) (((vmid & 0xFFFFFFFF) << 32) | \
+				    ((cnt) & 0xFFFFFFFF))
+
+/* one global drv object */
+static struct virtio_vdmabuf_info *drv_info;
+
+struct virtio_vdmabuf {
+	/* virtio device structure */
+	struct virtio_device *vdev;
+
+	/* virtual queue array */
+	struct virtqueue *vqs[VDMABUF_VQ_MAX];
+
+	/* ID of guest OS */
+	u64 vmid;
+
+	/* spin lock that needs to be acquired before accessing
+	 * virtual queue
+	 */
+	spinlock_t vq_lock;
+	struct mutex recv_lock;
+	struct mutex send_lock;
+
+	struct list_head msg_list;
+
+	/* workqueue */
+	struct workqueue_struct *wq;
+	struct work_struct recv_work;
+	struct work_struct send_work;
+	struct work_struct send_msg_work;
+
+	struct virtio_vdmabuf_event_queue *evq;
+};
+
+static virtio_vdmabuf_buf_id_t get_buf_id(struct virtio_vdmabuf *vdmabuf)
+{
+	virtio_vdmabuf_buf_id_t buf_id = {0, {0, 0} };
+	static int count = 0;
+
+	count = count < VIRTIO_VDMABUF_MAX_ID ? count + 1 : 0;
+	buf_id.id = NEW_BUF_ID_GEN(vdmabuf->vmid, count);
+
+	/* random data embedded in the id for security */
+	get_random_bytes(&buf_id.rng_key[0], 8);
+
+	return buf_id;
+}
+
+/* sharing pages for original DMABUF with Host */
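+/*
+ * The shared references form a small two-level table: each l2refs page
+ * holds the guest physical addresses of up to REFS_PER_PAGE data pages,
+ * the l3refs page holds the addresses of the l2refs pages, and 'ref'
+ * (the address of the l3refs page) is what is handed to the host in the
+ * EXPORT message.
+ */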
+static struct virtio_vdmabuf_shared_pages
+*virtio_vdmabuf_share_buf(struct page **pages, int nents,
+			  int first_ofst, int last_len)
+{
+	struct virtio_vdmabuf_shared_pages *pages_info;
+	int i;
+	int n_l2refs = nents/REFS_PER_PAGE +
+		       ((nents % REFS_PER_PAGE) ? 1 : 0);
+
+	pages_info = kvcalloc(1, sizeof(*pages_info), GFP_KERNEL);
+	if (!pages_info)
+		return NULL;
+
+	pages_info->pages = pages;
+	pages_info->nents = nents;
+	pages_info->first_ofst = first_ofst;
+	pages_info->last_len = last_len;
+	pages_info->l3refs = (gpa_t *)__get_free_page(GFP_KERNEL);
+
+	if (!pages_info->l3refs) {
+		kvfree(pages_info);
+		return NULL;
+	}
+
+	pages_info->l2refs = (gpa_t **)__get_free_pages(GFP_KERNEL,
+					get_order(n_l2refs * PAGE_SIZE));
+
+	if (!pages_info->l2refs) {
+		free_page((gpa_t)pages_info->l3refs);
+		kvfree(pages_info);
+		return NULL;
+	}
+
+	/* Share physical address of pages */
+	for (i = 0; i < nents; i++)
+		pages_info->l2refs[i] = (gpa_t *)page_to_phys(pages[i]);
+
+	for (i = 0; i < n_l2refs; i++)
+		pages_info->l3refs[i] =
+			virt_to_phys((void *)pages_info->l2refs +
+				     i * PAGE_SIZE);
+
+	pages_info->ref = (gpa_t)virt_to_phys(pages_info->l3refs);
+
+	return pages_info;
+}
+
+/* stop sharing pages */
+static void
+virtio_vdmabuf_free_buf(struct virtio_vdmabuf_shared_pages *pages_info)
+{
+	int n_l2refs = (pages_info->nents/REFS_PER_PAGE +
+		       ((pages_info->nents % REFS_PER_PAGE) ? 1 : 0));
+
+	free_pages((gpa_t)pages_info->l2refs, get_order(n_l2refs * PAGE_SIZE));
+	free_page((gpa_t)pages_info->l3refs);
+
+	kvfree(pages_info);
+}
+
+static int send_msg_to_host(enum virtio_vdmabuf_cmd cmd, int *op)
+{
+	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
+	struct virtio_vdmabuf_msg *msg;
+	int i;
+
+	switch (cmd) {
+	case VIRTIO_VDMABUF_CMD_NEED_VMID:
+		msg = kvcalloc(1, sizeof(struct virtio_vdmabuf_msg),
+			       GFP_KERNEL);
+		if (!msg)
+			return -ENOMEM;
+
+		if (op)
+			for (i = 0; i < 4; i++)
+				msg->op[i] = op[i];
+		break;
+
+	case VIRTIO_VDMABUF_CMD_EXPORT:
+		msg = kvcalloc(1, sizeof(struct virtio_vdmabuf_msg),
+			       GFP_KERNEL);
+		if (!msg)
+			return -ENOMEM;
+
+		memcpy(&msg->op[0], &op[0], 10 * sizeof(int) + op[9]);
+		break;
+
+	default:
+		/* no command found */
+		return -EINVAL;
+	}
+
+	msg->cmd = cmd;
+	list_add_tail(&msg->list, &vdmabuf->msg_list);
+	queue_work(vdmabuf->wq, &vdmabuf->send_msg_work);
+
+	return 0;
+}
+
+static int add_event_buf_rel(struct virtio_vdmabuf_buf *buf_info)
+{
+	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
+	struct virtio_vdmabuf_event *e_oldest, *e_new;
+	struct virtio_vdmabuf_event_queue *eq = vdmabuf->evq;
+	unsigned long irqflags;
+
+	e_new = kvzalloc(sizeof(*e_new), GFP_KERNEL);
+	if (!e_new)
+		return -ENOMEM;
+
+	e_new->e_data.hdr.buf_id = buf_info->buf_id;
+	e_new->e_data.data = (void *)buf_info->priv;
+	e_new->e_data.hdr.size = buf_info->sz_priv;
+
+	spin_lock_irqsave(&eq->e_lock, irqflags);
+
+	/* check current number of events and if it hits the max num (32)
+	 * then remove the oldest event in the list
+	 */
+	if (eq->pending > 31) {
+		e_oldest = list_first_entry(&eq->e_list,
+					    struct virtio_vdmabuf_event, link);
+		list_del(&e_oldest->link);
+		eq->pending--;
+		kvfree(e_oldest);
+	}
+
+	list_add_tail(&e_new->link, &eq->e_list);
+
+	eq->pending++;
+
+	wake_up_interruptible(&eq->e_wait);
+	spin_unlock_irqrestore(&eq->e_lock, irqflags);
+
+	return 0;
+}
+
+static void virtio_vdmabuf_clear_buf(struct virtio_vdmabuf_buf *exp)
+{
+	/* Start cleanup of buffer in reverse order to exporting */
+	virtio_vdmabuf_free_buf(exp->pages_info);
+
+	dma_buf_unmap_attachment(exp->attach, exp->sgt,
+				 DMA_BIDIRECTIONAL);
+
+	if (exp->dma_buf) {
+		dma_buf_detach(exp->dma_buf, exp->attach);
+		/* close connection to dma-buf completely */
+		dma_buf_put(exp->dma_buf);
+		exp->dma_buf = NULL;
+	}
+}
+
+static int remove_buf(struct virtio_vdmabuf *vdmabuf,
+		      struct virtio_vdmabuf_buf *exp)
+{
+	int ret;
+
+	ret = add_event_buf_rel(exp);
+	if (ret)
+		return ret;
+
+	virtio_vdmabuf_clear_buf(exp);
+
+	ret = virtio_vdmabuf_del_buf(drv_info, &exp->buf_id);
+	if (ret)
+		return ret;
+
+	if (exp->sz_priv > 0 && exp->priv)
+		kvfree(exp->priv);
+
+	kvfree(exp);
+	return 0;
+}
+
+static int parse_msg_from_host(struct virtio_vdmabuf *vdmabuf,
+		     	       struct virtio_vdmabuf_msg *msg)
+{
+	struct virtio_vdmabuf_buf *exp;
+	virtio_vdmabuf_buf_id_t buf_id;
+	int ret;
+
+	switch (msg->cmd) {
+	case VIRTIO_VDMABUF_CMD_NEED_VMID:
+		vdmabuf->vmid = msg->op[0];
+
+		break;
+	case VIRTIO_VDMABUF_CMD_DMABUF_REL:
+		memcpy(&buf_id, msg->op, sizeof(buf_id));
+
+		exp = virtio_vdmabuf_find_buf(drv_info, &buf_id);
+		if (!exp) {
+			dev_err(drv_info->dev, "can't find buffer\n");
+			return -EINVAL;
+		}
+
+		ret = remove_buf(vdmabuf, exp);
+		if (ret)
+			return ret;
+
+		break;
+	case VIRTIO_VDMABUF_CMD_EXPORT:
+		break;
+	default:
+		dev_err(drv_info->dev, "empty cmd\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static void virtio_vdmabuf_recv_work(struct work_struct *work)
+{
+	struct virtio_vdmabuf *vdmabuf =
+		container_of(work, struct virtio_vdmabuf, recv_work);
+	struct virtqueue *vq = vdmabuf->vqs[VDMABUF_VQ_RECV];
+	struct virtio_vdmabuf_msg *msg;
+	int sz;
+
+	mutex_lock(&vdmabuf->recv_lock);
+
+	do {
+		virtqueue_disable_cb(vq);
+		for (;;) {
+			msg = virtqueue_get_buf(vq, &sz);
+			if (!msg)
+				break;
+
+			/* valid size */
+			if (sz == sizeof(struct virtio_vdmabuf_msg)) {
+				if (parse_msg_from_host(vdmabuf, msg))
+					dev_err(drv_info->dev,
+						"msg parse error\n");
+
+				kvfree(msg);
+			} else {
+				dev_err(drv_info->dev,
+					"received malformed message\n");
+			}
+		}
+	} while (!virtqueue_enable_cb(vq));
+
+	mutex_unlock(&vdmabuf->recv_lock);
+}
+
+static void virtio_vdmabuf_fill_recv_msg(struct virtio_vdmabuf *vdmabuf)
+{
+	struct virtqueue *vq = vdmabuf->vqs[VDMABUF_VQ_RECV];
+	struct scatterlist sg;
+	struct virtio_vdmabuf_msg *msg;
+	int ret;
+
+	msg = kvzalloc(sizeof(*msg), GFP_KERNEL);
+	if (!msg)
+		return;
+
+	sg_init_one(&sg, msg, sizeof(struct virtio_vdmabuf_msg));
+	ret = virtqueue_add_inbuf(vq, &sg, 1, msg, GFP_KERNEL);
+	if (ret) {
+		kvfree(msg);
+		return;
+	}
+
+	virtqueue_kick(vq);
+}
+
+static void virtio_vdmabuf_send_msg_work(struct work_struct *work)
+{
+	struct virtio_vdmabuf *vdmabuf =
+		container_of(work, struct virtio_vdmabuf, send_msg_work);
+	struct virtqueue *vq = vdmabuf->vqs[VDMABUF_VQ_SEND];
+	struct scatterlist sg;
+	struct virtio_vdmabuf_msg *msg;
+	bool added = false;
+	int ret;
+
+	mutex_lock(&vdmabuf->send_lock);
+
+	for (;;) {
+		if (list_empty(&vdmabuf->msg_list))
+			break;
+
+		virtio_vdmabuf_fill_recv_msg(vdmabuf);
+
+		msg = list_first_entry(&vdmabuf->msg_list,
+				       struct virtio_vdmabuf_msg, list);
+		list_del_init(&msg->list);
+
+		sg_init_one(&sg, msg, sizeof(struct virtio_vdmabuf_msg));
+		ret = virtqueue_add_outbuf(vq, &sg, 1, msg, GFP_KERNEL);
+		if (ret < 0) {
+			dev_err(drv_info->dev,
+				"failed to add msg to vq\n");
+			break;
+		}
+
+		added = true;
+	}
+
+	if (added)
+		virtqueue_kick(vq);
+
+	mutex_unlock(&vdmabuf->send_lock);
+}
+
+static void virtio_vdmabuf_send_work(struct work_struct *work)
+{
+	struct virtio_vdmabuf *vdmabuf =
+		container_of(work, struct virtio_vdmabuf, send_work);
+	struct virtqueue *vq = vdmabuf->vqs[VDMABUF_VQ_SEND];
+	struct virtio_vdmabuf_msg *msg;
+	unsigned int sz;
+	bool added = false;
+
+	mutex_lock(&vdmabuf->send_lock);
+
+	do {
+		virtqueue_disable_cb(vq);
+
+		for (;;) {
+			msg = virtqueue_get_buf(vq, &sz);
+			if (!msg)
+				break;
+
+			if (parse_msg_from_host(vdmabuf, msg))
+				dev_err(drv_info->dev,
+					"msg parse error\n");
+
+			kvfree(msg);
+			added = true;
+		}
+	} while (!virtqueue_enable_cb(vq));
+
+	mutex_unlock(&vdmabuf->send_lock);
+
+	if (added)
+		queue_work(vdmabuf->wq, &vdmabuf->send_msg_work);
+}
+
+static void virtio_vdmabuf_recv_cb(struct virtqueue *vq)
+{
+	struct virtio_vdmabuf *vdmabuf = vq->vdev->priv;
+
+	if (!vdmabuf)
+		return;
+
+	queue_work(vdmabuf->wq, &vdmabuf->recv_work);
+}
+
+static void virtio_vdmabuf_send_cb(struct virtqueue *vq)
+{
+	struct virtio_vdmabuf *vdmabuf = vq->vdev->priv;
+
+	if (!vdmabuf)
+		return;
+
+	queue_work(vdmabuf->wq, &vdmabuf->send_work);
+}
+
+static int remove_all_bufs(struct virtio_vdmabuf *vdmabuf)
+{
+	struct virtio_vdmabuf_buf *found;
+	struct hlist_node *tmp;
+	int bkt;
+	int ret;
+
+	hash_for_each_safe(drv_info->buf_list, bkt, tmp, found, node) {
+		ret = remove_buf(vdmabuf, found);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int virtio_vdmabuf_open(struct inode *inode, struct file *filp)
+{
+	int ret;
+
+	if (!drv_info) {
+		pr_err("virtio vdmabuf driver is not ready\n");
+		return -EINVAL;
+	}
+
+	ret = send_msg_to_host(VIRTIO_VDMABUF_CMD_NEED_VMID, 0);
+	if (ret < 0)
+		dev_err(drv_info->dev, "fail to receive vmid\n");
+
+	filp->private_data = drv_info->priv;
+
+	return 0;
+}
+
+static int virtio_vdmabuf_release(struct inode *inode, struct file *filp)
+{
+	return 0;
+}
+
+/* Notify Host about the new vdmabuf */
+static int export_notify(struct virtio_vdmabuf_buf *exp, struct page **pages)
+{
+	int *op;
+	int ret;
+
+	op = kvcalloc(1, sizeof(int) * 65, GFP_KERNEL);
+	if (!op)
+		return -ENOMEM;
+
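+	/* op[] layout mirrors enum virtio_vdmabuf_ops:
+	 * op[0]..op[3]: buf_id, op[4]: nents, op[5]: first page offset,
+	 * op[6]: last page length, op[7]..op[8]: top-level ref gpa,
+	 * op[9]: private data size, op[10]..: private data
+	 */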
+	memcpy(op, &exp->buf_id, sizeof(exp->buf_id));
+
+	/* if new pages are to be shared */
+	if (pages) {
+		op[4] = exp->pages_info->nents;
+		op[5] = exp->pages_info->first_ofst;
+		op[6] = exp->pages_info->last_len;
+
+		memcpy(&op[7], &exp->pages_info->ref, sizeof(gpa_t));
+	}
+
+	op[9] = exp->sz_priv;
+
+	/* driver/application specific private info */
+	memcpy(&op[10], exp->priv, op[9]);
+
+	ret = send_msg_to_host(VIRTIO_VDMABUF_CMD_EXPORT, op);
+
+	kvfree(op);
+	return ret;
+}
+
+/* return total number of pages referenced by a sgt
+ * for pre-calculation of # of pages behind a given sgt
+ */
+static int num_pgs(struct sg_table *sgt)
+{
+	struct scatterlist *sgl;
+	int len, i;
+	/* at least one page */
+	int n_pgs = 1;
+
+	sgl = sgt->sgl;
+
+	len = sgl->length - PAGE_SIZE + sgl->offset;
+
+	/* round-up */
+	n_pgs += ((len + PAGE_SIZE - 1)/PAGE_SIZE);
+
+	for (i = 1; i < sgt->nents; i++) {
+		sgl = sg_next(sgl);
+
+		/* round-up */
+		n_pgs += ((sgl->length + PAGE_SIZE - 1) /
+			  PAGE_SIZE);
+	}
+
+	return n_pgs;
+}
+
+/* extract pages referenced by sgt */
+static struct page **extr_pgs(struct sg_table *sgt, int *nents, int *last_len)
+{
+	struct scatterlist *sgl;
+	struct page **pages;
+	struct page **temp_pgs;
+	int i, j;
+	int len;
+
+	*nents = num_pgs(sgt);
+	pages =	kvmalloc_array(*nents, sizeof(struct page *), GFP_KERNEL);
+	if (!pages)
+		return NULL;
+
+	sgl = sgt->sgl;
+
+	temp_pgs = pages;
+	*temp_pgs++ = sg_page(sgl);
+	len = sgl->length - PAGE_SIZE + sgl->offset;
+
+	i = 1;
+	while (len > 0) {
+		*temp_pgs++ = nth_page(sg_page(sgl), i++);
+		len -= PAGE_SIZE;
+	}
+
+	for (i = 1; i < sgt->nents; i++) {
+		sgl = sg_next(sgl);
+		*temp_pgs++ = sg_page(sgl);
+		len = sgl->length - PAGE_SIZE;
+		j = 1;
+
+		while (len > 0) {
+			*temp_pgs++ = nth_page(sg_page(sgl), j++);
+			len -= PAGE_SIZE;
+		}
+	}
+
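+	/* the loops above drive len to <= 0; adding one PAGE_SIZE back
+	 * recovers the data length within the final extracted page
+	 */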
+	*last_len = len + PAGE_SIZE;
+
+	return pages;
+}
+
+/* ioctl - exporting new vdmabuf
+ *
+ *	 int dmabuf_fd - File handle of original DMABUF
+ *	 virtio_vdmabuf_buf_id_t buf_id - returned vdmabuf ID
+ *	 int sz_priv - size of private data from userspace
+ *	 char *priv - buffer of user private data
+ *
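+ *	A minimal guest-side call sequence (sketch; dmabuf_fd stands in for
+ *	the fd of an existing dmabuf, and the node name comes from the misc
+ *	device registered below):
+ *
+ *		int vfd = open("/dev/virtio-vdmabuf", O_RDWR);
+ *		struct virtio_vdmabuf_export exp = { .fd = dmabuf_fd };
+ *
+ *		ioctl(vfd, VIRTIO_VDMABUF_IOCTL_EXPORT, &exp);
+ *
+ *	On return, exp.buf_id identifies the buffer across the whole system.
+ *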
+ */
+static int export_ioctl(struct file *filp, void *data)
+{
+	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
+	struct virtio_vdmabuf_export *attr = data;
+	struct dma_buf *dmabuf;
+	struct dma_buf_attachment *attach;
+	struct sg_table *sgt;
+	struct virtio_vdmabuf_buf *exp;
+	struct page **pages;
+	int nents, last_len;
+	virtio_vdmabuf_buf_id_t buf_id;
+	int ret = 0;
+
+	if (vdmabuf->vmid <= 0)
+		return -EINVAL;
+
+	dmabuf = dma_buf_get(attr->fd);
+	if (IS_ERR(dmabuf))
+		return PTR_ERR(dmabuf);
+
+	mutex_lock(&drv_info->g_mutex);
+
+	buf_id = get_buf_id(vdmabuf);
+
+	attach = dma_buf_attach(dmabuf, drv_info->dev);
+	if (IS_ERR(attach)) {
+		ret = PTR_ERR(attach);
+		goto fail_attach;
+	}
+
+	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
+	if (IS_ERR(sgt)) {
+		ret = PTR_ERR(sgt);
+		goto fail_map_attachment;
+	}
+
+	/* allocate a new exp */
+	exp = kvcalloc(1, sizeof(*exp), GFP_KERNEL);
+	if (!exp) {
+		ret = -ENOMEM;
+		goto fail_sgt_info_creation;
+	}
+
+	/* possible truncation */
+	if (attr->sz_priv > MAX_SIZE_PRIV_DATA)
+		exp->sz_priv = MAX_SIZE_PRIV_DATA;
+	else
+		exp->sz_priv = attr->sz_priv;
+
+	/* creating buffer for private data */
+	if (exp->sz_priv != 0) {
+		exp->priv = kvcalloc(1, exp->sz_priv, GFP_KERNEL);
+		if (!exp->priv) {
+			ret = -ENOMEM;
+			goto fail_priv_creation;
+		}
+	}
+
+	exp->buf_id = buf_id;
+	exp->attach = attach;
+	exp->sgt = sgt;
+	exp->dma_buf = dmabuf;
+	exp->valid = 1;
+
+	if (exp->sz_priv) {
+		/* copy private data to sgt_info */
+		ret = copy_from_user(exp->priv, attr->priv, exp->sz_priv);
+		if (ret) {
+			ret = -EINVAL;
+			goto fail_exp;
+		}
+	}
+
+	pages = extr_pgs(sgt, &nents, &last_len);
+	if (pages == NULL) {
+		ret = -ENOMEM;
+		goto fail_exp;
+	}
+
+	exp->pages_info = virtio_vdmabuf_share_buf(pages, nents,
+						   sgt->sgl->offset,
+					 	   last_len);
+	if (!exp->pages_info) {
+		ret = -ENOMEM;
+		goto fail_create_pages_info;
+	}
+
+	attr->buf_id = exp->buf_id;
+	ret = export_notify(exp, pages);
+	if (ret < 0)
+		goto fail_send_request;
+
+	/* now register it to the export list */
+	virtio_vdmabuf_add_buf(drv_info, exp);
+
+	exp->filp = filp;
+
+	mutex_unlock(&drv_info->g_mutex);
+
+	return ret;
+
+/* Clean-up if error occurs */
+fail_send_request:
+	virtio_vdmabuf_free_buf(exp->pages_info);
+
+fail_create_pages_info:
+	kvfree(pages);
+
+fail_exp:
+	kvfree(exp->priv);
+
+fail_priv_creation:
+	kvfree(exp);
+
+fail_sgt_info_creation:
+	dma_buf_unmap_attachment(attach, sgt,
+				 DMA_BIDIRECTIONAL);
+
+fail_map_attachment:
+	dma_buf_detach(dmabuf, attach);
+
+fail_attach:
+	dma_buf_put(dmabuf);
+
+	mutex_unlock(&drv_info->g_mutex);
+
+	return ret;
+}
+
+static const struct virtio_vdmabuf_ioctl_desc virtio_vdmabuf_ioctls[] = {
+	VIRTIO_VDMABUF_IOCTL_DEF(VIRTIO_VDMABUF_IOCTL_EXPORT, export_ioctl, 0),
+};
+
+static long virtio_vdmabuf_ioctl(struct file *filp, unsigned int cmd,
+		       		 unsigned long param)
+{
+	const struct virtio_vdmabuf_ioctl_desc *ioctl = NULL;
+	unsigned int nr = _IOC_NR(cmd);
+	int ret;
+	virtio_vdmabuf_ioctl_t func;
+	char *kdata;
+
+	if (nr >= ARRAY_SIZE(virtio_vdmabuf_ioctls)) {
+		dev_err(drv_info->dev, "invalid ioctl\n");
+		return -EINVAL;
+	}
+
+	ioctl = &virtio_vdmabuf_ioctls[nr];
+
+	func = ioctl->func;
+
+	if (unlikely(!func)) {
+		dev_err(drv_info->dev, "no function\n");
+		return -EINVAL;
+	}
+
+	kdata = kvmalloc(_IOC_SIZE(cmd), GFP_KERNEL);
+	if (!kdata)
+		return -ENOMEM;
+
+	if (copy_from_user(kdata, (void __user *)param,
+			   _IOC_SIZE(cmd)) != 0) {
+		dev_err(drv_info->dev,
+			"failed to copy from user arguments\n");
+		ret = -EFAULT;
+		goto ioctl_error;
+	}
+
+	ret = func(filp, kdata);
+
+	if (copy_to_user((void __user *)param, kdata,
+			 _IOC_SIZE(cmd)) != 0) {
+		dev_err(drv_info->dev,
+			"failed to copy to user arguments\n");
+		ret = -EFAULT;
+		goto ioctl_error;
+	}
+
+ioctl_error:
+	kvfree(kdata);
+	return ret;
+}
+
+static unsigned int virtio_vdmabuf_event_poll(struct file *filp,
+			    	    	      struct poll_table_struct *wait)
+{
+	struct virtio_vdmabuf *vdmabuf = filp->private_data;
+
+	poll_wait(filp, &vdmabuf->evq->e_wait, wait);
+
+	if (!list_empty(&vdmabuf->evq->e_list))
+		return POLLIN | POLLRDNORM;
+
+	return 0;
+}
+
+static ssize_t virtio_vdmabuf_event_read(struct file *filp, char __user *buf,
+			       		 size_t cnt, loff_t *ofst)
+{
+	struct virtio_vdmabuf *vdmabuf = filp->private_data;
+	int ret;
+
+	/* make sure user buffer can be written */
+	if (!access_ok(buf, sizeof (*buf))) {
+		dev_err(drv_info->dev, "user buffer can't be written.\n");
+		return -EINVAL;
+	}
+
+	ret = mutex_lock_interruptible(&vdmabuf->evq->e_readlock);
+	if (ret)
+		return ret;
+
+	for (;;) {
+		struct virtio_vdmabuf_event *e = NULL;
+
+		spin_lock_irq(&vdmabuf->evq->e_lock);
+		if (!list_empty(&vdmabuf->evq->e_list)) {
+			e = list_first_entry(&vdmabuf->evq->e_list,
+					     struct virtio_vdmabuf_event, link);
+			list_del(&e->link);
+		}
+		spin_unlock_irq(&vdmabuf->evq->e_lock);
+
+		if (!e) {
+			if (ret)
+				break;
+
+			if (filp->f_flags & O_NONBLOCK) {
+				ret = -EAGAIN;
+				break;
+			}
+
+			mutex_unlock(&vdmabuf->evq->e_readlock);
+			ret = wait_event_interruptible(vdmabuf->evq->e_wait,
+					!list_empty(&vdmabuf->evq->e_list));
+
+			if (ret == 0)
+				ret = mutex_lock_interruptible(
+						&vdmabuf->evq->e_readlock);
+
+			if (ret)
+				return ret;
+		} else {
+			unsigned int len = (sizeof(e->e_data.hdr) +
+					    e->e_data.hdr.size);
+
+			if (len > cnt - ret) {
+put_back_event:
+				spin_lock_irq(&vdmabuf->evq->e_lock);
+				list_add(&e->link, &vdmabuf->evq->e_list);
+				spin_unlock_irq(&vdmabuf->evq->e_lock);
+				break;
+			}
+
+			if (copy_to_user(buf + ret, &e->e_data.hdr,
+					 sizeof(e->e_data.hdr))) {
+				if (ret == 0)
+					ret = -EFAULT;
+
+				goto put_back_event;
+			}
+
+			ret += sizeof(e->e_data.hdr);
+
+			if (copy_to_user(buf + ret, e->e_data.data,
+					 e->e_data.hdr.size)) {
+				/* error while copying void *data */
+
+				struct virtio_vdmabuf_e_hdr dummy_hdr = {0};
+
+				ret -= sizeof(e->e_data.hdr);
+
+				/* nullifying hdr of the event in user buffer */
+				if (copy_to_user(buf + ret, &dummy_hdr,
+						 sizeof(dummy_hdr)))
+					dev_err(drv_info->dev,
+					   "fail to nullify invalid hdr\n");
+
+				ret = -EFAULT;
+
+				goto put_back_event;
+			}
+
+			ret += e->e_data.hdr.size;
+			spin_lock_irq(&vdmabuf->evq->e_lock);
+			vdmabuf->evq->pending--;
+			spin_unlock_irq(&vdmabuf->evq->e_lock);
+			kvfree(e);
+		}
+	}
+
+	mutex_unlock(&vdmabuf->evq->e_readlock);
+
+	return ret;
+}
+
+static const struct file_operations virtio_vdmabuf_fops = {
+	.owner = THIS_MODULE,
+	.open = virtio_vdmabuf_open,
+	.release = virtio_vdmabuf_release,
+	.read = virtio_vdmabuf_event_read,
+	.poll = virtio_vdmabuf_event_poll,
+	.unlocked_ioctl = virtio_vdmabuf_ioctl,
+};
+
+static struct miscdevice virtio_vdmabuf_miscdev = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "virtio-vdmabuf",
+	.fops = &virtio_vdmabuf_fops,
+};
+
+static int virtio_vdmabuf_probe(struct virtio_device *vdev)
+{
+	vq_callback_t *cbs[] = {
+		virtio_vdmabuf_recv_cb,
+		virtio_vdmabuf_send_cb,
+	};
+	static const char *const names[] = {
+		"recv",
+		"send",
+	};
+	struct virtio_vdmabuf *vdmabuf;
+	int ret = 0;
+
+	if (!drv_info)
+		return -EINVAL;
+
+	vdmabuf = drv_info->priv;
+
+	if (!vdmabuf)
+		return -EINVAL;
+
+	vdmabuf->vdev = vdev;
+	vdev->priv = vdmabuf;
+
+	/* initialize spinlock for synchronizing virtqueue accesses */
+	spin_lock_init(&vdmabuf->vq_lock);
+
+	ret = virtio_find_vqs(vdmabuf->vdev, VDMABUF_VQ_MAX, vdmabuf->vqs,
+			      cbs, names, NULL);
+	if (ret) {
+		dev_err(drv_info->dev, "Cannot find any vqs\n");
+		return ret;
+	}
+
+	INIT_LIST_HEAD(&vdmabuf->msg_list);
+	INIT_WORK(&vdmabuf->recv_work, virtio_vdmabuf_recv_work);
+	INIT_WORK(&vdmabuf->send_work, virtio_vdmabuf_send_work);
+	INIT_WORK(&vdmabuf->send_msg_work, virtio_vdmabuf_send_msg_work);
+
+	return ret;
+}
+
+static void virtio_vdmabuf_remove(struct virtio_device *vdev)
+{
+	struct virtio_vdmabuf *vdmabuf;
+
+	if (!drv_info)
+		return;
+
+	vdmabuf = drv_info->priv;
+	flush_work(&vdmabuf->recv_work);
+	flush_work(&vdmabuf->send_work);
+	flush_work(&vdmabuf->send_msg_work);
+
+	vdev->config->reset(vdev);
+	vdev->config->del_vqs(vdev);
+}
+
+static struct virtio_device_id id_table[] = {
+	{ VIRTIO_ID_VDMABUF, VIRTIO_DEV_ANY_ID },
+	{ 0 },
+};
+
+static struct virtio_driver virtio_vdmabuf_vdev_drv = {
+	.driver.name =  KBUILD_MODNAME,
+	.driver.owner = THIS_MODULE,
+	.id_table =     id_table,
+	.probe =        virtio_vdmabuf_probe,
+	.remove =       virtio_vdmabuf_remove,
+};
+
+static int __init virtio_vdmabuf_init(void)
+{
+	struct virtio_vdmabuf *vdmabuf;
+	int ret = 0;
+
+	drv_info = NULL;
+
+	ret = misc_register(&virtio_vdmabuf_miscdev);
+	if (ret) {
+		pr_err("virtio-vdmabuf misc driver can't be registered\n");
+		return ret;
+	}
+
+	ret = dma_set_mask_and_coherent(virtio_vdmabuf_miscdev.this_device,
+					DMA_BIT_MASK(64));
+	if (ret < 0) {
+		misc_deregister(&virtio_vdmabuf_miscdev);
+		return -EINVAL;
+	}
+
+	drv_info = kvcalloc(1, sizeof(*drv_info), GFP_KERNEL);
+	if (!drv_info) {
+		misc_deregister(&virtio_vdmabuf_miscdev);
+		return -ENOMEM;
+	}
+
+	vdmabuf = kvcalloc(1, sizeof(*vdmabuf), GFP_KERNEL);
+	if (!vdmabuf) {
+		kvfree(drv_info);
+		misc_deregister(&virtio_vdmabuf_miscdev);
+		return -ENOMEM;
+	}
+
+	vdmabuf->evq = kvcalloc(1, sizeof(*(vdmabuf->evq)), GFP_KERNEL);
+	if (!vdmabuf->evq) {
+		kvfree(drv_info);
+		kvfree(vdmabuf);
+		misc_deregister(&virtio_vdmabuf_miscdev);
+		return -ENOMEM;
+	}
+
+	drv_info->priv = (void *)vdmabuf;
+	drv_info->dev = virtio_vdmabuf_miscdev.this_device;
+
+	mutex_init(&drv_info->g_mutex);
+
+	mutex_init(&vdmabuf->evq->e_readlock);
+	spin_lock_init(&vdmabuf->evq->e_lock);
+
+	INIT_LIST_HEAD(&vdmabuf->evq->e_list);
+	init_waitqueue_head(&vdmabuf->evq->e_wait);
+	hash_init(drv_info->buf_list);
+
+	vdmabuf->evq->pending = 0;
+	vdmabuf->wq = create_workqueue("virtio_vdmabuf_wq");
+
+	ret = register_virtio_driver(&virtio_vdmabuf_vdev_drv);
+	if (ret) {
+		dev_err(drv_info->dev, "vdmabuf driver can't be registered\n");
+		if (vdmabuf->wq)
+			destroy_workqueue(vdmabuf->wq);
+		misc_deregister(&virtio_vdmabuf_miscdev);
+		kvfree(vdmabuf->evq);
+		kvfree(vdmabuf);
+		kvfree(drv_info);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void __exit virtio_vdmabuf_deinit(void)
+{
+	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
+	struct virtio_vdmabuf_event *e, *et;
+	unsigned long irqflags;
+
+	misc_deregister(&virtio_vdmabuf_miscdev);
+	unregister_virtio_driver(&virtio_vdmabuf_vdev_drv);
+
+	if (vdmabuf->wq)
+		destroy_workqueue(vdmabuf->wq);
+
+	spin_lock_irqsave(&vdmabuf->evq->e_lock, irqflags);
+
+	list_for_each_entry_safe(e, et, &vdmabuf->evq->e_list,
+				 link) {
+		list_del(&e->link);
+		kvfree(e);
+		vdmabuf->evq->pending--;
+	}
+
+	spin_unlock_irqrestore(&vdmabuf->evq->e_lock, irqflags);
+
+	/* freeing all exported buffers */
+	remove_all_bufs(vdmabuf);
+
+	kvfree(vdmabuf->evq);
+	kvfree(vdmabuf);
+	kvfree(drv_info);
+}
+
+module_init(virtio_vdmabuf_init);
+module_exit(virtio_vdmabuf_deinit);
+
+MODULE_DEVICE_TABLE(virtio, id_table);
+MODULE_DESCRIPTION("Virtio Vdmabuf frontend driver");
+MODULE_LICENSE("GPL and additional rights");
diff --git a/include/linux/virtio_vdmabuf.h b/include/linux/virtio_vdmabuf.h
new file mode 100644
index 000000000000..9500bf4a54ac
--- /dev/null
+++ b/include/linux/virtio_vdmabuf.h
@@ -0,0 +1,271 @@
+/* SPDX-License-Identifier: (MIT OR GPL-2.0) */
+
+/*
+ * Copyright © 2021 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#ifndef _LINUX_VIRTIO_VDMABUF_H
+#define _LINUX_VIRTIO_VDMABUF_H
+
+#include <uapi/linux/virtio_vdmabuf.h>
+#include <linux/hashtable.h>
+#include <linux/kvm_types.h>
+
+struct virtio_vdmabuf_shared_pages {
+	/* cross-VM ref addr for the buffer */
+	gpa_t ref;
+
+	/* page array */
+	struct page **pages;
+	gpa_t **l2refs;
+	gpa_t *l3refs;
+
+	/* data offset in the first page
+	 * and data length in the last page
+	 */
+	int first_ofst;
+	int last_len;
+
+	/* number of shared pages */
+	int nents;
+};
+
+struct virtio_vdmabuf_buf {
+	virtio_vdmabuf_buf_id_t buf_id;
+
+	struct dma_buf_attachment *attach;
+	struct dma_buf *dma_buf;
+	struct sg_table *sgt;
+	struct virtio_vdmabuf_shared_pages *pages_info;
+	int vmid;
+
+	/* validity of the buffer */
+	bool valid;
+
+	/* set if the buffer is imported via import_ioctl */
+	bool imported;
+
+	/* size of private */
+	size_t sz_priv;
+	/* private data associated with the exported buffer */
+	void *priv;
+
+	struct file *filp;
+	struct hlist_node node;
+};
+
+struct virtio_vdmabuf_event {
+	struct virtio_vdmabuf_e_data e_data;
+	struct list_head link;
+};
+
+struct virtio_vdmabuf_event_queue {
+	wait_queue_head_t e_wait;
+	struct list_head e_list;
+
+	spinlock_t e_lock;
+	struct mutex e_readlock;
+
+	/* # of pending events */
+	int pending;
+};
+
+/* driver information */
+struct virtio_vdmabuf_info {
+	struct device *dev;
+
+	struct list_head head_vdmabuf_list;
+	struct list_head kvm_instances;
+
+	DECLARE_HASHTABLE(buf_list, 7);
+
+	void *priv;
+	struct mutex g_mutex;
+	struct notifier_block kvm_notifier;
+};
+
+/* IOCTL definitions
+ */
+typedef int (*virtio_vdmabuf_ioctl_t)(struct file *filp, void *data);
+
+struct virtio_vdmabuf_ioctl_desc {
+	unsigned int cmd;
+	int flags;
+	virtio_vdmabuf_ioctl_t func;
+	const char *name;
+};
+
+#define VIRTIO_VDMABUF_IOCTL_DEF(ioctl, _func, _flags)	\
+	[_IOC_NR(ioctl)] = {			\
+			.cmd = ioctl,		\
+			.func = _func,		\
+			.flags = _flags,	\
+			.name = #ioctl		\
+}
+
+#define VIRTIO_VDMABUF_VMID(buf_id) ((((buf_id).id) >> 32) & 0xFFFFFFFF)
+
+/* Messages between Host and Guest */
+
+/* List of commands from Guest to Host:
+ *
+ * ------------------------------------------------------------------
+ * A. NEED_VMID
+ *
+ *  guest asks the host to provide its vmid
+ *
+ * req:
+ *
+ * cmd: VIRTIO_VDMABUF_CMD_NEED_VMID
+ *
+ * ack:
+ *
+ * cmd: same as req
+ * op[0] : vmid of guest
+ *
+ * ------------------------------------------------------------------
+ * B. EXPORT
+ *
+ *  export dmabuf to host
+ *
+ * req:
+ *
+ * cmd: VIRTIO_VDMABUF_CMD_EXPORT
+ * op0~op3 : HDMABUF ID
+ * op4 : number of pages to be shared
+ * op5 : offset of data in the first page
+ * op6 : length of data in the last page
+ * op7 : upper 32 bit of top-level ref of shared buf
+ * op8 : lower 32 bit of top-level ref of shared buf
+ * op9 : size of private data
+ * op10 ~ op64: User private data associated with the buffer
+ *	        (e.g. graphic buffer's meta info)
+ *
+ * ------------------------------------------------------------------
+ *
+ * List of commands from Host to Guest
+ *
+ * ------------------------------------------------------------------
+ * A. RELEASE
+ *
+ *  notifying guest that the shared buffer is released by an importer
+ *
+ * req:
+ *
+ * cmd: VIRTIO_VDMABUF_CMD_DMABUF_REL
+ * op0~op3 : VDMABUF ID
+ *
+ * ------------------------------------------------------------------
+ */
+
+/* msg structures */
+struct virtio_vdmabuf_msg {
+	struct list_head list;
+	unsigned int cmd;
+	unsigned int op[64];
+};
+
+enum {
+	VDMABUF_VQ_RECV = 0,
+	VDMABUF_VQ_SEND = 1,
+	VDMABUF_VQ_MAX  = 2,
+};
+
+enum virtio_vdmabuf_cmd {
+	VIRTIO_VDMABUF_CMD_NEED_VMID,
+	VIRTIO_VDMABUF_CMD_EXPORT = 0x10,
+	VIRTIO_VDMABUF_CMD_DMABUF_REL
+};
+
+enum virtio_vdmabuf_ops {
+	VIRTIO_VDMABUF_HDMABUF_ID_ID = 0,
+	VIRTIO_VDMABUF_HDMABUF_ID_RNG_KEY0,
+	VIRTIO_VDMABUF_HDMABUF_ID_RNG_KEY1,
+	VIRTIO_VDMABUF_NUM_PAGES_SHARED = 4,
+	VIRTIO_VDMABUF_FIRST_PAGE_DATA_OFFSET,
+	VIRTIO_VDMABUF_LAST_PAGE_DATA_LENGTH,
+	VIRTIO_VDMABUF_REF_ADDR_UPPER_32BIT,
+	VIRTIO_VDMABUF_REF_ADDR_LOWER_32BIT,
+	VIRTIO_VDMABUF_PRIVATE_DATA_SIZE,
+	VIRTIO_VDMABUF_PRIVATE_DATA_START
+};
+
+/* adding exported/imported vdmabuf info to hash */
+static inline int
+virtio_vdmabuf_add_buf(struct virtio_vdmabuf_info *info,
+                       struct virtio_vdmabuf_buf *new)
+{
+	hash_add(info->buf_list, &new->node, new->buf_id.id);
+	return 0;
+}
+
+/* comparing two vdmabuf IDs */
+static inline bool
+is_same_buf(virtio_vdmabuf_buf_id_t a,
+            virtio_vdmabuf_buf_id_t b)
+{
+	int i;
+
+	if (a.id != b.id)
+		return false;
+
+	/* compare keys */
+	for (i = 0; i < 2; i++) {
+		if (a.rng_key[i] != b.rng_key[i])
+			return false;
+	}
+
+	return true;
+}
+
+/* find buf for given vdmabuf ID */
+static inline struct virtio_vdmabuf_buf
+*virtio_vdmabuf_find_buf(struct virtio_vdmabuf_info *info,
+			 virtio_vdmabuf_buf_id_t *buf_id)
+{
+	struct virtio_vdmabuf_buf *found;
+
+	hash_for_each_possible(info->buf_list, found, node, buf_id->id)
+		if (is_same_buf(found->buf_id, *buf_id))
+			return found;
+
+	return NULL;
+}
+
+/* delete buf from hash */
+static inline int
+virtio_vdmabuf_del_buf(struct virtio_vdmabuf_info *info,
+                       virtio_vdmabuf_buf_id_t *buf_id)
+{
+	struct virtio_vdmabuf_buf *found;
+
+	found = virtio_vdmabuf_find_buf(info, buf_id);
+	if (!found)
+		return -ENOENT;
+
+	hash_del(&found->node);
+
+	return 0;
+}
+
+#endif
diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
index bc1c0621f5ed..39c94637ddee 100644
--- a/include/uapi/linux/virtio_ids.h
+++ b/include/uapi/linux/virtio_ids.h
@@ -54,5 +54,6 @@
 #define VIRTIO_ID_FS			26 /* virtio filesystem */
 #define VIRTIO_ID_PMEM			27 /* virtio pmem */
 #define VIRTIO_ID_MAC80211_HWSIM	29 /* virtio mac80211-hwsim */
+#define VIRTIO_ID_VDMABUF          	40 /* virtio vdmabuf */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/include/uapi/linux/virtio_vdmabuf.h b/include/uapi/linux/virtio_vdmabuf.h
new file mode 100644
index 000000000000..7bddaa04ddd6
--- /dev/null
+++ b/include/uapi/linux/virtio_vdmabuf.h
@@ -0,0 +1,99 @@
+// SPDX-License-Identifier: (MIT OR GPL-2.0)
+
+/*
+ * Copyright © 2021 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#ifndef _UAPI_LINUX_VIRTIO_VDMABUF_H
+#define _UAPI_LINUX_VIRTIO_VDMABUF_H
+
+#define MAX_SIZE_PRIV_DATA 192
+
+typedef struct {
+	__u64 id;
+	/* 8B long Random number */
+	int rng_key[2];
+} virtio_vdmabuf_buf_id_t;
+
+struct virtio_vdmabuf_e_hdr {
+	/* buf_id of new buf */
+	virtio_vdmabuf_buf_id_t buf_id;
+	/* size of private data */
+	int size;
+};
+
+struct virtio_vdmabuf_e_data {
+	struct virtio_vdmabuf_e_hdr hdr;
+	/* ptr to private data */
+	void __user *data;
+};
+
+#define VIRTIO_VDMABUF_IOCTL_IMPORT \
+_IOC(_IOC_NONE, 'G', 2, sizeof(struct virtio_vdmabuf_import))
+#define VIRTIO_VDMABUF_IOCTL_RELEASE \
+_IOC(_IOC_NONE, 'G', 3, sizeof(struct virtio_vdmabuf_import))
+struct virtio_vdmabuf_import {
+	/* IN parameters */
+	/* ahdb buf id to be imported */
+	virtio_vdmabuf_buf_id_t buf_id;
+	/* flags */
+	int flags;
+	/* OUT parameters */
+	/* exported dma buf fd */
+	int fd;
+};
+
+#define VIRTIO_VDMABUF_IOCTL_EXPORT \
+_IOC(_IOC_NONE, 'G', 4, sizeof(struct virtio_vdmabuf_export))
+struct virtio_vdmabuf_export {
+	/* IN parameters */
+	/* DMA buf fd to be exported */
+	int fd;
+	/* exported dma buf id */
+	virtio_vdmabuf_buf_id_t buf_id;
+	int sz_priv;
+	char *priv;
+};
+
+#define VIRTIO_VDMABUF_IOCTL_QUERY \
+_IOC(_IOC_NONE, 'G', 5, sizeof(struct virtio_vdmabuf_query))
+struct virtio_vdmabuf_query {
+	/* in parameters */
+	/* id of buf to be queried */
+	virtio_vdmabuf_buf_id_t buf_id;
+	/* item to be queried */
+	int item;
+	/* OUT parameters */
+	/* Value of queried item */
+	unsigned long info;
+};
+
+/* DMABUF query */
+enum virtio_vdmabuf_query_cmd {
+	VIRTIO_VDMABUF_QUERY_SIZE = 0x10,
+	VIRTIO_VDMABUF_QUERY_BUSY,
+	VIRTIO_VDMABUF_QUERY_PRIV_INFO_SIZE,
+	VIRTIO_VDMABUF_QUERY_PRIV_INFO,
+};
+
+#endif
-- 
2.26.2

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [RFC v3 3/3] vhost: Add Vdmabuf backend
  2021-02-03  7:35 ` Vivek Kasireddy
  (?)
@ 2021-02-03  7:35   ` Vivek Kasireddy
  -1 siblings, 0 replies; 57+ messages in thread
From: Vivek Kasireddy @ 2021-02-03  7:35 UTC (permalink / raw)
  To: virtualization, dri-devel
  Cc: kraxel, daniel.vetter, daniel.vetter, dongwon.kim, sumit.semwal,
	christian.koenig, linux-media, Vivek Kasireddy

This backend acts as the counterpart to the Vdmabuf Virtio frontend.
When it receives a new export event from the frontend, it raises an
event to alert the Qemu UI/userspace. Qemu then "imports" this buffer
using the Unique ID.

As part of the import step, a new dmabuf is created on the Host using
the page information obtained from the Guest. The fd associated with
this dmabuf is made available to Qemu UI/userspace which then creates
a texture from it for the purpose of displaying it.

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
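For reference, a minimal host-side consumer sketch using only the uapi
added by this series. How the backend device node is named and opened, and
the usual vhost setup (VHOST_SET_OWNER, vring setup,
VHOST_VDMABUF_SET_RUNNING) are omitted here; note also that the backend
only lets the process that opened the device (the VMM) read events:

#include <poll.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/virtio_vdmabuf.h>

/* vfd: fd of the vhost-vdmabuf device opened and set up by the VMM */
static int import_one(int vfd)
{
	struct {
		struct virtio_vdmabuf_e_hdr hdr;
		char priv[MAX_SIZE_PRIV_DATA];
	} ev;
	struct virtio_vdmabuf_import imp = { 0 };
	struct pollfd pfd = { .fd = vfd, .events = POLLIN };

	/* wait for an export event raised for the guest's buffer */
	if (poll(&pfd, 1, -1) <= 0)
		return -1;

	/* one event = header + hdr.size bytes of private data, in one read */
	if (read(vfd, &ev, sizeof(ev)) < (ssize_t)sizeof(ev.hdr))
		return -1;

	/* turn the vdmabuf ID into a local dmabuf fd */
	imp.buf_id = ev.hdr.buf_id;
	if (ioctl(vfd, VIRTIO_VDMABUF_IOCTL_IMPORT, &imp) < 0)
		return -1;

	/* ... hand imp.fd to the UI to create a texture from it ... */

	/* done with the buffer: tell the guest it can be reused */
	close(imp.fd);
	return ioctl(vfd, VIRTIO_VDMABUF_IOCTL_RELEASE, &imp);
}
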
 drivers/vhost/Kconfig      |    9 +
 drivers/vhost/Makefile     |    3 +
 drivers/vhost/vdmabuf.c    | 1446 ++++++++++++++++++++++++++++++++++++
 include/uapi/linux/vhost.h |    3 +
 4 files changed, 1461 insertions(+)
 create mode 100644 drivers/vhost/vdmabuf.c

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 587fbae06182..9a99cc2611ca 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -89,4 +89,13 @@ config VHOST_CROSS_ENDIAN_LEGACY
 
 	  If unsure, say "N".
 
+config VHOST_VDMABUF
+	bool "Vhost backend for the Vdmabuf driver"
+	depends on KVM && EVENTFD
+	select VHOST
+	default n
+	help
+	  This driver works in tandem with the Virtio Vdmabuf frontend. It can
+	  be used to create a dmabuf using the pages shared by the Guest.
+
 endif
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index f3e1897cce85..5c2cea4a7eaf 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -17,3 +17,6 @@ obj-$(CONFIG_VHOST)	+= vhost.o
 
 obj-$(CONFIG_VHOST_IOTLB) += vhost_iotlb.o
 vhost_iotlb-y := iotlb.o
+
+obj-$(CONFIG_VHOST_VDMABUF) += vhost_vdmabuf.o
+vhost_vdmabuf-y := vdmabuf.o
diff --git a/drivers/vhost/vdmabuf.c b/drivers/vhost/vdmabuf.c
new file mode 100644
index 000000000000..1d6e9bcf6648
--- /dev/null
+++ b/drivers/vhost/vdmabuf.c
@@ -0,0 +1,1446 @@
+// SPDX-License-Identifier: (MIT OR GPL-2.0)
+
+/*
+ * Copyright © 2021 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Dongwon Kim <dongwon.kim@intel.com>
+ *    Mateusz Polrola <mateusz.polrola@gmail.com>
+ *    Vivek Kasireddy <vivek.kasireddy@intel.com>
+ */
+
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/miscdevice.h>
+#include <linux/workqueue.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/hashtable.h>
+#include <linux/uaccess.h>
+#include <linux/poll.h>
+#include <linux/dma-buf.h>
+#include <linux/vhost.h>
+#include <linux/vfio.h>
+#include <linux/kvm_host.h>
+#include <linux/virtio_vdmabuf.h>
+
+#include "vhost.h"
+
+#define REFS_PER_PAGE (PAGE_SIZE/sizeof(long))
+
+enum {
+	VHOST_VDMABUF_FEATURES = VHOST_FEATURES,
+};
+
+static struct virtio_vdmabuf_info *drv_info;
+
+struct kvm_instance {
+	struct kvm *kvm;
+	struct list_head link;
+};
+
+struct vhost_vdmabuf {
+	struct vhost_dev dev;
+	struct vhost_virtqueue vqs[VDMABUF_VQ_MAX];
+	struct vhost_work send_work;
+	struct virtio_vdmabuf_event_queue *evq;
+	u64 vmid;
+
+	struct list_head msg_list;
+	struct list_head list;
+	struct kvm *kvm;
+};
+
+static inline void vhost_vdmabuf_add(struct vhost_vdmabuf *new)
+{
+	list_add_tail(&new->list, &drv_info->head_vdmabuf_list);
+}
+
+static inline struct vhost_vdmabuf *vhost_vdmabuf_find(u64 vmid)
+{
+	struct vhost_vdmabuf *found;
+
+	list_for_each_entry(found, &drv_info->head_vdmabuf_list, list)
+		if (found->vmid == vmid)
+			return found;
+
+	return NULL;
+}
+
+static inline bool vhost_vdmabuf_del(struct vhost_vdmabuf *vdmabuf)
+{
+	struct vhost_vdmabuf *iter, *temp;
+
+	list_for_each_entry_safe(iter, temp,
+				 &drv_info->head_vdmabuf_list,
+				 list)
+		if (iter == vdmabuf) {
+			list_del(&iter->list);
+			return true;
+		}
+
+	return false;
+}
+
+static inline void vhost_vdmabuf_del_all(void)
+{
+	struct vhost_vdmabuf *iter, *temp;
+
+	list_for_each_entry_safe(iter, temp,
+				 &drv_info->head_vdmabuf_list,
+				 list) {
+		list_del(&iter->list);
+		kfree(iter);
+	}
+}
+
+static void *map_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
+{
+	struct kvm_host_map map;
+	int ret;
+
+	ret = kvm_vcpu_map(vcpu, gpa_to_gfn(gpa), &map);
+	if (ret < 0)
+		return ERR_PTR(ret);
+	else
+		return map.hva;
+}
+
+static void unmap_hva(struct kvm_vcpu *vcpu, gpa_t hva)
+{
+	struct page *page = virt_to_page(hva);
+	struct kvm_host_map map;
+
+	map.hva = (void *)hva;
+	map.page = page;
+
+	kvm_vcpu_unmap(vcpu, &map, true);
+}
+
+/* map the guest's pages backing the vdmabuf
+ *
+ * pages_info->ref is the guest physical address of a top-level page of
+ * references; each entry in that page is the gpa of a level-2 ref page,
+ * and every level-2 page holds up to REFS_PER_PAGE gpas of the actual
+ * data pages.
+ */
+static int
+vhost_vdmabuf_map_pages(u64 vmid,
+		        struct virtio_vdmabuf_shared_pages *pages_info)
+{
+	struct vhost_vdmabuf *vdmabuf = vhost_vdmabuf_find(vmid);
+	struct kvm_vcpu *vcpu;
+	void *paddr;
+	int npgs = REFS_PER_PAGE;
+	int last_nents, n_l2refs;
+	int i, j = 0, k = 0;
+
+	if (!vdmabuf || !vdmabuf->kvm || !pages_info || pages_info->pages)
+		return -EINVAL;
+
+	vcpu = kvm_get_vcpu_by_id(vdmabuf->kvm, 0);
+	if (!vcpu)
+		return -EINVAL;
+
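+	/* last_nents: data pages referenced by the final level-2 page;
+	 * n_l2refs:   number of level-2 ref pages, i.e. ceil(nents / REFS_PER_PAGE)
+	 */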
+	last_nents = (pages_info->nents - 1) % npgs + 1;
+	n_l2refs = (pages_info->nents / npgs) + ((last_nents > 0) ? 1 : 0) -
+		   (last_nents == npgs);
+
+	pages_info->pages = kcalloc(pages_info->nents, sizeof(struct page *),
+				    GFP_KERNEL);
+	if (!pages_info->pages)
+		goto fail_page_alloc;
+
+	pages_info->l2refs = kcalloc(n_l2refs, sizeof(gpa_t *), GFP_KERNEL);
+	if (!pages_info->l2refs)
+		goto fail_l2refs;
+
+	pages_info->l3refs = (gpa_t *)map_gpa(vcpu, pages_info->ref);
+	if (IS_ERR(pages_info->l3refs))
+		goto fail_l3refs;
+
+	for (i = 0; i < n_l2refs; i++) {
+		pages_info->l2refs[i] = (gpa_t *)map_gpa(vcpu,
+							 pages_info->l3refs[i]);
+
+		if (IS_ERR(pages_info->l2refs[i]))
+			goto fail_mapping_l2;
+
+		/* last level-2 ref */
+		if (i == n_l2refs - 1)
+			npgs = last_nents;
+
+		for (j = 0; j < npgs; j++) {
+			paddr = map_gpa(vcpu, pages_info->l2refs[i][j]);
+			if (IS_ERR(paddr))
+				goto fail_mapping_l1;
+
+			pages_info->pages[k] = virt_to_page(paddr);
+			k++;
+		}
+		unmap_hva(vcpu, pages_info->l3refs[i]);
+	}
+
+	unmap_hva(vcpu, pages_info->ref);
+
+	return 0;
+
+fail_mapping_l1:
+	for (k = 0; k < j; k++)
+		unmap_hva(vcpu, pages_info->l2refs[i][k]);
+
+fail_mapping_l2:
+	for (j = 0; j < i; j++) {
+		for (k = 0; k < REFS_PER_PAGE; k++)
+			unmap_hva(vcpu, pages_info->l2refs[i][k]);
+	}
+
+	unmap_hva(vcpu, pages_info->l3refs[i]);
+	unmap_hva(vcpu, pages_info->ref);
+
+fail_l3refs:
+	kfree(pages_info->l2refs);
+
+fail_l2refs:
+	kfree(pages_info->pages);
+
+fail_page_alloc:
+	return -ENOMEM;
+}
+
+/* unmapping mapped pages */
+static int
+vhost_vdmabuf_unmap_pages(u64 vmid,
+			  struct virtio_vdmabuf_shared_pages *pages_info)
+{
+	struct vhost_vdmabuf *vdmabuf = vhost_vdmabuf_find(vmid);
+	struct kvm_vcpu *vcpu;
+	int last_nents = (pages_info->nents - 1) % REFS_PER_PAGE + 1;
+	int n_l2refs = (pages_info->nents / REFS_PER_PAGE) +
+		       ((last_nents > 0) ? 1 : 0) -
+		       (last_nents == REFS_PER_PAGE);
+	int i, j;
+
+	if (!vdmabuf || !vdmabuf->kvm || !pages_info || !pages_info->pages)
+		return -EINVAL;
+
+	vcpu = kvm_get_vcpu_by_id(vdmabuf->kvm, 0);
+	if (!vcpu)
+		return -EINVAL;
+
+	for (i = 0; i < n_l2refs - 1; i++) {
+		for (j = 0; j < REFS_PER_PAGE; j++)
+			unmap_hva(vcpu, pages_info->l2refs[i][j]);
+	}
+
+	for (j = 0; j < last_nents; j++)
+		unmap_hva(vcpu, pages_info->l2refs[i][j]);
+
+	kfree(pages_info->l2refs);
+	kfree(pages_info->pages);
+	pages_info->pages = NULL;
+
+	return 0;
+}
+
+/* create sg_table with given pages and other parameters */
+static struct sg_table *new_sgt(struct page **pgs,
+				int first_ofst, int last_len,
+				int nents)
+{
+	struct sg_table *sgt;
+	struct scatterlist *sgl;
+	int i, ret;
+
+	sgt = kmalloc(sizeof(struct sg_table), GFP_KERNEL);
+	if (!sgt)
+		return NULL;
+
+	ret = sg_alloc_table(sgt, nents, GFP_KERNEL);
+	if (ret) {
+		kfree(sgt);
+		return NULL;
+	}
+
+	sgl = sgt->sgl;
+	sg_set_page(sgl, pgs[0], PAGE_SIZE-first_ofst, first_ofst);
+
+	for (i = 1; i < nents-1; i++) {
+		sgl = sg_next(sgl);
+		sg_set_page(sgl, pgs[i], PAGE_SIZE, 0);
+	}
+
+	/* more than 1 page */
+	if (nents > 1) {
+		sgl = sg_next(sgl);
+		sg_set_page(sgl, pgs[i], last_len, 0);
+	}
+
+	return sgt;
+}
+
+static struct sg_table
+*vhost_vdmabuf_dmabuf_map(struct dma_buf_attachment *attachment,
+			  enum dma_data_direction dir)
+{
+	struct virtio_vdmabuf_buf *imp;
+
+	if (!attachment->dmabuf || !attachment->dmabuf->priv)
+		return NULL;
+
+	imp = (struct virtio_vdmabuf_buf *)attachment->dmabuf->priv;
+
+	/* if buffer has never been mapped */
+	if (!imp->sgt) {
+		imp->sgt = new_sgt(imp->pages_info->pages,
+				   imp->pages_info->first_ofst,
+				   imp->pages_info->last_len,
+				   imp->pages_info->nents);
+
+		if (!imp->sgt)
+			return NULL;
+	}
+
+	if (!dma_map_sg(attachment->dev, imp->sgt->sgl,
+			imp->sgt->nents, dir)) {
+		sg_free_table(imp->sgt);
+		kfree(imp->sgt);
+		return NULL;
+	}
+
+	return imp->sgt;
+}
+
+static void
+vhost_vdmabuf_dmabuf_unmap(struct dma_buf_attachment *attachment,
+	   	           struct sg_table *sg,
+			   enum dma_data_direction dir)
+{
+	dma_unmap_sg(attachment->dev, sg->sgl, sg->nents, dir);
+}
+
+static int vhost_vdmabuf_dmabuf_mmap(struct dma_buf *dmabuf,
+				     struct vm_area_struct *vma)
+{
+	struct virtio_vdmabuf_buf *imp;
+	u64 uaddr;
+	int i, err;
+
+	if (!dmabuf->priv)
+		return -EINVAL;
+
+	imp = (struct virtio_vdmabuf_buf *)dmabuf->priv;
+
+	if (!imp->pages_info)
+		return -EINVAL;
+
+	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+
+	uaddr = vma->vm_start;
+	for (i = 0; i < imp->pages_info->nents; i++) {
+		err = vm_insert_page(vma, uaddr,
+				     imp->pages_info->pages[i]);
+		if (err)
+			return err;
+
+		uaddr += PAGE_SIZE;
+	}
+
+	return 0;
+}
+
+static int vhost_vdmabuf_dmabuf_vmap(struct dma_buf *dmabuf,
+				     struct dma_buf_map *map)
+{
+	struct virtio_vdmabuf_buf *imp;
+	void *addr;
+
+	if (!dmabuf->priv)
+		return -EINVAL;
+
+	imp = (struct virtio_vdmabuf_buf *)dmabuf->priv;
+
+	if (!imp->pages_info)
+		return -EINVAL;
+
+	addr = vmap(imp->pages_info->pages, imp->pages_info->nents,
+		    0, PAGE_KERNEL);
+	if (!addr)
+		return -ENOMEM;
+
+	dma_buf_map_set_vaddr(map, addr);
+
+	return 0;
+}
+
+static void vhost_vdmabuf_dmabuf_release(struct dma_buf *dma_buf)
+{
+	struct virtio_vdmabuf_buf *imp;
+
+	if (!dma_buf->priv)
+		return;
+
+	imp = (struct virtio_vdmabuf_buf *)dma_buf->priv;
+	imp->dma_buf = NULL;
+	imp->valid = false;
+
+	vhost_vdmabuf_unmap_pages(imp->vmid, imp->pages_info);
+	virtio_vdmabuf_del_buf(drv_info, &imp->buf_id);
+
+	if (imp->sgt) {
+		sg_free_table(imp->sgt);
+		kfree(imp->sgt);
+		imp->sgt = NULL;
+	}
+
+	kfree(imp->priv);
+	kfree(imp->pages_info);
+	kfree(imp);
+}
+
+static const struct dma_buf_ops vhost_vdmabuf_dmabuf_ops = {
+	.map_dma_buf = vhost_vdmabuf_dmabuf_map,
+	.unmap_dma_buf = vhost_vdmabuf_dmabuf_unmap,
+	.release = vhost_vdmabuf_dmabuf_release,
+	.mmap = vhost_vdmabuf_dmabuf_mmap,
+	.vmap = vhost_vdmabuf_dmabuf_vmap,
+};
+
+/* exporting dmabuf as fd */
+static int vhost_vdmabuf_exp_fd(struct virtio_vdmabuf_buf *imp, int flags)
+{
+	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+
+	exp_info.ops = &vhost_vdmabuf_dmabuf_ops;
+
+	/* multiple of PAGE_SIZE, not considering offset */
+	exp_info.size = imp->pages_info->nents * PAGE_SIZE;
+	exp_info.flags = 0;
+	exp_info.priv = imp;
+
+	if (!imp->dma_buf) {
+		imp->dma_buf = dma_buf_export(&exp_info);
+		if (IS_ERR_OR_NULL(imp->dma_buf)) {
+			imp->dma_buf = NULL;
+			return -EINVAL;
+		}
+	}
+
+	return dma_buf_fd(imp->dma_buf, flags);
+}
+
+static int vhost_vdmabuf_add_event(struct vhost_vdmabuf *vdmabuf,
+				   struct virtio_vdmabuf_buf *buf_info)
+{
+	struct virtio_vdmabuf_event *e_oldest, *e_new;
+	struct virtio_vdmabuf_event_queue *evq = vdmabuf->evq;
+	unsigned long irqflags;
+
+	e_new = kzalloc(sizeof(*e_new), GFP_KERNEL);
+	if (!e_new)
+		return -ENOMEM;
+
+	e_new->e_data.hdr.buf_id = buf_info->buf_id;
+	e_new->e_data.data = (void *)buf_info->priv;
+	e_new->e_data.hdr.size = buf_info->sz_priv;
+
+	spin_lock_irqsave(&evq->e_lock, irqflags);
+
+	/* if the number of pending events hits the max (32),
+	 * remove the oldest event in the list
+	 */
+	if (evq->pending > 31) {
+		e_oldest = list_first_entry(&evq->e_list,
+					    struct virtio_vdmabuf_event, link);
+		list_del(&e_oldest->link);
+		evq->pending--;
+		kfree(e_oldest);
+	}
+
+	list_add_tail(&e_new->link, &evq->e_list);
+
+	evq->pending++;
+
+	wake_up_interruptible(&evq->e_wait);
+	spin_unlock_irqrestore(&evq->e_lock, irqflags);
+
+	return 0;
+}
+
+static int send_msg_to_guest(u64 vmid, enum virtio_vdmabuf_cmd cmd, int *op)
+{
+	struct virtio_vdmabuf_msg *msg;
+	struct vhost_vdmabuf *vdmabuf;
+
+	vdmabuf = vhost_vdmabuf_find(vmid);
+	if (!vdmabuf) {
+		dev_err(drv_info->dev,
+			"can't find vdmabuf for : vmid = %llu\n", vmid);
+		return -EINVAL;
+	}
+
+	if (cmd != VIRTIO_VDMABUF_CMD_DMABUF_REL)
+		return -EINVAL;
+
+	msg = kvcalloc(1, sizeof(struct virtio_vdmabuf_msg),
+		       GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	memcpy(&msg->op[0], &op[0], 8 * sizeof(int));
+	msg->cmd = cmd;
+
+	list_add_tail(&msg->list, &vdmabuf->msg_list);
+	vhost_work_queue(&vdmabuf->dev, &vdmabuf->send_work);
+
+	return 0;
+}
+
+static int register_exported(struct vhost_vdmabuf *vdmabuf,
+			     virtio_vdmabuf_buf_id_t *buf_id, int *ops)
+{
+	struct virtio_vdmabuf_buf *imp;
+	int ret;
+
+	imp = kcalloc(1, sizeof(*imp), GFP_KERNEL);
+	if (!imp)
+		return -ENOMEM;
+
+	imp->pages_info = kcalloc(1, sizeof(struct virtio_vdmabuf_shared_pages),
+				  GFP_KERNEL);
+	if (!imp->pages_info) {
+		kfree(imp);
+		return -ENOMEM;
+	}
+
+	imp->sz_priv = ops[VIRTIO_VDMABUF_PRIVATE_DATA_SIZE];
+	if (imp->sz_priv) {
+		imp->priv = kcalloc(1, ops[VIRTIO_VDMABUF_PRIVATE_DATA_SIZE],
+				    GFP_KERNEL);
+		if (!imp->priv) {
+			kfree(imp->pages_info);
+			kfree(imp);
+			return -ENOMEM;
+		}
+	}
+
+	memcpy(&imp->buf_id, buf_id, sizeof(*buf_id));
+
+	imp->pages_info->nents = ops[VIRTIO_VDMABUF_NUM_PAGES_SHARED];
+	imp->pages_info->first_ofst = ops[VIRTIO_VDMABUF_FIRST_PAGE_DATA_OFFSET];
+	imp->pages_info->last_len = ops[VIRTIO_VDMABUF_LAST_PAGE_DATA_LENGTH];
+	imp->pages_info->ref = *(gpa_t *)&ops[VIRTIO_VDMABUF_REF_ADDR_UPPER_32BIT];
+	imp->vmid = vdmabuf->vmid;
+	imp->valid = true;
+
+	virtio_vdmabuf_add_buf(drv_info, imp);
+
+	/* transferring private data */
+	memcpy(imp->priv, &ops[VIRTIO_VDMABUF_PRIVATE_DATA_START],
+	       ops[VIRTIO_VDMABUF_PRIVATE_DATA_SIZE]);
+
+	/* generate import event */
+	ret = vhost_vdmabuf_add_event(vdmabuf, imp);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static void send_to_recvq(struct vhost_vdmabuf *vdmabuf,
+			  struct vhost_virtqueue *vq)
+{
+	struct virtio_vdmabuf_msg *msg;
+	int head, in, out, in_size;
+	bool added = false;
+	int ret;
+
+	mutex_lock(&vq->mutex);
+
+	if (!vhost_vq_get_backend(vq))
+		goto out;
+
+	vhost_disable_notify(&vdmabuf->dev, vq);
+
+	for (;;) {
+		if (list_empty(&vdmabuf->msg_list))
+			break;
+
+		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+					 &out, &in, NULL, NULL);
+
+		if (head < 0 || head == vq->num)
+			break;
+
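+		/* writable (host-to-guest) iovecs follow the 'out' readable ones */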
+		in_size = iov_length(&vq->iov[out], in);
+		if (in_size != sizeof(struct virtio_vdmabuf_msg)) {
+			dev_err(drv_info->dev, "rx msg with wrong size\n");
+			break;
+		}
+
+		msg = list_first_entry(&vdmabuf->msg_list,
+				       struct virtio_vdmabuf_msg, list);
+		list_del_init(&msg->list);
+
+		ret = __copy_to_user(vq->iov[out].iov_base, msg,
+				     sizeof(struct virtio_vdmabuf_msg));
+		if (ret) {
+			dev_err(drv_info->dev,
+				"fail to copy tx msg\n");
+			break;
+		}
+
+		vhost_add_used(vq, head, in_size);
+		added = true;
+
+		kvfree(msg);
+	}
+
+	vhost_enable_notify(&vdmabuf->dev, vq);
+	if (added)
+		vhost_signal(&vdmabuf->dev, vq);
+out:
+	mutex_unlock(&vq->mutex);
+}
+
+static void vhost_send_msg_work(struct vhost_work *work)
+{
+	struct vhost_vdmabuf *vdmabuf = container_of(work,
+					             struct vhost_vdmabuf,
+					             send_work);
+	struct vhost_virtqueue *vq = &vdmabuf->vqs[VDMABUF_VQ_RECV];
+
+	send_to_recvq(vdmabuf, vq);
+}
+
+/* parse incoming message from a guest */
+static int parse_msg(struct vhost_vdmabuf *vdmabuf,
+		     struct virtio_vdmabuf_msg *msg)
+{
+	virtio_vdmabuf_buf_id_t *buf_id;
+	struct virtio_vdmabuf_msg *vmid_msg;
+	int ret = 0;
+
+	switch (msg->cmd) {
+	case VIRTIO_VDMABUF_CMD_EXPORT:
+		buf_id = (virtio_vdmabuf_buf_id_t *)msg->op;
+		ret = register_exported(vdmabuf, buf_id, msg->op);
+
+		break;
+	case VIRTIO_VDMABUF_CMD_NEED_VMID:
+		vmid_msg = kvcalloc(1, sizeof(struct virtio_vdmabuf_msg),
+				    GFP_KERNEL);
+		if (!vmid_msg) {
+			ret = -ENOMEM;
+			break;
+		}
+
+		vmid_msg->cmd = msg->cmd;
+		vmid_msg->op[0] = vdmabuf->vmid;
+		list_add_tail(&vmid_msg->list, &vdmabuf->msg_list);
+		vhost_work_queue(&vdmabuf->dev, &vdmabuf->send_work);
+
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	return ret;
+}
+
+static void vhost_vdmabuf_handle_send_kick(struct vhost_work *work)
+{
+	struct vhost_virtqueue *vq = container_of(work,
+						  struct vhost_virtqueue,
+						  poll.work);
+	struct vhost_vdmabuf *vdmabuf = container_of(vq->dev,
+					      	     struct vhost_vdmabuf,
+					      	     dev);
+	struct virtio_vdmabuf_msg msg;
+	int head, in, out, in_size;
+	bool added = false;
+	int ret;
+
+	mutex_lock(&vq->mutex);
+
+	if (!vhost_vq_get_backend(vq))
+		goto out;
+
+	vhost_disable_notify(&vdmabuf->dev, vq);
+
+	/* Make sure we will process all pending requests */
+	for (;;) {
+		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+					 &out, &in, NULL, NULL);
+
+		if (head < 0 || head == vq->num)
+			break;
+
+		in_size = iov_length(&vq->iov[in], out);
+		if (in_size != sizeof(struct virtio_vdmabuf_msg)) {
+			dev_err(drv_info->dev, "rx msg with wrong size\n");
+			break;
+		}
+
+		if (__copy_from_user(&msg, vq->iov[in].iov_base, in_size)) {
+			dev_err(drv_info->dev,
+				"err: can't get the msg from vq\n");
+			break;
+		}
+
+		ret = parse_msg(vdmabuf, &msg);
+		if (ret) {
+			dev_err(drv_info->dev,
+				"msg parse error: %d",
+				ret);
+			dev_err(drv_info->dev,
+				" cmd: %d\n", msg.cmd);
+
+			break;
+		}
+
+		vhost_add_used(vq, head, in_size);
+		added = true;
+	}
+
+	vhost_enable_notify(&vdmabuf->dev, vq);
+	if (added)
+		vhost_signal(&vdmabuf->dev, vq);
+out:
+	mutex_unlock(&vq->mutex);
+}
+
+static void vhost_vdmabuf_handle_recv_kick(struct vhost_work *work)
+{
+	struct vhost_virtqueue *vq = container_of(work,
+						  struct vhost_virtqueue,
+						  poll.work);
+	struct vhost_vdmabuf *vdmabuf = container_of(vq->dev,
+					      	     struct vhost_vdmabuf,
+					      	     dev);
+
+	send_to_recvq(vdmabuf, vq);
+}
+
+static int vhost_vdmabuf_get_kvm(struct notifier_block *nb,
+				 unsigned long event, void *data)
+{
+	struct kvm_instance *instance;
+	struct virtio_vdmabuf_info *drv = container_of(nb,
+						struct virtio_vdmabuf_info,
+						kvm_notifier);
+
+	if (event != KVM_EVENT_CREATE_VM || !data)
+		return NOTIFY_OK;
+
+	instance = kzalloc(sizeof(*instance), GFP_KERNEL);
+	if (!instance)
+		return NOTIFY_OK;
+
+	instance->kvm = data;
+	list_add_tail(&instance->link, &drv->kvm_instances);
+
+	return NOTIFY_OK;
+}
+
+static struct kvm *find_kvm_instance(u64 vmid)
+{
+	struct kvm_instance *instance, *tmp;
+	struct kvm *kvm = NULL;
+
+	list_for_each_entry_safe(instance, tmp, &drv_info->kvm_instances,
+                                 link) {
+		if (instance->kvm->userspace_pid == vmid) {
+			kvm = instance->kvm;
+
+			list_del(&instance->link);
+			kfree(instance);
+			break;
+		}
+	}
+
+	return kvm;
+}
+
+static int vhost_vdmabuf_open(struct inode *inode, struct file *filp)
+{
+	struct vhost_vdmabuf *vdmabuf;
+	struct vhost_virtqueue **vqs;
+	int ret = 0;
+
+	if (!drv_info) {
+		pr_err("vhost-vdmabuf: can't open misc device\n");
+		return -EINVAL;
+	}
+
+	vdmabuf = kzalloc(sizeof(*vdmabuf), GFP_KERNEL |
+			   __GFP_RETRY_MAYFAIL);
+	if (!vdmabuf)
+		return -ENOMEM;
+
+	vqs = kmalloc_array(ARRAY_SIZE(vdmabuf->vqs), sizeof(*vqs),
+			    GFP_KERNEL);
+	if (!vqs) {
+		kfree(vdmabuf);
+		return -ENOMEM;
+	}
+
+	vdmabuf->evq = kcalloc(1, sizeof(*(vdmabuf->evq)), GFP_KERNEL);
+	if (!vdmabuf->evq) {
+		kfree(vdmabuf);
+		kfree(vqs);
+		return -ENOMEM;
+	}
+
+	vqs[VDMABUF_VQ_SEND] = &vdmabuf->vqs[VDMABUF_VQ_SEND];
+	vqs[VDMABUF_VQ_RECV] = &vdmabuf->vqs[VDMABUF_VQ_RECV];
+	vdmabuf->vqs[VDMABUF_VQ_SEND].handle_kick = vhost_vdmabuf_handle_send_kick;
+	vdmabuf->vqs[VDMABUF_VQ_RECV].handle_kick = vhost_vdmabuf_handle_recv_kick;
+
+	vhost_dev_init(&vdmabuf->dev, vqs, ARRAY_SIZE(vdmabuf->vqs),
+		       UIO_MAXIOV, 0, 0, true, NULL);
+
+	INIT_LIST_HEAD(&vdmabuf->msg_list);
+	vhost_work_init(&vdmabuf->send_work, vhost_send_msg_work);
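+	/* the device is opened by the VMM (e.g. Qemu), so its pid doubles as
+	 * the vmid and is matched against kvm->userspace_pid below
+	 */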
+	vdmabuf->vmid = task_pid_nr(current);
+	vdmabuf->kvm = find_kvm_instance(vdmabuf->vmid);
+	vhost_vdmabuf_add(vdmabuf);
+
+	mutex_init(&vdmabuf->evq->e_readlock);
+	spin_lock_init(&vdmabuf->evq->e_lock);
+
+	/* Initialize event queue */
+	INIT_LIST_HEAD(&vdmabuf->evq->e_list);
+	init_waitqueue_head(&vdmabuf->evq->e_wait);
+
+	/* resetting number of pending events */
+	vdmabuf->evq->pending = 0;
+	filp->private_data = vdmabuf;
+
+	return ret;
+}
+
+static void vhost_vdmabuf_flush(struct vhost_vdmabuf *vdmabuf)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(vdmabuf->vqs); i++)
+		if (vdmabuf->vqs[i].handle_kick)
+			vhost_poll_flush(&vdmabuf->vqs[i].poll);
+
+	vhost_work_flush(&vdmabuf->dev, &vdmabuf->send_work);
+}
+
+static int vhost_vdmabuf_release(struct inode *inode, struct file *filp)
+{
+	struct vhost_vdmabuf *vdmabuf = filp->private_data;
+	struct virtio_vdmabuf_event *e, *et;
+
+	if (!vhost_vdmabuf_del(vdmabuf))
+		return -EINVAL;
+
+	mutex_lock(&drv_info->g_mutex);
+
+	list_for_each_entry_safe(e, et, &vdmabuf->evq->e_list,
+				 link) {
+		list_del(&e->link);
+		kfree(e);
+		vdmabuf->evq->pending--;
+	}
+
+	vhost_vdmabuf_flush(vdmabuf);
+	vhost_dev_cleanup(&vdmabuf->dev);
+
+	kfree(vdmabuf->dev.vqs);
+	kvfree(vdmabuf);
+
+	filp->private_data = NULL;
+	mutex_unlock(&drv_info->g_mutex);
+
+	return 0;
+}
+
+static unsigned int vhost_vdmabuf_event_poll(struct file *filp,
+				    	     struct poll_table_struct *wait)
+{
+	struct vhost_vdmabuf *vdmabuf = filp->private_data;
+
+	poll_wait(filp, &vdmabuf->evq->e_wait, wait);
+
+	if (!list_empty(&vdmabuf->evq->e_list))
+		return POLLIN | POLLRDNORM;
+
+	return 0;
+}
+
+static ssize_t vhost_vdmabuf_event_read(struct file *filp, char __user *buf,
+			       		size_t cnt, loff_t *ofst)
+{
+	struct vhost_vdmabuf *vdmabuf = filp->private_data;
+	int ret;
+
+	if (task_pid_nr(current) != vdmabuf->vmid) {
+		dev_err(drv_info->dev, "current process cannot read events\n");
+		return -EPERM;
+	}
+
+	/* make sure user buffer can be written */
+	if (!access_ok(buf, sizeof(*buf))) {
+		dev_err(drv_info->dev, "user buffer can't be written.\n");
+		return -EINVAL;
+	}
+
+	ret = mutex_lock_interruptible(&vdmabuf->evq->e_readlock);
+	if (ret)
+		return ret;
+
+	for (;;) {
+		struct virtio_vdmabuf_event *e = NULL;
+
+		spin_lock_irq(&vdmabuf->evq->e_lock);
+		if (!list_empty(&vdmabuf->evq->e_list)) {
+			e = list_first_entry(&vdmabuf->evq->e_list,
+					     struct virtio_vdmabuf_event, link);
+			list_del(&e->link);
+		}
+		spin_unlock_irq(&vdmabuf->evq->e_lock);
+
+		if (!e) {
+			if (ret)
+				break;
+
+			if (filp->f_flags & O_NONBLOCK) {
+				ret = -EAGAIN;
+				break;
+			}
+
+			mutex_unlock(&vdmabuf->evq->e_readlock);
+			ret = wait_event_interruptible(vdmabuf->evq->e_wait,
+					!list_empty(&vdmabuf->evq->e_list));
+
+			if (ret == 0)
+				ret = mutex_lock_interruptible(
+						&vdmabuf->evq->e_readlock);
+
+			if (ret)
+				return ret;
+		} else {
+			unsigned int len = (sizeof(e->e_data.hdr) +
+					    e->e_data.hdr.size);
+
+			if (len > cnt - ret) {
+put_back_event:
+				spin_lock_irq(&vdmabuf->evq->e_lock);
+				list_add(&e->link, &vdmabuf->evq->e_list);
+				spin_unlock_irq(&vdmabuf->evq->e_lock);
+				break;
+			}
+
+			if (copy_to_user(buf + ret, &e->e_data.hdr,
+					 sizeof(e->e_data.hdr))) {
+				if (ret == 0)
+					ret = -EFAULT;
+
+				goto put_back_event;
+			}
+
+			ret += sizeof(e->e_data.hdr);
+
+			if (copy_to_user(buf + ret, e->e_data.data,
+					 e->e_data.hdr.size)) {
+
+				struct virtio_vdmabuf_e_hdr dummy_hdr = {0};
+
+				ret -= sizeof(e->e_data.hdr);
+
+				/* nullifying hdr of the event in user buffer */
+				if (copy_to_user(buf + ret, &dummy_hdr,
+						 sizeof(dummy_hdr)))
+					dev_err(drv_info->dev,
+					   "fail to nullify invalid hdr\n");
+
+				ret = -EFAULT;
+
+				goto put_back_event;
+			}
+
+			ret += e->e_data.hdr.size;
+
+			spin_lock_irq(&vdmabuf->evq->e_lock);
+			vdmabuf->evq->pending--;
+			spin_unlock_irq(&vdmabuf->evq->e_lock);
+			kfree(e);
+		}
+	}
+
+	mutex_unlock(&vdmabuf->evq->e_readlock);
+
+	return ret;
+}
+
+static int vhost_vdmabuf_start(struct vhost_vdmabuf *vdmabuf)
+{
+	struct vhost_virtqueue *vq;
+	int i, ret;
+
+	mutex_lock(&vdmabuf->dev.mutex);
+
+	ret = vhost_dev_check_owner(&vdmabuf->dev);
+	if (ret)
+		goto err;
+
+	for (i = 0; i < ARRAY_SIZE(vdmabuf->vqs); i++) {
+		vq = &vdmabuf->vqs[i];
+
+		mutex_lock(&vq->mutex);
+
+		if (!vhost_vq_access_ok(vq)) {
+			ret = -EFAULT;
+			goto err_vq;
+		}
+
+		if (!vhost_vq_get_backend(vq)) {
+			vhost_vq_set_backend(vq, vdmabuf);
+			ret = vhost_vq_init_access(vq);
+			if (ret)
+				goto err_vq;
+		}
+
+		mutex_unlock(&vq->mutex);
+	}
+
+	mutex_unlock(&vdmabuf->dev.mutex);
+	return 0;
+
+err_vq:
+	vhost_vq_set_backend(vq, NULL);
+	mutex_unlock(&vq->mutex);
+
+	for (i = 0; i < ARRAY_SIZE(vdmabuf->vqs); i++) {
+		vq = &vdmabuf->vqs[i];
+
+		mutex_lock(&vq->mutex);
+		vhost_vq_set_backend(vq, NULL);
+		mutex_unlock(&vq->mutex);
+	}
+
+err:
+	mutex_unlock(&vdmabuf->dev.mutex);
+	return ret;
+}
+
+static int vhost_vdmabuf_stop(struct vhost_vdmabuf *vdmabuf)
+{
+	struct vhost_virtqueue *vq;
+	int i, ret;
+
+	mutex_lock(&vdmabuf->dev.mutex);
+
+	ret = vhost_dev_check_owner(&vdmabuf->dev);
+	if (ret)
+		goto err;
+
+	for (i = 0; i < ARRAY_SIZE(vdmabuf->vqs); i++) {
+		vq = &vdmabuf->vqs[i];
+
+		mutex_lock(&vq->mutex);
+		vhost_vq_set_backend(vq, NULL);
+		mutex_unlock(&vq->mutex);
+	}
+
+err:
+	mutex_unlock(&vdmabuf->dev.mutex);
+	return ret;
+}
+
+static int vhost_vdmabuf_set_features(struct vhost_vdmabuf *vdmabuf,
+				      u64 features)
+{
+	struct vhost_virtqueue *vq;
+	int i;
+
+	if (features & ~VHOST_VDMABUF_FEATURES)
+		return -EOPNOTSUPP;
+
+	mutex_lock(&vdmabuf->dev.mutex);
+	if ((features & (1 << VHOST_F_LOG_ALL)) &&
+	    !vhost_log_access_ok(&vdmabuf->dev)) {
+		mutex_unlock(&vdmabuf->dev.mutex);
+		return -EFAULT;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(vdmabuf->vqs); i++) {
+		vq = &vdmabuf->vqs[i];
+		mutex_lock(&vq->mutex);
+		vq->acked_features = features;
+		mutex_unlock(&vq->mutex);
+	}
+
+	mutex_unlock(&vdmabuf->dev.mutex);
+	return 0;
+}
+
+/* wrapper ioctl for vhost interface control */
+static int vhost_core_ioctl(struct file *filp, unsigned int cmd,
+			    unsigned long param)
+{
+	struct vhost_vdmabuf *vdmabuf = filp->private_data;
+	void __user *argp = (void __user *)param;
+	u64 features;
+	int ret, start;
+
+	switch (cmd) {
+	case VHOST_GET_FEATURES:
+		features = VHOST_VDMABUF_FEATURES;
+		if (copy_to_user(argp, &features, sizeof(features)))
+			return -EFAULT;
+		return 0;
+	case VHOST_SET_FEATURES:
+		if (copy_from_user(&features, argp, sizeof(features)))
+			return -EFAULT;
+		return vhost_vdmabuf_set_features(vdmabuf, features);
+	case VHOST_VDMABUF_SET_RUNNING:
+		if (copy_from_user(&start, argp, sizeof(start)))
+			return -EFAULT;
+
+		if (start)
+			return vhost_vdmabuf_start(vdmabuf);
+		else
+			return vhost_vdmabuf_stop(vdmabuf);
+	default:
+		mutex_lock(&vdmabuf->dev.mutex);
+		ret = vhost_dev_ioctl(&vdmabuf->dev, cmd, argp);
+		if (ret == -ENOIOCTLCMD) {
+			ret = vhost_vring_ioctl(&vdmabuf->dev, cmd, argp);
+		} else {
+			vhost_vdmabuf_flush(vdmabuf);
+		}
+		mutex_unlock(&vdmabuf->dev.mutex);
+	}
+
+	return ret;
+}
+
+/*
+ * ioctl - importing a vdmabuf from the guest OS
+ *
+ * user parameters:
+ *
+ *	virtio_vdmabuf_buf_id_t buf_id - vdmabuf ID of the imported buffer
+ *	int flags - flags
+ *	int fd - file handle of the imported buffer
+ *
+ */
+static int import_ioctl(struct file *filp, void *data)
+{
+	struct vhost_vdmabuf *vdmabuf = filp->private_data;
+	struct virtio_vdmabuf_import *attr = data;
+	struct virtio_vdmabuf_buf *imp;
+	int ret = 0;
+
+	mutex_lock(&vdmabuf->dev.mutex);
+
+	/* look for dmabuf for the id */
+	imp = virtio_vdmabuf_find_buf(drv_info, &attr->buf_id);
+	if (!imp || !imp->valid) {
+		mutex_unlock(&vdmabuf->dev.mutex);
+		dev_err(drv_info->dev,
+			"no valid buf found with id = %llu\n", attr->buf_id.id);
+		return -ENOENT;
+	}
+
+	/* only if mapped pages are not present */
+	if (!imp->pages_info->pages) {
+		ret = vhost_vdmabuf_map_pages(vdmabuf->vmid, imp->pages_info);
+		if (ret < 0) {
+			dev_err(drv_info->dev,
+				"failed to map guest pages\n");
+			goto fail_map;
+		}
+	}
+
+	attr->fd = vhost_vdmabuf_exp_fd(imp, attr->flags);
+	if (attr->fd < 0) {
+		dev_err(drv_info->dev, "failed to get file descriptor\n");
+		goto fail_import;
+	}
+
+	imp->imported = true;
+
+	mutex_unlock(&vdmabuf->dev.mutex);
+	goto success;
+
+fail_import:
+	ret = attr->fd;
+
+	/* not imported yet? */
+	if (!imp->imported) {
+		vhost_vdmabuf_unmap_pages(vdmabuf->vmid, imp->pages_info);
+		if (imp->dma_buf)
+			kfree(imp->dma_buf);
+
+		if (imp->sgt) {
+			sg_free_table(imp->sgt);
+			kfree(imp->sgt);
+			imp->sgt = NULL;
+		}
+	}
+
+fail_map:
+	/* Check if the buffer is still valid and if not, remove it
+	 * from the imported list.
+	 */
+	if (!imp->valid && !imp->imported) {
+		virtio_vdmabuf_del_buf(drv_info, &imp->buf_id);
+		kfree(imp->priv);
+		kfree(imp->pages_info);
+		kfree(imp);
+	}
+
+	mutex_unlock(&vdmabuf->dev.mutex);
+
+success:
+	return ret;
+}
+
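+/*
+ * ioctl - releasing an imported vdmabuf
+ *
+ * user parameters:
+ *
+ *	virtio_vdmabuf_buf_id_t buf_id - vdmabuf ID of the buffer to release
+ *
+ * Marks the buffer as no longer imported and sends
+ * VIRTIO_VDMABUF_CMD_DMABUF_REL back to the guest.
+ */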
+static int release_ioctl(struct file *filp, void *data)
+{
+	struct vhost_vdmabuf *vdmabuf = filp->private_data;
+	struct virtio_vdmabuf_import *attr = data;
+	struct virtio_vdmabuf_buf *imp;
+	virtio_vdmabuf_buf_id_t buf_id = attr->buf_id;
+	int *op;
+	int ret;
+
+	imp = virtio_vdmabuf_find_buf(drv_info, &buf_id);
+	if (!imp)
+		return -EINVAL;
+
+	op = kcalloc(65, sizeof(int), GFP_KERNEL);
+	if (!op)
+		return -ENOMEM;
+
+	imp->imported = false;
+
+	memcpy(op, &imp->buf_id, sizeof(imp->buf_id));
+
+	ret = send_msg_to_guest(vdmabuf->vmid, VIRTIO_VDMABUF_CMD_DMABUF_REL,
+				op);
+	kfree(op);
+	if (ret < 0) {
+		dev_err(drv_info->dev, "failed to send release cmd\n");
+		return ret;
+	}
+
+	return 0;
+}
+
+/*
+ * ioctl - querying various pieces of information about a vdmabuf
+ *
+ * user parameters:
+ *
+ *	virtio_vdmabuf_buf_id_t buf_id - vdmabuf ID of imported buffer
+ *	unsigned long info - query result returned to the caller
+ *
+ */
+static int query_ioctl(struct file *filp, void *data)
+{
+	struct virtio_vdmabuf_query *attr = data;
+	struct virtio_vdmabuf_buf *imp;
+	virtio_vdmabuf_buf_id_t buf_id = attr->buf_id;
+
+	/* query for imported dmabuf */
+	imp = virtio_vdmabuf_find_buf(drv_info, &buf_id);
+	if (!imp)
+		return -EINVAL;
+
+	switch (attr->item) {
+	/* size of dmabuf in bytes */
+	case VIRTIO_VDMABUF_QUERY_SIZE:
+		if (imp->dma_buf) {
+			/* if local dma_buf is created (if it's
+			 * ever mapped), retrieve it directly
+			 * from struct dma_buf *
+			 */
+			attr->info = imp->dma_buf->size;
+		} else {
+			/* calculate it from the given nents, first_ofst
+			 * and last_len
+			 */
+			attr->info = ((imp->pages_info->nents)*PAGE_SIZE -
+				     (imp->pages_info->first_ofst) - PAGE_SIZE +
+				     (imp->pages_info->last_len));
+		}
+		break;
+
+	/* whether the buffer is used or not */
+	case VIRTIO_VDMABUF_QUERY_BUSY:
+		/* checks if it's used by importer */
+		attr->info = imp->imported;
+		break;
+
+	/* size of private info attached to buffer */
+	case VIRTIO_VDMABUF_QUERY_PRIV_INFO_SIZE:
+		attr->info = imp->sz_priv;
+		break;
+
+	/* copy private info attached to buffer */
+	case VIRTIO_VDMABUF_QUERY_PRIV_INFO:
+		if (imp->sz_priv > 0) {
+			int n;
+
+			n = copy_to_user((void __user *)attr->info,
+					imp->priv,
+					imp->sz_priv);
+			if (n != 0)
+				return -EFAULT;
+		}
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
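+/*
+ * Expected userspace flow (sketch): poll()/read() this device to receive an
+ * event carrying the buf_id and any private data, call
+ * VIRTIO_VDMABUF_IOCTL_IMPORT with that buf_id to obtain a dmabuf fd, and
+ * call VIRTIO_VDMABUF_IOCTL_RELEASE with the same buf_id once done with it.
+ */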
+static const struct virtio_vdmabuf_ioctl_desc vhost_vdmabuf_ioctls[] = {
+	VIRTIO_VDMABUF_IOCTL_DEF(VIRTIO_VDMABUF_IOCTL_IMPORT, import_ioctl, 0),
+	VIRTIO_VDMABUF_IOCTL_DEF(VIRTIO_VDMABUF_IOCTL_RELEASE, release_ioctl, 0),
+	VIRTIO_VDMABUF_IOCTL_DEF(VIRTIO_VDMABUF_IOCTL_QUERY, query_ioctl, 0),
+};
+
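+/*
+ * Top-level ioctl handler: ioctls of type VHOST_VIRTIO are forwarded to
+ * vhost_core_ioctl(); the vdmabuf-specific ones are dispatched through the
+ * vhost_vdmabuf_ioctls table, indexed by the ioctl number.
+ */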
+static long vhost_vdmabuf_ioctl(struct file *filp, unsigned int cmd,
+				unsigned long param)
+{
+	const struct virtio_vdmabuf_ioctl_desc *ioctl;
+	virtio_vdmabuf_ioctl_t func;
+	unsigned int nr;
+	int ret;
+	char *kdata;
+
+	/* check if cmd is vhost's */
+	if (_IOC_TYPE(cmd) == VHOST_VIRTIO) {
+		ret = vhost_core_ioctl(filp, cmd, param);
+		return ret;
+	}
+
+	nr = _IOC_NR(cmd);
+
+	if (nr >= ARRAY_SIZE(vhost_vdmabuf_ioctls)) {
+		dev_err(drv_info->dev, "invalid ioctl\n");
+		return -EINVAL;
+	}
+
+	ioctl = &vhost_vdmabuf_ioctls[nr];
+
+	func = ioctl->func;
+
+	if (unlikely(!func)) {
+		dev_err(drv_info->dev, "no function\n");
+		return -EINVAL;
+	}
+
+	kdata = kmalloc(_IOC_SIZE(cmd), GFP_KERNEL);
+	if (!kdata)
+		return -ENOMEM;
+
+	if (copy_from_user(kdata, (void __user *)param,
+			   _IOC_SIZE(cmd)) != 0) {
+		dev_err(drv_info->dev,
+			"failed to copy args from userspace\n");
+		ret = -EFAULT;
+		goto ioctl_error;
+	}
+
+	ret = func(filp, kdata);
+
+	if (copy_to_user((void __user *)param, kdata,
+			 _IOC_SIZE(cmd)) != 0) {
+		dev_err(drv_info->dev,
+			"failed to copy args back to userspace\n");
+		ret = -EFAULT;
+		goto ioctl_error;
+	}
+
+ioctl_error:
+	kfree(kdata);
+	return ret;
+}
+
+static const struct file_operations vhost_vdmabuf_fops = {
+	.owner = THIS_MODULE,
+	.open = vhost_vdmabuf_open,
+	.release = vhost_vdmabuf_release,
+	.read = vhost_vdmabuf_event_read,
+	.poll = vhost_vdmabuf_event_poll,
+	.unlocked_ioctl = vhost_vdmabuf_ioctl,
+};
+
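+/* misc char device exposed to host userspace (e.g. Qemu) as /dev/vhost-vdmabuf */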
+static struct miscdevice vhost_vdmabuf_miscdev = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "vhost-vdmabuf",
+	.fops = &vhost_vdmabuf_fops,
+};
+
+static int __init vhost_vdmabuf_init(void)
+{
+	int ret = 0;
+
+	ret = misc_register(&vhost_vdmabuf_miscdev);
+	if (ret) {
+		pr_err("vhost-vdmabuf: driver can't be registered\n");
+		return ret;
+	}
+
+	dma_coerce_mask_and_coherent(vhost_vdmabuf_miscdev.this_device,
+				     DMA_BIT_MASK(64));
+
+	drv_info = kzalloc(sizeof(*drv_info), GFP_KERNEL);
+	if (!drv_info) {
+		misc_deregister(&vhost_vdmabuf_miscdev);
+		return -ENOMEM;
+	}
+
+	drv_info->dev = vhost_vdmabuf_miscdev.this_device;
+
+	hash_init(drv_info->buf_list);
+	mutex_init(&drv_info->g_mutex);
+
+	INIT_LIST_HEAD(&drv_info->head_vdmabuf_list);
+	INIT_LIST_HEAD(&drv_info->kvm_instances);
+
+	drv_info->kvm_notifier.notifier_call = vhost_vdmabuf_get_kvm;
+	ret = kvm_vm_register_notifier(&drv_info->kvm_notifier);
+	if (ret) {
+		misc_deregister(&vhost_vdmabuf_miscdev);
+		kfree(drv_info);
+		drv_info = NULL;
+	}
+
+	return ret;
+}
+
+static void __exit vhost_vdmabuf_deinit(void)
+{
+	misc_deregister(&vhost_vdmabuf_miscdev);
+	vhost_vdmabuf_del_all();
+
+	kvm_vm_unregister_notifier(&drv_info->kvm_notifier);
+	kfree(drv_info);
+	drv_info = NULL;
+}
+
+module_init(vhost_vdmabuf_init);
+module_exit(vhost_vdmabuf_deinit);
+
+MODULE_DESCRIPTION("Vhost Vdmabuf Driver");
+MODULE_LICENSE("GPL and additional rights");
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index c998860d7bbc..87e0d1b8cc76 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -150,4 +150,7 @@
 /* Get the valid iova range */
 #define VHOST_VDPA_GET_IOVA_RANGE	_IOR(VHOST_VIRTIO, 0x78, \
 					     struct vhost_vdpa_iova_range)
+
+/* VHOST_VDMABUF specific defines */
+#define VHOST_VDMABUF_SET_RUNNING       _IOW(VHOST_VIRTIO, 0x80, int)
 #endif
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [RFC v3 3/3] vhost: Add Vdmabuf backend
@ 2021-02-03  7:35   ` Vivek Kasireddy
  0 siblings, 0 replies; 57+ messages in thread
From: Vivek Kasireddy @ 2021-02-03  7:35 UTC (permalink / raw)
  To: virtualization, dri-devel
  Cc: dongwon.kim, daniel.vetter, Vivek Kasireddy, kraxel,
	daniel.vetter, christian.koenig, linux-media

This backend acts as the counterpart to the Vdmabuf Virtio frontend.
When it receives a new export event from the frontend, it raises an
event to alert the Qemu UI/userspace. Qemu then "imports" this buffer
using the Unique ID.

As part of the import step, a new dmabuf is created on the Host using
the page information obtained from the Guest. The fd associated with
this dmabuf is made available to Qemu UI/userspace which then creates
a texture from it for the purpose of displaying it.

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
 drivers/vhost/Kconfig      |    9 +
 drivers/vhost/Makefile     |    3 +
 drivers/vhost/vdmabuf.c    | 1446 ++++++++++++++++++++++++++++++++++++
 include/uapi/linux/vhost.h |    3 +
 4 files changed, 1461 insertions(+)
 create mode 100644 drivers/vhost/vdmabuf.c

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 587fbae06182..9a99cc2611ca 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -89,4 +89,13 @@ config VHOST_CROSS_ENDIAN_LEGACY
 
 	  If unsure, say "N".
 
+config VHOST_VDMABUF
+	bool "Vhost backend for the Vdmabuf driver"
+	depends on KVM && EVENTFD
+	select VHOST
+	default n
+	help
+	  This driver works in pair with the Virtio Vdmabuf frontend. It can
+	  be used to create a dmabuf using the pages shared by the Guest.
+
 endif
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index f3e1897cce85..5c2cea4a7eaf 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -17,3 +17,6 @@ obj-$(CONFIG_VHOST)	+= vhost.o
 
 obj-$(CONFIG_VHOST_IOTLB) += vhost_iotlb.o
 vhost_iotlb-y := iotlb.o
+
+obj-$(CONFIG_VHOST_VDMABUF) += vhost_vdmabuf.o
+vhost_vdmabuf-y := vdmabuf.o
diff --git a/drivers/vhost/vdmabuf.c b/drivers/vhost/vdmabuf.c
new file mode 100644
index 000000000000..1d6e9bcf6648
--- /dev/null
+++ b/drivers/vhost/vdmabuf.c
@@ -0,0 +1,1446 @@
+// SPDX-License-Identifier: (MIT OR GPL-2.0)
+
+/*
+ * Copyright © 2021 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Dongwon Kim <dongwon.kim@intel.com>
+ *    Mateusz Polrola <mateusz.polrola@gmail.com>
+ *    Vivek Kasireddy <vivek.kasireddy@intel.com>
+ */
+
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/miscdevice.h>
+#include <linux/workqueue.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/hashtable.h>
+#include <linux/uaccess.h>
+#include <linux/poll.h>
+#include <linux/dma-buf.h>
+#include <linux/vhost.h>
+#include <linux/vfio.h>
+#include <linux/kvm_host.h>
+#include <linux/virtio_vdmabuf.h>
+
+#include "vhost.h"
+
+#define REFS_PER_PAGE (PAGE_SIZE/sizeof(long))
+
+enum {
+	VHOST_VDMABUF_FEATURES = VHOST_FEATURES,
+};
+
+static struct virtio_vdmabuf_info *drv_info;
+
+struct kvm_instance {
+	struct kvm *kvm;
+	struct list_head link;
+};
+
+struct vhost_vdmabuf {
+	struct vhost_dev dev;
+	struct vhost_virtqueue vqs[VDMABUF_VQ_MAX];
+	struct vhost_work send_work;
+	struct virtio_vdmabuf_event_queue *evq;
+	u64 vmid;
+
+	struct list_head msg_list;
+	struct list_head list;
+	struct kvm *kvm;
+};
+
+static inline void vhost_vdmabuf_add(struct vhost_vdmabuf *new)
+{
+	list_add_tail(&new->list, &drv_info->head_vdmabuf_list);
+}
+
+static inline struct vhost_vdmabuf *vhost_vdmabuf_find(u64 vmid)
+{
+	struct vhost_vdmabuf *found;
+
+	list_for_each_entry(found, &drv_info->head_vdmabuf_list, list)
+		if (found->vmid == vmid)
+			return found;
+
+	return NULL;
+}
+
+static inline bool vhost_vdmabuf_del(struct vhost_vdmabuf *vdmabuf)
+{
+	struct vhost_vdmabuf *iter, *temp;
+
+	list_for_each_entry_safe(iter, temp,
+				 &drv_info->head_vdmabuf_list,
+				 list)
+		if (iter == vdmabuf) {
+			list_del(&iter->list);
+			return true;
+		}
+
+	return false;
+}
+
+static inline void vhost_vdmabuf_del_all(void)
+{
+	struct vhost_vdmabuf *iter, *temp;
+
+	list_for_each_entry_safe(iter, temp,
+				 &drv_info->head_vdmabuf_list,
+				 list) {
+		list_del(&iter->list);
+		kfree(iter);
+	}
+}
+
+static void *map_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
+{
+	struct kvm_host_map map;
+	int ret;
+
+	ret = kvm_vcpu_map(vcpu, gpa_to_gfn(gpa), &map);
+	if (ret < 0)
+		return ERR_PTR(ret);
+	else
+		return map.hva;
+}
+
+static void unmap_hva(struct kvm_vcpu *vcpu, gpa_t hva)
+{
+	struct page *page = virt_to_page(hva);
+	struct kvm_host_map map;
+
+	map.hva = (void *)hva;
+	map.page = page;
+
+	kvm_vcpu_unmap(vcpu, &map, true);
+}
+
+/* mapping guest's pages for the vdmabuf */
+static int
+vhost_vdmabuf_map_pages(u64 vmid,
+		        struct virtio_vdmabuf_shared_pages *pages_info)
+{
+	struct vhost_vdmabuf *vdmabuf = vhost_vdmabuf_find(vmid);
+	struct kvm_vcpu *vcpu;
+	void *paddr;
+	int npgs = REFS_PER_PAGE;
+	int last_nents, n_l2refs;
+	int i, j = 0, k = 0;
+
+	if (!vdmabuf || !vdmabuf->kvm || !pages_info || pages_info->pages)
+		return -EINVAL;
+
+	vcpu = kvm_get_vcpu_by_id(vdmabuf->kvm, 0);
+	if (!vcpu)
+		return -EINVAL;
+
+	last_nents = (pages_info->nents - 1) % npgs + 1;
+	n_l2refs = (pages_info->nents / npgs) + ((last_nents > 0) ? 1 : 0) -
+		   (last_nents == npgs);
+
+	pages_info->pages = kcalloc(pages_info->nents, sizeof(struct page *),
+				    GFP_KERNEL);
+	if (!pages_info->pages)
+		goto fail_page_alloc;
+
+	pages_info->l2refs = kcalloc(n_l2refs, sizeof(gpa_t *), GFP_KERNEL);
+	if (!pages_info->l2refs)
+		goto fail_l2refs;
+
+	pages_info->l3refs = (gpa_t *)map_gpa(vcpu, pages_info->ref);
+	if (IS_ERR(pages_info->l3refs))
+		goto fail_l3refs;
+
+	for (i = 0; i < n_l2refs; i++) {
+		pages_info->l2refs[i] = (gpa_t *)map_gpa(vcpu,
+							 pages_info->l3refs[i]);
+
+		if (IS_ERR(pages_info->l2refs[i]))
+			goto fail_mapping_l2;
+
+		/* last level-2 ref */
+		if (i == n_l2refs - 1)
+			npgs = last_nents;
+
+		for (j = 0; j < npgs; j++) {
+			paddr = map_gpa(vcpu, pages_info->l2refs[i][j]);
+			if (IS_ERR(paddr))
+				goto fail_mapping_l1;
+
+			pages_info->pages[k] = virt_to_page(paddr);
+			k++;
+		}
+		unmap_hva(vcpu, pages_info->l3refs[i]);
+	}
+
+	unmap_hva(vcpu, pages_info->ref);
+
+	return 0;
+
+fail_mapping_l1:
+	for (k = 0; k < j; k++)
+		unmap_hva(vcpu, pages_info->l2refs[i][k]);
+
+fail_mapping_l2:
+	for (j = 0; j < i; j++) {
+		for (k = 0; k < REFS_PER_PAGE; k++)
+			unmap_hva(vcpu, pages_info->l2refs[i][k]);
+	}
+
+	unmap_hva(vcpu, pages_info->l3refs[i]);
+	unmap_hva(vcpu, pages_info->ref);
+
+fail_l3refs:
+	kfree(pages_info->l2refs);
+
+fail_l2refs:
+	kfree(pages_info->pages);
+
+fail_page_alloc:
+	return -ENOMEM;
+}
+
+/* unmapping mapped pages */
+static int
+vhost_vdmabuf_unmap_pages(u64 vmid,
+			  struct virtio_vdmabuf_shared_pages *pages_info)
+{
+	struct vhost_vdmabuf *vdmabuf = vhost_vdmabuf_find(vmid);
+	struct kvm_vcpu *vcpu;
+	int last_nents = (pages_info->nents - 1) % REFS_PER_PAGE + 1;
+	int n_l2refs = (pages_info->nents / REFS_PER_PAGE) +
+		       ((last_nents > 0) ? 1 : 0) -
+		       (last_nents == REFS_PER_PAGE);
+	int i, j;
+
+	if (!vdmabuf || !vdmabuf->kvm || !pages_info || pages_info->pages)
+		return -EINVAL;
+
+	vcpu = kvm_get_vcpu_by_id(vdmabuf->kvm, 0);
+	if (!vcpu)
+		return -EINVAL;
+
+	for (i = 0; i < n_l2refs - 1; i++) {
+		for (j = 0; j < REFS_PER_PAGE; j++)
+			unmap_hva(vcpu, pages_info->l2refs[i][j]);
+	}
+
+	for (j = 0; j < last_nents; j++)
+		unmap_hva(vcpu, pages_info->l2refs[i][j]);
+
+	kfree(pages_info->l2refs);
+	kfree(pages_info->pages);
+	pages_info->pages = NULL;
+
+	return 0;
+}
+
+/* create sg_table with given pages and other parameters */
+static struct sg_table *new_sgt(struct page **pgs,
+				int first_ofst, int last_len,
+				int nents)
+{
+	struct sg_table *sgt;
+	struct scatterlist *sgl;
+	int i, ret;
+
+	sgt = kmalloc(sizeof(struct sg_table), GFP_KERNEL);
+	if (!sgt)
+		return NULL;
+
+	ret = sg_alloc_table(sgt, nents, GFP_KERNEL);
+	if (ret) {
+		kfree(sgt);
+		return NULL;
+	}
+
+	sgl = sgt->sgl;
+	sg_set_page(sgl, pgs[0], PAGE_SIZE-first_ofst, first_ofst);
+
+	for (i = 1; i < nents-1; i++) {
+		sgl = sg_next(sgl);
+		sg_set_page(sgl, pgs[i], PAGE_SIZE, 0);
+	}
+
+	/* more than 1 page */
+	if (nents > 1) {
+		sgl = sg_next(sgl);
+		sg_set_page(sgl, pgs[i], last_len, 0);
+	}
+
+	return sgt;
+}
+
+static struct sg_table
+*vhost_vdmabuf_dmabuf_map(struct dma_buf_attachment *attachment,
+			  enum dma_data_direction dir)
+{
+	struct virtio_vdmabuf_buf *imp;
+
+	if (!attachment->dmabuf || !attachment->dmabuf->priv)
+		return NULL;
+
+	imp = (struct virtio_vdmabuf_buf *)attachment->dmabuf->priv;
+
+	/* if buffer has never been mapped */
+	if (!imp->sgt) {
+		imp->sgt = new_sgt(imp->pages_info->pages,
+				   imp->pages_info->first_ofst,
+				   imp->pages_info->last_len,
+				   imp->pages_info->nents);
+
+		if (!imp->sgt)
+			return NULL;
+	}
+
+	if (!dma_map_sg(attachment->dev, imp->sgt->sgl,
+			imp->sgt->nents, dir)) {
+		sg_free_table(imp->sgt);
+		kfree(imp->sgt);
+		return NULL;
+	}
+
+	return imp->sgt;
+}
+
+static void
+vhost_vdmabuf_dmabuf_unmap(struct dma_buf_attachment *attachment,
+	   	           struct sg_table *sg,
+			   enum dma_data_direction dir)
+{
+	dma_unmap_sg(attachment->dev, sg->sgl, sg->nents, dir);
+}
+
+static int vhost_vdmabuf_dmabuf_mmap(struct dma_buf *dmabuf,
+				     struct vm_area_struct *vma)
+{
+	struct virtio_vdmabuf_buf *imp;
+	u64 uaddr;
+	int i, err;
+
+	if (!dmabuf->priv)
+		return -EINVAL;
+
+	imp = (struct virtio_vdmabuf_buf *)dmabuf->priv;
+
+	if (!imp->pages_info)
+		return -EINVAL;
+
+	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+
+	uaddr = vma->vm_start;
+	for (i = 0; i < imp->pages_info->nents; i++) {
+		err = vm_insert_page(vma, uaddr,
+				     imp->pages_info->pages[i]);
+		if (err)
+			return err;
+
+		uaddr += PAGE_SIZE;
+	}
+
+	return 0;
+}
+
+static int vhost_vdmabuf_dmabuf_vmap(struct dma_buf *dmabuf,
+				     struct dma_buf_map *map)
+{
+	struct virtio_vdmabuf_buf *imp;
+	void *addr;
+
+	if (!dmabuf->priv)
+		return -EINVAL;
+
+	imp = (struct virtio_vdmabuf_buf *)dmabuf->priv;
+
+	if (!imp->pages_info)
+		return -EINVAL;
+
+	addr = vmap(imp->pages_info->pages, imp->pages_info->nents,
+		    0, PAGE_KERNEL);
+	if (!addr)
+		return -ENOMEM;
+
+	dma_buf_map_set_vaddr(map, addr);
+
+	return 0;
+}
+
+static void vhost_vdmabuf_dmabuf_release(struct dma_buf *dma_buf)
+{
+	struct virtio_vdmabuf_buf *imp;
+
+	if (!dma_buf->priv)
+		return;
+
+	imp = (struct virtio_vdmabuf_buf *)dma_buf->priv;
+	imp->dma_buf = NULL;
+	imp->valid = false;
+
+	vhost_vdmabuf_unmap_pages(imp->vmid, imp->pages_info);
+	virtio_vdmabuf_del_buf(drv_info, &imp->buf_id);
+
+	if (imp->sgt) {
+		sg_free_table(imp->sgt);
+		kfree(imp->sgt);
+		imp->sgt = NULL;
+	}
+
+	kfree(imp->priv);
+	kfree(imp->pages_info);
+	kfree(imp);
+}
+
+static const struct dma_buf_ops vhost_vdmabuf_dmabuf_ops = {
+	.map_dma_buf = vhost_vdmabuf_dmabuf_map,
+	.unmap_dma_buf = vhost_vdmabuf_dmabuf_unmap,
+	.release = vhost_vdmabuf_dmabuf_release,
+	.mmap = vhost_vdmabuf_dmabuf_mmap,
+	.vmap = vhost_vdmabuf_dmabuf_vmap,
+};
+
+/* exporting dmabuf as fd */
+static int vhost_vdmabuf_exp_fd(struct virtio_vdmabuf_buf *imp, int flags)
+{
+	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+
+	exp_info.ops = &vhost_vdmabuf_dmabuf_ops;
+
+	/* multiple of PAGE_SIZE, not considering offset */
+	exp_info.size = imp->pages_info->nents * PAGE_SIZE;
+	exp_info.flags = 0;
+	exp_info.priv = imp;
+
+	if (!imp->dma_buf) {
+		imp->dma_buf = dma_buf_export(&exp_info);
+		if (IS_ERR_OR_NULL(imp->dma_buf)) {
+			imp->dma_buf = NULL;
+			return -EINVAL;
+		}
+	}
+
+	return dma_buf_fd(imp->dma_buf, flags);
+}
+
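+/* queue an import event (buf id + private data) for the host-side importer */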
+static int vhost_vdmabuf_add_event(struct vhost_vdmabuf *vdmabuf,
+				   struct virtio_vdmabuf_buf *buf_info)
+{
+	struct virtio_vdmabuf_event *e_oldest, *e_new;
+	struct virtio_vdmabuf_event_queue *evq = vdmabuf->evq;
+	unsigned long irqflags;
+
+	e_new = kzalloc(sizeof(*e_new), GFP_KERNEL);
+	if (!e_new)
+		return -ENOMEM;
+
+	e_new->e_data.hdr.buf_id = buf_info->buf_id;
+	e_new->e_data.data = (void *)buf_info->priv;
+	e_new->e_data.hdr.size = buf_info->sz_priv;
+
+	spin_lock_irqsave(&evq->e_lock, irqflags);
+
+	/* check the current number of events and, if it hits the max
+	 * number (32), remove the oldest event in the list
+	 */
+	if (evq->pending > 31) {
+		e_oldest = list_first_entry(&evq->e_list,
+					    struct virtio_vdmabuf_event, link);
+		list_del(&e_oldest->link);
+		evq->pending--;
+		kfree(e_oldest);
+	}
+
+	list_add_tail(&e_new->link, &evq->e_list);
+
+	evq->pending++;
+
+	wake_up_interruptible(&evq->e_wait);
+	spin_unlock_irqrestore(&evq->e_lock, irqflags);
+
+	return 0;
+}
+
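+/* queue a message (currently only DMABUF_REL) for the guest and kick
+ * the send work
+ */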
+static int send_msg_to_guest(u64 vmid, enum virtio_vdmabuf_cmd cmd, int *op)
+{
+	struct virtio_vdmabuf_msg *msg;
+	struct vhost_vdmabuf *vdmabuf;
+
+	vdmabuf = vhost_vdmabuf_find(vmid);
+	if (!vdmabuf) {
+		dev_err(drv_info->dev,
+			"can't find vdmabuf for : vmid = %llu\n", vmid);
+		return -EINVAL;
+	}
+
+	if (cmd != VIRTIO_VDMABUF_CMD_DMABUF_REL)
+		return -EINVAL;
+
+	msg = kvcalloc(1, sizeof(struct virtio_vdmabuf_msg),
+		       GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	memcpy(&msg->op[0], &op[0], 8 * sizeof(int));
+	msg->cmd = cmd;
+
+	list_add_tail(&msg->list, &vdmabuf->msg_list);
+	vhost_work_queue(&vdmabuf->dev, &vdmabuf->send_work);
+
+	return 0;
+}
+
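+/* book-keep a buffer newly exported by the guest and raise an import event */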
+static int register_exported(struct vhost_vdmabuf *vdmabuf,
+			     virtio_vdmabuf_buf_id_t *buf_id, int *ops)
+{
+	struct virtio_vdmabuf_buf *imp;
+	int ret;
+
+	imp = kcalloc(1, sizeof(*imp), GFP_KERNEL);
+	if (!imp)
+		return -ENOMEM;
+
+	imp->pages_info = kcalloc(1, sizeof(struct virtio_vdmabuf_shared_pages),
+				  GFP_KERNEL);
+	if (!imp->pages_info) {
+		kfree(imp);
+		return -ENOMEM;
+	}
+
+	imp->sz_priv = ops[VIRTIO_VDMABUF_PRIVATE_DATA_SIZE];
+	if (imp->sz_priv) {
+		imp->priv = kcalloc(1, ops[VIRTIO_VDMABUF_PRIVATE_DATA_SIZE],
+				    GFP_KERNEL);
+		if (!imp->priv) {
+			kfree(imp->pages_info);
+			kfree(imp);
+			return -ENOMEM;
+		}
+	}
+
+	memcpy(&imp->buf_id, buf_id, sizeof(*buf_id));
+
+	imp->pages_info->nents = ops[VIRTIO_VDMABUF_NUM_PAGES_SHARED];
+	imp->pages_info->first_ofst = ops[VIRTIO_VDMABUF_FIRST_PAGE_DATA_OFFSET];
+	imp->pages_info->last_len = ops[VIRTIO_VDMABUF_LAST_PAGE_DATA_LENGTH];
+	imp->pages_info->ref = *(gpa_t *)&ops[VIRTIO_VDMABUF_REF_ADDR_UPPER_32BIT];
+	imp->vmid = vdmabuf->vmid;
+	imp->valid = true;
+
+	virtio_vdmabuf_add_buf(drv_info, imp);
+
+	/* transferring private data */
+	memcpy(imp->priv, &ops[VIRTIO_VDMABUF_PRIVATE_DATA_START],
+	       ops[VIRTIO_VDMABUF_PRIVATE_DATA_SIZE]);
+
+	/* generate import event */
+	ret = vhost_vdmabuf_add_event(vdmabuf, imp);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
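+/* copy queued messages into the buffers the guest posted on its recv vq */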
+static void send_to_recvq(struct vhost_vdmabuf *vdmabuf,
+			  struct vhost_virtqueue *vq)
+{
+	struct virtio_vdmabuf_msg *msg;
+	int head, in, out, in_size;
+	bool added = false;
+	int ret;
+
+	mutex_lock(&vq->mutex);
+
+	if (!vhost_vq_get_backend(vq))
+		goto out;
+
+	vhost_disable_notify(&vdmabuf->dev, vq);
+
+	for (;;) {
+		if (list_empty(&vdmabuf->msg_list))
+			break;
+
+		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+					 &out, &in, NULL, NULL);
+
+		if (head < 0 || head == vq->num)
+			break;
+
+		in_size = iov_length(&vq->iov[out], in);
+		if (in_size != sizeof(struct virtio_vdmabuf_msg)) {
+			dev_err(drv_info->dev, "rx msg with wrong size\n");
+			break;
+		}
+
+		msg = list_first_entry(&vdmabuf->msg_list,
+				       struct virtio_vdmabuf_msg, list);
+		list_del_init(&msg->list);
+
+		ret = __copy_to_user(vq->iov[out].iov_base, msg,
+				     sizeof(struct virtio_vdmabuf_msg));
+		if (ret) {
+			dev_err(drv_info->dev,
+				"fail to copy tx msg\n");
+			break;
+		}
+
+		vhost_add_used(vq, head, in_size);
+		added = true;
+
+		kfree(msg);
+	}
+
+	vhost_enable_notify(&vdmabuf->dev, vq);
+	if (added)
+		vhost_signal(&vdmabuf->dev, vq);
+out:
+	mutex_unlock(&vq->mutex);
+}
+
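+/* vhost work: flush the pending msg_list out to the guest's recv vq */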
+static void vhost_send_msg_work(struct vhost_work *work)
+{
+	struct vhost_vdmabuf *vdmabuf = container_of(work,
+					             struct vhost_vdmabuf,
+					             send_work);
+	struct vhost_virtqueue *vq = &vdmabuf->vqs[VDMABUF_VQ_RECV];
+
+	send_to_recvq(vdmabuf, vq);
+}
+
+/* parse incoming message from a guest */
+static int parse_msg(struct vhost_vdmabuf *vdmabuf,
+		     struct virtio_vdmabuf_msg *msg)
+{
+	virtio_vdmabuf_buf_id_t *buf_id;
+	struct virtio_vdmabuf_msg *vmid_msg;
+	int ret = 0;
+
+	switch (msg->cmd) {
+	case VIRTIO_VDMABUF_CMD_EXPORT:
+		buf_id = (virtio_vdmabuf_buf_id_t *)msg->op;
+		ret = register_exported(vdmabuf, buf_id, msg->op);
+
+		break;
+	case VIRTIO_VDMABUF_CMD_NEED_VMID:
+		vmid_msg = kvcalloc(1, sizeof(struct virtio_vdmabuf_msg),
+				    GFP_KERNEL);
+		if (!vmid_msg) {
+			ret = -ENOMEM;
+			break;
+		}
+
+		vmid_msg->cmd = msg->cmd;
+		vmid_msg->op[0] = vdmabuf->vmid;
+		list_add_tail(&vmid_msg->list, &vdmabuf->msg_list);
+		vhost_work_queue(&vdmabuf->dev, &vdmabuf->send_work);
+
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	return ret;
+}
+
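+/* send vq kick: drain the messages the guest sent and parse them */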
+static void vhost_vdmabuf_handle_send_kick(struct vhost_work *work)
+{
+	struct vhost_virtqueue *vq = container_of(work,
+						  struct vhost_virtqueue,
+						  poll.work);
+	struct vhost_vdmabuf *vdmabuf = container_of(vq->dev,
+					      	     struct vhost_vdmabuf,
+					      	     dev);
+	struct virtio_vdmabuf_msg msg;
+	int head, in, out, in_size;
+	bool added = false;
+	int ret;
+
+	mutex_lock(&vq->mutex);
+
+	if (!vhost_vq_get_backend(vq))
+		goto out;
+
+	vhost_disable_notify(&vdmabuf->dev, vq);
+
+	/* Make sure we will process all pending requests */
+	for (;;) {
+		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+					 &out, &in, NULL, NULL);
+
+		if (head < 0 || head == vq->num)
+			break;
+
+		in_size = iov_length(&vq->iov[in], out);
+		if (in_size != sizeof(struct virtio_vdmabuf_msg)) {
+			dev_err(drv_info->dev, "rx msg with wrong size\n");
+			break;
+		}
+
+		if (__copy_from_user(&msg, vq->iov[in].iov_base, in_size)) {
+			dev_err(drv_info->dev,
+				"err: can't get the msg from vq\n");
+			break;
+		}
+
+		ret = parse_msg(vdmabuf, &msg);
+		if (ret) {
+			dev_err(drv_info->dev,
+				"msg parse error: %d",
+				ret);
+			dev_err(drv_info->dev,
+				" cmd: %d\n", msg.cmd);
+
+			break;
+		}
+
+		vhost_add_used(vq, head, in_size);
+		added = true;
+	}
+
+	vhost_enable_notify(&vdmabuf->dev, vq);
+	if (added)
+		vhost_signal(&vdmabuf->dev, vq);
+out:
+	mutex_unlock(&vq->mutex);
+}
+
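+/* recv vq kick: the guest posted new buffers, retry sending queued msgs */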
+static void vhost_vdmabuf_handle_recv_kick(struct vhost_work *work)
+{
+	struct vhost_virtqueue *vq = container_of(work,
+						  struct vhost_virtqueue,
+						  poll.work);
+	struct vhost_vdmabuf *vdmabuf = container_of(vq->dev,
+					      	     struct vhost_vdmabuf,
+					      	     dev);
+
+	send_to_recvq(vdmabuf, vq);
+}
+
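+/* kvm notifier callback: remember the struct kvm of a newly created VM */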
+static int vhost_vdmabuf_get_kvm(struct notifier_block *nb,
+				 unsigned long event, void *data)
+{
+	struct kvm_instance *instance;
+	struct virtio_vdmabuf_info *drv = container_of(nb,
+						struct virtio_vdmabuf_info,
+						kvm_notifier);
+
+	if (event != KVM_EVENT_CREATE_VM || !data)
+		return NOTIFY_OK;
+
+	instance = kzalloc(sizeof(*instance), GFP_KERNEL);
+	if (instance) {
+		instance->kvm = data;
+		list_add_tail(&instance->link, &drv->kvm_instances);
+	}
+
+	return NOTIFY_OK;
+}
+
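+/* find (and take ownership of) the kvm instance created by the process
+ * whose pid matches vmid
+ */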
+static struct kvm *find_kvm_instance(u64 vmid)
+{
+	struct kvm_instance *instance, *tmp;
+	struct kvm *kvm = NULL;
+
+	list_for_each_entry_safe(instance, tmp, &drv_info->kvm_instances,
+                                 link) {
+		if (instance->kvm->userspace_pid == vmid) {
+			kvm = instance->kvm;
+
+			list_del(&instance->link);
+			kfree(instance);
+			break;
+		}
+	}
+
+	return kvm;
+}
+
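+/* open handler for the vhost-vdmabuf misc device: allocate the per-VM
+ * context, init the vqs and event queue, and bind to the matching kvm
+ */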
+static int vhost_vdmabuf_open(struct inode *inode, struct file *filp)
+{
+	struct vhost_vdmabuf *vdmabuf;
+	struct vhost_virtqueue **vqs;
+	int ret = 0;
+
+	if (!drv_info) {
+		pr_err("vhost-vdmabuf: can't open misc device\n");
+		return -EINVAL;
+	}
+
+	vdmabuf = kzalloc(sizeof(*vdmabuf), GFP_KERNEL |
+			   __GFP_RETRY_MAYFAIL);
+	if (!vdmabuf)
+		return -ENOMEM;
+
+	vqs = kmalloc_array(ARRAY_SIZE(vdmabuf->vqs), sizeof(*vqs),
+			    GFP_KERNEL);
+	if (!vqs) {
+		kfree(vdmabuf);
+		return -ENOMEM;
+	}
+
+	vdmabuf->evq = kcalloc(1, sizeof(*(vdmabuf->evq)), GFP_KERNEL);
+	if (!vdmabuf->evq) {
+		kfree(vdmabuf);
+		kfree(vqs);
+		return -ENOMEM;
+	}
+
+	vqs[VDMABUF_VQ_SEND] = &vdmabuf->vqs[VDMABUF_VQ_SEND];
+	vqs[VDMABUF_VQ_RECV] = &vdmabuf->vqs[VDMABUF_VQ_RECV];
+	vdmabuf->vqs[VDMABUF_VQ_SEND].handle_kick = vhost_vdmabuf_handle_send_kick;
+	vdmabuf->vqs[VDMABUF_VQ_RECV].handle_kick = vhost_vdmabuf_handle_recv_kick;
+
+	vhost_dev_init(&vdmabuf->dev, vqs, ARRAY_SIZE(vdmabuf->vqs),
+		       UIO_MAXIOV, 0, 0, true, NULL);
+
+	INIT_LIST_HEAD(&vdmabuf->msg_list);
+	vhost_work_init(&vdmabuf->send_work, vhost_send_msg_work);
+	vdmabuf->vmid = task_pid_nr(current);
+	vdmabuf->kvm = find_kvm_instance(vdmabuf->vmid);
+	vhost_vdmabuf_add(vdmabuf);
+
+	mutex_init(&vdmabuf->evq->e_readlock);
+	spin_lock_init(&vdmabuf->evq->e_lock);
+
+	/* Initialize event queue */
+	INIT_LIST_HEAD(&vdmabuf->evq->e_list);
+	init_waitqueue_head(&vdmabuf->evq->e_wait);
+
+	/* resetting number of pending events */
+	vdmabuf->evq->pending = 0;
+	filp->private_data = vdmabuf;
+
+	return ret;
+}
+
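+/* flush pending virtqueue polls and the queued send work */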
+static void vhost_vdmabuf_flush(struct vhost_vdmabuf *vdmabuf)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(vdmabuf->vqs); i++)
+		if (vdmabuf->vqs[i].handle_kick)
+			vhost_poll_flush(&vdmabuf->vqs[i].poll);
+
+	vhost_work_flush(&vdmabuf->dev, &vdmabuf->send_work);
+}
+
+static int vhost_vdmabuf_release(struct inode *inode, struct file *filp)
+{
+	struct vhost_vdmabuf *vdmabuf = filp->private_data;
+	struct virtio_vdmabuf_event *e, *et;
+
+	if (!vhost_vdmabuf_del(vdmabuf))
+		return -EINVAL;
+
+	mutex_lock(&drv_info->g_mutex);
+
+	list_for_each_entry_safe(e, et, &vdmabuf->evq->e_list,
+				 link) {
+		list_del(&e->link);
+		kfree(e);
+		vdmabuf->evq->pending--;
+	}
+
+	vhost_vdmabuf_flush(vdmabuf);
+	vhost_dev_cleanup(&vdmabuf->dev);
+
+	kfree(vdmabuf->dev.vqs);
+	kfree(vdmabuf->evq);
+	kvfree(vdmabuf);
+
+	filp->private_data = NULL;
+	mutex_unlock(&drv_info->g_mutex);
+
+	return 0;
+}
+
+static unsigned int vhost_vdmabuf_event_poll(struct file *filp,
+				    	     struct poll_table_struct *wait)
+{
+	struct vhost_vdmabuf *vdmabuf = filp->private_data;
+
+	poll_wait(filp, &vdmabuf->evq->e_wait, wait);
+
+	if (!list_empty(&vdmabuf->evq->e_list))
+		return POLLIN | POLLRDNORM;
+
+	return 0;
+}
+
+static ssize_t vhost_vdmabuf_event_read(struct file *filp, char __user *buf,
+			       		size_t cnt, loff_t *ofst)
+{
+	struct vhost_vdmabuf *vdmabuf = filp->private_data;
+	int ret;
+
+	if (task_pid_nr(current) != vdmabuf->vmid) {
+		dev_err(drv_info->dev, "current process cannot read events\n");
+		return -EPERM;
+	}
+
+	/* make sure user buffer can be written */
+	if (!access_ok(buf, sizeof(*buf))) {
+		dev_err(drv_info->dev, "user buffer can't be written.\n");
+		return -EINVAL;
+	}
+
+	ret = mutex_lock_interruptible(&vdmabuf->evq->e_readlock);
+	if (ret)
+		return ret;
+
+	for (;;) {
+		struct virtio_vdmabuf_event *e = NULL;
+
+		spin_lock_irq(&vdmabuf->evq->e_lock);
+		if (!list_empty(&vdmabuf->evq->e_list)) {
+			e = list_first_entry(&vdmabuf->evq->e_list,
+					     struct virtio_vdmabuf_event, link);
+			list_del(&e->link);
+		}
+		spin_unlock_irq(&vdmabuf->evq->e_lock);
+
+		if (!e) {
+			if (ret)
+				break;
+
+			if (filp->f_flags & O_NONBLOCK) {
+				ret = -EAGAIN;
+				break;
+			}
+
+			mutex_unlock(&vdmabuf->evq->e_readlock);
+			ret = wait_event_interruptible(vdmabuf->evq->e_wait,
+					!list_empty(&vdmabuf->evq->e_list));
+
+			if (ret == 0)
+				ret = mutex_lock_interruptible(
+						&vdmabuf->evq->e_readlock);
+
+			if (ret)
+				return ret;
+		} else {
+			unsigned int len = (sizeof(e->e_data.hdr) +
+					    e->e_data.hdr.size);
+
+			if (len > cnt - ret) {
+put_back_event:
+				spin_lock_irq(&vdmabuf->evq->e_lock);
+				list_add(&e->link, &vdmabuf->evq->e_list);
+				spin_unlock_irq(&vdmabuf->evq->e_lock);
+				break;
+			}
+
+			if (copy_to_user(buf + ret, &e->e_data.hdr,
+					 sizeof(e->e_data.hdr))) {
+				if (ret == 0)
+					ret = -EFAULT;
+
+				goto put_back_event;
+			}
+
+			ret += sizeof(e->e_data.hdr);
+
+			if (copy_to_user(buf + ret, e->e_data.data,
+					 e->e_data.hdr.size)) {
+
+				struct virtio_vdmabuf_e_hdr dummy_hdr = {0};
+
+				ret -= sizeof(e->e_data.hdr);
+
+				/* nullifying hdr of the event in user buffer */
+				if (copy_to_user(buf + ret, &dummy_hdr,
+						 sizeof(dummy_hdr)))
+					dev_err(drv_info->dev,
+					   "fail to nullify invalid hdr\n");
+
+				ret = -EFAULT;
+
+				goto put_back_event;
+			}
+
+			ret += e->e_data.hdr.size;
+
+			spin_lock_irq(&vdmabuf->evq->e_lock);
+			vdmabuf->evq->pending--;
+			spin_unlock_irq(&vdmabuf->evq->e_lock);
+			kfree(e);
+		}
+	}
+
+	mutex_unlock(&vdmabuf->evq->e_readlock);
+
+	return ret;
+}
+
+static int vhost_vdmabuf_start(struct vhost_vdmabuf *vdmabuf)
+{
+	struct vhost_virtqueue *vq;
+	int i, ret;
+
+	mutex_lock(&vdmabuf->dev.mutex);
+
+	ret = vhost_dev_check_owner(&vdmabuf->dev);
+	if (ret)
+		goto err;
+
+	for (i = 0; i < ARRAY_SIZE(vdmabuf->vqs); i++) {
+		vq = &vdmabuf->vqs[i];
+
+		mutex_lock(&vq->mutex);
+
+		if (!vhost_vq_access_ok(vq)) {
+			ret = -EFAULT;
+			goto err_vq;
+		}
+
+		if (!vhost_vq_get_backend(vq)) {
+			vhost_vq_set_backend(vq, vdmabuf);
+			ret = vhost_vq_init_access(vq);
+			if (ret)
+				goto err_vq;
+		}
+
+		mutex_unlock(&vq->mutex);
+	}
+
+	mutex_unlock(&vdmabuf->dev.mutex);
+	return 0;
+
+err_vq:
+	vhost_vq_set_backend(vq, NULL);
+	mutex_unlock(&vq->mutex);
+
+	for (i = 0; i < ARRAY_SIZE(vdmabuf->vqs); i++) {
+		vq = &vdmabuf->vqs[i];
+
+		mutex_lock(&vq->mutex);
+		vhost_vq_set_backend(vq, NULL);
+		mutex_unlock(&vq->mutex);
+	}
+
+err:
+	mutex_unlock(&vdmabuf->dev.mutex);
+	return ret;
+}
+
+static int vhost_vdmabuf_stop(struct vhost_vdmabuf *vdmabuf)
+{
+	struct vhost_virtqueue *vq;
+	int i, ret;
+
+	mutex_lock(&vdmabuf->dev.mutex);
+
+	ret = vhost_dev_check_owner(&vdmabuf->dev);
+	if (ret)
+		goto err;
+
+	for (i = 0; i < ARRAY_SIZE(vdmabuf->vqs); i++) {
+		vq = &vdmabuf->vqs[i];
+
+		mutex_lock(&vq->mutex);
+		vhost_vq_set_backend(vq, NULL);
+		mutex_unlock(&vq->mutex);
+	}
+
+err:
+	mutex_unlock(&vdmabuf->dev.mutex);
+	return ret;
+}
+
+static int vhost_vdmabuf_set_features(struct vhost_vdmabuf *vdmabuf,
+				      u64 features)
+{
+	struct vhost_virtqueue *vq;
+	int i;
+
+	if (features & ~VHOST_VDMABUF_FEATURES)
+		return -EOPNOTSUPP;
+
+	mutex_lock(&vdmabuf->dev.mutex);
+	if ((features & (1 << VHOST_F_LOG_ALL)) &&
+	    !vhost_log_access_ok(&vdmabuf->dev)) {
+		mutex_unlock(&vdmabuf->dev.mutex);
+		return -EFAULT;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(vdmabuf->vqs); i++) {
+		vq = &vdmabuf->vqs[i];
+		mutex_lock(&vq->mutex);
+		vq->acked_features = features;
+		mutex_unlock(&vq->mutex);
+	}
+
+	mutex_unlock(&vdmabuf->dev.mutex);
+	return 0;
+}
+
+/* wrapper ioctl for vhost interface control */
+static int vhost_core_ioctl(struct file *filp, unsigned int cmd,
+			    unsigned long param)
+{
+	struct vhost_vdmabuf *vdmabuf = filp->private_data;
+	void __user *argp = (void __user *)param;
+	u64 features;
+	int ret, start;
+
+	switch (cmd) {
+	case VHOST_GET_FEATURES:
+		features = VHOST_VDMABUF_FEATURES;
+		if (copy_to_user(argp, &features, sizeof(features)))
+			return -EFAULT;
+		return 0;
+	case VHOST_SET_FEATURES:
+		if (copy_from_user(&features, argp, sizeof(features)))
+			return -EFAULT;
+		return vhost_vdmabuf_set_features(vdmabuf, features);
+	case VHOST_VDMABUF_SET_RUNNING:
+		if (copy_from_user(&start, argp, sizeof(start)))
+			return -EFAULT;
+
+		if (start)
+			return vhost_vdmabuf_start(vdmabuf);
+		else
+			return vhost_vdmabuf_stop(vdmabuf);
+	default:
+		mutex_lock(&vdmabuf->dev.mutex);
+		ret = vhost_dev_ioctl(&vdmabuf->dev, cmd, argp);
+		if (ret == -ENOIOCTLCMD) {
+			ret = vhost_vring_ioctl(&vdmabuf->dev, cmd, argp);
+		} else {
+			vhost_vdmabuf_flush(vdmabuf);
+		}
+		mutex_unlock(&vdmabuf->dev.mutex);
+	}
+
+	return ret;
+}
+
+/*
+ * ioctl - importing vdmabuf from guest OS
+ *
+ * user parameters:
+ *
+ *	virtio_vdmabuf_buf_id_t buf_id - vdmabuf ID of imported buffer
+ *	int flags - flags
+ *	int fd - file handle of	the imported buffer
+ *
+ */
+static int import_ioctl(struct file *filp, void *data)
+{
+	struct vhost_vdmabuf *vdmabuf = filp->private_data;
+	struct virtio_vdmabuf_import *attr = data;
+	struct virtio_vdmabuf_buf *imp;
+	int ret = 0;
+
+	mutex_lock(&vdmabuf->dev.mutex);
+
+	/* look for dmabuf for the id */
+	imp = virtio_vdmabuf_find_buf(drv_info, &attr->buf_id);
+	if (!imp || !imp->valid) {
+		mutex_unlock(&vdmabuf->dev.mutex);
+		dev_err(drv_info->dev,
+			"no valid buf found with id = %llu\n", attr->buf_id.id);
+		return -ENOENT;
+	}
+
+	/* only if mapped pages are not present */
+	if (!imp->pages_info->pages) {
+		ret = vhost_vdmabuf_map_pages(vdmabuf->vmid, imp->pages_info);
+		if (ret < 0) {
+			dev_err(drv_info->dev,
+				"failed to map guest pages\n");
+			goto fail_map;
+		}
+	}
+
+	attr->fd = vhost_vdmabuf_exp_fd(imp, attr->flags);
+	if (attr->fd < 0) {
+		dev_err(drv_info->dev, "failed to get file descriptor\n");
+		goto fail_import;
+	}
+
+	imp->imported = true;
+
+	mutex_unlock(&vdmabuf->dev.mutex);
+	goto success;
+
+fail_import:
+	/* the negative fd carries the error code from vhost_vdmabuf_exp_fd() */
+	ret = attr->fd;
+
+	/* not imported yet? */
+	if (!imp->imported) {
+		vhost_vdmabuf_unmap_pages(vdmabuf->vmid, imp->pages_info);
+		if (imp->dma_buf)
+			kfree(imp->dma_buf);
+
+		if (imp->sgt) {
+			sg_free_table(imp->sgt);
+			kfree(imp->sgt);
+			imp->sgt = NULL;
+		}
+	}
+
+fail_map:
+	/* Check if buffer is still valid and if not remove it
+	 * from imported list.
+	 */
+	if (!imp->valid && !imp->imported) {
+		virtio_vdmabuf_del_buf(drv_info, &imp->buf_id);
+		kfree(imp->priv);
+		kfree(imp->pages_info);
+		kfree(imp);
+	}
+
+	mutex_unlock(&vdmabuf->dev.mutex);
+
+success:
+	return ret;
+}
+
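+/*
+ * ioctl - releasing an imported vdmabuf
+ *
+ * user parameters:
+ *
+ *	virtio_vdmabuf_buf_id_t buf_id - vdmabuf ID of the buffer to release
+ *
+ */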
+static int release_ioctl(struct file *filp, void *data)
+{
+	struct vhost_vdmabuf *vdmabuf = filp->private_data;
+	struct virtio_vdmabuf_import *attr = data;
+	struct virtio_vdmabuf_buf *imp;
+	virtio_vdmabuf_buf_id_t buf_id = attr->buf_id;
+	int *op;
+	int ret = 0;
+
+	op = kcalloc(1, sizeof(int) * 65, GFP_KERNEL);
+	if (!op)
+		return -ENOMEM;
+
+	imp = virtio_vdmabuf_find_buf(drv_info, &buf_id);
+	if (!imp) {
+		kfree(op);
+		return -EINVAL;
+	}
+
+	imp->imported = false;
+
+	memcpy(op, &imp->buf_id, sizeof(imp->buf_id));
+
+	ret = send_msg_to_guest(vdmabuf->vmid, VIRTIO_VDMABUF_CMD_DMABUF_REL,
+				op);
+	kfree(op);
+	if (ret < 0) {
+		dev_err(drv_info->dev, "fail to send release cmd\n");
+		return ret;
+	}
+
+	return 0;
+}
+
+/*
+ * ioctl - querying various information of vdmabuf
+ *
+ * user parameters:
+ *
+ *	virtio_vdmabuf_buf_id_t buf_id - vdmabuf ID of imported buffer
+ *	unsigned long info - returned querying result
+ *
+ */
+static int query_ioctl(struct file *filp, void *data)
+{
+	struct virtio_vdmabuf_query *attr = data;
+	struct virtio_vdmabuf_buf *imp;
+	virtio_vdmabuf_buf_id_t buf_id = attr->buf_id;
+
+	/* query for imported dmabuf */
+	imp = virtio_vdmabuf_find_buf(drv_info, &buf_id);
+	if (!imp)
+		return -EINVAL;
+
+	switch (attr->item) {
+	/* size of dmabuf in byte */
+	case VIRTIO_VDMABUF_QUERY_SIZE:
+		if (imp->dma_buf) {
+			/* if local dma_buf is created (if it's
+			 * ever mapped), retrieve it directly
+			 * from struct dma_buf *
+			 */
+			attr->info = imp->dma_buf->size;
+		} else {
+			/* calculate it from the given nents, first_ofst
+			 * and last_len
+			 */
+			attr->info = ((imp->pages_info->nents)*PAGE_SIZE -
+				     (imp->pages_info->first_ofst) - PAGE_SIZE +
+				     (imp->pages_info->last_len));
+		}
+		break;
+
+	/* whether the buffer is used or not */
+	case VIRTIO_VDMABUF_QUERY_BUSY:
+		/* checks if it's used by importer */
+		attr->info = imp->imported;
+		break;
+
+	/* size of private info attached to buffer */
+	case VIRTIO_VDMABUF_QUERY_PRIV_INFO_SIZE:
+		attr->info = imp->sz_priv;
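+/* map_dma_buf op: build the imported buffer's sg_table on first use
+ * and dma-map it for the attaching device
+ */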
+		break;
+
+	/* copy private info attached to buffer */
+	case VIRTIO_VDMABUF_QUERY_PRIV_INFO:
+		if (imp->sz_priv > 0) {
+			int n;
+
+			n = copy_to_user((void __user *)attr->info,
+					imp->priv,
+					imp->sz_priv);
+			if (n != 0)
+				return -EINVAL;
+		}
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static const struct virtio_vdmabuf_ioctl_desc vhost_vdmabuf_ioctls[] = {
+	VIRTIO_VDMABUF_IOCTL_DEF(VIRTIO_VDMABUF_IOCTL_IMPORT, import_ioctl, 0),
+	VIRTIO_VDMABUF_IOCTL_DEF(VIRTIO_VDMABUF_IOCTL_RELEASE, release_ioctl, 0),
+	VIRTIO_VDMABUF_IOCTL_DEF(VIRTIO_VDMABUF_IOCTL_QUERY, query_ioctl, 0),
+};
+
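+/* dispatch ioctls: vhost core commands go to vhost_core_ioctl(), the
+ * rest are vdmabuf-specific commands from the table above
+ */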
+static long vhost_vdmabuf_ioctl(struct file *filp, unsigned int cmd,
+				unsigned long param)
+{
+	const struct virtio_vdmabuf_ioctl_desc *ioctl;
+	virtio_vdmabuf_ioctl_t func;
+	unsigned int nr;
+	int ret;
+	char *kdata;
+
+	/* check if cmd is vhost's */
+	if (_IOC_TYPE(cmd) == VHOST_VIRTIO) {
+		ret = vhost_core_ioctl(filp, cmd, param);
+		return ret;
+	}
+
+	nr = _IOC_NR(cmd);
+
+	if (nr >= ARRAY_SIZE(vhost_vdmabuf_ioctls)) {
+		dev_err(drv_info->dev, "invalid ioctl\n");
+		return -EINVAL;
+	}
+
+	ioctl = &vhost_vdmabuf_ioctls[nr];
+
+	func = ioctl->func;
+
+	if (unlikely(!func)) {
+		dev_err(drv_info->dev, "no function\n");
+		return -EINVAL;
+	}
+
+	kdata = kmalloc(_IOC_SIZE(cmd), GFP_KERNEL);
+	if (!kdata)
+		return -ENOMEM;
+
+	if (copy_from_user(kdata, (void __user *)param,
+			   _IOC_SIZE(cmd)) != 0) {
+		dev_err(drv_info->dev,
+			"failed to copy args from userspace\n");
+		ret = -EFAULT;
+		goto ioctl_error;
+	}
+
+	ret = func(filp, kdata);
+
+	if (copy_to_user((void __user *)param, kdata,
+			 _IOC_SIZE(cmd)) != 0) {
+		dev_err(drv_info->dev,
+			"failed to copy args back to userspace\n");
+		ret = -EFAULT;
+		goto ioctl_error;
+	}
+
+ioctl_error:
+	kfree(kdata);
+	return ret;
+}
+
+static const struct file_operations vhost_vdmabuf_fops = {
+	.owner = THIS_MODULE,
+	.open = vhost_vdmabuf_open,
+	.release = vhost_vdmabuf_release,
+	.read = vhost_vdmabuf_event_read,
+	.poll = vhost_vdmabuf_event_poll,
+	.unlocked_ioctl = vhost_vdmabuf_ioctl,
+};
+
+static struct miscdevice vhost_vdmabuf_miscdev = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "vhost-vdmabuf",
+	.fops = &vhost_vdmabuf_fops,
+};
+
+static int __init vhost_vdmabuf_init(void)
+{
+	int ret = 0;
+
+	ret = misc_register(&vhost_vdmabuf_miscdev);
+	if (ret) {
+		pr_err("vhost-vdmabuf: driver can't be registered\n");
+		return ret;
+	}
+
+	dma_coerce_mask_and_coherent(vhost_vdmabuf_miscdev.this_device,
+				     DMA_BIT_MASK(64));
+
+	drv_info = kcalloc(1, sizeof(*drv_info), GFP_KERNEL);
+	if (!drv_info) {
+		misc_deregister(&vhost_vdmabuf_miscdev);
+		return -ENOMEM;
+	}
+
+	drv_info->dev = vhost_vdmabuf_miscdev.this_device;
+
+	hash_init(drv_info->buf_list);
+	mutex_init(&drv_info->g_mutex);
+
+	INIT_LIST_HEAD(&drv_info->head_vdmabuf_list);
+	INIT_LIST_HEAD(&drv_info->kvm_instances);
+
+	drv_info->kvm_notifier.notifier_call = vhost_vdmabuf_get_kvm;
+	ret = kvm_vm_register_notifier(&drv_info->kvm_notifier);
+
+	return ret;
+}
+
+static void __exit vhost_vdmabuf_deinit(void)
+{
+	misc_deregister(&vhost_vdmabuf_miscdev);
+	vhost_vdmabuf_del_all();
+
+	kvm_vm_unregister_notifier(&drv_info->kvm_notifier);
+	kfree(drv_info);
+	drv_info = NULL;
+}
+
+module_init(vhost_vdmabuf_init);
+module_exit(vhost_vdmabuf_deinit);
+
+MODULE_DESCRIPTION("Vhost Vdmabuf Driver");
+MODULE_LICENSE("GPL and additional rights");
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index c998860d7bbc..87e0d1b8cc76 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -150,4 +150,7 @@
 /* Get the valid iova range */
 #define VHOST_VDPA_GET_IOVA_RANGE	_IOR(VHOST_VIRTIO, 0x78, \
 					     struct vhost_vdpa_iova_range)
+
+/* VHOST_VDMABUF specific defines */
+#define VHOST_VDMABUF_SET_RUNNING       _IOW(VHOST_VIRTIO, 0x80, int)
 #endif
-- 
2.26.2

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [RFC v3 2/3] virtio: Introduce Vdmabuf driver
  2021-02-03  7:35   ` Vivek Kasireddy
  (?)
  (?)
@ 2021-02-03  9:15   ` kernel test robot
  -1 siblings, 0 replies; 57+ messages in thread
From: kernel test robot @ 2021-02-03  9:15 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 4043 bytes --]

Hi Vivek,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on vhost/linux-next]
[also build test ERROR on tegra-drm/drm/tegra/for-next linus/master v5.11-rc6 next-20210125]
[cannot apply to kvm/linux-next drm/drm-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Vivek-Kasireddy/Introduce-Virtio-based-Dmabuf-driver/20210203-154729
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: nds32-randconfig-r033-20210202 (attached as .config)
compiler: nds32le-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/940514813aa8ac779639e6d0e447ca361daf4555
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Vivek-Kasireddy/Introduce-Virtio-based-Dmabuf-driver/20210203-154729
        git checkout 940514813aa8ac779639e6d0e447ca361daf4555
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=nds32 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> error: include/uapi/linux/virtio_vdmabuf.h: missing "WITH Linux-syscall-note" for SPDX-License-Identifier
--
>> error: include/uapi/linux/virtio_vdmabuf.h: missing "WITH Linux-syscall-note" for SPDX-License-Identifier
   make[2]: *** [scripts/Makefile.headersinst:63: usr/include/linux/virtio_vdmabuf.h] Error 1
   make[2]: Target '__headers' not remade because of errors.
   make[1]: *** [Makefile:1294: headers] Error 2
   make[1]: Target 'headers_install' not remade because of errors.
   make: *** [Makefile:185: __sub-make] Error 2
   make: Target 'headers_install' not remade because of errors.
--
>> error: include/uapi/linux/virtio_vdmabuf.h: missing "WITH Linux-syscall-note" for SPDX-License-Identifier
   make[2]: *** [scripts/Makefile.headersinst:63: usr/include/linux/virtio_vdmabuf.h] Error 1
   make[2]: Target '__headers' not remade because of errors.
   make[1]: *** [Makefile:1294: headers] Error 2
   <stdin>:1511:2: warning: #warning syscall clone3 not implemented [-Wcpp]
   arch/nds32/kernel/vdso/gettimeofday.c:158:13: warning: no previous prototype for '__vdso_clock_gettime' [-Wmissing-prototypes]
     158 | notrace int __vdso_clock_gettime(clockid_t clkid, struct __kernel_old_timespec *ts)
         |             ^~~~~~~~~~~~~~~~~~~~
   arch/nds32/kernel/vdso/gettimeofday.c:206:13: warning: no previous prototype for '__vdso_clock_getres' [-Wmissing-prototypes]
     206 | notrace int __vdso_clock_getres(clockid_t clk_id, struct __kernel_old_timespec *res)
         |             ^~~~~~~~~~~~~~~~~~~
   arch/nds32/kernel/vdso/gettimeofday.c:246:13: warning: no previous prototype for '__vdso_gettimeofday' [-Wmissing-prototypes]
     246 | notrace int __vdso_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz)
         |             ^~~~~~~~~~~~~~~~~~~
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [Makefile:185: __sub-make] Error 2
   make: Target 'prepare' not remade because of errors.
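
The SPDX failure above refers to the license tag of the new uapi header;
exported (uapi) headers need the Linux-syscall-note exception. Assuming
the header keeps the driver's (MIT OR GPL-2.0) dual license, the tag
would presumably look something like:

    /* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR MIT) */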

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for FRAME_POINTER
   Depends on DEBUG_KERNEL && (M68K || UML || SUPERH) || ARCH_WANT_FRAME_POINTERS
   Selected by
   - FAULT_INJECTION_STACKTRACE_FILTER && FAULT_INJECTION_DEBUG_FS && STACKTRACE_SUPPORT && !X86_64 && !MIPS && !PPC && !S390 && !MICROBLAZE && !ARM && !ARC && !X86

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 33174 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC v3 2/3] virtio: Introduce Vdmabuf driver
  2021-02-03  7:35   ` Vivek Kasireddy
  (?)
@ 2021-02-05 16:03     ` Daniel Vetter
  -1 siblings, 0 replies; 57+ messages in thread
From: Daniel Vetter @ 2021-02-05 16:03 UTC (permalink / raw)
  To: Vivek Kasireddy, Gerd Hoffmann
  Cc: virtualization, dri-devel, kraxel, daniel.vetter, daniel.vetter,
	dongwon.kim, sumit.semwal, christian.koenig, linux-media

On Tue, Feb 02, 2021 at 11:35:16PM -0800, Vivek Kasireddy wrote:
> This driver "transfers" a dmabuf created on the Guest to the Host.
> A common use-case for such a transfer includes sharing the scanout
> buffer created by a display server or a compositor running in the
> Guest with Qemu UI -- running on the Host.
> 
> The "transfer" is accomplished by sharing the PFNs of all the pages
> associated with the dmabuf and having a new dmabuf created on the
> > Host that is backed by the pages mapped from the Guest.
> 
> Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> ---
>  drivers/virtio/Kconfig              |    8 +
>  drivers/virtio/Makefile             |    1 +
>  drivers/virtio/virtio_vdmabuf.c     | 1090 +++++++++++++++++++++++++++
>  include/linux/virtio_vdmabuf.h      |  271 +++++++
>  include/uapi/linux/virtio_ids.h     |    1 +
>  include/uapi/linux/virtio_vdmabuf.h |   99 +++
>  6 files changed, 1470 insertions(+)
>  create mode 100644 drivers/virtio/virtio_vdmabuf.c
>  create mode 100644 include/linux/virtio_vdmabuf.h
>  create mode 100644 include/uapi/linux/virtio_vdmabuf.h
> 
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 7b41130d3f35..e563c12f711e 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -139,4 +139,12 @@ config VIRTIO_DMA_SHARED_BUFFER
>  	 This option adds a flavor of dma buffers that are backed by
>  	 virtio resources.
>  
> +config VIRTIO_VDMABUF
> +	bool "Enables Vdmabuf driver in guest os"
> +	default n
> +	depends on VIRTIO
> +	help
> +	 This driver provides a way to share the dmabufs created in
> +	 the Guest with the Host.
> +
>  endif # VIRTIO_MENU
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 591e6f72aa54..b4bb0738009c 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -9,3 +9,4 @@ obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
>  obj-$(CONFIG_VIRTIO_VDPA) += virtio_vdpa.o
>  obj-$(CONFIG_VIRTIO_MEM) += virtio_mem.o
>  obj-$(CONFIG_VIRTIO_DMA_SHARED_BUFFER) += virtio_dma_buf.o
> +obj-$(CONFIG_VIRTIO_VDMABUF) += virtio_vdmabuf.o
> diff --git a/drivers/virtio/virtio_vdmabuf.c b/drivers/virtio/virtio_vdmabuf.c
> new file mode 100644
> index 000000000000..c28f144eb126
> --- /dev/null
> +++ b/drivers/virtio/virtio_vdmabuf.c
> @@ -0,0 +1,1090 @@
> +// SPDX-License-Identifier: (MIT OR GPL-2.0)
> +
> +/*
> + * Copyright © 2021 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + * Authors:
> + *    Dongwon Kim <dongwon.kim@intel.com>
> + *    Mateusz Polrola <mateusz.polrola@gmail.com>
> + *    Vivek Kasireddy <vivek.kasireddy@intel.com>
> + */
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/errno.h>
> +#include <linux/module.h>
> +#include <linux/device.h>
> +#include <linux/uaccess.h>
> +#include <linux/miscdevice.h>
> +#include <linux/delay.h>
> +#include <linux/random.h>
> +#include <linux/poll.h>
> +#include <linux/spinlock.h>
> +#include <linux/dma-buf.h>
> +#include <linux/virtio.h>
> +#include <linux/virtio_ids.h>
> +#include <linux/virtio_config.h>
> +#include <linux/virtio_vdmabuf.h>
> +
> +#define VIRTIO_VDMABUF_MAX_ID INT_MAX
> +#define REFS_PER_PAGE (PAGE_SIZE/sizeof(long))
> +#define NEW_BUF_ID_GEN(vmid, cnt) (((vmid & 0xFFFFFFFF) << 32) | \
> +				    ((cnt) & 0xFFFFFFFF))
> +
> +/* one global drv object */
> +static struct virtio_vdmabuf_info *drv_info;
> +
> +struct virtio_vdmabuf {
> +	/* virtio device structure */
> +	struct virtio_device *vdev;
> +
> +	/* virtual queue array */
> +	struct virtqueue *vqs[VDMABUF_VQ_MAX];
> +
> +	/* ID of guest OS */
> +	u64 vmid;
> +
> +	/* spin lock that needs to be acquired before accessing
> +	 * virtual queue
> +	 */
> +	spinlock_t vq_lock;
> +	struct mutex recv_lock;
> +	struct mutex send_lock;
> +
> +	struct list_head msg_list;
> +
> +	/* workqueue */
> +	struct workqueue_struct *wq;
> +	struct work_struct recv_work;
> +	struct work_struct send_work;
> +	struct work_struct send_msg_work;
> +
> +	struct virtio_vdmabuf_event_queue *evq;
> +};
> +
> +static virtio_vdmabuf_buf_id_t get_buf_id(struct virtio_vdmabuf *vdmabuf)
> +{
> +	virtio_vdmabuf_buf_id_t buf_id = {0, {0, 0} };
> +	static int count = 0;
> +
> +	count = count < VIRTIO_VDMABUF_MAX_ID ? count + 1 : 0;
> +	buf_id.id = NEW_BUF_ID_GEN(vdmabuf->vmid, count);
> +
> +	/* random data embedded in the id for security */
> +	get_random_bytes(&buf_id.rng_key[0], 8);
> +
> +	return buf_id;
> +}
> +
> +/* sharing pages for original DMABUF with Host */
> +static struct virtio_vdmabuf_shared_pages
> +*virtio_vdmabuf_share_buf(struct page **pages, int nents,
> +			  int first_ofst, int last_len)
> +{
> +	struct virtio_vdmabuf_shared_pages *pages_info;
> +	int i;
> +	int n_l2refs = nents/REFS_PER_PAGE +
> +		       ((nents % REFS_PER_PAGE) ? 1 : 0);
> +
> +	pages_info = kvcalloc(1, sizeof(*pages_info), GFP_KERNEL);
> +	if (!pages_info)
> +		return NULL;
> +
> +	pages_info->pages = pages;
> +	pages_info->nents = nents;
> +	pages_info->first_ofst = first_ofst;
> +	pages_info->last_len = last_len;
> +	pages_info->l3refs = (gpa_t *)__get_free_page(GFP_KERNEL);
> +
> +	if (!pages_info->l3refs) {
> +		kvfree(pages_info);
> +		return NULL;
> +	}
> +
> +	pages_info->l2refs = (gpa_t **)__get_free_pages(GFP_KERNEL,
> +					get_order(n_l2refs * PAGE_SIZE));
> +
> +	if (!pages_info->l2refs) {
> +		free_page((gpa_t)pages_info->l3refs);
> +		kvfree(pages_info);
> +		return NULL;
> +	}
> +
> +	/* Share physical address of pages */
> +	for (i = 0; i < nents; i++)
> +		pages_info->l2refs[i] = (gpa_t *)page_to_phys(pages[i]);
> +
> +	for (i = 0; i < n_l2refs; i++)
> +		pages_info->l3refs[i] =
> +			virt_to_phys((void *)pages_info->l2refs +
> +				     i * PAGE_SIZE);
> +
> +	pages_info->ref = (gpa_t)virt_to_phys(pages_info->l3refs);
> +
> +	return pages_info;
> +}
> +
> +/* stop sharing pages */
> +static void
> +virtio_vdmabuf_free_buf(struct virtio_vdmabuf_shared_pages *pages_info)
> +{
> +	int n_l2refs = (pages_info->nents/REFS_PER_PAGE +
> +		       ((pages_info->nents % REFS_PER_PAGE) ? 1 : 0));
> +
> +	free_pages((gpa_t)pages_info->l2refs, get_order(n_l2refs * PAGE_SIZE));
> +	free_page((gpa_t)pages_info->l3refs);
> +
> +	kvfree(pages_info);
> +}
> +
> +static int send_msg_to_host(enum virtio_vdmabuf_cmd cmd, int *op)
> +{
> +	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
> +	struct virtio_vdmabuf_msg *msg;
> +	int i;
> +
> +	switch (cmd) {
> +	case VIRTIO_VDMABUF_CMD_NEED_VMID:
> +		msg = kvcalloc(1, sizeof(struct virtio_vdmabuf_msg),
> +			       GFP_KERNEL);
> +		if (!msg)
> +			return -ENOMEM;
> +
> +		if (op)
> +			for (i = 0; i < 4; i++)
> +				msg->op[i] = op[i];
> +		break;
> +
> +	case VIRTIO_VDMABUF_CMD_EXPORT:
> +		msg = kvcalloc(1, sizeof(struct virtio_vdmabuf_msg),
> +			       GFP_KERNEL);
> +		if (!msg)
> +			return -ENOMEM;
> +
> +		memcpy(&msg->op[0], &op[0], 9 * sizeof(int) + op[9]);
> +		break;
> +
> +	default:
> +		/* no command found */
> +		return -EINVAL;
> +	}
> +
> +	msg->cmd = cmd;
> +	list_add_tail(&msg->list, &vdmabuf->msg_list);
> +	queue_work(vdmabuf->wq, &vdmabuf->send_msg_work);
> +
> +	return 0;
> +}
> +
> +static int add_event_buf_rel(struct virtio_vdmabuf_buf *buf_info)
> +{
> +	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
> +	struct virtio_vdmabuf_event *e_oldest, *e_new;
> +	struct virtio_vdmabuf_event_queue *eq = vdmabuf->evq;
> +	unsigned long irqflags;
> +
> +	e_new = kvzalloc(sizeof(*e_new), GFP_KERNEL);
> +	if (!e_new)
> +		return -ENOMEM;
> +
> +	e_new->e_data.hdr.buf_id = buf_info->buf_id;
> +	e_new->e_data.data = (void *)buf_info->priv;
> +	e_new->e_data.hdr.size = buf_info->sz_priv;
> +
> +	spin_lock_irqsave(&eq->e_lock, irqflags);
> +
> +	/* check current number of events and if it hits the max num (32)
> +	 * then remove the oldest event in the list
> +	 */
> +	if (eq->pending > 31) {
> +		e_oldest = list_first_entry(&eq->e_list,
> +					    struct virtio_vdmabuf_event, link);
> +		list_del(&e_oldest->link);
> +		eq->pending--;
> +		kvfree(e_oldest);
> +	}
> +
> +	list_add_tail(&e_new->link, &eq->e_list);
> +
> +	eq->pending++;
> +
> +	wake_up_interruptible(&eq->e_wait);
> +	spin_unlock_irqrestore(&eq->e_lock, irqflags);
> +
> +	return 0;
> +}
> +
> +static void virtio_vdmabuf_clear_buf(struct virtio_vdmabuf_buf *exp)
> +{
> +	/* Start cleanup of buffer in reverse order to exporting */
> +	virtio_vdmabuf_free_buf(exp->pages_info);
> +
> +	dma_buf_unmap_attachment(exp->attach, exp->sgt,
> +				 DMA_BIDIRECTIONAL);
> +
> +	if (exp->dma_buf) {
> +		dma_buf_detach(exp->dma_buf, exp->attach);
> +		/* close connection to dma-buf completely */
> +		dma_buf_put(exp->dma_buf);
> +		exp->dma_buf = NULL;
> +	}
> +}
> +
> +static int remove_buf(struct virtio_vdmabuf *vdmabuf,
> +		      struct virtio_vdmabuf_buf *exp)
> +{
> +	int ret;
> +
> +	ret = add_event_buf_rel(exp);
> +	if (ret)
> +		return ret;
> +
> +	virtio_vdmabuf_clear_buf(exp);
> +
> +	ret = virtio_vdmabuf_del_buf(drv_info, &exp->buf_id);
> +	if (ret)
> +		return ret;
> +
> +	if (exp->sz_priv > 0 && exp->priv)
> +		kvfree(exp->priv);
> +
> +	kvfree(exp);
> +	return 0;
> +}
> +
> +static int parse_msg_from_host(struct virtio_vdmabuf *vdmabuf,
> +		     	       struct virtio_vdmabuf_msg *msg)
> +{
> +	struct virtio_vdmabuf_buf *exp;
> +	virtio_vdmabuf_buf_id_t buf_id;
> +	int ret;
> +
> +	switch (msg->cmd) {
> +	case VIRTIO_VDMABUF_CMD_NEED_VMID:
> +		vdmabuf->vmid = msg->op[0];
> +
> +		break;
> +	case VIRTIO_VDMABUF_CMD_DMABUF_REL:
> +		memcpy(&buf_id, msg->op, sizeof(buf_id));
> +
> +		exp = virtio_vdmabuf_find_buf(drv_info, &buf_id);
> +		if (!exp) {
> +			dev_err(drv_info->dev, "can't find buffer\n");
> +			return -EINVAL;
> +		}
> +
> +		ret = remove_buf(vdmabuf, exp);
> +		if (ret)
> +			return ret;
> +
> +		break;
> +	case VIRTIO_VDMABUF_CMD_EXPORT:
> +		break;
> +	default:
> +		dev_err(drv_info->dev, "empty cmd\n");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static void virtio_vdmabuf_recv_work(struct work_struct *work)
> +{
> +	struct virtio_vdmabuf *vdmabuf =
> +		container_of(work, struct virtio_vdmabuf, recv_work);
> +	struct virtqueue *vq = vdmabuf->vqs[VDMABUF_VQ_RECV];
> +	struct virtio_vdmabuf_msg *msg;
> +	int sz;
> +
> +	mutex_lock(&vdmabuf->recv_lock);
> +
> +	do {
> +		virtqueue_disable_cb(vq);
> +		for (;;) {
> +			msg = virtqueue_get_buf(vq, &sz);
> +			if (!msg)
> +				break;
> +
> +			/* valid size */
> +			if (sz == sizeof(struct virtio_vdmabuf_msg)) {
> +				if (parse_msg_from_host(vdmabuf, msg))
> +					dev_err(drv_info->dev,
> +						"msg parse error\n");
> +
> +				kvfree(msg);
> +			} else {
> +				dev_err(drv_info->dev,
> +					"received malformed message\n");
> +			}
> +		}
> +	} while (!virtqueue_enable_cb(vq));
> +
> +	mutex_unlock(&vdmabuf->recv_lock);
> +}
> +
> +static void virtio_vdmabuf_fill_recv_msg(struct virtio_vdmabuf *vdmabuf)
> +{
> +	struct virtqueue *vq = vdmabuf->vqs[VDMABUF_VQ_RECV];
> +	struct scatterlist sg;
> +	struct virtio_vdmabuf_msg *msg;
> +	int ret;
> +
> +	msg = kvzalloc(sizeof(*msg), GFP_KERNEL);
> +	if (!msg)
> +		return;
> +
> +	sg_init_one(&sg, msg, sizeof(struct virtio_vdmabuf_msg));
> +	ret = virtqueue_add_inbuf(vq, &sg, 1, msg, GFP_KERNEL);
> +	if (ret)
> +		return;
> +
> +	virtqueue_kick(vq);
> +}
> +
> +static void virtio_vdmabuf_send_msg_work(struct work_struct *work)
> +{
> +	struct virtio_vdmabuf *vdmabuf =
> +		container_of(work, struct virtio_vdmabuf, send_msg_work);
> +	struct virtqueue *vq = vdmabuf->vqs[VDMABUF_VQ_SEND];
> +	struct scatterlist sg;
> +	struct virtio_vdmabuf_msg *msg;
> +	bool added = false;
> +	int ret;
> +
> +	mutex_lock(&vdmabuf->send_lock);
> +
> +	for (;;) {
> +		if (list_empty(&vdmabuf->msg_list))
> +			break;
> +
> +		virtio_vdmabuf_fill_recv_msg(vdmabuf);
> +
> +		msg = list_first_entry(&vdmabuf->msg_list,
> +				       struct virtio_vdmabuf_msg, list);
> +		list_del_init(&msg->list);
> +
> +		sg_init_one(&sg, msg, sizeof(struct virtio_vdmabuf_msg));
> +		ret = virtqueue_add_outbuf(vq, &sg, 1, msg, GFP_KERNEL);
> +		if (ret < 0) {
> +			dev_err(drv_info->dev,
> +				"failed to add msg to vq\n");
> +			break;
> +		}
> +
> +		added = true;
> +	}
> +
> +	if (added)
> +		virtqueue_kick(vq);
> +
> +	mutex_unlock(&vdmabuf->send_lock);
> +}
> +
> +static void virtio_vdmabuf_send_work(struct work_struct *work)
> +{
> +	struct virtio_vdmabuf *vdmabuf =
> +		container_of(work, struct virtio_vdmabuf, send_work);
> +	struct virtqueue *vq = vdmabuf->vqs[VDMABUF_VQ_SEND];
> +	struct virtio_vdmabuf_msg *msg;
> +	unsigned int sz;
> +	bool added = false;
> +
> +	mutex_lock(&vdmabuf->send_lock);
> +
> +	do {
> +		virtqueue_disable_cb(vq);
> +
> +		for (;;) {
> +			msg = virtqueue_get_buf(vq, &sz);
> +			if (!msg)
> +				break;
> +
> +			if (parse_msg_from_host(vdmabuf, msg))
> +				dev_err(drv_info->dev,
> +					"msg parse error\n");
> +
> +			kvfree(msg);
> +			added = true;
> +		}
> +	} while (!virtqueue_enable_cb(vq));
> +
> +	mutex_unlock(&vdmabuf->send_lock);
> +
> +	if (added)
> +		queue_work(vdmabuf->wq, &vdmabuf->send_msg_work);
> +}
> +
> +static void virtio_vdmabuf_recv_cb(struct virtqueue *vq)
> +{
> +	struct virtio_vdmabuf *vdmabuf = vq->vdev->priv;
> +
> +	if (!vdmabuf)
> +		return;
> +
> +	queue_work(vdmabuf->wq, &vdmabuf->recv_work);
> +}
> +
> +static void virtio_vdmabuf_send_cb(struct virtqueue *vq)
> +{
> +	struct virtio_vdmabuf *vdmabuf = vq->vdev->priv;
> +
> +	if (!vdmabuf)
> +		return;
> +
> +	queue_work(vdmabuf->wq, &vdmabuf->send_work);
> +}
> +
> +static int remove_all_bufs(struct virtio_vdmabuf *vdmabuf)
> +{
> +	struct virtio_vdmabuf_buf *found;
> +	struct hlist_node *tmp;
> +	int bkt;
> +	int ret;
> +
> +	hash_for_each_safe(drv_info->buf_list, bkt, tmp, found, node) {
> +		ret = remove_buf(vdmabuf, found);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static int virtio_vdmabuf_open(struct inode *inode, struct file *filp)
> +{
> +	int ret;
> +
> +	if (!drv_info) {
> +		pr_err("virtio vdmabuf driver is not ready\n");
> +		return -EINVAL;
> +	}
> +
> +	ret = send_msg_to_host(VIRTIO_VDMABUF_CMD_NEED_VMID, 0);
> +	if (ret < 0)
> +		dev_err(drv_info->dev, "fail to receive vmid\n");
> +
> +	filp->private_data = drv_info->priv;
> +
> +	return 0;
> +}
> +
> +static int virtio_vdmabuf_release(struct inode *inode, struct file *filp)
> +{
> +	return 0;
> +}
> +
> +/* Notify Host about the new vdmabuf */
> +static int export_notify(struct virtio_vdmabuf_buf *exp, struct page **pages)
> +{
> +	int *op;
> +	int ret;
> +
> +	op = kvcalloc(1, sizeof(int) * 65, GFP_KERNEL);
> +	if (!op)
> +		return -ENOMEM;
> +
> +	memcpy(op, &exp->buf_id, sizeof(exp->buf_id));
> +
> +	/* if new pages are to be shared */
> +	if (pages) {
> +		op[4] = exp->pages_info->nents;
> +		op[5] = exp->pages_info->first_ofst;
> +		op[6] = exp->pages_info->last_len;
> +
> +		memcpy(&op[7], &exp->pages_info->ref, sizeof(gpa_t));
> +	}
> +
> +	op[9] = exp->sz_priv;
> +
> +	/* driver/application specific private info */
> +	memcpy(&op[10], exp->priv, op[9]);
> +
> +	ret = send_msg_to_host(VIRTIO_VDMABUF_CMD_EXPORT, op);
> +
> +	kvfree(op);
> +	return ret;
> +}
> +
> +/* return total number of pages referenced by a sgt
> + * for pre-calculation of # of pages behind a given sgt
> + */
> +static int num_pgs(struct sg_table *sgt)
> +{
> +	struct scatterlist *sgl;
> +	int len, i;
> +	/* at least one page */
> +	int n_pgs = 1;
> +
> +	sgl = sgt->sgl;
> +
> +	len = sgl->length - PAGE_SIZE + sgl->offset;
> +
> +	/* round-up */
> +	n_pgs += ((len + PAGE_SIZE - 1)/PAGE_SIZE);
> +
> +	for (i = 1; i < sgt->nents; i++) {
> +		sgl = sg_next(sgl);
> +
> +		/* round-up */
> +		n_pgs += ((sgl->length + PAGE_SIZE - 1) /
> +			  PAGE_SIZE);
> +	}
> +
> +	return n_pgs;
> +}
> +
> +/* extract pages referenced by sgt */
> +static struct page **extr_pgs(struct sg_table *sgt, int *nents, int *last_len)

Nack, this doesn't work on dma-buf. And it'll blow up at runtime when you
enable the very recently merged CONFIG_DMABUF_DEBUG (would be good to test
with that, just to make sure).

Aside from this, for virtio/kvm use-cases we've already merged the udmabuf
driver. Does this not work for your usecase?
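
For reference, the udmabuf path boils down to roughly the following on
the host/QEMU side (just a sketch; "memfd" and "buf_size" are
placeholders, the offset/size must be page-aligned, and the memfd has
to be sealed with F_SEAL_SHRINK):

	#include <fcntl.h>
	#include <sys/ioctl.h>
	#include <linux/udmabuf.h>

	int devfd = open("/dev/udmabuf", O_RDWR | O_CLOEXEC);
	struct udmabuf_create create = {
		.memfd  = memfd,	/* fd of the memory backing the buffer */
		.flags  = UDMABUF_FLAGS_CLOEXEC,
		.offset = 0,		/* page-aligned offset into the memfd */
		.size   = buf_size,	/* page-aligned size of the buffer */
	};
	int dmabuf_fd = ioctl(devfd, UDMABUF_CREATE, &create);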

Adding Gerd as the subject expert for this area.

Thanks, Daniel

> +{
> +	struct scatterlist *sgl;
> +	struct page **pages;
> +	struct page **temp_pgs;
> +	int i, j;
> +	int len;
> +
> +	*nents = num_pgs(sgt);
> +	pages =	kvmalloc_array(*nents, sizeof(struct page *), GFP_KERNEL);
> +	if (!pages)
> +		return NULL;
> +
> +	sgl = sgt->sgl;
> +
> +	temp_pgs = pages;
> +	*temp_pgs++ = sg_page(sgl);
> +	len = sgl->length - PAGE_SIZE + sgl->offset;
> +
> +	i = 1;
> +	while (len > 0) {
> +		*temp_pgs++ = nth_page(sg_page(sgl), i++);
> +		len -= PAGE_SIZE;
> +	}
> +
> +	for (i = 1; i < sgt->nents; i++) {
> +		sgl = sg_next(sgl);
> +		*temp_pgs++ = sg_page(sgl);
> +		len = sgl->length - PAGE_SIZE;
> +		j = 1;
> +
> +		while (len > 0) {
> +			*temp_pgs++ = nth_page(sg_page(sgl), j++);
> +			len -= PAGE_SIZE;
> +		}
> +	}
> +
> +	*last_len = len + PAGE_SIZE;
> +
> +	return pages;
> +}
> +
> +/* ioctl - exporting new vdmabuf
> + *
> + *	 int dmabuf_fd - File handle of original DMABUF
> + *	 virtio_vdmabuf_buf_id_t buf_id - returned vdmabuf ID
> + *	 int sz_priv - size of private data from userspace
> + *	 char *priv - buffer of user private data
> + *
> + */
> +static int export_ioctl(struct file *filp, void *data)
> +{
> +	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
> +	struct virtio_vdmabuf_export *attr = data;
> +	struct dma_buf *dmabuf;
> +	struct dma_buf_attachment *attach;
> +	struct sg_table *sgt;
> +	struct virtio_vdmabuf_buf *exp;
> +	struct page **pages;
> +	int nents, last_len;
> +	virtio_vdmabuf_buf_id_t buf_id;
> +	int ret = 0;
> +
> +	if (vdmabuf->vmid <= 0)
> +		return -EINVAL;
> +
> +	dmabuf = dma_buf_get(attr->fd);
> +	if (IS_ERR(dmabuf))
> +		return PTR_ERR(dmabuf);
> +
> +	mutex_lock(&drv_info->g_mutex);
> +
> +	buf_id = get_buf_id(vdmabuf);
> +
> +	attach = dma_buf_attach(dmabuf, drv_info->dev);
> +	if (IS_ERR(attach)) {
> +		ret = PTR_ERR(attach);
> +		goto fail_attach;
> +	}
> +
> +	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
> +	if (IS_ERR(sgt)) {
> +		ret = PTR_ERR(sgt);
> +		goto fail_map_attachment;
> +	}
> +
> +	/* allocate a new exp */
> +	exp = kvcalloc(1, sizeof(*exp), GFP_KERNEL);
> +	if (!exp) {
> +		ret = -ENOMEM;
> +		goto fail_sgt_info_creation;
> +	}
> +
> +	/* possible truncation */
> +	if (attr->sz_priv > MAX_SIZE_PRIV_DATA)
> +		exp->sz_priv = MAX_SIZE_PRIV_DATA;
> +	else
> +		exp->sz_priv = attr->sz_priv;
> +
> +	/* creating buffer for private data */
> +	if (exp->sz_priv != 0) {
> +		exp->priv = kvcalloc(1, exp->sz_priv, GFP_KERNEL);
> +		if (!exp->priv) {
> +			ret = -ENOMEM;
> +			goto fail_priv_creation;
> +		}
> +	}
> +
> +	exp->buf_id = buf_id;
> +	exp->attach = attach;
> +	exp->sgt = sgt;
> +	exp->dma_buf = dmabuf;
> +	exp->valid = 1;
> +
> +	if (exp->sz_priv) {
> +		/* copy private data to sgt_info */
> +		ret = copy_from_user(exp->priv, attr->priv, exp->sz_priv);
> +		if (ret) {
> +			ret = -EINVAL;
> +			goto fail_exp;
> +		}
> +	}
> +
> +	pages = extr_pgs(sgt, &nents, &last_len);
> +	if (pages == NULL) {
> +		ret = -ENOMEM;
> +		goto fail_exp;
> +	}
> +
> +	exp->pages_info = virtio_vdmabuf_share_buf(pages, nents,
> +						   sgt->sgl->offset,
> +					 	   last_len);
> +	if (!exp->pages_info) {
> +		ret = -ENOMEM;
> +		goto fail_create_pages_info;
> +	}
> +
> +	attr->buf_id = exp->buf_id;
> +	ret = export_notify(exp, pages);
> +	if (ret < 0)
> +		goto fail_send_request;
> +
> +	/* now register it to the export list */
> +	virtio_vdmabuf_add_buf(drv_info, exp);
> +
> +	exp->filp = filp;
> +
> +	mutex_unlock(&drv_info->g_mutex);
> +
> +	return ret;
> +
> +/* Clean-up if error occurs */
> +fail_send_request:
> +	virtio_vdmabuf_free_buf(exp->pages_info);
> +
> +fail_create_pages_info:
> +	kvfree(pages);
> +
> +fail_exp:
> +	kvfree(exp->priv);
> +
> +fail_priv_creation:
> +	kvfree(exp);
> +
> +fail_sgt_info_creation:
> +	dma_buf_unmap_attachment(attach, sgt,
> +				 DMA_BIDIRECTIONAL);
> +
> +fail_map_attachment:
> +	dma_buf_detach(dmabuf, attach);
> +
> +fail_attach:
> +	dma_buf_put(dmabuf);
> +
> +	mutex_unlock(&drv_info->g_mutex);
> +
> +	return ret;
> +}
> +
> +static const struct virtio_vdmabuf_ioctl_desc virtio_vdmabuf_ioctls[] = {
> +	VIRTIO_VDMABUF_IOCTL_DEF(VIRTIO_VDMABUF_IOCTL_EXPORT, export_ioctl, 0),
> +};
> +
> +static long virtio_vdmabuf_ioctl(struct file *filp, unsigned int cmd,
> +		       		 unsigned long param)
> +{
> +	const struct virtio_vdmabuf_ioctl_desc *ioctl = NULL;
> +	unsigned int nr = _IOC_NR(cmd);
> +	int ret;
> +	virtio_vdmabuf_ioctl_t func;
> +	char *kdata;
> +
> +	if (nr >= ARRAY_SIZE(virtio_vdmabuf_ioctls)) {
> +		dev_err(drv_info->dev, "invalid ioctl\n");
> +		return -EINVAL;
> +	}
> +
> +	ioctl = &virtio_vdmabuf_ioctls[nr];
> +
> +	func = ioctl->func;
> +
> +	if (unlikely(!func)) {
> +		dev_err(drv_info->dev, "no function\n");
> +		return -EINVAL;
> +	}
> +
> +	kdata = kvmalloc(_IOC_SIZE(cmd), GFP_KERNEL);
> +	if (!kdata)
> +		return -ENOMEM;
> +
> +	if (copy_from_user(kdata, (void __user *)param,
> +			   _IOC_SIZE(cmd)) != 0) {
> +		dev_err(drv_info->dev,
> +			"failed to copy from user arguments\n");
> +		ret = -EFAULT;
> +		goto ioctl_error;
> +	}
> +
> +	ret = func(filp, kdata);
> +
> +	if (copy_to_user((void __user *)param, kdata,
> +			 _IOC_SIZE(cmd)) != 0) {
> +		dev_err(drv_info->dev,
> +			"failed to copy to user arguments\n");
> +		ret = -EFAULT;
> +		goto ioctl_error;
> +	}
> +
> +ioctl_error:
> +	kvfree(kdata);
> +	return ret;
> +}
> +
> +static unsigned int virtio_vdmabuf_event_poll(struct file *filp,
> +			    	    	      struct poll_table_struct *wait)
> +{
> +	struct virtio_vdmabuf *vdmabuf = filp->private_data;
> +
> +	poll_wait(filp, &vdmabuf->evq->e_wait, wait);
> +
> +	if (!list_empty(&vdmabuf->evq->e_list))
> +		return POLLIN | POLLRDNORM;
> +
> +	return 0;
> +}
> +
> +static ssize_t virtio_vdmabuf_event_read(struct file *filp, char __user *buf,
> +			       		 size_t cnt, loff_t *ofst)
> +{
> +	struct virtio_vdmabuf *vdmabuf = filp->private_data;
> +	int ret;
> +
> +	/* make sure user buffer can be written */
> +	if (!access_ok(buf, sizeof (*buf))) {
> +		dev_err(drv_info->dev, "user buffer can't be written.\n");
> +		return -EINVAL;
> +	}
> +
> +	ret = mutex_lock_interruptible(&vdmabuf->evq->e_readlock);
> +	if (ret)
> +		return ret;
> +
> +	for (;;) {
> +		struct virtio_vdmabuf_event *e = NULL;
> +
> +		spin_lock_irq(&vdmabuf->evq->e_lock);
> +		if (!list_empty(&vdmabuf->evq->e_list)) {
> +			e = list_first_entry(&vdmabuf->evq->e_list,
> +					     struct virtio_vdmabuf_event, link);
> +			list_del(&e->link);
> +		}
> +		spin_unlock_irq(&vdmabuf->evq->e_lock);
> +
> +		if (!e) {
> +			if (ret)
> +				break;
> +
> +			if (filp->f_flags & O_NONBLOCK) {
> +				ret = -EAGAIN;
> +				break;
> +			}
> +
> +			mutex_unlock(&vdmabuf->evq->e_readlock);
> +			ret = wait_event_interruptible(vdmabuf->evq->e_wait,
> +					!list_empty(&vdmabuf->evq->e_list));
> +
> +			if (ret == 0)
> +				ret = mutex_lock_interruptible(
> +						&vdmabuf->evq->e_readlock);
> +
> +			if (ret)
> +				return ret;
> +		} else {
> +			unsigned int len = (sizeof(e->e_data.hdr) +
> +					    e->e_data.hdr.size);
> +
> +			if (len > cnt - ret) {
> +put_back_event:
> +				spin_lock_irq(&vdmabuf->evq->e_lock);
> +				list_add(&e->link, &vdmabuf->evq->e_list);
> +				spin_unlock_irq(&vdmabuf->evq->e_lock);
> +				break;
> +			}
> +
> +			if (copy_to_user(buf + ret, &e->e_data.hdr,
> +					 sizeof(e->e_data.hdr))) {
> +				if (ret == 0)
> +					ret = -EFAULT;
> +
> +				goto put_back_event;
> +			}
> +
> +			ret += sizeof(e->e_data.hdr);
> +
> +			if (copy_to_user(buf + ret, e->e_data.data,
> +					 e->e_data.hdr.size)) {
> +				/* error while copying void *data */
> +
> +				struct virtio_vdmabuf_e_hdr dummy_hdr = {0};
> +
> +				ret -= sizeof(e->e_data.hdr);
> +
> +				/* nullifying hdr of the event in user buffer */
> +				if (copy_to_user(buf + ret, &dummy_hdr,
> +						 sizeof(dummy_hdr)))
> +					dev_err(drv_info->dev,
> +					   "fail to nullify invalid hdr\n");
> +
> +				ret = -EFAULT;
> +
> +				goto put_back_event;
> +			}
> +
> +			ret += e->e_data.hdr.size;
> +			vdmabuf->evq->pending--;
> +			kvfree(e);
> +		}
> +	}
> +
> +	mutex_unlock(&vdmabuf->evq->e_readlock);
> +
> +	return ret;
> +}
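
Not part of the patch, but to make the event format above concrete: a rough
sketch of how a guest-side exporter could consume these release events from
userspace. Note the event has to be read in a single read() large enough for
the header plus the private data, otherwise the driver puts the event back
and returns 0. The function name and error handling are made up for
illustration.

  #include <poll.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <linux/virtio_vdmabuf.h>

  static int wait_for_release(int vdmabuf_fd)
  {
  	struct pollfd pfd = { .fd = vdmabuf_fd, .events = POLLIN };
  	char ev[sizeof(struct virtio_vdmabuf_e_hdr) + MAX_SIZE_PRIV_DATA];
  	struct virtio_vdmabuf_e_hdr hdr;
  	ssize_t n;

  	/* block until the driver queues a release event */
  	if (poll(&pfd, 1, -1) <= 0)
  		return -1;

  	/* one read returns the header followed by hdr.size bytes of the
  	 * private data that was supplied at export time
  	 */
  	n = read(vdmabuf_fd, ev, sizeof(ev));
  	if (n < (ssize_t)sizeof(hdr))
  		return -1;

  	memcpy(&hdr, ev, sizeof(hdr));
  	printf("buffer 0x%llx released, safe to reuse\n",
  	       (unsigned long long)hdr.buf_id.id);
  	return 0;
  }
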
> +
> +static const struct file_operations virtio_vdmabuf_fops = {
> +	.owner = THIS_MODULE,
> +	.open = virtio_vdmabuf_open,
> +	.release = virtio_vdmabuf_release,
> +	.read = virtio_vdmabuf_event_read,
> +	.poll = virtio_vdmabuf_event_poll,
> +	.unlocked_ioctl = virtio_vdmabuf_ioctl,
> +};
> +
> +static struct miscdevice virtio_vdmabuf_miscdev = {
> +	.minor = MISC_DYNAMIC_MINOR,
> +	.name = "virtio-vdmabuf",
> +	.fops = &virtio_vdmabuf_fops,
> +};
> +
> +static int virtio_vdmabuf_probe(struct virtio_device *vdev)
> +{
> +	vq_callback_t *cbs[] = {
> +		virtio_vdmabuf_recv_cb,
> +		virtio_vdmabuf_send_cb,
> +	};
> +	static const char *const names[] = {
> +		"recv",
> +		"send",
> +	};
> +	struct virtio_vdmabuf *vdmabuf;
> +	int ret = 0;
> +
> +	if (!drv_info)
> +		return -EINVAL;
> +
> +	vdmabuf = drv_info->priv;
> +
> +	if (!vdmabuf)
> +		return -EINVAL;
> +
> +	vdmabuf->vdev = vdev;
> +	vdev->priv = vdmabuf;
> +
> +	/* initialize spinlock for synchronizing virtqueue accesses */
> +	spin_lock_init(&vdmabuf->vq_lock);
> +
> +	ret = virtio_find_vqs(vdmabuf->vdev, VDMABUF_VQ_MAX, vdmabuf->vqs,
> +			      cbs, names, NULL);
> +	if (ret) {
> +		dev_err(drv_info->dev, "Cannot find any vqs\n");
> +		return ret;
> +	}
> +
> +	INIT_LIST_HEAD(&vdmabuf->msg_list);
> +	INIT_WORK(&vdmabuf->recv_work, virtio_vdmabuf_recv_work);
> +	INIT_WORK(&vdmabuf->send_work, virtio_vdmabuf_send_work);
> +	INIT_WORK(&vdmabuf->send_msg_work, virtio_vdmabuf_send_msg_work);
> +
> +	return ret;
> +}
> +
> +static void virtio_vdmabuf_remove(struct virtio_device *vdev)
> +{
> +	struct virtio_vdmabuf *vdmabuf;
> +
> +	if (!drv_info)
> +		return;
> +
> +	vdmabuf = drv_info->priv;
> +	flush_work(&vdmabuf->recv_work);
> +	flush_work(&vdmabuf->send_work);
> +	flush_work(&vdmabuf->send_msg_work);
> +
> +	vdev->config->reset(vdev);
> +	vdev->config->del_vqs(vdev);
> +}
> +
> +static struct virtio_device_id id_table[] = {
> +	{ VIRTIO_ID_VDMABUF, VIRTIO_DEV_ANY_ID },
> +	{ 0 },
> +};
> +
> +static struct virtio_driver virtio_vdmabuf_vdev_drv = {
> +	.driver.name =  KBUILD_MODNAME,
> +	.driver.owner = THIS_MODULE,
> +	.id_table =     id_table,
> +	.probe =        virtio_vdmabuf_probe,
> +	.remove =       virtio_vdmabuf_remove,
> +};
> +
> +static int __init virtio_vdmabuf_init(void)
> +{
> +	struct virtio_vdmabuf *vdmabuf;
> +	int ret = 0;
> +
> +	drv_info = NULL;
> +
> +	ret = misc_register(&virtio_vdmabuf_miscdev);
> +	if (ret) {
> +		pr_err("virtio-vdmabuf misc driver can't be registered\n");
> +		return ret;
> +	}
> +
> +	ret = dma_set_mask_and_coherent(virtio_vdmabuf_miscdev.this_device,
> +					DMA_BIT_MASK(64));
> +	if (ret < 0) {
> +		misc_deregister(&virtio_vdmabuf_miscdev);
> +		return -EINVAL;
> +	}
> +
> +	drv_info = kvcalloc(1, sizeof(*drv_info), GFP_KERNEL);
> +	if (!drv_info) {
> +		misc_deregister(&virtio_vdmabuf_miscdev);
> +		return -ENOMEM;
> +	}
> +
> +	vdmabuf = kvcalloc(1, sizeof(*vdmabuf), GFP_KERNEL);
> +	if (!vdmabuf) {
> +		kvfree(drv_info);
> +		misc_deregister(&virtio_vdmabuf_miscdev);
> +		return -ENOMEM;
> +	}
> +
> +	vdmabuf->evq = kvcalloc(1, sizeof(*(vdmabuf->evq)), GFP_KERNEL);
> +	if (!vdmabuf->evq) {
> +		kvfree(drv_info);
> +		kvfree(vdmabuf);
> +		misc_deregister(&virtio_vdmabuf_miscdev);
> +		return -ENOMEM;
> +	}
> +
> +	drv_info->priv = (void *)vdmabuf;
> +	drv_info->dev = virtio_vdmabuf_miscdev.this_device;
> +
> +	mutex_init(&drv_info->g_mutex);
> +
> +	mutex_init(&vdmabuf->evq->e_readlock);
> +	spin_lock_init(&vdmabuf->evq->e_lock);
> +
> +	INIT_LIST_HEAD(&vdmabuf->evq->e_list);
> +	init_waitqueue_head(&vdmabuf->evq->e_wait);
> +	hash_init(drv_info->buf_list);
> +
> +	vdmabuf->evq->pending = 0;
> +	vdmabuf->wq = create_workqueue("virtio_vdmabuf_wq");
> +
> +	ret = register_virtio_driver(&virtio_vdmabuf_vdev_drv);
> +	if (ret) {
> +		dev_err(drv_info->dev, "vdmabuf driver can't be registered\n");
> +		misc_deregister(&virtio_vdmabuf_miscdev);
> +		kvfree(vdmabuf);
> +		kvfree(drv_info);
> +		return -EFAULT;
> +	}
> +
> +	return 0;
> +}
> +
> +static void __exit virtio_vdmabuf_deinit(void)
> +{
> +	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
> +	struct virtio_vdmabuf_event *e, *et;
> +	unsigned long irqflags;
> +
> +	misc_deregister(&virtio_vdmabuf_miscdev);
> +	unregister_virtio_driver(&virtio_vdmabuf_vdev_drv);
> +
> +	if (vdmabuf->wq)
> +		destroy_workqueue(vdmabuf->wq);
> +
> +	spin_lock_irqsave(&vdmabuf->evq->e_lock, irqflags);
> +
> +	list_for_each_entry_safe(e, et, &vdmabuf->evq->e_list,
> +				 link) {
> +		list_del(&e->link);
> +		kvfree(e);
> +		vdmabuf->evq->pending--;
> +	}
> +
> +	spin_unlock_irqrestore(&vdmabuf->evq->e_lock, irqflags);
> +
> +	/* freeing all exported buffers */
> +	remove_all_bufs(vdmabuf);
> +
> +	kvfree(vdmabuf->evq);
> +	kvfree(vdmabuf);
> +	kvfree(drv_info);
> +}
> +
> +module_init(virtio_vdmabuf_init);
> +module_exit(virtio_vdmabuf_deinit);
> +
> +MODULE_DEVICE_TABLE(virtio, id_table);
> +MODULE_DESCRIPTION("Virtio Vdmabuf frontend driver");
> +MODULE_LICENSE("GPL and additional rights");
> diff --git a/include/linux/virtio_vdmabuf.h b/include/linux/virtio_vdmabuf.h
> new file mode 100644
> index 000000000000..9500bf4a54ac
> --- /dev/null
> +++ b/include/linux/virtio_vdmabuf.h
> @@ -0,0 +1,271 @@
> +/* SPDX-License-Identifier: (MIT OR GPL-2.0) */
> +
> +/*
> + * Copyright © 2021 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef _LINUX_VIRTIO_VDMABUF_H 
> +#define _LINUX_VIRTIO_VDMABUF_H 
> +
> +#include <uapi/linux/virtio_vdmabuf.h>
> +#include <linux/hashtable.h>
> +#include <linux/kvm_types.h>
> +
> +struct virtio_vdmabuf_shared_pages {
> +	/* cross-VM ref addr for the buffer */
> +	gpa_t ref;
> +
> +	/* page array */
> +	struct page **pages;
> +	gpa_t **l2refs;
> +	gpa_t *l3refs;
> +
> +	/* data offset in the first page
> +	 * and data length in the last page
> +	 */
> +	int first_ofst;
> +	int last_len;
> +
> +	/* number of shared pages */
> +	int nents;
> +};
> +
> +struct virtio_vdmabuf_buf {
> +	virtio_vdmabuf_buf_id_t buf_id;
> +
> +	struct dma_buf_attachment *attach;
> +	struct dma_buf *dma_buf;
> +	struct sg_table *sgt;
> +	struct virtio_vdmabuf_shared_pages *pages_info;
> +	int vmid;
> +
> +	/* validity of the buffer */
> +	bool valid;
> +
> +	/* set if the buffer is imported via import_ioctl */
> +	bool imported;
> +
> +	/* size of private */
> +	size_t sz_priv;
> +	/* private data associated with the exported buffer */
> +	void *priv;
> +
> +	struct file *filp;
> +	struct hlist_node node;
> +};
> +
> +struct virtio_vdmabuf_event {
> +	struct virtio_vdmabuf_e_data e_data;
> +	struct list_head link;
> +};
> +
> +struct virtio_vdmabuf_event_queue {
> +	wait_queue_head_t e_wait;
> +	struct list_head e_list;
> +
> +	spinlock_t e_lock;
> +	struct mutex e_readlock;
> +
> +	/* # of pending events */
> +	int pending;
> +};
> +
> +/* driver information */
> +struct virtio_vdmabuf_info {
> +	struct device *dev;
> +
> +	struct list_head head_vdmabuf_list;
> +	struct list_head kvm_instances;
> +
> +	DECLARE_HASHTABLE(buf_list, 7);
> +
> +	void *priv;
> +	struct mutex g_mutex;
> +	struct notifier_block kvm_notifier;
> +};
> +
> +/* IOCTL definitions
> + */
> +typedef int (*virtio_vdmabuf_ioctl_t)(struct file *filp, void *data);
> +
> +struct virtio_vdmabuf_ioctl_desc {
> +	unsigned int cmd;
> +	int flags;
> +	virtio_vdmabuf_ioctl_t func;
> +	const char *name;
> +};
> +
> +#define VIRTIO_VDMABUF_IOCTL_DEF(ioctl, _func, _flags)	\
> +	[_IOC_NR(ioctl)] = {			\
> +			.cmd = ioctl,		\
> +			.func = _func,		\
> +			.flags = _flags,	\
> +			.name = #ioctl		\
> +}
> +
> +#define VIRTIO_VDMABUF_VMID(buf_id) ((((buf_id).id) >> 32) & 0xFFFFFFFF)
> +
> +/* Messages between Host and Guest */
> +
> +/* List of commands from Guest to Host:
> + *
> + * ------------------------------------------------------------------
> + * A. NEED_VMID
> + *
> + *  guest asks the host to provide its vmid
> + *
> + * req:
> + *
> + * cmd: VIRTIO_VDMABUF_NEED_VMID
> + *
> + * ack:
> + *
> + * cmd: same as req
> + * op[0] : vmid of guest
> + *
> + * ------------------------------------------------------------------
> + * B. EXPORT
> + *
> + *  export dmabuf to host
> + *
> + * req:
> + *
> + * cmd: VIRTIO_VDMABUF_CMD_EXPORT
> + * op0~op3 : HDMABUF ID
> + * op4 : number of pages to be shared
> + * op5 : offset of data in the first page
> + * op6 : length of data in the last page
> + * op7 : upper 32 bit of top-level ref of shared buf
> + * op8 : lower 32 bit of top-level ref of shared buf
> + * op9 : size of private data
> + * op10 ~ op64: User private data associated with the buffer
> + *	        (e.g. graphic buffer's meta info)
> + *
> + * ------------------------------------------------------------------
> + *
> + * List of commands from Host to Guest
> + *
> + * ------------------------------------------------------------------
> + * A. RELEASE
> + *
> + *  notifying guest that the shared buffer is released by an importer
> + *
> + * req:
> + *
> + * cmd: VIRTIO_VDMABUF_CMD_DMABUF_REL
> + * op0~op3 : VDMABUF ID
> + *
> + * ------------------------------------------------------------------
> + */
> +
> +/* msg structures */
> +struct virtio_vdmabuf_msg {
> +	struct list_head list;
> +	unsigned int cmd;
> +	unsigned int op[64];
> +};
> +
> +enum {
> +	VDMABUF_VQ_RECV = 0,
> +	VDMABUF_VQ_SEND = 1,
> +	VDMABUF_VQ_MAX  = 2,
> +};
> +
> +enum virtio_vdmabuf_cmd {
> +	VIRTIO_VDMABUF_CMD_NEED_VMID,
> +	VIRTIO_VDMABUF_CMD_EXPORT = 0x10,
> +	VIRTIO_VDMABUF_CMD_DMABUF_REL
> +};
> +
> +enum virtio_vdmabuf_ops {
> +	VIRTIO_VDMABUF_HDMABUF_ID_ID = 0,
> +	VIRTIO_VDMABUF_HDMABUF_ID_RNG_KEY0,
> +	VIRTIO_VDMABUF_HDMABUF_ID_RNG_KEY1,
> +	VIRTIO_VDMABUF_NUM_PAGES_SHARED = 4,
> +	VIRTIO_VDMABUF_FIRST_PAGE_DATA_OFFSET,
> +	VIRTIO_VDMABUF_LAST_PAGE_DATA_LENGTH,
> +	VIRTIO_VDMABUF_REF_ADDR_UPPER_32BIT,
> +	VIRTIO_VDMABUF_REF_ADDR_LOWER_32BIT,
> +	VIRTIO_VDMABUF_PRIVATE_DATA_SIZE,
> +	VIRTIO_VDMABUF_PRIVATE_DATA_START
> +};
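
As a cross-check of the EXPORT layout documented above, here is a compact
sketch (not part of the patch, the helper name is invented) of how the op[]
array of a VIRTIO_VDMABUF_CMD_EXPORT message maps onto these indices. It
assumes the caller passes an op[] of at least 65 entries and
sz_priv <= MAX_SIZE_PRIV_DATA.

  /* illustration only: filling op[] for VIRTIO_VDMABUF_CMD_EXPORT */
  static void fill_export_op(int *op, const virtio_vdmabuf_buf_id_t *buf_id,
  			   int nents, int first_ofst, int last_len,
  			   gpa_t ref, int sz_priv, const void *priv)
  {
  	/* op0..op3: 128-bit vdmabuf ID (id + two random keys) */
  	memcpy(&op[VIRTIO_VDMABUF_HDMABUF_ID_ID], buf_id, sizeof(*buf_id));

  	op[VIRTIO_VDMABUF_NUM_PAGES_SHARED] = nents;
  	op[VIRTIO_VDMABUF_FIRST_PAGE_DATA_OFFSET] = first_ofst;
  	op[VIRTIO_VDMABUF_LAST_PAGE_DATA_LENGTH] = last_len;

  	/* 64-bit top-level ref of the shared buffer, occupying op7 and op8 */
  	memcpy(&op[VIRTIO_VDMABUF_REF_ADDR_UPPER_32BIT], &ref, sizeof(ref));

  	op[VIRTIO_VDMABUF_PRIVATE_DATA_SIZE] = sz_priv;
  	/* op10 onwards: user private data (e.g. buffer meta info) */
  	memcpy(&op[VIRTIO_VDMABUF_PRIVATE_DATA_START], priv, sz_priv);
  }
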
> +
> +/* adding exported/imported vdmabuf info to hash */
> +static inline int
> +virtio_vdmabuf_add_buf(struct virtio_vdmabuf_info *info,
> +                       struct virtio_vdmabuf_buf *new)
> +{
> +	hash_add(info->buf_list, &new->node, new->buf_id.id);
> +	return 0;
> +}
> +
> +/* comparing two vdmabuf IDs */
> +static inline bool
> +is_same_buf(virtio_vdmabuf_buf_id_t a,
> +            virtio_vdmabuf_buf_id_t b)
> +{
> +	int i;
> +
> +	if (a.id != b.id)
> +		return false;
> +
> +	/* compare keys */
> +	for (i = 0; i < 2; i++) {
> +		if (a.rng_key[i] != b.rng_key[i])
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
> +/* find buf for given vdmabuf ID */
> +static inline struct virtio_vdmabuf_buf
> +*virtio_vdmabuf_find_buf(struct virtio_vdmabuf_info *info,
> +			 virtio_vdmabuf_buf_id_t *buf_id)
> +{
> +	struct virtio_vdmabuf_buf *found;
> +
> +	hash_for_each_possible(info->buf_list, found, node, buf_id->id)
> +		if (is_same_buf(found->buf_id, *buf_id))
> +			return found;
> +
> +	return NULL;
> +}
> +
> +/* delete buf from hash */
> +static inline int
> +virtio_vdmabuf_del_buf(struct virtio_vdmabuf_info *info,
> +                       virtio_vdmabuf_buf_id_t *buf_id)
> +{
> +	struct virtio_vdmabuf_buf *found;
> +
> +	found = virtio_vdmabuf_find_buf(info, buf_id);
> +	if (!found)
> +		return -ENOENT;
> +
> +	hash_del(&found->node);
> +
> +	return 0;
> +}
> +
> +#endif
> diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
> index bc1c0621f5ed..39c94637ddee 100644
> --- a/include/uapi/linux/virtio_ids.h
> +++ b/include/uapi/linux/virtio_ids.h
> @@ -54,5 +54,6 @@
>  #define VIRTIO_ID_FS			26 /* virtio filesystem */
>  #define VIRTIO_ID_PMEM			27 /* virtio pmem */
>  #define VIRTIO_ID_MAC80211_HWSIM	29 /* virtio mac80211-hwsim */
> +#define VIRTIO_ID_VDMABUF          	40 /* virtio vdmabuf */
>  
>  #endif /* _LINUX_VIRTIO_IDS_H */
> diff --git a/include/uapi/linux/virtio_vdmabuf.h b/include/uapi/linux/virtio_vdmabuf.h
> new file mode 100644
> index 000000000000..7bddaa04ddd6
> --- /dev/null
> +++ b/include/uapi/linux/virtio_vdmabuf.h
> @@ -0,0 +1,99 @@
> +// SPDX-License-Identifier: (MIT OR GPL-2.0)
> +
> +/*
> + * Copyright © 2021 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef _UAPI_LINUX_VIRTIO_VDMABUF_H
> +#define _UAPI_LINUX_VIRTIO_VDMABUF_H
> +
> +#define MAX_SIZE_PRIV_DATA 192
> +
> +typedef struct {
> +	__u64 id;
> +	/* 8B long Random number */
> +	int rng_key[2];
> +} virtio_vdmabuf_buf_id_t;
> +
> +struct virtio_vdmabuf_e_hdr {
> +	/* buf_id of new buf */
> +	virtio_vdmabuf_buf_id_t buf_id;
> +	/* size of private data */
> +	int size;
> +};
> +
> +struct virtio_vdmabuf_e_data {
> +	struct virtio_vdmabuf_e_hdr hdr;
> +	/* ptr to private data */
> +	void __user *data;
> +};
> +
> +#define VIRTIO_VDMABUF_IOCTL_IMPORT \
> +_IOC(_IOC_NONE, 'G', 2, sizeof(struct virtio_vdmabuf_import))
> +#define VIRTIO_VDMABUF_IOCTL_RELEASE \
> +_IOC(_IOC_NONE, 'G', 3, sizeof(struct virtio_vdmabuf_import))
> +struct virtio_vdmabuf_import {
> +	/* IN parameters */
> +	/* vdmabuf buf id to be imported */
> +	virtio_vdmabuf_buf_id_t buf_id;
> +	/* flags */
> +	int flags;
> +	/* OUT parameters */
> +	/* exported dma buf fd */
> +	int fd;
> +};
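
To illustrate the intended flow of the two ioctls above from the consumer
side (e.g. Qemu UI), a minimal sketch. It assumes vhost_fd is whatever
device node the vhost backend in patch 3/3 exposes, which is outside this
patch, and it skips error handling.

  #include <sys/ioctl.h>
  #include <linux/virtio_vdmabuf.h>

  static int import_guest_buf(int vhost_fd, virtio_vdmabuf_buf_id_t id)
  {
  	struct virtio_vdmabuf_import imp = { .buf_id = id };

  	if (ioctl(vhost_fd, VIRTIO_VDMABUF_IOCTL_IMPORT, &imp) < 0)
  		return -1;

  	/* imp.fd is a newly created dmabuf backed by the guest's pages */
  	return imp.fd;
  }

  static void release_guest_buf(int vhost_fd, virtio_vdmabuf_buf_id_t id)
  {
  	struct virtio_vdmabuf_import imp = { .buf_id = id };

  	/* tells the guest the buffer is no longer in use */
  	ioctl(vhost_fd, VIRTIO_VDMABUF_IOCTL_RELEASE, &imp);
  }
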
> +
> +#define VIRTIO_VDMABUF_IOCTL_EXPORT \
> +_IOC(_IOC_NONE, 'G', 4, sizeof(struct virtio_vdmabuf_export))
> +struct virtio_vdmabuf_export {
> +	/* IN parameters */
> +	/* DMA buf fd to be exported */
> +	int fd;
> +	/* exported dma buf id */
> +	virtio_vdmabuf_buf_id_t buf_id;
> +	int sz_priv;
> +	char *priv;
> +};
> +
> +#define VIRTIO_VDMABUF_IOCTL_QUERY \
> +_IOC(_IOC_NONE, 'G', 5, sizeof(struct virtio_vdmabuf_query))
> +struct virtio_vdmabuf_query {
> +	/* in parameters */
> +	/* id of buf to be queried */
> +	virtio_vdmabuf_buf_id_t buf_id;
> +	/* item to be queried */
> +	int item;
> +	/* OUT parameters */
> +	/* Value of queried item */
> +	unsigned long info;
> +};
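
No handler for this ioctl is wired up in this patch, but going by the field
comments above the intended usage would look roughly like this (sketch only,
not against any implemented code path):

  #include <sys/ioctl.h>
  #include <linux/virtio_vdmabuf.h>

  static long query_buf_size(int fd, virtio_vdmabuf_buf_id_t id)
  {
  	struct virtio_vdmabuf_query q = {
  		.buf_id = id,
  		.item = VIRTIO_VDMABUF_QUERY_SIZE,
  	};

  	if (ioctl(fd, VIRTIO_VDMABUF_IOCTL_QUERY, &q) < 0)
  		return -1;

  	return (long)q.info;	/* value of the queried item */
  }
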
> +
> +/* DMABUF query */
> +enum virtio_vdmabuf_query_cmd {
> +	VIRTIO_VDMABUF_QUERY_SIZE = 0x10,
> +	VIRTIO_VDMABUF_QUERY_BUSY,
> +	VIRTIO_VDMABUF_QUERY_PRIV_INFO_SIZE,
> +	VIRTIO_VDMABUF_QUERY_PRIV_INFO,
> +};
> +
> +#endif
> -- 
> 2.26.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC v3 2/3] virtio: Introduce Vdmabuf driver
@ 2021-02-05 16:03     ` Daniel Vetter
  0 siblings, 0 replies; 57+ messages in thread
From: Daniel Vetter @ 2021-02-05 16:03 UTC (permalink / raw)
  To: Vivek Kasireddy, Gerd Hoffmann
  Cc: dongwon.kim, christian.koenig, daniel.vetter, dri-devel,
	virtualization, daniel.vetter, sumit.semwal, linux-media

On Tue, Feb 02, 2021 at 11:35:16PM -0800, Vivek Kasireddy wrote:
> This driver "transfers" a dmabuf created on the Guest to the Host.
> A common use-case for such a transfer includes sharing the scanout
> buffer created by a display server or a compositor running in the
> Guest with Qemu UI -- running on the Host.
> 
> The "transfer" is accomplished by sharing the PFNs of all the pages
> associated with the dmabuf and having a new dmabuf created on the
> Host that is backed up by the pages mapped from the Guest.
> 
> Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> ---
>  drivers/virtio/Kconfig              |    8 +
>  drivers/virtio/Makefile             |    1 +
>  drivers/virtio/virtio_vdmabuf.c     | 1090 +++++++++++++++++++++++++++
>  include/linux/virtio_vdmabuf.h      |  271 +++++++
>  include/uapi/linux/virtio_ids.h     |    1 +
>  include/uapi/linux/virtio_vdmabuf.h |   99 +++
>  6 files changed, 1470 insertions(+)
>  create mode 100644 drivers/virtio/virtio_vdmabuf.c
>  create mode 100644 include/linux/virtio_vdmabuf.h
>  create mode 100644 include/uapi/linux/virtio_vdmabuf.h
> 
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 7b41130d3f35..e563c12f711e 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -139,4 +139,12 @@ config VIRTIO_DMA_SHARED_BUFFER
>  	 This option adds a flavor of dma buffers that are backed by
>  	 virtio resources.
>  
> +config VIRTIO_VDMABUF
> +	bool "Enables Vdmabuf driver in guest os"
> +	default n
> +	depends on VIRTIO
> +	help
> +	 This driver provides a way to share the dmabufs created in
> +	 the Guest with the Host.
> +
>  endif # VIRTIO_MENU
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 591e6f72aa54..b4bb0738009c 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -9,3 +9,4 @@ obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
>  obj-$(CONFIG_VIRTIO_VDPA) += virtio_vdpa.o
>  obj-$(CONFIG_VIRTIO_MEM) += virtio_mem.o
>  obj-$(CONFIG_VIRTIO_DMA_SHARED_BUFFER) += virtio_dma_buf.o
> +obj-$(CONFIG_VIRTIO_VDMABUF) += virtio_vdmabuf.o
> diff --git a/drivers/virtio/virtio_vdmabuf.c b/drivers/virtio/virtio_vdmabuf.c
> new file mode 100644
> index 000000000000..c28f144eb126
> --- /dev/null
> +++ b/drivers/virtio/virtio_vdmabuf.c
> @@ -0,0 +1,1090 @@
> +// SPDX-License-Identifier: (MIT OR GPL-2.0)
> +
> +/*
> + * Copyright © 2021 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + * Authors:
> + *    Dongwon Kim <dongwon.kim@intel.com>
> + *    Mateusz Polrola <mateusz.polrola@gmail.com>
> + *    Vivek Kasireddy <vivek.kasireddy@intel.com>
> + */
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/errno.h>
> +#include <linux/module.h>
> +#include <linux/device.h>
> +#include <linux/uaccess.h>
> +#include <linux/miscdevice.h>
> +#include <linux/delay.h>
> +#include <linux/random.h>
> +#include <linux/poll.h>
> +#include <linux/spinlock.h>
> +#include <linux/dma-buf.h>
> +#include <linux/virtio.h>
> +#include <linux/virtio_ids.h>
> +#include <linux/virtio_config.h>
> +#include <linux/virtio_vdmabuf.h>
> +
> +#define VIRTIO_VDMABUF_MAX_ID INT_MAX
> +#define REFS_PER_PAGE (PAGE_SIZE/sizeof(long))
> +#define NEW_BUF_ID_GEN(vmid, cnt) (((vmid & 0xFFFFFFFF) << 32) | \
> +				    ((cnt) & 0xFFFFFFFF))
> +
> +/* one global drv object */
> +static struct virtio_vdmabuf_info *drv_info;
> +
> +struct virtio_vdmabuf {
> +	/* virtio device structure */
> +	struct virtio_device *vdev;
> +
> +	/* virtual queue array */
> +	struct virtqueue *vqs[VDMABUF_VQ_MAX];
> +
> +	/* ID of guest OS */
> +	u64 vmid;
> +
> +	/* spin lock that needs to be acquired before accessing
> +	 * virtual queue
> +	 */
> +	spinlock_t vq_lock;
> +	struct mutex recv_lock;
> +	struct mutex send_lock;
> +
> +	struct list_head msg_list;
> +
> +	/* workqueue */
> +	struct workqueue_struct *wq;
> +	struct work_struct recv_work;
> +	struct work_struct send_work;
> +	struct work_struct send_msg_work;
> +
> +	struct virtio_vdmabuf_event_queue *evq;
> +};
> +
> +static virtio_vdmabuf_buf_id_t get_buf_id(struct virtio_vdmabuf *vdmabuf)
> +{
> +	virtio_vdmabuf_buf_id_t buf_id = {0, {0, 0} };
> +	static int count = 0;
> +
> +	count = count < VIRTIO_VDMABUF_MAX_ID ? count + 1 : 0;
> +	buf_id.id = NEW_BUF_ID_GEN(vdmabuf->vmid, count);
> +
> +	/* random data embedded in the id for security */
> +	get_random_bytes(&buf_id.rng_key[0], 8);
> +
> +	return buf_id;
> +}
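
For reference, the resulting 64-bit id packs the guest's vmid into the upper
half and the per-guest counter into the lower half, which is what the
VIRTIO_VDMABUF_VMID() macro in the header relies on. A small sketch of the
inverse, mirroring NEW_BUF_ID_GEN() (helper names invented):

  static inline u32 buf_id_vmid(const virtio_vdmabuf_buf_id_t *id)
  {
  	return (id->id >> 32) & 0xFFFFFFFF;	/* same as VIRTIO_VDMABUF_VMID() */
  }

  static inline u32 buf_id_counter(const virtio_vdmabuf_buf_id_t *id)
  {
  	return id->id & 0xFFFFFFFF;
  }
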
> +
> +/* sharing pages for original DMABUF with Host */
> +static struct virtio_vdmabuf_shared_pages
> +*virtio_vdmabuf_share_buf(struct page **pages, int nents,
> +			  int first_ofst, int last_len)
> +{
> +	struct virtio_vdmabuf_shared_pages *pages_info;
> +	int i;
> +	int n_l2refs = nents/REFS_PER_PAGE +
> +		       ((nents % REFS_PER_PAGE) ? 1 : 0);
> +
> +	pages_info = kvcalloc(1, sizeof(*pages_info), GFP_KERNEL);
> +	if (!pages_info)
> +		return NULL;
> +
> +	pages_info->pages = pages;
> +	pages_info->nents = nents;
> +	pages_info->first_ofst = first_ofst;
> +	pages_info->last_len = last_len;
> +	pages_info->l3refs = (gpa_t *)__get_free_page(GFP_KERNEL);
> +
> +	if (!pages_info->l3refs) {
> +		kvfree(pages_info);
> +		return NULL;
> +	}
> +
> +	pages_info->l2refs = (gpa_t **)__get_free_pages(GFP_KERNEL,
> +					get_order(n_l2refs * PAGE_SIZE));
> +
> +	if (!pages_info->l2refs) {
> +		free_page((gpa_t)pages_info->l3refs);
> +		kvfree(pages_info);
> +		return NULL;
> +	}
> +
> +	/* Share physical address of pages */
> +	for (i = 0; i < nents; i++)
> +		pages_info->l2refs[i] = (gpa_t *)page_to_phys(pages[i]);
> +
> +	for (i = 0; i < n_l2refs; i++)
> +		pages_info->l3refs[i] =
> +			virt_to_phys((void *)pages_info->l2refs +
> +				     i * PAGE_SIZE);
> +
> +	pages_info->ref = (gpa_t)virt_to_phys(pages_info->l3refs);
> +
> +	return pages_info;
> +}
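
To make the layout built here easier to follow: l3refs is a single page
holding the guest-physical addresses of the l2ref pages, and each l2ref
entry is the guest-physical address of one payload page. A rough sketch of
how an importer would walk it; gpa_to_page_address() and gpa_to_page() are
hypothetical stand-ins for whatever GPA translation the backend actually
uses.

  static int walk_shared_refs(gpa_t l3_ref, int nents, struct page **pages)
  {
  	gpa_t *l3 = gpa_to_page_address(l3_ref);	/* hypothetical helper */
  	int n_l2 = DIV_ROUND_UP(nents, REFS_PER_PAGE);
  	int i, j, k = 0;

  	for (i = 0; i < n_l2; i++) {
  		gpa_t *l2 = gpa_to_page_address(l3[i]);	/* hypothetical helper */

  		for (j = 0; j < REFS_PER_PAGE && k < nents; j++, k++)
  			pages[k] = gpa_to_page(l2[j]);	/* hypothetical helper */
  	}

  	return 0;
  }
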
> +
> +/* stop sharing pages */
> +static void
> +virtio_vdmabuf_free_buf(struct virtio_vdmabuf_shared_pages *pages_info)
> +{
> +	int n_l2refs = (pages_info->nents/REFS_PER_PAGE +
> +		       ((pages_info->nents % REFS_PER_PAGE) ? 1 : 0));
> +
> +	free_pages((gpa_t)pages_info->l2refs, get_order(n_l2refs * PAGE_SIZE));
> +	free_page((gpa_t)pages_info->l3refs);
> +
> +	kvfree(pages_info);
> +}
> +
> +static int send_msg_to_host(enum virtio_vdmabuf_cmd cmd, int *op)
> +{
> +	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
> +	struct virtio_vdmabuf_msg *msg;
> +	int i;
> +
> +	switch (cmd) {
> +	case VIRTIO_VDMABUF_CMD_NEED_VMID:
> +		msg = kvcalloc(1, sizeof(struct virtio_vdmabuf_msg),
> +			       GFP_KERNEL);
> +		if (!msg)
> +			return -ENOMEM;
> +
> +		if (op)
> +			for (i = 0; i < 4; i++)
> +				msg->op[i] = op[i];
> +		break;
> +
> +	case VIRTIO_VDMABUF_CMD_EXPORT:
> +		msg = kvcalloc(1, sizeof(struct virtio_vdmabuf_msg),
> +			       GFP_KERNEL);
> +		if (!msg)
> +			return -ENOMEM;
> +
> +		memcpy(&msg->op[0], &op[0], 9 * sizeof(int) + op[9]);
> +		break;
> +
> +	default:
> +		/* no command found */
> +		return -EINVAL;
> +	}
> +
> +	msg->cmd = cmd;
> +	list_add_tail(&msg->list, &vdmabuf->msg_list);
> +	queue_work(vdmabuf->wq, &vdmabuf->send_msg_work);
> +
> +	return 0;
> +}
> +
> +static int add_event_buf_rel(struct virtio_vdmabuf_buf *buf_info)
> +{
> +	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
> +	struct virtio_vdmabuf_event *e_oldest, *e_new;
> +	struct virtio_vdmabuf_event_queue *eq = vdmabuf->evq;
> +	unsigned long irqflags;
> +
> +	e_new = kvzalloc(sizeof(*e_new), GFP_KERNEL);
> +	if (!e_new)
> +		return -ENOMEM;
> +
> +	e_new->e_data.hdr.buf_id = buf_info->buf_id;
> +	e_new->e_data.data = (void *)buf_info->priv;
> +	e_new->e_data.hdr.size = buf_info->sz_priv;
> +
> +	spin_lock_irqsave(&eq->e_lock, irqflags);
> +
> +	/* check current number of events and if it hits the max num (32)
> +	 * then remove the oldest event in the list
> +	 */
> +	if (eq->pending > 31) {
> +		e_oldest = list_first_entry(&eq->e_list,
> +					    struct virtio_vdmabuf_event, link);
> +		list_del(&e_oldest->link);
> +		eq->pending--;
> +		kvfree(e_oldest);
> +	}
> +
> +	list_add_tail(&e_new->link, &eq->e_list);
> +
> +	eq->pending++;
> +
> +	wake_up_interruptible(&eq->e_wait);
> +	spin_unlock_irqrestore(&eq->e_lock, irqflags);
> +
> +	return 0;
> +}
> +
> +static void virtio_vdmabuf_clear_buf(struct virtio_vdmabuf_buf *exp)
> +{
> +	/* Start cleanup of buffer in reverse order to exporting */
> +	virtio_vdmabuf_free_buf(exp->pages_info);
> +
> +	dma_buf_unmap_attachment(exp->attach, exp->sgt,
> +				 DMA_BIDIRECTIONAL);
> +
> +	if (exp->dma_buf) {
> +		dma_buf_detach(exp->dma_buf, exp->attach);
> +		/* close connection to dma-buf completely */
> +		dma_buf_put(exp->dma_buf);
> +		exp->dma_buf = NULL;
> +	}
> +}
> +
> +static int remove_buf(struct virtio_vdmabuf *vdmabuf,
> +		      struct virtio_vdmabuf_buf *exp)
> +{
> +	int ret;
> +
> +	ret = add_event_buf_rel(exp);
> +	if (ret)
> +		return ret;
> +
> +	virtio_vdmabuf_clear_buf(exp);
> +
> +	ret = virtio_vdmabuf_del_buf(drv_info, &exp->buf_id);
> +	if (ret)
> +		return ret;
> +
> +	if (exp->sz_priv > 0 && exp->priv)
> +		kvfree(exp->priv);
> +
> +	kvfree(exp);
> +	return 0;
> +}
> +
> +static int parse_msg_from_host(struct virtio_vdmabuf *vdmabuf,
> +		     	       struct virtio_vdmabuf_msg *msg)
> +{
> +	struct virtio_vdmabuf_buf *exp;
> +	virtio_vdmabuf_buf_id_t buf_id;
> +	int ret;
> +
> +	switch (msg->cmd) {
> +	case VIRTIO_VDMABUF_CMD_NEED_VMID:
> +		vdmabuf->vmid = msg->op[0];
> +
> +		break;
> +	case VIRTIO_VDMABUF_CMD_DMABUF_REL:
> +		memcpy(&buf_id, msg->op, sizeof(buf_id));
> +
> +		exp = virtio_vdmabuf_find_buf(drv_info, &buf_id);
> +		if (!exp) {
> +			dev_err(drv_info->dev, "can't find buffer\n");
> +			return -EINVAL;
> +		}
> +
> +		ret = remove_buf(vdmabuf, exp);
> +		if (ret)
> +			return ret;
> +
> +		break;
> +	case VIRTIO_VDMABUF_CMD_EXPORT:
> +		break;
> +	default:
> +		dev_err(drv_info->dev, "empty cmd\n");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static void virtio_vdmabuf_recv_work(struct work_struct *work)
> +{
> +	struct virtio_vdmabuf *vdmabuf =
> +		container_of(work, struct virtio_vdmabuf, recv_work);
> +	struct virtqueue *vq = vdmabuf->vqs[VDMABUF_VQ_RECV];
> +	struct virtio_vdmabuf_msg *msg;
> +	int sz;
> +
> +	mutex_lock(&vdmabuf->recv_lock);
> +
> +	do {
> +		virtqueue_disable_cb(vq);
> +		for (;;) {
> +			msg = virtqueue_get_buf(vq, &sz);
> +			if (!msg)
> +				break;
> +
> +			/* valid size */
> +			if (sz == sizeof(struct virtio_vdmabuf_msg)) {
> +				if (parse_msg_from_host(vdmabuf, msg))
> +					dev_err(drv_info->dev,
> +						"msg parse error\n");
> +
> +				kvfree(msg);
> +			} else {
> +				dev_err(drv_info->dev,
> +					"received malformed message\n");
> +			}
> +		}
> +	} while (!virtqueue_enable_cb(vq));
> +
> +	mutex_unlock(&vdmabuf->recv_lock);
> +}
> +
> +static void virtio_vdmabuf_fill_recv_msg(struct virtio_vdmabuf *vdmabuf)
> +{
> +	struct virtqueue *vq = vdmabuf->vqs[VDMABUF_VQ_RECV];
> +	struct scatterlist sg;
> +	struct virtio_vdmabuf_msg *msg;
> +	int ret;
> +
> +	msg = kvzalloc(sizeof(*msg), GFP_KERNEL);
> +	if (!msg)
> +		return;
> +
> +	sg_init_one(&sg, msg, sizeof(struct virtio_vdmabuf_msg));
> +	ret = virtqueue_add_inbuf(vq, &sg, 1, msg, GFP_KERNEL);
> +	if (ret)
> +		return;
> +
> +	virtqueue_kick(vq);
> +}
> +
> +static void virtio_vdmabuf_send_msg_work(struct work_struct *work)
> +{
> +	struct virtio_vdmabuf *vdmabuf =
> +		container_of(work, struct virtio_vdmabuf, send_msg_work);
> +	struct virtqueue *vq = vdmabuf->vqs[VDMABUF_VQ_SEND];
> +	struct scatterlist sg;
> +	struct virtio_vdmabuf_msg *msg;
> +	bool added = false;
> +	int ret;
> +
> +	mutex_lock(&vdmabuf->send_lock);
> +
> +	for (;;) {
> +		if (list_empty(&vdmabuf->msg_list))
> +			break;
> +
> +		virtio_vdmabuf_fill_recv_msg(vdmabuf);
> +
> +		msg = list_first_entry(&vdmabuf->msg_list,
> +				       struct virtio_vdmabuf_msg, list);
> +		list_del_init(&msg->list);
> +
> +		sg_init_one(&sg, msg, sizeof(struct virtio_vdmabuf_msg));
> +		ret = virtqueue_add_outbuf(vq, &sg, 1, msg, GFP_KERNEL);
> +		if (ret < 0) {
> +			dev_err(drv_info->dev,
> +				"failed to add msg to vq\n");
> +			break;
> +		}
> +
> +		added = true;	
> +	}
> +
> +	if (added)
> +		virtqueue_kick(vq);
> +
> +	mutex_unlock(&vdmabuf->send_lock);
> +}
> +
> +static void virtio_vdmabuf_send_work(struct work_struct *work)
> +{
> +	struct virtio_vdmabuf *vdmabuf =
> +		container_of(work, struct virtio_vdmabuf, send_work);
> +	struct virtqueue *vq = vdmabuf->vqs[VDMABUF_VQ_SEND];
> +	struct virtio_vdmabuf_msg *msg;
> +	unsigned int sz;
> +	bool added = false;
> +
> +	mutex_lock(&vdmabuf->send_lock);
> +
> +	do {
> +		virtqueue_disable_cb(vq);
> +
> +		for (;;) {
> +			msg = virtqueue_get_buf(vq, &sz);
> +			if (!msg)
> +				break;
> +
> +			if (parse_msg_from_host(vdmabuf, msg))
> +				dev_err(drv_info->dev,
> +					"msg parse error\n");
> +
> +			kvfree(msg);
> +			added = true;
> +		}
> +	} while (!virtqueue_enable_cb(vq));
> +
> +	mutex_unlock(&vdmabuf->send_lock);
> +
> +	if (added)
> +		queue_work(vdmabuf->wq, &vdmabuf->send_msg_work);
> +}
> +
> +static void virtio_vdmabuf_recv_cb(struct virtqueue *vq)
> +{
> +	struct virtio_vdmabuf *vdmabuf = vq->vdev->priv;
> +
> +	if (!vdmabuf)
> +		return;
> +
> +	queue_work(vdmabuf->wq, &vdmabuf->recv_work);
> +}
> +
> +static void virtio_vdmabuf_send_cb(struct virtqueue *vq)
> +{
> +	struct virtio_vdmabuf *vdmabuf = vq->vdev->priv;
> +
> +	if (!vdmabuf)
> +		return;
> +
> +	queue_work(vdmabuf->wq, &vdmabuf->send_work);
> +}
> +
> +static int remove_all_bufs(struct virtio_vdmabuf *vdmabuf)
> +{
> +	struct virtio_vdmabuf_buf *found;
> +	struct hlist_node *tmp;
> +	int bkt;
> +	int ret;
> +
> +	hash_for_each_safe(drv_info->buf_list, bkt, tmp, found, node) {
> +		ret = remove_buf(vdmabuf, found);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static int virtio_vdmabuf_open(struct inode *inode, struct file *filp)
> +{
> +	int ret;
> +
> +	if (!drv_info) {
> +		pr_err("virtio vdmabuf driver is not ready\n");
> +		return -EINVAL;
> +	}
> +
> +	ret = send_msg_to_host(VIRTIO_VDMABUF_CMD_NEED_VMID, 0);
> +	if (ret < 0)
> +		dev_err(drv_info->dev, "fail to receive vmid\n");
> +
> +	filp->private_data = drv_info->priv;
> +
> +	return 0;
> +}
> +
> +static int virtio_vdmabuf_release(struct inode *inode, struct file *filp)
> +{
> +	return 0;
> +}
> +
> +/* Notify Host about the new vdmabuf */
> +static int export_notify(struct virtio_vdmabuf_buf *exp, struct page **pages)
> +{
> +	int *op;
> +	int ret;
> +
> +	op = kvcalloc(1, sizeof(int) * 65, GFP_KERNEL);
> +	if (!op)
> +		return -ENOMEM;
> +
> +	memcpy(op, &exp->buf_id, sizeof(exp->buf_id));
> +
> +	/* if new pages are to be shared */
> +	if (pages) {
> +		op[4] = exp->pages_info->nents;
> +		op[5] = exp->pages_info->first_ofst;
> +		op[6] = exp->pages_info->last_len;
> +
> +		memcpy(&op[7], &exp->pages_info->ref, sizeof(gpa_t));
> +	}
> +
> +	op[9] = exp->sz_priv;
> +
> +	/* driver/application specific private info */
> +	memcpy(&op[10], exp->priv, op[9]);
> +
> +	ret = send_msg_to_host(VIRTIO_VDMABUF_CMD_EXPORT, op);
> +
> +	kvfree(op);
> +	return ret;
> +}
> +
> +/* return total number of pages referenced by a sgt
> + * for pre-calculation of # of pages behind a given sgt
> + */
> +static int num_pgs(struct sg_table *sgt)
> +{
> +	struct scatterlist *sgl;
> +	int len, i;
> +	/* at least one page */
> +	int n_pgs = 1;
> +
> +	sgl = sgt->sgl;
> +
> +	len = sgl->length - PAGE_SIZE + sgl->offset;
> +
> +	/* round-up */
> +	n_pgs += ((len + PAGE_SIZE - 1)/PAGE_SIZE);
> +
> +	for (i = 1; i < sgt->nents; i++) {
> +		sgl = sg_next(sgl);
> +
> +		/* round-up */
> +		n_pgs += ((sgl->length + PAGE_SIZE - 1) /
> +			  PAGE_SIZE);
> +	}
> +
> +	return n_pgs;
> +}
> +
> +/* extract pages referenced by sgt */
> +static struct page **extr_pgs(struct sg_table *sgt, int *nents, int *last_len)

Nack, this doesn't work on dma-buf. And it'll blow up at runtime when you
enable the very recently merged CONFIG_DMABUF_DEBUG (would be good to test
with that, just to make sure).

Aside from this, for virtio/kvm use-cases we've already merged the udmabuf
driver. Does this not work for your usecase?

Adding Gerd as the subject expert for this area.

Thanks, Daniel
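
For context, the udmabuf uapi referred to here is roughly the following (a
from-memory sketch, not part of this series; include/uapi/linux/udmabuf.h is
authoritative). The host-side process wraps a page-aligned region of a
sealed memfd into a dmabuf:

  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/udmabuf.h>

  static int memfd_to_dmabuf(int memfd, __u64 offset, __u64 size)
  {
  	struct udmabuf_create create = {
  		.memfd  = memfd,		/* needs F_SEAL_SHRINK set */
  		.flags  = UDMABUF_FLAGS_CLOEXEC,
  		.offset = offset,		/* page aligned */
  		.size   = size,			/* page aligned */
  	};
  	int ufd = open("/dev/udmabuf", O_RDWR);
  	int buf_fd;

  	if (ufd < 0)
  		return -1;

  	buf_fd = ioctl(ufd, UDMABUF_CREATE, &create);
  	close(ufd);

  	return buf_fd;	/* dmabuf fd on success, negative on error */
  }
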

> +{
> +	struct scatterlist *sgl;
> +	struct page **pages;
> +	struct page **temp_pgs;
> +	int i, j;
> +	int len;
> +
> +	*nents = num_pgs(sgt);
> +	pages =	kvmalloc_array(*nents, sizeof(struct page *), GFP_KERNEL);
> +	if (!pages)
> +		return NULL;
> +
> +	sgl = sgt->sgl;
> +
> +	temp_pgs = pages;
> +	*temp_pgs++ = sg_page(sgl);
> +	len = sgl->length - PAGE_SIZE + sgl->offset;
> +
> +	i = 1;
> +	while (len > 0) {
> +		*temp_pgs++ = nth_page(sg_page(sgl), i++);
> +		len -= PAGE_SIZE;
> +	}
> +
> +	for (i = 1; i < sgt->nents; i++) {
> +		sgl = sg_next(sgl);
> +		*temp_pgs++ = sg_page(sgl);
> +		len = sgl->length - PAGE_SIZE;
> +		j = 1;
> +
> +		while (len > 0) {
> +			*temp_pgs++ = nth_page(sg_page(sgl), j++);
> +			len -= PAGE_SIZE;
> +		}
> +	}
> +
> +	*last_len = len + PAGE_SIZE;
> +
> +	return pages;
> +}
> +
> +/* ioctl - exporting new vdmabuf
> + *
> + *	 int dmabuf_fd - File handle of original DMABUF
> + *	 virtio_vdmabuf_buf_id_t buf_id - returned vdmabuf ID
> + *	 int sz_priv - size of private data from userspace
> + *	 char *priv - buffer of user private data
> + *
> + */
> +static int export_ioctl(struct file *filp, void *data)
> +{
> +	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
> +	struct virtio_vdmabuf_export *attr = data;
> +	struct dma_buf *dmabuf;
> +	struct dma_buf_attachment *attach;
> +	struct sg_table *sgt;
> +	struct virtio_vdmabuf_buf *exp;
> +	struct page **pages;
> +	int nents, last_len;
> +	virtio_vdmabuf_buf_id_t buf_id;
> +	int ret = 0;
> +
> +	if (vdmabuf->vmid <= 0)
> +		return -EINVAL;
> +
> +	dmabuf = dma_buf_get(attr->fd);
> +	if (IS_ERR(dmabuf))
> +		return PTR_ERR(dmabuf);
> +
> +	mutex_lock(&drv_info->g_mutex);
> +
> +	buf_id = get_buf_id(vdmabuf);
> +
> +	attach = dma_buf_attach(dmabuf, drv_info->dev);
> +	if (IS_ERR(attach)) {
> +		ret = PTR_ERR(attach);
> +		goto fail_attach;
> +	}
> +
> +	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
> +	if (IS_ERR(sgt)) {
> +		ret = PTR_ERR(sgt);
> +		goto fail_map_attachment;
> +	}
> +
> +	/* allocate a new exp */
> +	exp = kvcalloc(1, sizeof(*exp), GFP_KERNEL);
> +	if (!exp) {
> +		ret = -ENOMEM;
> +		goto fail_sgt_info_creation;
> +	}
> +
> +	/* possible truncation */
> +	if (attr->sz_priv > MAX_SIZE_PRIV_DATA)
> +		exp->sz_priv = MAX_SIZE_PRIV_DATA;
> +	else
> +		exp->sz_priv = attr->sz_priv;
> +
> +	/* creating buffer for private data */
> +	if (exp->sz_priv != 0) {
> +		exp->priv = kvcalloc(1, exp->sz_priv, GFP_KERNEL);
> +		if (!exp->priv) {
> +			ret = -ENOMEM;
> +			goto fail_priv_creation;
> +		}
> +	}
> +
> +	exp->buf_id = buf_id;
> +	exp->attach = attach;
> +	exp->sgt = sgt;
> +	exp->dma_buf = dmabuf;
> +	exp->valid = 1;
> +
> +	if (exp->sz_priv) {
> +		/* copy private data to sgt_info */
> +		ret = copy_from_user(exp->priv, attr->priv, exp->sz_priv);
> +		if (ret) {
> +			ret = -EINVAL;
> +			goto fail_exp;
> +		}
> +	}
> +
> +	pages = extr_pgs(sgt, &nents, &last_len);
> +	if (pages == NULL) {
> +		ret = -ENOMEM;
> +		goto fail_exp;
> +	}
> +
> +	exp->pages_info = virtio_vdmabuf_share_buf(pages, nents,
> +						   sgt->sgl->offset,
> +					 	   last_len);
> +	if (!exp->pages_info) {
> +		ret = -ENOMEM;
> +		goto fail_create_pages_info;
> +	}
> +
> +	attr->buf_id = exp->buf_id;
> +	ret = export_notify(exp, pages);
> +	if (ret < 0)
> +		goto fail_send_request;
> +
> +	/* now register it to the export list */
> +	virtio_vdmabuf_add_buf(drv_info, exp);
> +
> +	exp->filp = filp;
> +
> +	mutex_unlock(&drv_info->g_mutex);
> +
> +	return ret;
> +
> +/* Clean-up if error occurs */
> +fail_send_request:
> +	virtio_vdmabuf_free_buf(exp->pages_info);
> +
> +fail_create_pages_info:
> +	kvfree(pages);
> +
> +fail_exp:
> +	kvfree(exp->priv);
> +
> +fail_priv_creation:
> +	kvfree(exp);
> +
> +fail_sgt_info_creation:
> +	dma_buf_unmap_attachment(attach, sgt,
> +				 DMA_BIDIRECTIONAL);
> +
> +fail_map_attachment:
> +	dma_buf_detach(dmabuf, attach);
> +
> +fail_attach:
> +	dma_buf_put(dmabuf);
> +
> +	mutex_unlock(&drv_info->g_mutex);
> +
> +	return ret;
> +}
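
For completeness, the guest userspace side of this path boils down to a
single ioctl on the misc device registered further down. A minimal sketch,
assuming the /dev/virtio-vdmabuf node and the export uapi from this patch;
the "meta" blob stands in for whatever per-buffer info the consumer needs
(e.g. width/height/stride/format), and error handling is trimmed.

  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/virtio_vdmabuf.h>

  static int share_with_host(int dmabuf_fd, const void *meta, int meta_sz,
  			   virtio_vdmabuf_buf_id_t *out_id)
  {
  	struct virtio_vdmabuf_export exp = {
  		.fd = dmabuf_fd,
  		.sz_priv = meta_sz,
  		.priv = (char *)meta,
  	};
  	int vfd = open("/dev/virtio-vdmabuf", O_RDWR);

  	if (vfd < 0)
  		return -1;

  	if (ioctl(vfd, VIRTIO_VDMABUF_IOCTL_EXPORT, &exp) < 0) {
  		close(vfd);
  		return -1;
  	}

  	/* the driver fills in the 128-bit ID the host importer will use */
  	*out_id = exp.buf_id;
  	return vfd;	/* keep open to poll/read release events */
  }
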
> +
> +static const struct virtio_vdmabuf_ioctl_desc virtio_vdmabuf_ioctls[] = {
> +	VIRTIO_VDMABUF_IOCTL_DEF(VIRTIO_VDMABUF_IOCTL_EXPORT, export_ioctl, 0),
> +};
> +
> +static long virtio_vdmabuf_ioctl(struct file *filp, unsigned int cmd,
> +		       		 unsigned long param)
> +{
> +	const struct virtio_vdmabuf_ioctl_desc *ioctl = NULL;
> +	unsigned int nr = _IOC_NR(cmd);
> +	int ret;
> +	virtio_vdmabuf_ioctl_t func;
> +	char *kdata;
> +
> +	if (nr >= ARRAY_SIZE(virtio_vdmabuf_ioctls)) {
> +		dev_err(drv_info->dev, "invalid ioctl\n");
> +		return -EINVAL;
> +	}
> +
> +	ioctl = &virtio_vdmabuf_ioctls[nr];
> +
> +	func = ioctl->func;
> +
> +	if (unlikely(!func)) {
> +		dev_err(drv_info->dev, "no function\n");
> +		return -EINVAL;
> +	}
> +
> +	kdata = kvmalloc(_IOC_SIZE(cmd), GFP_KERNEL);
> +	if (!kdata)
> +		return -ENOMEM;
> +
> +	if (copy_from_user(kdata, (void __user *)param,
> +			   _IOC_SIZE(cmd)) != 0) {
> +		dev_err(drv_info->dev,
> +			"failed to copy from user arguments\n");
> +		ret = -EFAULT;
> +		goto ioctl_error;
> +	}
> +
> +	ret = func(filp, kdata);
> +
> +	if (copy_to_user((void __user *)param, kdata,
> +			 _IOC_SIZE(cmd)) != 0) {
> +		dev_err(drv_info->dev,
> +			"failed to copy to user arguments\n");
> +		ret = -EFAULT;
> +		goto ioctl_error;
> +	}
> +
> +ioctl_error:
> +	kvfree(kdata);
> +	return ret;
> +}
> +
> +static unsigned int virtio_vdmabuf_event_poll(struct file *filp,
> +			    	    	      struct poll_table_struct *wait)
> +{
> +	struct virtio_vdmabuf *vdmabuf = filp->private_data;
> +
> +	poll_wait(filp, &vdmabuf->evq->e_wait, wait);
> +
> +	if (!list_empty(&vdmabuf->evq->e_list))
> +		return POLLIN | POLLRDNORM;
> +
> +	return 0;
> +}
> +
> +static ssize_t virtio_vdmabuf_event_read(struct file *filp, char __user *buf,
> +			       		 size_t cnt, loff_t *ofst)
> +{
> +	struct virtio_vdmabuf *vdmabuf = filp->private_data;
> +	int ret;
> +
> +	/* make sure user buffer can be written */
> +	if (!access_ok(buf, cnt)) {
> +		dev_err(drv_info->dev, "user buffer can't be written.\n");
> +		return -EINVAL;
> +	}
> +
> +	ret = mutex_lock_interruptible(&vdmabuf->evq->e_readlock);
> +	if (ret)
> +		return ret;
> +
> +	for (;;) {
> +		struct virtio_vdmabuf_event *e = NULL;
> +
> +		spin_lock_irq(&vdmabuf->evq->e_lock);
> +		if (!list_empty(&vdmabuf->evq->e_list)) {
> +			e = list_first_entry(&vdmabuf->evq->e_list,
> +					     struct virtio_vdmabuf_event, link);
> +			list_del(&e->link);
> +		}
> +		spin_unlock_irq(&vdmabuf->evq->e_lock);
> +
> +		if (!e) {
> +			if (ret)
> +				break;
> +
> +			if (filp->f_flags & O_NONBLOCK) {
> +				ret = -EAGAIN;
> +				break;
> +			}
> +
> +			mutex_unlock(&vdmabuf->evq->e_readlock);
> +			ret = wait_event_interruptible(vdmabuf->evq->e_wait,
> +					!list_empty(&vdmabuf->evq->e_list));
> +
> +			if (ret == 0)
> +				ret = mutex_lock_interruptible(
> +						&vdmabuf->evq->e_readlock);
> +
> +			if (ret)
> +				return ret;
> +		} else {
> +			unsigned int len = (sizeof(e->e_data.hdr) +
> +					    e->e_data.hdr.size);
> +
> +			if (len > cnt - ret) {
> +put_back_event:
> +				spin_lock_irq(&vdmabuf->evq->e_lock);
> +				list_add(&e->link, &vdmabuf->evq->e_list);
> +				spin_unlock_irq(&vdmabuf->evq->e_lock);
> +				break;
> +			}
> +
> +			if (copy_to_user(buf + ret, &e->e_data.hdr,
> +					 sizeof(e->e_data.hdr))) {
> +				if (ret == 0)
> +					ret = -EFAULT;
> +
> +				goto put_back_event;
> +			}
> +
> +			ret += sizeof(e->e_data.hdr);
> +
> +			if (copy_to_user(buf + ret, e->e_data.data,
> +					 e->e_data.hdr.size)) {
> +				/* error while copying void *data */
> +
> +				struct virtio_vdmabuf_e_hdr dummy_hdr = {0};
> +
> +				ret -= sizeof(e->e_data.hdr);
> +
> +				/* nullifying hdr of the event in user buffer */
> +				if (copy_to_user(buf + ret, &dummy_hdr,
> +						 sizeof(dummy_hdr)))
> +					dev_err(drv_info->dev,
> +					   "fail to nullify invalid hdr\n");
> +
> +				ret = -EFAULT;
> +
> +				goto put_back_event;
> +			}
> +
> +			ret += e->e_data.hdr.size;
> +			vdmabuf->evq->pending--;
> +			kvfree(e);
> +		}
> +	}
> +
> +	mutex_unlock(&vdmabuf->evq->e_readlock);
> +
> +	return ret;
> +}
> +
> +static const struct file_operations virtio_vdmabuf_fops = {
> +	.owner = THIS_MODULE,
> +	.open = virtio_vdmabuf_open,
> +	.release = virtio_vdmabuf_release,
> +	.read = virtio_vdmabuf_event_read,
> +	.poll = virtio_vdmabuf_event_poll,
> +	.unlocked_ioctl = virtio_vdmabuf_ioctl,
> +};
> +
> +static struct miscdevice virtio_vdmabuf_miscdev = {
> +	.minor = MISC_DYNAMIC_MINOR,
> +	.name = "virtio-vdmabuf",
> +	.fops = &virtio_vdmabuf_fops,
> +};
> +
> +static int virtio_vdmabuf_probe(struct virtio_device *vdev)
> +{
> +	vq_callback_t *cbs[] = {
> +		virtio_vdmabuf_recv_cb,
> +		virtio_vdmabuf_send_cb,
> +	};
> +	static const char *const names[] = {
> +		"recv",
> +		"send",
> +	};
> +	struct virtio_vdmabuf *vdmabuf;
> +	int ret = 0;
> +
> +	if (!drv_info)
> +		return -EINVAL;
> +
> +	vdmabuf = drv_info->priv;
> +
> +	if (!vdmabuf)
> +		return -EINVAL;
> +
> +	vdmabuf->vdev = vdev;
> +	vdev->priv = vdmabuf;
> +
> +	/* initialize spinlock for synchronizing virtqueue accesses */
> +	spin_lock_init(&vdmabuf->vq_lock);
> +
> +	ret = virtio_find_vqs(vdmabuf->vdev, VDMABUF_VQ_MAX, vdmabuf->vqs,
> +			      cbs, names, NULL);
> +	if (ret) {
> +		dev_err(drv_info->dev, "Cannot find any vqs\n");
> +		return ret;
> +	}
> +
> +	INIT_LIST_HEAD(&vdmabuf->msg_list);
> +	INIT_WORK(&vdmabuf->recv_work, virtio_vdmabuf_recv_work);
> +	INIT_WORK(&vdmabuf->send_work, virtio_vdmabuf_send_work);
> +	INIT_WORK(&vdmabuf->send_msg_work, virtio_vdmabuf_send_msg_work);
> +
> +	return ret;
> +}
> +
> +static void virtio_vdmabuf_remove(struct virtio_device *vdev)
> +{
> +	struct virtio_vdmabuf *vdmabuf;
> +
> +	if (!drv_info)
> +		return;
> +
> +	vdmabuf = drv_info->priv;
> +	flush_work(&vdmabuf->recv_work);
> +	flush_work(&vdmabuf->send_work);
> +	flush_work(&vdmabuf->send_msg_work);
> +
> +	vdev->config->reset(vdev);
> +	vdev->config->del_vqs(vdev);
> +}
> +
> +static struct virtio_device_id id_table[] = {
> +	{ VIRTIO_ID_VDMABUF, VIRTIO_DEV_ANY_ID },
> +	{ 0 },
> +};
> +
> +static struct virtio_driver virtio_vdmabuf_vdev_drv = {
> +	.driver.name =  KBUILD_MODNAME,
> +	.driver.owner = THIS_MODULE,
> +	.id_table =     id_table,
> +	.probe =        virtio_vdmabuf_probe,
> +	.remove =       virtio_vdmabuf_remove,
> +};
> +
> +static int __init virtio_vdmabuf_init(void)
> +{
> +	struct virtio_vdmabuf *vdmabuf;
> +	int ret = 0;
> +
> +	drv_info = NULL;
> +
> +	ret = misc_register(&virtio_vdmabuf_miscdev);
> +	if (ret) {
> +		pr_err("virtio-vdmabuf misc driver can't be registered\n");
> +		return ret;
> +	}
> +
> +	ret = dma_set_mask_and_coherent(virtio_vdmabuf_miscdev.this_device,
> +					DMA_BIT_MASK(64));
> +	if (ret < 0) {
> +		misc_deregister(&virtio_vdmabuf_miscdev);
> +		return -EINVAL;
> +	}
> +
> +	drv_info = kvcalloc(1, sizeof(*drv_info), GFP_KERNEL);
> +	if (!drv_info) {
> +		misc_deregister(&virtio_vdmabuf_miscdev);
> +		return -ENOMEM;
> +	}
> +
> +	vdmabuf = kvcalloc(1, sizeof(*vdmabuf), GFP_KERNEL);
> +	if (!vdmabuf) {
> +		kvfree(drv_info);
> +		misc_deregister(&virtio_vdmabuf_miscdev);
> +		return -ENOMEM;
> +	}
> +
> +	vdmabuf->evq = kvcalloc(1, sizeof(*(vdmabuf->evq)), GFP_KERNEL);
> +	if (!vdmabuf->evq) {
> +		kvfree(drv_info);
> +		kvfree(vdmabuf);
> +		misc_deregister(&virtio_vdmabuf_miscdev);
> +		return -ENOMEM;
> +	}
> +
> +	drv_info->priv = (void *)vdmabuf;
> +	drv_info->dev = virtio_vdmabuf_miscdev.this_device;
> +
> +	mutex_init(&drv_info->g_mutex);
> +
> +	mutex_init(&vdmabuf->evq->e_readlock);
> +	spin_lock_init(&vdmabuf->evq->e_lock);
> +
> +	INIT_LIST_HEAD(&vdmabuf->evq->e_list);
> +	init_waitqueue_head(&vdmabuf->evq->e_wait);
> +	hash_init(drv_info->buf_list);
> +
> +	vdmabuf->evq->pending = 0;
> +	vdmabuf->wq = create_workqueue("virtio_vdmabuf_wq");
> +
> +	ret = register_virtio_driver(&virtio_vdmabuf_vdev_drv);
> +	if (ret) {
> +		dev_err(drv_info->dev, "vdmabuf driver can't be registered\n");
> +		misc_deregister(&virtio_vdmabuf_miscdev);
> +		kvfree(vdmabuf);
> +		kvfree(drv_info);
> +		return -EFAULT;
> +	}
> +
> +	return 0;
> +}
> +
> +static void __exit virtio_vdmabuf_deinit(void)
> +{
> +	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
> +	struct virtio_vdmabuf_event *e, *et;
> +	unsigned long irqflags;
> +
> +	misc_deregister(&virtio_vdmabuf_miscdev);
> +	unregister_virtio_driver(&virtio_vdmabuf_vdev_drv);
> +
> +	if (vdmabuf->wq)
> +		destroy_workqueue(vdmabuf->wq);
> +
> +	spin_lock_irqsave(&vdmabuf->evq->e_lock, irqflags);
> +
> +	list_for_each_entry_safe(e, et, &vdmabuf->evq->e_list,
> +				 link) {
> +		list_del(&e->link);
> +		kvfree(e);
> +		vdmabuf->evq->pending--;
> +	}
> +
> +	spin_unlock_irqrestore(&vdmabuf->evq->e_lock, irqflags);
> +
> +	/* freeing all exported buffers */
> +	remove_all_bufs(vdmabuf);
> +
> +	kvfree(vdmabuf->evq);
> +	kvfree(vdmabuf);
> +	kvfree(drv_info);
> +}
> +
> +module_init(virtio_vdmabuf_init);
> +module_exit(virtio_vdmabuf_deinit);
> +
> +MODULE_DEVICE_TABLE(virtio, id_table);
> +MODULE_DESCRIPTION("Virtio Vdmabuf frontend driver");
> +MODULE_LICENSE("GPL and additional rights");
> diff --git a/include/linux/virtio_vdmabuf.h b/include/linux/virtio_vdmabuf.h
> new file mode 100644
> index 000000000000..9500bf4a54ac
> --- /dev/null
> +++ b/include/linux/virtio_vdmabuf.h
> @@ -0,0 +1,271 @@
> +/* SPDX-License-Identifier: (MIT OR GPL-2.0) */
> +
> +/*
> + * Copyright © 2021 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef _LINUX_VIRTIO_VDMABUF_H 
> +#define _LINUX_VIRTIO_VDMABUF_H 
> +
> +#include <uapi/linux/virtio_vdmabuf.h>
> +#include <linux/hashtable.h>
> +#include <linux/kvm_types.h>
> +
> +struct virtio_vdmabuf_shared_pages {
> +	/* cross-VM ref addr for the buffer */
> +	gpa_t ref;
> +
> +	/* page array */
> +	struct page **pages;
> +	gpa_t **l2refs;
> +	gpa_t *l3refs;
> +
> +	/* data offset in the first page
> +	 * and data length in the last page
> +	 */
> +	int first_ofst;
> +	int last_len;
> +
> +	/* number of shared pages */
> +	int nents;
> +};
> +
> +struct virtio_vdmabuf_buf {
> +	virtio_vdmabuf_buf_id_t buf_id;
> +
> +	struct dma_buf_attachment *attach;
> +	struct dma_buf *dma_buf;
> +	struct sg_table *sgt;
> +	struct virtio_vdmabuf_shared_pages *pages_info;
> +	int vmid;
> +
> +	/* validity of the buffer */
> +	bool valid;
> +
> +	/* set if the buffer is imported via import_ioctl */
> +	bool imported;
> +
> +	/* size of private */
> +	size_t sz_priv;
> +	/* private data associated with the exported buffer */
> +	void *priv;
> +
> +	struct file *filp;
> +	struct hlist_node node;
> +};
> +
> +struct virtio_vdmabuf_event {
> +	struct virtio_vdmabuf_e_data e_data;
> +	struct list_head link;
> +};
> +
> +struct virtio_vdmabuf_event_queue {
> +	wait_queue_head_t e_wait;
> +	struct list_head e_list;
> +
> +	spinlock_t e_lock;
> +	struct mutex e_readlock;
> +
> +	/* # of pending events */
> +	int pending;
> +};
> +
> +/* driver information */
> +struct virtio_vdmabuf_info {
> +	struct device *dev;
> +
> +	struct list_head head_vdmabuf_list;
> +	struct list_head kvm_instances;
> +
> +	DECLARE_HASHTABLE(buf_list, 7);
> +
> +	void *priv;
> +	struct mutex g_mutex;
> +	struct notifier_block kvm_notifier;
> +};
> +
> +/* IOCTL definitions
> + */
> +typedef int (*virtio_vdmabuf_ioctl_t)(struct file *filp, void *data);
> +
> +struct virtio_vdmabuf_ioctl_desc {
> +	unsigned int cmd;
> +	int flags;
> +	virtio_vdmabuf_ioctl_t func;
> +	const char *name;
> +};
> +
> +#define VIRTIO_VDMABUF_IOCTL_DEF(ioctl, _func, _flags)	\
> +	[_IOC_NR(ioctl)] = {			\
> +			.cmd = ioctl,		\
> +			.func = _func,		\
> +			.flags = _flags,	\
> +			.name = #ioctl		\
> +}
> +
> +#define VIRTIO_VDMABUF_VMID(buf_id) ((((buf_id).id) >> 32) & 0xFFFFFFFF)
> +
> +/* Messages between Host and Guest */
> +
> +/* List of commands from Guest to Host:
> + *
> + * ------------------------------------------------------------------
> + * A. NEED_VMID
> + *
> + *  guest asks the host to provide its vmid
> + *
> + * req:
> + *
> + * cmd: VIRTIO_VDMABUF_NEED_VMID
> + *
> + * ack:
> + *
> + * cmd: same as req
> + * op[0] : vmid of guest
> + *
> + * ------------------------------------------------------------------
> + * B. EXPORT
> + *
> + *  export dmabuf to host
> + *
> + * req:
> + *
> + * cmd: VIRTIO_VDMABUF_CMD_EXPORT
> + * op0~op3 : HDMABUF ID
> + * op4 : number of pages to be shared
> + * op5 : offset of data in the first page
> + * op6 : length of data in the last page
> + * op7 : upper 32 bit of top-level ref of shared buf
> + * op8 : lower 32 bit of top-level ref of shared buf
> + * op9 : size of private data
> + * op10 ~ op64: User private data associated with the buffer
> + *	        (e.g. graphic buffer's meta info)
> + *
> + * ------------------------------------------------------------------
> + *
> + * List of commands from Host to Guest
> + *
> + * ------------------------------------------------------------------
> + * A. RELEASE
> + *
> + *  notifying guest that the shared buffer is released by an importer
> + *
> + * req:
> + *
> + * cmd: VIRTIO_VDMABUF_CMD_DMABUF_REL
> + * op0~op3 : VDMABUF ID
> + *
> + * ------------------------------------------------------------------
> + */
> +
> +/* msg structures */
> +struct virtio_vdmabuf_msg {
> +	struct list_head list;
> +	unsigned int cmd;
> +	unsigned int op[64];
> +};
> +
> +enum {
> +	VDMABUF_VQ_RECV = 0,
> +	VDMABUF_VQ_SEND = 1,
> +	VDMABUF_VQ_MAX  = 2,
> +};
> +
> +enum virtio_vdmabuf_cmd {
> +	VIRTIO_VDMABUF_CMD_NEED_VMID,
> +	VIRTIO_VDMABUF_CMD_EXPORT = 0x10,
> +	VIRTIO_VDMABUF_CMD_DMABUF_REL
> +};
> +
> +enum virtio_vdmabuf_ops {
> +	VIRTIO_VDMABUF_HDMABUF_ID_ID = 0,
> +	VIRTIO_VDMABUF_HDMABUF_ID_RNG_KEY0,
> +	VIRTIO_VDMABUF_HDMABUF_ID_RNG_KEY1,
> +	VIRTIO_VDMABUF_NUM_PAGES_SHARED = 4,
> +	VIRTIO_VDMABUF_FIRST_PAGE_DATA_OFFSET,
> +	VIRTIO_VDMABUF_LAST_PAGE_DATA_LENGTH,
> +	VIRTIO_VDMABUF_REF_ADDR_UPPER_32BIT,
> +	VIRTIO_VDMABUF_REF_ADDR_LOWER_32BIT,
> +	VIRTIO_VDMABUF_PRIVATE_DATA_SIZE,
> +	VIRTIO_VDMABUF_PRIVATE_DATA_START
> +};
> +
> +/* adding exported/imported vdmabuf info to hash */
> +static inline int
> +virtio_vdmabuf_add_buf(struct virtio_vdmabuf_info *info,
> +                       struct virtio_vdmabuf_buf *new)
> +{
> +	hash_add(info->buf_list, &new->node, new->buf_id.id);
> +	return 0;
> +}
> +
> +/* comparing two vdmabuf IDs */
> +static inline bool
> +is_same_buf(virtio_vdmabuf_buf_id_t a,
> +            virtio_vdmabuf_buf_id_t b)
> +{
> +	int i;
> +
> +	if (a.id != b.id)
> +		return false;
> +
> +	/* compare keys */
> +	for (i = 0; i < 2; i++) {
> +		if (a.rng_key[i] != b.rng_key[i])
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
> +/* find buf for given vdmabuf ID */
> +static inline struct virtio_vdmabuf_buf
> +*virtio_vdmabuf_find_buf(struct virtio_vdmabuf_info *info,
> +			 virtio_vdmabuf_buf_id_t *buf_id)
> +{
> +	struct virtio_vdmabuf_buf *found;
> +
> +	hash_for_each_possible(info->buf_list, found, node, buf_id->id)
> +		if (is_same_buf(found->buf_id, *buf_id))
> +			return found;
> +
> +	return NULL;
> +}
> +
> +/* delete buf from hash */
> +static inline int
> +virtio_vdmabuf_del_buf(struct virtio_vdmabuf_info *info,
> +                       virtio_vdmabuf_buf_id_t *buf_id)
> +{
> +	struct virtio_vdmabuf_buf *found;
> +
> +	found = virtio_vdmabuf_find_buf(info, buf_id);
> +	if (!found)
> +		return -ENOENT;
> +
> +	hash_del(&found->node);
> +
> +	return 0;
> +}
> +
> +#endif
> diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
> index bc1c0621f5ed..39c94637ddee 100644
> --- a/include/uapi/linux/virtio_ids.h
> +++ b/include/uapi/linux/virtio_ids.h
> @@ -54,5 +54,6 @@
>  #define VIRTIO_ID_FS			26 /* virtio filesystem */
>  #define VIRTIO_ID_PMEM			27 /* virtio pmem */
>  #define VIRTIO_ID_MAC80211_HWSIM	29 /* virtio mac80211-hwsim */
> +#define VIRTIO_ID_VDMABUF          	40 /* virtio vdmabuf */
>  
>  #endif /* _LINUX_VIRTIO_IDS_H */
> diff --git a/include/uapi/linux/virtio_vdmabuf.h b/include/uapi/linux/virtio_vdmabuf.h
> new file mode 100644
> index 000000000000..7bddaa04ddd6
> --- /dev/null
> +++ b/include/uapi/linux/virtio_vdmabuf.h
> @@ -0,0 +1,99 @@
> +// SPDX-License-Identifier: (MIT OR GPL-2.0)
> +
> +/*
> + * Copyright © 2021 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef _UAPI_LINUX_VIRTIO_VDMABUF_H
> +#define _UAPI_LINUX_VIRTIO_VDMABUF_H
> +
> +#define MAX_SIZE_PRIV_DATA 192
> +
> +typedef struct {
> +	__u64 id;
> +	/* 8B long Random number */
> +	int rng_key[2];
> +} virtio_vdmabuf_buf_id_t;
> +
> +struct virtio_vdmabuf_e_hdr {
> +	/* buf_id of new buf */
> +	virtio_vdmabuf_buf_id_t buf_id;
> +	/* size of private data */
> +	int size;
> +};
> +
> +struct virtio_vdmabuf_e_data {
> +	struct virtio_vdmabuf_e_hdr hdr;
> +	/* ptr to private data */
> +	void __user *data;
> +};
> +
> +#define VIRTIO_VDMABUF_IOCTL_IMPORT \
> +_IOC(_IOC_NONE, 'G', 2, sizeof(struct virtio_vdmabuf_import))
> +#define VIRTIO_VDMABUF_IOCTL_RELEASE \
> +_IOC(_IOC_NONE, 'G', 3, sizeof(struct virtio_vdmabuf_import))
> +struct virtio_vdmabuf_import {
> +	/* IN parameters */
> +	/* ahdb buf id to be imported */
> +	virtio_vdmabuf_buf_id_t buf_id;
> +	/* flags */
> +	int flags;
> +	/* OUT parameters */
> +	/* exported dma buf fd */
> +	int fd;
> +};
> +
> +#define VIRTIO_VDMABUF_IOCTL_EXPORT \
> +_IOC(_IOC_NONE, 'G', 4, sizeof(struct virtio_vdmabuf_export))
> +struct virtio_vdmabuf_export {
> +	/* IN parameters */
> +	/* DMA buf fd to be exported */
> +	int fd;
> +	/* exported dma buf id */
> +	virtio_vdmabuf_buf_id_t buf_id;
> +	int sz_priv;
> +	char *priv;
> +};
> +
> +#define VIRTIO_VDMABUF_IOCTL_QUERY \
> +_IOC(_IOC_NONE, 'G', 5, sizeof(struct virtio_vdmabuf_query))
> +struct virtio_vdmabuf_query {
> +	/* in parameters */
> +	/* id of buf to be queried */
> +	virtio_vdmabuf_buf_id_t buf_id;
> +	/* item to be queried */
> +	int item;
> +	/* OUT parameters */
> +	/* Value of queried item */
> +	unsigned long info;
> +};
> +
> +/* DMABUF query */
> +enum virtio_vdmabuf_query_cmd {
> +	VIRTIO_VDMABUF_QUERY_SIZE = 0x10,
> +	VIRTIO_VDMABUF_QUERY_BUSY,
> +	VIRTIO_VDMABUF_QUERY_PRIV_INFO_SIZE,
> +	VIRTIO_VDMABUF_QUERY_PRIV_INFO,
> +};
> +
> +#endif
> -- 
> 2.26.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC v3 2/3] virtio: Introduce Vdmabuf driver
@ 2021-02-05 16:03     ` Daniel Vetter
  0 siblings, 0 replies; 57+ messages in thread
From: Daniel Vetter @ 2021-02-05 16:03 UTC (permalink / raw)
  To: Vivek Kasireddy, Gerd Hoffmann
  Cc: dongwon.kim, christian.koenig, daniel.vetter, dri-devel,
	virtualization, kraxel, daniel.vetter, linux-media

On Tue, Feb 02, 2021 at 11:35:16PM -0800, Vivek Kasireddy wrote:
> This driver "transfers" a dmabuf created on the Guest to the Host.
> A common use-case for such a transfer includes sharing the scanout
> buffer created by a display server or a compositor running in the
> Guest with Qemu UI -- running on the Host.
> 
> The "transfer" is accomplished by sharing the PFNs of all the pages
> associated with the dmabuf and having a new dmabuf created on the
> Host that is backed up by the pages mapped from the Guest.
> 
> Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> ---
>  drivers/virtio/Kconfig              |    8 +
>  drivers/virtio/Makefile             |    1 +
>  drivers/virtio/virtio_vdmabuf.c     | 1090 +++++++++++++++++++++++++++
>  include/linux/virtio_vdmabuf.h      |  271 +++++++
>  include/uapi/linux/virtio_ids.h     |    1 +
>  include/uapi/linux/virtio_vdmabuf.h |   99 +++
>  6 files changed, 1470 insertions(+)
>  create mode 100644 drivers/virtio/virtio_vdmabuf.c
>  create mode 100644 include/linux/virtio_vdmabuf.h
>  create mode 100644 include/uapi/linux/virtio_vdmabuf.h
> 
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 7b41130d3f35..e563c12f711e 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -139,4 +139,12 @@ config VIRTIO_DMA_SHARED_BUFFER
>  	 This option adds a flavor of dma buffers that are backed by
>  	 virtio resources.
>  
> +config VIRTIO_VDMABUF
> +	bool "Enables Vdmabuf driver in guest OS"
> +	default n
> +	depends on VIRTIO
> +	help
> +	 This driver provides a way to share the dmabufs created in
> +	 the Guest with the Host.
> +
>  endif # VIRTIO_MENU
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 591e6f72aa54..b4bb0738009c 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -9,3 +9,4 @@ obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
>  obj-$(CONFIG_VIRTIO_VDPA) += virtio_vdpa.o
>  obj-$(CONFIG_VIRTIO_MEM) += virtio_mem.o
>  obj-$(CONFIG_VIRTIO_DMA_SHARED_BUFFER) += virtio_dma_buf.o
> +obj-$(CONFIG_VIRTIO_VDMABUF) += virtio_vdmabuf.o
> diff --git a/drivers/virtio/virtio_vdmabuf.c b/drivers/virtio/virtio_vdmabuf.c
> new file mode 100644
> index 000000000000..c28f144eb126
> --- /dev/null
> +++ b/drivers/virtio/virtio_vdmabuf.c
> @@ -0,0 +1,1090 @@
> +// SPDX-License-Identifier: (MIT OR GPL-2.0)
> +
> +/*
> + * Copyright © 2021 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + * Authors:
> + *    Dongwon Kim <dongwon.kim@intel.com>
> + *    Mateusz Polrola <mateusz.polrola@gmail.com>
> + *    Vivek Kasireddy <vivek.kasireddy@intel.com>
> + */
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/errno.h>
> +#include <linux/module.h>
> +#include <linux/device.h>
> +#include <linux/uaccess.h>
> +#include <linux/miscdevice.h>
> +#include <linux/delay.h>
> +#include <linux/random.h>
> +#include <linux/poll.h>
> +#include <linux/spinlock.h>
> +#include <linux/dma-buf.h>
> +#include <linux/virtio.h>
> +#include <linux/virtio_ids.h>
> +#include <linux/virtio_config.h>
> +#include <linux/virtio_vdmabuf.h>
> +
> +#define VIRTIO_VDMABUF_MAX_ID INT_MAX
> +#define REFS_PER_PAGE (PAGE_SIZE/sizeof(long))
> +#define NEW_BUF_ID_GEN(vmid, cnt) (((vmid & 0xFFFFFFFF) << 32) | \
> +				    ((cnt) & 0xFFFFFFFF))
> +
> +/* one global drv object */
> +static struct virtio_vdmabuf_info *drv_info;
> +
> +struct virtio_vdmabuf {
> +	/* virtio device structure */
> +	struct virtio_device *vdev;
> +
> +	/* virtual queue array */
> +	struct virtqueue *vqs[VDMABUF_VQ_MAX];
> +
> +	/* ID of guest OS */
> +	u64 vmid;
> +
> +	/* spin lock that needs to be acquired before accessing
> +	 * virtual queue
> +	 */
> +	spinlock_t vq_lock;
> +	struct mutex recv_lock;
> +	struct mutex send_lock;
> +
> +	struct list_head msg_list;
> +
> +	/* workqueue */
> +	struct workqueue_struct *wq;
> +	struct work_struct recv_work;
> +	struct work_struct send_work;
> +	struct work_struct send_msg_work;
> +
> +	struct virtio_vdmabuf_event_queue *evq;
> +};
> +
> +static virtio_vdmabuf_buf_id_t get_buf_id(struct virtio_vdmabuf *vdmabuf)
> +{
> +	virtio_vdmabuf_buf_id_t buf_id = {0, {0, 0} };
> +	static int count = 0;
> +
> +	count = count < VIRTIO_VDMABUF_MAX_ID ? count + 1 : 0;
> +	buf_id.id = NEW_BUF_ID_GEN(vdmabuf->vmid, count);
> +
> +	/* random data embedded in the id for security */
> +	get_random_bytes(&buf_id.rng_key[0], 8);
> +
> +	return buf_id;
> +}
> +
> +/* sharing pages for original DMABUF with Host */
> +static struct virtio_vdmabuf_shared_pages
> +*virtio_vdmabuf_share_buf(struct page **pages, int nents,
> +			  int first_ofst, int last_len)
> +{
> +	struct virtio_vdmabuf_shared_pages *pages_info;
> +	int i;
> +	int n_l2refs = nents/REFS_PER_PAGE +
> +		       ((nents % REFS_PER_PAGE) ? 1 : 0);
> +
> +	pages_info = kvcalloc(1, sizeof(*pages_info), GFP_KERNEL);
> +	if (!pages_info)
> +		return NULL;
> +
> +	pages_info->pages = pages;
> +	pages_info->nents = nents;
> +	pages_info->first_ofst = first_ofst;
> +	pages_info->last_len = last_len;
> +	pages_info->l3refs = (gpa_t *)__get_free_page(GFP_KERNEL);
> +
> +	if (!pages_info->l3refs) {
> +		kvfree(pages_info);
> +		return NULL;
> +	}
> +
> +	pages_info->l2refs = (gpa_t **)__get_free_pages(GFP_KERNEL,
> +					get_order(n_l2refs * PAGE_SIZE));
> +
> +	if (!pages_info->l2refs) {
> +		free_page((gpa_t)pages_info->l3refs);
> +		kvfree(pages_info);
> +		return NULL;
> +	}
> +
> +	/* Share physical address of pages */
> +	for (i = 0; i < nents; i++)
> +		pages_info->l2refs[i] = (gpa_t *)page_to_phys(pages[i]);
> +
> +	for (i = 0; i < n_l2refs; i++)
> +		pages_info->l3refs[i] =
> +			virt_to_phys((void *)pages_info->l2refs +
> +				     i * PAGE_SIZE);
> +
> +	pages_info->ref = (gpa_t)virt_to_phys(pages_info->l3refs);
> +
> +	return pages_info;
> +}
> +
> +/* stop sharing pages */
> +static void
> +virtio_vdmabuf_free_buf(struct virtio_vdmabuf_shared_pages *pages_info)
> +{
> +	int n_l2refs = (pages_info->nents/REFS_PER_PAGE +
> +		       ((pages_info->nents % REFS_PER_PAGE) ? 1 : 0));
> +
> +	free_pages((gpa_t)pages_info->l2refs, get_order(n_l2refs * PAGE_SIZE));
> +	free_page((gpa_t)pages_info->l3refs);
> +
> +	kvfree(pages_info);
> +}
> +
> +static int send_msg_to_host(enum virtio_vdmabuf_cmd cmd, int *op)
> +{
> +	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
> +	struct virtio_vdmabuf_msg *msg;
> +	int i;
> +
> +	switch (cmd) {
> +	case VIRTIO_VDMABUF_CMD_NEED_VMID:
> +		msg = kvcalloc(1, sizeof(struct virtio_vdmabuf_msg),
> +			       GFP_KERNEL);
> +		if (!msg)
> +			return -ENOMEM;
> +
> +		if (op)
> +			for (i = 0; i < 4; i++)
> +				msg->op[i] = op[i];
> +		break;
> +
> +	case VIRTIO_VDMABUF_CMD_EXPORT:
> +		msg = kvcalloc(1, sizeof(struct virtio_vdmabuf_msg),
> +			       GFP_KERNEL);
> +		if (!msg)
> +			return -ENOMEM;
> +
> +		memcpy(&msg->op[0], &op[0], 9 * sizeof(int) + op[9]);
> +		break;
> +
> +	default:
> +		/* no command found */
> +		return -EINVAL;
> +	}
> +
> +	msg->cmd = cmd;
> +	list_add_tail(&msg->list, &vdmabuf->msg_list);
> +	queue_work(vdmabuf->wq, &vdmabuf->send_msg_work);
> +
> +	return 0;
> +}
> +
> +static int add_event_buf_rel(struct virtio_vdmabuf_buf *buf_info)
> +{
> +	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
> +	struct virtio_vdmabuf_event *e_oldest, *e_new;
> +	struct virtio_vdmabuf_event_queue *eq = vdmabuf->evq;
> +	unsigned long irqflags;
> +
> +	e_new = kvzalloc(sizeof(*e_new), GFP_KERNEL);
> +	if (!e_new)
> +		return -ENOMEM;
> +
> +	e_new->e_data.hdr.buf_id = buf_info->buf_id;
> +	e_new->e_data.data = (void *)buf_info->priv;
> +	e_new->e_data.hdr.size = buf_info->sz_priv;
> +
> +	spin_lock_irqsave(&eq->e_lock, irqflags);
> +
> +	/* check current number of events and if it hits the max num (32)
> +	 * then remove the oldest event in the list
> +	 */
> +	if (eq->pending > 31) {
> +		e_oldest = list_first_entry(&eq->e_list,
> +					    struct virtio_vdmabuf_event, link);
> +		list_del(&e_oldest->link);
> +		eq->pending--;
> +		kvfree(e_oldest);
> +	}
> +
> +	list_add_tail(&e_new->link, &eq->e_list);
> +
> +	eq->pending++;
> +
> +	wake_up_interruptible(&eq->e_wait);
> +	spin_unlock_irqrestore(&eq->e_lock, irqflags);
> +
> +	return 0;
> +}
> +
> +static void virtio_vdmabuf_clear_buf(struct virtio_vdmabuf_buf *exp)
> +{
> +	/* Start cleanup of buffer in reverse order to exporting */
> +	virtio_vdmabuf_free_buf(exp->pages_info);
> +
> +	dma_buf_unmap_attachment(exp->attach, exp->sgt,
> +				 DMA_BIDIRECTIONAL);
> +
> +	if (exp->dma_buf) {
> +		dma_buf_detach(exp->dma_buf, exp->attach);
> +		/* close connection to dma-buf completely */
> +		dma_buf_put(exp->dma_buf);
> +		exp->dma_buf = NULL;
> +	}
> +}
> +
> +static int remove_buf(struct virtio_vdmabuf *vdmabuf,
> +		      struct virtio_vdmabuf_buf *exp)
> +{
> +	int ret;
> +
> +	ret = add_event_buf_rel(exp);
> +	if (ret)
> +		return ret;
> +
> +	virtio_vdmabuf_clear_buf(exp);
> +
> +	ret = virtio_vdmabuf_del_buf(drv_info, &exp->buf_id);
> +	if (ret)
> +		return ret;
> +
> +	if (exp->sz_priv > 0 && exp->priv)
> +		kvfree(exp->priv);
> +
> +	kvfree(exp);
> +	return 0;
> +}
> +
> +static int parse_msg_from_host(struct virtio_vdmabuf *vdmabuf,
> +		     	       struct virtio_vdmabuf_msg *msg)
> +{
> +	struct virtio_vdmabuf_buf *exp;
> +	virtio_vdmabuf_buf_id_t buf_id;
> +	int ret;
> +
> +	switch (msg->cmd) {
> +	case VIRTIO_VDMABUF_CMD_NEED_VMID:
> +		vdmabuf->vmid = msg->op[0];
> +
> +		break;
> +	case VIRTIO_VDMABUF_CMD_DMABUF_REL:
> +		memcpy(&buf_id, msg->op, sizeof(buf_id));
> +
> +		exp = virtio_vdmabuf_find_buf(drv_info, &buf_id);
> +		if (!exp) {
> +			dev_err(drv_info->dev, "can't find buffer\n");
> +			return -EINVAL;
> +		}
> +
> +		ret = remove_buf(vdmabuf, exp);
> +		if (ret)
> +			return ret;
> +
> +		break;
> +	case VIRTIO_VDMABUF_CMD_EXPORT:
> +		break;
> +	default:
> +		dev_err(drv_info->dev, "empty cmd\n");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static void virtio_vdmabuf_recv_work(struct work_struct *work)
> +{
> +	struct virtio_vdmabuf *vdmabuf =
> +		container_of(work, struct virtio_vdmabuf, recv_work);
> +	struct virtqueue *vq = vdmabuf->vqs[VDMABUF_VQ_RECV];
> +	struct virtio_vdmabuf_msg *msg;
> +	int sz;
> +
> +	mutex_lock(&vdmabuf->recv_lock);
> +
> +	do {
> +		virtqueue_disable_cb(vq);
> +		for (;;) {
> +			msg = virtqueue_get_buf(vq, &sz);
> +			if (!msg)
> +				break;
> +
> +			/* valid size */
> +			if (sz == sizeof(struct virtio_vdmabuf_msg)) {
> +				if (parse_msg_from_host(vdmabuf, msg))
> +					dev_err(drv_info->dev,
> +						"msg parse error\n");
> +
> +				kvfree(msg);
> +			} else {
> +				dev_err(drv_info->dev,
> +					"received malformed message\n");
> +			}
> +		}
> +	} while (!virtqueue_enable_cb(vq));
> +
> +	mutex_unlock(&vdmabuf->recv_lock);
> +}
> +
> +static void virtio_vdmabuf_fill_recv_msg(struct virtio_vdmabuf *vdmabuf)
> +{
> +	struct virtqueue *vq = vdmabuf->vqs[VDMABUF_VQ_RECV];
> +	struct scatterlist sg;
> +	struct virtio_vdmabuf_msg *msg;
> +	int ret;
> +
> +	msg = kvzalloc(sizeof(*msg), GFP_KERNEL);
> +	if (!msg)
> +		return;
> +
> +	sg_init_one(&sg, msg, sizeof(struct virtio_vdmabuf_msg));
> +	ret = virtqueue_add_inbuf(vq, &sg, 1, msg, GFP_KERNEL);
> +	if (ret)
> +		return;
> +
> +	virtqueue_kick(vq);
> +}
> +
> +static void virtio_vdmabuf_send_msg_work(struct work_struct *work)
> +{
> +	struct virtio_vdmabuf *vdmabuf =
> +		container_of(work, struct virtio_vdmabuf, send_msg_work);
> +	struct virtqueue *vq = vdmabuf->vqs[VDMABUF_VQ_SEND];
> +	struct scatterlist sg;
> +	struct virtio_vdmabuf_msg *msg;
> +	bool added = false;
> +	int ret;
> +
> +	mutex_lock(&vdmabuf->send_lock);
> +
> +	for (;;) {
> +		if (list_empty(&vdmabuf->msg_list))
> +			break;
> +
> +		virtio_vdmabuf_fill_recv_msg(vdmabuf);
> +
> +		msg = list_first_entry(&vdmabuf->msg_list,
> +				       struct virtio_vdmabuf_msg, list);
> +		list_del_init(&msg->list);
> +
> +		sg_init_one(&sg, msg, sizeof(struct virtio_vdmabuf_msg));
> +		ret = virtqueue_add_outbuf(vq, &sg, 1, msg, GFP_KERNEL);
> +		if (ret < 0) {
> +			dev_err(drv_info->dev,
> +				"failed to add msg to vq\n");
> +			break;
> +		}
> +
> +		added = true;	
> +	}
> +
> +	if (added)
> +		virtqueue_kick(vq);
> +
> +	mutex_unlock(&vdmabuf->send_lock);
> +}
> +
> +static void virtio_vdmabuf_send_work(struct work_struct *work)
> +{
> +	struct virtio_vdmabuf *vdmabuf =
> +		container_of(work, struct virtio_vdmabuf, send_work);
> +	struct virtqueue *vq = vdmabuf->vqs[VDMABUF_VQ_SEND];
> +	struct virtio_vdmabuf_msg *msg;
> +	unsigned int sz;
> +	bool added = false;
> +
> +	mutex_lock(&vdmabuf->send_lock);
> +
> +	do {
> +		virtqueue_disable_cb(vq);
> +
> +		for (;;) {
> +			msg = virtqueue_get_buf(vq, &sz);
> +			if (!msg)
> +				break;
> +
> +			if (parse_msg_from_host(vdmabuf, msg))
> +				dev_err(drv_info->dev,
> +					"msg parse error\n");
> +
> +			kvfree(msg);
> +			added = true;
> +		}
> +	} while (!virtqueue_enable_cb(vq));
> +
> +	mutex_unlock(&vdmabuf->send_lock);
> +
> +	if (added)
> +		queue_work(vdmabuf->wq, &vdmabuf->send_msg_work);
> +}
> +
> +static void virtio_vdmabuf_recv_cb(struct virtqueue *vq)
> +{
> +	struct virtio_vdmabuf *vdmabuf = vq->vdev->priv;
> +
> +	if (!vdmabuf)
> +		return;
> +
> +	queue_work(vdmabuf->wq, &vdmabuf->recv_work);
> +}
> +
> +static void virtio_vdmabuf_send_cb(struct virtqueue *vq)
> +{
> +	struct virtio_vdmabuf *vdmabuf = vq->vdev->priv;
> +
> +	if (!vdmabuf)
> +		return;
> +
> +	queue_work(vdmabuf->wq, &vdmabuf->send_work);
> +}
> +
> +static int remove_all_bufs(struct virtio_vdmabuf *vdmabuf)
> +{
> +	struct virtio_vdmabuf_buf *found;
> +	struct hlist_node *tmp;
> +	int bkt;
> +	int ret;
> +
> +	hash_for_each_safe(drv_info->buf_list, bkt, tmp, found, node) {
> +		ret = remove_buf(vdmabuf, found);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static int virtio_vdmabuf_open(struct inode *inode, struct file *filp)
> +{
> +	int ret;
> +
> +	if (!drv_info) {
> +		pr_err("virtio vdmabuf driver is not ready\n");
> +		return -EINVAL;
> +	}
> +
> +	ret = send_msg_to_host(VIRTIO_VDMABUF_CMD_NEED_VMID, 0);
> +	if (ret < 0)
> +		dev_err(drv_info->dev, "fail to receive vmid\n");
> +
> +	filp->private_data = drv_info->priv;
> +
> +	return 0;
> +}
> +
> +static int virtio_vdmabuf_release(struct inode *inode, struct file *filp)
> +{
> +	return 0;
> +}
> +
> +/* Notify Host about the new vdmabuf */
> +static int export_notify(struct virtio_vdmabuf_buf *exp, struct page **pages)
> +{
> +	int *op;
> +	int ret;
> +
> +	op = kvcalloc(1, sizeof(int) * 65, GFP_KERNEL);
> +	if (!op)
> +		return -ENOMEM;
> +
> +	memcpy(op, &exp->buf_id, sizeof(exp->buf_id));
> +
> +	/* if new pages are to be shared */
> +	if (pages) {
> +		op[4] = exp->pages_info->nents;
> +		op[5] = exp->pages_info->first_ofst;
> +		op[6] = exp->pages_info->last_len;
> +
> +		memcpy(&op[7], &exp->pages_info->ref, sizeof(gpa_t));
> +	}
> +
> +	op[9] = exp->sz_priv;
> +
> +	/* driver/application specific private info */
> +	memcpy(&op[10], exp->priv, op[9]);
> +
> +	ret = send_msg_to_host(VIRTIO_VDMABUF_CMD_EXPORT, op);
> +
> +	kvfree(op);
> +	return ret;
> +}
> +
> +/* return total number of pages referenced by a sgt
> + * for pre-calculation of # of pages behind a given sgt
> + */
> +static int num_pgs(struct sg_table *sgt)
> +{
> +	struct scatterlist *sgl;
> +	int len, i;
> +	/* at least one page */
> +	int n_pgs = 1;
> +
> +	sgl = sgt->sgl;
> +
> +	len = sgl->length - PAGE_SIZE + sgl->offset;
> +
> +	/* round-up */
> +	n_pgs += ((len + PAGE_SIZE - 1)/PAGE_SIZE);
> +
> +	for (i = 1; i < sgt->nents; i++) {
> +		sgl = sg_next(sgl);
> +
> +		/* round-up */
> +		n_pgs += ((sgl->length + PAGE_SIZE - 1) /
> +			  PAGE_SIZE); /* round-up */
> +	}
> +
> +	return n_pgs;
> +}
> +
> +/* extract pages referenced by sgt */
> +static struct page **extr_pgs(struct sg_table *sgt, int *nents, int *last_len)

Nack, this doesn't work on dma-buf. And it'll blow up at runtime when you
enable the very recently merged CONFIG_DMABUF_DEBUG (would be good to test
with that, just to make sure).
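
For reference, a minimal sketch of the access pattern an importer is
actually allowed (assuming current mainline dma-buf helpers; the function
name is made up and this is not part of the patch): only the DMA addresses
of the mapped sg_table may be walked, never the struct pages behind it,
which is what extr_pgs() below tries to do.

#include <linux/dma-buf.h>
#include <linux/scatterlist.h>

static int importer_walk_dma(struct dma_buf_attachment *attach)
{
	struct sg_table *sgt;
	struct scatterlist *sg;
	int i;

	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	if (IS_ERR(sgt))
		return PTR_ERR(sgt);

	/* DMA addresses and lengths are fair game for an importer ... */
	for_each_sgtable_dma_sg(sgt, sg, i)
		pr_info("chunk %d: %pad + %u\n", i,
			&sg_dma_address(sg), sg_dma_len(sg));

	/* ... sg_page() is not; DMABUF_DEBUG mangles the page pointers
	 * in the returned table so importers that peek fail loudly.
	 */
	dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);

	return 0;
}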

Aside from this, for virtio/kvm use-cases we've already merged the udmabuf
driver. Does this not work for your usecase?

Adding Gerd as the subject expert for this area.

Thanks, Daniel

> +{
> +	struct scatterlist *sgl;
> +	struct page **pages;
> +	struct page **temp_pgs;
> +	int i, j;
> +	int len;
> +
> +	*nents = num_pgs(sgt);
> +	pages =	kvmalloc_array(*nents, sizeof(struct page *), GFP_KERNEL);
> +	if (!pages)
> +		return NULL;
> +
> +	sgl = sgt->sgl;
> +
> +	temp_pgs = pages;
> +	*temp_pgs++ = sg_page(sgl);
> +	len = sgl->length - PAGE_SIZE + sgl->offset;
> +
> +	i = 1;
> +	while (len > 0) {
> +		*temp_pgs++ = nth_page(sg_page(sgl), i++);
> +		len -= PAGE_SIZE;
> +	}
> +
> +	for (i = 1; i < sgt->nents; i++) {
> +		sgl = sg_next(sgl);
> +		*temp_pgs++ = sg_page(sgl);
> +		len = sgl->length - PAGE_SIZE;
> +		j = 1;
> +
> +		while (len > 0) {
> +			*temp_pgs++ = nth_page(sg_page(sgl), j++);
> +			len -= PAGE_SIZE;
> +		}
> +	}
> +
> +	*last_len = len + PAGE_SIZE;
> +
> +	return pages;
> +}
> +
> +/* ioctl - exporting new vdmabuf
> + *
> + *	 int dmabuf_fd - File handle of original DMABUF
> + *	 virtio_vdmabuf_buf_id_t buf_id - returned vdmabuf ID
> + *	 int sz_priv - size of private data from userspace
> + *	 char *priv - buffer of user private data
> + *
> + */
> +static int export_ioctl(struct file *filp, void *data)
> +{
> +	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
> +	struct virtio_vdmabuf_export *attr = data;
> +	struct dma_buf *dmabuf;
> +	struct dma_buf_attachment *attach;
> +	struct sg_table *sgt;
> +	struct virtio_vdmabuf_buf *exp;
> +	struct page **pages;
> +	int nents, last_len;
> +	virtio_vdmabuf_buf_id_t buf_id;
> +	int ret = 0;
> +
> +	if (vdmabuf->vmid <= 0)
> +		return -EINVAL;
> +
> +	dmabuf = dma_buf_get(attr->fd);
> +	if (IS_ERR(dmabuf))
> +		return PTR_ERR(dmabuf);
> +
> +	mutex_lock(&drv_info->g_mutex);
> +
> +	buf_id = get_buf_id(vdmabuf);
> +
> +	attach = dma_buf_attach(dmabuf, drv_info->dev);
> +	if (IS_ERR(attach)) {
> +		ret = PTR_ERR(attach);
> +		goto fail_attach;
> +	}
> +
> +	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
> +	if (IS_ERR(sgt)) {
> +		ret = PTR_ERR(sgt);
> +		goto fail_map_attachment;
> +	}
> +
> +	/* allocate a new exp */
> +	exp = kvcalloc(1, sizeof(*exp), GFP_KERNEL);
> +	if (!exp) {
> +		ret = -ENOMEM;
> +		goto fail_sgt_info_creation;
> +	}
> +
> +	/* possible truncation */
> +	if (attr->sz_priv > MAX_SIZE_PRIV_DATA)
> +		exp->sz_priv = MAX_SIZE_PRIV_DATA;
> +	else
> +		exp->sz_priv = attr->sz_priv;
> +
> +	/* creating buffer for private data */
> +	if (exp->sz_priv != 0) {
> +		exp->priv = kvcalloc(1, exp->sz_priv, GFP_KERNEL);
> +		if (!exp->priv) {
> +			ret = -ENOMEM;
> +			goto fail_priv_creation;
> +		}
> +	}
> +
> +	exp->buf_id = buf_id;
> +	exp->attach = attach;
> +	exp->sgt = sgt;
> +	exp->dma_buf = dmabuf;
> +	exp->valid = 1;
> +
> +	if (exp->sz_priv) {
> +		/* copy private data to sgt_info */
> +		ret = copy_from_user(exp->priv, attr->priv, exp->sz_priv);
> +		if (ret) {
> +			ret = -EINVAL;
> +			goto fail_exp;
> +		}
> +	}
> +
> +	pages = extr_pgs(sgt, &nents, &last_len);
> +	if (pages == NULL) {
> +		ret = -ENOMEM;
> +		goto fail_exp;
> +	}
> +
> +	exp->pages_info = virtio_vdmabuf_share_buf(pages, nents,
> +						   sgt->sgl->offset,
> +					 	   last_len);
> +	if (!exp->pages_info) {
> +		ret = -ENOMEM;
> +		goto fail_create_pages_info;
> +	}
> +
> +	attr->buf_id = exp->buf_id;
> +	ret = export_notify(exp, pages);
> +	if (ret < 0)
> +		goto fail_send_request;
> +
> +	/* now register it to the export list */
> +	virtio_vdmabuf_add_buf(drv_info, exp);
> +
> +	exp->filp = filp;
> +
> +	mutex_unlock(&drv_info->g_mutex);
> +
> +	return ret;
> +
> +/* Clean-up if error occurs */
> +fail_send_request:
> +	virtio_vdmabuf_free_buf(exp->pages_info);
> +
> +fail_create_pages_info:
> +	kvfree(pages);
> +
> +fail_exp:
> +	kvfree(exp->priv);
> +
> +fail_priv_creation:
> +	kvfree(exp);
> +
> +fail_sgt_info_creation:
> +	dma_buf_unmap_attachment(attach, sgt,
> +				 DMA_BIDIRECTIONAL);
> +
> +fail_map_attachment:
> +	dma_buf_detach(dmabuf, attach);
> +
> +fail_attach:
> +	dma_buf_put(dmabuf);
> +
> +	mutex_unlock(&drv_info->g_mutex);
> +
> +	return ret;
> +}
> +
> +static const struct virtio_vdmabuf_ioctl_desc virtio_vdmabuf_ioctls[] = {
> +	VIRTIO_VDMABUF_IOCTL_DEF(VIRTIO_VDMABUF_IOCTL_EXPORT, export_ioctl, 0),
> +};
> +
> +static long virtio_vdmabuf_ioctl(struct file *filp, unsigned int cmd,
> +		       		 unsigned long param)
> +{
> +	const struct virtio_vdmabuf_ioctl_desc *ioctl = NULL;
> +	unsigned int nr = _IOC_NR(cmd);
> +	int ret;
> +	virtio_vdmabuf_ioctl_t func;
> +	char *kdata;
> +
> +	if (nr >= ARRAY_SIZE(virtio_vdmabuf_ioctls)) {
> +		dev_err(drv_info->dev, "invalid ioctl\n");
> +		return -EINVAL;
> +	}
> +
> +	ioctl = &virtio_vdmabuf_ioctls[nr];
> +
> +	func = ioctl->func;
> +
> +	if (unlikely(!func)) {
> +		dev_err(drv_info->dev, "no function\n");
> +		return -EINVAL;
> +	}
> +
> +	kdata = kvmalloc(_IOC_SIZE(cmd), GFP_KERNEL);
> +	if (!kdata)
> +		return -ENOMEM;
> +
> +	if (copy_from_user(kdata, (void __user *)param,
> +			   _IOC_SIZE(cmd)) != 0) {
> +		dev_err(drv_info->dev,
> +			"failed to copy from user arguments\n");
> +		ret = -EFAULT;
> +		goto ioctl_error;
> +	}
> +
> +	ret = func(filp, kdata);
> +
> +	if (copy_to_user((void __user *)param, kdata,
> +			 _IOC_SIZE(cmd)) != 0) {
> +		dev_err(drv_info->dev,
> +			"failed to copy to user arguments\n");
> +		ret = -EFAULT;
> +		goto ioctl_error;
> +	}
> +
> +ioctl_error:
> +	kvfree(kdata);
> +	return ret;
> +}
> +
> +static unsigned int virtio_vdmabuf_event_poll(struct file *filp,
> +			    	    	      struct poll_table_struct *wait)
> +{
> +	struct virtio_vdmabuf *vdmabuf = filp->private_data;
> +
> +	poll_wait(filp, &vdmabuf->evq->e_wait, wait);
> +
> +	if (!list_empty(&vdmabuf->evq->e_list))
> +		return POLLIN | POLLRDNORM;
> +
> +	return 0;
> +}
> +
> +static ssize_t virtio_vdmabuf_event_read(struct file *filp, char __user *buf,
> +			       		 size_t cnt, loff_t *ofst)
> +{
> +	struct virtio_vdmabuf *vdmabuf = filp->private_data;
> +	int ret;
> +
> +	/* make sure user buffer can be written */
> +	if (!access_ok(buf, sizeof (*buf))) {
> +		dev_err(drv_info->dev, "user buffer can't be written.\n");
> +		return -EINVAL;
> +	}
> +
> +	ret = mutex_lock_interruptible(&vdmabuf->evq->e_readlock);
> +	if (ret)
> +		return ret;
> +
> +	for (;;) {
> +		struct virtio_vdmabuf_event *e = NULL;
> +
> +		spin_lock_irq(&vdmabuf->evq->e_lock);
> +		if (!list_empty(&vdmabuf->evq->e_list)) {
> +			e = list_first_entry(&vdmabuf->evq->e_list,
> +					     struct virtio_vdmabuf_event, link);
> +			list_del(&e->link);
> +		}
> +		spin_unlock_irq(&vdmabuf->evq->e_lock);
> +
> +		if (!e) {
> +			if (ret)
> +				break;
> +
> +			if (filp->f_flags & O_NONBLOCK) {
> +				ret = -EAGAIN;
> +				break;
> +			}
> +
> +			mutex_unlock(&vdmabuf->evq->e_readlock);
> +			ret = wait_event_interruptible(vdmabuf->evq->e_wait,
> +					!list_empty(&vdmabuf->evq->e_list));
> +
> +			if (ret == 0)
> +				ret = mutex_lock_interruptible(
> +						&vdmabuf->evq->e_readlock);
> +
> +			if (ret)
> +				return ret;
> +		} else {
> +			unsigned int len = (sizeof(e->e_data.hdr) +
> +					    e->e_data.hdr.size);
> +
> +			if (len > cnt - ret) {
> +put_back_event:
> +				spin_lock_irq(&vdmabuf->evq->e_lock);
> +				list_add(&e->link, &vdmabuf->evq->e_list);
> +				spin_unlock_irq(&vdmabuf->evq->e_lock);
> +				break;
> +			}
> +
> +			if (copy_to_user(buf + ret, &e->e_data.hdr,
> +					 sizeof(e->e_data.hdr))) {
> +				if (ret == 0)
> +					ret = -EFAULT;
> +
> +				goto put_back_event;
> +			}
> +
> +			ret += sizeof(e->e_data.hdr);
> +
> +			if (copy_to_user(buf + ret, e->e_data.data,
> +					 e->e_data.hdr.size)) {
> +				/* error while copying void *data */
> +
> +				struct virtio_vdmabuf_e_hdr dummy_hdr = {0};
> +
> +				ret -= sizeof(e->e_data.hdr);
> +
> +				/* nullifying hdr of the event in user buffer */
> +				if (copy_to_user(buf + ret, &dummy_hdr,
> +						 sizeof(dummy_hdr)))
> +					dev_err(drv_info->dev,
> +					   "fail to nullify invalid hdr\n");
> +
> +				ret = -EFAULT;
> +
> +				goto put_back_event;
> +			}
> +
> +			ret += e->e_data.hdr.size;
> +			vdmabuf->evq->pending--;
> +			kvfree(e);
> +		}
> +	}
> +
> +	mutex_unlock(&vdmabuf->evq->e_readlock);
> +
> +	return ret;
> +}
> +
> +static const struct file_operations virtio_vdmabuf_fops = {
> +	.owner = THIS_MODULE,
> +	.open = virtio_vdmabuf_open,
> +	.release = virtio_vdmabuf_release,
> +	.read = virtio_vdmabuf_event_read,
> +	.poll = virtio_vdmabuf_event_poll,
> +	.unlocked_ioctl = virtio_vdmabuf_ioctl,
> +};
> +
> +static struct miscdevice virtio_vdmabuf_miscdev = {
> +	.minor = MISC_DYNAMIC_MINOR,
> +	.name = "virtio-vdmabuf",
> +	.fops = &virtio_vdmabuf_fops,
> +};
> +
> +static int virtio_vdmabuf_probe(struct virtio_device *vdev)
> +{
> +	vq_callback_t *cbs[] = {
> +		virtio_vdmabuf_recv_cb,
> +		virtio_vdmabuf_send_cb,
> +	};
> +	static const char *const names[] = {
> +		"recv",
> +		"send",
> +	};
> +	struct virtio_vdmabuf *vdmabuf;
> +	int ret = 0;
> +
> +	if (!drv_info)
> +		return -EINVAL;
> +
> +	vdmabuf = drv_info->priv;
> +
> +	if (!vdmabuf)
> +		return -EINVAL;
> +
> +	vdmabuf->vdev = vdev;
> +	vdev->priv = vdmabuf;
> +
> +	/* initialize spinlock for synchronizing virtqueue accesses */
> +	spin_lock_init(&vdmabuf->vq_lock);
> +
> +	ret = virtio_find_vqs(vdmabuf->vdev, VDMABUF_VQ_MAX, vdmabuf->vqs,
> +			      cbs, names, NULL);
> +	if (ret) {
> +		dev_err(drv_info->dev, "Cannot find any vqs\n");
> +		return ret;
> +	}
> +
> +	INIT_LIST_HEAD(&vdmabuf->msg_list);
> +	INIT_WORK(&vdmabuf->recv_work, virtio_vdmabuf_recv_work);
> +	INIT_WORK(&vdmabuf->send_work, virtio_vdmabuf_send_work);
> +	INIT_WORK(&vdmabuf->send_msg_work, virtio_vdmabuf_send_msg_work);
> +
> +	return ret;
> +}
> +
> +static void virtio_vdmabuf_remove(struct virtio_device *vdev)
> +{
> +	struct virtio_vdmabuf *vdmabuf;
> +
> +	if (!drv_info)
> +		return;
> +
> +	vdmabuf = drv_info->priv;
> +	flush_work(&vdmabuf->recv_work);
> +	flush_work(&vdmabuf->send_work);
> +	flush_work(&vdmabuf->send_msg_work);
> +
> +	vdev->config->reset(vdev);
> +	vdev->config->del_vqs(vdev);
> +}
> +
> +static struct virtio_device_id id_table[] = {
> +	{ VIRTIO_ID_VDMABUF, VIRTIO_DEV_ANY_ID },
> +	{ 0 },
> +};
> +
> +static struct virtio_driver virtio_vdmabuf_vdev_drv = {
> +	.driver.name =  KBUILD_MODNAME,
> +	.driver.owner = THIS_MODULE,
> +	.id_table =     id_table,
> +	.probe =        virtio_vdmabuf_probe,
> +	.remove =       virtio_vdmabuf_remove,
> +};
> +
> +static int __init virtio_vdmabuf_init(void)
> +{
> +	struct virtio_vdmabuf *vdmabuf;
> +	int ret = 0;
> +
> +	drv_info = NULL;
> +
> +	ret = misc_register(&virtio_vdmabuf_miscdev);
> +	if (ret) {
> +		pr_err("virtio-vdmabuf misc driver can't be registered\n");
> +		return ret;
> +	}
> +
> +	ret = dma_set_mask_and_coherent(virtio_vdmabuf_miscdev.this_device,
> +					DMA_BIT_MASK(64));
> +	if (ret < 0) {
> +		misc_deregister(&virtio_vdmabuf_miscdev);
> +		return -EINVAL;
> +	}
> +
> +	drv_info = kvcalloc(1, sizeof(*drv_info), GFP_KERNEL);
> +	if (!drv_info) {
> +		misc_deregister(&virtio_vdmabuf_miscdev);
> +		return -ENOMEM;
> +	}
> +
> +	vdmabuf = kvcalloc(1, sizeof(*vdmabuf), GFP_KERNEL);
> +	if (!vdmabuf) {
> +		kvfree(drv_info);
> +		misc_deregister(&virtio_vdmabuf_miscdev);
> +		return -ENOMEM;
> +	}
> +
> +	vdmabuf->evq = kvcalloc(1, sizeof(*(vdmabuf->evq)), GFP_KERNEL);
> +	if (!vdmabuf->evq) {
> +		kvfree(drv_info);
> +		kvfree(vdmabuf);
> +		misc_deregister(&virtio_vdmabuf_miscdev);
> +		return -ENOMEM;
> +	}
> +
> +	drv_info->priv = (void *)vdmabuf;
> +	drv_info->dev = virtio_vdmabuf_miscdev.this_device;
> +
> +	mutex_init(&drv_info->g_mutex);
> +
> +	mutex_init(&vdmabuf->evq->e_readlock);
> +	spin_lock_init(&vdmabuf->evq->e_lock);
> +
> +	INIT_LIST_HEAD(&vdmabuf->evq->e_list);
> +	init_waitqueue_head(&vdmabuf->evq->e_wait);
> +	hash_init(drv_info->buf_list);
> +
> +	vdmabuf->evq->pending = 0;
> +	vdmabuf->wq = create_workqueue("virtio_vdmabuf_wq");
> +
> +	ret = register_virtio_driver(&virtio_vdmabuf_vdev_drv);
> +	if (ret) {
> +		dev_err(drv_info->dev, "vdmabuf driver can't be registered\n");
> +		misc_deregister(&virtio_vdmabuf_miscdev);
> +		kvfree(vdmabuf);
> +		kvfree(drv_info);
> +		return -EFAULT;
> +	}
> +
> +	return 0;
> +}
> +
> +static void __exit virtio_vdmabuf_deinit(void)
> +{
> +	struct virtio_vdmabuf *vdmabuf = drv_info->priv;
> +	struct virtio_vdmabuf_event *e, *et;
> +	unsigned long irqflags;
> +
> +	misc_deregister(&virtio_vdmabuf_miscdev);
> +	unregister_virtio_driver(&virtio_vdmabuf_vdev_drv);
> +
> +	if (vdmabuf->wq)
> +		destroy_workqueue(vdmabuf->wq);
> +
> +	spin_lock_irqsave(&vdmabuf->evq->e_lock, irqflags);
> +
> +	list_for_each_entry_safe(e, et, &vdmabuf->evq->e_list,
> +				 link) {
> +		list_del(&e->link);
> +		kvfree(e);
> +		vdmabuf->evq->pending--;
> +	}
> +
> +	spin_unlock_irqrestore(&vdmabuf->evq->e_lock, irqflags);
> +
> +	/* freeing all exported buffers */
> +	remove_all_bufs(vdmabuf);
> +
> +	kvfree(vdmabuf->evq);
> +	kvfree(vdmabuf);
> +	kvfree(drv_info);
> +}
> +
> +module_init(virtio_vdmabuf_init);
> +module_exit(virtio_vdmabuf_deinit);
> +
> +MODULE_DEVICE_TABLE(virtio, id_table);
> +MODULE_DESCRIPTION("Virtio Vdmabuf frontend driver");
> +MODULE_LICENSE("GPL and additional rights");
> diff --git a/include/linux/virtio_vdmabuf.h b/include/linux/virtio_vdmabuf.h
> new file mode 100644
> index 000000000000..9500bf4a54ac
> --- /dev/null
> +++ b/include/linux/virtio_vdmabuf.h
> @@ -0,0 +1,271 @@
> +/* SPDX-License-Identifier: (MIT OR GPL-2.0) */
> +
> +/*
> + * Copyright © 2021 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef _LINUX_VIRTIO_VDMABUF_H 
> +#define _LINUX_VIRTIO_VDMABUF_H 
> +
> +#include <uapi/linux/virtio_vdmabuf.h>
> +#include <linux/hashtable.h>
> +#include <linux/kvm_types.h>
> +
> +struct virtio_vdmabuf_shared_pages {
> +	/* cross-VM ref addr for the buffer */
> +	gpa_t ref;
> +
> +	/* page array */
> +	struct page **pages;
> +	gpa_t **l2refs;
> +	gpa_t *l3refs;
> +
> +	/* data offset in the first page
> +	 * and data length in the last page
> +	 */
> +	int first_ofst;
> +	int last_len;
> +
> +	/* number of shared pages */
> +	int nents;
> +};
> +
> +struct virtio_vdmabuf_buf {
> +	virtio_vdmabuf_buf_id_t buf_id;
> +
> +	struct dma_buf_attachment *attach;
> +	struct dma_buf *dma_buf;
> +	struct sg_table *sgt;
> +	struct virtio_vdmabuf_shared_pages *pages_info;
> +	int vmid;
> +
> +	/* validity of the buffer */
> +	bool valid;
> +
> +	/* set if the buffer is imported via import_ioctl */
> +	bool imported;
> +
> +	/* size of private */
> +	size_t sz_priv;
> +	/* private data associated with the exported buffer */
> +	void *priv;
> +
> +	struct file *filp;
> +	struct hlist_node node;
> +};
> +
> +struct virtio_vdmabuf_event {
> +	struct virtio_vdmabuf_e_data e_data;
> +	struct list_head link;
> +};
> +
> +struct virtio_vdmabuf_event_queue {
> +	wait_queue_head_t e_wait;
> +	struct list_head e_list;
> +
> +	spinlock_t e_lock;
> +	struct mutex e_readlock;
> +
> +	/* # of pending events */
> +	int pending;
> +};
> +
> +/* driver information */
> +struct virtio_vdmabuf_info {
> +	struct device *dev;
> +
> +	struct list_head head_vdmabuf_list;
> +	struct list_head kvm_instances;
> +
> +	DECLARE_HASHTABLE(buf_list, 7);
> +
> +	void *priv;
> +	struct mutex g_mutex;
> +	struct notifier_block kvm_notifier;
> +};
> +
> +/* IOCTL definitions
> + */
> +typedef int (*virtio_vdmabuf_ioctl_t)(struct file *filp, void *data);
> +
> +struct virtio_vdmabuf_ioctl_desc {
> +	unsigned int cmd;
> +	int flags;
> +	virtio_vdmabuf_ioctl_t func;
> +	const char *name;
> +};
> +
> +#define VIRTIO_VDMABUF_IOCTL_DEF(ioctl, _func, _flags)	\
> +	[_IOC_NR(ioctl)] = {			\
> +			.cmd = ioctl,		\
> +			.func = _func,		\
> +			.flags = _flags,	\
> +			.name = #ioctl		\
> +}
> +
> +#define VIRTIO_VDMABUF_VMID(buf_id) ((((buf_id).id) >> 32) & 0xFFFFFFFF)
> +
> +/* Messages between Host and Guest */
> +
> +/* List of commands from Guest to Host:
> + *
> + * ------------------------------------------------------------------
> + * A. NEED_VMID
> + *
> + *  guest asks the host to provide its vmid
> + *
> + * req:
> + *
> + * cmd: VIRTIO_VDMABUF_NEED_VMID
> + *
> + * ack:
> + *
> + * cmd: same as req
> + * op[0] : vmid of guest
> + *
> + * ------------------------------------------------------------------
> + * B. EXPORT
> + *
> + *  export dmabuf to host
> + *
> + * req:
> + *
> + * cmd: VIRTIO_VDMABUF_CMD_EXPORT
> + * op0~op3 : HDMABUF ID
> + * op4 : number of pages to be shared
> + * op5 : offset of data in the first page
> + * op6 : length of data in the last page
> + * op7 : upper 32 bit of top-level ref of shared buf
> + * op8 : lower 32 bit of top-level ref of shared buf
> + * op9 : size of private data
> + * op10 ~ op64: User private data associated with the buffer
> + *	        (e.g. graphic buffer's meta info)
> + *
> + * ------------------------------------------------------------------
> + *
> + * List of commands from Host to Guest
> + *
> + * ------------------------------------------------------------------
> + * A. RELEASE
> + *
> + *  notifying guest that the shared buffer is released by an importer
> + *
> + * req:
> + *
> + * cmd: VIRTIO_VDMABUF_CMD_DMABUF_REL
> + * op0~op3 : VDMABUF ID
> + *
> + * ------------------------------------------------------------------
> + */
> +
> +/* msg structures */
> +struct virtio_vdmabuf_msg {
> +	struct list_head list;
> +	unsigned int cmd;
> +	unsigned int op[64];
> +};
> +
> +enum {
> +	VDMABUF_VQ_RECV = 0,
> +	VDMABUF_VQ_SEND = 1,
> +	VDMABUF_VQ_MAX  = 2,
> +};
> +
> +enum virtio_vdmabuf_cmd {
> +	VIRTIO_VDMABUF_CMD_NEED_VMID,
> +	VIRTIO_VDMABUF_CMD_EXPORT = 0x10,
> +	VIRTIO_VDMABUF_CMD_DMABUF_REL
> +};
> +
> +enum virtio_vdmabuf_ops {
> +	VIRTIO_VDMABUF_HDMABUF_ID_ID = 0,
> +	VIRTIO_VDMABUF_HDMABUF_ID_RNG_KEY0,
> +	VIRTIO_VDMABUF_HDMABUF_ID_RNG_KEY1,
> +	VIRTIO_VDMABUF_NUM_PAGES_SHARED = 4,
> +	VIRTIO_VDMABUF_FIRST_PAGE_DATA_OFFSET,
> +	VIRTIO_VDMABUF_LAST_PAGE_DATA_LENGTH,
> +	VIRTIO_VDMABUF_REF_ADDR_UPPER_32BIT,
> +	VIRTIO_VDMABUF_REF_ADDR_LOWER_32BIT,
> +	VIRTIO_VDMABUF_PRIVATE_DATA_SIZE,
> +	VIRTIO_VDMABUF_PRIVATE_DATA_START
> +};
> +
> +/* adding exported/imported vdmabuf info to hash */
> +static inline int
> +virtio_vdmabuf_add_buf(struct virtio_vdmabuf_info *info,
> +                       struct virtio_vdmabuf_buf *new)
> +{
> +	hash_add(info->buf_list, &new->node, new->buf_id.id);
> +	return 0;
> +}
> +
> +/* comparing two vdmabuf IDs */
> +static inline bool
> +is_same_buf(virtio_vdmabuf_buf_id_t a,
> +            virtio_vdmabuf_buf_id_t b)
> +{
> +	int i;
> +
> +	if (a.id != b.id)
> +		return false;
> +
> +	/* compare keys */
> +	for (i = 0; i < 2; i++) {
> +		if (a.rng_key[i] != b.rng_key[i])
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
> +/* find buf for given vdmabuf ID */
> +static inline struct virtio_vdmabuf_buf
> +*virtio_vdmabuf_find_buf(struct virtio_vdmabuf_info *info,
> +			 virtio_vdmabuf_buf_id_t *buf_id)
> +{
> +	struct virtio_vdmabuf_buf *found;
> +
> +	hash_for_each_possible(info->buf_list, found, node, buf_id->id)
> +		if (is_same_buf(found->buf_id, *buf_id))
> +			return found;
> +
> +	return NULL;
> +}
> +
> +/* delete buf from hash */
> +static inline int
> +virtio_vdmabuf_del_buf(struct virtio_vdmabuf_info *info,
> +                       virtio_vdmabuf_buf_id_t *buf_id)
> +{
> +	struct virtio_vdmabuf_buf *found;
> +
> +	found = virtio_vdmabuf_find_buf(info, buf_id);
> +	if (!found)
> +		return -ENOENT;
> +
> +	hash_del(&found->node);
> +
> +	return 0;
> +}
> +
> +#endif
> diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
> index bc1c0621f5ed..39c94637ddee 100644
> --- a/include/uapi/linux/virtio_ids.h
> +++ b/include/uapi/linux/virtio_ids.h
> @@ -54,5 +54,6 @@
>  #define VIRTIO_ID_FS			26 /* virtio filesystem */
>  #define VIRTIO_ID_PMEM			27 /* virtio pmem */
>  #define VIRTIO_ID_MAC80211_HWSIM	29 /* virtio mac80211-hwsim */
> +#define VIRTIO_ID_VDMABUF          	40 /* virtio vdmabuf */
>  
>  #endif /* _LINUX_VIRTIO_IDS_H */
> diff --git a/include/uapi/linux/virtio_vdmabuf.h b/include/uapi/linux/virtio_vdmabuf.h
> new file mode 100644
> index 000000000000..7bddaa04ddd6
> --- /dev/null
> +++ b/include/uapi/linux/virtio_vdmabuf.h
> @@ -0,0 +1,99 @@
> +// SPDX-License-Identifier: (MIT OR GPL-2.0)
> +
> +/*
> + * Copyright © 2021 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef _UAPI_LINUX_VIRTIO_VDMABUF_H
> +#define _UAPI_LINUX_VIRTIO_VDMABUF_H
> +
> +#define MAX_SIZE_PRIV_DATA 192
> +
> +typedef struct {
> +	__u64 id;
> +	/* 8B long Random number */
> +	int rng_key[2];
> +} virtio_vdmabuf_buf_id_t;
> +
> +struct virtio_vdmabuf_e_hdr {
> +	/* buf_id of new buf */
> +	virtio_vdmabuf_buf_id_t buf_id;
> +	/* size of private data */
> +	int size;
> +};
> +
> +struct virtio_vdmabuf_e_data {
> +	struct virtio_vdmabuf_e_hdr hdr;
> +	/* ptr to private data */
> +	void __user *data;
> +};
> +
> +#define VIRTIO_VDMABUF_IOCTL_IMPORT \
> +_IOC(_IOC_NONE, 'G', 2, sizeof(struct virtio_vdmabuf_import))
> +#define VIRTIO_VDMABUF_IOCTL_RELEASE \
> +_IOC(_IOC_NONE, 'G', 3, sizeof(struct virtio_vdmabuf_import))
> +struct virtio_vdmabuf_import {
> +	/* IN parameters */
> +	/* vdmabuf id of buf to be imported */
> +	virtio_vdmabuf_buf_id_t buf_id;
> +	/* flags */
> +	int flags;
> +	/* OUT parameters */
> +	/* exported dma buf fd */
> +	int fd;
> +};
> +
> +#define VIRTIO_VDMABUF_IOCTL_EXPORT \
> +_IOC(_IOC_NONE, 'G', 4, sizeof(struct virtio_vdmabuf_export))
> +struct virtio_vdmabuf_export {
> +	/* IN parameters */
> +	/* DMA buf fd to be exported */
> +	int fd;
> +	/* exported dma buf id */
> +	virtio_vdmabuf_buf_id_t buf_id;
> +	int sz_priv;
> +	char *priv;
> +};
> +
> +#define VIRTIO_VDMABUF_IOCTL_QUERY \
> +_IOC(_IOC_NONE, 'G', 5, sizeof(struct virtio_vdmabuf_query))
> +struct virtio_vdmabuf_query {
> +	/* in parameters */
> +	/* id of buf to be queried */
> +	virtio_vdmabuf_buf_id_t buf_id;
> +	/* item to be queried */
> +	int item;
> +	/* OUT parameters */
> +	/* Value of queried item */
> +	unsigned long info;
> +};
> +
> +/* DMABUF query */
> +enum virtio_vdmabuf_query_cmd {
> +	VIRTIO_VDMABUF_QUERY_SIZE = 0x10,
> +	VIRTIO_VDMABUF_QUERY_BUSY,
> +	VIRTIO_VDMABUF_QUERY_PRIV_INFO_SIZE,
> +	VIRTIO_VDMABUF_QUERY_PRIV_INFO,
> +};
> +
> +#endif
> -- 
> 2.26.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC v3 2/3] virtio: Introduce Vdmabuf driver
  2021-02-05 16:03     ` Daniel Vetter
  (?)
@ 2021-02-08  7:57       ` Gerd Hoffmann
  -1 siblings, 0 replies; 57+ messages in thread
From: Gerd Hoffmann @ 2021-02-08  7:57 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Vivek Kasireddy, virtualization, dri-devel, daniel.vetter,
	daniel.vetter, dongwon.kim, sumit.semwal, christian.koenig,
	linux-media

  Hi,

> > +/* extract pages referenced by sgt */
> > +static struct page **extr_pgs(struct sg_table *sgt, int *nents, int *last_len)
> 
> Nack, this doesn't work on dma-buf. And it'll blow up at runtime when you
> enable the very recently merged CONFIG_DMABUF_DEBUG (would be good to test
> with that, just to make sure).

> Aside from this, for virtio/kvm use-cases we've already merged the udmabuf
> driver. Does this not work for your usecase?

udmabuf can be used on the host side to make a collection of guest pages
available as host dmabuf.  It's part of the puzzle, but not a complete
solution.
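
A rough sketch of that host-side piece, assuming the guest RAM region is
backed by a memfd (as with qemu's memory-backend-memfd) that has
F_SEAL_SHRINK set, as udmabuf requires; the helper name is made up:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/udmabuf.h>

/* offset/size are page-aligned and describe the buffer inside guest RAM */
int guest_pages_to_host_dmabuf(int guest_ram_memfd, __u64 offset, __u64 size)
{
	struct udmabuf_create create = {
		.memfd  = guest_ram_memfd,
		.flags  = UDMABUF_FLAGS_CLONE,
		.offset = offset,
		.size   = size,
	};
	int devfd = open("/dev/udmabuf", O_RDWR);

	if (devfd < 0)
		return -1;

	/* on success: a dma-buf fd the host display stack can import */
	return ioctl(devfd, UDMABUF_CREATE, &create);
}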

As I understand it the intended workflow is this:

  (1) guest gpu driver exports some object as dma-buf
  (2) dma-buf is imported into this new driver.
  (3) driver sends the pages to the host.
  (4) hypervisor uses udmabuf to create a host dma-buf.
  (5) host dma-buf is passed on.

And step (3) is the problematic one as this will not
work in case the dma-buf doesn't live in guest ram but
in -- for example -- gpu device memory.

Reversing the driver roles in the guest (virtio driver
allocates pages and exports the dma-buf to the guest
gpu driver) should work fine.

Which btw is something you can do today with virtio-gpu.
Maybe it makes sense to have the option to run virtio-gpu
in render-only mode for that use case.
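
Roughly, that reversed flow on the guest side could be sketched with a
dumb buffer plus PRIME via libdrm; the device path, buffer size and helper
name below are made-up placeholders:

#include <fcntl.h>
#include <xf86drm.h>
#include <drm_mode.h>		/* via libdrm's include path */

int reversed_flow(int passthrough_gpu_fd, uint32_t *imported_handle)
{
	/* virtio-gpu owns the (guest RAM) backing pages */
	int virtio_fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);
	struct drm_mode_create_dumb create = {
		.width = 1920, .height = 1080, .bpp = 32,
	};
	int dmabuf_fd;

	if (virtio_fd < 0 ||
	    drmIoctl(virtio_fd, DRM_IOCTL_MODE_CREATE_DUMB, &create))
		return -1;

	/* export the virtio-gpu buffer as a dma-buf ... */
	if (drmPrimeHandleToFD(virtio_fd, create.handle, DRM_CLOEXEC, &dmabuf_fd))
		return -1;

	/* ... and import it into the passthrough GPU, which then renders
	 * into memory the host can always reach.
	 */
	return drmPrimeFDToHandle(passthrough_gpu_fd, dmabuf_fd, imported_handle);
}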

take care,
  Gerd


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC v3 2/3] virtio: Introduce Vdmabuf driver
  2021-02-08  7:57       ` Gerd Hoffmann
  (?)
@ 2021-02-08  9:38         ` Daniel Vetter
  -1 siblings, 0 replies; 57+ messages in thread
From: Daniel Vetter @ 2021-02-08  9:38 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Daniel Vetter, Vivek Kasireddy, virtualization, dri-devel,
	daniel.vetter, daniel.vetter, dongwon.kim, sumit.semwal,
	christian.koenig, linux-media

On Mon, Feb 08, 2021 at 08:57:48AM +0100, Gerd Hoffmann wrote:
>   Hi,
> 
> > > +/* extract pages referenced by sgt */
> > > +static struct page **extr_pgs(struct sg_table *sgt, int *nents, int *last_len)
> > 
> > Nack, this doesn't work on dma-buf. And it'll blow up at runtime when you
> > enable the very recently merged CONFIG_DMABUF_DEBUG (would be good to test
> > with that, just to make sure).
> 
> > Aside from this, for virtio/kvm use-cases we've already merged the udmabuf
> > driver. Does this not work for your usecase?
> 
> udmabuf can be used on the host side to make a collection of guest pages
> available as host dmabuf.  It's part of the puzzle, but not a complete
> solution.
> 
> As I understand it the intended workflow is this:
> 
>   (1) guest gpu driver exports some object as dma-buf
>   (2) dma-buf is imported into this new driver.
>   (3) driver sends the pages to the host.
>   (4) hypervisor uses udmabuf to create a host dma-buf.
>   (5) host dma-buf is passed on.
> 
> And step (3) is the problematic one as this will not
> work in case the dma-buf doesn't live in guest ram but
> in -- for example -- gpu device memory.

Yup, vram or similar special ram is the reason why an importer can't look
at the pages behind a dma-buf sg table.

> Reversing the driver roles in the guest (virtio driver
> allocates pages and exports the dma-buf to the guest
> gpu driver) should work fine.

Yup, this needs to flow the other way round than in these patches.

> Which btw is something you can do today with virtio-gpu.
> Maybe it makes sense to have the option to run virtio-gpu
> in render-only mode for that use case.

Yeah that sounds like a useful addition.

Also, the same flow should work for real gpus passed through as pci
devices. What we need is some way to surface the dma-buf on the guest
side, which I think doesn't exist yet stand-alone. But this role could be
fulfilled by virtio-gpu in render-only mode I think. And (assuming I've
understood the recent discussions around virtio dma-buf sharing using
virtio ids) this would give you some neat zero-copy tricks for free if you
share multiple devices.
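
(As a sketch of that identifier-based sharing from a guest driver's point
of view, assuming the virtio_dma_buf helpers behind
CONFIG_VIRTIO_DMA_SHARED_BUFFER; the wrapper name is made up:)

#include <linux/uuid.h>
#include <linux/virtio_dma_buf.h>

/* forward the exporter-assigned UUID instead of raw PFNs/page lists */
static int share_buf_by_uuid(struct dma_buf *buf, uuid_t *uuid)
{
	if (!is_virtio_dma_buf(buf))
		return -EINVAL;

	return virtio_dma_buf_get_uuid(buf, uuid);
}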

Also if you really want seamless buffer sharing between devices that are
passed to the guest and devices on the host side (like displays I guess?
or maybe video encode if this is for cloud gaming?), then using virtio-gpu
in render mode should also allow you to pass the dma_fence back&forth.
Which we'll need too, not just the dma-buf.

So at a first guess I'd say "render-only virtio-gpu mode" sounds like
something rather useful. But I might be totally off here.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [RFC v3 2/3] virtio: Introduce Vdmabuf driver
  2021-02-08  9:38         ` Daniel Vetter
  (?)
@ 2021-02-09  0:25           ` Kasireddy, Vivek
  -1 siblings, 0 replies; 57+ messages in thread
From: Kasireddy, Vivek @ 2021-02-09  0:25 UTC (permalink / raw)
  To: Daniel Vetter, Gerd Hoffmann
  Cc: virtualization, dri-devel, Vetter, Daniel, daniel.vetter, Kim,
	Dongwon, sumit.semwal, christian.koenig, linux-media

Hi Gerd, Daniel,

> -----Original Message-----
> From: Daniel Vetter <daniel@ffwll.ch>
> Sent: Monday, February 08, 2021 1:39 AM
> To: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Daniel Vetter <daniel@ffwll.ch>; Kasireddy, Vivek <vivek.kasireddy@intel.com>;
> virtualization@lists.linux-foundation.org; dri-devel@lists.freedesktop.org; Vetter, Daniel
> <daniel.vetter@intel.com>; daniel.vetter@ffwll.ch; Kim, Dongwon
> <dongwon.kim@intel.com>; sumit.semwal@linaro.org; christian.koenig@amd.com;
> linux-media@vger.kernel.org
> Subject: Re: [RFC v3 2/3] virtio: Introduce Vdmabuf driver
> 
> On Mon, Feb 08, 2021 at 08:57:48AM +0100, Gerd Hoffmann wrote:
> >   Hi,
> >
> > > > +/* extract pages referenced by sgt */ static struct page
> > > > +**extr_pgs(struct sg_table *sgt, int *nents, int *last_len)
> > >
> > > Nack, this doesn't work on dma-buf. And it'll blow up at runtime
> > > when you enable the very recently merged CONFIG_DMABUF_DEBUG (would
> > > be good to test with that, just to make sure).
[Kasireddy, Vivek] Although I have not tested it yet, it looks like this will
throw a wrench in our solution, as we use sg_next to iterate over all the struct page *
entries and get their PFNs. I wonder if there is any other clean way to get the PFNs
of all the pages associated with a dmabuf.
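
For reference, the pattern in question is roughly the one below (a sketch, not the
actual extr_pgs() code from the patch); it only works when every sg entry is backed
by struct pages, which is exactly what dma-buf does not guarantee:

/*
 * Sketch: walk a dma-buf's sg_table with sg_next() (via for_each_sg) and
 * turn each entry's page into a PFN.  Note that a single entry can cover
 * more than one page; the real code would also have to walk within entries.
 */
#include <linux/mm.h>
#include <linux/scatterlist.h>

static void sgt_to_pfns(struct sg_table *sgt, unsigned long *pfns)
{
	struct scatterlist *sg;
	int i;

	for_each_sg(sgt->sgl, sg, sgt->nents, i)
		pfns[i] = page_to_pfn(sg_page(sg));
}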

> >
> > > Aside from this, for virtio/kvm use-cases we've already merged the
> > > udmabuf driver. Does this not work for your usecase?
> >
> > udmabuf can be used on the host side to make a collection of guest
> > pages available as host dmabuf.  It's part of the puzzle, but not a
> > complete solution.
> >
> > As I understand it the intended workflow is this:
> >
> >   (1) guest gpu driver exports some object as dma-buf
> >   (2) dma-buf is imported into this new driver.
> >   (3) driver sends the pages to the host.
> >   (4) hypervisor uses udmabuf to create a host dma-buf.
> >   (5) host dma-buf is passed on.
> >
> > And step (3) is the problematic one as this will not work in case the
> > dma-buf doesn't live in guest ram but in -- for example -- gpu device
> > memory.
> 
> Yup, vram or similar special ram is the reason why an importer can't look at the pages
> behind a dma-buf sg table.
[Kasireddy, Vivek] To exclude such cases, would it not be OK to limit the scope 
of this solution (Vdmabuf) to make it clear that the dma-buf has to live in Guest RAM?
Or, are there any ways to pin the dma-buf pages in Guest RAM to make this
solution work?
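
If the scope were limited to Guest-RAM-backed buffers, the host-side half of the flow
is already covered by the existing udmabuf device; roughly like the userspace sketch
below of the UDMABUF_CREATE path (the memfd, offset and size here are placeholders,
not the actual Qemu code):

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <linux/udmabuf.h>

/* "memfd" is assumed to be the memfd backing guest RAM, sealed with
 * F_SEAL_SHRINK as udmabuf requires; offset/size must be page aligned. */
static int guest_pages_to_dmabuf(int memfd, __u64 offset, __u64 size)
{
	struct udmabuf_create create = {
		.memfd  = memfd,
		.flags  = UDMABUF_FLAGS_CLOEXEC,
		.offset = offset,
		.size   = size,
	};
	int devfd = open("/dev/udmabuf", O_RDWR);
	int buffd;

	if (devfd < 0)
		return -1;
	buffd = ioctl(devfd, UDMABUF_CREATE, &create);	/* new dma-buf fd */
	close(devfd);
	return buffd;
}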

> 
> > Reversing the driver roles in the guest (virtio driver allocates pages
> > and exports the dma-buf to the guest gpu driver) should work fine.
> 
> Yup, this needs to flow the other way round than in these patches.
[Kasireddy, Vivek] That might work but I am afraid it means making invasive changes
to the Guest GPU driver (i915 in our case) which we are trying to avoid to
keep this solution more generic.

> 
> > Which btw is something you can do today with virtio-gpu.
> > Maybe it makes sense to have the option to run virtio-gpu in
> > render-only mode for that use case.
> 
> Yeah that sounds like a useful addition.
> 
> Also, the same flow should work for real gpus passed through as pci devices. What we
> need is some way to surface the dma-buf on the guest side, which I think doesn't exist yet
> stand-alone. But this role could be fulfilled by virtio-gpu in render-only mode I think. And
> (assuming I've understood the recent discussions around virtio dma-buf sharing using
> virtio ids) this would give you some neat zero-copy tricks for free if you share multiple
> devices.
> 
> Also if you really want seamless buffer sharing between devices that are passed to the
> guest and devices on the host side (like displays I guess?
> or maybe video encode if this is for cloug gaming?), then using virtio-gpu in render mode
> should also allow you to pass the dma_fence back&forth.
> Which we'll need too, not just the dma-buf.
> 
> So at a first guess I'd say "render-only virtio-gpu mode" sounds like something rather
> useful. But I might be totally off here.
[Kasireddy, Vivek] Let me present more details about the use-case we are trying to solve;
Sorry for the crude graphic below:

Guest:                                     Host:
+-----------------+                        +-----------------+
|     Weston      |                        |     Qemu UI     |
|   (Headless)    |                        |                 |
+--------+--------+                        +--------^--------+
         |                                          |
         | (1. Export prime fd of                   | (3. Share UUID with Qemu)
         |     scanout buffer)                      | (4. Qemu calls Import using this UUID
         |                                          |     and gets a new Dmabuf fd that is
         v                                          |     used with EGL_LINUX_DMA_BUF_EXT)
+--------+--------+                        +--------+--------+
| Virtio-Vdmabuf  |----------------------->|  Vhost-Vdmabuf  |
|                 |  (2. Generate & share  |                 |
+-----------------+   UUID and PFNs for    +-----------------+
                      this buffer)
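
For step (4), the import on the Qemu side goes through the standard
EGL_EXT_image_dma_buf_import path; roughly like the sketch below (the width, height,
stride and format are placeholders here, not the values Qemu actually uses):

#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <drm_fourcc.h>	/* from libdrm, for DRM_FORMAT_ARGB8888 */

static EGLImageKHR import_dmabuf(EGLDisplay dpy, int fd,
                                 EGLint width, EGLint height, EGLint stride)
{
	PFNEGLCREATEIMAGEKHRPROC create_image =
		(PFNEGLCREATEIMAGEKHRPROC)eglGetProcAddress("eglCreateImageKHR");
	const EGLint attrs[] = {
		EGL_WIDTH,                     width,
		EGL_HEIGHT,                    height,
		EGL_LINUX_DRM_FOURCC_EXT,      DRM_FORMAT_ARGB8888,
		EGL_DMA_BUF_PLANE0_FD_EXT,     fd,
		EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
		EGL_DMA_BUF_PLANE0_PITCH_EXT,  stride,
		EGL_NONE
	};

	/* the resulting image can be bound to a GL texture with
	 * glEGLImageTargetTexture2DOES() and drawn by the UI */
	return create_image(dpy, EGL_NO_CONTEXT, EGL_LINUX_DMA_BUF_EXT,
			    (EGLClientBuffer)NULL, attrs);
}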

Here is a link to the Weston Headless backend that we tested:
https://github.com/vivekkreddy/Intel-Distribution-of-Weston/blob/vdmabuf/libweston/backend-headless/headless.c#L287
And, here is the link to the Qemu part:
https://lists.nongnu.org/archive/html/qemu-devel/2021-02/msg02976.html

IIUC, Virtio GPU is used to present a virtual GPU to the Guest and all the rendering 
commands are captured and forwarded to the Host GPU via Virtio. However, in our
use-case, we are passthrough'ing a real GPU (claimed by i915 running in the Guest)
that is Headless; hence, we need to use Weston with the headless backend. The 
rendering in the Guest is accomplished via the unmodified native stack that includes 
Iris and i915. Therefore, it would not be efficient to use virtio-gpu for rendering or
for any other purpose in the Guest given that we could use the native stack which
is definitely faster. 

We really want to make this solution GPU driver (Host and Guest) agnostic and for
that reason it would need to rely on Dmabuf interfaces/APIs and preferably avoid
making modifications to the native DRM drivers.

Thanks,
Vivek

> 
> Cheers, Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC v3 2/3] virtio: Introduce Vdmabuf driver
  2021-02-09  0:25           ` Kasireddy, Vivek
@ 2021-02-09  8:44             ` Gerd Hoffmann
  -1 siblings, 0 replies; 57+ messages in thread
From: Gerd Hoffmann @ 2021-02-09  8:44 UTC (permalink / raw)
  To: Kasireddy, Vivek
  Cc: Daniel Vetter, virtualization, dri-devel, Vetter, Daniel,
	daniel.vetter, Kim, Dongwon, sumit.semwal, christian.koenig,
	linux-media

  Hi,

> > > > Nack, this doesn't work on dma-buf. And it'll blow up at runtime
> > > > when you enable the very recently merged CONFIG_DMABUF_DEBUG (would
> > > > be good to test with that, just to make sure).
> [Kasireddy, Vivek] Although, I have not tested it yet but it looks like this will
> throw a wrench in our solution as we use sg_next to iterate over all the struct page *
> and get their PFNs. I wonder if there is any other clean way to get the PFNs of all 
> the pages associated with a dmabuf.

Well, there is no guarantee that dma-buf backing storage actually has
struct page ...

> [Kasireddy, Vivek] To exclude such cases, would it not be OK to limit the scope 
> of this solution (Vdmabuf) to make it clear that the dma-buf has to live in Guest RAM?
> Or, are there any ways to pin the dma-buf pages in Guest RAM to make this
> solution work?

At that point it becomes (i915) driver-specific.  If you go that route
it doesn't look that useful to use dma-bufs in the first place ...

> IIUC, Virtio GPU is used to present a virtual GPU to the Guest and all the rendering 
> commands are captured and forwarded to the Host GPU via Virtio.

You don't have to use the rendering pipeline.  You can let the i915 gpu
render into a dma-buf shared with virtio-gpu, then use virtio-gpu only for
buffer sharing with the host.

take care,
  Gerd


^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [RFC v3 2/3] virtio: Introduce Vdmabuf driver
  2021-02-09  8:44             ` Gerd Hoffmann
  (?)
@ 2021-02-10  4:47               ` Kasireddy, Vivek
  -1 siblings, 0 replies; 57+ messages in thread
From: Kasireddy, Vivek @ 2021-02-10  4:47 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Daniel Vetter, virtualization, dri-devel, Vetter, Daniel,
	daniel.vetter, Kim, Dongwon, sumit.semwal, christian.koenig,
	linux-media

Hi Gerd,

> -----Original Message-----
> From: Gerd Hoffmann <kraxel@redhat.com>
> Sent: Tuesday, February 09, 2021 12:45 AM
> To: Kasireddy, Vivek <vivek.kasireddy@intel.com>
> Cc: Daniel Vetter <daniel@ffwll.ch>; virtualization@lists.linux-foundation.org; dri-
> devel@lists.freedesktop.org; Vetter, Daniel <daniel.vetter@intel.com>;
> daniel.vetter@ffwll.ch; Kim, Dongwon <dongwon.kim@intel.com>;
> sumit.semwal@linaro.org; christian.koenig@amd.com; linux-media@vger.kernel.org
> Subject: Re: [RFC v3 2/3] virtio: Introduce Vdmabuf driver
> 
>   Hi,
> 
> > > > > Nack, this doesn't work on dma-buf. And it'll blow up at runtime
> > > > > when you enable the very recently merged CONFIG_DMABUF_DEBUG (would
> > > > > be good to test with that, just to make sure).
> > [Kasireddy, Vivek] Although, I have not tested it yet but it looks like this will
> > throw a wrench in our solution as we use sg_next to iterate over all the struct page *
> > and get their PFNs. I wonder if there is any other clean way to get the PFNs of all
> > the pages associated with a dmabuf.
> 
> Well, there is no guarantee that dma-buf backing storage actually has
> struct page ...
[Kasireddy, Vivek] What if I do mmap() on the fd followed by mlock() or mmap()
followed by get_user_pages()? If it still fails, would ioremapping the device memory
and poking at the backing storage be an option? Or, if I bind the passthrough'd GPU device
to vfio-pci and tap into the memory region associated with the device memory, can it be
made to work? 

And, I noticed that for PFNs that do not have a valid struct page associated with them, KVM
does a memremap() to access/map them. Is this an option?

> 
> > [Kasireddy, Vivek] To exclude such cases, would it not be OK to limit the scope
> > of this solution (Vdmabuf) to make it clear that the dma-buf has to live in Guest RAM?
> > Or, are there any ways to pin the dma-buf pages in Guest RAM to make this
> > solution work?
> 
> At that point it becomes (i915) driver-specific.  If you go that route
> it doesn't look that useful to use dma-bufs in the first place ...
[Kasireddy, Vivek] I prefer not to make this driver specific if possible.

> 
> > IIUC, Virtio GPU is used to present a virtual GPU to the Guest and all the rendering
> > commands are captured and forwarded to the Host GPU via Virtio.
> 
> You don't have to use the rendering pipeline.  You can let the i915 gpu
> render into a dma-buf shared with virtio-gpu, then use virtio-gpu only for
> buffer sharing with the host.
[Kasireddy, Vivek] Is this the most viable path forward? I am not sure how complex or 
feasible it would be but I'll look into it.
Also, not using the rendering capabilities of virtio-gpu and turning it into a sharing-only
device means there would be a giant mode switch with a lot of if() conditions sprinkled
across the code. Are you OK with that?

Thanks,
Vivek
> 
> take care,
>   Gerd


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC v3 2/3] virtio: Introduce Vdmabuf driver
  2021-02-10  4:47               ` Kasireddy, Vivek
  (?)
@ 2021-02-10  8:05                 ` Christian König
  -1 siblings, 0 replies; 57+ messages in thread
From: Christian König @ 2021-02-10  8:05 UTC (permalink / raw)
  To: Kasireddy, Vivek, Gerd Hoffmann
  Cc: Daniel Vetter, virtualization, dri-devel, Vetter, Daniel,
	daniel.vetter, Kim, Dongwon, sumit.semwal, linux-media

Hi Vivek,

On 10.02.21 at 05:47, Kasireddy, Vivek wrote:
> Hi Gerd,
>
>> -----Original Message-----
>> From: Gerd Hoffmann <kraxel@redhat.com>
>> Sent: Tuesday, February 09, 2021 12:45 AM
>> To: Kasireddy, Vivek <vivek.kasireddy@intel.com>
>> Cc: Daniel Vetter <daniel@ffwll.ch>; virtualization@lists.linux-foundation.org; dri-
>> devel@lists.freedesktop.org; Vetter, Daniel <daniel.vetter@intel.com>;
>> daniel.vetter@ffwll.ch; Kim, Dongwon <dongwon.kim@intel.com>;
>> sumit.semwal@linaro.org; christian.koenig@amd.com; linux-media@vger.kernel.org
>> Subject: Re: [RFC v3 2/3] virtio: Introduce Vdmabuf driver
>>
>>    Hi,
>>
>>>>>> Nack, this doesn't work on dma-buf. And it'll blow up at runtime
>>>>>> when you enable the very recently merged CONFIG_DMABUF_DEBUG (would
>>>>>> be good to test with that, just to make sure).
>>> [Kasireddy, Vivek] Although, I have not tested it yet but it looks like this will
>>> throw a wrench in our solution as we use sg_next to iterate over all the struct page *
>>> and get their PFNs. I wonder if there is any other clean way to get the PFNs of all
>>> the pages associated with a dmabuf.
>> Well, there is no guarantee that dma-buf backing storage actually has
>> struct page ...
> [Kasireddy, Vivek] What if I do mmap() on the fd followed by mlock() or mmap()
> followed by get_user_pages()? If it still fails, would ioremapping the device memory
> and poking at the backing storage be an option? Or, if I bind the passthrough'd GPU device
> to vfio-pci and tap into the memory region associated with the device memory, can it be
> made to work?

get_user_pages() is not allowed on mmaped DMA-bufs in the first place.

Daniel is currently adding code to make sure that this is never ever used.

> And, I noticed that for PFNs that do not have valid struct page associated with it, KVM
> does a memremap() to access/map them. Is this an option?

No. Even for system memory which has a valid struct page, touching it
while it is part of a DMA-buf is illegal, since the reference count and
mapping fields in struct page might be used for something different.

Keep in mind that struct page is a heavily overloaded structure for
different use cases. You can't just use it for a different use case than
the one the owner of the page intended.

Regards,
Christian.

>
>>> [Kasireddy, Vivek] To exclude such cases, would it not be OK to limit the scope
>>> of this solution (Vdmabuf) to make it clear that the dma-buf has to live in Guest RAM?
>>> Or, are there any ways to pin the dma-buf pages in Guest RAM to make this
>>> solution work?
>> At that point it becomes (i915) driver-specific.  If you go that route
>> it doesn't look that useful to use dma-bufs in the first place ...
> [Kasireddy, Vivek] I prefer not to make this driver specific if possible.
>
>>> IIUC, Virtio GPU is used to present a virtual GPU to the Guest and all the rendering
>>> commands are captured and forwarded to the Host GPU via Virtio.
>> You don't have to use the rendering pipeline.  You can let the i915 gpu
>> render into a dma-buf shared with virtio-gpu, then use virtio-gpu only for
>> buffer sharing with the host.
> [Kasireddy, Vivek] Is this the most viable path forward? I am not sure how complex or
> feasible it would be but I'll look into it.
> Also, not using the rendering capabilities of virtio-gpu and turning it into a sharing only
> device means there would be a giant mode switch with a lot of if() conditions sprinkled
> across. Are you OK with that?
>
> Thanks,
> Vivek
>> take care,
>>    Gerd


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC v3 2/3] virtio: Introduce Vdmabuf driver
  2021-02-10  4:47               ` Kasireddy, Vivek
  (?)
@ 2021-02-10  9:16                 ` Gerd Hoffmann
  -1 siblings, 0 replies; 57+ messages in thread
From: Gerd Hoffmann @ 2021-02-10  9:16 UTC (permalink / raw)
  To: Kasireddy, Vivek
  Cc: Daniel Vetter, virtualization, dri-devel, Vetter, Daniel,
	daniel.vetter, Kim, Dongwon, sumit.semwal, christian.koenig,
	linux-media

> > You don't have to use the rendering pipeline.  You can let the i915 gpu
> > render into a dma-buf shared with virtio-gpu, then use virtio-gpu only for
> > buffer sharing with the host.
> [Kasireddy, Vivek] Is this the most viable path forward? I am not sure how complex or 
> feasible it would be but I'll look into it.
> Also, not using the rendering capabilities of virtio-gpu and turning it into a sharing only
> device means there would be a giant mode switch with a lot of if() conditions sprinkled
> across. Are you OK with that?

Hmm, why a big mode switch?  You should be able to do that without
modifying the virtio-gpu guest driver.  On the host side qemu needs
some work to support the most recent virtio-gpu features like the
buffer uuids (assuming you use qemu userspace); right now those
are only supported by crosvm.

It might be useful to add support for display-less virtio-gpu, i.e.
"qemu -device virtio-gpu-pci,max_outputs=0".  Right now the linux
driver throws an error in case no output (crtc) is present.  It should
be fixable without too much effort though; effectively, the sanity
check would have to be moved from driver initialization to the commands
like SET_SCANOUT which manage the outputs.
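
The idea, sketched with hypothetical names (this is not the actual virtio-gpu driver
code): accept a device with zero outputs at probe time and only reject the commands
that actually need an output, such as SET_SCANOUT.

#include <linux/errno.h>

struct vgpu_dev {
	unsigned int num_scanouts;	/* 0 for a display-less device */
};

static int vgpu_init(struct vgpu_dev *vgdev)
{
	/* no "num_scanouts == 0 -> fail probe" check here any more */
	return 0;
}

static int vgpu_cmd_set_scanout(struct vgpu_dev *vgdev, unsigned int scanout)
{
	if (scanout >= vgdev->num_scanouts)
		return -EINVAL;	/* only output-related commands fail */
	/* ... program the scanout ... */
	return 0;
}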

take care,
  Gerd


^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [RFC v3 2/3] virtio: Introduce Vdmabuf driver
  2021-02-10  9:16                 ` Gerd Hoffmann
  (?)
@ 2021-02-12  8:15                   ` Kasireddy, Vivek
  -1 siblings, 0 replies; 57+ messages in thread
From: Kasireddy, Vivek @ 2021-02-12  8:15 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Daniel Vetter, virtualization, dri-devel, Vetter, Daniel,
	daniel.vetter, Kim, Dongwon, sumit.semwal, christian.koenig,
	linux-media

Hi Gerd,

> > > You don't have to use the rendering pipeline.  You can let the i915
> > > gpu render into a dma-buf shared with virtio-gpu, then use
> > > virtio-gpu only for buffer sharing with the host.
[Kasireddy, Vivek] Just to confirm my understanding of what you are suggesting: are
you saying that we need to either (a) have Weston allocate scanout buffers (GBM surface/BO)
using virtio-gpu and render into them using i915, or (b) have virtio-gpu allocate pages and
export a dma-buf, and have Weston create a GBM BO by calling gbm_bo_import(fd) and
render into that BO using i915?
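
Option (b) would look roughly like the sketch below (the render node path and the
width/height/stride/format values are placeholders, not what Weston actually uses):

#include <fcntl.h>
#include <gbm.h>

static struct gbm_bo *import_fd_as_bo(int dmabuf_fd)
{
	/* open the i915 render node and import the externally allocated
	 * dma-buf so it can be rendered into */
	int drm_fd = open("/dev/dri/renderD128", O_RDWR);
	struct gbm_device *gbm = gbm_create_device(drm_fd);
	struct gbm_import_fd_data data = {
		.fd     = dmabuf_fd,
		.width  = 1920,
		.height = 1080,
		.stride = 1920 * 4,
		.format = GBM_FORMAT_ARGB8888,
	};

	return gbm_bo_import(gbm, GBM_BO_IMPORT_FD, &data,
			     GBM_BO_USE_RENDERING);
}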

> Hmm, why a big mode switch?  You should be able to do that without modifying the
> virtio-gpu guest driver.  On the host side qemu needs some work to support the most
> recent virtio-gpu features like the buffer uuids (assuming you use qemu userspace), right
> now those are only supported by crosvm.
[Kasireddy, Vivek] We are only interested in Qemu UI at the moment but if we were to use
virtio-gpu, we are going to need to add one more vq and support for managing buffers, 
events, etc.

Thanks,
Vivek

> 
> It might be useful to add support for display-less virtio-gpu, i.e.
> "qemu -device virtio-gpu-pci,max_outputs=0".  Right now the linux driver throws an error
> in case no output (crtc) is present.  Should be fixable without too much effort though,
> effectively the sanity check would have to be moved from driver initialization to
> commands like SET_SCANOUT which manage the outputs.
> 
> take care,
>   Gerd


^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [RFC v3 2/3] virtio: Introduce Vdmabuf driver
  2021-02-10  8:05                 ` Christian König
  (?)
@ 2021-02-12  8:36                   ` Kasireddy, Vivek
  -1 siblings, 0 replies; 57+ messages in thread
From: Kasireddy, Vivek @ 2021-02-12  8:36 UTC (permalink / raw)
  To: Christian König, Gerd Hoffmann
  Cc: Daniel Vetter, virtualization, dri-devel, Vetter, Daniel,
	daniel.vetter, Kim, Dongwon, sumit.semwal, linux-media

Hi Christian,

> 
> Hi Vivek,
> 
> > [Kasireddy, Vivek] What if I do mmap() on the fd followed by mlock()
> > or mmap() followed by get_user_pages()? If it still fails, would
> > ioremapping the device memory and poking at the backing storage be an
> > option? Or, if I bind the passthrough'd GPU device to vfio-pci and tap
> > into the memory region associated with the device memory, can it be made to work?
> 
> get_user_pages() is not allowed on mmaped DMA-bufs in the first place.
> 
> Daniel is currently adding code to make sure that this is never ever used.
> 
> > And, I noticed that for PFNs that do not have valid struct page
> > associated with it, KVM does a memremap() to access/map them. Is this an option?
> 
> No, even for system memory which has a valid struct page touching it when it is part of a
> DMA-buf is illegal since the reference count and mapping fields in struct page might be
> used for something different.
> 
> Keep in mind that struct page is a heavily overloaded structure for different use cases. You
> can't just use it for a different use case than what the owner of the page has intended it.
[Kasireddy, Vivek] What is your recommended/acceptable way for doing what I am trying to 
do?

Thanks,
Vivek

> 
> Regards,
> Christian.
> 
> >
> >
> > Thanks,
> > Vivek
> >> take care,
> >>    Gerd


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC v3 2/3] virtio: Introduce Vdmabuf driver
  2021-02-12  8:36                   ` Kasireddy, Vivek
  (?)
@ 2021-02-12  8:47                     ` Christian König
  -1 siblings, 0 replies; 57+ messages in thread
From: Christian König @ 2021-02-12  8:47 UTC (permalink / raw)
  To: Kasireddy, Vivek, Gerd Hoffmann
  Cc: Daniel Vetter, virtualization, dri-devel, Vetter, Daniel,
	daniel.vetter, Kim, Dongwon, sumit.semwal, linux-media

Hi Vivek,

Am 12.02.21 um 09:36 schrieb Kasireddy, Vivek:
> Hi Christian,
>
>> Hi Vivek,
>>
>>> [Kasireddy, Vivek] What if I do mmap() on the fd followed by mlock()
>>> or mmap() followed by get_user_pages()? If it still fails, would
>>> ioremapping the device memory and poking at the backing storage be an
>>> option? Or, if I bind the passthrough'd GPU device to vfio-pci and tap
>>> into the memory region associated with the device memory, can it be made to work?
>> get_user_pages() is not allowed on mmaped DMA-bufs in the first place.
>>
>> Daniel is currently adding code to make sure that this is never ever used.
>>
>>> And, I noticed that for PFNs that do not have valid struct page
>>> associated with it, KVM does a memremap() to access/map them. Is this an option?
>> No, even for system memory which has a valid struct page touching it when it is part of a
>> DMA-buf is illegal since the reference count and mapping fields in struct page might be
>> used for something different.
>>
>> Keep in mind that struct page is a heavily overloaded structure for different use cases. You
>> can't just use it for a different use case than what the owner of the page has intended it.
> [Kasireddy, Vivek] What is your recommended/acceptable way for doing what I am trying to
> do?

I'm not an expert on virtualisation, but Gerd seems to have a couple of 
ideas of how to get this working.

In general I think it is pretty much impossible to export stuff from the 
guest to the host by DMA-buf.

This is because of the fundamental concept of DMA-buf that the exporter 
needs to setup mappings (both CPU page tables as well as stuff like 
IOMMU). When the guest exports something it would mean that you give the 
guest control over the IOMMU and/or host page tables. And that is not 
something you can do as far as I can see.

You can only export stuff the other way around so that the host is 
providing the memory and the guest is consuming it. If I understand it 
correctly that's exactly what Gerd is suggesting here.

Regards,
Christian.

>
> Thanks,
> Vivek
>
>> Regards,
>> Christian.
>>
>>>
>>> Thanks,
>>> Vivek
>>>> take care,
>>>>     Gerd


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC v3 2/3] virtio: Introduce Vdmabuf driver
  2021-02-12  8:47                     ` Christian König
  (?)
@ 2021-02-12 10:14                       ` Gerd Hoffmann
  -1 siblings, 0 replies; 57+ messages in thread
From: Gerd Hoffmann @ 2021-02-12 10:14 UTC (permalink / raw)
  To: Christian König
  Cc: Kasireddy, Vivek, Daniel Vetter, virtualization, dri-devel,
	Vetter, Daniel, daniel.vetter, Kim, Dongwon, sumit.semwal,
	linux-media

  Hi,

> This is because of the fundamental concept of DMA-buf that the exporter
> needs to setup mappings (both CPU page tables as well as stuff like IOMMU).
> When the guest exports something it would mean that you give the guest
> control over the IOMMU and/or host page tables. And that is not something
> you can do as far as I can see.

Correct.

> You can only export stuff the other way around so that the host is providing
> the memory and the guest is consuming it. If I understand it correctly
> that's exactly what Gerd is suggesting here.

It can also work the other way around (guest allocating and host
consuming).  That is just an implementation detail.  The /important/
thing is that the driver which exports the dma-buf (and thus handles the
mappings) must be aware of the virtualization so it can properly
coordinate things with the host side.

So vdmabuf allocating and exporting dma-bufs works.

But vdmabuf importing dma-bufs doesn't because you can't ask the
exporter to create *host* mappings as Christian outlined above.  Sure,
you can try to sidestep the exporter, fish the list of pages out of the
scatter list and run with that.  That will explode as soon as you meet a
dma-buf which is not backed by pages in the first place.  And even for
page-backed dma-bufs you can run into trouble, for example due to
mapping pages with the wrong caching attributes.  Alternatively you can
double-buffer and copy data from the imported dma-buf to some
host-shared memory, but I guess you don't want that for performance
reasons ...

take care,
  Gerd


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC v3 2/3] virtio: Introduce Vdmabuf driver
  2021-02-12  8:15                   ` Kasireddy, Vivek
  (?)
@ 2021-02-12 11:01                     ` Gerd Hoffmann
  -1 siblings, 0 replies; 57+ messages in thread
From: Gerd Hoffmann @ 2021-02-12 11:01 UTC (permalink / raw)
  To: Kasireddy, Vivek
  Cc: Daniel Vetter, virtualization, dri-devel, Vetter, Daniel,
	daniel.vetter, Kim, Dongwon, sumit.semwal, christian.koenig,
	linux-media

On Fri, Feb 12, 2021 at 08:15:12AM +0000, Kasireddy, Vivek wrote:
> Hi Gerd,
> 
> > > > You don't have to use the rendering pipeline.  You can let the i915
> > > > gpu render into a dma-buf shared with virtio-gpu, then use
> > > > virtio-gpu only for buffer sharing with the host.
> [Kasireddy, Vivek] Just to confirm my understanding of what you are suggesting, are
> you saying that we need to either have Weston allocate scanout buffers (GBM surface/BO)
> using virtio-gpu and render into them using i915; or have virtio-gpu allocate pages and 
> export a dma-buf and have Weston create a GBM BO by calling gbm_bo_import(fd) and
> render into the BO using i915?

Not sure what the difference between the former and the latter is.

> > Hmm, why a big mode switch?  You should be able to do that without modifying the
> > virtio-gpu guest driver.  On the host side qemu needs some work to support the most
> > recent virtio-gpu features like the buffer uuids (assuming you use qemu userspace), right
> > now those are only supported by crosvm.
> [Kasireddy, Vivek] We are only interested in Qemu UI at the moment but if we were to use
> virtio-gpu, we are going to need to add one more vq and support for managing buffers, 
> events, etc.

Should be easy and it should not need any virtio-gpu driver changes.

You can use virtio-gpu like a dumb scanout device.  Create a dumb
bo, create a framebuffer for the bo, map the framebuffer to the crtc.

Then export the bo, import into i915, use it as render target.  When
rendering is done flush (DRM_IOCTL_MODE_DIRTYFB).  Alternatively
allocate multiple bo's + framebuffers and pageflip.

Pretty standard workflow for cases where rendering and scanout are
handled by different devices.  As far as I know this is not uncommon in
the ARM world.
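
In code that flow is roughly the following (a sketch with plain libdrm
calls; connector/CRTC lookup, error handling and the i915 import side are
left out, and the function name and sizes are only illustrative):

  #include <fcntl.h>
  #include <xf86drm.h>
  #include <xf86drmMode.h>

  /* Sketch of the dumb-scanout flow: dumb bo -> framebuffer -> crtc,
   * then export the bo so a render device can use it as render target. */
  int dumb_scanout_sketch(int card_fd, drmModeModeInfo *mode,
                          uint32_t crtc_id, uint32_t conn_id)
  {
      /* 1. Create a dumb bo on the virtio-gpu device. */
      struct drm_mode_create_dumb create = {
          .width  = mode->hdisplay,
          .height = mode->vdisplay,
          .bpp    = 32,
      };
      drmIoctl(card_fd, DRM_IOCTL_MODE_CREATE_DUMB, &create);

      /* 2. Create a framebuffer for the bo and map it to the crtc. */
      uint32_t fb_id;
      drmModeAddFB(card_fd, create.width, create.height, 24, 32,
                   create.pitch, create.handle, &fb_id);
      drmModeSetCrtc(card_fd, crtc_id, fb_id, 0, 0, &conn_id, 1, mode);

      /* 3. Export the bo as a dma-buf; i915 would import this fd and
       *    use the buffer as its render target. */
      int dmabuf_fd;
      drmPrimeHandleToFD(card_fd, create.handle, DRM_CLOEXEC, &dmabuf_fd);

      /* ... rendering happens on the other device ... */

      /* 4. Flush the damage once rendering is done. */
      drmModeClip clip = { .x1 = 0, .y1 = 0,
                           .x2 = mode->hdisplay, .y2 = mode->vdisplay };
      drmModeDirtyFB(card_fd, fb_id, &clip, 1);

      return dmabuf_fd;
  }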

Right now this will involve a memcpy() for any display update because
qemu is a bit behind on supporting recent virtio-gpu features.

take care,
  Gerd


^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [RFC v3 2/3] virtio: Introduce Vdmabuf driver
  2021-02-12 11:01                     ` Gerd Hoffmann
  (?)
@ 2021-02-22  8:52                       ` Kasireddy, Vivek
  -1 siblings, 0 replies; 57+ messages in thread
From: Kasireddy, Vivek @ 2021-02-22  8:52 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Daniel Vetter, virtualization, dri-devel, Vetter, Daniel,
	daniel.vetter, Kim, Dongwon, sumit.semwal, christian.koenig,
	linux-media

Hi Gerd,

> 
> On Fri, Feb 12, 2021 at 08:15:12AM +0000, Kasireddy, Vivek wrote:
> > Hi Gerd,
> > [Kasireddy, Vivek] Just to confirm my understanding of what you are
> > suggesting, are you saying that we need to either have Weston allocate
> > scanout buffers (GBM surface/BO) using virtio-gpu and render into them
> > using i915; or have virtio-gpu allocate pages and export a dma-buf and
> > have Weston create a GBM BO by calling gbm_bo_import(fd) and render into the BO
> using i915?
> 
> Not sure what the difference between the former and the latter is.
[Kasireddy, Vivek] Oh, what I meant is whether you were suggesting that we
create a GBM device and create a GBM surface and BOs using this device, or
just create a raw/dumb GEM object and create a GBM BO by importing it. As
we just discovered, the former means we have to initialize virgl, which
complicates things, so we went with the latter.

> 
> > [Kasireddy, Vivek] We are only interested in Qemu UI at the moment but
> > if we were to use virtio-gpu, we are going to need to add one more vq
> > and support for managing buffers, events, etc.
> 
> Should be easy and it should not need any virtio-gpu driver changes.
[Kasireddy, Vivek] Vdmabuf v4, which implements your suggestion -- to have
Vdmabuf allocate pages -- is posted here:
https://lists.freedesktop.org/archives/dri-devel/2021-February/297841.html
and we tested it with Weston Headless and Qemu:
https://gitlab.freedesktop.org/Vivek/weston/-/blob/vdmabuf/libweston/backend-headless/headless.c#L522
https://lists.nongnu.org/archive/html/qemu-devel/2021-02/msg02976.html

Having said that, after discussing with Daniel Vetter, we are now switching our
focus to virtio-gpu to compare and contrast both solutions. 

> 
> You can use virtio-gpu like a dumb scanout device.  Create a dumb bo, create a
> framebuffer for the bo, map the framebuffer to the crtc.
> 
> Then export the bo, import into i915, use it as render target.  When rendering is done flush
> (DRM_IOCTL_MODE_DIRTYFB).  Alternatively allocate multiple bo's + framebuffers
> and pageflip.
[Kasireddy, Vivek] Since we are testing with Weston, we are looking at pageflips (4 color
buffers). And, this part so far seems to work where virtio-gpu is used for kms (max_outputs=1)
and Iris/i915 is used for rendering. We are currently gluing virtio-gpu and i915 in Weston, but
eventually the plan is to glue them (virgl/virtio-gpu and Iris) in Mesa if possible using KMSRO
(KMS render only) to avoid having to change Weston or X or other user-space components.
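
(A rough sketch of that pageflip path, assuming framebuffers created as in
the dumb-bo flow described earlier and with the event handling simplified;
illustrative only, not the actual Weston code:)

  #include <xf86drm.h>
  #include <xf86drmMode.h>

  #define NUM_BUFFERS 4   /* the four color buffers mentioned above */

  static void flip_done(int fd, unsigned int seq, unsigned int tv_sec,
                        unsigned int tv_usec, void *data)
  {
      *(int *)data = 1;   /* previous flip has completed */
  }

  static void flip_loop(int card_fd, uint32_t crtc_id,
                        uint32_t fb_ids[NUM_BUFFERS])
  {
      drmEventContext evctx = {
          .version = 2,
          .page_flip_handler = flip_done,
      };
      int done, next = 0;

      for (;;) {
          /* ... render into fb_ids[next] with i915/Iris here ... */

          done = 0;
          drmModePageFlip(card_fd, crtc_id, fb_ids[next],
                          DRM_MODE_PAGE_FLIP_EVENT, &done);

          /* Wait for the flip before touching the next buffer. */
          while (!done)
              drmHandleEvent(card_fd, &evctx);

          next = (next + 1) % NUM_BUFFERS;
      }
  }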

> 
> Pretty standard workflow for cases where rendering and scanout are handled by different
> devices.  As far I know not uncommon in the arm world.
> 
> Right now this will involve a memcpy() for any display update because qemu is a bit
> behind on supporting recent virtio-gpu features.
[Kasireddy, Vivek] IIUC, I think you are referring to creating the Pixman image in set_scanout.
What additional features need to be implemented or what is your recommendation in terms of
what needs to be done to turn the memcpy() into a dma-buf? Also, how should we synchronize
access to the guest fb/dmabuf so that the Guest and the Host do not access
the backing storage of the dmabuf at the same time?

Thanks,
Vivek

> 
> take care,
>   Gerd


^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [RFC v3 2/3] virtio: Introduce Vdmabuf driver
  2021-02-12 11:01                     ` Gerd Hoffmann
  (?)
@ 2021-03-15  2:27                       ` Zhang, Tina
  -1 siblings, 0 replies; 57+ messages in thread
From: Zhang, Tina @ 2021-03-15  2:27 UTC (permalink / raw)
  To: Gerd Hoffmann, Kasireddy, Vivek
  Cc: Kim, Dongwon, christian.koenig, daniel.vetter, dri-devel,
	virtualization, Vetter, Daniel, linux-media



> -----Original Message-----
> From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of Gerd
> Hoffmann
> Sent: Friday, February 12, 2021 7:02 PM
> To: Kasireddy, Vivek <vivek.kasireddy@intel.com>
> Cc: Kim, Dongwon <dongwon.kim@intel.com>; christian.koenig@amd.com;
> daniel.vetter@ffwll.ch; dri-devel@lists.freedesktop.org;
> virtualization@lists.linux-foundation.org; Vetter, Daniel
> <daniel.vetter@intel.com>; linux-media@vger.kernel.org
> Subject: Re: [RFC v3 2/3] virtio: Introduce Vdmabuf driver
> 
> On Fri, Feb 12, 2021 at 08:15:12AM +0000, Kasireddy, Vivek wrote:
> > Hi Gerd,
> >
> > > > > You don't have to use the rendering pipeline.  You can let the
> > > > > i915 gpu render into a dma-buf shared with virtio-gpu, then use
> > > > > virtio-gpu only for buffer sharing with the host.
> > [Kasireddy, Vivek] Just to confirm my understanding of what you are
> > suggesting, are you saying that we need to either have Weston allocate
> > scanout buffers (GBM surface/BO) using virtio-gpu and render into them
> > using i915; or have virtio-gpu allocate pages and export a dma-buf and
> > have Weston create a GBM BO by calling gbm_bo_import(fd) and render into
> the BO using i915?
> 
> Not sure what the difference between the former and the latter is.
> 
> > > Hmm, why a big mode switch?  You should be able to do that without
> > > modifying the virtio-gpu guest driver.  On the host side qemu needs
> > > some work to support the most recent virtio-gpu features like the
> > > buffer uuids (assuming you use qemu userspace), right now those are only
> supported by crosvm.
> > [Kasireddy, Vivek] We are only interested in Qemu UI at the moment but
> > if we were to use virtio-gpu, we are going to need to add one more vq
> > and support for managing buffers, events, etc.
> 
> Should be easy and it should not need any virtio-gpu driver changes.
> 
> You can use virtio-gpu like a dumb scanout device.  Create a dumb bo, create a
> framebuffer for the bo, map the framebuffer to the crtc.
> 
> Then export the bo, import into i915, use it as render target.  When rendering is
> done flush (DRM_IOCTL_MODE_DIRTYFB).  Alternatively allocate multiple bo's +
> framebuffers and pageflip.

Hi,

We've got an MR (https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9592) for this suggested implementation. Comments are welcome. Thanks.

BR,
Tina

> 
> Pretty standard workflow for cases where rendering and scanout are handled by
> different devices.  As far I know not uncommon in the arm world.
> 
> Right now this will involve a memcpy() for any display update because qemu is a
> bit behind on supporting recent virtio-gpu features.
> 
> take care,
>   Gerd
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2021-03-15  2:28 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-03  7:35 [RFC v3 0/3] Introduce Virtio based Dmabuf driver Vivek Kasireddy
2021-02-03  7:35 ` Vivek Kasireddy
2021-02-03  7:35 ` Vivek Kasireddy
2021-02-03  7:35 ` [RFC v3 1/3] kvm: Add a notifier for create and destroy VM events Vivek Kasireddy
2021-02-03  7:35   ` Vivek Kasireddy
2021-02-03  7:35   ` Vivek Kasireddy
2021-02-03  7:35 ` [RFC v3 2/3] virtio: Introduce Vdmabuf driver Vivek Kasireddy
2021-02-03  7:35   ` Vivek Kasireddy
2021-02-03  7:35   ` Vivek Kasireddy
2021-02-03  9:15   ` kernel test robot
2021-02-05 16:03   ` Daniel Vetter
2021-02-05 16:03     ` Daniel Vetter
2021-02-05 16:03     ` Daniel Vetter
2021-02-08  7:57     ` Gerd Hoffmann
2021-02-08  7:57       ` Gerd Hoffmann
2021-02-08  7:57       ` Gerd Hoffmann
2021-02-08  9:38       ` Daniel Vetter
2021-02-08  9:38         ` Daniel Vetter
2021-02-08  9:38         ` Daniel Vetter
2021-02-09  0:25         ` Kasireddy, Vivek
2021-02-09  0:25           ` Kasireddy, Vivek
2021-02-09  0:25           ` Kasireddy, Vivek
2021-02-09  8:44           ` Gerd Hoffmann
2021-02-09  8:44             ` Gerd Hoffmann
2021-02-10  4:47             ` Kasireddy, Vivek
2021-02-10  4:47               ` Kasireddy, Vivek
2021-02-10  4:47               ` Kasireddy, Vivek
2021-02-10  8:05               ` Christian König
2021-02-10  8:05                 ` Christian König
2021-02-10  8:05                 ` Christian König
2021-02-12  8:36                 ` Kasireddy, Vivek
2021-02-12  8:36                   ` Kasireddy, Vivek
2021-02-12  8:36                   ` Kasireddy, Vivek
2021-02-12  8:47                   ` Christian König
2021-02-12  8:47                     ` Christian König
2021-02-12  8:47                     ` Christian König
2021-02-12 10:14                     ` Gerd Hoffmann
2021-02-12 10:14                       ` Gerd Hoffmann
2021-02-12 10:14                       ` Gerd Hoffmann
2021-02-10  9:16               ` Gerd Hoffmann
2021-02-10  9:16                 ` Gerd Hoffmann
2021-02-10  9:16                 ` Gerd Hoffmann
2021-02-12  8:15                 ` Kasireddy, Vivek
2021-02-12  8:15                   ` Kasireddy, Vivek
2021-02-12  8:15                   ` Kasireddy, Vivek
2021-02-12 11:01                   ` Gerd Hoffmann
2021-02-12 11:01                     ` Gerd Hoffmann
2021-02-12 11:01                     ` Gerd Hoffmann
2021-02-22  8:52                     ` Kasireddy, Vivek
2021-02-22  8:52                       ` Kasireddy, Vivek
2021-02-22  8:52                       ` Kasireddy, Vivek
2021-03-15  2:27                     ` Zhang, Tina
2021-03-15  2:27                       ` Zhang, Tina
2021-03-15  2:27                       ` Zhang, Tina
2021-02-03  7:35 ` [RFC v3 3/3] vhost: Add Vdmabuf backend Vivek Kasireddy
2021-02-03  7:35   ` Vivek Kasireddy
2021-02-03  7:35   ` Vivek Kasireddy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.