Virtualization Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH V2 vhost next 00/10] VDPA support for Mellanox ConnectX devices
@ 2020-07-20  7:14 Eli Cohen
  2020-07-20  7:14 ` [PATCH V2 vhost next 01/10] vhost-vdpa: support batch updating Eli Cohen
                   ` (9 more replies)
  0 siblings, 10 replies; 18+ messages in thread
From: Eli Cohen @ 2020-07-20  7:14 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel
  Cc: shahafs, saeedm, parav, Eli Cohen

Hi Michael,
please note that this series depends on mlx5 core device driver patches
in mlx5-next branch in
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git.

git pull git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git mlx5-next 

They also depend Jason Wang's patches submitted a couple of weeks ago.

vdpa_sim: use the batching API
vhost-vdpa: support batch updating


The following series of patches provide VDPA support for Mellanox
devices. The supported devices are ConnectX6 DX and newer.

Currently, only a network driver is implemented; future patches will
introduce a block device driver. iperf performance on a single queue is
around 12 Gbps.  Future patches will introduce multi queue support.

The files are organized in such a way that code that can be used by
different VDPA implementations will be placed in a common are resides in
drivers/vdpa/mlx5/core.

Only virtual functions are currently supported. Also, certain firmware
capabilities must be set to enable the driver. Physical functions (PFs)
are skipped by the driver.

To make use of the VDPA net driver, one must load mlx5_vdpa. In such
case, VFs will be operated by the VDPA driver. Although one can see a
regular instance of a network driver on the VF, the VDPA driver takes
precedence over the NIC driver, steering-wize.

Currently, the device/interface infrastructure in mlx5_core is used to
probe drivers. Future patches will introduce virtbus as a means to
register devices and drivers and VDPA will be adapted to it.

The mlx5 mode of operation required to support VDPA is switchdev mode.
Once can use Linux or OVS bridge to take care of layer 2 switching.

In order to provide virtio networking to a guest, an updated version of
qemu is required. This version has been tested by the following quemu
version:

url: https://github.com/jasowang/qemu.git
branch: vdpa
Commit ID: 6f4e59b807db

Eli Cohen (7):
  net/vdpa: Use struct for set/get vq state
  vhost: Fix documentation
  vdpa: Modify get_vq_state() to return error code
  vdpa/mlx5: Add hardware descriptive header file
  vdpa/mlx5: Add support library for mlx5 VDPA implementation
  vdpa/mlx5: Add shared memory registration code
  vdpa/mlx5: Add VDPA driver for supported mlx5 devices

Jason Wang (2):
  vhost-vdpa: support batch updating
  vdpa_sim: use the batching API

Max Gurtovoy (1):
  vdpa: remove hard coded virtq num

 drivers/vdpa/Kconfig                   |   18 +
 drivers/vdpa/Makefile                  |    1 +
 drivers/vdpa/ifcvf/ifcvf_base.c        |    4 +-
 drivers/vdpa/ifcvf/ifcvf_base.h        |    4 +-
 drivers/vdpa/ifcvf/ifcvf_main.c        |   13 +-
 drivers/vdpa/mlx5/Makefile             |    4 +
 drivers/vdpa/mlx5/core/mlx5_vdpa.h     |   91 ++
 drivers/vdpa/mlx5/core/mlx5_vdpa_ifc.h |  168 ++
 drivers/vdpa/mlx5/core/mr.c            |  473 ++++++
 drivers/vdpa/mlx5/core/resources.c     |  284 ++++
 drivers/vdpa/mlx5/net/main.c           |   76 +
 drivers/vdpa/mlx5/net/mlx5_vnet.c      | 1950 ++++++++++++++++++++++++
 drivers/vdpa/mlx5/net/mlx5_vnet.h      |   24 +
 drivers/vdpa/vdpa.c                    |    3 +
 drivers/vdpa/vdpa_sim/vdpa_sim.c       |   35 +-
 drivers/vhost/iotlb.c                  |    4 +-
 drivers/vhost/vdpa.c                   |   46 +-
 include/linux/vdpa.h                   |   24 +-
 include/uapi/linux/vhost_types.h       |    2 +
 19 files changed, 3165 insertions(+), 59 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/Makefile
 create mode 100644 drivers/vdpa/mlx5/core/mlx5_vdpa.h
 create mode 100644 drivers/vdpa/mlx5/core/mlx5_vdpa_ifc.h
 create mode 100644 drivers/vdpa/mlx5/core/mr.c
 create mode 100644 drivers/vdpa/mlx5/core/resources.c
 create mode 100644 drivers/vdpa/mlx5/net/main.c
 create mode 100644 drivers/vdpa/mlx5/net/mlx5_vnet.c
 create mode 100644 drivers/vdpa/mlx5/net/mlx5_vnet.h

-- 
2.26.0

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V2 vhost next 01/10] vhost-vdpa: support batch updating
  2020-07-20  7:14 [PATCH V2 vhost next 00/10] VDPA support for Mellanox ConnectX devices Eli Cohen
@ 2020-07-20  7:14 ` Eli Cohen
  2020-07-20  7:14 ` [PATCH V2 vhost next 02/10] vdpa_sim: use the batching API Eli Cohen
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Eli Cohen @ 2020-07-20  7:14 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel; +Cc: shahafs, saeedm, parav

From: Jason Wang <jasowang@redhat.com>

Change-Id: Ifd7139cfbd122dc22493f4bb93fa884f8edbddb6
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vdpa.c             | 23 +++++++++++++++++------
 include/uapi/linux/vhost_types.h |  2 ++
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index a54b60d6623f..8827ae31f96d 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -73,6 +73,7 @@ struct vhost_vdpa {
 	int virtio_id;
 	int minor;
 	struct eventfd_ctx *config_ctx;
+	bool in_batch;
 };
 
 static DEFINE_IDA(vhost_vdpa_ida);
@@ -521,9 +522,9 @@ static int vhost_vdpa_map(struct vhost_vdpa *v,
 
 	if (ops->dma_map)
 		r = ops->dma_map(vdpa, iova, size, pa, perm);
-	else if (ops->set_map)
-		r = ops->set_map(vdpa, dev->iotlb);
-	else
+	else if (ops->set_map) {
+
+	} else
 		r = iommu_map(v->domain, iova, pa, size,
 			      perm_to_iommu_flags(perm));
 
@@ -540,9 +541,8 @@ static void vhost_vdpa_unmap(struct vhost_vdpa *v, u64 iova, u64 size)
 
 	if (ops->dma_map)
 		ops->dma_unmap(vdpa, iova, size);
-	else if (ops->set_map)
-		ops->set_map(vdpa, dev->iotlb);
-	else
+	else if (ops->set_map) {
+	} else
 		iommu_unmap(v->domain, iova, size);
 }
 
@@ -636,6 +636,8 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev,
 					struct vhost_iotlb_msg *msg)
 {
 	struct vhost_vdpa *v = container_of(dev, struct vhost_vdpa, vdev);
+	struct vdpa_device *vdpa = v->vdpa;
+	const struct vdpa_config_ops *ops = vdpa->config;
 	int r = 0;
 
 	r = vhost_dev_check_owner(dev);
@@ -649,6 +651,15 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev,
 	case VHOST_IOTLB_INVALIDATE:
 		vhost_vdpa_unmap(v, msg->iova, msg->size);
 		break;
+	case VHOST_IOTLB_BATCH_BEGIN:
+		v->in_batch = true;
+		break;
+	case VHOST_IOTLB_BATCH_END:
+		if (v->in_batch && ops->set_map) {
+			ops->set_map(vdpa, dev->iotlb);
+		}
+		v->in_batch = false;
+		break;
 	default:
 		r = -EINVAL;
 		break;
diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index 669457ce5c48..7f52245f8d5d 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -60,6 +60,8 @@ struct vhost_iotlb_msg {
 #define VHOST_IOTLB_UPDATE         2
 #define VHOST_IOTLB_INVALIDATE     3
 #define VHOST_IOTLB_ACCESS_FAIL    4
+#define VHOST_IOTLB_BATCH_BEGIN    5
+#define VHOST_IOTLB_BATCH_END      6
 	__u8 type;
 };
 
-- 
2.26.0

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V2 vhost next 02/10] vdpa_sim: use the batching API
  2020-07-20  7:14 [PATCH V2 vhost next 00/10] VDPA support for Mellanox ConnectX devices Eli Cohen
  2020-07-20  7:14 ` [PATCH V2 vhost next 01/10] vhost-vdpa: support batch updating Eli Cohen
@ 2020-07-20  7:14 ` Eli Cohen
  2020-07-20  7:14 ` [PATCH V2 vhost next 03/10] vdpa: remove hard coded virtq num Eli Cohen
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Eli Cohen @ 2020-07-20  7:14 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel; +Cc: shahafs, saeedm, parav

From: Jason Wang <jasowang@redhat.com>

Change-Id: I3751f1aecce285e0f61530c69616852d49e5f547
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 20 --------------------
 drivers/vhost/vdpa.c             |  1 -
 2 files changed, 21 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index c7334cc65bb2..82d1068af3ef 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -548,24 +548,6 @@ static int vdpasim_set_map(struct vdpa_device *vdpa,
 	return ret;
 }
 
-static int vdpasim_dma_map(struct vdpa_device *vdpa, u64 iova, u64 size,
-			   u64 pa, u32 perm)
-{
-	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
-
-	return vhost_iotlb_add_range(vdpasim->iommu, iova,
-				     iova + size - 1, pa, perm);
-}
-
-static int vdpasim_dma_unmap(struct vdpa_device *vdpa, u64 iova, u64 size)
-{
-	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
-
-	vhost_iotlb_del_range(vdpasim->iommu, iova, iova + size - 1);
-
-	return 0;
-}
-
 static void vdpasim_free(struct vdpa_device *vdpa)
 {
 	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
@@ -598,8 +580,6 @@ static const struct vdpa_config_ops vdpasim_net_config_ops = {
 	.set_config             = vdpasim_set_config,
 	.get_generation         = vdpasim_get_generation,
 	.set_map                = vdpasim_set_map,
-	.dma_map                = vdpasim_dma_map,
-	.dma_unmap              = vdpasim_dma_unmap,
 	.free                   = vdpasim_free,
 };
 
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 8827ae31f96d..65195b30133b 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -533,7 +533,6 @@ static int vhost_vdpa_map(struct vhost_vdpa *v,
 
 static void vhost_vdpa_unmap(struct vhost_vdpa *v, u64 iova, u64 size)
 {
-	struct vhost_dev *dev = &v->vdev;
 	struct vdpa_device *vdpa = v->vdpa;
 	const struct vdpa_config_ops *ops = vdpa->config;
 
-- 
2.26.0

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V2 vhost next 03/10] vdpa: remove hard coded virtq num
  2020-07-20  7:14 [PATCH V2 vhost next 00/10] VDPA support for Mellanox ConnectX devices Eli Cohen
  2020-07-20  7:14 ` [PATCH V2 vhost next 01/10] vhost-vdpa: support batch updating Eli Cohen
  2020-07-20  7:14 ` [PATCH V2 vhost next 02/10] vdpa_sim: use the batching API Eli Cohen
@ 2020-07-20  7:14 ` Eli Cohen
  2020-07-20  7:14 ` [PATCH V2 vhost next 04/10] net/vdpa: Use struct for set/get vq state Eli Cohen
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Eli Cohen @ 2020-07-20  7:14 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel
  Cc: shahafs, saeedm, parav, Max Gurtovoy

From: Max Gurtovoy <maxg@mellanox.com>

This will enable vdpa providers to add support for multi queue feature
and publish it to upper layers (vhost and virtio).

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vdpa/ifcvf/ifcvf_main.c  | 3 ++-
 drivers/vdpa/vdpa.c              | 3 +++
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 4 ++--
 drivers/vhost/vdpa.c             | 9 +++------
 include/linux/vdpa.h             | 6 ++++--
 5 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index f5a60c14b979..e0d43f25fc88 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -420,7 +420,8 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	}
 
 	adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa,
-				    dev, &ifc_vdpa_ops);
+				    dev, &ifc_vdpa_ops,
+				    IFCVF_MAX_QUEUE_PAIRS * 2);
 	if (adapter == NULL) {
 		IFCVF_ERR(pdev, "Failed to allocate vDPA structure");
 		return -ENOMEM;
diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index de211ef3738c..f9c59f023d6d 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -61,6 +61,7 @@ static void vdpa_release_dev(struct device *d)
  * initialized but before registered.
  * @parent: the parent device
  * @config: the bus operations that is supported by this device
+ * @nvqs: number of virtqueues supported by this device
  * @size: size of the parent structure that contains private data
  *
  * Driver should use vdpa_alloc_device() wrapper macro instead of
@@ -71,6 +72,7 @@ static void vdpa_release_dev(struct device *d)
  */
 struct vdpa_device *__vdpa_alloc_device(struct device *parent,
 					const struct vdpa_config_ops *config,
+					int nvqs,
 					size_t size)
 {
 	struct vdpa_device *vdev;
@@ -96,6 +98,7 @@ struct vdpa_device *__vdpa_alloc_device(struct device *parent,
 	vdev->dev.release = vdpa_release_dev;
 	vdev->index = err;
 	vdev->config = config;
+	vdev->nvqs = nvqs;
 
 	err = dev_set_name(&vdev->dev, "vdpa%u", vdev->index);
 	if (err)
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 82d1068af3ef..a38d8dc12192 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -60,7 +60,7 @@ static u64 vdpasim_features = (1ULL << VIRTIO_F_ANY_LAYOUT) |
 /* State of each vdpasim device */
 struct vdpasim {
 	struct vdpa_device vdpa;
-	struct vdpasim_virtqueue vqs[2];
+	struct vdpasim_virtqueue vqs[VDPASIM_VQ_NUM];
 	struct work_struct work;
 	/* spinlock to synchronize virtqueue state */
 	spinlock_t lock;
@@ -312,7 +312,7 @@ static struct vdpasim *vdpasim_create(void)
 	int ret = -ENOMEM;
 
 	vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL,
-				    &vdpasim_net_config_ops);
+				    &vdpasim_net_config_ops, VDPASIM_VQ_NUM);
 	if (!vdpasim)
 		goto err_alloc;
 
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 65195b30133b..78f9d90c23b1 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -55,9 +55,6 @@ enum {
 		(1ULL << VIRTIO_NET_F_SPEED_DUPLEX),
 };
 
-/* Currently, only network backend w/o multiqueue is supported. */
-#define VHOST_VDPA_VQ_MAX	2
-
 #define VHOST_VDPA_DEV_MAX (1U << MINORBITS)
 
 struct vhost_vdpa {
@@ -882,7 +879,7 @@ static int vhost_vdpa_probe(struct vdpa_device *vdpa)
 {
 	const struct vdpa_config_ops *ops = vdpa->config;
 	struct vhost_vdpa *v;
-	int minor, nvqs = VHOST_VDPA_VQ_MAX;
+	int minor;
 	int r;
 
 	/* Currently, we only accept the network devices. */
@@ -903,14 +900,14 @@ static int vhost_vdpa_probe(struct vdpa_device *vdpa)
 	atomic_set(&v->opened, 0);
 	v->minor = minor;
 	v->vdpa = vdpa;
-	v->nvqs = nvqs;
+	v->nvqs = vdpa->nvqs;
 	v->virtio_id = ops->get_device_id(vdpa);
 
 	device_initialize(&v->dev);
 	v->dev.release = vhost_vdpa_release_dev;
 	v->dev.parent = &vdpa->dev;
 	v->dev.devt = MKDEV(MAJOR(vhost_vdpa_major), minor);
-	v->vqs = kmalloc_array(nvqs, sizeof(struct vhost_virtqueue),
+	v->vqs = kmalloc_array(v->nvqs, sizeof(struct vhost_virtqueue),
 			       GFP_KERNEL);
 	if (!v->vqs) {
 		r = -ENOMEM;
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 239db794357c..715a159ee1f0 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -39,6 +39,7 @@ struct vdpa_device {
 	struct device *dma_dev;
 	const struct vdpa_config_ops *config;
 	unsigned int index;
+	int nvqs;
 };
 
 /**
@@ -208,11 +209,12 @@ struct vdpa_config_ops {
 
 struct vdpa_device *__vdpa_alloc_device(struct device *parent,
 					const struct vdpa_config_ops *config,
+					int nvqs,
 					size_t size);
 
-#define vdpa_alloc_device(dev_struct, member, parent, config)   \
+#define vdpa_alloc_device(dev_struct, member, parent, config, nvqs)   \
 			  container_of(__vdpa_alloc_device( \
-				       parent, config, \
+				       parent, config, nvqs, \
 				       sizeof(dev_struct) + \
 				       BUILD_BUG_ON_ZERO(offsetof( \
 				       dev_struct, member))), \
-- 
2.26.0

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V2 vhost next 04/10] net/vdpa: Use struct for set/get vq state
  2020-07-20  7:14 [PATCH V2 vhost next 00/10] VDPA support for Mellanox ConnectX devices Eli Cohen
                   ` (2 preceding siblings ...)
  2020-07-20  7:14 ` [PATCH V2 vhost next 03/10] vdpa: remove hard coded virtq num Eli Cohen
@ 2020-07-20  7:14 ` Eli Cohen
  2020-07-20  7:14 ` [PATCH V2 vhost next 05/10] vhost: Fix documentation Eli Cohen
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Eli Cohen @ 2020-07-20  7:14 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel
  Cc: shahafs, saeedm, parav, Eli Cohen

For now VQ state involves 16 bit available index value encoded in u64
variable. In the future it will be extended to contain more fields. Use
struct to contain the state, now containing only a single u16 for the
available index. In the future we can add fields to this struct.

Reviewed-by: Parav Pandit <parav@mellanox.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
---
 drivers/vdpa/ifcvf/ifcvf_base.c  |  4 ++--
 drivers/vdpa/ifcvf/ifcvf_base.h  |  4 ++--
 drivers/vdpa/ifcvf/ifcvf_main.c  |  9 +++++----
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 10 ++++++----
 drivers/vhost/vdpa.c             | 10 +++++++---
 include/linux/vdpa.h             | 18 ++++++++++++++----
 6 files changed, 36 insertions(+), 19 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_base.c b/drivers/vdpa/ifcvf/ifcvf_base.c
index 94bf0328b68d..f2a128e56de5 100644
--- a/drivers/vdpa/ifcvf/ifcvf_base.c
+++ b/drivers/vdpa/ifcvf/ifcvf_base.c
@@ -272,7 +272,7 @@ static int ifcvf_config_features(struct ifcvf_hw *hw)
 	return 0;
 }
 
-u64 ifcvf_get_vq_state(struct ifcvf_hw *hw, u16 qid)
+u16 ifcvf_get_vq_state(struct ifcvf_hw *hw, u16 qid)
 {
 	struct ifcvf_lm_cfg __iomem *ifcvf_lm;
 	void __iomem *avail_idx_addr;
@@ -287,7 +287,7 @@ u64 ifcvf_get_vq_state(struct ifcvf_hw *hw, u16 qid)
 	return last_avail_idx;
 }
 
-int ifcvf_set_vq_state(struct ifcvf_hw *hw, u16 qid, u64 num)
+int ifcvf_set_vq_state(struct ifcvf_hw *hw, u16 qid, u16 num)
 {
 	struct ifcvf_lm_cfg __iomem *ifcvf_lm;
 	void __iomem *avail_idx_addr;
diff --git a/drivers/vdpa/ifcvf/ifcvf_base.h b/drivers/vdpa/ifcvf/ifcvf_base.h
index f4554412e607..d35c47a812a3 100644
--- a/drivers/vdpa/ifcvf/ifcvf_base.h
+++ b/drivers/vdpa/ifcvf/ifcvf_base.h
@@ -116,7 +116,7 @@ void ifcvf_set_status(struct ifcvf_hw *hw, u8 status);
 void io_write64_twopart(u64 val, u32 *lo, u32 *hi);
 void ifcvf_reset(struct ifcvf_hw *hw);
 u64 ifcvf_get_features(struct ifcvf_hw *hw);
-u64 ifcvf_get_vq_state(struct ifcvf_hw *hw, u16 qid);
-int ifcvf_set_vq_state(struct ifcvf_hw *hw, u16 qid, u64 num);
+u16 ifcvf_get_vq_state(struct ifcvf_hw *hw, u16 qid);
+int ifcvf_set_vq_state(struct ifcvf_hw *hw, u16 qid, u16 num);
 struct ifcvf_adapter *vf_to_adapter(struct ifcvf_hw *hw);
 #endif /* _IFCVF_H_ */
diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index e0d43f25fc88..69032ee97824 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -235,19 +235,20 @@ static u16 ifcvf_vdpa_get_vq_num_max(struct vdpa_device *vdpa_dev)
 	return IFCVF_QUEUE_MAX;
 }
 
-static u64 ifcvf_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid)
+static void ifcvf_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
+				    struct vdpa_vq_state *state)
 {
 	struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
 
-	return ifcvf_get_vq_state(vf, qid);
+	state->avail_index = ifcvf_get_vq_state(vf, qid);
 }
 
 static int ifcvf_vdpa_set_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
-				   u64 num)
+				   const struct vdpa_vq_state *state)
 {
 	struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
 
-	return ifcvf_set_vq_state(vf, qid, num);
+	return ifcvf_set_vq_state(vf, qid, state->avail_index);
 }
 
 static void ifcvf_vdpa_set_vq_cb(struct vdpa_device *vdpa_dev, u16 qid,
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index a38d8dc12192..599519039f8d 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -413,26 +413,28 @@ static bool vdpasim_get_vq_ready(struct vdpa_device *vdpa, u16 idx)
 	return vq->ready;
 }
 
-static int vdpasim_set_vq_state(struct vdpa_device *vdpa, u16 idx, u64 state)
+static int vdpasim_set_vq_state(struct vdpa_device *vdpa, u16 idx,
+				const struct vdpa_vq_state *state)
 {
 	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
 	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
 	struct vringh *vrh = &vq->vring;
 
 	spin_lock(&vdpasim->lock);
-	vrh->last_avail_idx = state;
+	vrh->last_avail_idx = state->avail_index;
 	spin_unlock(&vdpasim->lock);
 
 	return 0;
 }
 
-static u64 vdpasim_get_vq_state(struct vdpa_device *vdpa, u16 idx)
+static void vdpasim_get_vq_state(struct vdpa_device *vdpa, u16 idx,
+				 struct vdpa_vq_state *state)
 {
 	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
 	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
 	struct vringh *vrh = &vq->vring;
 
-	return vrh->last_avail_idx;
+	state->avail_index = vrh->last_avail_idx;
 }
 
 static u32 vdpasim_get_vq_align(struct vdpa_device *vdpa)
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 78f9d90c23b1..af98c11c9d26 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -335,6 +335,7 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
 {
 	struct vdpa_device *vdpa = v->vdpa;
 	const struct vdpa_config_ops *ops = vdpa->config;
+	struct vdpa_vq_state vq_state;
 	struct vdpa_callback cb;
 	struct vhost_virtqueue *vq;
 	struct vhost_vring_state s;
@@ -358,8 +359,10 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
 		return 0;
 	}
 
-	if (cmd == VHOST_GET_VRING_BASE)
-		vq->last_avail_idx = ops->get_vq_state(v->vdpa, idx);
+	if (cmd == VHOST_GET_VRING_BASE) {
+		ops->get_vq_state(v->vdpa, idx, &vq_state);
+		vq->last_avail_idx = vq_state.avail_index;
+	}
 
 	r = vhost_vring_ioctl(&v->vdev, cmd, argp);
 	if (r)
@@ -375,7 +378,8 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
 		break;
 
 	case VHOST_SET_VRING_BASE:
-		if (ops->set_vq_state(vdpa, idx, vq->last_avail_idx))
+		vq_state.avail_index = vq->last_avail_idx;
+		if (ops->set_vq_state(vdpa, idx, &vq_state))
 			r = -EINVAL;
 		break;
 
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 715a159ee1f0..7b088bebffe8 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -27,6 +27,14 @@ struct vdpa_notification_area {
 	resource_size_t size;
 };
 
+/**
+ * vDPA vq_state definition
+ * @avail_index: available index
+ */
+struct vdpa_vq_state {
+	u16	avail_index;
+};
+
 /**
  * vDPA device - representation of a vDPA device
  * @dev: underlying device
@@ -78,12 +86,12 @@ struct vdpa_device {
  * @set_vq_state:		Set the state for a virtqueue
  *				@vdev: vdpa device
  *				@idx: virtqueue index
- *				@state: virtqueue state (last_avail_idx)
+ *				@state: pointer to set virtqueue state (last_avail_idx)
  *				Returns integer: success (0) or error (< 0)
  * @get_vq_state:		Get the state for a virtqueue
  *				@vdev: vdpa device
  *				@idx: virtqueue index
- *				Returns virtqueue state (last_avail_idx)
+ *				@state: pointer to returned state (last_avail_idx)
  * @get_vq_notification: 	Get the notification area for a virtqueue
  *				@vdev: vdpa device
  *				@idx: virtqueue index
@@ -175,8 +183,10 @@ struct vdpa_config_ops {
 			  struct vdpa_callback *cb);
 	void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
 	bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
-	int (*set_vq_state)(struct vdpa_device *vdev, u16 idx, u64 state);
-	u64 (*get_vq_state)(struct vdpa_device *vdev, u16 idx);
+	int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
+			    const struct vdpa_vq_state *state);
+	void (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
+			     struct vdpa_vq_state *state);
 	struct vdpa_notification_area
 	(*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
 
-- 
2.26.0

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V2 vhost next 05/10] vhost: Fix documentation
  2020-07-20  7:14 [PATCH V2 vhost next 00/10] VDPA support for Mellanox ConnectX devices Eli Cohen
                   ` (3 preceding siblings ...)
  2020-07-20  7:14 ` [PATCH V2 vhost next 04/10] net/vdpa: Use struct for set/get vq state Eli Cohen
@ 2020-07-20  7:14 ` Eli Cohen
  2020-07-21  5:35   ` Jason Wang
  2020-07-20  7:14 ` [PATCH V2 vhost next 06/10] vdpa: Modify get_vq_state() to return error code Eli Cohen
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: Eli Cohen @ 2020-07-20  7:14 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel
  Cc: shahafs, saeedm, parav, Eli Cohen

Fix documentation to match actual function prototypes.

Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
---
 drivers/vhost/iotlb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c
index 1f0ca6e44410..0d4213a54a88 100644
--- a/drivers/vhost/iotlb.c
+++ b/drivers/vhost/iotlb.c
@@ -149,7 +149,7 @@ EXPORT_SYMBOL_GPL(vhost_iotlb_free);
  * vhost_iotlb_itree_first - return the first overlapped range
  * @iotlb: the IOTLB
  * @start: start of IOVA range
- * @end: end of IOVA range
+ * @last: last byte in IOVA range
  */
 struct vhost_iotlb_map *
 vhost_iotlb_itree_first(struct vhost_iotlb *iotlb, u64 start, u64 last)
@@ -162,7 +162,7 @@ EXPORT_SYMBOL_GPL(vhost_iotlb_itree_first);
  * vhost_iotlb_itree_first - return the next overlapped range
  * @iotlb: the IOTLB
  * @start: start of IOVA range
- * @end: end of IOVA range
+ * @last: last byte IOVA range
  */
 struct vhost_iotlb_map *
 vhost_iotlb_itree_next(struct vhost_iotlb_map *map, u64 start, u64 last)
-- 
2.26.0

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V2 vhost next 06/10] vdpa: Modify get_vq_state() to return error code
  2020-07-20  7:14 [PATCH V2 vhost next 00/10] VDPA support for Mellanox ConnectX devices Eli Cohen
                   ` (4 preceding siblings ...)
  2020-07-20  7:14 ` [PATCH V2 vhost next 05/10] vhost: Fix documentation Eli Cohen
@ 2020-07-20  7:14 ` Eli Cohen
  2020-07-21  5:34   ` Jason Wang
  2020-07-20  7:14 ` [PATCH V2 vhost next 07/10] vdpa/mlx5: Add hardware descriptive header file Eli Cohen
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: Eli Cohen @ 2020-07-20  7:14 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel
  Cc: shahafs, saeedm, parav, Eli Cohen

Modify get_vq_state() so it returns an error code. In case of hardware
acceleration, the available index may be retrieved from the device, an
operation that can possibly fail.

Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
---
 drivers/vdpa/ifcvf/ifcvf_main.c  | 5 +++--
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 5 +++--
 drivers/vhost/vdpa.c             | 5 ++++-
 include/linux/vdpa.h             | 4 ++--
 4 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index 69032ee97824..d9b5f465ac81 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -235,12 +235,13 @@ static u16 ifcvf_vdpa_get_vq_num_max(struct vdpa_device *vdpa_dev)
 	return IFCVF_QUEUE_MAX;
 }
 
-static void ifcvf_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
-				    struct vdpa_vq_state *state)
+static int ifcvf_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
+				   struct vdpa_vq_state *state)
 {
 	struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
 
 	state->avail_index = ifcvf_get_vq_state(vf, qid);
+	return 0;
 }
 
 static int ifcvf_vdpa_set_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 599519039f8d..ddf6086d43c2 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -427,14 +427,15 @@ static int vdpasim_set_vq_state(struct vdpa_device *vdpa, u16 idx,
 	return 0;
 }
 
-static void vdpasim_get_vq_state(struct vdpa_device *vdpa, u16 idx,
-				 struct vdpa_vq_state *state)
+static int vdpasim_get_vq_state(struct vdpa_device *vdpa, u16 idx,
+				struct vdpa_vq_state *state)
 {
 	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
 	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
 	struct vringh *vrh = &vq->vring;
 
 	state->avail_index = vrh->last_avail_idx;
+	return 0;
 }
 
 static u32 vdpasim_get_vq_align(struct vdpa_device *vdpa)
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index af98c11c9d26..fadad74f882e 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -360,7 +360,10 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
 	}
 
 	if (cmd == VHOST_GET_VRING_BASE) {
-		ops->get_vq_state(v->vdpa, idx, &vq_state);
+		r = ops->get_vq_state(v->vdpa, idx, &vq_state);
+		if (r)
+			return r;
+
 		vq->last_avail_idx = vq_state.avail_index;
 	}
 
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 7b088bebffe8..000d71a9f988 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -185,8 +185,8 @@ struct vdpa_config_ops {
 	bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
 	int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
 			    const struct vdpa_vq_state *state);
-	void (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
-			     struct vdpa_vq_state *state);
+	int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
+			    struct vdpa_vq_state *state);
 	struct vdpa_notification_area
 	(*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
 
-- 
2.26.0

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V2 vhost next 07/10] vdpa/mlx5: Add hardware descriptive header file
  2020-07-20  7:14 [PATCH V2 vhost next 00/10] VDPA support for Mellanox ConnectX devices Eli Cohen
                   ` (5 preceding siblings ...)
  2020-07-20  7:14 ` [PATCH V2 vhost next 06/10] vdpa: Modify get_vq_state() to return error code Eli Cohen
@ 2020-07-20  7:14 ` Eli Cohen
  2020-07-20  7:14 ` [PATCH V2 vhost next 08/10] vdpa/mlx5: Add support library for mlx5 VDPA implementation Eli Cohen
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Eli Cohen @ 2020-07-20  7:14 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel
  Cc: shahafs, saeedm, parav, Eli Cohen

Keep all vdpa related hardware definitions in this file.

Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
---
 drivers/vdpa/mlx5/core/mlx5_vdpa_ifc.h | 168 +++++++++++++++++++++++++
 1 file changed, 168 insertions(+)
 create mode 100644 drivers/vdpa/mlx5/core/mlx5_vdpa_ifc.h

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa_ifc.h b/drivers/vdpa/mlx5/core/mlx5_vdpa_ifc.h
new file mode 100644
index 000000000000..f6f57a29b38e
--- /dev/null
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa_ifc.h
@@ -0,0 +1,168 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2020 Mellanox Technologies Ltd. */
+
+#ifndef __MLX5_VDPA_IFC_H_
+#define __MLX5_VDPA_IFC_H_
+
+#include <linux/mlx5/mlx5_ifc.h>
+
+enum {
+	MLX5_VIRTIO_Q_EVENT_MODE_NO_MSIX_MODE  = 0x0,
+	MLX5_VIRTIO_Q_EVENT_MODE_QP_MODE       = 0x1,
+	MLX5_VIRTIO_Q_EVENT_MODE_MSIX_MODE     = 0x2,
+};
+
+enum {
+	MLX5_VIRTIO_EMULATION_CAP_VIRTIO_QUEUE_TYPE_SPLIT   = 0x1, // do I check this caps?
+	MLX5_VIRTIO_EMULATION_CAP_VIRTIO_QUEUE_TYPE_PACKED  = 0x2,
+};
+
+enum {
+	MLX5_VIRTIO_EMULATION_VIRTIO_QUEUE_TYPE_SPLIT   = 0,
+	MLX5_VIRTIO_EMULATION_VIRTIO_QUEUE_TYPE_PACKED  = 1,
+};
+
+struct mlx5_ifc_virtio_q_bits {
+	u8    virtio_q_type[0x8];
+	u8    reserved_at_8[0x5];
+	u8    event_mode[0x3];
+	u8    queue_index[0x10];
+
+	u8    full_emulation[0x1];
+	u8    virtio_version_1_0[0x1];
+	u8    reserved_at_22[0x2];
+	u8    offload_type[0x4];
+	u8    event_qpn_or_msix[0x18];
+
+	u8    doorbell_stride_index[0x10];
+	u8    queue_size[0x10];
+
+	u8    device_emulation_id[0x20];
+
+	u8    desc_addr[0x40];
+
+	u8    used_addr[0x40];
+
+	u8    available_addr[0x40];
+
+	u8    virtio_q_mkey[0x20];
+
+	u8    max_tunnel_desc[0x10];
+	u8    reserved_at_170[0x8];
+	u8    error_type[0x8];
+
+	u8    umem_1_id[0x20];
+
+	u8    umem_1_size[0x20];
+
+	u8    umem_1_offset[0x40];
+
+	u8    umem_2_id[0x20];
+
+	u8    umem_2_size[0x20];
+
+	u8    umem_2_offset[0x40];
+
+	u8    umem_3_id[0x20];
+
+	u8    umem_3_size[0x20];
+
+	u8    umem_3_offset[0x40];
+
+	u8    counter_set_id[0x20];
+
+	u8    reserved_at_320[0x8];
+	u8    pd[0x18];
+
+	u8    reserved_at_340[0xc0];
+};
+
+struct mlx5_ifc_virtio_net_q_object_bits {
+	u8    modify_field_select[0x40];
+
+	u8    reserved_at_40[0x20];
+
+	u8    vhca_id[0x10];
+	u8    reserved_at_70[0x10];
+
+	u8    queue_feature_bit_mask_12_3[0xa];
+	u8    dirty_bitmap_dump_enable[0x1];
+	u8    vhost_log_page[0x5];
+	u8    reserved_at_90[0xc];
+	u8    state[0x4];
+
+	u8    reserved_at_a0[0x5];
+	u8    queue_feature_bit_mask_2_0[0x3];
+	u8    tisn_or_qpn[0x18];
+
+	u8    dirty_bitmap_mkey[0x20];
+
+	u8    dirty_bitmap_size[0x20];
+
+	u8    dirty_bitmap_addr[0x40];
+
+	u8    hw_available_index[0x10];
+	u8    hw_used_index[0x10];
+
+	u8    reserved_at_160[0xa0];
+
+	struct mlx5_ifc_virtio_q_bits virtio_q_context;
+};
+
+struct mlx5_ifc_create_virtio_net_q_in_bits {
+	struct mlx5_ifc_general_obj_in_cmd_hdr_bits general_obj_in_cmd_hdr;
+
+	struct mlx5_ifc_virtio_net_q_object_bits obj_context;
+};
+
+struct mlx5_ifc_create_virtio_net_q_out_bits {
+	struct mlx5_ifc_general_obj_out_cmd_hdr_bits general_obj_out_cmd_hdr;
+};
+
+struct mlx5_ifc_destroy_virtio_net_q_in_bits {
+	struct mlx5_ifc_general_obj_in_cmd_hdr_bits general_obj_out_cmd_hdr;
+};
+
+struct mlx5_ifc_destroy_virtio_net_q_out_bits {
+	struct mlx5_ifc_general_obj_out_cmd_hdr_bits general_obj_out_cmd_hdr;
+};
+
+struct mlx5_ifc_query_virtio_net_q_in_bits {
+	struct mlx5_ifc_general_obj_in_cmd_hdr_bits general_obj_in_cmd_hdr;
+};
+
+struct mlx5_ifc_query_virtio_net_q_out_bits {
+	struct mlx5_ifc_general_obj_out_cmd_hdr_bits general_obj_out_cmd_hdr;
+
+	struct mlx5_ifc_virtio_net_q_object_bits obj_context;
+};
+
+enum {
+	MLX5_VIRTQ_MODIFY_MASK_STATE                    = (u64)1 << 0,
+	MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_PARAMS      = (u64)1 << 3,
+	MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_DUMP_ENABLE = (u64)1 << 4,
+};
+
+enum {
+	MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT     = 0x0,
+	MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY      = 0x1,
+	MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND  = 0x2,
+	MLX5_VIRTIO_NET_Q_OBJECT_STATE_ERR      = 0x3,
+};
+
+enum {
+	MLX5_RQTC_LIST_Q_TYPE_RQ            = 0x0,
+	MLX5_RQTC_LIST_Q_TYPE_VIRTIO_NET_Q  = 0x1,
+};
+
+struct mlx5_ifc_modify_virtio_net_q_in_bits {
+	struct mlx5_ifc_general_obj_in_cmd_hdr_bits general_obj_in_cmd_hdr;
+
+	struct mlx5_ifc_virtio_net_q_object_bits obj_context;
+};
+
+struct mlx5_ifc_modify_virtio_net_q_out_bits {
+	struct mlx5_ifc_general_obj_out_cmd_hdr_bits general_obj_out_cmd_hdr;
+};
+
+#endif /* __MLX5_VDPA_IFC_H_ */
-- 
2.26.0

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V2 vhost next 08/10] vdpa/mlx5: Add support library for mlx5 VDPA implementation
  2020-07-20  7:14 [PATCH V2 vhost next 00/10] VDPA support for Mellanox ConnectX devices Eli Cohen
                   ` (6 preceding siblings ...)
  2020-07-20  7:14 ` [PATCH V2 vhost next 07/10] vdpa/mlx5: Add hardware descriptive header file Eli Cohen
@ 2020-07-20  7:14 ` Eli Cohen
  2020-07-20  7:14 ` [PATCH V2 vhost next 09/10] vdpa/mlx5: Add shared memory registration code Eli Cohen
  2020-07-20  7:14 ` [PATCH V2 vhost next 10/10] vdpa/mlx5: Add VDPA driver for supported mlx5 devices Eli Cohen
  9 siblings, 0 replies; 18+ messages in thread
From: Eli Cohen @ 2020-07-20  7:14 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel
  Cc: shahafs, saeedm, parav, Eli Cohen

Following patches introduce VDPA network driver for Mellanox Connectx6
devices. This patch provides functionality that will be used by those
patches.

Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
---
 drivers/vdpa/Kconfig               |   8 +
 drivers/vdpa/Makefile              |   1 +
 drivers/vdpa/mlx5/Makefile         |   1 +
 drivers/vdpa/mlx5/core/Makefile    |   1 +
 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  57 ++++++
 drivers/vdpa/mlx5/core/resources.c | 281 +++++++++++++++++++++++++++++
 6 files changed, 349 insertions(+)
 create mode 100644 drivers/vdpa/mlx5/Makefile
 create mode 100644 drivers/vdpa/mlx5/core/Makefile
 create mode 100644 drivers/vdpa/mlx5/core/mlx5_vdpa.h
 create mode 100644 drivers/vdpa/mlx5/core/resources.c

diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index 3e1ceb8e9f2b..48a1a776dd86 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -28,4 +28,12 @@ config IFCVF
 	  To compile this driver as a module, choose M here: the module will
 	  be called ifcvf.
 
+config MLX5_VDPA
+	bool "MLX5 VDPA support library for ConnectX devices"
+	depends on MLX5_CORE
+	default n
+	help
+	  Support library for Mellanox VDPA drivers. Provides code that is
+	  common for all types of VDPA drivers.
+
 endif # VDPA
diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
index 8bbb686ca7a2..d160e9b63a66 100644
--- a/drivers/vdpa/Makefile
+++ b/drivers/vdpa/Makefile
@@ -2,3 +2,4 @@
 obj-$(CONFIG_VDPA) += vdpa.o
 obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
 obj-$(CONFIG_IFCVF)    += ifcvf/
+obj-$(CONFIG_MLX5_VDPA) += mlx5/
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
new file mode 100644
index 000000000000..d552abb1d126
--- /dev/null
+++ b/drivers/vdpa/mlx5/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_MLX5_VDPA) += core/resources.o
diff --git a/drivers/vdpa/mlx5/core/Makefile b/drivers/vdpa/mlx5/core/Makefile
new file mode 100644
index 000000000000..7070f8c8680d
--- /dev/null
+++ b/drivers/vdpa/mlx5/core/Makefile
@@ -0,0 +1 @@
+obj-y += resources.o
diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
new file mode 100644
index 000000000000..f3571c8b257e
--- /dev/null
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2020 Mellanox Technologies Ltd. */
+
+#ifndef __MLX5_VDPA_H__
+#define __MLX5_VDPA_H__
+
+#include <linux/vdpa.h>
+#include <linux/mlx5/driver.h>
+
+struct mlx5_vdpa_resources {
+	u32 pdn;
+	struct mlx5_uars_page *uar;
+	void __iomem *kick_addr;
+	u16 uid;
+	u32 null_mkey;
+	bool valid;
+};
+
+struct mlx5_vdpa_dev {
+	struct vdpa_device vdev;
+	struct mlx5_core_dev *mdev;
+	struct mlx5_vdpa_resources res;
+
+	u64 mlx_features;
+	u64 actual_features;
+	u8 status;
+	u32 max_vqs;
+	u32 generation;
+};
+
+int mlx5_vdpa_alloc_pd(struct mlx5_vdpa_dev *dev, u32 *pdn, u16 uid);
+int mlx5_vdpa_dealloc_pd(struct mlx5_vdpa_dev *dev, u32 pdn, u16 uid);
+int mlx5_vdpa_get_null_mkey(struct mlx5_vdpa_dev *dev, u32 *null_mkey);
+int mlx5_vdpa_create_tis(struct mlx5_vdpa_dev *mvdev, void *in, u32 *tisn);
+void mlx5_vdpa_destroy_tis(struct mlx5_vdpa_dev *mvdev, u32 tisn);
+int mlx5_vdpa_create_rqt(struct mlx5_vdpa_dev *mvdev, void *in, int inlen, u32 *rqtn);
+void mlx5_vdpa_destroy_rqt(struct mlx5_vdpa_dev *mvdev, u32 rqtn);
+int mlx5_vdpa_create_tir(struct mlx5_vdpa_dev *mvdev, void *in, u32 *tirn);
+void mlx5_vdpa_destroy_tir(struct mlx5_vdpa_dev *mvdev, u32 tirn);
+int mlx5_vdpa_alloc_transport_domain(struct mlx5_vdpa_dev *mvdev, u32 *tdn);
+void mlx5_vdpa_dealloc_transport_domain(struct mlx5_vdpa_dev *mvdev, u32 tdn);
+int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev);
+void mlx5_vdpa_free_resources(struct mlx5_vdpa_dev *mvdev);
+
+#define mlx5_vdpa_warn(__dev, format, ...)                                                         \
+	dev_warn((__dev)->mdev->device, "%s:%d:(pid %d) warning: " format, __func__, __LINE__,     \
+		 current->pid, ##__VA_ARGS__)
+
+#define mlx5_vdpa_info(__dev, format, ...)                                                         \
+	dev_info((__dev)->mdev->device, "%s:%d:(pid %d): " format, __func__, __LINE__,             \
+		 current->pid, ##__VA_ARGS__)
+
+#define mlx5_vdpa_dbg(__dev, format, ...)                                                          \
+	dev_debug((__dev)->mdev->device, "%s:%d:(pid %d): " format, __func__, __LINE__,            \
+		  current->pid, ##__VA_ARGS__)
+
+#endif /* __MLX5_VDPA_H__ */
diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
new file mode 100644
index 000000000000..6c6552b7e9b5
--- /dev/null
+++ b/drivers/vdpa/mlx5/core/resources.c
@@ -0,0 +1,281 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2020 Mellanox Technologies Ltd. */
+
+#include <linux/mlx5/driver.h>
+#include "mlx5_vdpa.h"
+
+static int alloc_pd(struct mlx5_vdpa_dev *dev, u32 *pdn, u16 uid)
+{
+	struct mlx5_core_dev *mdev = dev->mdev;
+
+	u32 out[MLX5_ST_SZ_DW(alloc_pd_out)] = {};
+	u32 in[MLX5_ST_SZ_DW(alloc_pd_in)] = {};
+	int err;
+
+	MLX5_SET(alloc_pd_in, in, opcode, MLX5_CMD_OP_ALLOC_PD);
+	MLX5_SET(alloc_pd_in, in, uid, uid);
+
+	err = mlx5_cmd_exec_inout(mdev, alloc_pd, in, out);
+	if (!err)
+		*pdn = MLX5_GET(alloc_pd_out, out, pd);
+
+	return err;
+}
+
+static int dealloc_pd(struct mlx5_vdpa_dev *dev, u32 pdn, u16 uid)
+{
+	u32 in[MLX5_ST_SZ_DW(dealloc_pd_in)] = {};
+	struct mlx5_core_dev *mdev = dev->mdev;
+
+	MLX5_SET(dealloc_pd_in, in, opcode, MLX5_CMD_OP_DEALLOC_PD);
+	MLX5_SET(dealloc_pd_in, in, pd, pdn);
+	MLX5_SET(dealloc_pd_in, in, uid, uid);
+	return mlx5_cmd_exec_in(mdev, dealloc_pd, in);
+}
+
+static int get_null_mkey(struct mlx5_vdpa_dev *dev, u32 *null_mkey)
+{
+	u32 out[MLX5_ST_SZ_DW(query_special_contexts_out)] = {};
+	u32 in[MLX5_ST_SZ_DW(query_special_contexts_in)] = {};
+	struct mlx5_core_dev *mdev = dev->mdev;
+	int err;
+
+	MLX5_SET(query_special_contexts_in, in, opcode, MLX5_CMD_OP_QUERY_SPECIAL_CONTEXTS);
+	err = mlx5_cmd_exec_inout(mdev, query_special_contexts, in, out);
+	if (!err)
+		*null_mkey = MLX5_GET(query_special_contexts_out, out, null_mkey);
+	return err;
+}
+
+static int create_uctx(struct mlx5_vdpa_dev *mvdev, u16 *uid)
+{
+	u32 out[MLX5_ST_SZ_DW(create_uctx_out)] = {};
+	int inlen;
+	void *in;
+	int err;
+
+	/* 0 means not supported */
+	if (!MLX5_CAP_GEN(mvdev->mdev, log_max_uctx))
+		return -EOPNOTSUPP;
+
+	inlen = MLX5_ST_SZ_BYTES(create_uctx_in);
+	in = kzalloc(inlen, GFP_KERNEL);
+	if (!in)
+		return -ENOMEM;
+
+	MLX5_SET(create_uctx_in, in, opcode, MLX5_CMD_OP_CREATE_UCTX);
+	MLX5_SET(create_uctx_in, in, uctx.cap, MLX5_UCTX_CAP_RAW_TX);
+
+	err = mlx5_cmd_exec(mvdev->mdev, in, inlen, out, sizeof(out));
+	kfree(in);
+	if (!err)
+		*uid = MLX5_GET(create_uctx_out, out, uid);
+
+	return err;
+}
+
+static void destroy_uctx(struct mlx5_vdpa_dev *mvdev, u32 uid)
+{
+	u32 out[MLX5_ST_SZ_DW(destroy_uctx_out)] = {};
+	u32 in[MLX5_ST_SZ_DW(destroy_uctx_in)] = {};
+
+	MLX5_SET(destroy_uctx_in, in, opcode, MLX5_CMD_OP_DESTROY_UCTX);
+	MLX5_SET(destroy_uctx_in, in, uid, uid);
+
+	mlx5_cmd_exec(mvdev->mdev, in, sizeof(in), out, sizeof(out));
+}
+
+int mlx5_vdpa_create_tis(struct mlx5_vdpa_dev *mvdev, void *in, u32 *tisn)
+{
+	u32 out[MLX5_ST_SZ_DW(create_tis_out)] = {};
+	int err;
+
+	MLX5_SET(create_tis_in, in, opcode, MLX5_CMD_OP_CREATE_TIS);
+	MLX5_SET(create_tis_in, in, uid, mvdev->res.uid);
+	err = mlx5_cmd_exec_inout(mvdev->mdev, create_tis, in, out);
+	if (!err)
+		*tisn = MLX5_GET(create_tis_out, out, tisn);
+
+	return err;
+}
+
+void mlx5_vdpa_destroy_tis(struct mlx5_vdpa_dev *mvdev, u32 tisn)
+{
+	u32 in[MLX5_ST_SZ_DW(destroy_tis_in)] = {};
+
+	MLX5_SET(destroy_tis_in, in, opcode, MLX5_CMD_OP_DESTROY_TIS);
+	MLX5_SET(destroy_tis_in, in, uid, mvdev->res.uid);
+	MLX5_SET(destroy_tis_in, in, tisn, tisn);
+	mlx5_cmd_exec_in(mvdev->mdev, destroy_tis, in);
+}
+
+int mlx5_vdpa_create_rqt(struct mlx5_vdpa_dev *mvdev, void *in, int inlen, u32 *rqtn)
+{
+	u32 out[MLX5_ST_SZ_DW(create_rqt_out)] = {};
+	int err;
+
+	MLX5_SET(create_rqt_in, in, opcode, MLX5_CMD_OP_CREATE_RQT);
+	err = mlx5_cmd_exec(mvdev->mdev, in, inlen, out, sizeof(out));
+	if (!err)
+		*rqtn = MLX5_GET(create_rqt_out, out, rqtn);
+
+	return err;
+}
+
+void mlx5_vdpa_destroy_rqt(struct mlx5_vdpa_dev *mvdev, u32 rqtn)
+{
+	u32 in[MLX5_ST_SZ_DW(destroy_rqt_in)] = {};
+
+	MLX5_SET(destroy_rqt_in, in, opcode, MLX5_CMD_OP_DESTROY_RQT);
+	MLX5_SET(destroy_rqt_in, in, uid, mvdev->res.uid);
+	MLX5_SET(destroy_rqt_in, in, rqtn, rqtn);
+	mlx5_cmd_exec_in(mvdev->mdev, destroy_rqt, in);
+}
+
+int mlx5_vdpa_create_tir(struct mlx5_vdpa_dev *mvdev, void *in, u32 *tirn)
+{
+	u32 out[MLX5_ST_SZ_DW(create_tir_out)] = {};
+	int err;
+
+	MLX5_SET(create_tir_in, in, opcode, MLX5_CMD_OP_CREATE_TIR);
+	err = mlx5_cmd_exec_inout(mvdev->mdev, create_tir, in, out);
+	if (!err)
+		*tirn = MLX5_GET(create_tir_out, out, tirn);
+
+	return err;
+}
+
+void mlx5_vdpa_destroy_tir(struct mlx5_vdpa_dev *mvdev, u32 tirn)
+{
+	u32 in[MLX5_ST_SZ_DW(destroy_tir_in)] = {};
+
+	MLX5_SET(destroy_tir_in, in, opcode, MLX5_CMD_OP_DESTROY_TIR);
+	MLX5_SET(destroy_tir_in, in, uid, mvdev->res.uid);
+	MLX5_SET(destroy_tir_in, in, tirn, tirn);
+	mlx5_cmd_exec_in(mvdev->mdev, destroy_tir, in);
+}
+
+int mlx5_vdpa_alloc_transport_domain(struct mlx5_vdpa_dev *mvdev, u32 *tdn)
+{
+	u32 out[MLX5_ST_SZ_DW(alloc_transport_domain_out)] = {};
+	u32 in[MLX5_ST_SZ_DW(alloc_transport_domain_in)] = {};
+	int err;
+
+	MLX5_SET(alloc_transport_domain_in, in, opcode, MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN);
+	MLX5_SET(alloc_transport_domain_in, in, uid, mvdev->res.uid);
+
+	err = mlx5_cmd_exec_inout(mvdev->mdev, alloc_transport_domain, in, out);
+	if (!err)
+		*tdn = MLX5_GET(alloc_transport_domain_out, out, transport_domain);
+
+	return err;
+}
+
+void mlx5_vdpa_dealloc_transport_domain(struct mlx5_vdpa_dev *mvdev, u32 tdn)
+{
+	u32 in[MLX5_ST_SZ_DW(dealloc_transport_domain_in)] = {};
+
+	MLX5_SET(dealloc_transport_domain_in, in, opcode, MLX5_CMD_OP_DEALLOC_TRANSPORT_DOMAIN);
+	MLX5_SET(dealloc_transport_domain_in, in, uid, mvdev->res.uid);
+	MLX5_SET(dealloc_transport_domain_in, in, transport_domain, tdn);
+	mlx5_cmd_exec_in(mvdev->mdev, dealloc_transport_domain, in);
+}
+
+int mlx5_vdpa_create_mkey(struct mlx5_vdpa_dev *mvdev, struct mlx5_core_mkey *mkey, u32 *in,
+			  int inlen)
+{
+	u32 lout[MLX5_ST_SZ_DW(create_mkey_out)] = {};
+	u32 mkey_index;
+	void *mkc;
+	int err;
+
+	MLX5_SET(create_mkey_in, in, opcode, MLX5_CMD_OP_CREATE_MKEY);
+	MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
+
+	err = mlx5_cmd_exec(mvdev->mdev, in, inlen, lout, sizeof(lout));
+	if (err)
+		return err;
+
+	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+	mkey_index = MLX5_GET(create_mkey_out, lout, mkey_index);
+	mkey->iova = MLX5_GET64(mkc, mkc, start_addr);
+	mkey->size = MLX5_GET64(mkc, mkc, len);
+	mkey->key |= mlx5_idx_to_mkey(mkey_index);
+	mkey->pd = MLX5_GET(mkc, mkc, pd);
+	return 0;
+}
+
+int mlx5_vdpa_destroy_mkey(struct mlx5_vdpa_dev *mvdev, struct mlx5_core_mkey *mkey)
+{
+	u32 in[MLX5_ST_SZ_DW(destroy_mkey_in)] = {};
+
+	MLX5_SET(destroy_mkey_in, in, uid, mvdev->res.uid);
+	MLX5_SET(destroy_mkey_in, in, opcode, MLX5_CMD_OP_DESTROY_MKEY);
+	MLX5_SET(destroy_mkey_in, in, mkey_index, mlx5_mkey_to_idx(mkey->key));
+	return mlx5_cmd_exec_in(mvdev->mdev, destroy_mkey, in);
+}
+
+int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
+{
+	u64 offset = MLX5_CAP64_DEV_VDPA_EMULATION(mvdev->mdev, doorbell_bar_offset);
+	struct mlx5_vdpa_resources *res = &mvdev->res;
+	struct mlx5_core_dev *mdev = mvdev->mdev;
+	u64 kick_addr;
+	int err;
+
+	if (res->valid) {
+		mlx5_vdpa_warn(mvdev, "resources already allocated\n");
+		return -EINVAL;
+	}
+	res->uar = mlx5_get_uars_page(mdev);
+	if (IS_ERR(res->uar)) {
+		err = PTR_ERR(res->uar);
+		goto err_uars;
+	}
+
+	err = create_uctx(mvdev, &res->uid);
+	if (err)
+		goto err_uctx;
+
+	err = alloc_pd(mvdev, &res->pdn, res->uid);
+	if (err)
+		goto err_pd;
+
+	err = get_null_mkey(mvdev, &res->null_mkey);
+	if (err)
+		goto err_key;
+
+	kick_addr = pci_resource_start(mdev->pdev, 0) + offset;
+	res->kick_addr = ioremap(kick_addr, PAGE_SIZE);
+	if (!res->kick_addr) {
+		err = -ENOMEM;
+		goto err_key;
+	}
+	res->valid = true;
+
+	return 0;
+
+err_key:
+	dealloc_pd(mvdev, res->pdn, res->uid);
+err_pd:
+	destroy_uctx(mvdev, res->uid);
+err_uctx:
+	mlx5_put_uars_page(mdev, res->uar);
+err_uars:
+	return err;
+}
+
+void mlx5_vdpa_free_resources(struct mlx5_vdpa_dev *mvdev)
+{
+	struct mlx5_vdpa_resources *res = &mvdev->res;
+
+	if (!res->valid)
+		return;
+
+	iounmap(res->kick_addr);
+	res->kick_addr = NULL;
+	dealloc_pd(mvdev, res->pdn, res->uid);
+	destroy_uctx(mvdev, res->uid);
+	mlx5_put_uars_page(mvdev->mdev, res->uar);
+	res->valid = false;
+}
-- 
2.26.0

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V2 vhost next 09/10] vdpa/mlx5: Add shared memory registration code
  2020-07-20  7:14 [PATCH V2 vhost next 00/10] VDPA support for Mellanox ConnectX devices Eli Cohen
                   ` (7 preceding siblings ...)
  2020-07-20  7:14 ` [PATCH V2 vhost next 08/10] vdpa/mlx5: Add support library for mlx5 VDPA implementation Eli Cohen
@ 2020-07-20  7:14 ` Eli Cohen
  2020-07-20  7:14 ` [PATCH V2 vhost next 10/10] vdpa/mlx5: Add VDPA driver for supported mlx5 devices Eli Cohen
  9 siblings, 0 replies; 18+ messages in thread
From: Eli Cohen @ 2020-07-20  7:14 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel
  Cc: shahafs, saeedm, parav, Eli Cohen

Add code to support registering address space region for the device. The
virtio driver can run as either:
1. Guest virtio driver
2. Userspace virtio driver on the host
3. Kernel virtio driver on the host

In any case a memory key object is required to provide access to memory
for the device.

This code will be shared by network or block driver implementations.

Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
---
V0 -> V2:
1. Improve changelog
2. Fix static checker warning

 drivers/vdpa/mlx5/Makefile         |   2 +-
 drivers/vdpa/mlx5/core/Makefile    |   1 -
 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  34 +++
 drivers/vdpa/mlx5/core/mr.c        | 473 +++++++++++++++++++++++++++++
 drivers/vdpa/mlx5/core/resources.c |   3 +
 5 files changed, 511 insertions(+), 2 deletions(-)
 delete mode 100644 drivers/vdpa/mlx5/core/Makefile
 create mode 100644 drivers/vdpa/mlx5/core/mr.c

diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
index d552abb1d126..b347c62032ea 100644
--- a/drivers/vdpa/mlx5/Makefile
+++ b/drivers/vdpa/mlx5/Makefile
@@ -1 +1 @@
-obj-$(CONFIG_MLX5_VDPA) += core/resources.o
+obj-$(CONFIG_MLX5_VDPA) += core/resources.o core/mr.o
diff --git a/drivers/vdpa/mlx5/core/Makefile b/drivers/vdpa/mlx5/core/Makefile
deleted file mode 100644
index 7070f8c8680d..000000000000
--- a/drivers/vdpa/mlx5/core/Makefile
+++ /dev/null
@@ -1 +0,0 @@
-obj-y += resources.o
diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index f3571c8b257e..5c92a576edae 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -7,6 +7,31 @@
 #include <linux/vdpa.h>
 #include <linux/mlx5/driver.h>
 
+struct mlx5_vdpa_direct_mr {
+	u64 start;
+	u64 end;
+	u32 perm;
+	struct mlx5_core_mkey mr;
+	struct sg_table sg_head;
+	int log_size;
+	int nsg;
+	struct list_head list;
+	u64 offset;
+};
+
+struct mlx5_vdpa_mr {
+	struct mlx5_core_mkey mkey;
+
+	/* list of direct MRs descendants of this indirect mr */
+	struct list_head head;
+	unsigned long num_directs;
+	unsigned long num_klms;
+	bool initialized;
+
+	/* serialize mkey creation and destruction */
+	struct mutex mkey_mtx;
+};
+
 struct mlx5_vdpa_resources {
 	u32 pdn;
 	struct mlx5_uars_page *uar;
@@ -26,6 +51,8 @@ struct mlx5_vdpa_dev {
 	u8 status;
 	u32 max_vqs;
 	u32 generation;
+
+	struct mlx5_vdpa_mr mr;
 };
 
 int mlx5_vdpa_alloc_pd(struct mlx5_vdpa_dev *dev, u32 *pdn, u16 uid);
@@ -41,6 +68,13 @@ int mlx5_vdpa_alloc_transport_domain(struct mlx5_vdpa_dev *mvdev, u32 *tdn);
 void mlx5_vdpa_dealloc_transport_domain(struct mlx5_vdpa_dev *mvdev, u32 tdn);
 int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev);
 void mlx5_vdpa_free_resources(struct mlx5_vdpa_dev *mvdev);
+int mlx5_vdpa_create_mkey(struct mlx5_vdpa_dev *mvdev, struct mlx5_core_mkey *mkey, u32 *in,
+			  int inlen);
+int mlx5_vdpa_destroy_mkey(struct mlx5_vdpa_dev *mvdev, struct mlx5_core_mkey *mkey);
+int mlx5_vdpa_handle_set_map(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb,
+			     bool *change_map);
+int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb);
+void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev);
 
 #define mlx5_vdpa_warn(__dev, format, ...)                                                         \
 	dev_warn((__dev)->mdev->device, "%s:%d:(pid %d) warning: " format, __func__, __LINE__,     \
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
new file mode 100644
index 000000000000..975aa45fd78b
--- /dev/null
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -0,0 +1,473 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2020 Mellanox Technologies Ltd. */
+
+#include <linux/vdpa.h>
+#include <linux/gcd.h>
+#include <linux/string.h>
+#include <linux/mlx5/qp.h>
+#include "mlx5_vdpa.h"
+
+static int get_octo_len(u64 len, int page_shift)
+{
+	u64 page_size = 1ULL << page_shift;
+	int npages;
+
+	npages = ALIGN(len, page_size) >> page_shift;
+	return (npages + 1) / 2;
+}
+
+static void fill_sg(struct mlx5_vdpa_direct_mr *mr, void *in)
+{
+	struct scatterlist *sg;
+	__be64 *pas;
+	int i;
+
+	pas = MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt);
+	for_each_sg(mr->sg_head.sgl, sg, mr->nsg, i)
+		(*pas) = cpu_to_be64(sg_dma_address(sg));
+}
+
+static void mlx5_set_access_mode(void *mkc, int mode)
+{
+	MLX5_SET(mkc, mkc, access_mode_1_0, mode & 0x3);
+	MLX5_SET(mkc, mkc, access_mode_4_2, mode >> 2);
+}
+
+static void populate_mtts(struct mlx5_vdpa_direct_mr *mr, __be64 *mtt)
+{
+	struct scatterlist *sg;
+	int i;
+
+	for_each_sg(mr->sg_head.sgl, sg, mr->nsg, i)
+		mtt[i] = cpu_to_be64(sg_dma_address(sg));
+}
+
+static int create_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr *mr)
+{
+	int inlen;
+	void *mkc;
+	void *in;
+	int err;
+
+	inlen = MLX5_ST_SZ_BYTES(create_mkey_in) + roundup(MLX5_ST_SZ_BYTES(mtt) * mr->nsg, 16);
+	in = kvzalloc(inlen, GFP_KERNEL);
+	if (!in)
+		return -ENOMEM;
+
+	MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
+	fill_sg(mr, in);
+	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+	MLX5_SET(mkc, mkc, lw, !!(mr->perm & VHOST_MAP_WO));
+	MLX5_SET(mkc, mkc, lr, !!(mr->perm & VHOST_MAP_RO));
+	mlx5_set_access_mode(mkc, MLX5_MKC_ACCESS_MODE_MTT);
+	MLX5_SET(mkc, mkc, qpn, 0xffffff);
+	MLX5_SET(mkc, mkc, pd, mvdev->res.pdn);
+	MLX5_SET64(mkc, mkc, start_addr, mr->offset);
+	MLX5_SET64(mkc, mkc, len, mr->end - mr->start);
+	MLX5_SET(mkc, mkc, log_page_size, mr->log_size);
+	MLX5_SET(mkc, mkc, translations_octword_size,
+		 get_octo_len(mr->end - mr->start, mr->log_size));
+	MLX5_SET(create_mkey_in, in, translations_octword_actual_size,
+		 get_octo_len(mr->end - mr->start, mr->log_size));
+	populate_mtts(mr, MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt));
+	err = mlx5_vdpa_create_mkey(mvdev, &mr->mr, in, inlen);
+	kvfree(in);
+	if (err) {
+		mlx5_vdpa_warn(mvdev, "Failed to create direct MR\n");
+		return err;
+	}
+
+	return 0;
+}
+
+static void destroy_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr *mr)
+{
+	mlx5_vdpa_destroy_mkey(mvdev, &mr->mr);
+}
+
+static u64 map_start(struct vhost_iotlb_map *map, struct mlx5_vdpa_direct_mr *mr)
+{
+	return max_t(u64, map->start, mr->start);
+}
+
+static u64 map_end(struct vhost_iotlb_map *map, struct mlx5_vdpa_direct_mr *mr)
+{
+	return min_t(u64, map->last + 1, mr->end);
+}
+
+static u64 maplen(struct vhost_iotlb_map *map, struct mlx5_vdpa_direct_mr *mr)
+{
+	return map_end(map, mr) - map_start(map, mr);
+}
+
+#define MLX5_VDPA_INVALID_START_ADDR ((u64)-1)
+#define MLX5_VDPA_INVALID_LEN ((u64)-1)
+
+static u64 indir_start_addr(struct mlx5_vdpa_mr *mkey)
+{
+	struct mlx5_vdpa_direct_mr *s;
+
+	s = list_first_entry_or_null(&mkey->head, struct mlx5_vdpa_direct_mr, list);
+	if (!s)
+		return MLX5_VDPA_INVALID_START_ADDR;
+
+	return s->start;
+}
+
+static u64 indir_len(struct mlx5_vdpa_mr *mkey)
+{
+	struct mlx5_vdpa_direct_mr *s;
+	struct mlx5_vdpa_direct_mr *e;
+
+	s = list_first_entry_or_null(&mkey->head, struct mlx5_vdpa_direct_mr, list);
+	if (!s)
+		return MLX5_VDPA_INVALID_LEN;
+
+	e = list_last_entry(&mkey->head, struct mlx5_vdpa_direct_mr, list);
+
+	return e->end - s->start;
+}
+
+#define MAX_KLM_SIZE 0x40000000
+
+static u32 klm_bcount(u64 size)
+{
+	return (u32)size;
+}
+
+static void fill_indir(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mkey, void *in)
+{
+	struct mlx5_vdpa_direct_mr *dmr;
+	struct mlx5_klm *klmarr;
+	struct mlx5_klm *klm;
+	bool first = true;
+	u64 preve;
+	int i;
+
+	klmarr = MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt);
+	i = 0;
+	list_for_each_entry(dmr, &mkey->head, list) {
+again:
+		klm = &klmarr[i++];
+		if (first) {
+			preve = dmr->start;
+			first = false;
+		}
+
+		if (preve == dmr->start) {
+			klm->key = cpu_to_be32(dmr->mr.key);
+			klm->bcount = cpu_to_be32(klm_bcount(dmr->end - dmr->start));
+			preve = dmr->end;
+		} else {
+			klm->key = cpu_to_be32(mvdev->res.null_mkey);
+			klm->bcount = cpu_to_be32(klm_bcount(dmr->start - preve));
+			preve = dmr->start;
+			goto again;
+		}
+	}
+}
+
+static int klm_byte_size(int nklms)
+{
+	return 16 * ALIGN(nklms, 4);
+}
+
+static int create_indirect_key(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr)
+{
+	int inlen;
+	void *mkc;
+	void *in;
+	int err;
+	u64 start;
+	u64 len;
+
+	start = indir_start_addr(mr);
+	len = indir_len(mr);
+	if (start == MLX5_VDPA_INVALID_START_ADDR || len == MLX5_VDPA_INVALID_LEN)
+		return -EINVAL;
+
+	inlen = MLX5_ST_SZ_BYTES(create_mkey_in) + klm_byte_size(mr->num_klms);
+	in = kzalloc(inlen, GFP_KERNEL);
+	if (!in)
+		return -ENOMEM;
+
+	MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
+	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+	MLX5_SET(mkc, mkc, lw, 1);
+	MLX5_SET(mkc, mkc, lr, 1);
+	mlx5_set_access_mode(mkc, MLX5_MKC_ACCESS_MODE_KLMS);
+	MLX5_SET(mkc, mkc, qpn, 0xffffff);
+	MLX5_SET(mkc, mkc, pd, mvdev->res.pdn);
+	MLX5_SET64(mkc, mkc, start_addr, start);
+	MLX5_SET64(mkc, mkc, len, len);
+	MLX5_SET(mkc, mkc, translations_octword_size, klm_byte_size(mr->num_klms) / 16);
+	MLX5_SET(create_mkey_in, in, translations_octword_actual_size, mr->num_klms);
+	fill_indir(mvdev, mr, in);
+	err = mlx5_vdpa_create_mkey(mvdev, &mr->mkey, in, inlen);
+	kfree(in);
+	return err;
+}
+
+static void destroy_indirect_key(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mkey)
+{
+	mlx5_vdpa_destroy_mkey(mvdev, &mkey->mkey);
+}
+
+static int map_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr *mr,
+			 struct vhost_iotlb *iotlb)
+{
+	struct vhost_iotlb_map *map;
+	unsigned long lgcd = 0;
+	int log_entity_size;
+	unsigned long size;
+	u64 start = 0;
+	int err;
+	struct page *pg;
+	unsigned int nsg;
+	int sglen;
+	u64 pa;
+	u64 paend;
+	struct scatterlist *sg;
+	struct device *dma = mvdev->mdev->device;
+	int ret;
+
+	for (map = vhost_iotlb_itree_first(iotlb, mr->start, mr->end - 1);
+	     map; map = vhost_iotlb_itree_next(map, start, mr->end - 1)) {
+		size = maplen(map, mr);
+		lgcd = gcd(lgcd, size);
+		start += size;
+	}
+	log_entity_size = ilog2(lgcd);
+
+	sglen = 1 << log_entity_size;
+	nsg = DIV_ROUND_UP(mr->end - mr->start, sglen);
+
+	err = sg_alloc_table(&mr->sg_head, nsg, GFP_KERNEL);
+	if (err)
+		return err;
+
+	sg = mr->sg_head.sgl;
+	for (map = vhost_iotlb_itree_first(iotlb, mr->start, mr->end - 1);
+	     map; map = vhost_iotlb_itree_next(map, mr->start, mr->end - 1)) {
+		paend = map->addr + maplen(map, mr);
+		for (pa = map->addr; pa < paend; pa += sglen) {
+			pg = pfn_to_page(__phys_to_pfn(pa));
+			if (!sg) {
+				mlx5_vdpa_warn(mvdev, "sg null. start 0x%llx, end 0x%llx\n",
+					       map->start, map->last + 1);
+				err = -ENOMEM;
+				goto err_map;
+			}
+			sg_set_page(sg, pg, sglen, 0);
+			sg = sg_next(sg);
+			if (!sg)
+				goto done;
+		}
+	}
+done:
+	mr->log_size = log_entity_size;
+	mr->nsg = nsg;
+	ret = dma_map_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, DMA_BIDIRECTIONAL, 0);
+	if (!ret)
+		goto err_map;
+
+	err = create_direct_mr(mvdev, mr);
+	if (err)
+		goto err_direct;
+
+	return 0;
+
+err_direct:
+	dma_unmap_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, DMA_BIDIRECTIONAL, 0);
+err_map:
+	sg_free_table(&mr->sg_head);
+	return err;
+}
+
+static void unmap_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr *mr)
+{
+	struct device *dma = mvdev->mdev->device;
+
+	destroy_direct_mr(mvdev, mr);
+	dma_unmap_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, DMA_BIDIRECTIONAL, 0);
+	sg_free_table(&mr->sg_head);
+}
+
+static int add_direct_chain(struct mlx5_vdpa_dev *mvdev, u64 start, u64 size, u8 perm,
+			    struct vhost_iotlb *iotlb)
+{
+	struct mlx5_vdpa_mr *mr = &mvdev->mr;
+	struct mlx5_vdpa_direct_mr *dmr;
+	struct mlx5_vdpa_direct_mr *n;
+	LIST_HEAD(tmp);
+	u64 st;
+	u64 sz;
+	int err;
+	int i = 0;
+
+	st = start;
+	while (size) {
+		sz = (u32)min_t(u64, MAX_KLM_SIZE, size);
+		dmr = kzalloc(sizeof(*dmr), GFP_KERNEL);
+		if (!dmr)
+			goto err_alloc;
+
+		dmr->start = st;
+		dmr->end = st + sz;
+		dmr->perm = perm;
+		err = map_direct_mr(mvdev, dmr, iotlb);
+		if (err) {
+			kfree(dmr);
+			goto err_alloc;
+		}
+
+		list_add_tail(&dmr->list, &tmp);
+		size -= sz;
+		mr->num_directs++;
+		mr->num_klms++;
+		st += sz;
+		i++;
+	}
+	list_splice_tail(&tmp, &mr->head);
+	return 0;
+
+err_alloc:
+	list_for_each_entry_safe(dmr, n, &mr->head, list) {
+		list_del_init(&dmr->list);
+		unmap_direct_mr(mvdev, dmr);
+		kfree(dmr);
+	}
+	return err;
+}
+
+/* The iotlb pointer contains a list of maps. Go over the maps, possibly
+ * merging mergeable maps, and create direct memory keys that provide the
+ * device access to memory. The direct mkeys are then referred to by the
+ * indirect memory key that provides access to the enitre address space given
+ * by iotlb.
+ */
+static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb)
+{
+	struct mlx5_vdpa_mr *mr = &mvdev->mr;
+	struct mlx5_vdpa_direct_mr *dmr;
+	struct mlx5_vdpa_direct_mr *n;
+	struct vhost_iotlb_map *map;
+	u32 pperm = U16_MAX;
+	u64 last = U64_MAX;
+	u64 ps = U64_MAX;
+	u64 pe = U64_MAX;
+	u64 start = 0;
+	int err = 0;
+	int nnuls;
+
+	if (mr->initialized)
+		return 0;
+
+	INIT_LIST_HEAD(&mr->head);
+	for (map = vhost_iotlb_itree_first(iotlb, start, last); map;
+	     map = vhost_iotlb_itree_next(map, start, last)) {
+		start = map->start;
+		if (pe == map->start && pperm == map->perm) {
+			pe = map->last + 1;
+		} else {
+			if (ps != U64_MAX) {
+				if (pe < map->start) {
+					/* We have a hole in the map. Check how
+					 * many null keys are required to fill it.
+					 */
+					nnuls = DIV_ROUND_UP(map->start - pe, MAX_KLM_SIZE);
+					mr->num_klms += nnuls;
+				}
+				err = add_direct_chain(mvdev, ps, pe - ps, pperm, iotlb);
+				if (err)
+					goto err_chain;
+			}
+			ps = map->start;
+			pe = map->last + 1;
+			pperm = map->perm;
+		}
+	}
+	err = add_direct_chain(mvdev, ps, pe - ps, pperm, iotlb);
+	if (err)
+		goto err_chain;
+
+	/* Create the memory key that defines the guests's address space. This
+	 * memory key refers to the direct keys that contain the MTT
+	 * translations
+	 */
+	err = create_indirect_key(mvdev, mr);
+	if (err)
+		goto err_chain;
+
+	mr->initialized = true;
+	return 0;
+
+err_chain:
+	list_for_each_entry_safe_reverse(dmr, n, &mr->head, list) {
+		list_del_init(&dmr->list);
+		unmap_direct_mr(mvdev, dmr);
+		kfree(dmr);
+	}
+	return err;
+}
+
+int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb)
+{
+	struct mlx5_vdpa_mr *mr = &mvdev->mr;
+	int err;
+
+	mutex_lock(&mr->mkey_mtx);
+	err = _mlx5_vdpa_create_mr(mvdev, iotlb);
+	mutex_unlock(&mr->mkey_mtx);
+	return err;
+}
+
+void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev)
+{
+	struct mlx5_vdpa_mr *mr = &mvdev->mr;
+	struct mlx5_vdpa_direct_mr *dmr;
+	struct mlx5_vdpa_direct_mr *n;
+
+	mutex_lock(&mr->mkey_mtx);
+	if (!mr->initialized)
+		goto out;
+
+	destroy_indirect_key(mvdev, mr);
+	list_for_each_entry_safe_reverse(dmr, n, &mr->head, list) {
+		list_del_init(&dmr->list);
+		unmap_direct_mr(mvdev, dmr);
+		kfree(dmr);
+	}
+	memset(mr, 0, sizeof(*mr));
+	mr->initialized = false;
+out:
+	mutex_unlock(&mr->mkey_mtx);
+}
+
+static bool map_empty(struct vhost_iotlb *iotlb)
+{
+	return !vhost_iotlb_itree_first(iotlb, 0, U64_MAX);
+}
+
+int mlx5_vdpa_handle_set_map(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb,
+			     bool *change_map)
+{
+	struct mlx5_vdpa_mr *mr = &mvdev->mr;
+	int err;
+
+	*change_map = false;
+	if (map_empty(iotlb)) {
+		mlx5_vdpa_destroy_mr(mvdev);
+		return 0;
+	}
+	mutex_lock(&mr->mkey_mtx);
+	if (mr->initialized) {
+		mlx5_vdpa_info(mvdev, "memory map update\n");
+		*change_map = true;
+	}
+	if (!*change_map)
+		err = _mlx5_vdpa_create_mr(mvdev, iotlb);
+	mutex_unlock(&mr->mkey_mtx);
+
+	return err;
+}
diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
index 6c6552b7e9b5..96e6421c5d1c 100644
--- a/drivers/vdpa/mlx5/core/resources.c
+++ b/drivers/vdpa/mlx5/core/resources.c
@@ -227,6 +227,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
 		mlx5_vdpa_warn(mvdev, "resources already allocated\n");
 		return -EINVAL;
 	}
+	mutex_init(&mvdev->mr.mkey_mtx);
 	res->uar = mlx5_get_uars_page(mdev);
 	if (IS_ERR(res->uar)) {
 		err = PTR_ERR(res->uar);
@@ -262,6 +263,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
 err_uctx:
 	mlx5_put_uars_page(mdev, res->uar);
 err_uars:
+	mutex_destroy(&mvdev->mr.mkey_mtx);
 	return err;
 }
 
@@ -277,5 +279,6 @@ void mlx5_vdpa_free_resources(struct mlx5_vdpa_dev *mvdev)
 	dealloc_pd(mvdev, res->pdn, res->uid);
 	destroy_uctx(mvdev, res->uid);
 	mlx5_put_uars_page(mvdev->mdev, res->uar);
+	mutex_destroy(&mvdev->mr.mkey_mtx);
 	res->valid = false;
 }
-- 
2.26.0

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V2 vhost next 10/10] vdpa/mlx5: Add VDPA driver for supported mlx5 devices
  2020-07-20  7:14 [PATCH V2 vhost next 00/10] VDPA support for Mellanox ConnectX devices Eli Cohen
                   ` (8 preceding siblings ...)
  2020-07-20  7:14 ` [PATCH V2 vhost next 09/10] vdpa/mlx5: Add shared memory registration code Eli Cohen
@ 2020-07-20  7:14 ` Eli Cohen
  2020-07-20 12:55   ` kernel test robot
                     ` (2 more replies)
  9 siblings, 3 replies; 18+ messages in thread
From: Eli Cohen @ 2020-07-20  7:14 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel
  Cc: shahafs, saeedm, parav, Eli Cohen

Add a front end VDPA driver that registers in the VDPA bus and provides
networking to a guest. The VDPA driver creates the necessary resources
on the VF it is driving such that data path will be offloaded.

Notifications are being communicated through the driver.

Currently, only VFs are supported. In subsequent patches we will have
devlink support to control which VF is used for VDPA and which function
is used for regular networking.

Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
---
Changes from V0:
1. Fix include path usage
2. Fix use after free in qp_create()
3. Consistently use mvq->initialized to check if a vq was initialized.
4. Remove unused local variable.
5. Defer modifyig vq to ready to driver ok
6. suspend hardware vq in set_vq_ready(0)
7. Remove reservation for control VQ since it multi queue is not supported in this version
8. Avoid call put_device() since this is not a pci device driver.

 drivers/vdpa/Kconfig              |   10 +
 drivers/vdpa/mlx5/Makefile        |    5 +-
 drivers/vdpa/mlx5/core/mr.c       |    2 +-
 drivers/vdpa/mlx5/net/main.c      |   76 ++
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 1950 +++++++++++++++++++++++++++++
 drivers/vdpa/mlx5/net/mlx5_vnet.h |   24 +
 6 files changed, 2065 insertions(+), 2 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/net/main.c
 create mode 100644 drivers/vdpa/mlx5/net/mlx5_vnet.c
 create mode 100644 drivers/vdpa/mlx5/net/mlx5_vnet.h

diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index 48a1a776dd86..809cb4c2eecf 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -36,4 +36,14 @@ config MLX5_VDPA
 	  Support library for Mellanox VDPA drivers. Provides code that is
 	  common for all types of VDPA drivers.
 
+config MLX5_VDPA_NET
+	tristate "vDPA driver for ConnectX devices"
+	depends on MLX5_VDPA
+	default n
+	help
+	  VDPA network driver for ConnectX6 and newer. Provides offloading
+	  of virtio net datapath such that descriptors put on the ring will
+	  be executed by the hardware. It also supports a variety of stateless
+	  offloads depending on the actual device used and firmware version.
+
 endif # VDPA
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
index b347c62032ea..3f8850c1a300 100644
--- a/drivers/vdpa/mlx5/Makefile
+++ b/drivers/vdpa/mlx5/Makefile
@@ -1 +1,4 @@
-obj-$(CONFIG_MLX5_VDPA) += core/resources.o core/mr.o
+subdir-ccflags-y += -I$(src)/core
+
+obj-$(CONFIG_MLX5_VDPA_NET) += mlx5_vdpa.o
+mlx5_vdpa-$(CONFIG_MLX5_VDPA_NET) += net/main.o net/mlx5_vnet.o core/resources.o core/mr.o
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 975aa45fd78b..34e3bbb80df8 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -453,7 +453,7 @@ int mlx5_vdpa_handle_set_map(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *io
 			     bool *change_map)
 {
 	struct mlx5_vdpa_mr *mr = &mvdev->mr;
-	int err;
+	int err = 0;
 
 	*change_map = false;
 	if (map_empty(iotlb)) {
diff --git a/drivers/vdpa/mlx5/net/main.c b/drivers/vdpa/mlx5/net/main.c
new file mode 100644
index 000000000000..838cd98386ff
--- /dev/null
+++ b/drivers/vdpa/mlx5/net/main.c
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2020 Mellanox Technologies Ltd. */
+
+#include <linux/module.h>
+#include <linux/mlx5/driver.h>
+#include <linux/mlx5/device.h>
+#include "mlx5_vdpa_ifc.h"
+#include "mlx5_vnet.h"
+
+MODULE_AUTHOR("Eli Cohen <eli@mellanox.com>");
+MODULE_DESCRIPTION("Mellanox VDPA driver");
+MODULE_LICENSE("Dual BSD/GPL");
+
+static bool required_caps_supported(struct mlx5_core_dev *mdev)
+{
+	u8 event_mode;
+	u64 got;
+
+	got = MLX5_CAP_GEN_64(mdev, general_obj_types);
+
+	if (!(got & MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q))
+		return false;
+
+	event_mode = MLX5_CAP_DEV_VDPA_EMULATION(mdev, event_mode);
+	if (!(event_mode & MLX5_VIRTIO_Q_EVENT_MODE_QP_MODE))
+		return false;
+
+	if (!MLX5_CAP_DEV_VDPA_EMULATION(mdev, eth_frame_offload_type))
+		return false;
+
+	return true;
+}
+
+static void *mlx5_vdpa_add(struct mlx5_core_dev *mdev)
+{
+	struct mlx5_vdpa_dev *vdev;
+
+	if (mlx5_core_is_pf(mdev))
+		return NULL;
+
+	if (!required_caps_supported(mdev)) {
+		dev_info(mdev->device, "virtio net emulation not supported\n");
+		return NULL;
+	}
+	vdev = mlx5_vdpa_add_dev(mdev);
+	if (IS_ERR(vdev))
+		return NULL;
+
+	return vdev;
+}
+
+static void mlx5_vdpa_remove(struct mlx5_core_dev *mdev, void *context)
+{
+	struct mlx5_vdpa_dev *vdev = context;
+
+	mlx5_vdpa_remove_dev(vdev);
+}
+
+static struct mlx5_interface mlx5_vdpa_interface = {
+	.add = mlx5_vdpa_add,
+	.remove = mlx5_vdpa_remove,
+	.protocol = MLX5_INTERFACE_PROTOCOL_VDPA,
+};
+
+static int __init mlx5_vdpa_init(void)
+{
+	return mlx5_register_interface(&mlx5_vdpa_interface);
+}
+
+static void __exit mlx5_vdpa_exit(void)
+{
+	mlx5_unregister_interface(&mlx5_vdpa_interface);
+}
+
+module_init(mlx5_vdpa_init);
+module_exit(mlx5_vdpa_exit);
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
new file mode 100644
index 000000000000..601ed4b6a029
--- /dev/null
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -0,0 +1,1950 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2020 Mellanox Technologies Ltd. */
+
+#include <linux/vdpa.h>
+#include <uapi/linux/virtio_ids.h>
+#include <linux/virtio_config.h>
+#include <linux/mlx5/qp.h>
+#include <linux/mlx5/device.h>
+#include <linux/mlx5/vport.h>
+#include <linux/mlx5/fs.h>
+#include <linux/mlx5/device.h>
+#include "mlx5_vnet.h"
+#include "mlx5_vdpa_ifc.h"
+#include "mlx5_vdpa.h"
+
+#define to_mvdev(__vdev) container_of((__vdev), struct mlx5_vdpa_dev, vdev)
+
+#define VALID_FEATURES_MASK                                                                        \
+	(BIT(VIRTIO_NET_F_CSUM) | BIT(VIRTIO_NET_F_GUEST_CSUM) |                                   \
+	 BIT(VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) | BIT(VIRTIO_NET_F_MTU) | BIT(VIRTIO_NET_F_MAC) |   \
+	 BIT(VIRTIO_NET_F_GUEST_TSO4) | BIT(VIRTIO_NET_F_GUEST_TSO6) |                             \
+	 BIT(VIRTIO_NET_F_GUEST_ECN) | BIT(VIRTIO_NET_F_GUEST_UFO) | BIT(VIRTIO_NET_F_HOST_TSO4) | \
+	 BIT(VIRTIO_NET_F_HOST_TSO6) | BIT(VIRTIO_NET_F_HOST_ECN) | BIT(VIRTIO_NET_F_HOST_UFO) |   \
+	 BIT(VIRTIO_NET_F_MRG_RXBUF) | BIT(VIRTIO_NET_F_STATUS) | BIT(VIRTIO_NET_F_CTRL_VQ) |      \
+	 BIT(VIRTIO_NET_F_CTRL_RX) | BIT(VIRTIO_NET_F_CTRL_VLAN) |                                 \
+	 BIT(VIRTIO_NET_F_CTRL_RX_EXTRA) | BIT(VIRTIO_NET_F_GUEST_ANNOUNCE) |                      \
+	 BIT(VIRTIO_NET_F_MQ) | BIT(VIRTIO_NET_F_CTRL_MAC_ADDR) | BIT(VIRTIO_NET_F_HASH_REPORT) |  \
+	 BIT(VIRTIO_NET_F_RSS) | BIT(VIRTIO_NET_F_RSC_EXT) | BIT(VIRTIO_NET_F_STANDBY) |           \
+	 BIT(VIRTIO_NET_F_SPEED_DUPLEX) | BIT(VIRTIO_F_NOTIFY_ON_EMPTY) |                          \
+	 BIT(VIRTIO_F_ANY_LAYOUT) | BIT(VIRTIO_F_VERSION_1) | BIT(VIRTIO_F_IOMMU_PLATFORM) |       \
+	 BIT(VIRTIO_F_RING_PACKED) | BIT(VIRTIO_F_ORDER_PLATFORM) | BIT(VIRTIO_F_SR_IOV))
+
+#define VALID_STATUS_MASK                                                                          \
+	(VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER | VIRTIO_CONFIG_S_DRIVER_OK |        \
+	 VIRTIO_CONFIG_S_FEATURES_OK | VIRTIO_CONFIG_S_NEEDS_RESET | VIRTIO_CONFIG_S_FAILED)
+
+struct mlx5_vdpa_net_resources {
+	u32 tisn;
+	u32 tdn;
+	u32 tirn;
+	u32 rqtn;
+	bool valid;
+};
+
+struct mlx5_vdpa_cq_buf {
+	struct mlx5_frag_buf_ctrl fbc;
+	struct mlx5_frag_buf frag_buf;
+	int cqe_size;
+	int nent;
+};
+
+struct mlx5_vdpa_cq {
+	struct mlx5_core_cq mcq;
+	struct mlx5_vdpa_cq_buf buf;
+	struct mlx5_db db;
+	int cqe;
+};
+
+struct mlx5_vdpa_umem {
+	struct mlx5_frag_buf_ctrl fbc;
+	struct mlx5_frag_buf frag_buf;
+	int size;
+	u32 id;
+};
+
+struct mlx5_vdpa_qp {
+	struct mlx5_core_qp mqp;
+	struct mlx5_frag_buf frag_buf;
+	struct mlx5_db db;
+	u16 head;
+	bool fw;
+};
+
+struct mlx5_vq_restore_info {
+	u32 num_ent;
+	u64 desc_addr;
+	u64 device_addr;
+	u64 driver_addr;
+	u16 avail_index;
+	bool ready;
+	struct vdpa_callback cb;
+	bool restore;
+};
+
+struct mlx5_vdpa_virtqueue {
+	bool ready;
+	u64 desc_addr;
+	u64 device_addr;
+	u64 driver_addr;
+	u32 num_ent;
+	struct vdpa_callback event_cb;
+
+	/* Resources for implementing the notification channel from the device
+	 * to the driver. fwqp is the firmware end of an RC connection; the
+	 * other end is vqqp used by the driver. cq is is where completions are
+	 * reported.
+	 */
+	struct mlx5_vdpa_cq cq;
+	struct mlx5_vdpa_qp fwqp;
+	struct mlx5_vdpa_qp vqqp;
+
+	/* umem resources are required for the virtqueue operation. They're use
+	 * is internal and they must be provided by the driver.
+	 */
+	struct mlx5_vdpa_umem umem1;
+	struct mlx5_vdpa_umem umem2;
+	struct mlx5_vdpa_umem umem3;
+
+	bool initialized;
+	int index;
+	u32 virtq_id;
+	struct mlx5_vdpa_net *ndev;
+	u16 avail_idx;
+	int fw_state;
+
+	/* keep last in the struct */
+	struct mlx5_vq_restore_info ri;
+};
+
+/* We will remove this limitation once mlx5_vdpa_alloc_resources()
+ * provides for driver space allocation
+ */
+#define MLX5_MAX_SUPPORTED_VQS 16
+
+struct mlx5_vdpa_net {
+	struct mlx5_vdpa_dev mvdev;
+	struct mlx5_vdpa_net_resources res;
+	struct virtio_net_config config;
+	struct mlx5_vdpa_virtqueue vqs[MLX5_MAX_SUPPORTED_VQS];
+
+	/* Serialize vq resources creation and destruction. This is required
+	 * since memory map might change and we need to destroy and create
+	 * resources while driver in operational.
+	 */
+	struct mutex reslock;
+	struct mlx5_flow_table *rxft;
+	struct mlx5_fc *rx_counter;
+	struct mlx5_flow_handle *rx_rule;
+	bool setup;
+};
+
+static void free_resources(struct mlx5_vdpa_net *ndev);
+static void init_mvqs(struct mlx5_vdpa_net *ndev);
+static int setup_driver(struct mlx5_vdpa_net *ndev);
+static void teardown_driver(struct mlx5_vdpa_net *ndev);
+
+static bool mlx5_vdpa_debug;
+
+#define MLX5_LOG_VIO_FLAG(_feature)                                                                \
+	do {                                                                                       \
+		if (features & BIT(_feature))                                                      \
+			mlx5_vdpa_info(mvdev, "%s\n", #_feature);                                  \
+	} while (0)
+
+#define MLX5_LOG_VIO_STAT(_status)                                                                 \
+	do {                                                                                       \
+		if (status & (_status))                                                            \
+			mlx5_vdpa_info(mvdev, "%s\n", #_status);                                   \
+	} while (0)
+
+static void print_status(struct mlx5_vdpa_dev *mvdev, u8 status, bool set)
+{
+	if (status & ~VALID_STATUS_MASK)
+		mlx5_vdpa_warn(mvdev, "Warning: there are invalid status bits 0x%x\n",
+			       status & ~VALID_STATUS_MASK);
+
+	if (!mlx5_vdpa_debug)
+		return;
+
+	mlx5_vdpa_info(mvdev, "driver status %s", set ? "set" : "get");
+	if (set && !status) {
+		mlx5_vdpa_info(mvdev, "driver resets the device\n");
+		return;
+	}
+
+	MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_ACKNOWLEDGE);
+	MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_DRIVER);
+	MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_DRIVER_OK);
+	MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_FEATURES_OK);
+	MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_NEEDS_RESET);
+	MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_FAILED);
+}
+
+static void print_features(struct mlx5_vdpa_dev *mvdev, u64 features, bool set)
+{
+	if (features & ~VALID_FEATURES_MASK)
+		mlx5_vdpa_warn(mvdev, "There are invalid feature bits 0x%llx\n",
+			       features & ~VALID_FEATURES_MASK);
+
+	if (!mlx5_vdpa_debug)
+		return;
+
+	mlx5_vdpa_info(mvdev, "driver %s feature bits:\n", set ? "sets" : "reads");
+	if (!features)
+		mlx5_vdpa_info(mvdev, "all feature bits are cleared\n");
+
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CSUM);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_CSUM);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_GUEST_OFFLOADS);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_MTU);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_MAC);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_TSO4);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_TSO6);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_ECN);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_UFO);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_HOST_TSO4);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_HOST_TSO6);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_HOST_ECN);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_HOST_UFO);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_MRG_RXBUF);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_STATUS);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_VQ);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_RX);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_VLAN);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_RX_EXTRA);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_ANNOUNCE);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_MQ);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_MAC_ADDR);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_HASH_REPORT);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_RSS);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_RSC_EXT);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_STANDBY);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_SPEED_DUPLEX);
+	MLX5_LOG_VIO_FLAG(VIRTIO_F_NOTIFY_ON_EMPTY);
+	MLX5_LOG_VIO_FLAG(VIRTIO_F_ANY_LAYOUT);
+	MLX5_LOG_VIO_FLAG(VIRTIO_F_VERSION_1);
+	MLX5_LOG_VIO_FLAG(VIRTIO_F_IOMMU_PLATFORM);
+	MLX5_LOG_VIO_FLAG(VIRTIO_F_RING_PACKED);
+	MLX5_LOG_VIO_FLAG(VIRTIO_F_ORDER_PLATFORM);
+	MLX5_LOG_VIO_FLAG(VIRTIO_F_SR_IOV);
+}
+
+static int create_tis(struct mlx5_vdpa_net *ndev)
+{
+	struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
+	u32 in[MLX5_ST_SZ_DW(create_tis_in)] = {};
+	void *tisc;
+	int err;
+
+	tisc = MLX5_ADDR_OF(create_tis_in, in, ctx);
+	MLX5_SET(tisc, tisc, transport_domain, ndev->res.tdn);
+	err = mlx5_vdpa_create_tis(mvdev, in, &ndev->res.tisn);
+	if (err)
+		mlx5_vdpa_warn(mvdev, "create TIS (%d)\n", err);
+
+	return err;
+}
+
+static void destroy_tis(struct mlx5_vdpa_net *ndev)
+{
+	mlx5_vdpa_destroy_tis(&ndev->mvdev, ndev->res.tisn);
+}
+
+#define MLX5_VDPA_CQE_SIZE 64
+#define MLX5_VDPA_LOG_CQE_SIZE ilog2(MLX5_VDPA_CQE_SIZE)
+
+static int cq_frag_buf_alloc(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_cq_buf *buf, int nent)
+{
+	struct mlx5_frag_buf *frag_buf = &buf->frag_buf;
+	u8 log_wq_stride = MLX5_VDPA_LOG_CQE_SIZE;
+	u8 log_wq_sz = MLX5_VDPA_LOG_CQE_SIZE;
+	int err;
+
+	err = mlx5_frag_buf_alloc_node(ndev->mvdev.mdev, nent * MLX5_VDPA_CQE_SIZE, frag_buf,
+				       ndev->mvdev.mdev->priv.numa_node);
+	if (err)
+		return err;
+
+	mlx5_init_fbc(frag_buf->frags, log_wq_stride, log_wq_sz, &buf->fbc);
+
+	buf->cqe_size = MLX5_VDPA_CQE_SIZE;
+	buf->nent = nent;
+
+	return 0;
+}
+
+static int umem_frag_buf_alloc(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_umem *umem, int size)
+{
+	struct mlx5_frag_buf *frag_buf = &umem->frag_buf;
+
+	return mlx5_frag_buf_alloc_node(ndev->mvdev.mdev, size, frag_buf,
+					ndev->mvdev.mdev->priv.numa_node);
+}
+
+static void cq_frag_buf_free(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_cq_buf *buf)
+{
+	mlx5_frag_buf_free(ndev->mvdev.mdev, &buf->frag_buf);
+}
+
+static void *get_cqe(struct mlx5_vdpa_cq *vcq, int n)
+{
+	return mlx5_frag_buf_get_wqe(&vcq->buf.fbc, n);
+}
+
+static void cq_frag_buf_init(struct mlx5_vdpa_cq *vcq, struct mlx5_vdpa_cq_buf *buf)
+{
+	struct mlx5_cqe64 *cqe64;
+	void *cqe;
+	int i;
+
+	for (i = 0; i < buf->nent; i++) {
+		cqe = get_cqe(vcq, i);
+		cqe64 = cqe;
+		cqe64->op_own = MLX5_CQE_INVALID << 4;
+	}
+}
+
+static void *get_sw_cqe(struct mlx5_vdpa_cq *cq, int n)
+{
+	struct mlx5_cqe64 *cqe64 = get_cqe(cq, n & (cq->cqe - 1));
+
+	if (likely(get_cqe_opcode(cqe64) != MLX5_CQE_INVALID) &&
+	    !((cqe64->op_own & MLX5_CQE_OWNER_MASK) ^ !!(n & cq->cqe)))
+		return cqe64;
+
+	return NULL;
+}
+
+static void rx_post(struct mlx5_vdpa_qp *vqp, int n)
+{
+	vqp->head += n;
+	vqp->db.db[0] = cpu_to_be32(vqp->head);
+}
+
+static void qp_prepare(struct mlx5_vdpa_net *ndev, bool fw, void *in,
+		       struct mlx5_vdpa_virtqueue *mvq, u32 num_ent)
+{
+	struct mlx5_vdpa_qp *vqp;
+	__be64 *pas;
+	void *qpc;
+
+	vqp = fw ? &mvq->fwqp : &mvq->vqqp;
+	MLX5_SET(create_qp_in, in, uid, ndev->mvdev.res.uid);
+	qpc = MLX5_ADDR_OF(create_qp_in, in, qpc);
+	if (vqp->fw) {
+		/* Firmware QP is allocated by the driver for the firmware's
+		 * use so we can skip part of the params as they will be chosen by firmware
+		 */
+		qpc = MLX5_ADDR_OF(create_qp_in, in, qpc);
+		MLX5_SET(qpc, qpc, rq_type, MLX5_ZERO_LEN_RQ);
+		MLX5_SET(qpc, qpc, no_sq, 1);
+		return;
+	}
+
+	MLX5_SET(qpc, qpc, st, MLX5_QP_ST_RC);
+	MLX5_SET(qpc, qpc, pm_state, MLX5_QP_PM_MIGRATED);
+	MLX5_SET(qpc, qpc, pd, ndev->mvdev.res.pdn);
+	MLX5_SET(qpc, qpc, mtu, MLX5_QPC_MTU_256_BYTES);
+	MLX5_SET(qpc, qpc, uar_page, ndev->mvdev.res.uar->index);
+	MLX5_SET(qpc, qpc, log_page_size, vqp->frag_buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
+	MLX5_SET(qpc, qpc, no_sq, 1);
+	MLX5_SET(qpc, qpc, cqn_rcv, mvq->cq.mcq.cqn);
+	MLX5_SET(qpc, qpc, log_rq_size, ilog2(num_ent));
+	MLX5_SET(qpc, qpc, rq_type, MLX5_NON_ZERO_RQ);
+	pas = (__be64 *)MLX5_ADDR_OF(create_qp_in, in, pas);
+	mlx5_fill_page_frag_array(&vqp->frag_buf, pas);
+}
+
+static int rq_buf_alloc(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_qp *vqp, u32 num_ent)
+{
+	return mlx5_frag_buf_alloc_node(ndev->mvdev.mdev,
+					num_ent * sizeof(struct mlx5_wqe_data_seg), &vqp->frag_buf,
+					ndev->mvdev.mdev->priv.numa_node);
+}
+
+static void rq_buf_free(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_qp *vqp)
+{
+	mlx5_frag_buf_free(ndev->mvdev.mdev, &vqp->frag_buf);
+}
+
+static int qp_create(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq,
+		     struct mlx5_vdpa_qp *vqp)
+{
+	struct mlx5_core_dev *mdev = ndev->mvdev.mdev;
+	int inlen = MLX5_ST_SZ_BYTES(create_qp_in);
+	u32 out[MLX5_ST_SZ_DW(create_qp_out)] = {};
+	void *qpc;
+	void *in;
+	int err;
+
+	if (!vqp->fw) {
+		vqp = &mvq->vqqp;
+		err = rq_buf_alloc(ndev, vqp, mvq->num_ent);
+		if (err)
+			return err;
+
+		err = mlx5_db_alloc(ndev->mvdev.mdev, &vqp->db);
+		if (err)
+			goto err_db;
+		inlen += vqp->frag_buf.npages * sizeof(__be64);
+	}
+
+	in = kzalloc(inlen, GFP_KERNEL);
+	if (!in) {
+		err = -ENOMEM;
+		goto err_kzalloc;
+	}
+
+	qp_prepare(ndev, vqp->fw, in, mvq, mvq->num_ent);
+	qpc = MLX5_ADDR_OF(create_qp_in, in, qpc);
+	MLX5_SET(qpc, qpc, st, MLX5_QP_ST_RC);
+	MLX5_SET(qpc, qpc, pm_state, MLX5_QP_PM_MIGRATED);
+	MLX5_SET(qpc, qpc, pd, ndev->mvdev.res.pdn);
+	MLX5_SET(qpc, qpc, mtu, MLX5_QPC_MTU_256_BYTES);
+	if (!vqp->fw)
+		MLX5_SET64(qpc, qpc, dbr_addr, vqp->db.dma);
+	MLX5_SET(create_qp_in, in, opcode, MLX5_CMD_OP_CREATE_QP);
+	err = mlx5_cmd_exec(mdev, in, inlen, out, sizeof(out));
+	kfree(in);
+	if (err)
+		goto err_kzalloc;
+
+	vqp->mqp.uid = ndev->mvdev.res.uid;
+	vqp->mqp.qpn = MLX5_GET(create_qp_out, out, qpn);
+
+	if (!vqp->fw)
+		rx_post(vqp, mvq->num_ent);
+
+	return 0;
+
+err_kzalloc:
+	if (!vqp->fw)
+		mlx5_db_free(ndev->mvdev.mdev, &vqp->db);
+err_db:
+	if (!vqp->fw)
+		rq_buf_free(ndev, vqp);
+
+	return err;
+}
+
+static void qp_destroy(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_qp *vqp)
+{
+	u32 in[MLX5_ST_SZ_DW(destroy_qp_in)] = {};
+
+	MLX5_SET(destroy_qp_in, in, opcode, MLX5_CMD_OP_DESTROY_QP);
+	MLX5_SET(destroy_qp_in, in, qpn, vqp->mqp.qpn);
+	MLX5_SET(destroy_qp_in, in, uid, ndev->mvdev.res.uid);
+	if (mlx5_cmd_exec_in(ndev->mvdev.mdev, destroy_qp, in))
+		mlx5_vdpa_warn(&ndev->mvdev, "destroy qp 0x%x\n", vqp->mqp.qpn);
+	if (!vqp->fw) {
+		mlx5_db_free(ndev->mvdev.mdev, &vqp->db);
+		rq_buf_free(ndev, vqp);
+	}
+}
+
+static void *next_cqe_sw(struct mlx5_vdpa_cq *cq)
+{
+	return get_sw_cqe(cq, cq->mcq.cons_index);
+}
+
+static int mlx5_vdpa_poll_one(struct mlx5_vdpa_cq *vcq)
+{
+	struct mlx5_cqe64 *cqe64;
+
+	cqe64 = next_cqe_sw(vcq);
+	if (!cqe64)
+		return -EAGAIN;
+
+	vcq->mcq.cons_index++;
+	return 0;
+}
+
+static void mlx5_vdpa_handle_completions(struct mlx5_vdpa_virtqueue *mvq, int num)
+{
+	mlx5_cq_set_ci(&mvq->cq.mcq);
+	rx_post(&mvq->vqqp, num);
+	if (mvq->event_cb.callback)
+		mvq->event_cb.callback(mvq->event_cb.private);
+}
+
+static void mlx5_vdpa_cq_comp(struct mlx5_core_cq *mcq, struct mlx5_eqe *eqe)
+{
+	struct mlx5_vdpa_virtqueue *mvq = container_of(mcq, struct mlx5_vdpa_virtqueue, cq.mcq);
+	struct mlx5_vdpa_net *ndev = mvq->ndev;
+	void __iomem *uar_page = ndev->mvdev.res.uar->map;
+	int num = 0;
+
+	while (!mlx5_vdpa_poll_one(&mvq->cq)) {
+		num++;
+		if (num > mvq->num_ent / 2) {
+			/* If completions keep coming while we poll, we want to
+			 * let the hardware know that we consumed them by
+			 * updating the doorbell record.  We also let vdpa core
+			 * know about this so it passes it on the virtio driver
+			 * on the guest.
+			 */
+			mlx5_vdpa_handle_completions(mvq, num);
+			num = 0;
+		}
+	}
+
+	if (num)
+		mlx5_vdpa_handle_completions(mvq, num);
+
+	mlx5_cq_arm(&mvq->cq.mcq, MLX5_CQ_DB_REQ_NOT, uar_page, mvq->cq.mcq.cons_index);
+}
+
+static int cq_create(struct mlx5_vdpa_net *ndev, u16 idx, u32 num_ent)
+{
+	struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+	struct mlx5_core_dev *mdev = ndev->mvdev.mdev;
+	void __iomem *uar_page = ndev->mvdev.res.uar->map;
+	u32 out[MLX5_ST_SZ_DW(create_cq_out)];
+	struct mlx5_vdpa_cq *vcq = &mvq->cq;
+	unsigned int irqn;
+	__be64 *pas;
+	int inlen;
+	void *cqc;
+	void *in;
+	int err;
+	int eqn;
+
+	err = mlx5_db_alloc(mdev, &vcq->db);
+	if (err)
+		return err;
+
+	vcq->mcq.set_ci_db = vcq->db.db;
+	vcq->mcq.arm_db = vcq->db.db + 1;
+	vcq->mcq.cqe_sz = 64;
+
+	err = cq_frag_buf_alloc(ndev, &vcq->buf, num_ent);
+	if (err)
+		goto err_db;
+
+	cq_frag_buf_init(vcq, &vcq->buf);
+
+	inlen = MLX5_ST_SZ_BYTES(create_cq_in) +
+		MLX5_FLD_SZ_BYTES(create_cq_in, pas[0]) * vcq->buf.frag_buf.npages;
+	in = kzalloc(inlen, GFP_KERNEL);
+	if (!in) {
+		err = -ENOMEM;
+		goto err_vzalloc;
+	}
+
+	MLX5_SET(create_cq_in, in, uid, ndev->mvdev.res.uid);
+	pas = (__be64 *)MLX5_ADDR_OF(create_cq_in, in, pas);
+	mlx5_fill_page_frag_array(&vcq->buf.frag_buf, pas);
+
+	cqc = MLX5_ADDR_OF(create_cq_in, in, cq_context);
+	MLX5_SET(cqc, cqc, log_page_size, vcq->buf.frag_buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
+
+	/* Use vector 0 by default. Consider adding code to choose least used
+	 * vector.
+	 */
+	err = mlx5_vector2eqn(mdev, 0, &eqn, &irqn);
+	if (err)
+		goto err_vec;
+
+	cqc = MLX5_ADDR_OF(create_cq_in, in, cq_context);
+	MLX5_SET(cqc, cqc, log_cq_size, ilog2(num_ent));
+	MLX5_SET(cqc, cqc, uar_page, ndev->mvdev.res.uar->index);
+	MLX5_SET(cqc, cqc, c_eqn, eqn);
+	MLX5_SET64(cqc, cqc, dbr_addr, vcq->db.dma);
+
+	err = mlx5_core_create_cq(mdev, &vcq->mcq, in, inlen, out, sizeof(out));
+	if (err)
+		goto err_vec;
+
+	vcq->mcq.comp = mlx5_vdpa_cq_comp;
+	vcq->cqe = num_ent;
+	vcq->mcq.set_ci_db = vcq->db.db;
+	vcq->mcq.arm_db = vcq->db.db + 1;
+	mlx5_cq_arm(&mvq->cq.mcq, MLX5_CQ_DB_REQ_NOT, uar_page, mvq->cq.mcq.cons_index);
+	kfree(in);
+	return 0;
+
+err_vec:
+	kfree(in);
+err_vzalloc:
+	cq_frag_buf_free(ndev, &vcq->buf);
+err_db:
+	mlx5_db_free(ndev->mvdev.mdev, &vcq->db);
+	return err;
+}
+
+static void cq_destroy(struct mlx5_vdpa_net *ndev, u16 idx)
+{
+	struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+	struct mlx5_core_dev *mdev = ndev->mvdev.mdev;
+	struct mlx5_vdpa_cq *vcq = &mvq->cq;
+
+	if (mlx5_core_destroy_cq(mdev, &vcq->mcq)) {
+		mlx5_vdpa_warn(&ndev->mvdev, "destroy CQ 0x%x\n", vcq->mcq.cqn);
+		return;
+	}
+	cq_frag_buf_free(ndev, &vcq->buf);
+	mlx5_db_free(ndev->mvdev.mdev, &vcq->db);
+}
+
+static int umem_size(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, int num,
+		     struct mlx5_vdpa_umem **umemp)
+{
+	struct mlx5_core_dev *mdev = ndev->mvdev.mdev;
+	int p_a;
+	int p_b;
+
+	switch (num) {
+	case 1:
+		p_a = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_1_buffer_param_a);
+		p_b = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_1_buffer_param_b);
+		*umemp = &mvq->umem1;
+		break;
+	case 2:
+		p_a = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_2_buffer_param_a);
+		p_b = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_2_buffer_param_b);
+		*umemp = &mvq->umem2;
+		break;
+	case 3:
+		p_a = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_3_buffer_param_a);
+		p_b = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_3_buffer_param_b);
+		*umemp = &mvq->umem3;
+		break;
+	}
+	return p_a * mvq->num_ent + p_b;
+}
+
+static void umem_frag_buf_free(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_umem *umem)
+{
+	mlx5_frag_buf_free(ndev->mvdev.mdev, &umem->frag_buf);
+}
+
+static int create_umem(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, int num)
+{
+	int inlen;
+	u32 out[MLX5_ST_SZ_DW(create_umem_out)] = {};
+	void *um;
+	void *in;
+	int err;
+	__be64 *pas;
+	int size;
+	struct mlx5_vdpa_umem *umem;
+
+	size = umem_size(ndev, mvq, num, &umem);
+	if (size < 0)
+		return size;
+
+	umem->size = size;
+	err = umem_frag_buf_alloc(ndev, umem, size);
+	if (err)
+		return err;
+
+	inlen = MLX5_ST_SZ_BYTES(create_umem_in) + MLX5_ST_SZ_BYTES(mtt) * umem->frag_buf.npages;
+
+	in = kzalloc(inlen, GFP_KERNEL);
+	if (!in) {
+		err = -ENOMEM;
+		goto err_in;
+	}
+
+	MLX5_SET(create_umem_in, in, opcode, MLX5_CMD_OP_CREATE_UMEM);
+	MLX5_SET(create_umem_in, in, uid, ndev->mvdev.res.uid);
+	um = MLX5_ADDR_OF(create_umem_in, in, umem);
+	MLX5_SET(umem, um, log_page_size, umem->frag_buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
+	MLX5_SET64(umem, um, num_of_mtt, umem->frag_buf.npages);
+
+	pas = (__be64 *)MLX5_ADDR_OF(umem, um, mtt[0]);
+	mlx5_fill_page_frag_array_perm(&umem->frag_buf, pas, MLX5_MTT_PERM_RW);
+
+	err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
+	if (err) {
+		mlx5_vdpa_warn(&ndev->mvdev, "create umem(%d)\n", err);
+		goto err_cmd;
+	}
+
+	kfree(in);
+	umem->id = MLX5_GET(create_umem_out, out, umem_id);
+
+	return 0;
+
+err_cmd:
+	kfree(in);
+err_in:
+	umem_frag_buf_free(ndev, umem);
+	return err;
+}
+
+static void umem_destroy(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, int num)
+{
+	u32 in[MLX5_ST_SZ_DW(destroy_umem_in)] = {};
+	u32 out[MLX5_ST_SZ_DW(destroy_umem_out)] = {};
+	struct mlx5_vdpa_umem *umem;
+
+	switch (num) {
+	case 1:
+		umem = &mvq->umem1;
+		break;
+	case 2:
+		umem = &mvq->umem2;
+		break;
+	case 3:
+		umem = &mvq->umem3;
+		break;
+	}
+
+	MLX5_SET(destroy_umem_in, in, opcode, MLX5_CMD_OP_DESTROY_UMEM);
+	MLX5_SET(destroy_umem_in, in, umem_id, umem->id);
+	if (mlx5_cmd_exec(ndev->mvdev.mdev, in, sizeof(in), out, sizeof(out)))
+		return;
+
+	umem_frag_buf_free(ndev, umem);
+}
+
+static int umems_create(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+	int num;
+	int err;
+
+	for (num = 1; num <= 3; num++) {
+		err = create_umem(ndev, mvq, num);
+		if (err)
+			goto err_umem;
+	}
+	return 0;
+
+err_umem:
+	for (num--; num > 0; num--)
+		umem_destroy(ndev, mvq, num);
+
+	return err;
+}
+
+static void umems_destroy(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+	int num;
+
+	for (num = 3; num > 0; num--)
+		umem_destroy(ndev, mvq, num);
+}
+
+static int get_queue_type(struct mlx5_vdpa_net *ndev)
+{
+	u32 type_mask;
+
+	type_mask = MLX5_CAP_DEV_VDPA_EMULATION(ndev->mvdev.mdev, virtio_queue_type);
+
+	/* prefer split queue */
+	if (type_mask & MLX5_VIRTIO_EMULATION_CAP_VIRTIO_QUEUE_TYPE_PACKED)
+		return MLX5_VIRTIO_EMULATION_VIRTIO_QUEUE_TYPE_PACKED;
+
+	WARN_ON(!(type_mask & MLX5_VIRTIO_EMULATION_CAP_VIRTIO_QUEUE_TYPE_SPLIT));
+
+	return MLX5_VIRTIO_EMULATION_VIRTIO_QUEUE_TYPE_SPLIT;
+}
+
+static bool vq_is_tx(u16 idx)
+{
+	return idx % 2;
+}
+
+static u16 get_features_12_3(u64 features)
+{
+	return (!!(features & BIT(VIRTIO_NET_F_HOST_TSO4)) << 9) |
+	       (!!(features & BIT(VIRTIO_NET_F_HOST_TSO6)) << 8) |
+	       (!!(features & BIT(VIRTIO_NET_F_CSUM)) << 7) |
+	       (!!(features & BIT(VIRTIO_NET_F_GUEST_CSUM)) << 6);
+}
+
+static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+	int inlen = MLX5_ST_SZ_BYTES(create_virtio_net_q_in);
+	u32 out[MLX5_ST_SZ_DW(create_virtio_net_q_out)] = {};
+	void *obj_context;
+	void *cmd_hdr;
+	void *vq_ctx;
+	void *in;
+	int err;
+
+	err = umems_create(ndev, mvq);
+	if (err)
+		return err;
+
+	in = kzalloc(inlen, GFP_KERNEL);
+	if (!in) {
+		err = -ENOMEM;
+		goto err_alloc;
+	}
+
+	cmd_hdr = MLX5_ADDR_OF(create_virtio_net_q_in, in, general_obj_in_cmd_hdr);
+
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, opcode, MLX5_CMD_OP_CREATE_GENERAL_OBJECT);
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_type, MLX5_OBJ_TYPE_VIRTIO_NET_Q);
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
+
+	obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context);
+	MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx);
+	MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3,
+		 get_features_12_3(ndev->mvdev.actual_features));
+	vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context);
+	MLX5_SET(virtio_q, vq_ctx, virtio_q_type, get_queue_type(ndev));
+
+	if (vq_is_tx(mvq->index))
+		MLX5_SET(virtio_net_q_object, obj_context, tisn_or_qpn, ndev->res.tisn);
+
+	MLX5_SET(virtio_q, vq_ctx, event_mode, MLX5_VIRTIO_Q_EVENT_MODE_QP_MODE);
+	MLX5_SET(virtio_q, vq_ctx, queue_index, mvq->index);
+	MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, mvq->fwqp.mqp.qpn);
+	MLX5_SET(virtio_q, vq_ctx, queue_size, mvq->num_ent);
+	MLX5_SET(virtio_q, vq_ctx, virtio_version_1_0,
+		 !!(ndev->mvdev.actual_features & VIRTIO_F_VERSION_1));
+	MLX5_SET64(virtio_q, vq_ctx, desc_addr, mvq->desc_addr);
+	MLX5_SET64(virtio_q, vq_ctx, used_addr, mvq->device_addr);
+	MLX5_SET64(virtio_q, vq_ctx, available_addr, mvq->driver_addr);
+	MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, ndev->mvdev.mr.mkey.key);
+	MLX5_SET(virtio_q, vq_ctx, umem_1_id, mvq->umem1.id);
+	MLX5_SET(virtio_q, vq_ctx, umem_1_size, mvq->umem1.size);
+	MLX5_SET(virtio_q, vq_ctx, umem_2_id, mvq->umem2.id);
+	MLX5_SET(virtio_q, vq_ctx, umem_2_size, mvq->umem1.size);
+	MLX5_SET(virtio_q, vq_ctx, umem_3_id, mvq->umem3.id);
+	MLX5_SET(virtio_q, vq_ctx, umem_3_size, mvq->umem1.size);
+	MLX5_SET(virtio_q, vq_ctx, pd, ndev->mvdev.res.pdn);
+	if (MLX5_CAP_DEV_VDPA_EMULATION(ndev->mvdev.mdev, eth_frame_offload_type))
+		MLX5_SET(virtio_q, vq_ctx, virtio_version_1_0, 1);
+
+	err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
+	if (err)
+		goto err_cmd;
+
+	kfree(in);
+	mvq->virtq_id = MLX5_GET(general_obj_out_cmd_hdr, out, obj_id);
+
+	return 0;
+
+err_cmd:
+	kfree(in);
+err_alloc:
+	umems_destroy(ndev, mvq);
+	return err;
+}
+
+static void destroy_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+	u32 in[MLX5_ST_SZ_DW(destroy_virtio_net_q_in)] = {};
+	u32 out[MLX5_ST_SZ_DW(destroy_virtio_net_q_out)] = {};
+
+	MLX5_SET(destroy_virtio_net_q_in, in, general_obj_out_cmd_hdr.opcode,
+		 MLX5_CMD_OP_DESTROY_GENERAL_OBJECT);
+	MLX5_SET(destroy_virtio_net_q_in, in, general_obj_out_cmd_hdr.obj_id, mvq->virtq_id);
+	MLX5_SET(destroy_virtio_net_q_in, in, general_obj_out_cmd_hdr.uid, ndev->mvdev.res.uid);
+	MLX5_SET(destroy_virtio_net_q_in, in, general_obj_out_cmd_hdr.obj_type,
+		 MLX5_OBJ_TYPE_VIRTIO_NET_Q);
+	if (mlx5_cmd_exec(ndev->mvdev.mdev, in, sizeof(in), out, sizeof(out))) {
+		mlx5_vdpa_warn(&ndev->mvdev, "destroy virtqueue 0x%x\n", mvq->virtq_id);
+		return;
+	}
+	umems_destroy(ndev, mvq);
+}
+
+static u32 get_rqpn(struct mlx5_vdpa_virtqueue *mvq, bool fw)
+{
+	return fw ? mvq->vqqp.mqp.qpn : mvq->fwqp.mqp.qpn;
+}
+
+static u32 get_qpn(struct mlx5_vdpa_virtqueue *mvq, bool fw)
+{
+	return fw ? mvq->fwqp.mqp.qpn : mvq->vqqp.mqp.qpn;
+}
+
+static void alloc_inout(struct mlx5_vdpa_net *ndev, int cmd, void **in, int *inlen, void **out,
+			int *outlen, u32 qpn, u32 rqpn)
+{
+	void *qpc;
+	void *pp;
+
+	switch (cmd) {
+	case MLX5_CMD_OP_2RST_QP:
+		*inlen = MLX5_ST_SZ_BYTES(qp_2rst_in);
+		*outlen = MLX5_ST_SZ_BYTES(qp_2rst_out);
+		*in = kzalloc(*inlen, GFP_KERNEL);
+		*out = kzalloc(*outlen, GFP_KERNEL);
+		if (!in || !out)
+			goto outerr;
+
+		MLX5_SET(qp_2rst_in, *in, opcode, cmd);
+		MLX5_SET(qp_2rst_in, *in, uid, ndev->mvdev.res.uid);
+		MLX5_SET(qp_2rst_in, *in, qpn, qpn);
+		break;
+	case MLX5_CMD_OP_RST2INIT_QP:
+		*inlen = MLX5_ST_SZ_BYTES(rst2init_qp_in);
+		*outlen = MLX5_ST_SZ_BYTES(rst2init_qp_out);
+		*in = kzalloc(*inlen, GFP_KERNEL);
+		*out = kzalloc(MLX5_ST_SZ_BYTES(rst2init_qp_out), GFP_KERNEL);
+		if (!in || !out)
+			goto outerr;
+
+		MLX5_SET(rst2init_qp_in, *in, opcode, cmd);
+		MLX5_SET(rst2init_qp_in, *in, uid, ndev->mvdev.res.uid);
+		MLX5_SET(rst2init_qp_in, *in, qpn, qpn);
+		qpc = MLX5_ADDR_OF(rst2init_qp_in, *in, qpc);
+		MLX5_SET(qpc, qpc, remote_qpn, rqpn);
+		MLX5_SET(qpc, qpc, rwe, 1);
+		pp = MLX5_ADDR_OF(qpc, qpc, primary_address_path);
+		MLX5_SET(ads, pp, vhca_port_num, 1);
+		break;
+	case MLX5_CMD_OP_INIT2RTR_QP:
+		*inlen = MLX5_ST_SZ_BYTES(init2rtr_qp_in);
+		*outlen = MLX5_ST_SZ_BYTES(init2rtr_qp_out);
+		*in = kzalloc(*inlen, GFP_KERNEL);
+		*out = kzalloc(MLX5_ST_SZ_BYTES(init2rtr_qp_out), GFP_KERNEL);
+		if (!in || !out)
+			goto outerr;
+
+		MLX5_SET(init2rtr_qp_in, *in, opcode, cmd);
+		MLX5_SET(init2rtr_qp_in, *in, uid, ndev->mvdev.res.uid);
+		MLX5_SET(init2rtr_qp_in, *in, qpn, qpn);
+		qpc = MLX5_ADDR_OF(rst2init_qp_in, *in, qpc);
+		MLX5_SET(qpc, qpc, mtu, MLX5_QPC_MTU_256_BYTES);
+		MLX5_SET(qpc, qpc, log_msg_max, 30);
+		MLX5_SET(qpc, qpc, remote_qpn, rqpn);
+		pp = MLX5_ADDR_OF(qpc, qpc, primary_address_path);
+		MLX5_SET(ads, pp, fl, 1);
+		break;
+	case MLX5_CMD_OP_RTR2RTS_QP:
+		*inlen = MLX5_ST_SZ_BYTES(rtr2rts_qp_in);
+		*outlen = MLX5_ST_SZ_BYTES(rtr2rts_qp_out);
+		*in = kzalloc(*inlen, GFP_KERNEL);
+		*out = kzalloc(MLX5_ST_SZ_BYTES(rtr2rts_qp_out), GFP_KERNEL);
+		if (!in || !out)
+			goto outerr;
+
+		MLX5_SET(rtr2rts_qp_in, *in, opcode, cmd);
+		MLX5_SET(rtr2rts_qp_in, *in, uid, ndev->mvdev.res.uid);
+		MLX5_SET(rtr2rts_qp_in, *in, qpn, qpn);
+		qpc = MLX5_ADDR_OF(rst2init_qp_in, *in, qpc);
+		pp = MLX5_ADDR_OF(qpc, qpc, primary_address_path);
+		MLX5_SET(ads, pp, ack_timeout, 14);
+		MLX5_SET(qpc, qpc, retry_count, 7);
+		MLX5_SET(qpc, qpc, rnr_retry, 7);
+		break;
+	default:
+		goto outerr;
+	}
+	if (!*in || !*out)
+		goto outerr;
+
+	return;
+
+outerr:
+	kfree(*in);
+	kfree(*out);
+	*in = NULL;
+	*out = NULL;
+}
+
+static void free_inout(void *in, void *out)
+{
+	kfree(in);
+	kfree(out);
+}
+
+/* Two QPs are used by each virtqueue. One is used by the driver and one by
+ * firmware. The fw argument indicates whether the subjected QP is the one used
+ * by firmware.
+ */
+static int modify_qp(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, bool fw, int cmd)
+{
+	int outlen;
+	int inlen;
+	void *out;
+	void *in;
+	int err;
+
+	alloc_inout(ndev, cmd, &in, &inlen, &out, &outlen, get_qpn(mvq, fw), get_rqpn(mvq, fw));
+	if (!in || !out)
+		return -ENOMEM;
+
+	err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, outlen);
+	free_inout(in, out);
+	return err;
+}
+
+static int connect_qps(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+	int err;
+
+	err = modify_qp(ndev, mvq, true, MLX5_CMD_OP_2RST_QP);
+	if (err)
+		return err;
+
+	err = modify_qp(ndev, mvq, false, MLX5_CMD_OP_2RST_QP);
+	if (err)
+		return err;
+
+	err = modify_qp(ndev, mvq, true, MLX5_CMD_OP_RST2INIT_QP);
+	if (err)
+		return err;
+
+	err = modify_qp(ndev, mvq, false, MLX5_CMD_OP_RST2INIT_QP);
+	if (err)
+		return err;
+
+	err = modify_qp(ndev, mvq, true, MLX5_CMD_OP_INIT2RTR_QP);
+	if (err)
+		return err;
+
+	err = modify_qp(ndev, mvq, false, MLX5_CMD_OP_INIT2RTR_QP);
+	if (err)
+		return err;
+
+	return modify_qp(ndev, mvq, true, MLX5_CMD_OP_RTR2RTS_QP);
+}
+
+struct mlx5_virtq_attr {
+	u8 state;
+	u16 available_index;
+};
+
+static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq,
+			   struct mlx5_virtq_attr *attr)
+{
+	int outlen = MLX5_ST_SZ_BYTES(query_virtio_net_q_out);
+	u32 in[MLX5_ST_SZ_DW(query_virtio_net_q_in)] = {};
+	void *out;
+	void *obj_context;
+	void *cmd_hdr;
+	int err;
+
+	out = kzalloc(outlen, GFP_KERNEL);
+	if (!out)
+		return -ENOMEM;
+
+	cmd_hdr = MLX5_ADDR_OF(query_virtio_net_q_in, in, general_obj_in_cmd_hdr);
+
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, opcode, MLX5_CMD_OP_QUERY_GENERAL_OBJECT);
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_type, MLX5_OBJ_TYPE_VIRTIO_NET_Q);
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_id, mvq->virtq_id);
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
+	err = mlx5_cmd_exec(ndev->mvdev.mdev, in, sizeof(in), out, outlen);
+	if (err)
+		goto err_cmd;
+
+	obj_context = MLX5_ADDR_OF(query_virtio_net_q_out, out, obj_context);
+	memset(attr, 0, sizeof(*attr));
+	attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
+	attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, hw_available_index);
+	kfree(out);
+	return 0;
+
+err_cmd:
+	kfree(out);
+	return err;
+}
+
+static int modify_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, int state)
+{
+	int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
+	u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
+	void *obj_context;
+	void *cmd_hdr;
+	void *in;
+	int err;
+
+	in = kzalloc(inlen, GFP_KERNEL);
+	if (!in)
+		return -ENOMEM;
+
+	cmd_hdr = MLX5_ADDR_OF(modify_virtio_net_q_in, in, general_obj_in_cmd_hdr);
+
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, opcode, MLX5_CMD_OP_MODIFY_GENERAL_OBJECT);
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_type, MLX5_OBJ_TYPE_VIRTIO_NET_Q);
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_id, mvq->virtq_id);
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
+
+	obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context);
+	MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select,
+		   MLX5_VIRTQ_MODIFY_MASK_STATE);
+	MLX5_SET(virtio_net_q_object, obj_context, state, state);
+	err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
+	kfree(in);
+	if (!err)
+		mvq->fw_state = state;
+
+	return err;
+}
+
+static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+	u16 idx = mvq->index;
+	int err;
+
+	if (!mvq->num_ent)
+		return 0;
+
+	if (mvq->initialized) {
+		mlx5_vdpa_warn(&ndev->mvdev, "attempt re init\n");
+		return -EINVAL;
+	}
+
+	err = cq_create(ndev, idx, mvq->num_ent);
+	if (err)
+		return err;
+
+	err = qp_create(ndev, mvq, &mvq->fwqp);
+	if (err)
+		goto err_fwqp;
+
+	err = qp_create(ndev, mvq, &mvq->vqqp);
+	if (err)
+		goto err_vqqp;
+
+	err = connect_qps(ndev, mvq);
+	if (err)
+		goto err_connect;
+
+	err = create_virtqueue(ndev, mvq);
+	if (err)
+		goto err_connect;
+
+	if (mvq->ready) {
+		err = modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
+		if (err) {
+			mlx5_vdpa_warn(&ndev->mvdev, "failed to modify to ready vq idx %d(%d)\n",
+				       idx, err);
+			goto err_connect;
+		}
+	}
+
+	mvq->initialized = true;
+	return 0;
+
+err_connect:
+	qp_destroy(ndev, &mvq->vqqp);
+err_vqqp:
+	qp_destroy(ndev, &mvq->fwqp);
+err_fwqp:
+	cq_destroy(ndev, idx);
+	return err;
+}
+
+static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+	struct mlx5_virtq_attr attr;
+
+	if (!mvq->initialized)
+		return;
+
+	if (query_virtqueue(ndev, mvq, &attr)) {
+		mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
+		return;
+	}
+	if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY)
+		return;
+
+	if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
+		mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
+}
+
+static void suspend_vqs(struct mlx5_vdpa_net *ndev)
+{
+	int i;
+
+	for (i = 0; i < MLX5_MAX_SUPPORTED_VQS; i++)
+		suspend_vq(ndev, &ndev->vqs[i]);
+}
+
+static void teardown_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+	if (!mvq->initialized)
+		return;
+
+	suspend_vq(ndev, mvq);
+	destroy_virtqueue(ndev, mvq);
+	qp_destroy(ndev, &mvq->vqqp);
+	qp_destroy(ndev, &mvq->fwqp);
+	cq_destroy(ndev, mvq->index);
+	mvq->initialized = false;
+}
+
+static int create_rqt(struct mlx5_vdpa_net *ndev)
+{
+	int log_max_rqt;
+	__be32 *list;
+	void *rqtc;
+	int inlen;
+	void *in;
+	int i, j;
+	int err;
+
+	log_max_rqt = min_t(int, 1, MLX5_CAP_GEN(ndev->mvdev.mdev, log_max_rqt_size));
+	if (log_max_rqt < 1)
+		return -EOPNOTSUPP;
+
+	inlen = MLX5_ST_SZ_BYTES(create_rqt_in) + (1 << log_max_rqt) * MLX5_ST_SZ_BYTES(rq_num);
+	in = kzalloc(inlen, GFP_KERNEL);
+	if (!in)
+		return -ENOMEM;
+
+	MLX5_SET(create_rqt_in, in, uid, ndev->mvdev.res.uid);
+	rqtc = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
+
+	MLX5_SET(rqtc, rqtc, list_q_type, MLX5_RQTC_LIST_Q_TYPE_VIRTIO_NET_Q);
+	MLX5_SET(rqtc, rqtc, rqt_max_size, 1 << log_max_rqt);
+	MLX5_SET(rqtc, rqtc, rqt_actual_size, 1);
+	list = MLX5_ADDR_OF(rqtc, rqtc, rq_num[0]);
+	for (i = 0, j = 0; j < ndev->mvdev.max_vqs; j++) {
+		if (!ndev->vqs[j].initialized)
+			continue;
+
+		if (!vq_is_tx(ndev->vqs[j].index)) {
+			list[i] = cpu_to_be32(ndev->vqs[j].virtq_id);
+			i++;
+		}
+	}
+
+	err = mlx5_vdpa_create_rqt(&ndev->mvdev, in, inlen, &ndev->res.rqtn);
+	kfree(in);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static void destroy_rqt(struct mlx5_vdpa_net *ndev)
+{
+	mlx5_vdpa_destroy_rqt(&ndev->mvdev, ndev->res.rqtn);
+}
+
+static int create_tir(struct mlx5_vdpa_net *ndev)
+{
+#define HASH_IP_L4PORTS                                                                            \
+	(MLX5_HASH_FIELD_SEL_SRC_IP | MLX5_HASH_FIELD_SEL_DST_IP | MLX5_HASH_FIELD_SEL_L4_SPORT |  \
+	 MLX5_HASH_FIELD_SEL_L4_DPORT)
+	static const u8 rx_hash_toeplitz_key[] = { 0x2c, 0xc6, 0x81, 0xd1, 0x5b, 0xdb, 0xf4, 0xf7,
+						   0xfc, 0xa2, 0x83, 0x19, 0xdb, 0x1a, 0x3e, 0x94,
+						   0x6b, 0x9e, 0x38, 0xd9, 0x2c, 0x9c, 0x03, 0xd1,
+						   0xad, 0x99, 0x44, 0xa7, 0xd9, 0x56, 0x3d, 0x59,
+						   0x06, 0x3c, 0x25, 0xf3, 0xfc, 0x1f, 0xdc, 0x2a };
+	void *rss_key;
+	void *outer;
+	void *tirc;
+	void *in;
+	int err;
+
+	in = kzalloc(MLX5_ST_SZ_BYTES(create_tir_in), GFP_KERNEL);
+	if (!in)
+		return -ENOMEM;
+
+	MLX5_SET(create_tir_in, in, uid, ndev->mvdev.res.uid);
+	tirc = MLX5_ADDR_OF(create_tir_in, in, ctx);
+	MLX5_SET(tirc, tirc, disp_type, MLX5_TIRC_DISP_TYPE_INDIRECT);
+
+	MLX5_SET(tirc, tirc, rx_hash_symmetric, 1);
+	MLX5_SET(tirc, tirc, rx_hash_fn, MLX5_RX_HASH_FN_TOEPLITZ);
+	rss_key = MLX5_ADDR_OF(tirc, tirc, rx_hash_toeplitz_key);
+	memcpy(rss_key, rx_hash_toeplitz_key, sizeof(rx_hash_toeplitz_key));
+
+	outer = MLX5_ADDR_OF(tirc, tirc, rx_hash_field_selector_outer);
+	MLX5_SET(rx_hash_field_select, outer, l3_prot_type, MLX5_L3_PROT_TYPE_IPV4);
+	MLX5_SET(rx_hash_field_select, outer, l4_prot_type, MLX5_L4_PROT_TYPE_TCP);
+	MLX5_SET(rx_hash_field_select, outer, selected_fields, HASH_IP_L4PORTS);
+
+	MLX5_SET(tirc, tirc, indirect_table, ndev->res.rqtn);
+	MLX5_SET(tirc, tirc, transport_domain, ndev->res.tdn);
+
+	err = mlx5_vdpa_create_tir(&ndev->mvdev, in, &ndev->res.tirn);
+	kfree(in);
+	return err;
+}
+
+static void destroy_tir(struct mlx5_vdpa_net *ndev)
+{
+	mlx5_vdpa_destroy_tir(&ndev->mvdev, ndev->res.tirn);
+}
+
+static int add_fwd_to_tir(struct mlx5_vdpa_net *ndev)
+{
+	struct mlx5_flow_destination dest[2] = {};
+	struct mlx5_flow_table_attr ft_attr = {};
+	struct mlx5_flow_act flow_act = {};
+	struct mlx5_flow_namespace *ns;
+	int err;
+
+	/* for now, one entry, match all, forward to tir */
+	ft_attr.max_fte = 1;
+	ft_attr.autogroup.max_num_groups = 1;
+
+	ns = mlx5_get_flow_namespace(ndev->mvdev.mdev, MLX5_FLOW_NAMESPACE_BYPASS);
+	if (!ns) {
+		mlx5_vdpa_warn(&ndev->mvdev, "get flow namespace\n");
+		return -EOPNOTSUPP;
+	}
+
+	ndev->rxft = mlx5_create_auto_grouped_flow_table(ns, &ft_attr);
+	if (IS_ERR(ndev->rxft))
+		return PTR_ERR(ndev->rxft);
+
+	ndev->rx_counter = mlx5_fc_create(ndev->mvdev.mdev, false);
+	if (IS_ERR(ndev->rx_counter)) {
+		err = PTR_ERR(ndev->rx_counter);
+		goto err_fc;
+	}
+
+	flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST | MLX5_FLOW_CONTEXT_ACTION_COUNT;
+	dest[0].type = MLX5_FLOW_DESTINATION_TYPE_TIR;
+	dest[0].tir_num = ndev->res.tirn;
+	dest[1].type = MLX5_FLOW_DESTINATION_TYPE_COUNTER;
+	dest[1].counter_id = mlx5_fc_id(ndev->rx_counter);
+	ndev->rx_rule = mlx5_add_flow_rules(ndev->rxft, NULL, &flow_act, dest, 2);
+	if (IS_ERR(ndev->rx_rule)) {
+		err = PTR_ERR(ndev->rx_rule);
+		ndev->rx_rule = NULL;
+		goto err_rule;
+	}
+
+	return 0;
+
+err_rule:
+	mlx5_fc_destroy(ndev->mvdev.mdev, ndev->rx_counter);
+err_fc:
+	mlx5_destroy_flow_table(ndev->rxft);
+	return err;
+}
+
+static void remove_fwd_to_tir(struct mlx5_vdpa_net *ndev)
+{
+	if (!ndev->rx_rule)
+		return;
+
+	mlx5_del_flow_rules(ndev->rx_rule);
+	mlx5_fc_destroy(ndev->mvdev.mdev, ndev->rx_counter);
+	mlx5_destroy_flow_table(ndev->rxft);
+
+	ndev->rx_rule = NULL;
+}
+
+static void mlx5_vdpa_kick_vq(struct vdpa_device *vdev, u16 idx)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+
+	if (unlikely(!mvq->ready))
+		return;
+
+	iowrite16(idx, ndev->mvdev.res.kick_addr);
+}
+
+static int mlx5_vdpa_set_vq_address(struct vdpa_device *vdev, u16 idx, u64 desc_area,
+				    u64 driver_area, u64 device_area)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+
+	mvq->desc_addr = desc_area;
+	mvq->device_addr = device_area;
+	mvq->driver_addr = driver_area;
+	return 0;
+}
+
+static void mlx5_vdpa_set_vq_num(struct vdpa_device *vdev, u16 idx, u32 num)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	struct mlx5_vdpa_virtqueue *mvq;
+
+	mvq = &ndev->vqs[idx];
+	mvq->num_ent = num;
+}
+
+static void mlx5_vdpa_set_vq_cb(struct vdpa_device *vdev, u16 idx, struct vdpa_callback *cb)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	struct mlx5_vdpa_virtqueue *vq = &ndev->vqs[idx];
+
+	vq->event_cb = *cb;
+}
+
+static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+
+	if (!ready)
+		suspend_vq(ndev, mvq);
+
+	mvq->ready = ready;
+}
+
+static bool mlx5_vdpa_get_vq_ready(struct vdpa_device *vdev, u16 idx)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+
+	return mvq->ready;
+}
+
+static int mlx5_vdpa_set_vq_state(struct vdpa_device *vdev, u16 idx,
+				  const struct vdpa_vq_state *state)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+
+	if (mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY) {
+		mlx5_vdpa_warn(mvdev, "can't modify available index\n");
+		return -EINVAL;
+	}
+
+	mvq->avail_idx = state->avail_index;
+	return 0;
+}
+
+static int mlx5_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx, struct vdpa_vq_state *state)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+	struct mlx5_virtq_attr attr;
+	int err;
+
+	if (!mvq->initialized)
+		return -EAGAIN;
+
+	err = query_virtqueue(ndev, mvq, &attr);
+	if (err) {
+		mlx5_vdpa_warn(mvdev, "failed to query virtqueue\n");
+		return err;
+	}
+	state->avail_index = attr.available_index;
+	return 0;
+}
+
+static u32 mlx5_vdpa_get_vq_align(struct vdpa_device *vdev)
+{
+	return PAGE_SIZE;
+}
+
+enum { MLX5_VIRTIO_NET_F_GUEST_CSUM = 1 << 9,
+	MLX5_VIRTIO_NET_F_CSUM = 1 << 10,
+	MLX5_VIRTIO_NET_F_HOST_TSO6 = 1 << 11,
+	MLX5_VIRTIO_NET_F_HOST_TSO4 = 1 << 12,
+};
+
+static u64 mlx_to_vritio_features(u16 dev_features)
+{
+	u64 result = 0;
+
+	if (dev_features & MLX5_VIRTIO_NET_F_GUEST_CSUM)
+		result |= BIT(VIRTIO_NET_F_GUEST_CSUM);
+	if (dev_features & MLX5_VIRTIO_NET_F_CSUM)
+		result |= BIT(VIRTIO_NET_F_CSUM);
+	if (dev_features & MLX5_VIRTIO_NET_F_HOST_TSO6)
+		result |= BIT(VIRTIO_NET_F_HOST_TSO6);
+	if (dev_features & MLX5_VIRTIO_NET_F_HOST_TSO4)
+		result |= BIT(VIRTIO_NET_F_HOST_TSO4);
+
+	return result;
+}
+
+static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	u16 dev_features;
+
+	dev_features = MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, device_features_bits_mask);
+	ndev->mvdev.mlx_features = mlx_to_vritio_features(dev_features);
+	if (MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, virtio_version_1_0))
+		ndev->mvdev.mlx_features |= BIT(VIRTIO_F_VERSION_1);
+	ndev->mvdev.mlx_features |= BIT(VIRTIO_F_IOMMU_PLATFORM);
+	print_features(mvdev, ndev->mvdev.mlx_features, false);
+	return ndev->mvdev.mlx_features;
+}
+
+static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
+{
+	if (!(features & BIT(VIRTIO_F_IOMMU_PLATFORM)))
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
+static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
+{
+	int err;
+	int i;
+
+	for (i = 0; i < 2 * mlx5_vdpa_max_qps(ndev->mvdev.max_vqs); i++) {
+		err = setup_vq(ndev, &ndev->vqs[i]);
+		if (err)
+			goto err_vq;
+	}
+
+	return 0;
+
+err_vq:
+	for (--i; i >= 0; i--)
+		teardown_vq(ndev, &ndev->vqs[i]);
+
+	return err;
+}
+
+static void teardown_virtqueues(struct mlx5_vdpa_net *ndev)
+{
+	struct mlx5_vdpa_virtqueue *mvq;
+	int i;
+
+	for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
+		mvq = &ndev->vqs[i];
+		if (!mvq->initialized)
+			continue;
+
+		teardown_vq(ndev, mvq);
+	}
+}
+
+static int mlx5_vdpa_set_features(struct vdpa_device *vdev, u64 features)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	int err;
+
+	print_features(mvdev, features, true);
+
+	err = verify_min_features(mvdev, features);
+	if (err)
+		return err;
+
+	ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features;
+	return err;
+}
+
+static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, struct vdpa_callback *cb)
+{
+	/* not implemented */
+	mlx5_vdpa_warn(to_mvdev(vdev), "set config callback not supported\n");
+}
+
+#define MLX5_VDPA_MAX_VQ_ENTRIES 256
+static u16 mlx5_vdpa_get_vq_num_max(struct vdpa_device *vdev)
+{
+	return MLX5_VDPA_MAX_VQ_ENTRIES;
+}
+
+static u32 mlx5_vdpa_get_device_id(struct vdpa_device *vdev)
+{
+	return VIRTIO_ID_NET;
+}
+
+static u32 mlx5_vdpa_get_vendor_id(struct vdpa_device *vdev)
+{
+	return PCI_VENDOR_ID_MELLANOX;
+}
+
+static u8 mlx5_vdpa_get_status(struct vdpa_device *vdev)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+
+	print_status(mvdev, ndev->mvdev.status, false);
+	return ndev->mvdev.status;
+}
+
+static int save_channel_info(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+	struct mlx5_vq_restore_info *ri = &mvq->ri;
+	struct mlx5_virtq_attr attr;
+	int err;
+
+	if (!mvq->initialized)
+		return 0;
+
+	err = query_virtqueue(ndev, mvq, &attr);
+	if (err)
+		return err;
+
+	ri->avail_index = attr.available_index;
+	ri->ready = mvq->ready;
+	ri->num_ent = mvq->num_ent;
+	ri->desc_addr = mvq->desc_addr;
+	ri->device_addr = mvq->device_addr;
+	ri->driver_addr = mvq->driver_addr;
+	ri->cb = mvq->event_cb;
+	ri->restore = true;
+	return 0;
+}
+
+static int save_channels_info(struct mlx5_vdpa_net *ndev)
+{
+	int i;
+
+	for (i = 0; i < ndev->mvdev.max_vqs; i++) {
+		memset(&ndev->vqs[i].ri, 0, sizeof(ndev->vqs[i].ri));
+		save_channel_info(ndev, &ndev->vqs[i]);
+	}
+	return 0;
+}
+
+static void mlx5_clear_vqs(struct mlx5_vdpa_net *ndev)
+{
+	int i;
+
+	for (i = 0; i < ndev->mvdev.max_vqs; i++)
+		memset(&ndev->vqs[i], 0, offsetof(struct mlx5_vdpa_virtqueue, ri));
+}
+
+static void restore_channels_info(struct mlx5_vdpa_net *ndev)
+{
+	struct mlx5_vdpa_virtqueue *mvq;
+	struct mlx5_vq_restore_info *ri;
+	int i;
+
+	mlx5_clear_vqs(ndev);
+	init_mvqs(ndev);
+	for (i = 0; i < ndev->mvdev.max_vqs; i++) {
+		mvq = &ndev->vqs[i];
+		ri = &mvq->ri;
+		if (!ri->restore)
+			continue;
+
+		mvq->avail_idx = ri->avail_index;
+		mvq->ready = ri->ready;
+		mvq->num_ent = ri->num_ent;
+		mvq->desc_addr = ri->desc_addr;
+		mvq->device_addr = ri->device_addr;
+		mvq->driver_addr = ri->driver_addr;
+		mvq->event_cb = ri->cb;
+	}
+}
+
+static int mlx5_vdpa_change_map(struct mlx5_vdpa_net *ndev, struct vhost_iotlb *iotlb)
+{
+	int err;
+
+	suspend_vqs(ndev);
+	err = save_channels_info(ndev);
+	if (err)
+		goto err_mr;
+
+	teardown_driver(ndev);
+	mlx5_vdpa_destroy_mr(&ndev->mvdev);
+	err = mlx5_vdpa_create_mr(&ndev->mvdev, iotlb);
+	if (err)
+		goto err_mr;
+
+	restore_channels_info(ndev);
+	err = setup_driver(ndev);
+	if (err)
+		goto err_setup;
+
+	return 0;
+
+err_setup:
+	mlx5_vdpa_destroy_mr(&ndev->mvdev);
+err_mr:
+	return err;
+}
+
+static int setup_driver(struct mlx5_vdpa_net *ndev)
+{
+	int err;
+
+	mutex_lock(&ndev->reslock);
+	if (ndev->setup) {
+		mlx5_vdpa_warn(&ndev->mvdev, "setup driver called for already setup driver\n");
+		err = 0;
+		goto out;
+	}
+	err = setup_virtqueues(ndev);
+	if (err) {
+		mlx5_vdpa_warn(&ndev->mvdev, "setup_virtqueues\n");
+		goto out;
+	}
+
+	err = create_rqt(ndev);
+	if (err) {
+		mlx5_vdpa_warn(&ndev->mvdev, "create_rqt\n");
+		goto err_rqt;
+	}
+
+	err = create_tir(ndev);
+	if (err) {
+		mlx5_vdpa_warn(&ndev->mvdev, "create_tir\n");
+		goto err_tir;
+	}
+
+	err = add_fwd_to_tir(ndev);
+	if (err) {
+		mlx5_vdpa_warn(&ndev->mvdev, "add_fwd_to_tir\n");
+		goto err_fwd;
+	}
+	ndev->setup = true;
+	mutex_unlock(&ndev->reslock);
+
+	return 0;
+
+err_fwd:
+	destroy_tir(ndev);
+err_tir:
+	destroy_rqt(ndev);
+err_rqt:
+	teardown_virtqueues(ndev);
+out:
+	mutex_unlock(&ndev->reslock);
+	return err;
+}
+
+static void teardown_driver(struct mlx5_vdpa_net *ndev)
+{
+	mutex_lock(&ndev->reslock);
+	if (!ndev->setup)
+		goto out;
+
+	remove_fwd_to_tir(ndev);
+	destroy_tir(ndev);
+	destroy_rqt(ndev);
+	teardown_virtqueues(ndev);
+	ndev->setup = false;
+out:
+	mutex_unlock(&ndev->reslock);
+}
+
+static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	int err;
+
+	print_status(mvdev, status, true);
+	if (!status) {
+		mlx5_vdpa_info(mvdev, "performing device reset\n");
+		teardown_driver(ndev);
+		mlx5_vdpa_destroy_mr(&ndev->mvdev);
+		ndev->mvdev.status = 0;
+		ndev->mvdev.mlx_features = 0;
+		++mvdev->generation;
+		return;
+	}
+
+	if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) {
+		if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+			err = setup_driver(ndev);
+			if (err) {
+				mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
+				goto err_setup;
+			}
+		} else {
+			mlx5_vdpa_warn(mvdev, "did not expect DRIVER_OK to be cleared\n");
+			return;
+		}
+	}
+
+	ndev->mvdev.status = status;
+	return;
+
+err_setup:
+	mlx5_vdpa_destroy_mr(&ndev->mvdev);
+	ndev->mvdev.status |= VIRTIO_CONFIG_S_FAILED;
+}
+
+static void mlx5_vdpa_get_config(struct vdpa_device *vdev, unsigned int offset, void *buf,
+				 unsigned int len)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+
+	if (offset + len < sizeof(struct virtio_net_config))
+		memcpy(buf, &ndev->config + offset, len);
+}
+
+static void mlx5_vdpa_set_config(struct vdpa_device *vdev, unsigned int offset, const void *buf,
+				 unsigned int len)
+{
+	/* not supported */
+}
+
+static u32 mlx5_vdpa_get_generation(struct vdpa_device *vdev)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+
+	return mvdev->generation;
+}
+
+static int mlx5_vdpa_set_map(struct vdpa_device *vdev, struct vhost_iotlb *iotlb)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	bool change_map;
+	int err;
+
+	err = mlx5_vdpa_handle_set_map(mvdev, iotlb, &change_map);
+	if (err) {
+		mlx5_vdpa_warn(mvdev, "set map failed(%d)\n", err);
+		return err;
+	}
+
+	if (change_map)
+		return mlx5_vdpa_change_map(ndev, iotlb);
+
+	return 0;
+}
+
+static void mlx5_vdpa_free(struct vdpa_device *vdev)
+{
+	/* not used */
+}
+
+static const struct vdpa_config_ops mlx5_vdpa_ops = {
+	.set_vq_address = mlx5_vdpa_set_vq_address,
+	.set_vq_num = mlx5_vdpa_set_vq_num,
+	.kick_vq = mlx5_vdpa_kick_vq,
+	.set_vq_cb = mlx5_vdpa_set_vq_cb,
+	.set_vq_ready = mlx5_vdpa_set_vq_ready,
+	.get_vq_ready = mlx5_vdpa_get_vq_ready,
+	.set_vq_state = mlx5_vdpa_set_vq_state,
+	.get_vq_state = mlx5_vdpa_get_vq_state,
+	.get_vq_align = mlx5_vdpa_get_vq_align,
+	.get_features = mlx5_vdpa_get_features,
+	.set_features = mlx5_vdpa_set_features,
+	.set_config_cb = mlx5_vdpa_set_config_cb,
+	.get_vq_num_max = mlx5_vdpa_get_vq_num_max,
+	.get_device_id = mlx5_vdpa_get_device_id,
+	.get_vendor_id = mlx5_vdpa_get_vendor_id,
+	.get_status = mlx5_vdpa_get_status,
+	.set_status = mlx5_vdpa_set_status,
+	.get_config = mlx5_vdpa_get_config,
+	.set_config = mlx5_vdpa_set_config,
+	.get_generation = mlx5_vdpa_get_generation,
+	.set_map = mlx5_vdpa_set_map,
+	.free = mlx5_vdpa_free,
+};
+
+static int alloc_resources(struct mlx5_vdpa_net *ndev)
+{
+	struct mlx5_vdpa_net_resources *res = &ndev->res;
+	int err;
+
+	if (res->valid) {
+		mlx5_vdpa_warn(&ndev->mvdev, "resources already allocated\n");
+		return -EEXIST;
+	}
+
+	err = mlx5_vdpa_alloc_transport_domain(&ndev->mvdev, &res->tdn);
+	if (err)
+		return err;
+
+	err = create_tis(ndev);
+	if (err)
+		goto err_tis;
+
+	res->valid = true;
+
+	return 0;
+
+err_tis:
+	mlx5_vdpa_dealloc_transport_domain(&ndev->mvdev, res->tdn);
+	return err;
+}
+
+static void free_resources(struct mlx5_vdpa_net *ndev)
+{
+	struct mlx5_vdpa_net_resources *res = &ndev->res;
+
+	if (!res->valid)
+		return;
+
+	destroy_tis(ndev);
+	mlx5_vdpa_dealloc_transport_domain(&ndev->mvdev, res->tdn);
+	res->valid = false;
+}
+
+static void init_mvqs(struct mlx5_vdpa_net *ndev)
+{
+	struct mlx5_vdpa_virtqueue *mvq;
+	int i;
+
+	for (i = 0; i < 2 * mlx5_vdpa_max_qps(ndev->mvdev.max_vqs); ++i) {
+		mvq = &ndev->vqs[i];
+		memset(mvq, 0, offsetof(struct mlx5_vdpa_virtqueue, ri));
+		mvq->index = i;
+		mvq->ndev = ndev;
+		mvq->fwqp.fw = true;
+	}
+	for (; i < ndev->mvdev.max_vqs; i++) {
+		mvq = &ndev->vqs[i];
+		memset(mvq, 0, offsetof(struct mlx5_vdpa_virtqueue, ri));
+		mvq->index = i;
+		mvq->ndev = ndev;
+	}
+}
+
+void *mlx5_vdpa_add_dev(struct mlx5_core_dev *mdev)
+{
+	struct virtio_net_config *config;
+	struct mlx5_vdpa_dev *mvdev;
+	struct mlx5_vdpa_net *ndev;
+	u32 max_vqs;
+	int err;
+
+	/* we save one virtqueue for control virtqueue should we require it */
+	max_vqs = MLX5_CAP_DEV_VDPA_EMULATION(mdev, max_num_virtio_queues);
+	max_vqs = min_t(u32, max_vqs, MLX5_MAX_SUPPORTED_VQS);
+
+	ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, mdev->device, &mlx5_vdpa_ops,
+				 2 * mlx5_vdpa_max_qps(max_vqs));
+	if (IS_ERR(ndev))
+		return ndev;
+
+	ndev->mvdev.max_vqs = max_vqs;
+	mvdev = &ndev->mvdev;
+	mvdev->mdev = mdev;
+	init_mvqs(ndev);
+	mutex_init(&ndev->reslock);
+	config = &ndev->config;
+	err = mlx5_query_nic_vport_mtu(mdev, &config->mtu);
+	if (err)
+		goto err_mtu;
+
+	err = mlx5_query_nic_vport_mac_address(mdev, 0, 0, config->mac);
+	if (err)
+		goto err_mtu;
+
+	mvdev->vdev.dma_dev = mdev->device;
+	err = mlx5_vdpa_alloc_resources(&ndev->mvdev);
+	if (err)
+		goto err_mtu;
+
+	err = alloc_resources(ndev);
+	if (err)
+		goto err_res;
+
+	err = vdpa_register_device(&mvdev->vdev);
+	if (err)
+		goto err_reg;
+
+	return ndev;
+
+err_reg:
+	free_resources(ndev);
+err_res:
+	mlx5_vdpa_free_resources(&ndev->mvdev);
+err_mtu:
+	mutex_destroy(&ndev->reslock);
+	put_device(&mvdev->vdev.dev);
+	return ERR_PTR(err);
+}
+
+void mlx5_vdpa_remove_dev(struct mlx5_vdpa_dev *mvdev)
+{
+	struct mlx5_vdpa_net *ndev;
+
+	ndev = container_of(mvdev, struct mlx5_vdpa_net, mvdev);
+	vdpa_unregister_device(&ndev->mvdev.vdev);
+	free_resources(ndev);
+	mlx5_vdpa_free_resources(&ndev->mvdev);
+	mutex_destroy(&ndev->reslock);
+}
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.h b/drivers/vdpa/mlx5/net/mlx5_vnet.h
new file mode 100644
index 000000000000..f2d6d68b020e
--- /dev/null
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2020 Mellanox Technologies Ltd. */
+
+#ifndef __MLX5_VNET_H_
+#define __MLX5_VNET_H_
+
+#include <linux/vdpa.h>
+#include <linux/virtio_net.h>
+#include <linux/vringh.h>
+#include <linux/mlx5/driver.h>
+#include <linux/mlx5/cq.h>
+#include <linux/mlx5/qp.h>
+#include "mlx5_vdpa.h"
+
+static inline u32 mlx5_vdpa_max_qps(int max_vqs)
+{
+	return max_vqs / 2;
+}
+
+#define to_mlx5_vdpa_ndev(__mvdev) container_of(__mvdev, struct mlx5_vdpa_net, mvdev)
+void *mlx5_vdpa_add_dev(struct mlx5_core_dev *mdev);
+void mlx5_vdpa_remove_dev(struct mlx5_vdpa_dev *mvdev);
+
+#endif /* __MLX5_VNET_H_ */
-- 
2.26.0

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V2 vhost next 10/10] vdpa/mlx5: Add VDPA driver for supported mlx5 devices
  2020-07-20  7:14 ` [PATCH V2 vhost next 10/10] vdpa/mlx5: Add VDPA driver for supported mlx5 devices Eli Cohen
@ 2020-07-20 12:55   ` kernel test robot
  2020-07-22  7:46     ` Eli Cohen
  2020-07-21  3:30   ` kernel test robot
  2020-07-21  5:51   ` Jason Wang
  2 siblings, 1 reply; 18+ messages in thread
From: kernel test robot @ 2020-07-20 12:55 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel
  Cc: shahafs, Eli Cohen, kbuild-all, parav, saeedm


[-- Attachment #1: Type: text/plain, Size: 1607 bytes --]

Hi Eli,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on next-20200717]

url:    https://github.com/0day-ci/linux/commits/Eli-Cohen/VDPA-support-for-Mellanox-ConnectX-devices/20200720-160220
base:    aab7ee9f8ff0110bfcd594b33dc33748dc1baf46
config: arc-allyesconfig (attached as .config)
compiler: arc-elf-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> drivers/vdpa/mlx5/net/main.c:7:10: fatal error: mlx5_vdpa_ifc.h: No such file or directory
       7 | #include "mlx5_vdpa_ifc.h"
         |          ^~~~~~~~~~~~~~~~~
   compilation terminated.
--
   In file included from drivers/vdpa/mlx5/net/mlx5_vnet.c:12:
>> drivers/vdpa/mlx5/net/mlx5_vnet.h:13:10: fatal error: mlx5_vdpa.h: No such file or directory
      13 | #include "mlx5_vdpa.h"
         |          ^~~~~~~~~~~~~
   compilation terminated.

vim +7 drivers/vdpa/mlx5/net/main.c

     3	
     4	#include <linux/module.h>
     5	#include <linux/mlx5/driver.h>
     6	#include <linux/mlx5/device.h>
   > 7	#include "mlx5_vdpa_ifc.h"
     8	#include "mlx5_vnet.h"
     9	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 65011 bytes --]

[-- Attachment #3: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V2 vhost next 10/10] vdpa/mlx5: Add VDPA driver for supported mlx5 devices
  2020-07-20  7:14 ` [PATCH V2 vhost next 10/10] vdpa/mlx5: Add VDPA driver for supported mlx5 devices Eli Cohen
  2020-07-20 12:55   ` kernel test robot
@ 2020-07-21  3:30   ` kernel test robot
  2020-07-21  5:51   ` Jason Wang
  2 siblings, 0 replies; 18+ messages in thread
From: kernel test robot @ 2020-07-21  3:30 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel
  Cc: shahafs, kbuild-all, parav, saeedm, clang-built-linux, Eli Cohen


[-- Attachment #1: Type: text/plain, Size: 1639 bytes --]

Hi Eli,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on next-20200717]

url:    https://github.com/0day-ci/linux/commits/Eli-Cohen/VDPA-support-for-Mellanox-ConnectX-devices/20200720-160220
base:    aab7ee9f8ff0110bfcd594b33dc33748dc1baf46
config: x86_64-allyesconfig (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project cf1105069648446d58adfb7a6cc590013d6886ba)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from drivers/vdpa/mlx5/net/mlx5_vnet.c:12:
>> drivers/vdpa/mlx5/net/mlx5_vnet.h:13:10: fatal error: 'mlx5_vdpa.h' file not found
   #include "mlx5_vdpa.h"
            ^~~~~~~~~~~~~
   1 error generated.

vim +13 drivers/vdpa/mlx5/net/mlx5_vnet.h

     6	
     7	#include <linux/vdpa.h>
     8	#include <linux/virtio_net.h>
     9	#include <linux/vringh.h>
    10	#include <linux/mlx5/driver.h>
    11	#include <linux/mlx5/cq.h>
    12	#include <linux/mlx5/qp.h>
  > 13	#include "mlx5_vdpa.h"
    14	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 75700 bytes --]

[-- Attachment #3: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V2 vhost next 06/10] vdpa: Modify get_vq_state() to return error code
  2020-07-20  7:14 ` [PATCH V2 vhost next 06/10] vdpa: Modify get_vq_state() to return error code Eli Cohen
@ 2020-07-21  5:34   ` Jason Wang
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Wang @ 2020-07-21  5:34 UTC (permalink / raw)
  To: Eli Cohen, mst, virtualization, linux-kernel; +Cc: shahafs, saeedm, parav


On 2020/7/20 下午3:14, Eli Cohen wrote:
> Modify get_vq_state() so it returns an error code. In case of hardware
> acceleration, the available index may be retrieved from the device, an
> operation that can possibly fail.
>
> Reviewed-by: Parav Pandit <parav@mellanox.com>
> Signed-off-by: Eli Cohen <eli@mellanox.com>


Acked-by: Jason Wang <jasowang@redhat.com>


> ---
>   drivers/vdpa/ifcvf/ifcvf_main.c  | 5 +++--
>   drivers/vdpa/vdpa_sim/vdpa_sim.c | 5 +++--
>   drivers/vhost/vdpa.c             | 5 ++++-
>   include/linux/vdpa.h             | 4 ++--
>   4 files changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
> index 69032ee97824..d9b5f465ac81 100644
> --- a/drivers/vdpa/ifcvf/ifcvf_main.c
> +++ b/drivers/vdpa/ifcvf/ifcvf_main.c
> @@ -235,12 +235,13 @@ static u16 ifcvf_vdpa_get_vq_num_max(struct vdpa_device *vdpa_dev)
>   	return IFCVF_QUEUE_MAX;
>   }
>   
> -static void ifcvf_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
> -				    struct vdpa_vq_state *state)
> +static int ifcvf_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
> +				   struct vdpa_vq_state *state)
>   {
>   	struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
>   
>   	state->avail_index = ifcvf_get_vq_state(vf, qid);
> +	return 0;
>   }
>   
>   static int ifcvf_vdpa_set_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index 599519039f8d..ddf6086d43c2 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -427,14 +427,15 @@ static int vdpasim_set_vq_state(struct vdpa_device *vdpa, u16 idx,
>   	return 0;
>   }
>   
> -static void vdpasim_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> -				 struct vdpa_vq_state *state)
> +static int vdpasim_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> +				struct vdpa_vq_state *state)
>   {
>   	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
>   	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
>   	struct vringh *vrh = &vq->vring;
>   
>   	state->avail_index = vrh->last_avail_idx;
> +	return 0;
>   }
>   
>   static u32 vdpasim_get_vq_align(struct vdpa_device *vdpa)
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index af98c11c9d26..fadad74f882e 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -360,7 +360,10 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
>   	}
>   
>   	if (cmd == VHOST_GET_VRING_BASE) {
> -		ops->get_vq_state(v->vdpa, idx, &vq_state);
> +		r = ops->get_vq_state(v->vdpa, idx, &vq_state);
> +		if (r)
> +			return r;
> +
>   		vq->last_avail_idx = vq_state.avail_index;
>   	}
>   
> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> index 7b088bebffe8..000d71a9f988 100644
> --- a/include/linux/vdpa.h
> +++ b/include/linux/vdpa.h
> @@ -185,8 +185,8 @@ struct vdpa_config_ops {
>   	bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
>   	int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
>   			    const struct vdpa_vq_state *state);
> -	void (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
> -			     struct vdpa_vq_state *state);
> +	int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
> +			    struct vdpa_vq_state *state);
>   	struct vdpa_notification_area
>   	(*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
>   

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V2 vhost next 05/10] vhost: Fix documentation
  2020-07-20  7:14 ` [PATCH V2 vhost next 05/10] vhost: Fix documentation Eli Cohen
@ 2020-07-21  5:35   ` Jason Wang
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Wang @ 2020-07-21  5:35 UTC (permalink / raw)
  To: Eli Cohen, mst, virtualization, linux-kernel; +Cc: shahafs, saeedm, parav


On 2020/7/20 下午3:14, Eli Cohen wrote:
> Fix documentation to match actual function prototypes.
>
> Reviewed-by: Parav Pandit <parav@mellanox.com>
> Signed-off-by: Eli Cohen <eli@mellanox.com>


Acked-by: Jason Wang <jasowang@redhat.com>


> ---
>   drivers/vhost/iotlb.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c
> index 1f0ca6e44410..0d4213a54a88 100644
> --- a/drivers/vhost/iotlb.c
> +++ b/drivers/vhost/iotlb.c
> @@ -149,7 +149,7 @@ EXPORT_SYMBOL_GPL(vhost_iotlb_free);
>    * vhost_iotlb_itree_first - return the first overlapped range
>    * @iotlb: the IOTLB
>    * @start: start of IOVA range
> - * @end: end of IOVA range
> + * @last: last byte in IOVA range
>    */
>   struct vhost_iotlb_map *
>   vhost_iotlb_itree_first(struct vhost_iotlb *iotlb, u64 start, u64 last)
> @@ -162,7 +162,7 @@ EXPORT_SYMBOL_GPL(vhost_iotlb_itree_first);
>    * vhost_iotlb_itree_first - return the next overlapped range
>    * @iotlb: the IOTLB
>    * @start: start of IOVA range
> - * @end: end of IOVA range
> + * @last: last byte IOVA range
>    */
>   struct vhost_iotlb_map *
>   vhost_iotlb_itree_next(struct vhost_iotlb_map *map, u64 start, u64 last)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V2 vhost next 10/10] vdpa/mlx5: Add VDPA driver for supported mlx5 devices
  2020-07-20  7:14 ` [PATCH V2 vhost next 10/10] vdpa/mlx5: Add VDPA driver for supported mlx5 devices Eli Cohen
  2020-07-20 12:55   ` kernel test robot
  2020-07-21  3:30   ` kernel test robot
@ 2020-07-21  5:51   ` Jason Wang
  2 siblings, 0 replies; 18+ messages in thread
From: Jason Wang @ 2020-07-21  5:51 UTC (permalink / raw)
  To: Eli Cohen, mst, virtualization, linux-kernel; +Cc: shahafs, saeedm, parav


On 2020/7/20 下午3:14, Eli Cohen wrote:
> Add a front end VDPA driver that registers in the VDPA bus and provides
> networking to a guest. The VDPA driver creates the necessary resources
> on the VF it is driving such that data path will be offloaded.
>
> Notifications are being communicated through the driver.
>
> Currently, only VFs are supported. In subsequent patches we will have
> devlink support to control which VF is used for VDPA and which function
> is used for regular networking.
>
> Reviewed-by: Parav Pandit<parav@mellanox.com>
> Signed-off-by: Eli Cohen<eli@mellanox.com>
> ---
> Changes from V0:
> 1. Fix include path usage
> 2. Fix use after free in qp_create()
> 3. Consistently use mvq->initialized to check if a vq was initialized.
> 4. Remove unused local variable.
> 5. Defer modifyig vq to ready to driver ok
> 6. suspend hardware vq in set_vq_ready(0)
> 7. Remove reservation for control VQ since it multi queue is not supported in this version
> 8. Avoid call put_device() since this is not a pci device driver.


Looks good to me.

Acked-by: Jason Wang <jasowang@redhat.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V2 vhost next 10/10] vdpa/mlx5: Add VDPA driver for supported mlx5 devices
  2020-07-20 12:55   ` kernel test robot
@ 2020-07-22  7:46     ` Eli Cohen
  0 siblings, 0 replies; 18+ messages in thread
From: Eli Cohen @ 2020-07-22  7:46 UTC (permalink / raw)
  To: kernel test robot
  Cc: mst, jasowang, virtualization, linux-kernel, kbuild-all, shahafs,
	saeedm, parav

On Mon, Jul 20, 2020 at 08:55:54PM +0800, kernel test robot wrote:
> Hi Eli,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on next-20200717]
> 
> url:    https://github.com/0day-ci/linux/commits/Eli-Cohen/VDPA-support-for-Mellanox-ConnectX-devices/20200720-160220
> base:    aab7ee9f8ff0110bfcd594b33dc33748dc1baf46
> config: arc-allyesconfig (attached as .config)
> compiler: arc-elf-gcc (GCC) 9.3.0
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arc 
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> All errors (new ones prefixed by >>):
> 
> >> drivers/vdpa/mlx5/net/main.c:7:10: fatal error: mlx5_vdpa_ifc.h: No such file or directory
>        7 | #include "mlx5_vdpa_ifc.h"
>          |          ^~~~~~~~~~~~~~~~~

Does anyone understand this error? The code compiles with no issues. I
think there may be an issue with the tool not interpreting the include
path in the makefile:

subdir-ccflags-y += -I$(src)/core

>    compilation terminated.
> --
>    In file included from drivers/vdpa/mlx5/net/mlx5_vnet.c:12:
> >> drivers/vdpa/mlx5/net/mlx5_vnet.h:13:10: fatal error: mlx5_vdpa.h: No such file or directory
>       13 | #include "mlx5_vdpa.h"
>          |          ^~~~~~~~~~~~~
>    compilation terminated.
> 
> vim +7 drivers/vdpa/mlx5/net/main.c
> 
>      3	
>      4	#include <linux/module.h>
>      5	#include <linux/mlx5/driver.h>
>      6	#include <linux/mlx5/device.h>
>    > 7	#include "mlx5_vdpa_ifc.h"
>      8	#include "mlx5_vnet.h"
>      9	
> 
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V2 vhost next 10/10] vdpa/mlx5: Add VDPA driver for supported mlx5 devices
  2020-07-20  6:54 [PATCH V2 vhost next 00/10] VDPA support for Mellanox ConnectX devices Eli Cohen
@ 2020-07-20  6:54 ` Eli Cohen
  0 siblings, 0 replies; 18+ messages in thread
From: Eli Cohen @ 2020-07-20  6:54 UTC (permalink / raw)
  To: mst, jasowang, virtualization, linux-kernel
  Cc: shahafs, saeedm, parav, Eli Cohen

Add a front end VDPA driver that registers in the VDPA bus and provides
networking to a guest. The VDPA driver creates the necessary resources
on the VF it is driving such that data path will be offloaded.

Notifications are being communicated through the driver.

Currently, only VFs are supported. In subsequent patches we will have
devlink support to control which VF is used for VDPA and which function
is used for regular networking.

Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
---
Changes from V0:
1. Fix include path usage
2. Fix use after free in qp_create()
3. Consistently use mvq->initialized to check if a vq was initialized.
4. Remove unused local variable.
5. Defer modifyig vq to ready to driver ok
6. suspend hardware vq in set_vq_ready(0)
7. Remove reservation for control VQ since it multi queue is not supported in this version
8. Avoid call put_device() since this is not a pci device driver.

 drivers/vdpa/Kconfig              |   10 +
 drivers/vdpa/mlx5/Makefile        |    5 +-
 drivers/vdpa/mlx5/core/mr.c       |    2 +-
 drivers/vdpa/mlx5/net/main.c      |   76 ++
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 1950 +++++++++++++++++++++++++++++
 drivers/vdpa/mlx5/net/mlx5_vnet.h |   24 +
 6 files changed, 2065 insertions(+), 2 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/net/main.c
 create mode 100644 drivers/vdpa/mlx5/net/mlx5_vnet.c
 create mode 100644 drivers/vdpa/mlx5/net/mlx5_vnet.h

diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index 48a1a776dd86..809cb4c2eecf 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -36,4 +36,14 @@ config MLX5_VDPA
 	  Support library for Mellanox VDPA drivers. Provides code that is
 	  common for all types of VDPA drivers.
 
+config MLX5_VDPA_NET
+	tristate "vDPA driver for ConnectX devices"
+	depends on MLX5_VDPA
+	default n
+	help
+	  VDPA network driver for ConnectX6 and newer. Provides offloading
+	  of virtio net datapath such that descriptors put on the ring will
+	  be executed by the hardware. It also supports a variety of stateless
+	  offloads depending on the actual device used and firmware version.
+
 endif # VDPA
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
index b347c62032ea..3f8850c1a300 100644
--- a/drivers/vdpa/mlx5/Makefile
+++ b/drivers/vdpa/mlx5/Makefile
@@ -1 +1,4 @@
-obj-$(CONFIG_MLX5_VDPA) += core/resources.o core/mr.o
+subdir-ccflags-y += -I$(src)/core
+
+obj-$(CONFIG_MLX5_VDPA_NET) += mlx5_vdpa.o
+mlx5_vdpa-$(CONFIG_MLX5_VDPA_NET) += net/main.o net/mlx5_vnet.o core/resources.o core/mr.o
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 975aa45fd78b..34e3bbb80df8 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -453,7 +453,7 @@ int mlx5_vdpa_handle_set_map(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *io
 			     bool *change_map)
 {
 	struct mlx5_vdpa_mr *mr = &mvdev->mr;
-	int err;
+	int err = 0;
 
 	*change_map = false;
 	if (map_empty(iotlb)) {
diff --git a/drivers/vdpa/mlx5/net/main.c b/drivers/vdpa/mlx5/net/main.c
new file mode 100644
index 000000000000..838cd98386ff
--- /dev/null
+++ b/drivers/vdpa/mlx5/net/main.c
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2020 Mellanox Technologies Ltd. */
+
+#include <linux/module.h>
+#include <linux/mlx5/driver.h>
+#include <linux/mlx5/device.h>
+#include "mlx5_vdpa_ifc.h"
+#include "mlx5_vnet.h"
+
+MODULE_AUTHOR("Eli Cohen <eli@mellanox.com>");
+MODULE_DESCRIPTION("Mellanox VDPA driver");
+MODULE_LICENSE("Dual BSD/GPL");
+
+static bool required_caps_supported(struct mlx5_core_dev *mdev)
+{
+	u8 event_mode;
+	u64 got;
+
+	got = MLX5_CAP_GEN_64(mdev, general_obj_types);
+
+	if (!(got & MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q))
+		return false;
+
+	event_mode = MLX5_CAP_DEV_VDPA_EMULATION(mdev, event_mode);
+	if (!(event_mode & MLX5_VIRTIO_Q_EVENT_MODE_QP_MODE))
+		return false;
+
+	if (!MLX5_CAP_DEV_VDPA_EMULATION(mdev, eth_frame_offload_type))
+		return false;
+
+	return true;
+}
+
+static void *mlx5_vdpa_add(struct mlx5_core_dev *mdev)
+{
+	struct mlx5_vdpa_dev *vdev;
+
+	if (mlx5_core_is_pf(mdev))
+		return NULL;
+
+	if (!required_caps_supported(mdev)) {
+		dev_info(mdev->device, "virtio net emulation not supported\n");
+		return NULL;
+	}
+	vdev = mlx5_vdpa_add_dev(mdev);
+	if (IS_ERR(vdev))
+		return NULL;
+
+	return vdev;
+}
+
+static void mlx5_vdpa_remove(struct mlx5_core_dev *mdev, void *context)
+{
+	struct mlx5_vdpa_dev *vdev = context;
+
+	mlx5_vdpa_remove_dev(vdev);
+}
+
+static struct mlx5_interface mlx5_vdpa_interface = {
+	.add = mlx5_vdpa_add,
+	.remove = mlx5_vdpa_remove,
+	.protocol = MLX5_INTERFACE_PROTOCOL_VDPA,
+};
+
+static int __init mlx5_vdpa_init(void)
+{
+	return mlx5_register_interface(&mlx5_vdpa_interface);
+}
+
+static void __exit mlx5_vdpa_exit(void)
+{
+	mlx5_unregister_interface(&mlx5_vdpa_interface);
+}
+
+module_init(mlx5_vdpa_init);
+module_exit(mlx5_vdpa_exit);
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
new file mode 100644
index 000000000000..601ed4b6a029
--- /dev/null
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -0,0 +1,1950 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2020 Mellanox Technologies Ltd. */
+
+#include <linux/vdpa.h>
+#include <uapi/linux/virtio_ids.h>
+#include <linux/virtio_config.h>
+#include <linux/mlx5/qp.h>
+#include <linux/mlx5/device.h>
+#include <linux/mlx5/vport.h>
+#include <linux/mlx5/fs.h>
+#include <linux/mlx5/device.h>
+#include "mlx5_vnet.h"
+#include "mlx5_vdpa_ifc.h"
+#include "mlx5_vdpa.h"
+
+#define to_mvdev(__vdev) container_of((__vdev), struct mlx5_vdpa_dev, vdev)
+
+#define VALID_FEATURES_MASK                                                                        \
+	(BIT(VIRTIO_NET_F_CSUM) | BIT(VIRTIO_NET_F_GUEST_CSUM) |                                   \
+	 BIT(VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) | BIT(VIRTIO_NET_F_MTU) | BIT(VIRTIO_NET_F_MAC) |   \
+	 BIT(VIRTIO_NET_F_GUEST_TSO4) | BIT(VIRTIO_NET_F_GUEST_TSO6) |                             \
+	 BIT(VIRTIO_NET_F_GUEST_ECN) | BIT(VIRTIO_NET_F_GUEST_UFO) | BIT(VIRTIO_NET_F_HOST_TSO4) | \
+	 BIT(VIRTIO_NET_F_HOST_TSO6) | BIT(VIRTIO_NET_F_HOST_ECN) | BIT(VIRTIO_NET_F_HOST_UFO) |   \
+	 BIT(VIRTIO_NET_F_MRG_RXBUF) | BIT(VIRTIO_NET_F_STATUS) | BIT(VIRTIO_NET_F_CTRL_VQ) |      \
+	 BIT(VIRTIO_NET_F_CTRL_RX) | BIT(VIRTIO_NET_F_CTRL_VLAN) |                                 \
+	 BIT(VIRTIO_NET_F_CTRL_RX_EXTRA) | BIT(VIRTIO_NET_F_GUEST_ANNOUNCE) |                      \
+	 BIT(VIRTIO_NET_F_MQ) | BIT(VIRTIO_NET_F_CTRL_MAC_ADDR) | BIT(VIRTIO_NET_F_HASH_REPORT) |  \
+	 BIT(VIRTIO_NET_F_RSS) | BIT(VIRTIO_NET_F_RSC_EXT) | BIT(VIRTIO_NET_F_STANDBY) |           \
+	 BIT(VIRTIO_NET_F_SPEED_DUPLEX) | BIT(VIRTIO_F_NOTIFY_ON_EMPTY) |                          \
+	 BIT(VIRTIO_F_ANY_LAYOUT) | BIT(VIRTIO_F_VERSION_1) | BIT(VIRTIO_F_IOMMU_PLATFORM) |       \
+	 BIT(VIRTIO_F_RING_PACKED) | BIT(VIRTIO_F_ORDER_PLATFORM) | BIT(VIRTIO_F_SR_IOV))
+
+#define VALID_STATUS_MASK                                                                          \
+	(VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER | VIRTIO_CONFIG_S_DRIVER_OK |        \
+	 VIRTIO_CONFIG_S_FEATURES_OK | VIRTIO_CONFIG_S_NEEDS_RESET | VIRTIO_CONFIG_S_FAILED)
+
+struct mlx5_vdpa_net_resources {
+	u32 tisn;
+	u32 tdn;
+	u32 tirn;
+	u32 rqtn;
+	bool valid;
+};
+
+struct mlx5_vdpa_cq_buf {
+	struct mlx5_frag_buf_ctrl fbc;
+	struct mlx5_frag_buf frag_buf;
+	int cqe_size;
+	int nent;
+};
+
+struct mlx5_vdpa_cq {
+	struct mlx5_core_cq mcq;
+	struct mlx5_vdpa_cq_buf buf;
+	struct mlx5_db db;
+	int cqe;
+};
+
+struct mlx5_vdpa_umem {
+	struct mlx5_frag_buf_ctrl fbc;
+	struct mlx5_frag_buf frag_buf;
+	int size;
+	u32 id;
+};
+
+struct mlx5_vdpa_qp {
+	struct mlx5_core_qp mqp;
+	struct mlx5_frag_buf frag_buf;
+	struct mlx5_db db;
+	u16 head;
+	bool fw;
+};
+
+struct mlx5_vq_restore_info {
+	u32 num_ent;
+	u64 desc_addr;
+	u64 device_addr;
+	u64 driver_addr;
+	u16 avail_index;
+	bool ready;
+	struct vdpa_callback cb;
+	bool restore;
+};
+
+struct mlx5_vdpa_virtqueue {
+	bool ready;
+	u64 desc_addr;
+	u64 device_addr;
+	u64 driver_addr;
+	u32 num_ent;
+	struct vdpa_callback event_cb;
+
+	/* Resources for implementing the notification channel from the device
+	 * to the driver. fwqp is the firmware end of an RC connection; the
+	 * other end is vqqp used by the driver. cq is is where completions are
+	 * reported.
+	 */
+	struct mlx5_vdpa_cq cq;
+	struct mlx5_vdpa_qp fwqp;
+	struct mlx5_vdpa_qp vqqp;
+
+	/* umem resources are required for the virtqueue operation. They're use
+	 * is internal and they must be provided by the driver.
+	 */
+	struct mlx5_vdpa_umem umem1;
+	struct mlx5_vdpa_umem umem2;
+	struct mlx5_vdpa_umem umem3;
+
+	bool initialized;
+	int index;
+	u32 virtq_id;
+	struct mlx5_vdpa_net *ndev;
+	u16 avail_idx;
+	int fw_state;
+
+	/* keep last in the struct */
+	struct mlx5_vq_restore_info ri;
+};
+
+/* We will remove this limitation once mlx5_vdpa_alloc_resources()
+ * provides for driver space allocation
+ */
+#define MLX5_MAX_SUPPORTED_VQS 16
+
+struct mlx5_vdpa_net {
+	struct mlx5_vdpa_dev mvdev;
+	struct mlx5_vdpa_net_resources res;
+	struct virtio_net_config config;
+	struct mlx5_vdpa_virtqueue vqs[MLX5_MAX_SUPPORTED_VQS];
+
+	/* Serialize vq resources creation and destruction. This is required
+	 * since memory map might change and we need to destroy and create
+	 * resources while driver in operational.
+	 */
+	struct mutex reslock;
+	struct mlx5_flow_table *rxft;
+	struct mlx5_fc *rx_counter;
+	struct mlx5_flow_handle *rx_rule;
+	bool setup;
+};
+
+static void free_resources(struct mlx5_vdpa_net *ndev);
+static void init_mvqs(struct mlx5_vdpa_net *ndev);
+static int setup_driver(struct mlx5_vdpa_net *ndev);
+static void teardown_driver(struct mlx5_vdpa_net *ndev);
+
+static bool mlx5_vdpa_debug;
+
+#define MLX5_LOG_VIO_FLAG(_feature)                                                                \
+	do {                                                                                       \
+		if (features & BIT(_feature))                                                      \
+			mlx5_vdpa_info(mvdev, "%s\n", #_feature);                                  \
+	} while (0)
+
+#define MLX5_LOG_VIO_STAT(_status)                                                                 \
+	do {                                                                                       \
+		if (status & (_status))                                                            \
+			mlx5_vdpa_info(mvdev, "%s\n", #_status);                                   \
+	} while (0)
+
+static void print_status(struct mlx5_vdpa_dev *mvdev, u8 status, bool set)
+{
+	if (status & ~VALID_STATUS_MASK)
+		mlx5_vdpa_warn(mvdev, "Warning: there are invalid status bits 0x%x\n",
+			       status & ~VALID_STATUS_MASK);
+
+	if (!mlx5_vdpa_debug)
+		return;
+
+	mlx5_vdpa_info(mvdev, "driver status %s", set ? "set" : "get");
+	if (set && !status) {
+		mlx5_vdpa_info(mvdev, "driver resets the device\n");
+		return;
+	}
+
+	MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_ACKNOWLEDGE);
+	MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_DRIVER);
+	MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_DRIVER_OK);
+	MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_FEATURES_OK);
+	MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_NEEDS_RESET);
+	MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_FAILED);
+}
+
+static void print_features(struct mlx5_vdpa_dev *mvdev, u64 features, bool set)
+{
+	if (features & ~VALID_FEATURES_MASK)
+		mlx5_vdpa_warn(mvdev, "There are invalid feature bits 0x%llx\n",
+			       features & ~VALID_FEATURES_MASK);
+
+	if (!mlx5_vdpa_debug)
+		return;
+
+	mlx5_vdpa_info(mvdev, "driver %s feature bits:\n", set ? "sets" : "reads");
+	if (!features)
+		mlx5_vdpa_info(mvdev, "all feature bits are cleared\n");
+
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CSUM);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_CSUM);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_GUEST_OFFLOADS);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_MTU);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_MAC);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_TSO4);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_TSO6);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_ECN);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_UFO);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_HOST_TSO4);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_HOST_TSO6);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_HOST_ECN);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_HOST_UFO);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_MRG_RXBUF);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_STATUS);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_VQ);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_RX);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_VLAN);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_RX_EXTRA);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_ANNOUNCE);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_MQ);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_MAC_ADDR);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_HASH_REPORT);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_RSS);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_RSC_EXT);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_STANDBY);
+	MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_SPEED_DUPLEX);
+	MLX5_LOG_VIO_FLAG(VIRTIO_F_NOTIFY_ON_EMPTY);
+	MLX5_LOG_VIO_FLAG(VIRTIO_F_ANY_LAYOUT);
+	MLX5_LOG_VIO_FLAG(VIRTIO_F_VERSION_1);
+	MLX5_LOG_VIO_FLAG(VIRTIO_F_IOMMU_PLATFORM);
+	MLX5_LOG_VIO_FLAG(VIRTIO_F_RING_PACKED);
+	MLX5_LOG_VIO_FLAG(VIRTIO_F_ORDER_PLATFORM);
+	MLX5_LOG_VIO_FLAG(VIRTIO_F_SR_IOV);
+}
+
+static int create_tis(struct mlx5_vdpa_net *ndev)
+{
+	struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
+	u32 in[MLX5_ST_SZ_DW(create_tis_in)] = {};
+	void *tisc;
+	int err;
+
+	tisc = MLX5_ADDR_OF(create_tis_in, in, ctx);
+	MLX5_SET(tisc, tisc, transport_domain, ndev->res.tdn);
+	err = mlx5_vdpa_create_tis(mvdev, in, &ndev->res.tisn);
+	if (err)
+		mlx5_vdpa_warn(mvdev, "create TIS (%d)\n", err);
+
+	return err;
+}
+
+static void destroy_tis(struct mlx5_vdpa_net *ndev)
+{
+	mlx5_vdpa_destroy_tis(&ndev->mvdev, ndev->res.tisn);
+}
+
+#define MLX5_VDPA_CQE_SIZE 64
+#define MLX5_VDPA_LOG_CQE_SIZE ilog2(MLX5_VDPA_CQE_SIZE)
+
+static int cq_frag_buf_alloc(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_cq_buf *buf, int nent)
+{
+	struct mlx5_frag_buf *frag_buf = &buf->frag_buf;
+	u8 log_wq_stride = MLX5_VDPA_LOG_CQE_SIZE;
+	u8 log_wq_sz = MLX5_VDPA_LOG_CQE_SIZE;
+	int err;
+
+	err = mlx5_frag_buf_alloc_node(ndev->mvdev.mdev, nent * MLX5_VDPA_CQE_SIZE, frag_buf,
+				       ndev->mvdev.mdev->priv.numa_node);
+	if (err)
+		return err;
+
+	mlx5_init_fbc(frag_buf->frags, log_wq_stride, log_wq_sz, &buf->fbc);
+
+	buf->cqe_size = MLX5_VDPA_CQE_SIZE;
+	buf->nent = nent;
+
+	return 0;
+}
+
+static int umem_frag_buf_alloc(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_umem *umem, int size)
+{
+	struct mlx5_frag_buf *frag_buf = &umem->frag_buf;
+
+	return mlx5_frag_buf_alloc_node(ndev->mvdev.mdev, size, frag_buf,
+					ndev->mvdev.mdev->priv.numa_node);
+}
+
+static void cq_frag_buf_free(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_cq_buf *buf)
+{
+	mlx5_frag_buf_free(ndev->mvdev.mdev, &buf->frag_buf);
+}
+
+static void *get_cqe(struct mlx5_vdpa_cq *vcq, int n)
+{
+	return mlx5_frag_buf_get_wqe(&vcq->buf.fbc, n);
+}
+
+static void cq_frag_buf_init(struct mlx5_vdpa_cq *vcq, struct mlx5_vdpa_cq_buf *buf)
+{
+	struct mlx5_cqe64 *cqe64;
+	void *cqe;
+	int i;
+
+	for (i = 0; i < buf->nent; i++) {
+		cqe = get_cqe(vcq, i);
+		cqe64 = cqe;
+		cqe64->op_own = MLX5_CQE_INVALID << 4;
+	}
+}
+
+static void *get_sw_cqe(struct mlx5_vdpa_cq *cq, int n)
+{
+	struct mlx5_cqe64 *cqe64 = get_cqe(cq, n & (cq->cqe - 1));
+
+	if (likely(get_cqe_opcode(cqe64) != MLX5_CQE_INVALID) &&
+	    !((cqe64->op_own & MLX5_CQE_OWNER_MASK) ^ !!(n & cq->cqe)))
+		return cqe64;
+
+	return NULL;
+}
+
+static void rx_post(struct mlx5_vdpa_qp *vqp, int n)
+{
+	vqp->head += n;
+	vqp->db.db[0] = cpu_to_be32(vqp->head);
+}
+
+static void qp_prepare(struct mlx5_vdpa_net *ndev, bool fw, void *in,
+		       struct mlx5_vdpa_virtqueue *mvq, u32 num_ent)
+{
+	struct mlx5_vdpa_qp *vqp;
+	__be64 *pas;
+	void *qpc;
+
+	vqp = fw ? &mvq->fwqp : &mvq->vqqp;
+	MLX5_SET(create_qp_in, in, uid, ndev->mvdev.res.uid);
+	qpc = MLX5_ADDR_OF(create_qp_in, in, qpc);
+	if (vqp->fw) {
+		/* Firmware QP is allocated by the driver for the firmware's
+		 * use so we can skip part of the params as they will be chosen by firmware
+		 */
+		qpc = MLX5_ADDR_OF(create_qp_in, in, qpc);
+		MLX5_SET(qpc, qpc, rq_type, MLX5_ZERO_LEN_RQ);
+		MLX5_SET(qpc, qpc, no_sq, 1);
+		return;
+	}
+
+	MLX5_SET(qpc, qpc, st, MLX5_QP_ST_RC);
+	MLX5_SET(qpc, qpc, pm_state, MLX5_QP_PM_MIGRATED);
+	MLX5_SET(qpc, qpc, pd, ndev->mvdev.res.pdn);
+	MLX5_SET(qpc, qpc, mtu, MLX5_QPC_MTU_256_BYTES);
+	MLX5_SET(qpc, qpc, uar_page, ndev->mvdev.res.uar->index);
+	MLX5_SET(qpc, qpc, log_page_size, vqp->frag_buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
+	MLX5_SET(qpc, qpc, no_sq, 1);
+	MLX5_SET(qpc, qpc, cqn_rcv, mvq->cq.mcq.cqn);
+	MLX5_SET(qpc, qpc, log_rq_size, ilog2(num_ent));
+	MLX5_SET(qpc, qpc, rq_type, MLX5_NON_ZERO_RQ);
+	pas = (__be64 *)MLX5_ADDR_OF(create_qp_in, in, pas);
+	mlx5_fill_page_frag_array(&vqp->frag_buf, pas);
+}
+
+static int rq_buf_alloc(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_qp *vqp, u32 num_ent)
+{
+	return mlx5_frag_buf_alloc_node(ndev->mvdev.mdev,
+					num_ent * sizeof(struct mlx5_wqe_data_seg), &vqp->frag_buf,
+					ndev->mvdev.mdev->priv.numa_node);
+}
+
+static void rq_buf_free(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_qp *vqp)
+{
+	mlx5_frag_buf_free(ndev->mvdev.mdev, &vqp->frag_buf);
+}
+
+static int qp_create(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq,
+		     struct mlx5_vdpa_qp *vqp)
+{
+	struct mlx5_core_dev *mdev = ndev->mvdev.mdev;
+	int inlen = MLX5_ST_SZ_BYTES(create_qp_in);
+	u32 out[MLX5_ST_SZ_DW(create_qp_out)] = {};
+	void *qpc;
+	void *in;
+	int err;
+
+	if (!vqp->fw) {
+		vqp = &mvq->vqqp;
+		err = rq_buf_alloc(ndev, vqp, mvq->num_ent);
+		if (err)
+			return err;
+
+		err = mlx5_db_alloc(ndev->mvdev.mdev, &vqp->db);
+		if (err)
+			goto err_db;
+		inlen += vqp->frag_buf.npages * sizeof(__be64);
+	}
+
+	in = kzalloc(inlen, GFP_KERNEL);
+	if (!in) {
+		err = -ENOMEM;
+		goto err_kzalloc;
+	}
+
+	qp_prepare(ndev, vqp->fw, in, mvq, mvq->num_ent);
+	qpc = MLX5_ADDR_OF(create_qp_in, in, qpc);
+	MLX5_SET(qpc, qpc, st, MLX5_QP_ST_RC);
+	MLX5_SET(qpc, qpc, pm_state, MLX5_QP_PM_MIGRATED);
+	MLX5_SET(qpc, qpc, pd, ndev->mvdev.res.pdn);
+	MLX5_SET(qpc, qpc, mtu, MLX5_QPC_MTU_256_BYTES);
+	if (!vqp->fw)
+		MLX5_SET64(qpc, qpc, dbr_addr, vqp->db.dma);
+	MLX5_SET(create_qp_in, in, opcode, MLX5_CMD_OP_CREATE_QP);
+	err = mlx5_cmd_exec(mdev, in, inlen, out, sizeof(out));
+	kfree(in);
+	if (err)
+		goto err_kzalloc;
+
+	vqp->mqp.uid = ndev->mvdev.res.uid;
+	vqp->mqp.qpn = MLX5_GET(create_qp_out, out, qpn);
+
+	if (!vqp->fw)
+		rx_post(vqp, mvq->num_ent);
+
+	return 0;
+
+err_kzalloc:
+	if (!vqp->fw)
+		mlx5_db_free(ndev->mvdev.mdev, &vqp->db);
+err_db:
+	if (!vqp->fw)
+		rq_buf_free(ndev, vqp);
+
+	return err;
+}
+
+static void qp_destroy(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_qp *vqp)
+{
+	u32 in[MLX5_ST_SZ_DW(destroy_qp_in)] = {};
+
+	MLX5_SET(destroy_qp_in, in, opcode, MLX5_CMD_OP_DESTROY_QP);
+	MLX5_SET(destroy_qp_in, in, qpn, vqp->mqp.qpn);
+	MLX5_SET(destroy_qp_in, in, uid, ndev->mvdev.res.uid);
+	if (mlx5_cmd_exec_in(ndev->mvdev.mdev, destroy_qp, in))
+		mlx5_vdpa_warn(&ndev->mvdev, "destroy qp 0x%x\n", vqp->mqp.qpn);
+	if (!vqp->fw) {
+		mlx5_db_free(ndev->mvdev.mdev, &vqp->db);
+		rq_buf_free(ndev, vqp);
+	}
+}
+
+static void *next_cqe_sw(struct mlx5_vdpa_cq *cq)
+{
+	return get_sw_cqe(cq, cq->mcq.cons_index);
+}
+
+static int mlx5_vdpa_poll_one(struct mlx5_vdpa_cq *vcq)
+{
+	struct mlx5_cqe64 *cqe64;
+
+	cqe64 = next_cqe_sw(vcq);
+	if (!cqe64)
+		return -EAGAIN;
+
+	vcq->mcq.cons_index++;
+	return 0;
+}
+
+static void mlx5_vdpa_handle_completions(struct mlx5_vdpa_virtqueue *mvq, int num)
+{
+	mlx5_cq_set_ci(&mvq->cq.mcq);
+	rx_post(&mvq->vqqp, num);
+	if (mvq->event_cb.callback)
+		mvq->event_cb.callback(mvq->event_cb.private);
+}
+
+static void mlx5_vdpa_cq_comp(struct mlx5_core_cq *mcq, struct mlx5_eqe *eqe)
+{
+	struct mlx5_vdpa_virtqueue *mvq = container_of(mcq, struct mlx5_vdpa_virtqueue, cq.mcq);
+	struct mlx5_vdpa_net *ndev = mvq->ndev;
+	void __iomem *uar_page = ndev->mvdev.res.uar->map;
+	int num = 0;
+
+	while (!mlx5_vdpa_poll_one(&mvq->cq)) {
+		num++;
+		if (num > mvq->num_ent / 2) {
+			/* If completions keep coming while we poll, we want to
+			 * let the hardware know that we consumed them by
+			 * updating the doorbell record.  We also let vdpa core
+			 * know about this so it passes it on the virtio driver
+			 * on the guest.
+			 */
+			mlx5_vdpa_handle_completions(mvq, num);
+			num = 0;
+		}
+	}
+
+	if (num)
+		mlx5_vdpa_handle_completions(mvq, num);
+
+	mlx5_cq_arm(&mvq->cq.mcq, MLX5_CQ_DB_REQ_NOT, uar_page, mvq->cq.mcq.cons_index);
+}
+
+static int cq_create(struct mlx5_vdpa_net *ndev, u16 idx, u32 num_ent)
+{
+	struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+	struct mlx5_core_dev *mdev = ndev->mvdev.mdev;
+	void __iomem *uar_page = ndev->mvdev.res.uar->map;
+	u32 out[MLX5_ST_SZ_DW(create_cq_out)];
+	struct mlx5_vdpa_cq *vcq = &mvq->cq;
+	unsigned int irqn;
+	__be64 *pas;
+	int inlen;
+	void *cqc;
+	void *in;
+	int err;
+	int eqn;
+
+	err = mlx5_db_alloc(mdev, &vcq->db);
+	if (err)
+		return err;
+
+	vcq->mcq.set_ci_db = vcq->db.db;
+	vcq->mcq.arm_db = vcq->db.db + 1;
+	vcq->mcq.cqe_sz = 64;
+
+	err = cq_frag_buf_alloc(ndev, &vcq->buf, num_ent);
+	if (err)
+		goto err_db;
+
+	cq_frag_buf_init(vcq, &vcq->buf);
+
+	inlen = MLX5_ST_SZ_BYTES(create_cq_in) +
+		MLX5_FLD_SZ_BYTES(create_cq_in, pas[0]) * vcq->buf.frag_buf.npages;
+	in = kzalloc(inlen, GFP_KERNEL);
+	if (!in) {
+		err = -ENOMEM;
+		goto err_vzalloc;
+	}
+
+	MLX5_SET(create_cq_in, in, uid, ndev->mvdev.res.uid);
+	pas = (__be64 *)MLX5_ADDR_OF(create_cq_in, in, pas);
+	mlx5_fill_page_frag_array(&vcq->buf.frag_buf, pas);
+
+	cqc = MLX5_ADDR_OF(create_cq_in, in, cq_context);
+	MLX5_SET(cqc, cqc, log_page_size, vcq->buf.frag_buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
+
+	/* Use vector 0 by default. Consider adding code to choose least used
+	 * vector.
+	 */
+	err = mlx5_vector2eqn(mdev, 0, &eqn, &irqn);
+	if (err)
+		goto err_vec;
+
+	cqc = MLX5_ADDR_OF(create_cq_in, in, cq_context);
+	MLX5_SET(cqc, cqc, log_cq_size, ilog2(num_ent));
+	MLX5_SET(cqc, cqc, uar_page, ndev->mvdev.res.uar->index);
+	MLX5_SET(cqc, cqc, c_eqn, eqn);
+	MLX5_SET64(cqc, cqc, dbr_addr, vcq->db.dma);
+
+	err = mlx5_core_create_cq(mdev, &vcq->mcq, in, inlen, out, sizeof(out));
+	if (err)
+		goto err_vec;
+
+	vcq->mcq.comp = mlx5_vdpa_cq_comp;
+	vcq->cqe = num_ent;
+	vcq->mcq.set_ci_db = vcq->db.db;
+	vcq->mcq.arm_db = vcq->db.db + 1;
+	mlx5_cq_arm(&mvq->cq.mcq, MLX5_CQ_DB_REQ_NOT, uar_page, mvq->cq.mcq.cons_index);
+	kfree(in);
+	return 0;
+
+err_vec:
+	kfree(in);
+err_vzalloc:
+	cq_frag_buf_free(ndev, &vcq->buf);
+err_db:
+	mlx5_db_free(ndev->mvdev.mdev, &vcq->db);
+	return err;
+}
+
+static void cq_destroy(struct mlx5_vdpa_net *ndev, u16 idx)
+{
+	struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+	struct mlx5_core_dev *mdev = ndev->mvdev.mdev;
+	struct mlx5_vdpa_cq *vcq = &mvq->cq;
+
+	if (mlx5_core_destroy_cq(mdev, &vcq->mcq)) {
+		mlx5_vdpa_warn(&ndev->mvdev, "destroy CQ 0x%x\n", vcq->mcq.cqn);
+		return;
+	}
+	cq_frag_buf_free(ndev, &vcq->buf);
+	mlx5_db_free(ndev->mvdev.mdev, &vcq->db);
+}
+
+static int umem_size(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, int num,
+		     struct mlx5_vdpa_umem **umemp)
+{
+	struct mlx5_core_dev *mdev = ndev->mvdev.mdev;
+	int p_a;
+	int p_b;
+
+	switch (num) {
+	case 1:
+		p_a = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_1_buffer_param_a);
+		p_b = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_1_buffer_param_b);
+		*umemp = &mvq->umem1;
+		break;
+	case 2:
+		p_a = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_2_buffer_param_a);
+		p_b = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_2_buffer_param_b);
+		*umemp = &mvq->umem2;
+		break;
+	case 3:
+		p_a = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_3_buffer_param_a);
+		p_b = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_3_buffer_param_b);
+		*umemp = &mvq->umem3;
+		break;
+	}
+	return p_a * mvq->num_ent + p_b;
+}
+
+static void umem_frag_buf_free(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_umem *umem)
+{
+	mlx5_frag_buf_free(ndev->mvdev.mdev, &umem->frag_buf);
+}
+
+static int create_umem(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, int num)
+{
+	int inlen;
+	u32 out[MLX5_ST_SZ_DW(create_umem_out)] = {};
+	void *um;
+	void *in;
+	int err;
+	__be64 *pas;
+	int size;
+	struct mlx5_vdpa_umem *umem;
+
+	size = umem_size(ndev, mvq, num, &umem);
+	if (size < 0)
+		return size;
+
+	umem->size = size;
+	err = umem_frag_buf_alloc(ndev, umem, size);
+	if (err)
+		return err;
+
+	inlen = MLX5_ST_SZ_BYTES(create_umem_in) + MLX5_ST_SZ_BYTES(mtt) * umem->frag_buf.npages;
+
+	in = kzalloc(inlen, GFP_KERNEL);
+	if (!in) {
+		err = -ENOMEM;
+		goto err_in;
+	}
+
+	MLX5_SET(create_umem_in, in, opcode, MLX5_CMD_OP_CREATE_UMEM);
+	MLX5_SET(create_umem_in, in, uid, ndev->mvdev.res.uid);
+	um = MLX5_ADDR_OF(create_umem_in, in, umem);
+	MLX5_SET(umem, um, log_page_size, umem->frag_buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
+	MLX5_SET64(umem, um, num_of_mtt, umem->frag_buf.npages);
+
+	pas = (__be64 *)MLX5_ADDR_OF(umem, um, mtt[0]);
+	mlx5_fill_page_frag_array_perm(&umem->frag_buf, pas, MLX5_MTT_PERM_RW);
+
+	err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
+	if (err) {
+		mlx5_vdpa_warn(&ndev->mvdev, "create umem(%d)\n", err);
+		goto err_cmd;
+	}
+
+	kfree(in);
+	umem->id = MLX5_GET(create_umem_out, out, umem_id);
+
+	return 0;
+
+err_cmd:
+	kfree(in);
+err_in:
+	umem_frag_buf_free(ndev, umem);
+	return err;
+}
+
+static void umem_destroy(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, int num)
+{
+	u32 in[MLX5_ST_SZ_DW(destroy_umem_in)] = {};
+	u32 out[MLX5_ST_SZ_DW(destroy_umem_out)] = {};
+	struct mlx5_vdpa_umem *umem;
+
+	switch (num) {
+	case 1:
+		umem = &mvq->umem1;
+		break;
+	case 2:
+		umem = &mvq->umem2;
+		break;
+	case 3:
+		umem = &mvq->umem3;
+		break;
+	}
+
+	MLX5_SET(destroy_umem_in, in, opcode, MLX5_CMD_OP_DESTROY_UMEM);
+	MLX5_SET(destroy_umem_in, in, umem_id, umem->id);
+	if (mlx5_cmd_exec(ndev->mvdev.mdev, in, sizeof(in), out, sizeof(out)))
+		return;
+
+	umem_frag_buf_free(ndev, umem);
+}
+
+static int umems_create(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+	int num;
+	int err;
+
+	for (num = 1; num <= 3; num++) {
+		err = create_umem(ndev, mvq, num);
+		if (err)
+			goto err_umem;
+	}
+	return 0;
+
+err_umem:
+	for (num--; num > 0; num--)
+		umem_destroy(ndev, mvq, num);
+
+	return err;
+}
+
+static void umems_destroy(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+	int num;
+
+	for (num = 3; num > 0; num--)
+		umem_destroy(ndev, mvq, num);
+}
+
+static int get_queue_type(struct mlx5_vdpa_net *ndev)
+{
+	u32 type_mask;
+
+	type_mask = MLX5_CAP_DEV_VDPA_EMULATION(ndev->mvdev.mdev, virtio_queue_type);
+
+	/* prefer split queue */
+	if (type_mask & MLX5_VIRTIO_EMULATION_CAP_VIRTIO_QUEUE_TYPE_PACKED)
+		return MLX5_VIRTIO_EMULATION_VIRTIO_QUEUE_TYPE_PACKED;
+
+	WARN_ON(!(type_mask & MLX5_VIRTIO_EMULATION_CAP_VIRTIO_QUEUE_TYPE_SPLIT));
+
+	return MLX5_VIRTIO_EMULATION_VIRTIO_QUEUE_TYPE_SPLIT;
+}
+
+static bool vq_is_tx(u16 idx)
+{
+	return idx % 2;
+}
+
+static u16 get_features_12_3(u64 features)
+{
+	return (!!(features & BIT(VIRTIO_NET_F_HOST_TSO4)) << 9) |
+	       (!!(features & BIT(VIRTIO_NET_F_HOST_TSO6)) << 8) |
+	       (!!(features & BIT(VIRTIO_NET_F_CSUM)) << 7) |
+	       (!!(features & BIT(VIRTIO_NET_F_GUEST_CSUM)) << 6);
+}
+
+static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+	int inlen = MLX5_ST_SZ_BYTES(create_virtio_net_q_in);
+	u32 out[MLX5_ST_SZ_DW(create_virtio_net_q_out)] = {};
+	void *obj_context;
+	void *cmd_hdr;
+	void *vq_ctx;
+	void *in;
+	int err;
+
+	err = umems_create(ndev, mvq);
+	if (err)
+		return err;
+
+	in = kzalloc(inlen, GFP_KERNEL);
+	if (!in) {
+		err = -ENOMEM;
+		goto err_alloc;
+	}
+
+	cmd_hdr = MLX5_ADDR_OF(create_virtio_net_q_in, in, general_obj_in_cmd_hdr);
+
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, opcode, MLX5_CMD_OP_CREATE_GENERAL_OBJECT);
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_type, MLX5_OBJ_TYPE_VIRTIO_NET_Q);
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
+
+	obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context);
+	MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx);
+	MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3,
+		 get_features_12_3(ndev->mvdev.actual_features));
+	vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context);
+	MLX5_SET(virtio_q, vq_ctx, virtio_q_type, get_queue_type(ndev));
+
+	if (vq_is_tx(mvq->index))
+		MLX5_SET(virtio_net_q_object, obj_context, tisn_or_qpn, ndev->res.tisn);
+
+	MLX5_SET(virtio_q, vq_ctx, event_mode, MLX5_VIRTIO_Q_EVENT_MODE_QP_MODE);
+	MLX5_SET(virtio_q, vq_ctx, queue_index, mvq->index);
+	MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, mvq->fwqp.mqp.qpn);
+	MLX5_SET(virtio_q, vq_ctx, queue_size, mvq->num_ent);
+	MLX5_SET(virtio_q, vq_ctx, virtio_version_1_0,
+		 !!(ndev->mvdev.actual_features & VIRTIO_F_VERSION_1));
+	MLX5_SET64(virtio_q, vq_ctx, desc_addr, mvq->desc_addr);
+	MLX5_SET64(virtio_q, vq_ctx, used_addr, mvq->device_addr);
+	MLX5_SET64(virtio_q, vq_ctx, available_addr, mvq->driver_addr);
+	MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, ndev->mvdev.mr.mkey.key);
+	MLX5_SET(virtio_q, vq_ctx, umem_1_id, mvq->umem1.id);
+	MLX5_SET(virtio_q, vq_ctx, umem_1_size, mvq->umem1.size);
+	MLX5_SET(virtio_q, vq_ctx, umem_2_id, mvq->umem2.id);
+	MLX5_SET(virtio_q, vq_ctx, umem_2_size, mvq->umem1.size);
+	MLX5_SET(virtio_q, vq_ctx, umem_3_id, mvq->umem3.id);
+	MLX5_SET(virtio_q, vq_ctx, umem_3_size, mvq->umem1.size);
+	MLX5_SET(virtio_q, vq_ctx, pd, ndev->mvdev.res.pdn);
+	if (MLX5_CAP_DEV_VDPA_EMULATION(ndev->mvdev.mdev, eth_frame_offload_type))
+		MLX5_SET(virtio_q, vq_ctx, virtio_version_1_0, 1);
+
+	err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
+	if (err)
+		goto err_cmd;
+
+	kfree(in);
+	mvq->virtq_id = MLX5_GET(general_obj_out_cmd_hdr, out, obj_id);
+
+	return 0;
+
+err_cmd:
+	kfree(in);
+err_alloc:
+	umems_destroy(ndev, mvq);
+	return err;
+}
+
+static void destroy_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+	u32 in[MLX5_ST_SZ_DW(destroy_virtio_net_q_in)] = {};
+	u32 out[MLX5_ST_SZ_DW(destroy_virtio_net_q_out)] = {};
+
+	MLX5_SET(destroy_virtio_net_q_in, in, general_obj_out_cmd_hdr.opcode,
+		 MLX5_CMD_OP_DESTROY_GENERAL_OBJECT);
+	MLX5_SET(destroy_virtio_net_q_in, in, general_obj_out_cmd_hdr.obj_id, mvq->virtq_id);
+	MLX5_SET(destroy_virtio_net_q_in, in, general_obj_out_cmd_hdr.uid, ndev->mvdev.res.uid);
+	MLX5_SET(destroy_virtio_net_q_in, in, general_obj_out_cmd_hdr.obj_type,
+		 MLX5_OBJ_TYPE_VIRTIO_NET_Q);
+	if (mlx5_cmd_exec(ndev->mvdev.mdev, in, sizeof(in), out, sizeof(out))) {
+		mlx5_vdpa_warn(&ndev->mvdev, "destroy virtqueue 0x%x\n", mvq->virtq_id);
+		return;
+	}
+	umems_destroy(ndev, mvq);
+}
+
+static u32 get_rqpn(struct mlx5_vdpa_virtqueue *mvq, bool fw)
+{
+	return fw ? mvq->vqqp.mqp.qpn : mvq->fwqp.mqp.qpn;
+}
+
+static u32 get_qpn(struct mlx5_vdpa_virtqueue *mvq, bool fw)
+{
+	return fw ? mvq->fwqp.mqp.qpn : mvq->vqqp.mqp.qpn;
+}
+
+static void alloc_inout(struct mlx5_vdpa_net *ndev, int cmd, void **in, int *inlen, void **out,
+			int *outlen, u32 qpn, u32 rqpn)
+{
+	void *qpc;
+	void *pp;
+
+	switch (cmd) {
+	case MLX5_CMD_OP_2RST_QP:
+		*inlen = MLX5_ST_SZ_BYTES(qp_2rst_in);
+		*outlen = MLX5_ST_SZ_BYTES(qp_2rst_out);
+		*in = kzalloc(*inlen, GFP_KERNEL);
+		*out = kzalloc(*outlen, GFP_KERNEL);
+		if (!in || !out)
+			goto outerr;
+
+		MLX5_SET(qp_2rst_in, *in, opcode, cmd);
+		MLX5_SET(qp_2rst_in, *in, uid, ndev->mvdev.res.uid);
+		MLX5_SET(qp_2rst_in, *in, qpn, qpn);
+		break;
+	case MLX5_CMD_OP_RST2INIT_QP:
+		*inlen = MLX5_ST_SZ_BYTES(rst2init_qp_in);
+		*outlen = MLX5_ST_SZ_BYTES(rst2init_qp_out);
+		*in = kzalloc(*inlen, GFP_KERNEL);
+		*out = kzalloc(MLX5_ST_SZ_BYTES(rst2init_qp_out), GFP_KERNEL);
+		if (!in || !out)
+			goto outerr;
+
+		MLX5_SET(rst2init_qp_in, *in, opcode, cmd);
+		MLX5_SET(rst2init_qp_in, *in, uid, ndev->mvdev.res.uid);
+		MLX5_SET(rst2init_qp_in, *in, qpn, qpn);
+		qpc = MLX5_ADDR_OF(rst2init_qp_in, *in, qpc);
+		MLX5_SET(qpc, qpc, remote_qpn, rqpn);
+		MLX5_SET(qpc, qpc, rwe, 1);
+		pp = MLX5_ADDR_OF(qpc, qpc, primary_address_path);
+		MLX5_SET(ads, pp, vhca_port_num, 1);
+		break;
+	case MLX5_CMD_OP_INIT2RTR_QP:
+		*inlen = MLX5_ST_SZ_BYTES(init2rtr_qp_in);
+		*outlen = MLX5_ST_SZ_BYTES(init2rtr_qp_out);
+		*in = kzalloc(*inlen, GFP_KERNEL);
+		*out = kzalloc(MLX5_ST_SZ_BYTES(init2rtr_qp_out), GFP_KERNEL);
+		if (!in || !out)
+			goto outerr;
+
+		MLX5_SET(init2rtr_qp_in, *in, opcode, cmd);
+		MLX5_SET(init2rtr_qp_in, *in, uid, ndev->mvdev.res.uid);
+		MLX5_SET(init2rtr_qp_in, *in, qpn, qpn);
+		qpc = MLX5_ADDR_OF(rst2init_qp_in, *in, qpc);
+		MLX5_SET(qpc, qpc, mtu, MLX5_QPC_MTU_256_BYTES);
+		MLX5_SET(qpc, qpc, log_msg_max, 30);
+		MLX5_SET(qpc, qpc, remote_qpn, rqpn);
+		pp = MLX5_ADDR_OF(qpc, qpc, primary_address_path);
+		MLX5_SET(ads, pp, fl, 1);
+		break;
+	case MLX5_CMD_OP_RTR2RTS_QP:
+		*inlen = MLX5_ST_SZ_BYTES(rtr2rts_qp_in);
+		*outlen = MLX5_ST_SZ_BYTES(rtr2rts_qp_out);
+		*in = kzalloc(*inlen, GFP_KERNEL);
+		*out = kzalloc(MLX5_ST_SZ_BYTES(rtr2rts_qp_out), GFP_KERNEL);
+		if (!in || !out)
+			goto outerr;
+
+		MLX5_SET(rtr2rts_qp_in, *in, opcode, cmd);
+		MLX5_SET(rtr2rts_qp_in, *in, uid, ndev->mvdev.res.uid);
+		MLX5_SET(rtr2rts_qp_in, *in, qpn, qpn);
+		qpc = MLX5_ADDR_OF(rst2init_qp_in, *in, qpc);
+		pp = MLX5_ADDR_OF(qpc, qpc, primary_address_path);
+		MLX5_SET(ads, pp, ack_timeout, 14);
+		MLX5_SET(qpc, qpc, retry_count, 7);
+		MLX5_SET(qpc, qpc, rnr_retry, 7);
+		break;
+	default:
+		goto outerr;
+	}
+	if (!*in || !*out)
+		goto outerr;
+
+	return;
+
+outerr:
+	kfree(*in);
+	kfree(*out);
+	*in = NULL;
+	*out = NULL;
+}
+
+static void free_inout(void *in, void *out)
+{
+	kfree(in);
+	kfree(out);
+}
+
+/* Two QPs are used by each virtqueue. One is used by the driver and one by
+ * firmware. The fw argument indicates whether the subjected QP is the one used
+ * by firmware.
+ */
+static int modify_qp(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, bool fw, int cmd)
+{
+	int outlen;
+	int inlen;
+	void *out;
+	void *in;
+	int err;
+
+	alloc_inout(ndev, cmd, &in, &inlen, &out, &outlen, get_qpn(mvq, fw), get_rqpn(mvq, fw));
+	if (!in || !out)
+		return -ENOMEM;
+
+	err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, outlen);
+	free_inout(in, out);
+	return err;
+}
+
+static int connect_qps(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+	int err;
+
+	err = modify_qp(ndev, mvq, true, MLX5_CMD_OP_2RST_QP);
+	if (err)
+		return err;
+
+	err = modify_qp(ndev, mvq, false, MLX5_CMD_OP_2RST_QP);
+	if (err)
+		return err;
+
+	err = modify_qp(ndev, mvq, true, MLX5_CMD_OP_RST2INIT_QP);
+	if (err)
+		return err;
+
+	err = modify_qp(ndev, mvq, false, MLX5_CMD_OP_RST2INIT_QP);
+	if (err)
+		return err;
+
+	err = modify_qp(ndev, mvq, true, MLX5_CMD_OP_INIT2RTR_QP);
+	if (err)
+		return err;
+
+	err = modify_qp(ndev, mvq, false, MLX5_CMD_OP_INIT2RTR_QP);
+	if (err)
+		return err;
+
+	return modify_qp(ndev, mvq, true, MLX5_CMD_OP_RTR2RTS_QP);
+}
+
+struct mlx5_virtq_attr {
+	u8 state;
+	u16 available_index;
+};
+
+static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq,
+			   struct mlx5_virtq_attr *attr)
+{
+	int outlen = MLX5_ST_SZ_BYTES(query_virtio_net_q_out);
+	u32 in[MLX5_ST_SZ_DW(query_virtio_net_q_in)] = {};
+	void *out;
+	void *obj_context;
+	void *cmd_hdr;
+	int err;
+
+	out = kzalloc(outlen, GFP_KERNEL);
+	if (!out)
+		return -ENOMEM;
+
+	cmd_hdr = MLX5_ADDR_OF(query_virtio_net_q_in, in, general_obj_in_cmd_hdr);
+
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, opcode, MLX5_CMD_OP_QUERY_GENERAL_OBJECT);
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_type, MLX5_OBJ_TYPE_VIRTIO_NET_Q);
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_id, mvq->virtq_id);
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
+	err = mlx5_cmd_exec(ndev->mvdev.mdev, in, sizeof(in), out, outlen);
+	if (err)
+		goto err_cmd;
+
+	obj_context = MLX5_ADDR_OF(query_virtio_net_q_out, out, obj_context);
+	memset(attr, 0, sizeof(*attr));
+	attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
+	attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, hw_available_index);
+	kfree(out);
+	return 0;
+
+err_cmd:
+	kfree(out);
+	return err;
+}
+
+static int modify_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, int state)
+{
+	int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
+	u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
+	void *obj_context;
+	void *cmd_hdr;
+	void *in;
+	int err;
+
+	in = kzalloc(inlen, GFP_KERNEL);
+	if (!in)
+		return -ENOMEM;
+
+	cmd_hdr = MLX5_ADDR_OF(modify_virtio_net_q_in, in, general_obj_in_cmd_hdr);
+
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, opcode, MLX5_CMD_OP_MODIFY_GENERAL_OBJECT);
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_type, MLX5_OBJ_TYPE_VIRTIO_NET_Q);
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_id, mvq->virtq_id);
+	MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
+
+	obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context);
+	MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select,
+		   MLX5_VIRTQ_MODIFY_MASK_STATE);
+	MLX5_SET(virtio_net_q_object, obj_context, state, state);
+	err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
+	kfree(in);
+	if (!err)
+		mvq->fw_state = state;
+
+	return err;
+}
+
+static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+	u16 idx = mvq->index;
+	int err;
+
+	if (!mvq->num_ent)
+		return 0;
+
+	if (mvq->initialized) {
+		mlx5_vdpa_warn(&ndev->mvdev, "attempt re init\n");
+		return -EINVAL;
+	}
+
+	err = cq_create(ndev, idx, mvq->num_ent);
+	if (err)
+		return err;
+
+	err = qp_create(ndev, mvq, &mvq->fwqp);
+	if (err)
+		goto err_fwqp;
+
+	err = qp_create(ndev, mvq, &mvq->vqqp);
+	if (err)
+		goto err_vqqp;
+
+	err = connect_qps(ndev, mvq);
+	if (err)
+		goto err_connect;
+
+	err = create_virtqueue(ndev, mvq);
+	if (err)
+		goto err_connect;
+
+	if (mvq->ready) {
+		err = modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
+		if (err) {
+			mlx5_vdpa_warn(&ndev->mvdev, "failed to modify to ready vq idx %d(%d)\n",
+				       idx, err);
+			goto err_connect;
+		}
+	}
+
+	mvq->initialized = true;
+	return 0;
+
+err_connect:
+	qp_destroy(ndev, &mvq->vqqp);
+err_vqqp:
+	qp_destroy(ndev, &mvq->fwqp);
+err_fwqp:
+	cq_destroy(ndev, idx);
+	return err;
+}
+
+static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+	struct mlx5_virtq_attr attr;
+
+	if (!mvq->initialized)
+		return;
+
+	if (query_virtqueue(ndev, mvq, &attr)) {
+		mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
+		return;
+	}
+	if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY)
+		return;
+
+	if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
+		mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
+}
+
+static void suspend_vqs(struct mlx5_vdpa_net *ndev)
+{
+	int i;
+
+	for (i = 0; i < MLX5_MAX_SUPPORTED_VQS; i++)
+		suspend_vq(ndev, &ndev->vqs[i]);
+}
+
+static void teardown_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+	if (!mvq->initialized)
+		return;
+
+	suspend_vq(ndev, mvq);
+	destroy_virtqueue(ndev, mvq);
+	qp_destroy(ndev, &mvq->vqqp);
+	qp_destroy(ndev, &mvq->fwqp);
+	cq_destroy(ndev, mvq->index);
+	mvq->initialized = false;
+}
+
+static int create_rqt(struct mlx5_vdpa_net *ndev)
+{
+	int log_max_rqt;
+	__be32 *list;
+	void *rqtc;
+	int inlen;
+	void *in;
+	int i, j;
+	int err;
+
+	log_max_rqt = min_t(int, 1, MLX5_CAP_GEN(ndev->mvdev.mdev, log_max_rqt_size));
+	if (log_max_rqt < 1)
+		return -EOPNOTSUPP;
+
+	inlen = MLX5_ST_SZ_BYTES(create_rqt_in) + (1 << log_max_rqt) * MLX5_ST_SZ_BYTES(rq_num);
+	in = kzalloc(inlen, GFP_KERNEL);
+	if (!in)
+		return -ENOMEM;
+
+	MLX5_SET(create_rqt_in, in, uid, ndev->mvdev.res.uid);
+	rqtc = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
+
+	MLX5_SET(rqtc, rqtc, list_q_type, MLX5_RQTC_LIST_Q_TYPE_VIRTIO_NET_Q);
+	MLX5_SET(rqtc, rqtc, rqt_max_size, 1 << log_max_rqt);
+	MLX5_SET(rqtc, rqtc, rqt_actual_size, 1);
+	list = MLX5_ADDR_OF(rqtc, rqtc, rq_num[0]);
+	for (i = 0, j = 0; j < ndev->mvdev.max_vqs; j++) {
+		if (!ndev->vqs[j].initialized)
+			continue;
+
+		if (!vq_is_tx(ndev->vqs[j].index)) {
+			list[i] = cpu_to_be32(ndev->vqs[j].virtq_id);
+			i++;
+		}
+	}
+
+	err = mlx5_vdpa_create_rqt(&ndev->mvdev, in, inlen, &ndev->res.rqtn);
+	kfree(in);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static void destroy_rqt(struct mlx5_vdpa_net *ndev)
+{
+	mlx5_vdpa_destroy_rqt(&ndev->mvdev, ndev->res.rqtn);
+}
+
+static int create_tir(struct mlx5_vdpa_net *ndev)
+{
+#define HASH_IP_L4PORTS                                                                            \
+	(MLX5_HASH_FIELD_SEL_SRC_IP | MLX5_HASH_FIELD_SEL_DST_IP | MLX5_HASH_FIELD_SEL_L4_SPORT |  \
+	 MLX5_HASH_FIELD_SEL_L4_DPORT)
+	static const u8 rx_hash_toeplitz_key[] = { 0x2c, 0xc6, 0x81, 0xd1, 0x5b, 0xdb, 0xf4, 0xf7,
+						   0xfc, 0xa2, 0x83, 0x19, 0xdb, 0x1a, 0x3e, 0x94,
+						   0x6b, 0x9e, 0x38, 0xd9, 0x2c, 0x9c, 0x03, 0xd1,
+						   0xad, 0x99, 0x44, 0xa7, 0xd9, 0x56, 0x3d, 0x59,
+						   0x06, 0x3c, 0x25, 0xf3, 0xfc, 0x1f, 0xdc, 0x2a };
+	void *rss_key;
+	void *outer;
+	void *tirc;
+	void *in;
+	int err;
+
+	in = kzalloc(MLX5_ST_SZ_BYTES(create_tir_in), GFP_KERNEL);
+	if (!in)
+		return -ENOMEM;
+
+	MLX5_SET(create_tir_in, in, uid, ndev->mvdev.res.uid);
+	tirc = MLX5_ADDR_OF(create_tir_in, in, ctx);
+	MLX5_SET(tirc, tirc, disp_type, MLX5_TIRC_DISP_TYPE_INDIRECT);
+
+	MLX5_SET(tirc, tirc, rx_hash_symmetric, 1);
+	MLX5_SET(tirc, tirc, rx_hash_fn, MLX5_RX_HASH_FN_TOEPLITZ);
+	rss_key = MLX5_ADDR_OF(tirc, tirc, rx_hash_toeplitz_key);
+	memcpy(rss_key, rx_hash_toeplitz_key, sizeof(rx_hash_toeplitz_key));
+
+	outer = MLX5_ADDR_OF(tirc, tirc, rx_hash_field_selector_outer);
+	MLX5_SET(rx_hash_field_select, outer, l3_prot_type, MLX5_L3_PROT_TYPE_IPV4);
+	MLX5_SET(rx_hash_field_select, outer, l4_prot_type, MLX5_L4_PROT_TYPE_TCP);
+	MLX5_SET(rx_hash_field_select, outer, selected_fields, HASH_IP_L4PORTS);
+
+	MLX5_SET(tirc, tirc, indirect_table, ndev->res.rqtn);
+	MLX5_SET(tirc, tirc, transport_domain, ndev->res.tdn);
+
+	err = mlx5_vdpa_create_tir(&ndev->mvdev, in, &ndev->res.tirn);
+	kfree(in);
+	return err;
+}
+
+static void destroy_tir(struct mlx5_vdpa_net *ndev)
+{
+	mlx5_vdpa_destroy_tir(&ndev->mvdev, ndev->res.tirn);
+}
+
+static int add_fwd_to_tir(struct mlx5_vdpa_net *ndev)
+{
+	struct mlx5_flow_destination dest[2] = {};
+	struct mlx5_flow_table_attr ft_attr = {};
+	struct mlx5_flow_act flow_act = {};
+	struct mlx5_flow_namespace *ns;
+	int err;
+
+	/* for now, one entry, match all, forward to tir */
+	ft_attr.max_fte = 1;
+	ft_attr.autogroup.max_num_groups = 1;
+
+	ns = mlx5_get_flow_namespace(ndev->mvdev.mdev, MLX5_FLOW_NAMESPACE_BYPASS);
+	if (!ns) {
+		mlx5_vdpa_warn(&ndev->mvdev, "get flow namespace\n");
+		return -EOPNOTSUPP;
+	}
+
+	ndev->rxft = mlx5_create_auto_grouped_flow_table(ns, &ft_attr);
+	if (IS_ERR(ndev->rxft))
+		return PTR_ERR(ndev->rxft);
+
+	ndev->rx_counter = mlx5_fc_create(ndev->mvdev.mdev, false);
+	if (IS_ERR(ndev->rx_counter)) {
+		err = PTR_ERR(ndev->rx_counter);
+		goto err_fc;
+	}
+
+	flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST | MLX5_FLOW_CONTEXT_ACTION_COUNT;
+	dest[0].type = MLX5_FLOW_DESTINATION_TYPE_TIR;
+	dest[0].tir_num = ndev->res.tirn;
+	dest[1].type = MLX5_FLOW_DESTINATION_TYPE_COUNTER;
+	dest[1].counter_id = mlx5_fc_id(ndev->rx_counter);
+	ndev->rx_rule = mlx5_add_flow_rules(ndev->rxft, NULL, &flow_act, dest, 2);
+	if (IS_ERR(ndev->rx_rule)) {
+		err = PTR_ERR(ndev->rx_rule);
+		ndev->rx_rule = NULL;
+		goto err_rule;
+	}
+
+	return 0;
+
+err_rule:
+	mlx5_fc_destroy(ndev->mvdev.mdev, ndev->rx_counter);
+err_fc:
+	mlx5_destroy_flow_table(ndev->rxft);
+	return err;
+}
+
+static void remove_fwd_to_tir(struct mlx5_vdpa_net *ndev)
+{
+	if (!ndev->rx_rule)
+		return;
+
+	mlx5_del_flow_rules(ndev->rx_rule);
+	mlx5_fc_destroy(ndev->mvdev.mdev, ndev->rx_counter);
+	mlx5_destroy_flow_table(ndev->rxft);
+
+	ndev->rx_rule = NULL;
+}
+
+static void mlx5_vdpa_kick_vq(struct vdpa_device *vdev, u16 idx)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+
+	if (unlikely(!mvq->ready))
+		return;
+
+	iowrite16(idx, ndev->mvdev.res.kick_addr);
+}
+
+static int mlx5_vdpa_set_vq_address(struct vdpa_device *vdev, u16 idx, u64 desc_area,
+				    u64 driver_area, u64 device_area)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+
+	mvq->desc_addr = desc_area;
+	mvq->device_addr = device_area;
+	mvq->driver_addr = driver_area;
+	return 0;
+}
+
+static void mlx5_vdpa_set_vq_num(struct vdpa_device *vdev, u16 idx, u32 num)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	struct mlx5_vdpa_virtqueue *mvq;
+
+	mvq = &ndev->vqs[idx];
+	mvq->num_ent = num;
+}
+
+static void mlx5_vdpa_set_vq_cb(struct vdpa_device *vdev, u16 idx, struct vdpa_callback *cb)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	struct mlx5_vdpa_virtqueue *vq = &ndev->vqs[idx];
+
+	vq->event_cb = *cb;
+}
+
+static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+
+	if (!ready)
+		suspend_vq(ndev, mvq);
+
+	mvq->ready = ready;
+}
+
+static bool mlx5_vdpa_get_vq_ready(struct vdpa_device *vdev, u16 idx)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+
+	return mvq->ready;
+}
+
+static int mlx5_vdpa_set_vq_state(struct vdpa_device *vdev, u16 idx,
+				  const struct vdpa_vq_state *state)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+
+	if (mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY) {
+		mlx5_vdpa_warn(mvdev, "can't modify available index\n");
+		return -EINVAL;
+	}
+
+	mvq->avail_idx = state->avail_index;
+	return 0;
+}
+
+static int mlx5_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx, struct vdpa_vq_state *state)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+	struct mlx5_virtq_attr attr;
+	int err;
+
+	if (!mvq->initialized)
+		return -EAGAIN;
+
+	err = query_virtqueue(ndev, mvq, &attr);
+	if (err) {
+		mlx5_vdpa_warn(mvdev, "failed to query virtqueue\n");
+		return err;
+	}
+	state->avail_index = attr.available_index;
+	return 0;
+}
+
+static u32 mlx5_vdpa_get_vq_align(struct vdpa_device *vdev)
+{
+	return PAGE_SIZE;
+}
+
+enum { MLX5_VIRTIO_NET_F_GUEST_CSUM = 1 << 9,
+	MLX5_VIRTIO_NET_F_CSUM = 1 << 10,
+	MLX5_VIRTIO_NET_F_HOST_TSO6 = 1 << 11,
+	MLX5_VIRTIO_NET_F_HOST_TSO4 = 1 << 12,
+};
+
+static u64 mlx_to_vritio_features(u16 dev_features)
+{
+	u64 result = 0;
+
+	if (dev_features & MLX5_VIRTIO_NET_F_GUEST_CSUM)
+		result |= BIT(VIRTIO_NET_F_GUEST_CSUM);
+	if (dev_features & MLX5_VIRTIO_NET_F_CSUM)
+		result |= BIT(VIRTIO_NET_F_CSUM);
+	if (dev_features & MLX5_VIRTIO_NET_F_HOST_TSO6)
+		result |= BIT(VIRTIO_NET_F_HOST_TSO6);
+	if (dev_features & MLX5_VIRTIO_NET_F_HOST_TSO4)
+		result |= BIT(VIRTIO_NET_F_HOST_TSO4);
+
+	return result;
+}
+
+static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	u16 dev_features;
+
+	dev_features = MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, device_features_bits_mask);
+	ndev->mvdev.mlx_features = mlx_to_vritio_features(dev_features);
+	if (MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, virtio_version_1_0))
+		ndev->mvdev.mlx_features |= BIT(VIRTIO_F_VERSION_1);
+	ndev->mvdev.mlx_features |= BIT(VIRTIO_F_IOMMU_PLATFORM);
+	print_features(mvdev, ndev->mvdev.mlx_features, false);
+	return ndev->mvdev.mlx_features;
+}
+
+static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
+{
+	if (!(features & BIT(VIRTIO_F_IOMMU_PLATFORM)))
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
+static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
+{
+	int err;
+	int i;
+
+	for (i = 0; i < 2 * mlx5_vdpa_max_qps(ndev->mvdev.max_vqs); i++) {
+		err = setup_vq(ndev, &ndev->vqs[i]);
+		if (err)
+			goto err_vq;
+	}
+
+	return 0;
+
+err_vq:
+	for (--i; i >= 0; i--)
+		teardown_vq(ndev, &ndev->vqs[i]);
+
+	return err;
+}
+
+static void teardown_virtqueues(struct mlx5_vdpa_net *ndev)
+{
+	struct mlx5_vdpa_virtqueue *mvq;
+	int i;
+
+	for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
+		mvq = &ndev->vqs[i];
+		if (!mvq->initialized)
+			continue;
+
+		teardown_vq(ndev, mvq);
+	}
+}
+
+static int mlx5_vdpa_set_features(struct vdpa_device *vdev, u64 features)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	int err;
+
+	print_features(mvdev, features, true);
+
+	err = verify_min_features(mvdev, features);
+	if (err)
+		return err;
+
+	ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features;
+	return err;
+}
+
+static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, struct vdpa_callback *cb)
+{
+	/* not implemented */
+	mlx5_vdpa_warn(to_mvdev(vdev), "set config callback not supported\n");
+}
+
+#define MLX5_VDPA_MAX_VQ_ENTRIES 256
+static u16 mlx5_vdpa_get_vq_num_max(struct vdpa_device *vdev)
+{
+	return MLX5_VDPA_MAX_VQ_ENTRIES;
+}
+
+static u32 mlx5_vdpa_get_device_id(struct vdpa_device *vdev)
+{
+	return VIRTIO_ID_NET;
+}
+
+static u32 mlx5_vdpa_get_vendor_id(struct vdpa_device *vdev)
+{
+	return PCI_VENDOR_ID_MELLANOX;
+}
+
+static u8 mlx5_vdpa_get_status(struct vdpa_device *vdev)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+
+	print_status(mvdev, ndev->mvdev.status, false);
+	return ndev->mvdev.status;
+}
+
+static int save_channel_info(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+	struct mlx5_vq_restore_info *ri = &mvq->ri;
+	struct mlx5_virtq_attr attr;
+	int err;
+
+	if (!mvq->initialized)
+		return 0;
+
+	err = query_virtqueue(ndev, mvq, &attr);
+	if (err)
+		return err;
+
+	ri->avail_index = attr.available_index;
+	ri->ready = mvq->ready;
+	ri->num_ent = mvq->num_ent;
+	ri->desc_addr = mvq->desc_addr;
+	ri->device_addr = mvq->device_addr;
+	ri->driver_addr = mvq->driver_addr;
+	ri->cb = mvq->event_cb;
+	ri->restore = true;
+	return 0;
+}
+
+static int save_channels_info(struct mlx5_vdpa_net *ndev)
+{
+	int i;
+
+	for (i = 0; i < ndev->mvdev.max_vqs; i++) {
+		memset(&ndev->vqs[i].ri, 0, sizeof(ndev->vqs[i].ri));
+		save_channel_info(ndev, &ndev->vqs[i]);
+	}
+	return 0;
+}
+
+static void mlx5_clear_vqs(struct mlx5_vdpa_net *ndev)
+{
+	int i;
+
+	for (i = 0; i < ndev->mvdev.max_vqs; i++)
+		memset(&ndev->vqs[i], 0, offsetof(struct mlx5_vdpa_virtqueue, ri));
+}
+
+static void restore_channels_info(struct mlx5_vdpa_net *ndev)
+{
+	struct mlx5_vdpa_virtqueue *mvq;
+	struct mlx5_vq_restore_info *ri;
+	int i;
+
+	mlx5_clear_vqs(ndev);
+	init_mvqs(ndev);
+	for (i = 0; i < ndev->mvdev.max_vqs; i++) {
+		mvq = &ndev->vqs[i];
+		ri = &mvq->ri;
+		if (!ri->restore)
+			continue;
+
+		mvq->avail_idx = ri->avail_index;
+		mvq->ready = ri->ready;
+		mvq->num_ent = ri->num_ent;
+		mvq->desc_addr = ri->desc_addr;
+		mvq->device_addr = ri->device_addr;
+		mvq->driver_addr = ri->driver_addr;
+		mvq->event_cb = ri->cb;
+	}
+}
+
+static int mlx5_vdpa_change_map(struct mlx5_vdpa_net *ndev, struct vhost_iotlb *iotlb)
+{
+	int err;
+
+	suspend_vqs(ndev);
+	err = save_channels_info(ndev);
+	if (err)
+		goto err_mr;
+
+	teardown_driver(ndev);
+	mlx5_vdpa_destroy_mr(&ndev->mvdev);
+	err = mlx5_vdpa_create_mr(&ndev->mvdev, iotlb);
+	if (err)
+		goto err_mr;
+
+	restore_channels_info(ndev);
+	err = setup_driver(ndev);
+	if (err)
+		goto err_setup;
+
+	return 0;
+
+err_setup:
+	mlx5_vdpa_destroy_mr(&ndev->mvdev);
+err_mr:
+	return err;
+}
+
+static int setup_driver(struct mlx5_vdpa_net *ndev)
+{
+	int err;
+
+	mutex_lock(&ndev->reslock);
+	if (ndev->setup) {
+		mlx5_vdpa_warn(&ndev->mvdev, "setup driver called for already setup driver\n");
+		err = 0;
+		goto out;
+	}
+	err = setup_virtqueues(ndev);
+	if (err) {
+		mlx5_vdpa_warn(&ndev->mvdev, "setup_virtqueues\n");
+		goto out;
+	}
+
+	err = create_rqt(ndev);
+	if (err) {
+		mlx5_vdpa_warn(&ndev->mvdev, "create_rqt\n");
+		goto err_rqt;
+	}
+
+	err = create_tir(ndev);
+	if (err) {
+		mlx5_vdpa_warn(&ndev->mvdev, "create_tir\n");
+		goto err_tir;
+	}
+
+	err = add_fwd_to_tir(ndev);
+	if (err) {
+		mlx5_vdpa_warn(&ndev->mvdev, "add_fwd_to_tir\n");
+		goto err_fwd;
+	}
+	ndev->setup = true;
+	mutex_unlock(&ndev->reslock);
+
+	return 0;
+
+err_fwd:
+	destroy_tir(ndev);
+err_tir:
+	destroy_rqt(ndev);
+err_rqt:
+	teardown_virtqueues(ndev);
+out:
+	mutex_unlock(&ndev->reslock);
+	return err;
+}
+
+static void teardown_driver(struct mlx5_vdpa_net *ndev)
+{
+	mutex_lock(&ndev->reslock);
+	if (!ndev->setup)
+		goto out;
+
+	remove_fwd_to_tir(ndev);
+	destroy_tir(ndev);
+	destroy_rqt(ndev);
+	teardown_virtqueues(ndev);
+	ndev->setup = false;
+out:
+	mutex_unlock(&ndev->reslock);
+}
+
+static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	int err;
+
+	print_status(mvdev, status, true);
+	if (!status) {
+		mlx5_vdpa_info(mvdev, "performing device reset\n");
+		teardown_driver(ndev);
+		mlx5_vdpa_destroy_mr(&ndev->mvdev);
+		ndev->mvdev.status = 0;
+		ndev->mvdev.mlx_features = 0;
+		++mvdev->generation;
+		return;
+	}
+
+	if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) {
+		if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+			err = setup_driver(ndev);
+			if (err) {
+				mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
+				goto err_setup;
+			}
+		} else {
+			mlx5_vdpa_warn(mvdev, "did not expect DRIVER_OK to be cleared\n");
+			return;
+		}
+	}
+
+	ndev->mvdev.status = status;
+	return;
+
+err_setup:
+	mlx5_vdpa_destroy_mr(&ndev->mvdev);
+	ndev->mvdev.status |= VIRTIO_CONFIG_S_FAILED;
+}
+
+static void mlx5_vdpa_get_config(struct vdpa_device *vdev, unsigned int offset, void *buf,
+				 unsigned int len)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+
+	if (offset + len < sizeof(struct virtio_net_config))
+		memcpy(buf, &ndev->config + offset, len);
+}
+
+static void mlx5_vdpa_set_config(struct vdpa_device *vdev, unsigned int offset, const void *buf,
+				 unsigned int len)
+{
+	/* not supported */
+}
+
+static u32 mlx5_vdpa_get_generation(struct vdpa_device *vdev)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+
+	return mvdev->generation;
+}
+
+static int mlx5_vdpa_set_map(struct vdpa_device *vdev, struct vhost_iotlb *iotlb)
+{
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+	bool change_map;
+	int err;
+
+	err = mlx5_vdpa_handle_set_map(mvdev, iotlb, &change_map);
+	if (err) {
+		mlx5_vdpa_warn(mvdev, "set map failed(%d)\n", err);
+		return err;
+	}
+
+	if (change_map)
+		return mlx5_vdpa_change_map(ndev, iotlb);
+
+	return 0;
+}
+
+static void mlx5_vdpa_free(struct vdpa_device *vdev)
+{
+	/* not used */
+}
+
+static const struct vdpa_config_ops mlx5_vdpa_ops = {
+	.set_vq_address = mlx5_vdpa_set_vq_address,
+	.set_vq_num = mlx5_vdpa_set_vq_num,
+	.kick_vq = mlx5_vdpa_kick_vq,
+	.set_vq_cb = mlx5_vdpa_set_vq_cb,
+	.set_vq_ready = mlx5_vdpa_set_vq_ready,
+	.get_vq_ready = mlx5_vdpa_get_vq_ready,
+	.set_vq_state = mlx5_vdpa_set_vq_state,
+	.get_vq_state = mlx5_vdpa_get_vq_state,
+	.get_vq_align = mlx5_vdpa_get_vq_align,
+	.get_features = mlx5_vdpa_get_features,
+	.set_features = mlx5_vdpa_set_features,
+	.set_config_cb = mlx5_vdpa_set_config_cb,
+	.get_vq_num_max = mlx5_vdpa_get_vq_num_max,
+	.get_device_id = mlx5_vdpa_get_device_id,
+	.get_vendor_id = mlx5_vdpa_get_vendor_id,
+	.get_status = mlx5_vdpa_get_status,
+	.set_status = mlx5_vdpa_set_status,
+	.get_config = mlx5_vdpa_get_config,
+	.set_config = mlx5_vdpa_set_config,
+	.get_generation = mlx5_vdpa_get_generation,
+	.set_map = mlx5_vdpa_set_map,
+	.free = mlx5_vdpa_free,
+};
+
+static int alloc_resources(struct mlx5_vdpa_net *ndev)
+{
+	struct mlx5_vdpa_net_resources *res = &ndev->res;
+	int err;
+
+	if (res->valid) {
+		mlx5_vdpa_warn(&ndev->mvdev, "resources already allocated\n");
+		return -EEXIST;
+	}
+
+	err = mlx5_vdpa_alloc_transport_domain(&ndev->mvdev, &res->tdn);
+	if (err)
+		return err;
+
+	err = create_tis(ndev);
+	if (err)
+		goto err_tis;
+
+	res->valid = true;
+
+	return 0;
+
+err_tis:
+	mlx5_vdpa_dealloc_transport_domain(&ndev->mvdev, res->tdn);
+	return err;
+}
+
+static void free_resources(struct mlx5_vdpa_net *ndev)
+{
+	struct mlx5_vdpa_net_resources *res = &ndev->res;
+
+	if (!res->valid)
+		return;
+
+	destroy_tis(ndev);
+	mlx5_vdpa_dealloc_transport_domain(&ndev->mvdev, res->tdn);
+	res->valid = false;
+}
+
+static void init_mvqs(struct mlx5_vdpa_net *ndev)
+{
+	struct mlx5_vdpa_virtqueue *mvq;
+	int i;
+
+	for (i = 0; i < 2 * mlx5_vdpa_max_qps(ndev->mvdev.max_vqs); ++i) {
+		mvq = &ndev->vqs[i];
+		memset(mvq, 0, offsetof(struct mlx5_vdpa_virtqueue, ri));
+		mvq->index = i;
+		mvq->ndev = ndev;
+		mvq->fwqp.fw = true;
+	}
+	for (; i < ndev->mvdev.max_vqs; i++) {
+		mvq = &ndev->vqs[i];
+		memset(mvq, 0, offsetof(struct mlx5_vdpa_virtqueue, ri));
+		mvq->index = i;
+		mvq->ndev = ndev;
+	}
+}
+
+void *mlx5_vdpa_add_dev(struct mlx5_core_dev *mdev)
+{
+	struct virtio_net_config *config;
+	struct mlx5_vdpa_dev *mvdev;
+	struct mlx5_vdpa_net *ndev;
+	u32 max_vqs;
+	int err;
+
+	/* we save one virtqueue for control virtqueue should we require it */
+	max_vqs = MLX5_CAP_DEV_VDPA_EMULATION(mdev, max_num_virtio_queues);
+	max_vqs = min_t(u32, max_vqs, MLX5_MAX_SUPPORTED_VQS);
+
+	ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, mdev->device, &mlx5_vdpa_ops,
+				 2 * mlx5_vdpa_max_qps(max_vqs));
+	if (IS_ERR(ndev))
+		return ndev;
+
+	ndev->mvdev.max_vqs = max_vqs;
+	mvdev = &ndev->mvdev;
+	mvdev->mdev = mdev;
+	init_mvqs(ndev);
+	mutex_init(&ndev->reslock);
+	config = &ndev->config;
+	err = mlx5_query_nic_vport_mtu(mdev, &config->mtu);
+	if (err)
+		goto err_mtu;
+
+	err = mlx5_query_nic_vport_mac_address(mdev, 0, 0, config->mac);
+	if (err)
+		goto err_mtu;
+
+	mvdev->vdev.dma_dev = mdev->device;
+	err = mlx5_vdpa_alloc_resources(&ndev->mvdev);
+	if (err)
+		goto err_mtu;
+
+	err = alloc_resources(ndev);
+	if (err)
+		goto err_res;
+
+	err = vdpa_register_device(&mvdev->vdev);
+	if (err)
+		goto err_reg;
+
+	return ndev;
+
+err_reg:
+	free_resources(ndev);
+err_res:
+	mlx5_vdpa_free_resources(&ndev->mvdev);
+err_mtu:
+	mutex_destroy(&ndev->reslock);
+	put_device(&mvdev->vdev.dev);
+	return ERR_PTR(err);
+}
+
+void mlx5_vdpa_remove_dev(struct mlx5_vdpa_dev *mvdev)
+{
+	struct mlx5_vdpa_net *ndev;
+
+	ndev = container_of(mvdev, struct mlx5_vdpa_net, mvdev);
+	vdpa_unregister_device(&ndev->mvdev.vdev);
+	free_resources(ndev);
+	mlx5_vdpa_free_resources(&ndev->mvdev);
+	mutex_destroy(&ndev->reslock);
+}
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.h b/drivers/vdpa/mlx5/net/mlx5_vnet.h
new file mode 100644
index 000000000000..f2d6d68b020e
--- /dev/null
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2020 Mellanox Technologies Ltd. */
+
+#ifndef __MLX5_VNET_H_
+#define __MLX5_VNET_H_
+
+#include <linux/vdpa.h>
+#include <linux/virtio_net.h>
+#include <linux/vringh.h>
+#include <linux/mlx5/driver.h>
+#include <linux/mlx5/cq.h>
+#include <linux/mlx5/qp.h>
+#include "mlx5_vdpa.h"
+
+static inline u32 mlx5_vdpa_max_qps(int max_vqs)
+{
+	return max_vqs / 2;
+}
+
+#define to_mlx5_vdpa_ndev(__mvdev) container_of(__mvdev, struct mlx5_vdpa_net, mvdev)
+void *mlx5_vdpa_add_dev(struct mlx5_core_dev *mdev);
+void mlx5_vdpa_remove_dev(struct mlx5_vdpa_dev *mvdev);
+
+#endif /* __MLX5_VNET_H_ */
-- 
2.26.0

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, back to index

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-20  7:14 [PATCH V2 vhost next 00/10] VDPA support for Mellanox ConnectX devices Eli Cohen
2020-07-20  7:14 ` [PATCH V2 vhost next 01/10] vhost-vdpa: support batch updating Eli Cohen
2020-07-20  7:14 ` [PATCH V2 vhost next 02/10] vdpa_sim: use the batching API Eli Cohen
2020-07-20  7:14 ` [PATCH V2 vhost next 03/10] vdpa: remove hard coded virtq num Eli Cohen
2020-07-20  7:14 ` [PATCH V2 vhost next 04/10] net/vdpa: Use struct for set/get vq state Eli Cohen
2020-07-20  7:14 ` [PATCH V2 vhost next 05/10] vhost: Fix documentation Eli Cohen
2020-07-21  5:35   ` Jason Wang
2020-07-20  7:14 ` [PATCH V2 vhost next 06/10] vdpa: Modify get_vq_state() to return error code Eli Cohen
2020-07-21  5:34   ` Jason Wang
2020-07-20  7:14 ` [PATCH V2 vhost next 07/10] vdpa/mlx5: Add hardware descriptive header file Eli Cohen
2020-07-20  7:14 ` [PATCH V2 vhost next 08/10] vdpa/mlx5: Add support library for mlx5 VDPA implementation Eli Cohen
2020-07-20  7:14 ` [PATCH V2 vhost next 09/10] vdpa/mlx5: Add shared memory registration code Eli Cohen
2020-07-20  7:14 ` [PATCH V2 vhost next 10/10] vdpa/mlx5: Add VDPA driver for supported mlx5 devices Eli Cohen
2020-07-20 12:55   ` kernel test robot
2020-07-22  7:46     ` Eli Cohen
2020-07-21  3:30   ` kernel test robot
2020-07-21  5:51   ` Jason Wang
  -- strict thread matches above, loose matches on Subject: below --
2020-07-20  6:54 [PATCH V2 vhost next 00/10] VDPA support for Mellanox ConnectX devices Eli Cohen
2020-07-20  6:54 ` [PATCH V2 vhost next 10/10] vdpa/mlx5: Add VDPA driver for supported mlx5 devices Eli Cohen

Virtualization Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/virtualization/0 virtualization/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 virtualization virtualization/ https://lore.kernel.org/virtualization \
		virtualization@lists.linuxfoundation.org
	public-inbox-index virtualization

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.linuxfoundation.lists.virtualization


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git