linux-hyperv.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Patch v6 00/12] Introduce Microsoft Azure Network Adapter (MANA) RDMA driver
@ 2022-09-21  1:22 longli
  2022-09-21  1:22 ` [Patch v6 01/12] net: mana: Add support for auxiliary device longli
                   ` (12 more replies)
  0 siblings, 13 replies; 19+ messages in thread
From: longli @ 2022-09-21  1:22 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Jason Gunthorpe, Leon Romanovsky, edumazet, shiraz.saleem,
	Ajay Sharma
  Cc: linux-hyperv, netdev, linux-kernel, linux-rdma, Long Li

From: Long Li <longli@microsoft.com>

This patchset implements a RDMA driver for Microsoft Azure Network
Adapter (MANA). In MANA, the RDMA device is modeled as an auxiliary device
to the Ethernet device.

The first 11 patches modify the MANA Ethernet driver to support RDMA driver.
The last patch implementes the RDMA driver.

The user-mode of the driver is being reviewed at:
https://github.com/linux-rdma/rdma-core/pull/1177


Ajay Sharma (3):
  net: mana: Set the DMA device max segment size
  net: mana: Define and process GDMA response code
    GDMA_STATUS_MORE_ENTRIES
  net: mana: Define data structures for protection domain and memory
    registration

Long Li (9):
  net: mana: Add support for auxiliary device
  net: mana: Record the physical address for doorbell page region
  net: mana: Handle vport sharing between devices
  net: mana: Add functions for allocating doorbell page from GDMA
  net: mana: Export Work Queue functions for use by RDMA driver
  net: mana: Record port number in netdev
  net: mana: Move header files to a common location
  net: mana: Define max values for SGL entries
  RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter

 MAINTAINERS                                   |   4 +
 drivers/infiniband/Kconfig                    |   1 +
 drivers/infiniband/hw/Makefile                |   1 +
 drivers/infiniband/hw/mana/Kconfig            |   7 +
 drivers/infiniband/hw/mana/Makefile           |   4 +
 drivers/infiniband/hw/mana/cq.c               |  80 +++
 drivers/infiniband/hw/mana/device.c           | 129 ++++
 drivers/infiniband/hw/mana/main.c             | 555 ++++++++++++++++++
 drivers/infiniband/hw/mana/mana_ib.h          | 165 ++++++
 drivers/infiniband/hw/mana/mr.c               | 133 +++++
 drivers/infiniband/hw/mana/qp.c               | 501 ++++++++++++++++
 drivers/infiniband/hw/mana/wq.c               | 114 ++++
 drivers/net/ethernet/microsoft/Kconfig        |   1 +
 .../net/ethernet/microsoft/mana/gdma_main.c   |  96 ++-
 .../net/ethernet/microsoft/mana/hw_channel.c  |   6 +-
 .../net/ethernet/microsoft/mana/mana_bpf.c    |   2 +-
 drivers/net/ethernet/microsoft/mana/mana_en.c | 176 +++++-
 .../ethernet/microsoft/mana/mana_ethtool.c    |   2 +-
 .../net/ethernet/microsoft/mana/shm_channel.c |   2 +-
 .../microsoft => include/net}/mana/gdma.h     | 162 ++++-
 .../net}/mana/hw_channel.h                    |   0
 .../microsoft => include/net}/mana/mana.h     |  23 +-
 include/net/mana/mana_auxiliary.h             |  10 +
 .../net}/mana/shm_channel.h                   |   0
 include/uapi/rdma/ib_user_ioctl_verbs.h       |   1 +
 include/uapi/rdma/mana-abi.h                  |  66 +++
 26 files changed, 2196 insertions(+), 45 deletions(-)
 create mode 100644 drivers/infiniband/hw/mana/Kconfig
 create mode 100644 drivers/infiniband/hw/mana/Makefile
 create mode 100644 drivers/infiniband/hw/mana/cq.c
 create mode 100644 drivers/infiniband/hw/mana/device.c
 create mode 100644 drivers/infiniband/hw/mana/main.c
 create mode 100644 drivers/infiniband/hw/mana/mana_ib.h
 create mode 100644 drivers/infiniband/hw/mana/mr.c
 create mode 100644 drivers/infiniband/hw/mana/qp.c
 create mode 100644 drivers/infiniband/hw/mana/wq.c
 rename {drivers/net/ethernet/microsoft => include/net}/mana/gdma.h (80%)
 rename {drivers/net/ethernet/microsoft => include/net}/mana/hw_channel.h (100%)
 rename {drivers/net/ethernet/microsoft => include/net}/mana/mana.h (95%)
 create mode 100644 include/net/mana/mana_auxiliary.h
 rename {drivers/net/ethernet/microsoft => include/net}/mana/shm_channel.h (100%)
 create mode 100644 include/uapi/rdma/mana-abi.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Patch v6 01/12] net: mana: Add support for auxiliary device
  2022-09-21  1:22 [Patch v6 00/12] Introduce Microsoft Azure Network Adapter (MANA) RDMA driver longli
@ 2022-09-21  1:22 ` longli
  2022-09-21  1:22 ` [Patch v6 02/12] net: mana: Record the physical address for doorbell page region longli
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: longli @ 2022-09-21  1:22 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Jason Gunthorpe, Leon Romanovsky, edumazet, shiraz.saleem,
	Ajay Sharma
  Cc: linux-hyperv, netdev, linux-kernel, linux-rdma, Long Li

From: Long Li <longli@microsoft.com>

In preparation for supporting MANA RDMA driver, add support for auxiliary
device in the Ethernet driver. The RDMA device is modeled as an auxiliary
device to the Ethernet device.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
---
Change log:
v3: define mana_adev_idx_alloc and mana_adev_idx_free as static

 drivers/net/ethernet/microsoft/Kconfig        |  1 +
 drivers/net/ethernet/microsoft/mana/gdma.h    |  2 +
 .../ethernet/microsoft/mana/mana_auxiliary.h  | 10 +++
 drivers/net/ethernet/microsoft/mana/mana_en.c | 84 ++++++++++++++++++-
 4 files changed, 96 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/microsoft/mana/mana_auxiliary.h

diff --git a/drivers/net/ethernet/microsoft/Kconfig b/drivers/net/ethernet/microsoft/Kconfig
index fe4e7a7d9c0b..090e6b983243 100644
--- a/drivers/net/ethernet/microsoft/Kconfig
+++ b/drivers/net/ethernet/microsoft/Kconfig
@@ -19,6 +19,7 @@ config MICROSOFT_MANA
 	tristate "Microsoft Azure Network Adapter (MANA) support"
 	depends on PCI_MSI && X86_64
 	depends on PCI_HYPERV
+	select AUXILIARY_BUS
 	help
 	  This driver supports Microsoft Azure Network Adapter (MANA).
 	  So far, the driver is only supported on X86_64.
diff --git a/drivers/net/ethernet/microsoft/mana/gdma.h b/drivers/net/ethernet/microsoft/mana/gdma.h
index 4a6efe6ada08..f321a2616d03 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma.h
+++ b/drivers/net/ethernet/microsoft/mana/gdma.h
@@ -204,6 +204,8 @@ struct gdma_dev {
 
 	/* GDMA driver specific pointer */
 	void *driver_data;
+
+	struct auxiliary_device *adev;
 };
 
 #define MINIMUM_SUPPORTED_PAGE_SIZE PAGE_SIZE
diff --git a/drivers/net/ethernet/microsoft/mana/mana_auxiliary.h b/drivers/net/ethernet/microsoft/mana/mana_auxiliary.h
new file mode 100644
index 000000000000..373d59756846
--- /dev/null
+++ b/drivers/net/ethernet/microsoft/mana/mana_auxiliary.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2022, Microsoft Corporation. */
+
+#include "mana.h"
+#include <linux/auxiliary_bus.h>
+
+struct mana_adev {
+	struct auxiliary_device adev;
+	struct gdma_dev *mdev;
+};
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 9259a74eca40..8ec6924a445c 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -13,6 +13,19 @@
 #include <net/ip6_checksum.h>
 
 #include "mana.h"
+#include "mana_auxiliary.h"
+
+static DEFINE_IDA(mana_adev_ida);
+
+static int mana_adev_idx_alloc(void)
+{
+	return ida_alloc(&mana_adev_ida, GFP_KERNEL);
+}
+
+static void mana_adev_idx_free(int idx)
+{
+	ida_free(&mana_adev_ida, idx);
+}
 
 /* Microsoft Azure Network Adapter (MANA) functions */
 
@@ -2106,6 +2119,70 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
 	return err;
 }
 
+static void adev_release(struct device *dev)
+{
+	struct mana_adev *madev = container_of(dev, struct mana_adev, adev.dev);
+
+	kfree(madev);
+}
+
+static void remove_adev(struct gdma_dev *gd)
+{
+	struct auxiliary_device *adev = gd->adev;
+	int id = adev->id;
+
+	auxiliary_device_delete(adev);
+	auxiliary_device_uninit(adev);
+
+	mana_adev_idx_free(id);
+	gd->adev = NULL;
+}
+
+static int add_adev(struct gdma_dev *gd)
+{
+	struct auxiliary_device *adev;
+	struct mana_adev *madev;
+	int ret;
+
+	madev = kzalloc(sizeof(*madev), GFP_KERNEL);
+	if (!madev)
+		return -ENOMEM;
+
+	adev = &madev->adev;
+	adev->id = mana_adev_idx_alloc();
+	if (adev->id < 0) {
+		ret = adev->id;
+		goto idx_fail;
+	}
+
+	adev->name = "rdma";
+	adev->dev.parent = gd->gdma_context->dev;
+	adev->dev.release = adev_release;
+	madev->mdev = gd;
+
+	ret = auxiliary_device_init(adev);
+	if (ret)
+		goto init_fail;
+
+	ret = auxiliary_device_add(adev);
+	if (ret)
+		goto add_fail;
+
+	gd->adev = adev;
+	return 0;
+
+add_fail:
+	auxiliary_device_uninit(adev);
+
+init_fail:
+	mana_adev_idx_free(adev->id);
+
+idx_fail:
+	kfree(madev);
+
+	return ret;
+}
+
 int mana_probe(struct gdma_dev *gd, bool resuming)
 {
 	struct gdma_context *gc = gd->gdma_context;
@@ -2173,6 +2250,8 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
 				break;
 		}
 	}
+
+	err = add_adev(gd);
 out:
 	if (err)
 		mana_remove(gd, false);
@@ -2189,6 +2268,10 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
 	int err;
 	int i;
 
+	/* adev currently doesn't support suspending, always remove it */
+	if (gd->adev)
+		remove_adev(gd);
+
 	for (i = 0; i < ac->num_ports; i++) {
 		ndev = ac->ports[i];
 		if (!ndev) {
@@ -2221,7 +2304,6 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
 	}
 
 	mana_destroy_eq(ac);
-
 out:
 	mana_gd_deregister_device(gd);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v6 02/12] net: mana: Record the physical address for doorbell page region
  2022-09-21  1:22 [Patch v6 00/12] Introduce Microsoft Azure Network Adapter (MANA) RDMA driver longli
  2022-09-21  1:22 ` [Patch v6 01/12] net: mana: Add support for auxiliary device longli
@ 2022-09-21  1:22 ` longli
  2022-09-21  1:22 ` [Patch v6 03/12] net: mana: Handle vport sharing between devices longli
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: longli @ 2022-09-21  1:22 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Jason Gunthorpe, Leon Romanovsky, edumazet, shiraz.saleem,
	Ajay Sharma
  Cc: linux-hyperv, netdev, linux-kernel, linux-rdma, Long Li

From: Long Li <longli@microsoft.com>

For supporting RDMA device with multiple user contexts with their
individual doorbell pages, record the start address of doorbell page
region for use by the RDMA driver to allocate user context doorbell IDs.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
---
Change log:
v6: rebased to rdma-next

 drivers/net/ethernet/microsoft/mana/gdma.h      | 2 ++
 drivers/net/ethernet/microsoft/mana/gdma_main.c | 4 ++++
 2 files changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma.h b/drivers/net/ethernet/microsoft/mana/gdma.h
index f321a2616d03..72eaec2470c0 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma.h
+++ b/drivers/net/ethernet/microsoft/mana/gdma.h
@@ -351,9 +351,11 @@ struct gdma_context {
 	u32			test_event_eq_id;
 
 	bool			is_pf;
+	phys_addr_t		bar0_pa;
 	void __iomem		*bar0_va;
 	void __iomem		*shm_base;
 	void __iomem		*db_page_base;
+	phys_addr_t		phys_db_page_base;
 	u32 db_page_size;
 
 	/* Shared memory chanenl (used to bootstrap HWC) */
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 5f9240182351..0cfe5f15458e 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -44,6 +44,9 @@ static void mana_gd_init_vf_regs(struct pci_dev *pdev)
 	gc->db_page_base = gc->bar0_va +
 				mana_gd_r64(gc, GDMA_REG_DB_PAGE_OFFSET);
 
+	gc->phys_db_page_base = gc->bar0_pa +
+				mana_gd_r64(gc, GDMA_REG_DB_PAGE_OFFSET);
+
 	gc->shm_base = gc->bar0_va + mana_gd_r64(gc, GDMA_REG_SHM_OFFSET);
 }
 
@@ -1367,6 +1370,7 @@ static int mana_gd_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	mutex_init(&gc->eq_test_event_mutex);
 	pci_set_drvdata(pdev, gc);
+	gc->bar0_pa = pci_resource_start(pdev, 0);
 
 	bar0_va = pci_iomap(pdev, bar, 0);
 	if (!bar0_va)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v6 03/12] net: mana: Handle vport sharing between devices
  2022-09-21  1:22 [Patch v6 00/12] Introduce Microsoft Azure Network Adapter (MANA) RDMA driver longli
  2022-09-21  1:22 ` [Patch v6 01/12] net: mana: Add support for auxiliary device longli
  2022-09-21  1:22 ` [Patch v6 02/12] net: mana: Record the physical address for doorbell page region longli
@ 2022-09-21  1:22 ` longli
  2022-09-21  1:22 ` [Patch v6 04/12] net: mana: Add functions for allocating doorbell page from GDMA longli
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: longli @ 2022-09-21  1:22 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Jason Gunthorpe, Leon Romanovsky, edumazet, shiraz.saleem,
	Ajay Sharma
  Cc: linux-hyperv, netdev, linux-kernel, linux-rdma, Long Li

From: Long Li <longli@microsoft.com>

For outgoing packets, the PF requires the VF to configure the vport with
corresponding protection domain and doorbell ID for the kernel or user
context. The vport can't be shared between different contexts.

Implement the logic to exclusively take over the vport by either the
Ethernet device or RDMA device.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
---
Change log:
v2: use refcount instead of directly using atomic variables
v4: change to mutex to avoid possible race with refcount
v5: add detailed comments explaining vport sharing, use EXPORT_SYMBOL_NS
v6: rebased to rdma-next

 drivers/net/ethernet/microsoft/mana/mana.h    |  7 +++
 drivers/net/ethernet/microsoft/mana/mana_en.c | 53 ++++++++++++++++++-
 2 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/mana.h b/drivers/net/ethernet/microsoft/mana/mana.h
index d58be64374c8..2883a08dbfb5 100644
--- a/drivers/net/ethernet/microsoft/mana/mana.h
+++ b/drivers/net/ethernet/microsoft/mana/mana.h
@@ -380,6 +380,10 @@ struct mana_port_context {
 	mana_handle_t port_handle;
 	mana_handle_t pf_filter_handle;
 
+	/* Mutex for sharing access to vport_use_count */
+	struct mutex vport_mutex;
+	int vport_use_count;
+
 	u16 port_idx;
 
 	bool port_is_up;
@@ -631,4 +635,7 @@ struct mana_tx_package {
 	struct gdma_posted_wqe_info wqe_info;
 };
 
+int mana_cfg_vport(struct mana_port_context *apc, u32 protection_dom_id,
+		   u32 doorbell_pg_id);
+void mana_uncfg_vport(struct mana_port_context *apc);
 #endif /* _MANA_H */
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 8ec6924a445c..ef843a4560bb 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -646,13 +646,48 @@ static int mana_query_vport_cfg(struct mana_port_context *apc, u32 vport_index,
 	return 0;
 }
 
-static int mana_cfg_vport(struct mana_port_context *apc, u32 protection_dom_id,
-			  u32 doorbell_pg_id)
+void mana_uncfg_vport(struct mana_port_context *apc)
+{
+	mutex_lock(&apc->vport_mutex);
+	apc->vport_use_count--;
+	WARN_ON(apc->vport_use_count < 0);
+	mutex_unlock(&apc->vport_mutex);
+}
+EXPORT_SYMBOL_NS(mana_uncfg_vport, NET_MANA);
+
+int mana_cfg_vport(struct mana_port_context *apc, u32 protection_dom_id,
+		   u32 doorbell_pg_id)
 {
 	struct mana_config_vport_resp resp = {};
 	struct mana_config_vport_req req = {};
 	int err;
 
+	/* This function is used to program the Ethernet port in the hardware
+	 * table. It can be called from the Ethernet driver or the RDMA driver.
+	 *
+	 * For Ethernet usage, the hardware supports only one active user on a
+	 * physical port. The driver checks on the port usage before programming
+	 * the hardware when creating the RAW QP (RDMA driver) or exposing the
+	 * device to kernel NET layer (Ethernet driver).
+	 *
+	 * Because the RDMA driver doesn't know in advance which QP type the
+	 * user will create, it exposes the device with all its ports. The user
+	 * may not be able to create RAW QP on a port if this port is already
+	 * in used by the Ethernet driver from the kernel.
+	 *
+	 * This physical port limitation only applies to the RAW QP. For RC QP,
+	 * the hardware doesn't have this limitation. The user can create RC
+	 * QPs on a physical port up to the hardware limits independent of the
+	 * Ethernet usage on the same port.
+	 */
+	mutex_lock(&apc->vport_mutex);
+	if (apc->vport_use_count > 0) {
+		mutex_unlock(&apc->vport_mutex);
+		return -EBUSY;
+	}
+	apc->vport_use_count++;
+	mutex_unlock(&apc->vport_mutex);
+
 	mana_gd_init_req_hdr(&req.hdr, MANA_CONFIG_VPORT_TX,
 			     sizeof(req), sizeof(resp));
 	req.vport = apc->port_handle;
@@ -679,9 +714,16 @@ static int mana_cfg_vport(struct mana_port_context *apc, u32 protection_dom_id,
 
 	apc->tx_shortform_allowed = resp.short_form_allowed;
 	apc->tx_vp_offset = resp.tx_vport_offset;
+
+	netdev_info(apc->ndev, "Configured vPort %llu PD %u DB %u\n",
+		    apc->port_handle, protection_dom_id, doorbell_pg_id);
 out:
+	if (err)
+		mana_uncfg_vport(apc);
+
 	return err;
 }
+EXPORT_SYMBOL_NS(mana_cfg_vport, NET_MANA);
 
 static int mana_cfg_vport_steering(struct mana_port_context *apc,
 				   enum TRI_STATE rx,
@@ -742,6 +784,9 @@ static int mana_cfg_vport_steering(struct mana_port_context *apc,
 			   resp.hdr.status);
 		err = -EPROTO;
 	}
+
+	netdev_info(ndev, "Configured steering vPort %llu entries %u\n",
+		    apc->port_handle, num_entries);
 out:
 	kfree(req);
 	return err;
@@ -1804,6 +1849,7 @@ static void mana_destroy_vport(struct mana_port_context *apc)
 	}
 
 	mana_destroy_txq(apc);
+	mana_uncfg_vport(apc);
 
 	if (gd->gdma_context->is_pf)
 		mana_pf_deregister_hw_vport(apc);
@@ -2076,6 +2122,9 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
 	apc->pf_filter_handle = INVALID_MANA_HANDLE;
 	apc->port_idx = port_idx;
 
+	mutex_init(&apc->vport_mutex);
+	apc->vport_use_count = 0;
+
 	ndev->netdev_ops = &mana_devops;
 	ndev->ethtool_ops = &mana_ethtool_ops;
 	ndev->mtu = ETH_DATA_LEN;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v6 04/12] net: mana: Add functions for allocating doorbell page from GDMA
  2022-09-21  1:22 [Patch v6 00/12] Introduce Microsoft Azure Network Adapter (MANA) RDMA driver longli
                   ` (2 preceding siblings ...)
  2022-09-21  1:22 ` [Patch v6 03/12] net: mana: Handle vport sharing between devices longli
@ 2022-09-21  1:22 ` longli
  2022-09-21  1:22 ` [Patch v6 05/12] net: mana: Set the DMA device max segment size longli
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: longli @ 2022-09-21  1:22 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Jason Gunthorpe, Leon Romanovsky, edumazet, shiraz.saleem,
	Ajay Sharma
  Cc: linux-hyperv, netdev, linux-kernel, linux-rdma, Long Li

From: Long Li <longli@microsoft.com>

The RDMA device needs to allocate doorbell pages for each user context.
Implement those functions and expose them for use by the RDMA driver.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
---
Change log:
v4: use EXPORT_SYMBOL_NS

 drivers/net/ethernet/microsoft/mana/gdma.h    | 29 ++++++++++
 .../net/ethernet/microsoft/mana/gdma_main.c   | 56 +++++++++++++++++++
 2 files changed, 85 insertions(+)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma.h b/drivers/net/ethernet/microsoft/mana/gdma.h
index 72eaec2470c0..0ee3615152b8 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma.h
+++ b/drivers/net/ethernet/microsoft/mana/gdma.h
@@ -22,11 +22,15 @@ enum gdma_request_type {
 	GDMA_GENERATE_TEST_EQE		= 10,
 	GDMA_CREATE_QUEUE		= 12,
 	GDMA_DISABLE_QUEUE		= 13,
+	GDMA_ALLOCATE_RESOURCE_RANGE	= 22,
+	GDMA_DESTROY_RESOURCE_RANGE	= 24,
 	GDMA_CREATE_DMA_REGION		= 25,
 	GDMA_DMA_REGION_ADD_PAGES	= 26,
 	GDMA_DESTROY_DMA_REGION		= 27,
 };
 
+#define GDMA_RESOURCE_DOORBELL_PAGE	27
+
 enum gdma_queue_type {
 	GDMA_INVALID_QUEUE,
 	GDMA_SQ,
@@ -578,6 +582,26 @@ struct gdma_register_device_resp {
 	u32 db_id;
 }; /* HW DATA */
 
+struct gdma_allocate_resource_range_req {
+	struct gdma_req_hdr hdr;
+	u32 resource_type;
+	u32 num_resources;
+	u32 alignment;
+	u32 allocated_resources;
+};
+
+struct gdma_allocate_resource_range_resp {
+	struct gdma_resp_hdr hdr;
+	u32 allocated_resources;
+};
+
+struct gdma_destroy_resource_range_req {
+	struct gdma_req_hdr hdr;
+	u32 resource_type;
+	u32 num_resources;
+	u32 allocated_resources;
+};
+
 /* GDMA_CREATE_QUEUE */
 struct gdma_create_queue_req {
 	struct gdma_req_hdr hdr;
@@ -686,4 +710,9 @@ void mana_gd_free_memory(struct gdma_mem_info *gmi);
 
 int mana_gd_send_request(struct gdma_context *gc, u32 req_len, const void *req,
 			 u32 resp_len, void *resp);
+
+int mana_gd_allocate_doorbell_page(struct gdma_context *gc, int *doorbell_page);
+
+int mana_gd_destroy_doorbell_page(struct gdma_context *gc, int doorbell_page);
+
 #endif /* _GDMA_H */
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 0cfe5f15458e..63544ca9e238 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -180,6 +180,62 @@ void mana_gd_free_memory(struct gdma_mem_info *gmi)
 			  gmi->dma_handle);
 }
 
+int mana_gd_destroy_doorbell_page(struct gdma_context *gc, int doorbell_page)
+{
+	struct gdma_destroy_resource_range_req req = {};
+	struct gdma_resp_hdr resp = {};
+	int err;
+
+	mana_gd_init_req_hdr(&req.hdr, GDMA_DESTROY_RESOURCE_RANGE,
+			     sizeof(req), sizeof(resp));
+
+	req.resource_type = GDMA_RESOURCE_DOORBELL_PAGE;
+	req.num_resources = 1;
+	req.allocated_resources = doorbell_page;
+
+	err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
+	if (err || resp.status) {
+		dev_err(gc->dev,
+			"Failed to destroy doorbell page: ret %d, 0x%x\n",
+			err, resp.status);
+		return err ? err : -EPROTO;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_NS(mana_gd_destroy_doorbell_page, NET_MANA);
+
+int mana_gd_allocate_doorbell_page(struct gdma_context *gc,
+				   int *doorbell_page)
+{
+	struct gdma_allocate_resource_range_req req = {};
+	struct gdma_allocate_resource_range_resp resp = {};
+	int err;
+
+	mana_gd_init_req_hdr(&req.hdr, GDMA_ALLOCATE_RESOURCE_RANGE,
+			     sizeof(req), sizeof(resp));
+
+	req.resource_type = GDMA_RESOURCE_DOORBELL_PAGE;
+	req.num_resources = 1;
+	req.alignment = 1;
+
+	/* Have GDMA start searching from 0 */
+	req.allocated_resources = 0;
+
+	err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
+	if (err || resp.hdr.status) {
+		dev_err(gc->dev,
+			"Failed to allocate doorbell page: ret %d, 0x%x\n",
+			err, resp.hdr.status);
+		return err ? err : -EPROTO;
+	}
+
+	*doorbell_page = resp.allocated_resources;
+
+	return 0;
+}
+EXPORT_SYMBOL_NS(mana_gd_allocate_doorbell_page, NET_MANA);
+
 static int mana_gd_create_hw_eq(struct gdma_context *gc,
 				struct gdma_queue *queue)
 {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v6 05/12] net: mana: Set the DMA device max segment size
  2022-09-21  1:22 [Patch v6 00/12] Introduce Microsoft Azure Network Adapter (MANA) RDMA driver longli
                   ` (3 preceding siblings ...)
  2022-09-21  1:22 ` [Patch v6 04/12] net: mana: Add functions for allocating doorbell page from GDMA longli
@ 2022-09-21  1:22 ` longli
  2022-09-21  1:22 ` [Patch v6 06/12] net: mana: Export Work Queue functions for use by RDMA driver longli
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: longli @ 2022-09-21  1:22 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Jason Gunthorpe, Leon Romanovsky, edumazet, shiraz.saleem,
	Ajay Sharma
  Cc: linux-hyperv, netdev, linux-kernel, linux-rdma, Long Li

From: Ajay Sharma <sharmaajay@microsoft.com>

MANA hardware doesn't have any restrictions on the DMA segment size, set it
to the max allowed value.

Signed-off-by: Ajay Sharma <sharmaajay@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
---
Change log:
v2: Use the max allowed value as the hardware doesn't have any limit

 drivers/net/ethernet/microsoft/mana/gdma_main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 63544ca9e238..b44548136027 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -1419,6 +1419,12 @@ static int mana_gd_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (err)
 		goto release_region;
 
+	err = dma_set_max_seg_size(&pdev->dev, UINT_MAX);
+	if (err) {
+		dev_err(&pdev->dev, "Failed to set dma device segment size\n");
+		goto release_region;
+	}
+
 	err = -ENOMEM;
 	gc = vzalloc(sizeof(*gc));
 	if (!gc)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v6 06/12] net: mana: Export Work Queue functions for use by RDMA driver
  2022-09-21  1:22 [Patch v6 00/12] Introduce Microsoft Azure Network Adapter (MANA) RDMA driver longli
                   ` (4 preceding siblings ...)
  2022-09-21  1:22 ` [Patch v6 05/12] net: mana: Set the DMA device max segment size longli
@ 2022-09-21  1:22 ` longli
  2022-09-21  1:22 ` [Patch v6 07/12] net: mana: Record port number in netdev longli
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: longli @ 2022-09-21  1:22 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Jason Gunthorpe, Leon Romanovsky, edumazet, shiraz.saleem,
	Ajay Sharma
  Cc: linux-hyperv, netdev, linux-kernel, linux-rdma, Long Li

From: Long Li <longli@microsoft.com>

RDMA device may need to create Ethernet device queues for use by Queue
Pair type RAW. This allows a user-mode context accesses Ethernet hardware
queues. Export the supporting functions for use by the RDMA driver.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
---
Change log:
v3: format/coding style changes
v5: remove unused defintions, use EXPORT_SYMBOL_NS, rearrange some defintions to a later patch in the series

 drivers/net/ethernet/microsoft/mana/gdma_main.c |  1 +
 drivers/net/ethernet/microsoft/mana/mana.h      |  9 +++++++++
 drivers/net/ethernet/microsoft/mana/mana_en.c   | 16 +++++++++-------
 3 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index b44548136027..6eae7297e5f5 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -152,6 +152,7 @@ int mana_gd_send_request(struct gdma_context *gc, u32 req_len, const void *req,
 
 	return mana_hwc_send_request(hwc, req_len, req, resp_len, resp);
 }
+EXPORT_SYMBOL_NS(mana_gd_send_request, NET_MANA);
 
 int mana_gd_alloc_memory(struct gdma_context *gc, unsigned int length,
 			 struct gdma_mem_info *gmi)
diff --git a/drivers/net/ethernet/microsoft/mana/mana.h b/drivers/net/ethernet/microsoft/mana/mana.h
index 2883a08dbfb5..6e9e86fb4c02 100644
--- a/drivers/net/ethernet/microsoft/mana/mana.h
+++ b/drivers/net/ethernet/microsoft/mana/mana.h
@@ -635,6 +635,15 @@ struct mana_tx_package {
 	struct gdma_posted_wqe_info wqe_info;
 };
 
+int mana_create_wq_obj(struct mana_port_context *apc,
+		       mana_handle_t vport,
+		       u32 wq_type, struct mana_obj_spec *wq_spec,
+		       struct mana_obj_spec *cq_spec,
+		       mana_handle_t *wq_obj);
+
+void mana_destroy_wq_obj(struct mana_port_context *apc, u32 wq_type,
+			 mana_handle_t wq_obj);
+
 int mana_cfg_vport(struct mana_port_context *apc, u32 protection_dom_id,
 		   u32 doorbell_pg_id);
 void mana_uncfg_vport(struct mana_port_context *apc);
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index ef843a4560bb..345e3a47da3e 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -792,11 +792,11 @@ static int mana_cfg_vport_steering(struct mana_port_context *apc,
 	return err;
 }
 
-static int mana_create_wq_obj(struct mana_port_context *apc,
-			      mana_handle_t vport,
-			      u32 wq_type, struct mana_obj_spec *wq_spec,
-			      struct mana_obj_spec *cq_spec,
-			      mana_handle_t *wq_obj)
+int mana_create_wq_obj(struct mana_port_context *apc,
+		       mana_handle_t vport,
+		       u32 wq_type, struct mana_obj_spec *wq_spec,
+		       struct mana_obj_spec *cq_spec,
+		       mana_handle_t *wq_obj)
 {
 	struct mana_create_wqobj_resp resp = {};
 	struct mana_create_wqobj_req req = {};
@@ -845,9 +845,10 @@ static int mana_create_wq_obj(struct mana_port_context *apc,
 out:
 	return err;
 }
+EXPORT_SYMBOL_NS(mana_create_wq_obj, NET_MANA);
 
-static void mana_destroy_wq_obj(struct mana_port_context *apc, u32 wq_type,
-				mana_handle_t wq_obj)
+void mana_destroy_wq_obj(struct mana_port_context *apc, u32 wq_type,
+			 mana_handle_t wq_obj)
 {
 	struct mana_destroy_wqobj_resp resp = {};
 	struct mana_destroy_wqobj_req req = {};
@@ -872,6 +873,7 @@ static void mana_destroy_wq_obj(struct mana_port_context *apc, u32 wq_type,
 		netdev_err(ndev, "Failed to destroy WQ object: %d, 0x%x\n", err,
 			   resp.hdr.status);
 }
+EXPORT_SYMBOL_NS(mana_destroy_wq_obj, NET_MANA);
 
 static void mana_destroy_eq(struct mana_context *ac)
 {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v6 07/12] net: mana: Record port number in netdev
  2022-09-21  1:22 [Patch v6 00/12] Introduce Microsoft Azure Network Adapter (MANA) RDMA driver longli
                   ` (5 preceding siblings ...)
  2022-09-21  1:22 ` [Patch v6 06/12] net: mana: Export Work Queue functions for use by RDMA driver longli
@ 2022-09-21  1:22 ` longli
  2022-09-21  1:22 ` [Patch v6 08/12] net: mana: Move header files to a common location longli
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: longli @ 2022-09-21  1:22 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Jason Gunthorpe, Leon Romanovsky, edumazet, shiraz.saleem,
	Ajay Sharma
  Cc: linux-hyperv, netdev, linux-kernel, linux-rdma, Long Li

From: Long Li <longli@microsoft.com>

The port number is useful for user-mode application to identify this
net device based on port index. Set to the correct value in ndev.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 345e3a47da3e..17cfee292cfb 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -2133,6 +2133,7 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
 	ndev->max_mtu = ndev->mtu;
 	ndev->min_mtu = ndev->mtu;
 	ndev->needed_headroom = MANA_HEADROOM;
+	ndev->dev_port = port_idx;
 	SET_NETDEV_DEV(ndev, gc->dev);
 
 	netif_carrier_off(ndev);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v6 08/12] net: mana: Move header files to a common location
  2022-09-21  1:22 [Patch v6 00/12] Introduce Microsoft Azure Network Adapter (MANA) RDMA driver longli
                   ` (6 preceding siblings ...)
  2022-09-21  1:22 ` [Patch v6 07/12] net: mana: Record port number in netdev longli
@ 2022-09-21  1:22 ` longli
  2022-09-21  1:22 ` [Patch v6 09/12] net: mana: Define max values for SGL entries longli
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: longli @ 2022-09-21  1:22 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Jason Gunthorpe, Leon Romanovsky, edumazet, shiraz.saleem,
	Ajay Sharma
  Cc: linux-hyperv, netdev, linux-kernel, linux-rdma, Long Li

From: Long Li <longli@microsoft.com>

In preparation to add MANA RDMA driver, move all the required header files
to a common location for use by both Ethernet and RDMA drivers.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
---
Change log:
v2: Move headers to include/net/mana, instead of include/linux/mana

 MAINTAINERS                                                   | 1 +
 drivers/net/ethernet/microsoft/mana/gdma_main.c               | 2 +-
 drivers/net/ethernet/microsoft/mana/hw_channel.c              | 4 ++--
 drivers/net/ethernet/microsoft/mana/mana_bpf.c                | 2 +-
 drivers/net/ethernet/microsoft/mana/mana_en.c                 | 4 ++--
 drivers/net/ethernet/microsoft/mana/mana_ethtool.c            | 2 +-
 drivers/net/ethernet/microsoft/mana/shm_channel.c             | 2 +-
 {drivers/net/ethernet/microsoft => include/net}/mana/gdma.h   | 0
 .../net/ethernet/microsoft => include/net}/mana/hw_channel.h  | 0
 {drivers/net/ethernet/microsoft => include/net}/mana/mana.h   | 0
 .../ethernet/microsoft => include/net}/mana/mana_auxiliary.h  | 0
 .../net/ethernet/microsoft => include/net}/mana/shm_channel.h | 0
 12 files changed, 9 insertions(+), 8 deletions(-)
 rename {drivers/net/ethernet/microsoft => include/net}/mana/gdma.h (100%)
 rename {drivers/net/ethernet/microsoft => include/net}/mana/hw_channel.h (100%)
 rename {drivers/net/ethernet/microsoft => include/net}/mana/mana.h (100%)
 rename {drivers/net/ethernet/microsoft => include/net}/mana/mana_auxiliary.h (100%)
 rename {drivers/net/ethernet/microsoft => include/net}/mana/shm_channel.h (100%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8a5012ba6ff9..8b9a50756c7e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9457,6 +9457,7 @@ F:	include/asm-generic/hyperv-tlfs.h
 F:	include/asm-generic/mshyperv.h
 F:	include/clocksource/hyperv_timer.h
 F:	include/linux/hyperv.h
+F:	include/net/mana
 F:	include/uapi/linux/hyperv.h
 F:	net/vmw_vsock/hyperv_transport.c
 F:	tools/hv/
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 6eae7297e5f5..c8a4f7b580b0 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -6,7 +6,7 @@
 #include <linux/utsname.h>
 #include <linux/version.h>
 
-#include "mana.h"
+#include <net/mana/mana.h>
 
 static u32 mana_gd_r32(struct gdma_context *g, u64 offset)
 {
diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
index 543a5d5c304f..76829ab43d40 100644
--- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
+++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
@@ -1,8 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
 /* Copyright (c) 2021, Microsoft Corporation. */
 
-#include "gdma.h"
-#include "hw_channel.h"
+#include <net/mana/gdma.h>
+#include <net/mana/hw_channel.h>
 
 static int mana_hwc_get_msg_index(struct hw_channel_context *hwc, u16 *msg_id)
 {
diff --git a/drivers/net/ethernet/microsoft/mana/mana_bpf.c b/drivers/net/ethernet/microsoft/mana/mana_bpf.c
index 421fd39ff3a8..3caea631229c 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_bpf.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_bpf.c
@@ -8,7 +8,7 @@
 #include <linux/bpf_trace.h>
 #include <net/xdp.h>
 
-#include "mana.h"
+#include <net/mana/mana.h>
 
 void mana_xdp_tx(struct sk_buff *skb, struct net_device *ndev)
 {
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 17cfee292cfb..a79a57644ec3 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -12,8 +12,8 @@
 #include <net/checksum.h>
 #include <net/ip6_checksum.h>
 
-#include "mana.h"
-#include "mana_auxiliary.h"
+#include <net/mana/mana.h>
+#include <net/mana/mana_auxiliary.h>
 
 static DEFINE_IDA(mana_adev_ida);
 
diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
index c530db76880f..6f98de6d7440 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
@@ -5,7 +5,7 @@
 #include <linux/etherdevice.h>
 #include <linux/ethtool.h>
 
-#include "mana.h"
+#include <net/mana/mana.h>
 
 static const struct {
 	char name[ETH_GSTRING_LEN];
diff --git a/drivers/net/ethernet/microsoft/mana/shm_channel.c b/drivers/net/ethernet/microsoft/mana/shm_channel.c
index da255da62176..5553af9c8085 100644
--- a/drivers/net/ethernet/microsoft/mana/shm_channel.c
+++ b/drivers/net/ethernet/microsoft/mana/shm_channel.c
@@ -6,7 +6,7 @@
 #include <linux/io.h>
 #include <linux/mm.h>
 
-#include "shm_channel.h"
+#include <net/mana/shm_channel.h>
 
 #define PAGE_FRAME_L48_WIDTH_BYTES 6
 #define PAGE_FRAME_L48_WIDTH_BITS (PAGE_FRAME_L48_WIDTH_BYTES * 8)
diff --git a/drivers/net/ethernet/microsoft/mana/gdma.h b/include/net/mana/gdma.h
similarity index 100%
rename from drivers/net/ethernet/microsoft/mana/gdma.h
rename to include/net/mana/gdma.h
diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.h b/include/net/mana/hw_channel.h
similarity index 100%
rename from drivers/net/ethernet/microsoft/mana/hw_channel.h
rename to include/net/mana/hw_channel.h
diff --git a/drivers/net/ethernet/microsoft/mana/mana.h b/include/net/mana/mana.h
similarity index 100%
rename from drivers/net/ethernet/microsoft/mana/mana.h
rename to include/net/mana/mana.h
diff --git a/drivers/net/ethernet/microsoft/mana/mana_auxiliary.h b/include/net/mana/mana_auxiliary.h
similarity index 100%
rename from drivers/net/ethernet/microsoft/mana/mana_auxiliary.h
rename to include/net/mana/mana_auxiliary.h
diff --git a/drivers/net/ethernet/microsoft/mana/shm_channel.h b/include/net/mana/shm_channel.h
similarity index 100%
rename from drivers/net/ethernet/microsoft/mana/shm_channel.h
rename to include/net/mana/shm_channel.h
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v6 09/12] net: mana: Define max values for SGL entries
  2022-09-21  1:22 [Patch v6 00/12] Introduce Microsoft Azure Network Adapter (MANA) RDMA driver longli
                   ` (7 preceding siblings ...)
  2022-09-21  1:22 ` [Patch v6 08/12] net: mana: Move header files to a common location longli
@ 2022-09-21  1:22 ` longli
  2022-09-21  1:22 ` [Patch v6 10/12] net: mana: Define and process GDMA response code GDMA_STATUS_MORE_ENTRIES longli
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: longli @ 2022-09-21  1:22 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Jason Gunthorpe, Leon Romanovsky, edumazet, shiraz.saleem,
	Ajay Sharma
  Cc: linux-hyperv, netdev, linux-kernel, linux-rdma, Long Li

From: Long Li <longli@microsoft.com>

The number of maximum SGl entries should be computed from the maximum
WQE size for the intended queue type and the corresponding OOB data
size. This guarantees the hardware queue can successfully queue requests
up to the queue depth exposed to the upper layer.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 2 +-
 include/net/mana/gdma.h                       | 7 +++++++
 include/net/mana/mana.h                       | 4 +---
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index a79a57644ec3..08d15effc824 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -189,7 +189,7 @@ int mana_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 	pkg.wqe_req.client_data_unit = 0;
 
 	pkg.wqe_req.num_sge = 1 + skb_shinfo(skb)->nr_frags;
-	WARN_ON_ONCE(pkg.wqe_req.num_sge > 30);
+	WARN_ON_ONCE(pkg.wqe_req.num_sge > MAX_TX_WQE_SGL_ENTRIES);
 
 	if (pkg.wqe_req.num_sge <= ARRAY_SIZE(pkg.sgl_array)) {
 		pkg.wqe_req.sgl = pkg.sgl_array;
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index 0ee3615152b8..9b050aabf76e 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -431,6 +431,13 @@ struct gdma_wqe {
 #define MAX_TX_WQE_SIZE 512
 #define MAX_RX_WQE_SIZE 256
 
+#define MAX_TX_WQE_SGL_ENTRIES	((GDMA_MAX_SQE_SIZE -			   \
+			sizeof(struct gdma_sge) - INLINE_OOB_SMALL_SIZE) / \
+			sizeof(struct gdma_sge))
+
+#define MAX_RX_WQE_SGL_ENTRIES	((GDMA_MAX_RQE_SIZE -			   \
+			sizeof(struct gdma_sge)) / sizeof(struct gdma_sge))
+
 struct gdma_cqe {
 	u32 cqe_data[GDMA_COMP_DATA_SIZE / 4];
 
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 6e9e86fb4c02..713a8f8cca9a 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -265,8 +265,6 @@ struct mana_cq {
 	int budget;
 };
 
-#define GDMA_MAX_RQE_SGES 15
-
 struct mana_recv_buf_oob {
 	/* A valid GDMA work request representing the data buffer. */
 	struct gdma_wqe_request wqe_req;
@@ -276,7 +274,7 @@ struct mana_recv_buf_oob {
 
 	/* SGL of the buffer going to be sent has part of the work request. */
 	u32 num_sge;
-	struct gdma_sge sgl[GDMA_MAX_RQE_SGES];
+	struct gdma_sge sgl[MAX_RX_WQE_SGL_ENTRIES];
 
 	/* Required to store the result of mana_gd_post_work_request.
 	 * gdma_posted_wqe_info.wqe_size_in_bu is required for progressing the
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v6 10/12] net: mana: Define and process GDMA response code GDMA_STATUS_MORE_ENTRIES
  2022-09-21  1:22 [Patch v6 00/12] Introduce Microsoft Azure Network Adapter (MANA) RDMA driver longli
                   ` (8 preceding siblings ...)
  2022-09-21  1:22 ` [Patch v6 09/12] net: mana: Define max values for SGL entries longli
@ 2022-09-21  1:22 ` longli
  2022-09-21  1:22 ` [Patch v6 11/12] net: mana: Define data structures for protection domain and memory registration longli
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: longli @ 2022-09-21  1:22 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Jason Gunthorpe, Leon Romanovsky, edumazet, shiraz.saleem,
	Ajay Sharma
  Cc: linux-hyperv, netdev, linux-kernel, linux-rdma, Long Li

From: Ajay Sharma <sharmaajay@microsoft.com>

When doing memory registration, the PF may respond with
GDMA_STATUS_MORE_ENTRIES to indicate a follow request is needed. This is
not an error and should be processed as expected.

Signed-off-by: Ajay Sharma <sharmaajay@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/net/ethernet/microsoft/mana/hw_channel.c | 2 +-
 include/net/mana/gdma.h                          | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
index 76829ab43d40..9d1507eba5b9 100644
--- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
+++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
@@ -836,7 +836,7 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
 		goto out;
 	}
 
-	if (ctx->status_code) {
+	if (ctx->status_code && ctx->status_code != GDMA_STATUS_MORE_ENTRIES) {
 		dev_err(hwc->dev, "HWC: Failed hw_channel req: 0x%x\n",
 			ctx->status_code);
 		err = -EPROTO;
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index 9b050aabf76e..42344cd5e8ce 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -9,6 +9,8 @@
 
 #include "shm_channel.h"
 
+#define GDMA_STATUS_MORE_ENTRIES	0x00000105
+
 /* Structures labeled with "HW DATA" are exchanged with the hardware. All of
  * them are naturally aligned and hence don't need __packed.
  */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v6 11/12] net: mana: Define data structures for protection domain and memory registration
  2022-09-21  1:22 [Patch v6 00/12] Introduce Microsoft Azure Network Adapter (MANA) RDMA driver longli
                   ` (9 preceding siblings ...)
  2022-09-21  1:22 ` [Patch v6 10/12] net: mana: Define and process GDMA response code GDMA_STATUS_MORE_ENTRIES longli
@ 2022-09-21  1:22 ` longli
  2022-09-21  1:22 ` [Patch v6 12/12] RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter longli
  2022-09-21 16:25 ` [Patch v6 00/12] Introduce Microsoft Azure Network Adapter (MANA) RDMA driver Jason Gunthorpe
  12 siblings, 0 replies; 19+ messages in thread
From: longli @ 2022-09-21  1:22 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Jason Gunthorpe, Leon Romanovsky, edumazet, shiraz.saleem,
	Ajay Sharma
  Cc: linux-hyperv, netdev, linux-kernel, linux-rdma, Long Li

From: Ajay Sharma <sharmaajay@microsoft.com>

The MANA hardware support protection domain and memory registration for use
in RDMA environment. Add those definitions and expose them for use by the
RDMA driver.

Signed-off-by: Ajay Sharma <sharmaajay@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
---
Change log:
v3: format/coding style changes
v5: remove unused code dealing with kernel-mode MR

 .../net/ethernet/microsoft/mana/gdma_main.c   |  27 ++--
 drivers/net/ethernet/microsoft/mana/mana_en.c |  18 +--
 include/net/mana/gdma.h                       | 120 +++++++++++++++++-
 3 files changed, 142 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index c8a4f7b580b0..d1971cae8015 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -254,7 +254,7 @@ static int mana_gd_create_hw_eq(struct gdma_context *gc,
 	req.type = queue->type;
 	req.pdid = queue->gdma_dev->pdid;
 	req.doolbell_id = queue->gdma_dev->doorbell;
-	req.gdma_region = queue->mem_info.gdma_region;
+	req.gdma_region = queue->mem_info.dma_region_handle;
 	req.queue_size = queue->queue_size;
 	req.log2_throttle_limit = queue->eq.log2_throttle_limit;
 	req.eq_pci_msix_index = queue->eq.msix_index;
@@ -268,7 +268,7 @@ static int mana_gd_create_hw_eq(struct gdma_context *gc,
 
 	queue->id = resp.queue_index;
 	queue->eq.disable_needed = true;
-	queue->mem_info.gdma_region = GDMA_INVALID_DMA_REGION;
+	queue->mem_info.dma_region_handle = GDMA_INVALID_DMA_REGION;
 	return 0;
 }
 
@@ -722,24 +722,30 @@ int mana_gd_create_hwc_queue(struct gdma_dev *gd,
 	return err;
 }
 
-static void mana_gd_destroy_dma_region(struct gdma_context *gc, u64 gdma_region)
+int mana_gd_destroy_dma_region(struct gdma_context *gc,
+			       gdma_obj_handle_t dma_region_handle)
 {
 	struct gdma_destroy_dma_region_req req = {};
 	struct gdma_general_resp resp = {};
 	int err;
 
-	if (gdma_region == GDMA_INVALID_DMA_REGION)
-		return;
+	if (dma_region_handle == GDMA_INVALID_DMA_REGION)
+		return 0;
 
 	mana_gd_init_req_hdr(&req.hdr, GDMA_DESTROY_DMA_REGION, sizeof(req),
 			     sizeof(resp));
-	req.gdma_region = gdma_region;
+	req.dma_region_handle = dma_region_handle;
 
 	err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
-	if (err || resp.hdr.status)
+	if (err || resp.hdr.status) {
 		dev_err(gc->dev, "Failed to destroy DMA region: %d, 0x%x\n",
 			err, resp.hdr.status);
+		return -EPROTO;
+	}
+
+	return 0;
 }
+EXPORT_SYMBOL_NS(mana_gd_destroy_dma_region, NET_MANA);
 
 static int mana_gd_create_dma_region(struct gdma_dev *gd,
 				     struct gdma_mem_info *gmi)
@@ -784,14 +790,15 @@ static int mana_gd_create_dma_region(struct gdma_dev *gd,
 	if (err)
 		goto out;
 
-	if (resp.hdr.status || resp.gdma_region == GDMA_INVALID_DMA_REGION) {
+	if (resp.hdr.status ||
+	    resp.dma_region_handle == GDMA_INVALID_DMA_REGION) {
 		dev_err(gc->dev, "Failed to create DMA region: 0x%x\n",
 			resp.hdr.status);
 		err = -EPROTO;
 		goto out;
 	}
 
-	gmi->gdma_region = resp.gdma_region;
+	gmi->dma_region_handle = resp.dma_region_handle;
 out:
 	kfree(req);
 	return err;
@@ -914,7 +921,7 @@ void mana_gd_destroy_queue(struct gdma_context *gc, struct gdma_queue *queue)
 		return;
 	}
 
-	mana_gd_destroy_dma_region(gc, gmi->gdma_region);
+	mana_gd_destroy_dma_region(gc, gmi->dma_region_handle);
 	mana_gd_free_memory(gmi);
 	kfree(queue);
 }
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 08d15effc824..7ca313c7b7b3 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1523,10 +1523,10 @@ static int mana_create_txq(struct mana_port_context *apc,
 		memset(&wq_spec, 0, sizeof(wq_spec));
 		memset(&cq_spec, 0, sizeof(cq_spec));
 
-		wq_spec.gdma_region = txq->gdma_sq->mem_info.gdma_region;
+		wq_spec.gdma_region = txq->gdma_sq->mem_info.dma_region_handle;
 		wq_spec.queue_size = txq->gdma_sq->queue_size;
 
-		cq_spec.gdma_region = cq->gdma_cq->mem_info.gdma_region;
+		cq_spec.gdma_region = cq->gdma_cq->mem_info.dma_region_handle;
 		cq_spec.queue_size = cq->gdma_cq->queue_size;
 		cq_spec.modr_ctx_id = 0;
 		cq_spec.attached_eq = cq->gdma_cq->cq.parent->id;
@@ -1541,8 +1541,10 @@ static int mana_create_txq(struct mana_port_context *apc,
 		txq->gdma_sq->id = wq_spec.queue_index;
 		cq->gdma_cq->id = cq_spec.queue_index;
 
-		txq->gdma_sq->mem_info.gdma_region = GDMA_INVALID_DMA_REGION;
-		cq->gdma_cq->mem_info.gdma_region = GDMA_INVALID_DMA_REGION;
+		txq->gdma_sq->mem_info.dma_region_handle =
+			GDMA_INVALID_DMA_REGION;
+		cq->gdma_cq->mem_info.dma_region_handle =
+			GDMA_INVALID_DMA_REGION;
 
 		txq->gdma_txq_id = txq->gdma_sq->id;
 
@@ -1753,10 +1755,10 @@ static struct mana_rxq *mana_create_rxq(struct mana_port_context *apc,
 
 	memset(&wq_spec, 0, sizeof(wq_spec));
 	memset(&cq_spec, 0, sizeof(cq_spec));
-	wq_spec.gdma_region = rxq->gdma_rq->mem_info.gdma_region;
+	wq_spec.gdma_region = rxq->gdma_rq->mem_info.dma_region_handle;
 	wq_spec.queue_size = rxq->gdma_rq->queue_size;
 
-	cq_spec.gdma_region = cq->gdma_cq->mem_info.gdma_region;
+	cq_spec.gdma_region = cq->gdma_cq->mem_info.dma_region_handle;
 	cq_spec.queue_size = cq->gdma_cq->queue_size;
 	cq_spec.modr_ctx_id = 0;
 	cq_spec.attached_eq = cq->gdma_cq->cq.parent->id;
@@ -1769,8 +1771,8 @@ static struct mana_rxq *mana_create_rxq(struct mana_port_context *apc,
 	rxq->gdma_rq->id = wq_spec.queue_index;
 	cq->gdma_cq->id = cq_spec.queue_index;
 
-	rxq->gdma_rq->mem_info.gdma_region = GDMA_INVALID_DMA_REGION;
-	cq->gdma_cq->mem_info.gdma_region = GDMA_INVALID_DMA_REGION;
+	rxq->gdma_rq->mem_info.dma_region_handle = GDMA_INVALID_DMA_REGION;
+	cq->gdma_cq->mem_info.dma_region_handle = GDMA_INVALID_DMA_REGION;
 
 	rxq->gdma_id = rxq->gdma_rq->id;
 	cq->gdma_id = cq->gdma_cq->id;
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index 42344cd5e8ce..f7b439549356 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -29,6 +29,10 @@ enum gdma_request_type {
 	GDMA_CREATE_DMA_REGION		= 25,
 	GDMA_DMA_REGION_ADD_PAGES	= 26,
 	GDMA_DESTROY_DMA_REGION		= 27,
+	GDMA_CREATE_PD			= 29,
+	GDMA_DESTROY_PD			= 30,
+	GDMA_CREATE_MR			= 31,
+	GDMA_DESTROY_MR			= 32,
 };
 
 #define GDMA_RESOURCE_DOORBELL_PAGE	27
@@ -61,6 +65,8 @@ enum {
 	GDMA_DEVICE_MANA	= 2,
 };
 
+typedef u64 gdma_obj_handle_t;
+
 struct gdma_resource {
 	/* Protect the bitmap */
 	spinlock_t lock;
@@ -194,7 +200,7 @@ struct gdma_mem_info {
 	u64 length;
 
 	/* Allocated by the PF driver */
-	u64 gdma_region;
+	gdma_obj_handle_t dma_region_handle;
 };
 
 #define REGISTER_ATB_MST_MKEY_LOWER_SIZE 8
@@ -618,7 +624,7 @@ struct gdma_create_queue_req {
 	u32 reserved1;
 	u32 pdid;
 	u32 doolbell_id;
-	u64 gdma_region;
+	gdma_obj_handle_t gdma_region;
 	u32 reserved2;
 	u32 queue_size;
 	u32 log2_throttle_limit;
@@ -645,6 +651,28 @@ struct gdma_disable_queue_req {
 	u32 alloc_res_id_on_creation;
 }; /* HW DATA */
 
+enum atb_page_size {
+	ATB_PAGE_SIZE_4K,
+	ATB_PAGE_SIZE_8K,
+	ATB_PAGE_SIZE_16K,
+	ATB_PAGE_SIZE_32K,
+	ATB_PAGE_SIZE_64K,
+	ATB_PAGE_SIZE_128K,
+	ATB_PAGE_SIZE_256K,
+	ATB_PAGE_SIZE_512K,
+	ATB_PAGE_SIZE_1M,
+	ATB_PAGE_SIZE_2M,
+	ATB_PAGE_SIZE_MAX,
+};
+
+enum gdma_mr_access_flags {
+	GDMA_ACCESS_FLAG_LOCAL_READ = BIT_ULL(0),
+	GDMA_ACCESS_FLAG_LOCAL_WRITE = BIT_ULL(1),
+	GDMA_ACCESS_FLAG_REMOTE_READ = BIT_ULL(2),
+	GDMA_ACCESS_FLAG_REMOTE_WRITE = BIT_ULL(3),
+	GDMA_ACCESS_FLAG_REMOTE_ATOMIC = BIT_ULL(4),
+};
+
 /* GDMA_CREATE_DMA_REGION */
 struct gdma_create_dma_region_req {
 	struct gdma_req_hdr hdr;
@@ -671,14 +699,14 @@ struct gdma_create_dma_region_req {
 
 struct gdma_create_dma_region_resp {
 	struct gdma_resp_hdr hdr;
-	u64 gdma_region;
+	gdma_obj_handle_t dma_region_handle;
 }; /* HW DATA */
 
 /* GDMA_DMA_REGION_ADD_PAGES */
 struct gdma_dma_region_add_pages_req {
 	struct gdma_req_hdr hdr;
 
-	u64 gdma_region;
+	gdma_obj_handle_t dma_region_handle;
 
 	u32 page_addr_list_len;
 	u32 reserved3;
@@ -690,9 +718,88 @@ struct gdma_dma_region_add_pages_req {
 struct gdma_destroy_dma_region_req {
 	struct gdma_req_hdr hdr;
 
-	u64 gdma_region;
+	gdma_obj_handle_t dma_region_handle;
 }; /* HW DATA */
 
+enum gdma_pd_flags {
+	GDMA_PD_FLAG_INVALID = 0,
+};
+
+struct gdma_create_pd_req {
+	struct gdma_req_hdr hdr;
+	enum gdma_pd_flags flags;
+	u32 reserved;
+};/* HW DATA */
+
+struct gdma_create_pd_resp {
+	struct gdma_resp_hdr hdr;
+	gdma_obj_handle_t pd_handle;
+	u32 pd_id;
+	u32 reserved;
+};/* HW DATA */
+
+struct gdma_destroy_pd_req {
+	struct gdma_req_hdr hdr;
+	gdma_obj_handle_t pd_handle;
+};/* HW DATA */
+
+struct gdma_destory_pd_resp {
+	struct gdma_resp_hdr hdr;
+};/* HW DATA */
+
+enum gdma_mr_type {
+	/* Guest Virtual Address - MRs of this type allow access
+	 * to memory mapped by PTEs associated with this MR using a virtual
+	 * address that is set up in the MST
+	 */
+	GDMA_MR_TYPE_GVA = 2,
+};
+
+struct gdma_create_mr_params {
+	gdma_obj_handle_t pd_handle;
+	enum gdma_mr_type mr_type;
+	union {
+		struct {
+			gdma_obj_handle_t dma_region_handle;
+			u64 virtual_address;
+			enum gdma_mr_access_flags access_flags;
+		} gva;
+	};
+};
+
+struct gdma_create_mr_request {
+	struct gdma_req_hdr hdr;
+	gdma_obj_handle_t pd_handle;
+	enum gdma_mr_type mr_type;
+	u32 reserved_1;
+
+	union {
+		struct {
+			gdma_obj_handle_t dma_region_handle;
+			u64 virtual_address;
+			enum gdma_mr_access_flags access_flags;
+		} gva;
+
+	};
+	u32 reserved_2;
+};/* HW DATA */
+
+struct gdma_create_mr_response {
+	struct gdma_resp_hdr hdr;
+	gdma_obj_handle_t mr_handle;
+	u32 lkey;
+	u32 rkey;
+};/* HW DATA */
+
+struct gdma_destroy_mr_request {
+	struct gdma_req_hdr hdr;
+	gdma_obj_handle_t mr_handle;
+};/* HW DATA */
+
+struct gdma_destroy_mr_response {
+	struct gdma_resp_hdr hdr;
+};/* HW DATA */
+
 int mana_gd_verify_vf_version(struct pci_dev *pdev);
 
 int mana_gd_register_device(struct gdma_dev *gd);
@@ -724,4 +831,7 @@ int mana_gd_allocate_doorbell_page(struct gdma_context *gc, int *doorbell_page);
 
 int mana_gd_destroy_doorbell_page(struct gdma_context *gc, int doorbell_page);
 
+int mana_gd_destroy_dma_region(struct gdma_context *gc,
+			       gdma_obj_handle_t dma_region_handle);
+
 #endif /* _GDMA_H */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v6 12/12] RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter
  2022-09-21  1:22 [Patch v6 00/12] Introduce Microsoft Azure Network Adapter (MANA) RDMA driver longli
                   ` (10 preceding siblings ...)
  2022-09-21  1:22 ` [Patch v6 11/12] net: mana: Define data structures for protection domain and memory registration longli
@ 2022-09-21  1:22 ` longli
  2022-09-21 16:26   ` Jason Gunthorpe
  2022-09-21 17:59   ` Jason Gunthorpe
  2022-09-21 16:25 ` [Patch v6 00/12] Introduce Microsoft Azure Network Adapter (MANA) RDMA driver Jason Gunthorpe
  12 siblings, 2 replies; 19+ messages in thread
From: longli @ 2022-09-21  1:22 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Jason Gunthorpe, Leon Romanovsky, edumazet, shiraz.saleem,
	Ajay Sharma
  Cc: linux-hyperv, netdev, linux-kernel, linux-rdma, Long Li

From: Long Li <longli@microsoft.com>

Add a RDMA VF driver for Microsoft Azure Network Adapter (MANA).

Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
---
Change log:
v2:
Changed coding sytles/formats
Checked undersize for udata length
Changed all logging to use ibdev_xxx()
Avoided page array copy when doing MR
Sorted driver ops
Fixed warnings reported by kernel test robot <lkp@intel.com>

v3:
More coding sytle/format changes

v4:
Process error on hardware vport configuration

v5:
Change licenses to GPL-2.0-only
Fix error handling in mana_ib_gd_create_dma_region()

v6:
rebased to rdma-next
removed redundant initialization to return value in mana_ib_probe()
added missing tabs at the end of mana_ib_gd_create_dma_region()

 MAINTAINERS                             |   3 +
 drivers/infiniband/Kconfig              |   1 +
 drivers/infiniband/hw/Makefile          |   1 +
 drivers/infiniband/hw/mana/Kconfig      |   7 +
 drivers/infiniband/hw/mana/Makefile     |   4 +
 drivers/infiniband/hw/mana/cq.c         |  80 ++++
 drivers/infiniband/hw/mana/device.c     | 129 ++++++
 drivers/infiniband/hw/mana/main.c       | 555 ++++++++++++++++++++++++
 drivers/infiniband/hw/mana/mana_ib.h    | 165 +++++++
 drivers/infiniband/hw/mana/mr.c         | 133 ++++++
 drivers/infiniband/hw/mana/qp.c         | 501 +++++++++++++++++++++
 drivers/infiniband/hw/mana/wq.c         | 114 +++++
 include/net/mana/mana.h                 |   3 +
 include/uapi/rdma/ib_user_ioctl_verbs.h |   1 +
 include/uapi/rdma/mana-abi.h            |  66 +++
 15 files changed, 1763 insertions(+)
 create mode 100644 drivers/infiniband/hw/mana/Kconfig
 create mode 100644 drivers/infiniband/hw/mana/Makefile
 create mode 100644 drivers/infiniband/hw/mana/cq.c
 create mode 100644 drivers/infiniband/hw/mana/device.c
 create mode 100644 drivers/infiniband/hw/mana/main.c
 create mode 100644 drivers/infiniband/hw/mana/mana_ib.h
 create mode 100644 drivers/infiniband/hw/mana/mr.c
 create mode 100644 drivers/infiniband/hw/mana/qp.c
 create mode 100644 drivers/infiniband/hw/mana/wq.c
 create mode 100644 include/uapi/rdma/mana-abi.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 8b9a50756c7e..7bcc19e27f97 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9426,6 +9426,7 @@ M:	Haiyang Zhang <haiyangz@microsoft.com>
 M:	Stephen Hemminger <sthemmin@microsoft.com>
 M:	Wei Liu <wei.liu@kernel.org>
 M:	Dexuan Cui <decui@microsoft.com>
+M:	Long Li <longli@microsoft.com>
 L:	linux-hyperv@vger.kernel.org
 S:	Supported
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git
@@ -9444,6 +9445,7 @@ F:	arch/x86/kernel/cpu/mshyperv.c
 F:	drivers/clocksource/hyperv_timer.c
 F:	drivers/hid/hid-hyperv.c
 F:	drivers/hv/
+F:	drivers/infiniband/hw/mana/
 F:	drivers/input/serio/hyperv-keyboard.c
 F:	drivers/iommu/hyperv-iommu.c
 F:	drivers/net/ethernet/microsoft/
@@ -9459,6 +9461,7 @@ F:	include/clocksource/hyperv_timer.h
 F:	include/linux/hyperv.h
 F:	include/net/mana
 F:	include/uapi/linux/hyperv.h
+F:	include/uapi/rdma/mana-abi.h
 F:	net/vmw_vsock/hyperv_transport.c
 F:	tools/hv/
 
diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index aa36ac618e72..ccc874478f0b 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -85,6 +85,7 @@ source "drivers/infiniband/hw/erdma/Kconfig"
 source "drivers/infiniband/hw/hfi1/Kconfig"
 source "drivers/infiniband/hw/hns/Kconfig"
 source "drivers/infiniband/hw/irdma/Kconfig"
+source "drivers/infiniband/hw/mana/Kconfig"
 source "drivers/infiniband/hw/mlx4/Kconfig"
 source "drivers/infiniband/hw/mlx5/Kconfig"
 source "drivers/infiniband/hw/mthca/Kconfig"
diff --git a/drivers/infiniband/hw/Makefile b/drivers/infiniband/hw/Makefile
index 6b3a88046125..1211f4317a9f 100644
--- a/drivers/infiniband/hw/Makefile
+++ b/drivers/infiniband/hw/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_INFINIBAND_QIB)		+= qib/
 obj-$(CONFIG_INFINIBAND_CXGB4)		+= cxgb4/
 obj-$(CONFIG_INFINIBAND_EFA)		+= efa/
 obj-$(CONFIG_INFINIBAND_IRDMA)		+= irdma/
+obj-$(CONFIG_MANA_INFINIBAND)		+= mana/
 obj-$(CONFIG_MLX4_INFINIBAND)		+= mlx4/
 obj-$(CONFIG_MLX5_INFINIBAND)		+= mlx5/
 obj-$(CONFIG_INFINIBAND_OCRDMA)		+= ocrdma/
diff --git a/drivers/infiniband/hw/mana/Kconfig b/drivers/infiniband/hw/mana/Kconfig
new file mode 100644
index 000000000000..b3ff03a23257
--- /dev/null
+++ b/drivers/infiniband/hw/mana/Kconfig
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config MANA_INFINIBAND
+	tristate "Microsoft Azure Network Adapter support"
+	depends on NETDEVICES && ETHERNET && PCI && MICROSOFT_MANA
+	help
+	  This driver provides low-level RDMA support for
+	  Microsoft Azure Network Adapter (MANA).
diff --git a/drivers/infiniband/hw/mana/Makefile b/drivers/infiniband/hw/mana/Makefile
new file mode 100644
index 000000000000..88655fe5e398
--- /dev/null
+++ b/drivers/infiniband/hw/mana/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_MANA_INFINIBAND) += mana_ib.o
+
+mana_ib-y := device.o main.o wq.o qp.o cq.o mr.o
diff --git a/drivers/infiniband/hw/mana/cq.c b/drivers/infiniband/hw/mana/cq.c
new file mode 100644
index 000000000000..f4717af96819
--- /dev/null
+++ b/drivers/infiniband/hw/mana/cq.c
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2022, Microsoft Corporation. All rights reserved.
+ */
+
+#include "mana_ib.h"
+
+int mana_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+		      struct ib_udata *udata)
+{
+	struct mana_ib_cq *cq = container_of(ibcq, struct mana_ib_cq, ibcq);
+	struct ib_device *ibdev = ibcq->device;
+	struct mana_ib_create_cq ucmd = {};
+	struct mana_ib_dev *mdev;
+	int err;
+
+	mdev = container_of(ibdev, struct mana_ib_dev, ib_dev);
+
+	if (udata->inlen < sizeof(ucmd))
+		return -EINVAL;
+
+	err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen));
+	if (err) {
+		ibdev_dbg(ibdev,
+			  "Failed to copy from udata for create cq, %d\n", err);
+		return -EFAULT;
+	}
+
+	if (attr->cqe > MAX_SEND_BUFFERS_PER_QUEUE) {
+		ibdev_dbg(ibdev, "CQE %d exceeding limit\n", attr->cqe);
+		return -EINVAL;
+	}
+
+	cq->cqe = attr->cqe;
+	cq->umem = ib_umem_get(ibdev, ucmd.buf_addr, cq->cqe * COMP_ENTRY_SIZE,
+			       IB_ACCESS_LOCAL_WRITE);
+	if (IS_ERR(cq->umem)) {
+		err = PTR_ERR(cq->umem);
+		ibdev_dbg(ibdev, "Failed to get umem for create cq, err %d\n",
+			  err);
+		return err;
+	}
+
+	err = mana_ib_gd_create_dma_region(mdev, cq->umem, &cq->gdma_region,
+					   PAGE_SIZE);
+	if (err) {
+		ibdev_err(ibdev,
+			  "Failed to create dma region for create cq, %d\n",
+			  err);
+		goto err_release_umem;
+	}
+
+	ibdev_dbg(ibdev,
+		  "mana_ib_gd_create_dma_region ret %d gdma_region 0x%llx\n",
+		  err, cq->gdma_region);
+
+	/* The CQ ID is not known at this time
+	 * The ID is generated at create_qp
+	 */
+
+	return 0;
+
+err_release_umem:
+	ib_umem_release(cq->umem);
+	return err;
+}
+
+int mana_ib_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
+{
+	struct mana_ib_cq *cq = container_of(ibcq, struct mana_ib_cq, ibcq);
+	struct ib_device *ibdev = ibcq->device;
+	struct mana_ib_dev *mdev;
+
+	mdev = container_of(ibdev, struct mana_ib_dev, ib_dev);
+
+	mana_ib_gd_destroy_dma_region(mdev, cq->gdma_region);
+	ib_umem_release(cq->umem);
+
+	return 0;
+}
diff --git a/drivers/infiniband/hw/mana/device.c b/drivers/infiniband/hw/mana/device.c
new file mode 100644
index 000000000000..c0a5a37583ff
--- /dev/null
+++ b/drivers/infiniband/hw/mana/device.c
@@ -0,0 +1,129 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2022, Microsoft Corporation. All rights reserved.
+ */
+
+#include "mana_ib.h"
+#include <net/mana/mana_auxiliary.h>
+
+MODULE_DESCRIPTION("Microsoft Azure Network Adapter IB driver");
+MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS(NET_MANA);
+
+const struct ib_device_ops mana_ib_dev_ops = {
+	.owner = THIS_MODULE,
+	.driver_id = RDMA_DRIVER_MANA,
+	.uverbs_abi_ver = MANA_IB_UVERBS_ABI_VERSION,
+
+	.alloc_pd = mana_ib_alloc_pd,
+	.alloc_ucontext = mana_ib_alloc_ucontext,
+	.create_cq = mana_ib_create_cq,
+	.create_qp = mana_ib_create_qp,
+	.create_rwq_ind_table = mana_ib_create_rwq_ind_table,
+	.create_wq = mana_ib_create_wq,
+	.dealloc_pd = mana_ib_dealloc_pd,
+	.dealloc_ucontext = mana_ib_dealloc_ucontext,
+	.dereg_mr = mana_ib_dereg_mr,
+	.destroy_cq = mana_ib_destroy_cq,
+	.destroy_qp = mana_ib_destroy_qp,
+	.destroy_rwq_ind_table = mana_ib_destroy_rwq_ind_table,
+	.destroy_wq = mana_ib_destroy_wq,
+	.disassociate_ucontext = mana_ib_disassociate_ucontext,
+	.get_port_immutable = mana_ib_get_port_immutable,
+	.mmap = mana_ib_mmap,
+	.modify_qp = mana_ib_modify_qp,
+	.modify_wq = mana_ib_modify_wq,
+	.query_device = mana_ib_query_device,
+	.query_gid = mana_ib_query_gid,
+	.query_port = mana_ib_query_port,
+	.reg_user_mr = mana_ib_reg_user_mr,
+
+	INIT_RDMA_OBJ_SIZE(ib_cq, mana_ib_cq, ibcq),
+	INIT_RDMA_OBJ_SIZE(ib_pd, mana_ib_pd, ibpd),
+	INIT_RDMA_OBJ_SIZE(ib_qp, mana_ib_qp, ibqp),
+	INIT_RDMA_OBJ_SIZE(ib_ucontext, mana_ib_ucontext, ibucontext),
+	INIT_RDMA_OBJ_SIZE(ib_rwq_ind_table, mana_ib_rwq_ind_table,
+			   ib_ind_table),
+};
+
+static int mana_ib_probe(struct auxiliary_device *adev,
+			 const struct auxiliary_device_id *id)
+{
+	struct mana_adev *madev = container_of(adev, struct mana_adev, adev);
+	struct gdma_dev *mdev = madev->mdev;
+	struct mana_context *mc;
+	struct mana_ib_dev *dev;
+	int ret;
+
+	mc = mdev->driver_data;
+
+	dev = ib_alloc_device(mana_ib_dev, ib_dev);
+	if (!dev)
+		return -ENOMEM;
+
+	ib_set_device_ops(&dev->ib_dev, &mana_ib_dev_ops);
+
+	dev->ib_dev.phys_port_cnt = mc->num_ports;
+
+	ibdev_dbg(&dev->ib_dev, "mdev=%p id=%d num_ports=%d\n", mdev,
+		  mdev->dev_id.as_uint32, dev->ib_dev.phys_port_cnt);
+
+	dev->gdma_dev = mdev;
+	dev->ib_dev.node_type = RDMA_NODE_IB_CA;
+
+	/* num_comp_vectors needs to set to the max MSIX index
+	 * when interrupts and event queues are implemented
+	 */
+	dev->ib_dev.num_comp_vectors = 1;
+	dev->ib_dev.dev.parent = mdev->gdma_context->dev;
+
+	ret = ib_register_device(&dev->ib_dev, "mana_%d",
+				 mdev->gdma_context->dev);
+	if (ret) {
+		ib_dealloc_device(&dev->ib_dev);
+		return ret;
+	}
+
+	dev_set_drvdata(&adev->dev, dev);
+
+	return 0;
+}
+
+static void mana_ib_remove(struct auxiliary_device *adev)
+{
+	struct mana_ib_dev *dev = dev_get_drvdata(&adev->dev);
+
+	ib_unregister_device(&dev->ib_dev);
+	ib_dealloc_device(&dev->ib_dev);
+}
+
+static const struct auxiliary_device_id mana_id_table[] = {
+	{
+		.name = "mana.rdma",
+	},
+	{},
+};
+
+MODULE_DEVICE_TABLE(auxiliary, mana_id_table);
+
+static struct auxiliary_driver mana_driver = {
+	.name = "rdma",
+	.probe = mana_ib_probe,
+	.remove = mana_ib_remove,
+	.id_table = mana_id_table,
+};
+
+static int __init mana_ib_init(void)
+{
+	auxiliary_driver_register(&mana_driver);
+
+	return 0;
+}
+
+static void __exit mana_ib_cleanup(void)
+{
+	auxiliary_driver_unregister(&mana_driver);
+}
+
+module_init(mana_ib_init);
+module_exit(mana_ib_cleanup);
diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
new file mode 100644
index 000000000000..f7b6c3f47e66
--- /dev/null
+++ b/drivers/infiniband/hw/mana/main.c
@@ -0,0 +1,555 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2022, Microsoft Corporation. All rights reserved.
+ */
+
+#include "mana_ib.h"
+
+void mana_ib_uncfg_vport(struct mana_ib_dev *dev, struct mana_ib_pd *pd,
+			 u32 port)
+{
+	struct gdma_dev *gd = dev->gdma_dev;
+	struct mana_port_context *mpc;
+	struct net_device *ndev;
+	struct mana_context *mc;
+
+	mc = gd->driver_data;
+	ndev = mc->ports[port];
+	mpc = netdev_priv(ndev);
+
+	mutex_lock(&pd->vport_mutex);
+
+	pd->vport_use_count--;
+	WARN_ON(pd->vport_use_count < 0);
+
+	if (!pd->vport_use_count)
+		mana_uncfg_vport(mpc);
+
+	mutex_unlock(&pd->vport_mutex);
+}
+
+int mana_ib_cfg_vport(struct mana_ib_dev *dev, u32 port, struct mana_ib_pd *pd,
+		      u32 doorbell_id)
+{
+	struct gdma_dev *mdev = dev->gdma_dev;
+	struct mana_port_context *mpc;
+	struct mana_context *mc;
+	struct net_device *ndev;
+	int err;
+
+	mc = mdev->driver_data;
+	ndev = mc->ports[port];
+	mpc = netdev_priv(ndev);
+
+	mutex_lock(&pd->vport_mutex);
+
+	pd->vport_use_count++;
+	if (pd->vport_use_count > 1) {
+		ibdev_dbg(&dev->ib_dev,
+			  "Skip as this PD is already configured vport\n");
+		mutex_unlock(&pd->vport_mutex);
+		return 0;
+	}
+	mutex_unlock(&pd->vport_mutex);
+
+	err = mana_cfg_vport(mpc, pd->pdn, doorbell_id);
+	if (err) {
+		mutex_lock(&pd->vport_mutex);
+		pd->vport_use_count--;
+		mutex_unlock(&pd->vport_mutex);
+
+		ibdev_err(&dev->ib_dev, "Failed to configure vPort %d\n", err);
+		return err;
+	}
+
+	pd->tx_shortform_allowed = mpc->tx_shortform_allowed;
+	pd->tx_vp_offset = mpc->tx_vp_offset;
+
+	ibdev_dbg(&dev->ib_dev,
+		  "vport handle %llx pdid %x doorbell_id %x "
+		  "tx_shortform_allowed %d tx_vp_offset %u\n",
+		  mpc->port_handle, pd->pdn, doorbell_id,
+		  pd->tx_shortform_allowed, pd->tx_vp_offset);
+
+	return 0;
+}
+
+int mana_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
+{
+	struct mana_ib_pd *pd = container_of(ibpd, struct mana_ib_pd, ibpd);
+	struct ib_device *ibdev = ibpd->device;
+	enum gdma_pd_flags flags = 0;
+	struct mana_ib_dev *dev;
+	int ret;
+
+	dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
+
+	ret = mana_ib_gd_create_pd(dev, &pd->pd_handle, &pd->pdn, flags);
+	if (ret) {
+		ibdev_err(ibdev, "Failed to get pd id, err %d\n", ret);
+		return ret;
+	}
+
+	mutex_init(&pd->vport_mutex);
+	pd->vport_use_count = 0;
+	return 0;
+}
+
+int mana_ib_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
+{
+	struct mana_ib_pd *pd = container_of(ibpd, struct mana_ib_pd, ibpd);
+	struct ib_device *ibdev = ibpd->device;
+	struct mana_ib_dev *dev;
+
+	dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
+	return mana_ib_gd_destroy_pd(dev, pd->pd_handle);
+}
+
+int mana_ib_alloc_ucontext(struct ib_ucontext *ibcontext,
+			   struct ib_udata *udata)
+{
+	struct mana_ib_ucontext *ucontext =
+		container_of(ibcontext, struct mana_ib_ucontext, ibucontext);
+	struct ib_device *ibdev = ibcontext->device;
+	struct mana_ib_dev *mdev;
+	struct gdma_context *gc;
+	struct gdma_dev *dev;
+	int doorbell_page;
+	int ret;
+
+	mdev = container_of(ibdev, struct mana_ib_dev, ib_dev);
+	dev = mdev->gdma_dev;
+	gc = dev->gdma_context;
+
+	/* Allocate a doorbell page index */
+	ret = mana_gd_allocate_doorbell_page(gc, &doorbell_page);
+	if (ret) {
+		ibdev_err(ibdev, "Failed to allocate doorbell page %d\n", ret);
+		return -ENOMEM;
+	}
+
+	ibdev_dbg(ibdev, "Doorbell page allocated %d\n", doorbell_page);
+
+	ucontext->doorbell = doorbell_page;
+
+	return 0;
+}
+
+void mana_ib_dealloc_ucontext(struct ib_ucontext *ibcontext)
+{
+	struct mana_ib_ucontext *mana_ucontext =
+		container_of(ibcontext, struct mana_ib_ucontext, ibucontext);
+	struct ib_device *ibdev = ibcontext->device;
+	struct mana_ib_dev *mdev;
+	struct gdma_context *gc;
+	int ret;
+
+	mdev = container_of(ibdev, struct mana_ib_dev, ib_dev);
+	gc = mdev->gdma_dev->gdma_context;
+
+	ret = mana_gd_destroy_doorbell_page(gc, mana_ucontext->doorbell);
+	if (ret)
+		ibdev_err(ibdev, "Failed to destroy doorbell page %d\n", ret);
+}
+
+int mana_ib_gd_create_dma_region(struct mana_ib_dev *dev, struct ib_umem *umem,
+				 mana_handle_t *gdma_region, u64 page_sz)
+{
+	size_t num_pages_total = ib_umem_num_dma_blocks(umem, page_sz);
+	struct gdma_dma_region_add_pages_req *add_req = NULL;
+	struct gdma_create_dma_region_resp create_resp = {};
+	struct gdma_create_dma_region_req *create_req;
+	size_t num_pages_cur, num_pages_to_handle;
+	unsigned int create_req_msg_size;
+	struct hw_channel_context *hwc;
+	struct ib_block_iter biter;
+	size_t max_pgs_create_cmd;
+	struct gdma_context *gc;
+	struct gdma_dev *mdev;
+	unsigned int i;
+	int err;
+
+	mdev = dev->gdma_dev;
+	gc = mdev->gdma_context;
+	hwc = gc->hwc.driver_data;
+	max_pgs_create_cmd =
+		(hwc->max_req_msg_size - sizeof(*create_req)) / sizeof(u64);
+
+	num_pages_to_handle =
+		min_t(size_t, num_pages_total, max_pgs_create_cmd);
+	create_req_msg_size =
+		struct_size(create_req, page_addr_list, num_pages_to_handle);
+
+	create_req = kzalloc(create_req_msg_size, GFP_KERNEL);
+	if (!create_req)
+		return -ENOMEM;
+
+	mana_gd_init_req_hdr(&create_req->hdr, GDMA_CREATE_DMA_REGION,
+			     create_req_msg_size, sizeof(create_resp));
+
+	create_req->length = umem->length;
+	create_req->offset_in_page = umem->address & (page_sz - 1);
+	create_req->gdma_page_type = order_base_2(page_sz) - PAGE_SHIFT;
+	create_req->page_count = num_pages_total;
+	create_req->page_addr_list_len = num_pages_to_handle;
+
+	ibdev_dbg(&dev->ib_dev,
+		  "size_dma_region %lu num_pages_total %lu, "
+		  "page_sz 0x%llx offset_in_page %u\n",
+		  umem->length, num_pages_total, page_sz,
+		  create_req->offset_in_page);
+
+	ibdev_dbg(&dev->ib_dev, "num_pages_to_handle %lu, gdma_page_type %u",
+		  num_pages_to_handle, create_req->gdma_page_type);
+
+	__rdma_umem_block_iter_start(&biter, umem, page_sz);
+
+	for (i = 0; i < num_pages_to_handle; ++i) {
+		dma_addr_t cur_addr;
+
+		__rdma_block_iter_next(&biter);
+		cur_addr = rdma_block_iter_dma_address(&biter);
+
+		create_req->page_addr_list[i] = cur_addr;
+
+		ibdev_dbg(&dev->ib_dev, "page num %u cur_addr 0x%llx\n", i,
+			  cur_addr);
+	}
+
+	err = mana_gd_send_request(gc, create_req_msg_size, create_req,
+				   sizeof(create_resp), &create_resp);
+	kfree(create_req);
+
+	if (err || create_resp.hdr.status) {
+		ibdev_err(&dev->ib_dev,
+			  "Failed to create DMA region: %d, 0x%x\n", err,
+			  create_resp.hdr.status);
+		if (!err)
+			err = -EPROTO;
+
+		goto error;
+	}
+
+	*gdma_region = create_resp.dma_region_handle;
+	ibdev_dbg(&dev->ib_dev, "Created DMA region with handle 0x%llx\n",
+		  *gdma_region);
+
+	num_pages_cur = num_pages_to_handle;
+
+	if (num_pages_cur < num_pages_total) {
+		unsigned int add_req_msg_size;
+		size_t max_pgs_add_cmd =
+			(hwc->max_req_msg_size - sizeof(*add_req)) /
+			sizeof(u64);
+
+		num_pages_to_handle =
+			min_t(size_t, num_pages_total - num_pages_cur,
+			      max_pgs_add_cmd);
+
+		/* Calculate the max num of pages that will be handled */
+		add_req_msg_size = struct_size(add_req, page_addr_list,
+					       num_pages_to_handle);
+
+		add_req = kmalloc(add_req_msg_size, GFP_KERNEL);
+		if (!add_req) {
+			err = -ENOMEM;
+			goto free_gdma_region;
+		}
+
+		while (num_pages_cur < num_pages_total) {
+			struct gdma_general_resp add_resp = {};
+			u32 expected_status = 0;
+
+			if (num_pages_cur + num_pages_to_handle <
+			    num_pages_total) {
+				/* Status indicating more pages are needed */
+				expected_status = GDMA_STATUS_MORE_ENTRIES;
+			}
+
+			memset(add_req, 0, add_req_msg_size);
+
+			mana_gd_init_req_hdr(&add_req->hdr,
+					     GDMA_DMA_REGION_ADD_PAGES,
+					     add_req_msg_size,
+					     sizeof(add_resp));
+			add_req->dma_region_handle = *gdma_region;
+			add_req->page_addr_list_len = num_pages_to_handle;
+
+			for (i = 0; i < num_pages_to_handle; ++i) {
+				dma_addr_t cur_addr =
+					rdma_block_iter_dma_address(&biter);
+				add_req->page_addr_list[i] = cur_addr;
+				__rdma_block_iter_next(&biter);
+
+				ibdev_dbg(&dev->ib_dev,
+					  "page_addr_list %lu addr 0x%llx\n",
+					  num_pages_cur + i, cur_addr);
+			}
+
+			err = mana_gd_send_request(gc, add_req_msg_size,
+						   add_req, sizeof(add_resp),
+						   &add_resp);
+			if (!err || add_resp.hdr.status != expected_status) {
+				ibdev_err(&dev->ib_dev,
+					  "Failed put DMA pages %u: %d,0x%x\n",
+					  i, err, add_resp.hdr.status);
+				err = -EPROTO;
+				break;
+			}
+
+			num_pages_cur += num_pages_to_handle;
+			num_pages_to_handle =
+				min_t(size_t, num_pages_total - num_pages_cur,
+				      max_pgs_add_cmd);
+			add_req_msg_size = sizeof(*add_req) +
+					   num_pages_to_handle * sizeof(u64);
+		}
+
+		kfree(add_req);
+	}
+
+	if (!err)
+		return 0;
+
+free_gdma_region:
+	mana_ib_gd_destroy_dma_region(dev, create_resp.dma_region_handle);
+
+error:
+	return err;
+}
+
+int mana_ib_gd_destroy_dma_region(struct mana_ib_dev *dev, u64 gdma_region)
+{
+	struct gdma_dev *mdev = dev->gdma_dev;
+	struct gdma_context *gc;
+
+	gc = mdev->gdma_context;
+	ibdev_dbg(&dev->ib_dev, "destroy dma region 0x%llx\n", gdma_region);
+
+	return mana_gd_destroy_dma_region(gc, gdma_region);
+}
+
+int mana_ib_gd_create_pd(struct mana_ib_dev *dev, u64 *pd_handle, u32 *pd_id,
+			 enum gdma_pd_flags flags)
+{
+	struct gdma_dev *mdev = dev->gdma_dev;
+	struct gdma_create_pd_resp resp = {};
+	struct gdma_create_pd_req req = {};
+	struct gdma_context *gc;
+	int err;
+
+	gc = mdev->gdma_context;
+
+	mana_gd_init_req_hdr(&req.hdr, GDMA_CREATE_PD, sizeof(req),
+			     sizeof(resp));
+
+	req.flags = flags;
+	err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
+
+	if (err || resp.hdr.status) {
+		ibdev_err(&dev->ib_dev,
+			  "Failed to get pd_id err %d status %u\n", err,
+			  resp.hdr.status);
+		if (!err)
+			err = -EPROTO;
+
+		return err;
+	}
+
+	*pd_handle = resp.pd_handle;
+	*pd_id = resp.pd_id;
+	ibdev_dbg(&dev->ib_dev, "pd_handle 0x%llx pd_id %d\n", *pd_handle,
+		  *pd_id);
+
+	return 0;
+}
+
+int mana_ib_gd_destroy_pd(struct mana_ib_dev *dev, u64 pd_handle)
+{
+	struct gdma_dev *mdev = dev->gdma_dev;
+	struct gdma_destory_pd_resp resp = {};
+	struct gdma_destroy_pd_req req = {};
+	struct gdma_context *gc;
+	int err;
+
+	gc = mdev->gdma_context;
+
+	mana_gd_init_req_hdr(&req.hdr, GDMA_DESTROY_PD, sizeof(req),
+			     sizeof(resp));
+
+	req.pd_handle = pd_handle;
+	err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
+
+	if (err || resp.hdr.status) {
+		ibdev_err(&dev->ib_dev,
+			  "Failed to destroy pd_handle 0x%llx err %d status %u",
+			  pd_handle, err, resp.hdr.status);
+		if (!err)
+			err = -EPROTO;
+	}
+
+	return err;
+}
+
+int mana_ib_gd_create_mr(struct mana_ib_dev *dev, struct mana_ib_mr *mr,
+			 struct gdma_create_mr_params *mr_params)
+{
+	struct gdma_create_mr_response resp = {};
+	struct gdma_create_mr_request req = {};
+	struct gdma_dev *mdev = dev->gdma_dev;
+	struct gdma_context *gc;
+	int err;
+
+	gc = mdev->gdma_context;
+
+	mana_gd_init_req_hdr(&req.hdr, GDMA_CREATE_MR, sizeof(req),
+			     sizeof(resp));
+	req.pd_handle = mr_params->pd_handle;
+	req.mr_type = mr_params->mr_type;
+
+	switch (mr_params->mr_type) {
+	case GDMA_MR_TYPE_GVA:
+		req.gva.dma_region_handle = mr_params->gva.dma_region_handle;
+		req.gva.virtual_address = mr_params->gva.virtual_address;
+		req.gva.access_flags = mr_params->gva.access_flags;
+		break;
+
+	default:
+		ibdev_dbg(&dev->ib_dev,
+			  "invalid param (GDMA_MR_TYPE) passed, type %d\n",
+			  req.mr_type);
+		err = -EINVAL;
+		goto error;
+	}
+
+	err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
+
+	if (err || resp.hdr.status) {
+		ibdev_err(&dev->ib_dev, "Failed to create mr %d, %u", err,
+			  resp.hdr.status);
+		if (!err)
+			err = -EPROTO;
+
+		goto error;
+	}
+
+	mr->ibmr.lkey = resp.lkey;
+	mr->ibmr.rkey = resp.rkey;
+	mr->mr_handle = resp.mr_handle;
+
+	return 0;
+error:
+	return err;
+}
+
+int mana_ib_gd_destroy_mr(struct mana_ib_dev *dev, gdma_obj_handle_t mr_handle)
+{
+	struct gdma_destroy_mr_response resp = {};
+	struct gdma_destroy_mr_request req = {};
+	struct gdma_dev *mdev = dev->gdma_dev;
+	struct gdma_context *gc;
+	int err;
+
+	gc = mdev->gdma_context;
+
+	mana_gd_init_req_hdr(&req.hdr, GDMA_DESTROY_MR, sizeof(req),
+			     sizeof(resp));
+
+	req.mr_handle = mr_handle;
+
+	err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
+	if (err || resp.hdr.status) {
+		dev_err(gc->dev, "Failed to destroy MR: %d, 0x%x\n", err,
+			resp.hdr.status);
+		if (!err)
+			err = -EPROTO;
+		return err;
+	}
+
+	return 0;
+}
+
+int mana_ib_mmap(struct ib_ucontext *ibcontext, struct vm_area_struct *vma)
+{
+	struct mana_ib_ucontext *mana_ucontext =
+		container_of(ibcontext, struct mana_ib_ucontext, ibucontext);
+	struct ib_device *ibdev = ibcontext->device;
+	struct mana_ib_dev *mdev;
+	struct gdma_context *gc;
+	phys_addr_t pfn;
+	pgprot_t prot;
+	int ret;
+
+	mdev = container_of(ibdev, struct mana_ib_dev, ib_dev);
+	gc = mdev->gdma_dev->gdma_context;
+
+	if (vma->vm_pgoff != 0) {
+		ibdev_err(ibdev, "Unexpected vm_pgoff %lu\n", vma->vm_pgoff);
+		return -EINVAL;
+	}
+
+	/* Map to the page indexed by ucontext->doorbell */
+	pfn = (gc->phys_db_page_base +
+	       gc->db_page_size * mana_ucontext->doorbell) >>
+	      PAGE_SHIFT;
+	prot = pgprot_writecombine(vma->vm_page_prot);
+
+	ret = rdma_user_mmap_io(ibcontext, vma, pfn, gc->db_page_size, prot,
+				NULL);
+	if (ret)
+		ibdev_err(ibdev, "can't rdma_user_mmap_io ret %d\n", ret);
+	else
+		ibdev_dbg(ibdev, "mapped I/O pfn 0x%llx page_size %u, ret %d\n",
+			  pfn, gc->db_page_size, ret);
+
+	return ret;
+}
+
+int mana_ib_get_port_immutable(struct ib_device *ibdev, u32 port_num,
+			       struct ib_port_immutable *immutable)
+{
+	/* This version only support RAW_PACKET
+	 * other values need to be filled for other types
+	 */
+	immutable->core_cap_flags = RDMA_CORE_PORT_RAW_PACKET;
+
+	return 0;
+}
+
+int mana_ib_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+			 struct ib_udata *uhw)
+{
+	props->max_qp = MANA_MAX_NUM_QUEUES;
+	props->max_qp_wr = MAX_SEND_BUFFERS_PER_QUEUE;
+
+	/* max_cqe could be potentially much bigger.
+	 * As this version of driver only support RAW QP, set it to the same
+	 * value as max_qp_wr
+	 */
+	props->max_cqe = MAX_SEND_BUFFERS_PER_QUEUE;
+
+	props->max_mr_size = MANA_IB_MAX_MR_SIZE;
+	props->max_mr = INT_MAX;
+	props->max_send_sge = MAX_TX_WQE_SGL_ENTRIES;
+	props->max_recv_sge = MAX_RX_WQE_SGL_ENTRIES;
+
+	return 0;
+}
+
+int mana_ib_query_port(struct ib_device *ibdev, u32 port,
+		       struct ib_port_attr *props)
+{
+	/* This version doesn't return port properties */
+	return 0;
+}
+
+int mana_ib_query_gid(struct ib_device *ibdev, u32 port, int index,
+		      union ib_gid *gid)
+{
+	/* This version doesn't return GID properties */
+	return 0;
+}
+
+void mana_ib_disassociate_ucontext(struct ib_ucontext *ibcontext)
+{
+}
diff --git a/drivers/infiniband/hw/mana/mana_ib.h b/drivers/infiniband/hw/mana/mana_ib.h
new file mode 100644
index 000000000000..581aa5bff470
--- /dev/null
+++ b/drivers/infiniband/hw/mana/mana_ib.h
@@ -0,0 +1,165 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2022 Microsoft Corporation. All rights reserved.
+ */
+
+#ifndef _MANA_IB_H_
+#define _MANA_IB_H_
+
+#include <rdma/ib_verbs.h>
+#include <rdma/ib_mad.h>
+#include <rdma/ib_umem.h>
+#include <rdma/mana-abi.h>
+
+#include <net/mana/mana.h>
+
+#define PAGE_SZ_BM                                                             \
+	(SZ_4K | SZ_8K | SZ_16K | SZ_32K | SZ_64K | SZ_128K | SZ_256K |        \
+	 SZ_512K | SZ_1M | SZ_2M)
+
+/* MANA doesn't have any limit for MR size */
+#define MANA_IB_MAX_MR_SIZE ((u64)(~(0ULL)))
+
+struct mana_ib_dev {
+	struct ib_device ib_dev;
+	struct gdma_dev *gdma_dev;
+};
+
+struct mana_ib_wq {
+	struct ib_wq ibwq;
+	struct ib_umem *umem;
+	int wqe;
+	u32 wq_buf_size;
+	u64 gdma_region;
+	u64 id;
+	mana_handle_t rx_object;
+};
+
+struct mana_ib_pd {
+	struct ib_pd ibpd;
+	u32 pdn;
+	mana_handle_t pd_handle;
+
+	/* Mutex for sharing access to vport_use_count */
+	struct mutex vport_mutex;
+	int vport_use_count;
+
+	bool tx_shortform_allowed;
+	u32 tx_vp_offset;
+};
+
+struct mana_ib_mr {
+	struct ib_mr ibmr;
+	struct ib_umem *umem;
+	mana_handle_t mr_handle;
+};
+
+struct mana_ib_cq {
+	struct ib_cq ibcq;
+	struct ib_umem *umem;
+	int cqe;
+	u64 gdma_region;
+	u64 id;
+};
+
+struct mana_ib_qp {
+	struct ib_qp ibqp;
+
+	/* Work queue info */
+	struct ib_umem *sq_umem;
+	int sqe;
+	u64 sq_gdma_region;
+	u64 sq_id;
+	mana_handle_t tx_object;
+
+	/* The port on the IB device, starting with 1 */
+	u32 port;
+};
+
+struct mana_ib_ucontext {
+	struct ib_ucontext ibucontext;
+	u32 doorbell;
+};
+
+struct mana_ib_rwq_ind_table {
+	struct ib_rwq_ind_table ib_ind_table;
+};
+
+int mana_ib_gd_create_dma_region(struct mana_ib_dev *dev, struct ib_umem *umem,
+				 mana_handle_t *gdma_region, u64 page_sz);
+
+int mana_ib_gd_destroy_dma_region(struct mana_ib_dev *dev,
+				  mana_handle_t gdma_region);
+
+struct ib_wq *mana_ib_create_wq(struct ib_pd *pd,
+				struct ib_wq_init_attr *init_attr,
+				struct ib_udata *udata);
+
+int mana_ib_modify_wq(struct ib_wq *wq, struct ib_wq_attr *wq_attr,
+		      u32 wq_attr_mask, struct ib_udata *udata);
+
+int mana_ib_destroy_wq(struct ib_wq *ibwq, struct ib_udata *udata);
+
+int mana_ib_create_rwq_ind_table(struct ib_rwq_ind_table *ib_rwq_ind_table,
+				 struct ib_rwq_ind_table_init_attr *init_attr,
+				 struct ib_udata *udata);
+
+int mana_ib_destroy_rwq_ind_table(struct ib_rwq_ind_table *ib_rwq_ind_tbl);
+
+struct ib_mr *mana_ib_get_dma_mr(struct ib_pd *ibpd, int access_flags);
+
+struct ib_mr *mana_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
+				  u64 iova, int access_flags,
+				  struct ib_udata *udata);
+
+int mana_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata);
+
+int mana_ib_create_qp(struct ib_qp *qp, struct ib_qp_init_attr *qp_init_attr,
+		      struct ib_udata *udata);
+
+int mana_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
+		      int attr_mask, struct ib_udata *udata);
+
+int mana_ib_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata);
+
+int mana_ib_cfg_vport(struct mana_ib_dev *dev, u32 port_id,
+		      struct mana_ib_pd *pd, u32 doorbell_id);
+void mana_ib_uncfg_vport(struct mana_ib_dev *dev, struct mana_ib_pd *pd,
+			 u32 port);
+
+int mana_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
+		      struct ib_udata *udata);
+
+int mana_ib_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata);
+
+int mana_ib_gd_create_pd(struct mana_ib_dev *dev, u64 *pd_handle, u32 *pd_id,
+			 enum gdma_pd_flags flags);
+
+int mana_ib_gd_destroy_pd(struct mana_ib_dev *dev, u64 pd_handle);
+
+int mana_ib_gd_create_mr(struct mana_ib_dev *dev, struct mana_ib_mr *mr,
+			 struct gdma_create_mr_params *mr_params);
+
+int mana_ib_gd_destroy_mr(struct mana_ib_dev *dev, mana_handle_t mr_handle);
+
+int mana_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata);
+int mana_ib_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata);
+
+int mana_ib_alloc_ucontext(struct ib_ucontext *ibcontext,
+			   struct ib_udata *udata);
+void mana_ib_dealloc_ucontext(struct ib_ucontext *ibcontext);
+
+int mana_ib_mmap(struct ib_ucontext *ibcontext, struct vm_area_struct *vma);
+
+int mana_ib_get_port_immutable(struct ib_device *ibdev, u32 port_num,
+			       struct ib_port_immutable *immutable);
+int mana_ib_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
+			 struct ib_udata *uhw);
+int mana_ib_query_port(struct ib_device *ibdev, u32 port,
+		       struct ib_port_attr *props);
+int mana_ib_query_gid(struct ib_device *ibdev, u32 port, int index,
+		      union ib_gid *gid);
+
+void mana_ib_disassociate_ucontext(struct ib_ucontext *ibcontext);
+
+#endif
diff --git a/drivers/infiniband/hw/mana/mr.c b/drivers/infiniband/hw/mana/mr.c
new file mode 100644
index 000000000000..6b2c91685743
--- /dev/null
+++ b/drivers/infiniband/hw/mana/mr.c
@@ -0,0 +1,133 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2022, Microsoft Corporation. All rights reserved.
+ */
+
+#include "mana_ib.h"
+
+#define VALID_MR_FLAGS                                                         \
+	(IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE | IB_ACCESS_REMOTE_READ)
+
+static enum gdma_mr_access_flags
+mana_ib_verbs_to_gdma_access_flags(int access_flags)
+{
+	enum gdma_mr_access_flags flags = GDMA_ACCESS_FLAG_LOCAL_READ;
+
+	if (access_flags & IB_ACCESS_LOCAL_WRITE)
+		flags |= GDMA_ACCESS_FLAG_LOCAL_WRITE;
+
+	if (access_flags & IB_ACCESS_REMOTE_WRITE)
+		flags |= GDMA_ACCESS_FLAG_REMOTE_WRITE;
+
+	if (access_flags & IB_ACCESS_REMOTE_READ)
+		flags |= GDMA_ACCESS_FLAG_REMOTE_READ;
+
+	return flags;
+}
+
+struct ib_mr *mana_ib_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 length,
+				  u64 iova, int access_flags,
+				  struct ib_udata *udata)
+{
+	struct mana_ib_pd *pd = container_of(ibpd, struct mana_ib_pd, ibpd);
+	struct gdma_create_mr_params mr_params = {};
+	struct ib_device *ibdev = ibpd->device;
+	gdma_obj_handle_t dma_region_handle;
+	struct mana_ib_dev *dev;
+	struct mana_ib_mr *mr;
+	u64 page_sz;
+	int err;
+
+	dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
+
+	ibdev_dbg(ibdev,
+		  "start 0x%llx, iova 0x%llx length 0x%llx access_flags 0x%x",
+		  start, iova, length, access_flags);
+
+	if (access_flags & ~VALID_MR_FLAGS)
+		return ERR_PTR(-EINVAL);
+
+	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
+	if (!mr)
+		return ERR_PTR(-ENOMEM);
+
+	mr->umem = ib_umem_get(ibdev, start, length, access_flags);
+	if (IS_ERR(mr->umem)) {
+		err = PTR_ERR(mr->umem);
+		ibdev_dbg(ibdev,
+			  "Failed to get umem for register user-mr, %d\n", err);
+		goto err_free;
+	}
+
+	page_sz = ib_umem_find_best_pgsz(mr->umem, PAGE_SZ_BM, iova);
+	if (unlikely(!page_sz)) {
+		ibdev_err(ibdev, "Failed to get best page size\n");
+		err = -EOPNOTSUPP;
+		goto err_umem;
+	}
+	ibdev_dbg(ibdev, "Page size chosen %llu\n", page_sz);
+
+	err = mana_ib_gd_create_dma_region(dev, mr->umem, &dma_region_handle,
+					   page_sz);
+	if (err) {
+		ibdev_err(ibdev, "Failed create dma region for user-mr, %d\n",
+			  err);
+		goto err_umem;
+	}
+
+	ibdev_dbg(ibdev,
+		  "mana_ib_gd_create_dma_region ret %d gdma_region %llx\n", err,
+		  dma_region_handle);
+
+	mr_params.pd_handle = pd->pd_handle;
+	mr_params.mr_type = GDMA_MR_TYPE_GVA;
+	mr_params.gva.dma_region_handle = dma_region_handle;
+	mr_params.gva.virtual_address = iova;
+	mr_params.gva.access_flags =
+		mana_ib_verbs_to_gdma_access_flags(access_flags);
+
+	err = mana_ib_gd_create_mr(dev, mr, &mr_params);
+	if (err)
+		goto err_dma_region;
+
+	/* There is no need to keep track of dma_region_handle after MR is
+	 * successfully created. The dma_region_handle is tracked in the PF
+	 * as part of the lifecycle of this MR.
+	 */
+
+	mr->ibmr.length = length;
+	mr->ibmr.page_size = page_sz;
+	return &mr->ibmr;
+
+err_dma_region:
+	mana_gd_destroy_dma_region(dev->gdma_dev->gdma_context,
+				   dma_region_handle);
+
+err_umem:
+	ib_umem_release(mr->umem);
+
+err_free:
+	kfree(mr);
+	return ERR_PTR(err);
+}
+
+int mana_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
+{
+	struct mana_ib_mr *mr = container_of(ibmr, struct mana_ib_mr, ibmr);
+	struct ib_device *ibdev = ibmr->device;
+	struct mana_ib_dev *dev;
+	int err;
+
+	dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
+
+	err = mana_ib_gd_destroy_mr(dev, mr->mr_handle);
+	if (err)
+		return err;
+
+	if (mr->umem)
+		ib_umem_release(mr->umem);
+
+	kfree(mr);
+
+	return 0;
+}
diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
new file mode 100644
index 000000000000..9214d760495f
--- /dev/null
+++ b/drivers/infiniband/hw/mana/qp.c
@@ -0,0 +1,501 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2022, Microsoft Corporation. All rights reserved.
+ */
+
+#include "mana_ib.h"
+
+static int mana_ib_cfg_vport_steering(struct mana_ib_dev *dev,
+				      struct net_device *ndev,
+				      mana_handle_t default_rxobj,
+				      mana_handle_t ind_table[],
+				      u32 log_ind_tbl_size, u32 rx_hash_key_len,
+				      u8 *rx_hash_key)
+{
+	struct mana_port_context *mpc = netdev_priv(ndev);
+	struct mana_cfg_rx_steer_req *req = NULL;
+	struct mana_cfg_rx_steer_resp resp = {};
+	mana_handle_t *req_indir_tab;
+	struct gdma_context *gc;
+	struct gdma_dev *mdev;
+	u32 req_buf_size;
+	int i, err;
+
+	mdev = dev->gdma_dev;
+	gc = mdev->gdma_context;
+
+	req_buf_size =
+		sizeof(*req) + sizeof(mana_handle_t) * MANA_INDIRECT_TABLE_SIZE;
+	req = kzalloc(req_buf_size, GFP_KERNEL);
+	if (!req)
+		return -ENOMEM;
+
+	mana_gd_init_req_hdr(&req->hdr, MANA_CONFIG_VPORT_RX, req_buf_size,
+			     sizeof(resp));
+
+	req->vport = mpc->port_handle;
+	req->rx_enable = 1;
+	req->update_default_rxobj = 1;
+	req->default_rxobj = default_rxobj;
+	req->hdr.dev_id = mdev->dev_id;
+
+	/* If there are more than 1 entries in indirection table, enable RSS */
+	if (log_ind_tbl_size)
+		req->rss_enable = true;
+
+	req->num_indir_entries = MANA_INDIRECT_TABLE_SIZE;
+	req->indir_tab_offset = sizeof(*req);
+	req->update_indir_tab = true;
+
+	req_indir_tab = (mana_handle_t *)(req + 1);
+	/* The ind table passed to the hardware must have
+	 * MANA_INDIRECT_TABLE_SIZE entries. Adjust the verb
+	 * ind_table to MANA_INDIRECT_TABLE_SIZE if required
+	 */
+	ibdev_dbg(&dev->ib_dev, "ind table size %u\n", 1 << log_ind_tbl_size);
+	for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++) {
+		req_indir_tab[i] = ind_table[i % (1 << log_ind_tbl_size)];
+		ibdev_dbg(&dev->ib_dev, "index %u handle 0x%llx\n", i,
+			  req_indir_tab[i]);
+	}
+
+	req->update_hashkey = true;
+	if (rx_hash_key_len)
+		memcpy(req->hashkey, rx_hash_key, rx_hash_key_len);
+	else
+		netdev_rss_key_fill(req->hashkey, MANA_HASH_KEY_SIZE);
+
+	ibdev_dbg(&dev->ib_dev, "vport handle %llu default_rxobj 0x%llx\n",
+		  req->vport, default_rxobj);
+
+	err = mana_gd_send_request(gc, req_buf_size, req, sizeof(resp), &resp);
+	if (err) {
+		netdev_err(ndev, "Failed to configure vPort RX: %d\n", err);
+		goto out;
+	}
+
+	if (resp.hdr.status) {
+		netdev_err(ndev, "vPort RX configuration failed: 0x%x\n",
+			   resp.hdr.status);
+		err = -EPROTO;
+	}
+
+	netdev_info(ndev, "Configured steering vPort %llu log_entries %u\n",
+		    mpc->port_handle, log_ind_tbl_size);
+
+out:
+	kfree(req);
+	return err;
+}
+
+static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
+				 struct ib_qp_init_attr *attr,
+				 struct ib_udata *udata)
+{
+	struct mana_ib_qp *qp = container_of(ibqp, struct mana_ib_qp, ibqp);
+	struct mana_ib_dev *mdev =
+		container_of(pd->device, struct mana_ib_dev, ib_dev);
+	struct ib_rwq_ind_table *ind_tbl = attr->rwq_ind_tbl;
+	struct mana_ib_create_qp_rss_resp resp = {};
+	struct mana_ib_create_qp_rss ucmd = {};
+	struct gdma_dev *gd = mdev->gdma_dev;
+	mana_handle_t *mana_ind_table;
+	struct mana_port_context *mpc;
+	struct mana_context *mc;
+	struct net_device *ndev;
+	struct mana_ib_cq *cq;
+	struct mana_ib_wq *wq;
+	struct ib_cq *ibcq;
+	struct ib_wq *ibwq;
+	int i = 0, ret;
+	u32 port;
+
+	mc = gd->driver_data;
+
+	if (udata->inlen < sizeof(ucmd))
+		return -EINVAL;
+
+	ret = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen));
+	if (ret) {
+		ibdev_dbg(&mdev->ib_dev,
+			  "Failed copy from udata for create rss-qp, err %d\n",
+			  ret);
+		return -EFAULT;
+	}
+
+	if (attr->cap.max_recv_wr > MAX_SEND_BUFFERS_PER_QUEUE) {
+		ibdev_dbg(&mdev->ib_dev,
+			  "Requested max_recv_wr %d exceeding limit.\n",
+			  attr->cap.max_recv_wr);
+		return -EINVAL;
+	}
+
+	if (attr->cap.max_recv_sge > MAX_RX_WQE_SGL_ENTRIES) {
+		ibdev_dbg(&mdev->ib_dev,
+			  "Requested max_recv_sge %d exceeding limit.\n",
+			  attr->cap.max_recv_sge);
+		return -EINVAL;
+	}
+
+	if (ucmd.rx_hash_function != MANA_IB_RX_HASH_FUNC_TOEPLITZ) {
+		ibdev_dbg(&mdev->ib_dev,
+			  "RX Hash function is not supported, %d\n",
+			  ucmd.rx_hash_function);
+		return -EINVAL;
+	}
+
+	/* IB ports start with 1, MANA start with 0 */
+	port = ucmd.port;
+	if (port < 1 || port > mc->num_ports) {
+		ibdev_dbg(&mdev->ib_dev, "Invalid port %u in creating qp\n",
+			  port);
+		return -EINVAL;
+	}
+	ndev = mc->ports[port - 1];
+	mpc = netdev_priv(ndev);
+
+	ibdev_dbg(&mdev->ib_dev, "rx_hash_function %d port %d\n",
+		  ucmd.rx_hash_function, port);
+
+	mana_ind_table = kzalloc(sizeof(mana_handle_t) *
+					 (1 << ind_tbl->log_ind_tbl_size),
+				 GFP_KERNEL);
+	if (!mana_ind_table) {
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	qp->port = port;
+
+	for (i = 0; i < (1 << ind_tbl->log_ind_tbl_size); i++) {
+		struct mana_obj_spec wq_spec = {};
+		struct mana_obj_spec cq_spec = {};
+
+		ibwq = ind_tbl->ind_tbl[i];
+		wq = container_of(ibwq, struct mana_ib_wq, ibwq);
+
+		ibcq = ibwq->cq;
+		cq = container_of(ibcq, struct mana_ib_cq, ibcq);
+
+		wq_spec.gdma_region = wq->gdma_region;
+		wq_spec.queue_size = wq->wq_buf_size;
+
+		cq_spec.gdma_region = cq->gdma_region;
+		cq_spec.queue_size = cq->cqe * COMP_ENTRY_SIZE;
+		cq_spec.modr_ctx_id = 0;
+		cq_spec.attached_eq = GDMA_CQ_NO_EQ;
+
+		ret = mana_create_wq_obj(mpc, mpc->port_handle, GDMA_RQ,
+					 &wq_spec, &cq_spec, &wq->rx_object);
+		if (ret)
+			goto fail;
+
+		/* The GDMA regions are now owned by the WQ object */
+		wq->gdma_region = GDMA_INVALID_DMA_REGION;
+		cq->gdma_region = GDMA_INVALID_DMA_REGION;
+
+		wq->id = wq_spec.queue_index;
+		cq->id = cq_spec.queue_index;
+
+		ibdev_dbg(&mdev->ib_dev,
+			  "ret %d rx_object 0x%llx wq id %llu cq id %llu\n",
+			  ret, wq->rx_object, wq->id, cq->id);
+
+		resp.entries[i].cqid = cq->id;
+		resp.entries[i].wqid = wq->id;
+
+		mana_ind_table[i] = wq->rx_object;
+	}
+	resp.num_entries = i;
+
+	ret = mana_ib_cfg_vport_steering(mdev, ndev, wq->rx_object,
+					 mana_ind_table,
+					 ind_tbl->log_ind_tbl_size,
+					 ucmd.rx_hash_key_len,
+					 ucmd.rx_hash_key);
+	if (ret)
+		goto fail;
+
+	kfree(mana_ind_table);
+
+	if (udata) {
+		ret = ib_copy_to_udata(udata, &resp, sizeof(resp));
+		if (ret) {
+			ibdev_dbg(&mdev->ib_dev,
+				  "Failed to copy to udata create rss-qp, %d\n",
+				  ret);
+			goto fail;
+		}
+	}
+
+	return 0;
+
+fail:
+	while (i-- > 0) {
+		ibwq = ind_tbl->ind_tbl[i];
+		wq = container_of(ibwq, struct mana_ib_wq, ibwq);
+		mana_destroy_wq_obj(mpc, GDMA_RQ, wq->rx_object);
+	}
+
+	kfree(mana_ind_table);
+
+	return ret;
+}
+
+static int mana_ib_create_qp_raw(struct ib_qp *ibqp, struct ib_pd *ibpd,
+				 struct ib_qp_init_attr *attr,
+				 struct ib_udata *udata)
+{
+	struct mana_ib_pd *pd = container_of(ibpd, struct mana_ib_pd, ibpd);
+	struct mana_ib_qp *qp = container_of(ibqp, struct mana_ib_qp, ibqp);
+	struct mana_ib_dev *mdev =
+		container_of(ibpd->device, struct mana_ib_dev, ib_dev);
+	struct mana_ib_cq *send_cq =
+		container_of(attr->send_cq, struct mana_ib_cq, ibcq);
+	struct ib_ucontext *ib_ucontext = ibpd->uobject->context;
+	struct mana_ib_create_qp_resp resp = {};
+	struct mana_ib_ucontext *mana_ucontext;
+	struct gdma_dev *gd = mdev->gdma_dev;
+	struct mana_ib_create_qp ucmd = {};
+	struct mana_obj_spec wq_spec = {};
+	struct mana_obj_spec cq_spec = {};
+	struct mana_port_context *mpc;
+	struct mana_context *mc;
+	struct net_device *ndev;
+	struct ib_umem *umem;
+	int err;
+	u32 port;
+
+	mana_ucontext =
+		container_of(ib_ucontext, struct mana_ib_ucontext, ibucontext);
+	mc = gd->driver_data;
+
+	if (udata->inlen < sizeof(ucmd))
+		return -EINVAL;
+
+	err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen));
+	if (err) {
+		ibdev_dbg(&mdev->ib_dev,
+			  "Failed to copy from udata create qp-raw, %d\n", err);
+		return -EFAULT;
+	}
+
+	/* IB ports start with 1, MANA Ethernet ports start with 0 */
+	port = ucmd.port;
+	if (ucmd.port > mc->num_ports)
+		return -EINVAL;
+
+	if (attr->cap.max_send_wr > MAX_SEND_BUFFERS_PER_QUEUE) {
+		ibdev_dbg(&mdev->ib_dev,
+			  "Requested max_send_wr %d exceeding limit\n",
+			  attr->cap.max_send_wr);
+		return -EINVAL;
+	}
+
+	if (attr->cap.max_send_sge > MAX_TX_WQE_SGL_ENTRIES) {
+		ibdev_dbg(&mdev->ib_dev,
+			  "Requested max_send_sge %d exceeding limit\n",
+			  attr->cap.max_send_sge);
+		return -EINVAL;
+	}
+
+	ndev = mc->ports[port - 1];
+	mpc = netdev_priv(ndev);
+	ibdev_dbg(&mdev->ib_dev, "port %u ndev %p mpc %p\n", port, ndev, mpc);
+
+	err = mana_ib_cfg_vport(mdev, port - 1, pd, mana_ucontext->doorbell);
+	if (err)
+		return -ENODEV;
+
+	qp->port = port;
+
+	ibdev_dbg(&mdev->ib_dev, "ucmd sq_buf_addr 0x%llx port %u\n",
+		  ucmd.sq_buf_addr, ucmd.port);
+
+	umem = ib_umem_get(ibpd->device, ucmd.sq_buf_addr, ucmd.sq_buf_size,
+			   IB_ACCESS_LOCAL_WRITE);
+	if (IS_ERR(umem)) {
+		err = PTR_ERR(umem);
+		ibdev_dbg(&mdev->ib_dev,
+			  "Failed to get umem for create qp-raw, err %d\n",
+			  err);
+		goto err_free_vport;
+	}
+	qp->sq_umem = umem;
+
+	err = mana_ib_gd_create_dma_region(mdev, qp->sq_umem,
+					   &qp->sq_gdma_region, PAGE_SIZE);
+	if (err) {
+		ibdev_err(&mdev->ib_dev,
+			  "Failed to create dma region for create qp-raw, %d\n",
+			  err);
+		goto err_release_umem;
+	}
+
+	ibdev_dbg(&mdev->ib_dev,
+		  "mana_ib_gd_create_dma_region ret %d gdma_region 0x%llx\n",
+		  err, qp->sq_gdma_region);
+
+	/* Create a WQ on the same port handle used by the Ethernet */
+	wq_spec.gdma_region = qp->sq_gdma_region;
+	wq_spec.queue_size = ucmd.sq_buf_size;
+
+	cq_spec.gdma_region = send_cq->gdma_region;
+	cq_spec.queue_size = send_cq->cqe * COMP_ENTRY_SIZE;
+	cq_spec.modr_ctx_id = 0;
+	cq_spec.attached_eq = GDMA_CQ_NO_EQ;
+
+	err = mana_create_wq_obj(mpc, mpc->port_handle, GDMA_SQ, &wq_spec,
+				 &cq_spec, &qp->tx_object);
+	if (err) {
+		ibdev_err(&mdev->ib_dev,
+			  "Failed to create wq for create raw-qp, err %d\n",
+			  err);
+		goto err_destroy_dma_region;
+	}
+
+	/* The GDMA regions are now owned by the WQ object */
+	qp->sq_gdma_region = GDMA_INVALID_DMA_REGION;
+	send_cq->gdma_region = GDMA_INVALID_DMA_REGION;
+
+	qp->sq_id = wq_spec.queue_index;
+	send_cq->id = cq_spec.queue_index;
+
+	ibdev_dbg(&mdev->ib_dev,
+		  "ret %d qp->tx_object 0x%llx sq id %llu cq id %llu\n", err,
+		  qp->tx_object, qp->sq_id, send_cq->id);
+
+	resp.sqid = qp->sq_id;
+	resp.cqid = send_cq->id;
+	resp.tx_vp_offset = pd->tx_vp_offset;
+
+	if (udata) {
+		err = ib_copy_to_udata(udata, &resp, sizeof(resp));
+		if (err) {
+			ibdev_dbg(&mdev->ib_dev,
+				  "Failed copy udata for create qp-raw, %d\n",
+				  err);
+			goto err_destroy_wq_obj;
+		}
+	}
+
+	return 0;
+
+err_destroy_wq_obj:
+	mana_destroy_wq_obj(mpc, GDMA_SQ, qp->tx_object);
+
+err_destroy_dma_region:
+	mana_ib_gd_destroy_dma_region(mdev, qp->sq_gdma_region);
+
+err_release_umem:
+	ib_umem_release(umem);
+
+err_free_vport:
+	mana_ib_uncfg_vport(mdev, pd, port - 1);
+
+	return err;
+}
+
+int mana_ib_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *attr,
+		      struct ib_udata *udata)
+{
+	switch (attr->qp_type) {
+	case IB_QPT_RAW_PACKET:
+		/* When rwq_ind_tbl is used, it's for creating WQs for RSS */
+		if (attr->rwq_ind_tbl)
+			return mana_ib_create_qp_rss(ibqp, ibqp->pd, attr,
+						     udata);
+
+		return mana_ib_create_qp_raw(ibqp, ibqp->pd, attr, udata);
+	default:
+		/* Creating QP other than IB_QPT_RAW_PACKET is not supported */
+		ibdev_dbg(ibqp->device, "Creating QP type %u not supported\n",
+			  attr->qp_type);
+	}
+
+	return -EINVAL;
+}
+
+int mana_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
+		      int attr_mask, struct ib_udata *udata)
+{
+	/* modify_qp is not supported by this version of the driver */
+	return -EOPNOTSUPP;
+}
+
+static int mana_ib_destroy_qp_rss(struct mana_ib_qp *qp,
+				  struct ib_rwq_ind_table *ind_tbl,
+				  struct ib_udata *udata)
+{
+	struct mana_ib_dev *mdev =
+		container_of(qp->ibqp.device, struct mana_ib_dev, ib_dev);
+	struct gdma_dev *gd = mdev->gdma_dev;
+	struct mana_port_context *mpc;
+	struct mana_context *mc;
+	struct net_device *ndev;
+	struct mana_ib_wq *wq;
+	struct ib_wq *ibwq;
+	int i;
+
+	mc = gd->driver_data;
+	ndev = mc->ports[qp->port - 1];
+	mpc = netdev_priv(ndev);
+
+	for (i = 0; i < (1 << ind_tbl->log_ind_tbl_size); i++) {
+		ibwq = ind_tbl->ind_tbl[i];
+		wq = container_of(ibwq, struct mana_ib_wq, ibwq);
+		ibdev_dbg(&mdev->ib_dev, "destroying wq->rx_object %llu\n",
+			  wq->rx_object);
+		mana_destroy_wq_obj(mpc, GDMA_RQ, wq->rx_object);
+	}
+
+	return 0;
+}
+
+static int mana_ib_destroy_qp_raw(struct mana_ib_qp *qp, struct ib_udata *udata)
+{
+	struct mana_ib_dev *mdev =
+		container_of(qp->ibqp.device, struct mana_ib_dev, ib_dev);
+	struct gdma_dev *gd = mdev->gdma_dev;
+	struct ib_pd *ibpd = qp->ibqp.pd;
+	struct mana_port_context *mpc;
+	struct mana_context *mc;
+	struct net_device *ndev;
+	struct mana_ib_pd *pd;
+
+	mc = gd->driver_data;
+	ndev = mc->ports[qp->port - 1];
+	mpc = netdev_priv(ndev);
+	pd = container_of(ibpd, struct mana_ib_pd, ibpd);
+
+	mana_destroy_wq_obj(mpc, GDMA_SQ, qp->tx_object);
+
+	if (qp->sq_umem) {
+		mana_ib_gd_destroy_dma_region(mdev, qp->sq_gdma_region);
+		ib_umem_release(qp->sq_umem);
+	}
+
+	mana_ib_uncfg_vport(mdev, pd, qp->port - 1);
+
+	return 0;
+}
+
+int mana_ib_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
+{
+	struct mana_ib_qp *qp = container_of(ibqp, struct mana_ib_qp, ibqp);
+
+	switch (ibqp->qp_type) {
+	case IB_QPT_RAW_PACKET:
+		if (ibqp->rwq_ind_tbl)
+			return mana_ib_destroy_qp_rss(qp, ibqp->rwq_ind_tbl,
+						      udata);
+
+		return mana_ib_destroy_qp_raw(qp, udata);
+
+	default:
+		ibdev_dbg(ibqp->device, "Unexpected QP type %u\n",
+			  ibqp->qp_type);
+	}
+
+	return -ENOENT;
+}
diff --git a/drivers/infiniband/hw/mana/wq.c b/drivers/infiniband/hw/mana/wq.c
new file mode 100644
index 000000000000..8bb63b7f5db2
--- /dev/null
+++ b/drivers/infiniband/hw/mana/wq.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2022, Microsoft Corporation. All rights reserved.
+ */
+
+#include "mana_ib.h"
+
+struct ib_wq *mana_ib_create_wq(struct ib_pd *pd,
+				struct ib_wq_init_attr *init_attr,
+				struct ib_udata *udata)
+{
+	struct mana_ib_dev *mdev =
+		container_of(pd->device, struct mana_ib_dev, ib_dev);
+	struct mana_ib_create_wq ucmd = {};
+	struct mana_ib_wq *wq;
+	struct ib_umem *umem;
+	int err;
+
+	if (udata->inlen < sizeof(ucmd))
+		return ERR_PTR(-EINVAL);
+
+	err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen));
+	if (err) {
+		ibdev_dbg(&mdev->ib_dev,
+			  "Failed to copy from udata for create wq, %d\n", err);
+		return ERR_PTR(-EFAULT);
+	}
+
+	wq = kzalloc(sizeof(*wq), GFP_KERNEL);
+	if (!wq)
+		return ERR_PTR(-ENOMEM);
+
+	ibdev_dbg(&mdev->ib_dev, "ucmd wq_buf_addr 0x%llx\n", ucmd.wq_buf_addr);
+
+	umem = ib_umem_get(pd->device, ucmd.wq_buf_addr, ucmd.wq_buf_size,
+			   IB_ACCESS_LOCAL_WRITE);
+	if (IS_ERR(umem)) {
+		err = PTR_ERR(umem);
+		ibdev_dbg(&mdev->ib_dev,
+			  "Failed to get umem for create wq, err %d\n", err);
+		goto err_free_wq;
+	}
+
+	wq->umem = umem;
+	wq->wqe = init_attr->max_wr;
+	wq->wq_buf_size = ucmd.wq_buf_size;
+	wq->rx_object = INVALID_MANA_HANDLE;
+
+	err = mana_ib_gd_create_dma_region(mdev, wq->umem, &wq->gdma_region,
+					   PAGE_SIZE);
+	if (err) {
+		ibdev_err(&mdev->ib_dev,
+			  "Failed to create dma region for create wq, %d\n",
+			  err);
+		goto err_release_umem;
+	}
+
+	ibdev_dbg(&mdev->ib_dev,
+		  "mana_ib_gd_create_dma_region ret %d gdma_region 0x%llx\n",
+		  err, wq->gdma_region);
+
+	/* WQ ID is returned at wq_create time, doesn't know the value yet */
+
+	return &wq->ibwq;
+
+err_release_umem:
+	ib_umem_release(umem);
+
+err_free_wq:
+	kfree(wq);
+
+	return ERR_PTR(err);
+}
+
+int mana_ib_modify_wq(struct ib_wq *wq, struct ib_wq_attr *wq_attr,
+		      u32 wq_attr_mask, struct ib_udata *udata)
+{
+	/* modify_wq is not supported by this version of the driver */
+	return -EOPNOTSUPP;
+}
+
+int mana_ib_destroy_wq(struct ib_wq *ibwq, struct ib_udata *udata)
+{
+	struct mana_ib_wq *wq = container_of(ibwq, struct mana_ib_wq, ibwq);
+	struct ib_device *ib_dev = ibwq->device;
+	struct mana_ib_dev *mdev;
+
+	mdev = container_of(ib_dev, struct mana_ib_dev, ib_dev);
+
+	mana_ib_gd_destroy_dma_region(mdev, wq->gdma_region);
+	ib_umem_release(wq->umem);
+
+	kfree(wq);
+
+	return 0;
+}
+
+int mana_ib_create_rwq_ind_table(struct ib_rwq_ind_table *ib_rwq_ind_table,
+				 struct ib_rwq_ind_table_init_attr *init_attr,
+				 struct ib_udata *udata)
+{
+	/* There is no additional data in ind_table to be maintained by this
+	 * driver, do nothing
+	 */
+	return 0;
+}
+
+int mana_ib_destroy_rwq_ind_table(struct ib_rwq_ind_table *ib_rwq_ind_tbl)
+{
+	/* There is no additional data in ind_table to be maintained by this
+	 * driver, do nothing
+	 */
+	return 0;
+}
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 713a8f8cca9a..20212ffeefb9 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -412,6 +412,9 @@ int mana_bpf(struct net_device *ndev, struct netdev_bpf *bpf);
 
 extern const struct ethtool_ops mana_ethtool_ops;
 
+/* A CQ can be created not associated with any EQ */
+#define GDMA_CQ_NO_EQ  0xffff
+
 struct mana_obj_spec {
 	u32 queue_index;
 	u64 gdma_region;
diff --git a/include/uapi/rdma/ib_user_ioctl_verbs.h b/include/uapi/rdma/ib_user_ioctl_verbs.h
index 7dd56210226f..e0c25537fd2e 100644
--- a/include/uapi/rdma/ib_user_ioctl_verbs.h
+++ b/include/uapi/rdma/ib_user_ioctl_verbs.h
@@ -251,6 +251,7 @@ enum rdma_driver_id {
 	RDMA_DRIVER_EFA,
 	RDMA_DRIVER_SIW,
 	RDMA_DRIVER_ERDMA,
+	RDMA_DRIVER_MANA,
 };
 
 enum ib_uverbs_gid_type {
diff --git a/include/uapi/rdma/mana-abi.h b/include/uapi/rdma/mana-abi.h
new file mode 100644
index 000000000000..5fcb31b37fb9
--- /dev/null
+++ b/include/uapi/rdma/mana-abi.h
@@ -0,0 +1,66 @@
+/* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) */
+/*
+ * Copyright (c) 2022, Microsoft Corporation. All rights reserved.
+ */
+
+#ifndef MANA_ABI_USER_H
+#define MANA_ABI_USER_H
+
+#include <linux/types.h>
+#include <rdma/ib_user_ioctl_verbs.h>
+
+/*
+ * Increment this value if any changes that break userspace ABI
+ * compatibility are made.
+ */
+
+#define MANA_IB_UVERBS_ABI_VERSION 1
+
+struct mana_ib_create_cq {
+	__aligned_u64 buf_addr;
+};
+
+struct mana_ib_create_qp {
+	__aligned_u64 sq_buf_addr;
+	__u32 sq_buf_size;
+	__u32 port;
+};
+
+struct mana_ib_create_qp_resp {
+	__u32 sqid;
+	__u32 cqid;
+	__u32 tx_vp_offset;
+	__u32 reserved;
+};
+
+struct mana_ib_create_wq {
+	__aligned_u64 wq_buf_addr;
+	__u32 wq_buf_size;
+	__u32 reserved;
+};
+
+/* RX Hash function flags */
+enum mana_ib_rx_hash_function_flags {
+	MANA_IB_RX_HASH_FUNC_TOEPLITZ = 1 << 0,
+};
+
+struct mana_ib_create_qp_rss {
+	__aligned_u64 rx_hash_fields_mask;
+	__u8 rx_hash_function;
+	__u8 reserved[7];
+	__u32 rx_hash_key_len;
+	__u8 rx_hash_key[40];
+	__u32 port;
+};
+
+struct rss_resp_entry {
+	__u32 cqid;
+	__u32 wqid;
+};
+
+struct mana_ib_create_qp_rss_resp {
+	__aligned_u64 num_entries;
+	struct rss_resp_entry entries[64];
+};
+
+#endif
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Patch v6 00/12] Introduce Microsoft Azure Network Adapter (MANA) RDMA driver
  2022-09-21  1:22 [Patch v6 00/12] Introduce Microsoft Azure Network Adapter (MANA) RDMA driver longli
                   ` (11 preceding siblings ...)
  2022-09-21  1:22 ` [Patch v6 12/12] RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter longli
@ 2022-09-21 16:25 ` Jason Gunthorpe
  2022-09-21 18:53   ` Long Li
  12 siblings, 1 reply; 19+ messages in thread
From: Jason Gunthorpe @ 2022-09-21 16:25 UTC (permalink / raw)
  To: longli
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Leon Romanovsky, edumazet, shiraz.saleem, Ajay Sharma,
	linux-hyperv, netdev, linux-kernel, linux-rdma

On Tue, Sep 20, 2022 at 06:22:20PM -0700, longli@linuxonhyperv.com wrote:
> From: Long Li <longli@microsoft.com>
> 
> This patchset implements a RDMA driver for Microsoft Azure Network
> Adapter (MANA). In MANA, the RDMA device is modeled as an auxiliary device
> to the Ethernet device.
> 
> The first 11 patches modify the MANA Ethernet driver to support RDMA driver.
> The last patch implementes the RDMA driver.
> 
> The user-mode of the driver is being reviewed at:
> https://github.com/linux-rdma/rdma-core/pull/1177
> 
> 
> Ajay Sharma (3):
>   net: mana: Set the DMA device max segment size
>   net: mana: Define and process GDMA response code
>     GDMA_STATUS_MORE_ENTRIES
>   net: mana: Define data structures for protection domain and memory
>     registration
> 
> Long Li (9):
>   net: mana: Add support for auxiliary device
>   net: mana: Record the physical address for doorbell page region
>   net: mana: Handle vport sharing between devices
>   net: mana: Add functions for allocating doorbell page from GDMA
>   net: mana: Export Work Queue functions for use by RDMA driver
>   net: mana: Record port number in netdev
>   net: mana: Move header files to a common location
>   net: mana: Define max values for SGL entries
>   RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter

Still some basic checkpatchy stuff:

/tmp/tmpm2fsg47h/0012-RDMA-mana_ib-Add-a-driver-for-Microsoft-Azure-Network-Adapter.patch:412: WARNING: quoted string split across lines
#412: FILE: drivers/infiniband/hw/mana/main.c:70:
+		  "vport handle %llx pdid %x doorbell_id %x "
+		  "tx_shortform_allowed %d tx_vp_offset %u\n",

/tmp/tmpm2fsg47h/0012-RDMA-mana_ib-Add-a-driver-for-Microsoft-Azure-Network-Adapter.patch:540: WARNING: quoted string split across lines
#540: FILE: drivers/infiniband/hw/mana/main.c:198:
+		  "size_dma_region %lu num_pages_total %lu, "
+		  "page_sz 0x%llx offset_in_page %u\n",

And it thinks you should write more for the kconfig symbol, eg why
would someone want to turn it on (hint, to use dpkd on some Azure
instances)

/tmp/tmpm2fsg47h/0012-RDMA-mana_ib-Add-a-driver-for-Microsoft-Azure-Network-Adapter.patch:100: WARNING: please write a help paragraph that fully describes the config symbol
#100: FILE: drivers/infiniband/hw/mana/Kconfig:2:
+config MANA_INFINIBAND
+	tristate "Microsoft Azure Network Adapter support"
+	depends on NETDEVICES && ETHERNET && PCI && MICROSOFT_MANA
+	help
+	  This driver provides low-level RDMA support for
+	  Microsoft Azure Network Adapter (MANA).

I'll put it in linux-next and you can fix the rest of the stuff bots
will usually find.

Jason

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Patch v6 12/12] RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter
  2022-09-21  1:22 ` [Patch v6 12/12] RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter longli
@ 2022-09-21 16:26   ` Jason Gunthorpe
  2022-09-21 21:00     ` Long Li
  2022-09-21 17:59   ` Jason Gunthorpe
  1 sibling, 1 reply; 19+ messages in thread
From: Jason Gunthorpe @ 2022-09-21 16:26 UTC (permalink / raw)
  To: longli
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Leon Romanovsky, edumazet, shiraz.saleem, Ajay Sharma,
	linux-hyperv, netdev, linux-kernel, linux-rdma

On Tue, Sep 20, 2022 at 06:22:32PM -0700, longli@linuxonhyperv.com wrote:
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 8b9a50756c7e..7bcc19e27f97 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9426,6 +9426,7 @@ M:	Haiyang Zhang <haiyangz@microsoft.com>
>  M:	Stephen Hemminger <sthemmin@microsoft.com>
>  M:	Wei Liu <wei.liu@kernel.org>
>  M:	Dexuan Cui <decui@microsoft.com>
> +M:	Long Li <longli@microsoft.com>
>  L:	linux-hyperv@vger.kernel.org
>  S:	Supported
>  T:	git git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git
> @@ -9444,6 +9445,7 @@ F:	arch/x86/kernel/cpu/mshyperv.c
>  F:	drivers/clocksource/hyperv_timer.c
>  F:	drivers/hid/hid-hyperv.c
>  F:	drivers/hv/
> +F:	drivers/infiniband/hw/mana/
>  F:	drivers/input/serio/hyperv-keyboard.c
>  F:	drivers/iommu/hyperv-iommu.c
>  F:	drivers/net/ethernet/microsoft/
> @@ -9459,6 +9461,7 @@ F:	include/clocksource/hyperv_timer.h
>  F:	include/linux/hyperv.h
>  F:	include/net/mana
>  F:	include/uapi/linux/hyperv.h
> +F:	include/uapi/rdma/mana-abi.h
>  F:	net/vmw_vsock/hyperv_transport.c
>  F:	tools/hv/

This is not the proper way to write a maintainers entry, a driver
should be stand-alone and have L: entries for the subsystem mailing
list, not be bundled like this. If you do this patches will not go to
the correct lists and will not be applied.

Follow the example of every other rdma driver please.

Jason

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Patch v6 12/12] RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter
  2022-09-21  1:22 ` [Patch v6 12/12] RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter longli
  2022-09-21 16:26   ` Jason Gunthorpe
@ 2022-09-21 17:59   ` Jason Gunthorpe
  2022-09-22  8:23     ` Long Li
  1 sibling, 1 reply; 19+ messages in thread
From: Jason Gunthorpe @ 2022-09-21 17:59 UTC (permalink / raw)
  To: longli
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Leon Romanovsky, edumazet, shiraz.saleem, Ajay Sharma,
	linux-hyperv, netdev, linux-kernel, linux-rdma

On Tue, Sep 20, 2022 at 06:22:32PM -0700, longli@linuxonhyperv.com wrote:
> +int mana_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
> +		      struct ib_udata *udata)
> +{
> +	struct mana_ib_cq *cq = container_of(ibcq, struct mana_ib_cq, ibcq);
> +	struct ib_device *ibdev = ibcq->device;
> +	struct mana_ib_create_cq ucmd = {};
> +	struct mana_ib_dev *mdev;
> +	int err;
> +
> +	mdev = container_of(ibdev, struct mana_ib_dev, ib_dev);

Stylistically these container_of's are usually in the definitions
section, at least pick a form and stick to it consistently.

> +	if (udata->inlen < sizeof(ucmd))
> +		return -EINVAL;
> +
> +	err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen));
> +	if (err) {
> +		ibdev_dbg(ibdev,
> +			  "Failed to copy from udata for create cq, %d\n", err);

> +		return -EFAULT;
> +	}
> +
> +	if (attr->cqe > MAX_SEND_BUFFERS_PER_QUEUE) {
> +		ibdev_dbg(ibdev, "CQE %d exceeding limit\n", attr->cqe);
> +		return -EINVAL;
> +	}
> +
> +	cq->cqe = attr->cqe;
> +	cq->umem = ib_umem_get(ibdev, ucmd.buf_addr, cq->cqe * COMP_ENTRY_SIZE,
> +			       IB_ACCESS_LOCAL_WRITE);
> +	if (IS_ERR(cq->umem)) {
> +		err = PTR_ERR(cq->umem);
> +		ibdev_dbg(ibdev, "Failed to get umem for create cq, err %d\n",
> +			  err);
> +		return err;
> +	}
> +
> +	err = mana_ib_gd_create_dma_region(mdev, cq->umem, &cq->gdma_region,
> +					   PAGE_SIZE);
> +	if (err) {
> +		ibdev_err(ibdev,
> +			  "Failed to create dma region for create cq, %d\n",
> +			  err);

Prints on userspace paths are not allowed, this should be dbg. There
are many other cases like this, please fix them all. This driver may
have too many dbg prints too, not every failure if should have a print
:(

> +	ibdev_dbg(ibdev,
> +		  "mana_ib_gd_create_dma_region ret %d gdma_region 0x%llx\n",
> +		  err, cq->gdma_region);
> +
> +	/* The CQ ID is not known at this time
> +	 * The ID is generated at create_qp
> +	 */

Wrong comment style, rdma uses the leading empty blank line for some
reason

 /*
  * The CQ ID is not known at this time. The ID is generated at create_qp.
  */

> +static void mana_ib_remove(struct auxiliary_device *adev)
> +{
> +	struct mana_ib_dev *dev = dev_get_drvdata(&adev->dev);
> +
> +	ib_unregister_device(&dev->ib_dev);
> +	ib_dealloc_device(&dev->ib_dev);
> +}
> +
> +static const struct auxiliary_device_id mana_id_table[] = {
> +	{
> +		.name = "mana.rdma",
> +	},
> +	{},
> +};
> +
> +MODULE_DEVICE_TABLE(auxiliary, mana_id_table);
> +
> +static struct auxiliary_driver mana_driver = {
> +	.name = "rdma",
> +	.probe = mana_ib_probe,
> +	.remove = mana_ib_remove,
> +	.id_table = mana_id_table,
> +};
> +
> +static int __init mana_ib_init(void)
> +{
> +	auxiliary_driver_register(&mana_driver);
> +
> +	return 0;
> +}
> +
> +static void __exit mana_ib_cleanup(void)
> +{
> +	auxiliary_driver_unregister(&mana_driver);
> +}
> +

All this is just module_auxiliary_driver()

> +	mutex_lock(&pd->vport_mutex);
> +
> +	pd->vport_use_count++;
> +	if (pd->vport_use_count > 1) {
> +		ibdev_dbg(&dev->ib_dev,
> +			  "Skip as this PD is already configured vport\n");
> +		mutex_unlock(&pd->vport_mutex);

This leaves vport_use_count elevated.

> +		return 0;
> +int mana_ib_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
> +{
> +	struct mana_ib_pd *pd = container_of(ibpd, struct mana_ib_pd, ibpd);
> +	struct ib_device *ibdev = ibpd->device;
> +	struct mana_ib_dev *dev;
> +
> +	dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
> +	return mana_ib_gd_destroy_pd(dev, pd->pd_handle);

This is the only place that calls mana_ib_gd_destroy_pd(), don't have
a spaghetti of functions calling single other functions like this,
just inline it.

Also it shouldn't have been non-static, please check everything that
everything non-static actually has an out-of-file user.

> +void mana_ib_dealloc_ucontext(struct ib_ucontext *ibcontext)
> +{
> +	struct mana_ib_ucontext *mana_ucontext =
> +		container_of(ibcontext, struct mana_ib_ucontext, ibucontext);
> +	struct ib_device *ibdev = ibcontext->device;
> +	struct mana_ib_dev *mdev;
> +	struct gdma_context *gc;
> +	int ret;
> +
> +	mdev = container_of(ibdev, struct mana_ib_dev, ib_dev);
> +	gc = mdev->gdma_dev->gdma_context;
> +
> +	ret = mana_gd_destroy_doorbell_page(gc, mana_ucontext->doorbell);
> +	if (ret)
> +		ibdev_err(ibdev, "Failed to destroy doorbell page %d\n", ret);

This already printing on error

And again, why is this driver split up so strangely?
mana_gd_destroy_doorbell_page() is an RDMA function, it is only called
by the RDMA code, why is it located under drivers/net/ethernet?

I do not want an RDMA driver in drivers/net/ethernet.


> +}
> +
> +int mana_ib_gd_create_dma_region(struct mana_ib_dev *dev, struct ib_umem *umem,
> +				 mana_handle_t *gdma_region, u64 page_sz)
> +{
> +	size_t num_pages_total = ib_umem_num_dma_blocks(umem, page_sz);
> +	struct gdma_dma_region_add_pages_req *add_req = NULL;
> +	struct gdma_create_dma_region_resp create_resp = {};
> +	struct gdma_create_dma_region_req *create_req;
> +	size_t num_pages_cur, num_pages_to_handle;
> +	unsigned int create_req_msg_size;
> +	struct hw_channel_context *hwc;
> +	struct ib_block_iter biter;
> +	size_t max_pgs_create_cmd;
> +	struct gdma_context *gc;
> +	struct gdma_dev *mdev;
> +	unsigned int i;
> +	int err;
> +
> +	mdev = dev->gdma_dev;
> +	gc = mdev->gdma_context;
> +	hwc = gc->hwc.driver_data;
> +	max_pgs_create_cmd =
> +		(hwc->max_req_msg_size - sizeof(*create_req)) / sizeof(u64);
> +
> +	num_pages_to_handle =
> +		min_t(size_t, num_pages_total, max_pgs_create_cmd);
> +	create_req_msg_size =
> +		struct_size(create_req, page_addr_list, num_pages_to_handle);
> +
> +	create_req = kzalloc(create_req_msg_size, GFP_KERNEL);
> +	if (!create_req)

Is this a multi-order allocation, I can't tell how big
max_req_msg_size is? 

This design seems to repeat the mistakes we made in mlx5, the low
levels of the driver already has memory allocated - why not get a
pointer to that memory here and directly fill the message buffer
instead of all this allocation and memory copying?
> +	ibdev_dbg(&dev->ib_dev,
> +		  "size_dma_region %lu num_pages_total %lu, "
> +		  "page_sz 0x%llx offset_in_page %u\n",
> +		  umem->length, num_pages_total, page_sz,
> +		  create_req->offset_in_page);
> +
> +	ibdev_dbg(&dev->ib_dev, "num_pages_to_handle %lu, gdma_page_type %u",
> +		  num_pages_to_handle, create_req->gdma_page_type);
> +
> +	__rdma_umem_block_iter_start(&biter, umem, page_sz);

> +	for (i = 0; i < num_pages_to_handle; ++i) {
> +		dma_addr_t cur_addr;
> +
> +		__rdma_block_iter_next(&biter);
> +		cur_addr = rdma_block_iter_dma_address(&biter);
> +
> +		create_req->page_addr_list[i] = cur_addr;
> +
> +		ibdev_dbg(&dev->ib_dev, "page num %u cur_addr 0x%llx\n", i,
> +			  cur_addr);

Please get rid of the worthless debugging code, actually test your
driver with EBUG enabled and ensure it is usuable. Printing thousands
of lines of garbage is not OK. Especially when what should have been a
1 line loop body is expended into a big block just to have a worthless
debug statement.


> +	if (num_pages_cur < num_pages_total) {
> +		unsigned int add_req_msg_size;
> +		size_t max_pgs_add_cmd =
> +			(hwc->max_req_msg_size - sizeof(*add_req)) /
> +			sizeof(u64);
> +
> +		num_pages_to_handle =
> +			min_t(size_t, num_pages_total - num_pages_cur,
> +			      max_pgs_add_cmd);
> +
> +		/* Calculate the max num of pages that will be handled */
> +		add_req_msg_size = struct_size(add_req, page_addr_list,
> +					       num_pages_to_handle);
> +
> +		add_req = kmalloc(add_req_msg_size, GFP_KERNEL);

And allocating every loop iteration seems like overkill, why not just
reuse the large buffer that create_req allocated?

Usually the way these loops are structured is to fill the array and
then check for fullness, trigger an action to drain the array, and
reset the indexes back to the start.

> +int mana_ib_gd_create_pd(struct mana_ib_dev *dev, u64 *pd_handle, u32 *pd_id,
> +			 enum gdma_pd_flags flags)
> +{
> +	struct gdma_dev *mdev = dev->gdma_dev;
> +	struct gdma_create_pd_resp resp = {};
> +	struct gdma_create_pd_req req = {};
> +	struct gdma_context *gc;
> +	int err;
> +
> +	gc = mdev->gdma_context;
> +
> +	mana_gd_init_req_hdr(&req.hdr, GDMA_CREATE_PD, sizeof(req),
> +			     sizeof(resp));
> +
> +	req.flags = flags;
> +	err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
> +
> +	if (err || resp.hdr.status) {
> +		ibdev_err(&dev->ib_dev,
> +			  "Failed to get pd_id err %d status %u\n", err,
> +			  resp.hdr.status);
> +		if (!err)
> +			err = -EPROTO;

This pattern is repeated everywhere, you should fix
mana_gd_send_request() to return EPROTO.

> +		return err;
> +	}
> +
> +	*pd_handle = resp.pd_handle;
> +	*pd_id = resp.pd_id;
> +	ibdev_dbg(&dev->ib_dev, "pd_handle 0x%llx pd_id %d\n", *pd_handle,
> +		  *pd_id);
> +
> +	return 0;
> +}
> +
> +int mana_ib_gd_destroy_pd(struct mana_ib_dev *dev, u64 pd_handle)
> +{
> +	struct gdma_dev *mdev = dev->gdma_dev;
> +	struct gdma_destory_pd_resp resp = {};
> +	struct gdma_destroy_pd_req req = {};
> +	struct gdma_context *gc;
> +	int err;
> +
> +	gc = mdev->gdma_context;

Why the local for a variable that is used once?

> +int mana_ib_mmap(struct ib_ucontext *ibcontext, struct vm_area_struct *vma)
> +{
> +	struct mana_ib_ucontext *mana_ucontext =
> +		container_of(ibcontext, struct mana_ib_ucontext, ibucontext);
> +	struct ib_device *ibdev = ibcontext->device;
> +	struct mana_ib_dev *mdev;
> +	struct gdma_context *gc;
> +	phys_addr_t pfn;
> +	pgprot_t prot;
> +	int ret;
> +
> +	mdev = container_of(ibdev, struct mana_ib_dev, ib_dev);
> +	gc = mdev->gdma_dev->gdma_context;
> +
> +	if (vma->vm_pgoff != 0) {
> +		ibdev_err(ibdev, "Unexpected vm_pgoff %lu\n", vma->vm_pgoff);
> +		return -EINVAL;

More user triggerable printing

> +int mana_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
> +{
> +	struct mana_ib_mr *mr = container_of(ibmr, struct mana_ib_mr, ibmr);
> +	struct ib_device *ibdev = ibmr->device;
> +	struct mana_ib_dev *dev;
> +	int err;
> +
> +	dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
> +
> +	err = mana_ib_gd_destroy_mr(dev, mr->mr_handle);
> +	if (err)
> +		return err;

mana_ib_gd_destroy_mr() is only ever called here, why is it in main.c?

> +static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
> +				 struct ib_qp_init_attr *attr,
> +				 struct ib_udata *udata)
> +{
> +	struct mana_ib_qp *qp = container_of(ibqp, struct mana_ib_qp, ibqp);
> +	struct mana_ib_dev *mdev =
> +		container_of(pd->device, struct mana_ib_dev, ib_dev);
> +	struct ib_rwq_ind_table *ind_tbl = attr->rwq_ind_tbl;
> +	struct mana_ib_create_qp_rss_resp resp = {};
> +	struct mana_ib_create_qp_rss ucmd = {};
> +	struct gdma_dev *gd = mdev->gdma_dev;
> +	mana_handle_t *mana_ind_table;
> +	struct mana_port_context *mpc;
> +	struct mana_context *mc;
> +	struct net_device *ndev;
> +	struct mana_ib_cq *cq;
> +	struct mana_ib_wq *wq;
> +	struct ib_cq *ibcq;
> +	struct ib_wq *ibwq;
> +	int i = 0, ret;
> +	u32 port;
> +
> +	mc = gd->driver_data;
> +
> +	if (udata->inlen < sizeof(ucmd))
> +		return -EINVAL;
> +
> +	ret = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen));
> +	if (ret) {
> +		ibdev_dbg(&mdev->ib_dev,
> +			  "Failed copy from udata for create rss-qp, err %d\n",
> +			  ret);
> +		return -EFAULT;
> +	}
> +
> +	if (attr->cap.max_recv_wr > MAX_SEND_BUFFERS_PER_QUEUE) {
> +		ibdev_dbg(&mdev->ib_dev,
> +			  "Requested max_recv_wr %d exceeding limit.\n",
> +			  attr->cap.max_recv_wr);
> +		return -EINVAL;
> +	}
> +
> +	if (attr->cap.max_recv_sge > MAX_RX_WQE_SGL_ENTRIES) {
> +		ibdev_dbg(&mdev->ib_dev,
> +			  "Requested max_recv_sge %d exceeding limit.\n",
> +			  attr->cap.max_recv_sge);
> +		return -EINVAL;
> +	}
> +
> +	if (ucmd.rx_hash_function != MANA_IB_RX_HASH_FUNC_TOEPLITZ) {
> +		ibdev_dbg(&mdev->ib_dev,
> +			  "RX Hash function is not supported, %d\n",
> +			  ucmd.rx_hash_function);
> +		return -EINVAL;
> +	}
> +
> +	/* IB ports start with 1, MANA start with 0 */
> +	port = ucmd.port;
> +	if (port < 1 || port > mc->num_ports) {
> +		ibdev_dbg(&mdev->ib_dev, "Invalid port %u in creating qp\n",
> +			  port);
> +		return -EINVAL;
> +	}
> +	ndev = mc->ports[port - 1];
> +	mpc = netdev_priv(ndev);
> +
> +	ibdev_dbg(&mdev->ib_dev, "rx_hash_function %d port %d\n",
> +		  ucmd.rx_hash_function, port);
> +
> +	mana_ind_table = kzalloc(sizeof(mana_handle_t) *
> +					 (1 << ind_tbl->log_ind_tbl_size),
> +				 GFP_KERNEL);

Should be careful about maths overflow on this calculation.

> +	ibdev_dbg(&mdev->ib_dev, "ucmd sq_buf_addr 0x%llx port %u\n",
> +		  ucmd.sq_buf_addr, ucmd.port);
> +
> +	umem = ib_umem_get(ibpd->device, ucmd.sq_buf_addr, ucmd.sq_buf_size,
> +			   IB_ACCESS_LOCAL_WRITE);
> +	if (IS_ERR(umem)) {
> +		err = PTR_ERR(umem);
> +		ibdev_dbg(&mdev->ib_dev,
> +			  "Failed to get umem for create qp-raw, err %d\n",
> +			  err);
> +		goto err_free_vport;
> +	}
> +	qp->sq_umem = umem;
> +
> +	err = mana_ib_gd_create_dma_region(mdev, qp->sq_umem,
> +					   &qp->sq_gdma_region, PAGE_SIZE);
> +	if (err) {

All these cases that process a page list have to call
ib_umem_find_best_XXX()!

It does important validation against hostile user input, including
checking the critical iova.

You have missed that the user can request that an unaligned umem be
created and this creates a starting offset from the first page that
must be honored in HW, or there will be weird memory corruption. Since
it looks like this HW doesn't support a starting page offset this
should call ib_umem_find_best_pgsz() with a SZ_4K and a 0 iova. Also
never use PAGE_SIZE to describe a HW limitation.

There is alot of repeated code here, it would do well to have a
wrapper function consolidating this pattern.

Jason

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [Patch v6 00/12] Introduce Microsoft Azure Network Adapter (MANA) RDMA driver
  2022-09-21 16:25 ` [Patch v6 00/12] Introduce Microsoft Azure Network Adapter (MANA) RDMA driver Jason Gunthorpe
@ 2022-09-21 18:53   ` Long Li
  0 siblings, 0 replies; 19+ messages in thread
From: Long Li @ 2022-09-21 18:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Leon Romanovsky, edumazet, shiraz.saleem, Ajay Sharma,
	linux-hyperv, netdev, linux-kernel, linux-rdma

> Subject: Re: [Patch v6 00/12] Introduce Microsoft Azure Network Adapter
> (MANA) RDMA driver
> 
> On Tue, Sep 20, 2022 at 06:22:20PM -0700, longli@linuxonhyperv.com wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > This patchset implements a RDMA driver for Microsoft Azure Network
> > Adapter (MANA). In MANA, the RDMA device is modeled as an auxiliary
> > device to the Ethernet device.
> >
> > The first 11 patches modify the MANA Ethernet driver to support RDMA
> driver.
> > The last patch implementes the RDMA driver.
> >
> > The user-mode of the driver is being reviewed at:
> >
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
> > ub.com%2Flinux-rdma%2Frdma-
> core%2Fpull%2F1177&amp;data=05%7C01%7Clongl
> >
> i%40microsoft.com%7C1d30ce9d49bc411b0fb808da9bede4bd%7C72f988bf86
> f141a
> >
> f91ab2d7cd011db47%7C1%7C0%7C637993743320198387%7CUnknown%7CT
> WFpbGZsb3d
> >
> 8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%
> 3D%7C
> >
> 3000%7C%7C%7C&amp;sdata=4cT9PFnKqb2yHRLkKNrbVVMAGwO6Ig91eoQ
> zomlm6lY%3D
> > &amp;reserved=0
> >
> >
> > Ajay Sharma (3):
> >   net: mana: Set the DMA device max segment size
> >   net: mana: Define and process GDMA response code
> >     GDMA_STATUS_MORE_ENTRIES
> >   net: mana: Define data structures for protection domain and memory
> >     registration
> >
> > Long Li (9):
> >   net: mana: Add support for auxiliary device
> >   net: mana: Record the physical address for doorbell page region
> >   net: mana: Handle vport sharing between devices
> >   net: mana: Add functions for allocating doorbell page from GDMA
> >   net: mana: Export Work Queue functions for use by RDMA driver
> >   net: mana: Record port number in netdev
> >   net: mana: Move header files to a common location
> >   net: mana: Define max values for SGL entries
> >   RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter
> 
> Still some basic checkpatchy stuff:
> 
> /tmp/tmpm2fsg47h/0012-RDMA-mana_ib-Add-a-driver-for-Microsoft-
> Azure-Network-Adapter.patch:412: WARNING: quoted string split across
> lines
> #412: FILE: drivers/infiniband/hw/mana/main.c:70:
> +		  "vport handle %llx pdid %x doorbell_id %x "
> +		  "tx_shortform_allowed %d tx_vp_offset %u\n",
> 
> /tmp/tmpm2fsg47h/0012-RDMA-mana_ib-Add-a-driver-for-Microsoft-
> Azure-Network-Adapter.patch:540: WARNING: quoted string split across
> lines
> #540: FILE: drivers/infiniband/hw/mana/main.c:198:
> +		  "size_dma_region %lu num_pages_total %lu, "
> +		  "page_sz 0x%llx offset_in_page %u\n",
> 
> And it thinks you should write more for the kconfig symbol, eg why would
> someone want to turn it on (hint, to use dpkd on some Azure
> instances)

Will fix those.

> 
> /tmp/tmpm2fsg47h/0012-RDMA-mana_ib-Add-a-driver-for-Microsoft-
> Azure-Network-Adapter.patch:100: WARNING: please write a help
> paragraph that fully describes the config symbol
> #100: FILE: drivers/infiniband/hw/mana/Kconfig:2:
> +config MANA_INFINIBAND
> +	tristate "Microsoft Azure Network Adapter support"
> +	depends on NETDEVICES && ETHERNET && PCI &&
> MICROSOFT_MANA
> +	help
> +	  This driver provides low-level RDMA support for
> +	  Microsoft Azure Network Adapter (MANA).
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [Patch v6 12/12] RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter
  2022-09-21 16:26   ` Jason Gunthorpe
@ 2022-09-21 21:00     ` Long Li
  0 siblings, 0 replies; 19+ messages in thread
From: Long Li @ 2022-09-21 21:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Leon Romanovsky, edumazet, shiraz.saleem, Ajay Sharma,
	linux-hyperv, netdev, linux-kernel, linux-rdma

> Subject: Re: [Patch v6 12/12] RDMA/mana_ib: Add a driver for Microsoft
> Azure Network Adapter
> 
> On Tue, Sep 20, 2022 at 06:22:32PM -0700, longli@linuxonhyperv.com wrote:
> > diff --git a/MAINTAINERS b/MAINTAINERS index
> > 8b9a50756c7e..7bcc19e27f97 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -9426,6 +9426,7 @@ M:	Haiyang Zhang <haiyangz@microsoft.com>
> >  M:	Stephen Hemminger <sthemmin@microsoft.com>
> >  M:	Wei Liu <wei.liu@kernel.org>
> >  M:	Dexuan Cui <decui@microsoft.com>
> > +M:	Long Li <longli@microsoft.com>
> >  L:	linux-hyperv@vger.kernel.org
> >  S:	Supported
> >  T:	git git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git
> > @@ -9444,6 +9445,7 @@ F:	arch/x86/kernel/cpu/mshyperv.c
> >  F:	drivers/clocksource/hyperv_timer.c
> >  F:	drivers/hid/hid-hyperv.c
> >  F:	drivers/hv/
> > +F:	drivers/infiniband/hw/mana/
> >  F:	drivers/input/serio/hyperv-keyboard.c
> >  F:	drivers/iommu/hyperv-iommu.c
> >  F:	drivers/net/ethernet/microsoft/
> > @@ -9459,6 +9461,7 @@ F:	include/clocksource/hyperv_timer.h
> >  F:	include/linux/hyperv.h
> >  F:	include/net/mana
> >  F:	include/uapi/linux/hyperv.h
> > +F:	include/uapi/rdma/mana-abi.h
> >  F:	net/vmw_vsock/hyperv_transport.c
> >  F:	tools/hv/
> 
> This is not the proper way to write a maintainers entry, a driver should be
> stand-alone and have L: entries for the subsystem mailing list, not be
> bundled like this. If you do this patches will not go to the correct lists and will
> not be applied.
> 
> Follow the example of every other rdma driver please.
> 
> Jason

Will fix this.

Thanks,
Long

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [Patch v6 12/12] RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter
  2022-09-21 17:59   ` Jason Gunthorpe
@ 2022-09-22  8:23     ` Long Li
  0 siblings, 0 replies; 19+ messages in thread
From: Long Li @ 2022-09-22  8:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Leon Romanovsky, edumazet, shiraz.saleem, Ajay Sharma,
	linux-hyperv, netdev, linux-kernel, linux-rdma

> Subject: Re: [Patch v6 12/12] RDMA/mana_ib: Add a driver for Microsoft
> Azure Network Adapter
> 
> On Tue, Sep 20, 2022 at 06:22:32PM -0700, longli@linuxonhyperv.com wrote:
> > +int mana_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr
> *attr,
> > +		      struct ib_udata *udata)
> > +{
> > +	struct mana_ib_cq *cq = container_of(ibcq, struct mana_ib_cq, ibcq);
> > +	struct ib_device *ibdev = ibcq->device;
> > +	struct mana_ib_create_cq ucmd = {};
> > +	struct mana_ib_dev *mdev;
> > +	int err;
> > +
> > +	mdev = container_of(ibdev, struct mana_ib_dev, ib_dev);
> 
> Stylistically these container_of's are usually in the definitions section, at least
> pick a form and stick to it consistently.

I will review all the other occurrences for consistency. I'm trying to use the "reverse Christmas tree" for definitions, so putting "mdev =" in a separate line.

> 
> > +	if (udata->inlen < sizeof(ucmd))
> > +		return -EINVAL;
> > +
> > +	err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata-
> >inlen));
> > +	if (err) {
> > +		ibdev_dbg(ibdev,
> > +			  "Failed to copy from udata for create cq, %d\n", err);
> 
> > +		return -EFAULT;
> > +	}
> > +
> > +	if (attr->cqe > MAX_SEND_BUFFERS_PER_QUEUE) {
> > +		ibdev_dbg(ibdev, "CQE %d exceeding limit\n", attr->cqe);
> > +		return -EINVAL;
> > +	}
> > +
> > +	cq->cqe = attr->cqe;
> > +	cq->umem = ib_umem_get(ibdev, ucmd.buf_addr, cq->cqe *
> COMP_ENTRY_SIZE,
> > +			       IB_ACCESS_LOCAL_WRITE);
> > +	if (IS_ERR(cq->umem)) {
> > +		err = PTR_ERR(cq->umem);
> > +		ibdev_dbg(ibdev, "Failed to get umem for create cq,
> err %d\n",
> > +			  err);
> > +		return err;
> > +	}
> > +
> > +	err = mana_ib_gd_create_dma_region(mdev, cq->umem, &cq-
> >gdma_region,
> > +					   PAGE_SIZE);
> > +	if (err) {
> > +		ibdev_err(ibdev,
> > +			  "Failed to create dma region for create cq, %d\n",
> > +			  err);
> 
> Prints on userspace paths are not allowed, this should be dbg. There are
> many other cases like this, please fix them all. This driver may have too many
> dbg prints too, not every failure if should have a print :(

Will fix those.

> 
> > +	ibdev_dbg(ibdev,
> > +		  "mana_ib_gd_create_dma_region ret %d gdma_region
> 0x%llx\n",
> > +		  err, cq->gdma_region);
> > +
> > +	/* The CQ ID is not known at this time
> > +	 * The ID is generated at create_qp
> > +	 */
> 
> Wrong comment style, rdma uses the leading empty blank line for some
> reason
> 
>  /*
>   * The CQ ID is not known at this time. The ID is generated at create_qp.
>   */

Will fix all the comments.

> 
> > +static void mana_ib_remove(struct auxiliary_device *adev) {
> > +	struct mana_ib_dev *dev = dev_get_drvdata(&adev->dev);
> > +
> > +	ib_unregister_device(&dev->ib_dev);
> > +	ib_dealloc_device(&dev->ib_dev);
> > +}
> > +
> > +static const struct auxiliary_device_id mana_id_table[] = {
> > +	{
> > +		.name = "mana.rdma",
> > +	},
> > +	{},
> > +};
> > +
> > +MODULE_DEVICE_TABLE(auxiliary, mana_id_table);
> > +
> > +static struct auxiliary_driver mana_driver = {
> > +	.name = "rdma",
> > +	.probe = mana_ib_probe,
> > +	.remove = mana_ib_remove,
> > +	.id_table = mana_id_table,
> > +};
> > +
> > +static int __init mana_ib_init(void)
> > +{
> > +	auxiliary_driver_register(&mana_driver);
> > +
> > +	return 0;
> > +}
> > +
> > +static void __exit mana_ib_cleanup(void) {
> > +	auxiliary_driver_unregister(&mana_driver);
> > +}
> > +
> 
> All this is just module_auxiliary_driver()

Will fix this.

> 
> > +	mutex_lock(&pd->vport_mutex);
> > +
> > +	pd->vport_use_count++;
> > +	if (pd->vport_use_count > 1) {
> > +		ibdev_dbg(&dev->ib_dev,
> > +			  "Skip as this PD is already configured vport\n");
> > +		mutex_unlock(&pd->vport_mutex);
> 
> This leaves vport_use_count elevated.

This is intentional. The code will call mana_ib_uncfg_vport() (which decreases vport_use_count ) when destroying the QP.

> 
> > +		return 0;
> > +int mana_ib_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata) {
> > +	struct mana_ib_pd *pd = container_of(ibpd, struct mana_ib_pd,
> ibpd);
> > +	struct ib_device *ibdev = ibpd->device;
> > +	struct mana_ib_dev *dev;
> > +
> > +	dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
> > +	return mana_ib_gd_destroy_pd(dev, pd->pd_handle);
> 
> This is the only place that calls mana_ib_gd_destroy_pd(), don't have a
> spaghetti of functions calling single other functions like this, just inline it.
> 
> Also it shouldn't have been non-static, please check everything that
> everything non-static actually has an out-of-file user.

Will fix those.

> 
> > +void mana_ib_dealloc_ucontext(struct ib_ucontext *ibcontext) {
> > +	struct mana_ib_ucontext *mana_ucontext =
> > +		container_of(ibcontext, struct mana_ib_ucontext,
> ibucontext);
> > +	struct ib_device *ibdev = ibcontext->device;
> > +	struct mana_ib_dev *mdev;
> > +	struct gdma_context *gc;
> > +	int ret;
> > +
> > +	mdev = container_of(ibdev, struct mana_ib_dev, ib_dev);
> > +	gc = mdev->gdma_dev->gdma_context;
> > +
> > +	ret = mana_gd_destroy_doorbell_page(gc, mana_ucontext-
> >doorbell);
> > +	if (ret)
> > +		ibdev_err(ibdev, "Failed to destroy doorbell page %d\n", ret);
> 
> This already printing on error
> 
> And again, why is this driver split up so strangely?
> mana_gd_destroy_doorbell_page() is an RDMA function, it is only called by
> the RDMA code, why is it located under drivers/net/ethernet?
> 
> I do not want an RDMA driver in drivers/net/ethernet.

Will move this function (and mana_gd_allocate_doorbell_page) to the RDMA driver.

> 
> 
> > +}
> > +
> > +int mana_ib_gd_create_dma_region(struct mana_ib_dev *dev, struct
> ib_umem *umem,
> > +				 mana_handle_t *gdma_region, u64 page_sz)
> {
> > +	size_t num_pages_total = ib_umem_num_dma_blocks(umem,
> page_sz);
> > +	struct gdma_dma_region_add_pages_req *add_req = NULL;
> > +	struct gdma_create_dma_region_resp create_resp = {};
> > +	struct gdma_create_dma_region_req *create_req;
> > +	size_t num_pages_cur, num_pages_to_handle;
> > +	unsigned int create_req_msg_size;
> > +	struct hw_channel_context *hwc;
> > +	struct ib_block_iter biter;
> > +	size_t max_pgs_create_cmd;
> > +	struct gdma_context *gc;
> > +	struct gdma_dev *mdev;
> > +	unsigned int i;
> > +	int err;
> > +
> > +	mdev = dev->gdma_dev;
> > +	gc = mdev->gdma_context;
> > +	hwc = gc->hwc.driver_data;
> > +	max_pgs_create_cmd =
> > +		(hwc->max_req_msg_size - sizeof(*create_req)) /
> sizeof(u64);
> > +
> > +	num_pages_to_handle =
> > +		min_t(size_t, num_pages_total, max_pgs_create_cmd);
> > +	create_req_msg_size =
> > +		struct_size(create_req, page_addr_list,
> num_pages_to_handle);
> > +
> > +	create_req = kzalloc(create_req_msg_size, GFP_KERNEL);
> > +	if (!create_req)
> 
> Is this a multi-order allocation, I can't tell how big max_req_msg_size is?
> 
> This design seems to repeat the mistakes we made in mlx5, the low levels of
> the driver already has memory allocated - why not get a pointer to that
> memory here and directly fill the message buffer instead of all this allocation
> and memory copying?

max_req_msg_size is 4k bytes (one page). I think this allocation is still necessary as we don't have a buffer for this request. But we should be able to reuse this buffer for subsequent commands (details below).

> > +	ibdev_dbg(&dev->ib_dev,
> > +		  "size_dma_region %lu num_pages_total %lu, "
> > +		  "page_sz 0x%llx offset_in_page %u\n",
> > +		  umem->length, num_pages_total, page_sz,
> > +		  create_req->offset_in_page);
> > +
> > +	ibdev_dbg(&dev->ib_dev, "num_pages_to_handle %lu,
> gdma_page_type %u",
> > +		  num_pages_to_handle, create_req->gdma_page_type);
> > +
> > +	__rdma_umem_block_iter_start(&biter, umem, page_sz);
> 
> > +	for (i = 0; i < num_pages_to_handle; ++i) {
> > +		dma_addr_t cur_addr;
> > +
> > +		__rdma_block_iter_next(&biter);
> > +		cur_addr = rdma_block_iter_dma_address(&biter);
> > +
> > +		create_req->page_addr_list[i] = cur_addr;
> > +
> > +		ibdev_dbg(&dev->ib_dev, "page num %u cur_addr 0x%llx\n",
> i,
> > +			  cur_addr);
> 
> Please get rid of the worthless debugging code, actually test your driver with
> EBUG enabled and ensure it is usuable. Printing thousands of lines of garbage
> is not OK. Especially when what should have been a
> 1 line loop body is expended into a big block just to have a worthless debug
> statement.

Will remove those debugging code.

> 
> 
> > +	if (num_pages_cur < num_pages_total) {
> > +		unsigned int add_req_msg_size;
> > +		size_t max_pgs_add_cmd =
> > +			(hwc->max_req_msg_size - sizeof(*add_req)) /
> > +			sizeof(u64);
> > +
> > +		num_pages_to_handle =
> > +			min_t(size_t, num_pages_total - num_pages_cur,
> > +			      max_pgs_add_cmd);
> > +
> > +		/* Calculate the max num of pages that will be handled */
> > +		add_req_msg_size = struct_size(add_req, page_addr_list,
> > +					       num_pages_to_handle);
> > +
> > +		add_req = kmalloc(add_req_msg_size, GFP_KERNEL);
> 
> And allocating every loop iteration seems like overkill, why not just reuse the
> large buffer that create_req allocated?
> 
> Usually the way these loops are structured is to fill the array and then check
> for fullness, trigger an action to drain the array, and reset the indexes back to
> the start.

I will reuse the command buffer allocated earlier as you suggested.

> 
> > +int mana_ib_gd_create_pd(struct mana_ib_dev *dev, u64 *pd_handle,
> u32 *pd_id,
> > +			 enum gdma_pd_flags flags)
> > +{
> > +	struct gdma_dev *mdev = dev->gdma_dev;
> > +	struct gdma_create_pd_resp resp = {};
> > +	struct gdma_create_pd_req req = {};
> > +	struct gdma_context *gc;
> > +	int err;
> > +
> > +	gc = mdev->gdma_context;
> > +
> > +	mana_gd_init_req_hdr(&req.hdr, GDMA_CREATE_PD, sizeof(req),
> > +			     sizeof(resp));
> > +
> > +	req.flags = flags;
> > +	err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp),
> > +&resp);
> > +
> > +	if (err || resp.hdr.status) {
> > +		ibdev_err(&dev->ib_dev,
> > +			  "Failed to get pd_id err %d status %u\n", err,
> > +			  resp.hdr.status);
> > +		if (!err)
> > +			err = -EPROTO;
> 
> This pattern is repeated everywhere, you should fix
> mana_gd_send_request() to return EPROTO.

This pattern is also used all over in the ethernet driver. Since this patch series is for the RDMA driver, can we make a separate patch to refactor the code in mana_gd_send_request()? This can minimize the code changes to the ethernet driver in this patch series.

> 
> > +		return err;
> > +	}
> > +
> > +	*pd_handle = resp.pd_handle;
> > +	*pd_id = resp.pd_id;
> > +	ibdev_dbg(&dev->ib_dev, "pd_handle 0x%llx pd_id %d\n",
> *pd_handle,
> > +		  *pd_id);
> > +
> > +	return 0;
> > +}
> > +
> > +int mana_ib_gd_destroy_pd(struct mana_ib_dev *dev, u64 pd_handle) {
> > +	struct gdma_dev *mdev = dev->gdma_dev;
> > +	struct gdma_destory_pd_resp resp = {};
> > +	struct gdma_destroy_pd_req req = {};
> > +	struct gdma_context *gc;
> > +	int err;
> > +
> > +	gc = mdev->gdma_context;
> 
> Why the local for a variable that is used once?

Will refactoring the code.

> 
> > +int mana_ib_mmap(struct ib_ucontext *ibcontext, struct vm_area_struct
> > +*vma) {
> > +	struct mana_ib_ucontext *mana_ucontext =
> > +		container_of(ibcontext, struct mana_ib_ucontext,
> ibucontext);
> > +	struct ib_device *ibdev = ibcontext->device;
> > +	struct mana_ib_dev *mdev;
> > +	struct gdma_context *gc;
> > +	phys_addr_t pfn;
> > +	pgprot_t prot;
> > +	int ret;
> > +
> > +	mdev = container_of(ibdev, struct mana_ib_dev, ib_dev);
> > +	gc = mdev->gdma_dev->gdma_context;
> > +
> > +	if (vma->vm_pgoff != 0) {
> > +		ibdev_err(ibdev, "Unexpected vm_pgoff %lu\n", vma-
> >vm_pgoff);
> > +		return -EINVAL;
> 
> More user triggerable printing
> 
> > +int mana_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata) {
> > +	struct mana_ib_mr *mr = container_of(ibmr, struct mana_ib_mr,
> ibmr);
> > +	struct ib_device *ibdev = ibmr->device;
> > +	struct mana_ib_dev *dev;
> > +	int err;
> > +
> > +	dev = container_of(ibdev, struct mana_ib_dev, ib_dev);
> > +
> > +	err = mana_ib_gd_destroy_mr(dev, mr->mr_handle);
> > +	if (err)
> > +		return err;
> 
> mana_ib_gd_destroy_mr() is only ever called here, why is it in main.c?

Will move it and mana_ib_gd_create_mr to mr.c

> 
> > +static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
> > +				 struct ib_qp_init_attr *attr,
> > +				 struct ib_udata *udata)
> > +{
> > +	struct mana_ib_qp *qp = container_of(ibqp, struct mana_ib_qp,
> ibqp);
> > +	struct mana_ib_dev *mdev =
> > +		container_of(pd->device, struct mana_ib_dev, ib_dev);
> > +	struct ib_rwq_ind_table *ind_tbl = attr->rwq_ind_tbl;
> > +	struct mana_ib_create_qp_rss_resp resp = {};
> > +	struct mana_ib_create_qp_rss ucmd = {};
> > +	struct gdma_dev *gd = mdev->gdma_dev;
> > +	mana_handle_t *mana_ind_table;
> > +	struct mana_port_context *mpc;
> > +	struct mana_context *mc;
> > +	struct net_device *ndev;
> > +	struct mana_ib_cq *cq;
> > +	struct mana_ib_wq *wq;
> > +	struct ib_cq *ibcq;
> > +	struct ib_wq *ibwq;
> > +	int i = 0, ret;
> > +	u32 port;
> > +
> > +	mc = gd->driver_data;
> > +
> > +	if (udata->inlen < sizeof(ucmd))
> > +		return -EINVAL;
> > +
> > +	ret = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata-
> >inlen));
> > +	if (ret) {
> > +		ibdev_dbg(&mdev->ib_dev,
> > +			  "Failed copy from udata for create rss-qp, err %d\n",
> > +			  ret);
> > +		return -EFAULT;
> > +	}
> > +
> > +	if (attr->cap.max_recv_wr > MAX_SEND_BUFFERS_PER_QUEUE) {
> > +		ibdev_dbg(&mdev->ib_dev,
> > +			  "Requested max_recv_wr %d exceeding limit.\n",
> > +			  attr->cap.max_recv_wr);
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (attr->cap.max_recv_sge > MAX_RX_WQE_SGL_ENTRIES) {
> > +		ibdev_dbg(&mdev->ib_dev,
> > +			  "Requested max_recv_sge %d exceeding limit.\n",
> > +			  attr->cap.max_recv_sge);
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (ucmd.rx_hash_function != MANA_IB_RX_HASH_FUNC_TOEPLITZ)
> {
> > +		ibdev_dbg(&mdev->ib_dev,
> > +			  "RX Hash function is not supported, %d\n",
> > +			  ucmd.rx_hash_function);
> > +		return -EINVAL;
> > +	}
> > +
> > +	/* IB ports start with 1, MANA start with 0 */
> > +	port = ucmd.port;
> > +	if (port < 1 || port > mc->num_ports) {
> > +		ibdev_dbg(&mdev->ib_dev, "Invalid port %u in creating
> qp\n",
> > +			  port);
> > +		return -EINVAL;
> > +	}
> > +	ndev = mc->ports[port - 1];
> > +	mpc = netdev_priv(ndev);
> > +
> > +	ibdev_dbg(&mdev->ib_dev, "rx_hash_function %d port %d\n",
> > +		  ucmd.rx_hash_function, port);
> > +
> > +	mana_ind_table = kzalloc(sizeof(mana_handle_t) *
> > +					 (1 << ind_tbl->log_ind_tbl_size),
> > +				 GFP_KERNEL);
> 
> Should be careful about maths overflow on this calculation.

Will add a check.

> 
> > +	ibdev_dbg(&mdev->ib_dev, "ucmd sq_buf_addr 0x%llx port %u\n",
> > +		  ucmd.sq_buf_addr, ucmd.port);
> > +
> > +	umem = ib_umem_get(ibpd->device, ucmd.sq_buf_addr,
> ucmd.sq_buf_size,
> > +			   IB_ACCESS_LOCAL_WRITE);
> > +	if (IS_ERR(umem)) {
> > +		err = PTR_ERR(umem);
> > +		ibdev_dbg(&mdev->ib_dev,
> > +			  "Failed to get umem for create qp-raw, err %d\n",
> > +			  err);
> > +		goto err_free_vport;
> > +	}
> > +	qp->sq_umem = umem;
> > +
> > +	err = mana_ib_gd_create_dma_region(mdev, qp->sq_umem,
> > +					   &qp->sq_gdma_region, PAGE_SIZE);
> > +	if (err) {
> 
> All these cases that process a page list have to call
> ib_umem_find_best_XXX()!
> 
> It does important validation against hostile user input, including checking the
> critical iova.
> 
> You have missed that the user can request that an unaligned umem be
> created and this creates a starting offset from the first page that must be
> honored in HW, or there will be weird memory corruption. Since it looks like
> this HW doesn't support a starting page offset this should call
> ib_umem_find_best_pgsz() with a SZ_4K and a 0 iova. Also never use
> PAGE_SIZE to describe a HW limitation.

Thank you, will fix this.

> 
> There is alot of repeated code here, it would do well to have a wrapper
> function consolidating this pattern.

Will factor the code.

Thanks,
Long

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2022-09-22  8:23 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-21  1:22 [Patch v6 00/12] Introduce Microsoft Azure Network Adapter (MANA) RDMA driver longli
2022-09-21  1:22 ` [Patch v6 01/12] net: mana: Add support for auxiliary device longli
2022-09-21  1:22 ` [Patch v6 02/12] net: mana: Record the physical address for doorbell page region longli
2022-09-21  1:22 ` [Patch v6 03/12] net: mana: Handle vport sharing between devices longli
2022-09-21  1:22 ` [Patch v6 04/12] net: mana: Add functions for allocating doorbell page from GDMA longli
2022-09-21  1:22 ` [Patch v6 05/12] net: mana: Set the DMA device max segment size longli
2022-09-21  1:22 ` [Patch v6 06/12] net: mana: Export Work Queue functions for use by RDMA driver longli
2022-09-21  1:22 ` [Patch v6 07/12] net: mana: Record port number in netdev longli
2022-09-21  1:22 ` [Patch v6 08/12] net: mana: Move header files to a common location longli
2022-09-21  1:22 ` [Patch v6 09/12] net: mana: Define max values for SGL entries longli
2022-09-21  1:22 ` [Patch v6 10/12] net: mana: Define and process GDMA response code GDMA_STATUS_MORE_ENTRIES longli
2022-09-21  1:22 ` [Patch v6 11/12] net: mana: Define data structures for protection domain and memory registration longli
2022-09-21  1:22 ` [Patch v6 12/12] RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter longli
2022-09-21 16:26   ` Jason Gunthorpe
2022-09-21 21:00     ` Long Li
2022-09-21 17:59   ` Jason Gunthorpe
2022-09-22  8:23     ` Long Li
2022-09-21 16:25 ` [Patch v6 00/12] Introduce Microsoft Azure Network Adapter (MANA) RDMA driver Jason Gunthorpe
2022-09-21 18:53   ` Long Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).