* [PATCH V2 rdma-next 0/4] Misc update for hns driver
@ 2018-05-23 10:16 ` Wei Hu (Xavier)
  0 siblings, 0 replies; 23+ messages in thread
From: Wei Hu (Xavier) @ 2018-05-23 10:16 UTC (permalink / raw)
  To: dledford, jgg; +Cc: linux-rdma, linux-kernel

This patchset includes a bug fix, some optimizations, and the reset
process for the hns driver.

v1->v2: 1. Pulled the patch "RDMA/hns: Implement the disassociate_ucontext
	   API" out of the series.
	2. Modified the Fixes statement according to Leon's comment.
	3. Addressed Jason's comments in the reset-related patch.

Wei Hu (Xavier) (4):
  RDMA/hns: Modify uar allocation algorithm to avoid bitmap exhaust
  RDMA/hns: Increase checking CMQ status timeout value
  RDMA/hns: Add reset process for RoCE in hip08
  RDMA/hns: Fix the illegal memory operation when cross page

 drivers/infiniband/hw/hns/hns_roce_cmd.c    |   3 +
 drivers/infiniband/hw/hns/hns_roce_device.h |   3 +
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c  | 146 ++++++++++++++++++++++++----
 drivers/infiniband/hw/hns/hns_roce_hw_v2.h  |   3 +-
 drivers/infiniband/hw/hns/hns_roce_main.c   |   7 ++
 drivers/infiniband/hw/hns/hns_roce_pd.c     |  10 +-
 6 files changed, 147 insertions(+), 25 deletions(-)

-- 
1.9.1

* [PATCH V2 rdma-next 1/4] RDMA/hns: Modify uar allocation algorithm to avoid bitmap exhaust
  2018-05-23 10:16 ` Wei Hu (Xavier)
@ 2018-05-23 10:16   ` Wei Hu (Xavier)
  -1 siblings, 0 replies; 23+ messages in thread
From: Wei Hu (Xavier) @ 2018-05-23 10:16 UTC (permalink / raw)
  To: dledford, jgg; +Cc: linux-rdma, linux-kernel

This patch modifies the UAR allocation algorithm in the hns_roce_uar_alloc
function to avoid exhausting the UAR bitmap.
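
As a rough illustration (not part of the patch), the new mapping from the
logical index handed out by the UAR bitmap to a physical UAR page behaves
as sketched below; phy_num_uars = 4 is an assumed value for the example:

/*
 * Illustrative sketch only, not code from this patch.
 *
 *   logic_idx: 0 1 2 3 4 5 6 7
 *   index:     0 1 2 3 1 2 3 1
 *
 * The bitmap is allocated and freed with logic_idx, so the wrap-around
 * applied to index no longer leaks bitmap entries.
 */
static unsigned long uar_phys_index(unsigned long logic_idx,
				    unsigned long phy_num_uars)
{
	if (logic_idx > 0 && phy_num_uars > 1)
		return (logic_idx - 1) % (phy_num_uars - 1) + 1;
	return 0;
}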

Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/hns/hns_roce_device.h |  1 +
 drivers/infiniband/hw/hns/hns_roce_pd.c     | 10 ++++++----
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 53c2f1b..412297d4 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -214,6 +214,7 @@ enum {
 struct hns_roce_uar {
 	u64		pfn;
 	unsigned long	index;
+	unsigned long	logic_idx;
 };
 
 struct hns_roce_ucontext {
diff --git a/drivers/infiniband/hw/hns/hns_roce_pd.c b/drivers/infiniband/hw/hns/hns_roce_pd.c
index 4b41e04..b9f2c87 100644
--- a/drivers/infiniband/hw/hns/hns_roce_pd.c
+++ b/drivers/infiniband/hw/hns/hns_roce_pd.c
@@ -107,13 +107,15 @@ int hns_roce_uar_alloc(struct hns_roce_dev *hr_dev, struct hns_roce_uar *uar)
 	int ret = 0;
 
 	/* Using bitmap to manager UAR index */
-	ret = hns_roce_bitmap_alloc(&hr_dev->uar_table.bitmap, &uar->index);
+	ret = hns_roce_bitmap_alloc(&hr_dev->uar_table.bitmap, &uar->logic_idx);
 	if (ret == -1)
 		return -ENOMEM;
 
-	if (uar->index > 0)
-		uar->index = (uar->index - 1) %
+	if (uar->logic_idx > 0 && hr_dev->caps.phy_num_uars > 1)
+		uar->index = (uar->logic_idx - 1) %
 			     (hr_dev->caps.phy_num_uars - 1) + 1;
+	else
+		uar->index = 0;
 
 	if (!dev_is_pci(hr_dev->dev)) {
 		res = platform_get_resource(hr_dev->pdev, IORESOURCE_MEM, 0);
@@ -132,7 +134,7 @@ int hns_roce_uar_alloc(struct hns_roce_dev *hr_dev, struct hns_roce_uar *uar)
 
 void hns_roce_uar_free(struct hns_roce_dev *hr_dev, struct hns_roce_uar *uar)
 {
-	hns_roce_bitmap_free(&hr_dev->uar_table.bitmap, uar->index,
+	hns_roce_bitmap_free(&hr_dev->uar_table.bitmap, uar->logic_idx,
 			     BITMAP_NO_RR);
 }
 
-- 
1.9.1

* [PATCH V2 rdma-next 2/4] RDMA/hns: Increase checking CMQ status timeout value
  2018-05-23 10:16 ` Wei Hu (Xavier)
@ 2018-05-23 10:16   ` Wei Hu (Xavier)
  -1 siblings, 0 replies; 23+ messages in thread
From: Wei Hu (Xavier) @ 2018-05-23 10:16 UTC (permalink / raw)
  To: dledford, jgg; +Cc: linux-rdma, linux-kernel

This patch increases the timeout value used when checking CMQ status and
uses the same value as the NIC driver, so that the polling loop does not
run out of time.
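
As a rough sketch of how a TX timeout like this is typically consumed by a
polling loop (illustrative only; csq_done() below is a stand-in name, not
a function from this patch):

/* Poll until the firmware consumes the CSQ descriptors or the timeout
 * (in microseconds) expires.
 */
static int wait_for_csq(struct hns_roce_dev *hr_dev)
{
	u32 timeout = 0;

	do {
		if (csq_done(hr_dev))
			return 0;
		udelay(1);
	} while (++timeout < HNS_ROCE_CMQ_TX_TIMEOUT);

	return -EBUSY;	/* 30000 us elapsed without completion */
}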

Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/hns/hns_roce_hw_v2.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
index 182b672..f16df1b 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
@@ -76,7 +76,7 @@
 #define HNS_ROCE_V2_PAGE_SIZE_SUPPORTED		0xFFFFF000
 #define HNS_ROCE_V2_MAX_INNER_MTPT_NUM		2
 #define HNS_ROCE_INVALID_LKEY			0x100
-#define HNS_ROCE_CMQ_TX_TIMEOUT			200
+#define HNS_ROCE_CMQ_TX_TIMEOUT			30000
 
 #define HNS_ROCE_CONTEXT_HOP_NUM		1
 #define HNS_ROCE_MTT_HOP_NUM			1
-- 
1.9.1

* [PATCH V2 rdma-next 3/4] RDMA/hns: Add reset process for RoCE in hip08
  2018-05-23 10:16 ` Wei Hu (Xavier)
@ 2018-05-23 10:16   ` Wei Hu (Xavier)
  -1 siblings, 0 replies; 23+ messages in thread
From: Wei Hu (Xavier) @ 2018-05-23 10:16 UTC (permalink / raw)
  To: dledford, jgg; +Cc: linux-rdma, linux-kernel

This patch adds the reset process for RoCE in hip08.

Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>

---
v1->v2: 1. Deleted handle->priv = NULL in hns_roce_hw_v2_uninit_instance.
	2. Added the hns_roce_hw_v2_reset_notify_init callback function;
	   when RoCE reinit fails in this function, the NIC driver is
	   informed.
	Related link to Jason's comments:
	https://www.spinics.net/lists/linux-rdma/msg65009.html
	(A rough sketch of the overall reset flow follows below.)
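
For context, a rough sketch of the order in which the hnae3 framework is
expected to invoke the new reset_notify callback; the exact sequence is
driven by the hns3 NIC driver and is assumed here, not defined by this
patch:

/* Assumed reset flow, for illustration only. */
static void example_reset_flow(struct hnae3_handle *handle)
{
	/* 1. Reset detected: stop new work and report IB_EVENT_DEVICE_FATAL. */
	hns_roce_hw_v2_reset_notify(handle, HNAE3_DOWN_CLIENT);

	/* 2. Tear down the old RoCE instance; the msleep(100) grace period
	 *    gives userspace a chance to react to the fatal event first.
	 */
	hns_roce_hw_v2_reset_notify(handle, HNAE3_UNINIT_CLIENT);

	/* ... hardware reset performed by the NIC driver ... */

	/* 3. Re-create the RoCE instance; on failure handle->priv is set
	 *    to NULL and the error is returned to the NIC driver.
	 */
	hns_roce_hw_v2_reset_notify(handle, HNAE3_INIT_CLIENT);
}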
---
 drivers/infiniband/hw/hns/hns_roce_cmd.c    |  3 ++
 drivers/infiniband/hw/hns/hns_roce_device.h |  2 +
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c  | 76 +++++++++++++++++++++++++++++
 drivers/infiniband/hw/hns/hns_roce_main.c   |  7 +++
 4 files changed, 88 insertions(+)

diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.c b/drivers/infiniband/hw/hns/hns_roce_cmd.c
index 9ebe839..a0ba19d 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cmd.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.c
@@ -176,6 +176,9 @@ int hns_roce_cmd_mbox(struct hns_roce_dev *hr_dev, u64 in_param, u64 out_param,
 		      unsigned long in_modifier, u8 op_modifier, u16 op,
 		      unsigned long timeout)
 {
+	if (hr_dev->is_reset)
+		return 0;
+
 	if (hr_dev->cmd.use_events)
 		return hns_roce_cmd_mbox_wait(hr_dev, in_param, out_param,
 					      in_modifier, op_modifier, op,
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 412297d4..da8512b 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -774,6 +774,8 @@ struct hns_roce_dev {
 	const char		*irq_names[HNS_ROCE_MAX_IRQ_NUM];
 	spinlock_t		sm_lock;
 	spinlock_t		bt_cmd_lock;
+	bool			active;
+	bool			is_reset;
 	struct hns_roce_ib_iboe iboe;
 
 	struct list_head        pgdir_list;
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index e0ab672..a70d07b 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -768,6 +768,9 @@ static int hns_roce_cmq_send(struct hns_roce_dev *hr_dev,
 	int ret = 0;
 	int ntc;
 
+	if (hr_dev->is_reset)
+		return 0;
+
 	spin_lock_bh(&csq->lock);
 
 	if (num > hns_roce_cmq_space(csq)) {
@@ -4790,14 +4793,87 @@ static void hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
 {
 	struct hns_roce_dev *hr_dev = (struct hns_roce_dev *)handle->priv;
 
+	if (!hr_dev)
+		return;
+
 	hns_roce_exit(hr_dev);
 	kfree(hr_dev->priv);
 	ib_dealloc_device(&hr_dev->ib_dev);
 }
 
+static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
+{
+	struct hns_roce_dev *hr_dev = (struct hns_roce_dev *)handle->priv;
+	struct ib_event event;
+
+	if (!hr_dev) {
+		dev_err(&handle->pdev->dev,
+			"Input parameter handle->priv is NULL!\n");
+		return -EINVAL;
+	}
+
+	hr_dev->active = false;
+	hr_dev->is_reset = true;
+
+	event.event = IB_EVENT_DEVICE_FATAL;
+	event.device = &hr_dev->ib_dev;
+	event.element.port_num = 1;
+	ib_dispatch_event(&event);
+
+	return 0;
+}
+
+static int hns_roce_hw_v2_reset_notify_init(struct hnae3_handle *handle)
+{
+	int ret;
+
+	ret = hns_roce_hw_v2_init_instance(handle);
+	if (ret) {
+		/* when reset notify type is HNAE3_INIT_CLIENT In reset notify
+		 * callback function, RoCE Engine reinitialize. If RoCE reinit
+		 * failed, we should inform NIC driver.
+		 */
+		handle->priv = NULL;
+		dev_err(&handle->pdev->dev,
+			"In reset process RoCE reinit failed %d.\n", ret);
+	}
+
+	return ret;
+}
+
+static int hns_roce_hw_v2_reset_notify_uninit(struct hnae3_handle *handle)
+{
+	msleep(100);
+	hns_roce_hw_v2_uninit_instance(handle, false);
+	return 0;
+}
+
+static int hns_roce_hw_v2_reset_notify(struct hnae3_handle *handle,
+				       enum hnae3_reset_notify_type type)
+{
+	int ret = 0;
+
+	switch (type) {
+	case HNAE3_DOWN_CLIENT:
+		ret = hns_roce_hw_v2_reset_notify_down(handle);
+		break;
+	case HNAE3_INIT_CLIENT:
+		ret = hns_roce_hw_v2_reset_notify_init(handle);
+		break;
+	case HNAE3_UNINIT_CLIENT:
+		ret = hns_roce_hw_v2_reset_notify_uninit(handle);
+		break;
+	default:
+		break;
+	}
+
+	return ret;
+}
+
 static const struct hnae3_client_ops hns_roce_hw_v2_ops = {
 	.init_instance = hns_roce_hw_v2_init_instance,
 	.uninit_instance = hns_roce_hw_v2_uninit_instance,
+	.reset_notify = hns_roce_hw_v2_reset_notify,
 };
 
 static struct hnae3_client hns_roce_hw_v2_client = {
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index 1b79a38..ac51372 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -332,6 +332,9 @@ static struct ib_ucontext *hns_roce_alloc_ucontext(struct ib_device *ib_dev,
 	struct hns_roce_ib_alloc_ucontext_resp resp = {};
 	struct hns_roce_dev *hr_dev = to_hr_dev(ib_dev);
 
+	if (!hr_dev->active)
+		return ERR_PTR(-EAGAIN);
+
 	resp.qp_tab_size = hr_dev->caps.num_qps;
 
 	context = kmalloc(sizeof(*context), GFP_KERNEL);
@@ -425,6 +428,7 @@ static void hns_roce_unregister_device(struct hns_roce_dev *hr_dev)
 {
 	struct hns_roce_ib_iboe *iboe = &hr_dev->iboe;
 
+	hr_dev->active = false;
 	unregister_netdevice_notifier(&iboe->nb);
 	ib_unregister_device(&hr_dev->ib_dev);
 }
@@ -536,6 +540,7 @@ static int hns_roce_register_device(struct hns_roce_dev *hr_dev)
 		goto error_failed_setup_mtu_mac;
 	}
 
+	hr_dev->active = true;
 	return 0;
 
 error_failed_setup_mtu_mac:
@@ -728,6 +733,7 @@ int hns_roce_init(struct hns_roce_dev *hr_dev)
 			return ret;
 		}
 	}
+	hr_dev->is_reset = false;
 
 	if (hr_dev->hw->cmq_init) {
 		ret = hr_dev->hw->cmq_init(hr_dev);
@@ -827,6 +833,7 @@ int hns_roce_init(struct hns_roce_dev *hr_dev)
 void hns_roce_exit(struct hns_roce_dev *hr_dev)
 {
 	hns_roce_unregister_device(hr_dev);
+
 	if (hr_dev->hw->hw_exit)
 		hr_dev->hw->hw_exit(hr_dev);
 	hns_roce_cleanup_bitmap(hr_dev);
-- 
1.9.1

* [PATCH V2 rdma-next 4/4] RDMA/hns: Fix the illegal memory operation when cross page
  2018-05-23 10:16 ` Wei Hu (Xavier)
@ 2018-05-23 10:16   ` Wei Hu (Xavier)
  -1 siblings, 0 replies; 23+ messages in thread
From: Wei Hu (Xavier) @ 2018-05-23 10:16 UTC (permalink / raw)
  To: dledford, jgg; +Cc: linux-rdma, linux-kernel

This patch fixes a potential illegal memory operation when the extended
SGE buffer crosses a page boundary in the post send operation. The bug
causes a call trace.
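
As background, a small standalone sketch of the first-page / second-page
split arithmetic performed by the new set_extend_sge(); the pointer value,
page shift and SGE count below are made up for illustration:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	const unsigned int shift = 12;	/* assumed qp->hr_buf.page_shift (4 KiB pages) */
	const uintptr_t seg_size = 16;	/* assumed size of one WQE data segment */
	uintptr_t dseg = 0x1000ff0;	/* assumed start of the extended SGE area */
	int extend_sge_num = 3;		/* SGEs that did not fit in the WQE itself */

	int fi_sge_num = ((((dseg >> shift) + 1) << shift) - dseg) / seg_size;
	int se_sge_num = extend_sge_num > fi_sge_num ?
			 extend_sge_num - fi_sge_num : 0;

	/* Prints: 1 sge(s) fit in the first page, 2 go to the second */
	printf("%d sge(s) fit in the first page, %d go to the second\n",
	       fi_sge_num, se_sge_num);
	return 0;
}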

Reported-by: Jie Chen <chenjie103@huawei.com>
Reported-by: Xiping Zhang (Francis) <zhangxiping3@huawei.com>
Fixes: b1c158350968 ("RDMA/hns: Get rid of virt_to_page and vmap calls after dma_alloc_coherent")
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>

---
v1->v2: Modified the Fixes statement according to Leon's comment.
---
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 70 +++++++++++++++++++++---------
 drivers/infiniband/hw/hns/hns_roce_hw_v2.h |  1 +
 2 files changed, 51 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index a70d07b..62c1eb5 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -52,6 +52,52 @@ static void set_data_seg_v2(struct hns_roce_v2_wqe_data_seg *dseg,
 	dseg->len  = cpu_to_le32(sg->length);
 }
 
+static void set_extend_sge(struct hns_roce_qp *qp, struct ib_send_wr *wr,
+			   unsigned int *sge_ind)
+{
+	struct hns_roce_v2_wqe_data_seg *dseg;
+	struct ib_sge *sg;
+	int num_in_wqe = 0;
+	int extend_sge_num;
+	int fi_sge_num;
+	int se_sge_num;
+	int shift;
+	int i;
+
+	if (qp->ibqp.qp_type == IB_QPT_RC || qp->ibqp.qp_type == IB_QPT_UC)
+		num_in_wqe = HNS_ROCE_V2_UC_RC_SGE_NUM_IN_WQE;
+	extend_sge_num = wr->num_sge - num_in_wqe;
+	sg = wr->sg_list + num_in_wqe;
+	shift = qp->hr_buf.page_shift;
+
+	/*
+	 * Check whether wr->num_sge sges are in the same page. If not, we
+	 * should calculate how many sges in the first page and the second
+	 * page.
+	 */
+	dseg = get_send_extend_sge(qp, (*sge_ind) & (qp->sge.sge_cnt - 1));
+	fi_sge_num = (((((u64)dseg >> shift) + 1) << shift) - (u64)dseg) /
+		      sizeof(struct hns_roce_v2_wqe_data_seg);
+	if (extend_sge_num > fi_sge_num) {
+		se_sge_num = extend_sge_num - fi_sge_num;
+		for (i = 0; i < fi_sge_num; i++) {
+			set_data_seg_v2(dseg++, sg + i);
+			(*sge_ind)++;
+		}
+		dseg = get_send_extend_sge(qp,
+					   (*sge_ind) & (qp->sge.sge_cnt - 1));
+		for (i = 0; i < se_sge_num; i++) {
+			set_data_seg_v2(dseg++, sg + fi_sge_num + i);
+			(*sge_ind)++;
+		}
+	} else {
+		for (i = 0; i < extend_sge_num; i++) {
+			set_data_seg_v2(dseg++, sg + i);
+			(*sge_ind)++;
+		}
+	}
+}
+
 static int set_rwqe_data_seg(struct ib_qp *ibqp, struct ib_send_wr *wr,
 			     struct hns_roce_v2_rc_send_wqe *rc_sq_wqe,
 			     void *wqe, unsigned int *sge_ind,
@@ -85,7 +131,7 @@ static int set_rwqe_data_seg(struct ib_qp *ibqp, struct ib_send_wr *wr,
 		roce_set_bit(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_INLINE_S,
 			     1);
 	} else {
-		if (wr->num_sge <= 2) {
+		if (wr->num_sge <= HNS_ROCE_V2_UC_RC_SGE_NUM_IN_WQE) {
 			for (i = 0; i < wr->num_sge; i++) {
 				if (likely(wr->sg_list[i].length)) {
 					set_data_seg_v2(dseg, wr->sg_list + i);
@@ -98,24 +144,14 @@ static int set_rwqe_data_seg(struct ib_qp *ibqp, struct ib_send_wr *wr,
 				     V2_RC_SEND_WQE_BYTE_20_MSG_START_SGE_IDX_S,
 				     (*sge_ind) & (qp->sge.sge_cnt - 1));
 
-			for (i = 0; i < 2; i++) {
+			for (i = 0; i < HNS_ROCE_V2_UC_RC_SGE_NUM_IN_WQE; i++) {
 				if (likely(wr->sg_list[i].length)) {
 					set_data_seg_v2(dseg, wr->sg_list + i);
 					dseg++;
 				}
 			}
 
-			dseg = get_send_extend_sge(qp,
-					    (*sge_ind) & (qp->sge.sge_cnt - 1));
-
-			for (i = 0; i < wr->num_sge - 2; i++) {
-				if (likely(wr->sg_list[i + 2].length)) {
-					set_data_seg_v2(dseg,
-							wr->sg_list + 2 + i);
-					dseg++;
-					(*sge_ind)++;
-				}
-			}
+			set_extend_sge(qp, wr, sge_ind);
 		}
 
 		roce_set_field(rc_sq_wqe->byte_16,
@@ -318,13 +354,7 @@ static int hns_roce_v2_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 			memcpy(&ud_sq_wqe->dgid[0], &ah->av.dgid[0],
 			       GID_LEN_V2);
 
-			dseg = get_send_extend_sge(qp,
-					    sge_ind & (qp->sge.sge_cnt - 1));
-			for (i = 0; i < wr->num_sge; i++) {
-				set_data_seg_v2(dseg + i, wr->sg_list + i);
-				sge_ind++;
-			}
-
+			set_extend_sge(qp, wr, &sge_ind);
 			ind++;
 		} else if (ibqp->qp_type == IB_QPT_RC) {
 			rc_sq_wqe = wqe;
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
index f16df1b..97f9bef 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
@@ -77,6 +77,7 @@
 #define HNS_ROCE_V2_MAX_INNER_MTPT_NUM		2
 #define HNS_ROCE_INVALID_LKEY			0x100
 #define HNS_ROCE_CMQ_TX_TIMEOUT			30000
+#define HNS_ROCE_V2_UC_RC_SGE_NUM_IN_WQE	2
 
 #define HNS_ROCE_CONTEXT_HOP_NUM		1
 #define HNS_ROCE_MTT_HOP_NUM			1
-- 
1.9.1

* Re: [PATCH V2 rdma-next 3/4] RDMA/hns: Add reset process for RoCE in hip08
  2018-05-23 10:16   ` Wei Hu (Xavier)
  (?)
@ 2018-05-24 21:31   ` Jason Gunthorpe
  2018-05-25  5:54       ` Wei Hu (Xavier)
  -1 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2018-05-24 21:31 UTC (permalink / raw)
  To: Wei Hu (Xavier); +Cc: dledford, linux-rdma, linux-kernel

On Wed, May 23, 2018 at 06:16:29PM +0800, Wei Hu (Xavier) wrote:
> This patch added reset process for RoCE in hip08.
> 
> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
> 
> v1->v2: 1.Delete handle->priv = NULL in hns_roce_hw_v2_uninit_instance.
> 	2.Add hns_roce_hw_v2_reset_notify_init callback function,
> 	  When RoCE reinit failed in this function, inform NIC driver.
> 	The related link of Jason's commets:
> 	https://www.spinics.net/lists/linux-rdma/msg65009.html
>  drivers/infiniband/hw/hns/hns_roce_cmd.c    |  3 ++
>  drivers/infiniband/hw/hns/hns_roce_device.h |  2 +
>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c  | 76 +++++++++++++++++++++++++++++
>  drivers/infiniband/hw/hns/hns_roce_main.c   |  7 +++
>  4 files changed, 88 insertions(+)
> 
> diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.c b/drivers/infiniband/hw/hns/hns_roce_cmd.c
> index 9ebe839..a0ba19d 100644
> +++ b/drivers/infiniband/hw/hns/hns_roce_cmd.c
> @@ -176,6 +176,9 @@ int hns_roce_cmd_mbox(struct hns_roce_dev *hr_dev, u64 in_param, u64 out_param,
>  		      unsigned long in_modifier, u8 op_modifier, u16 op,
>  		      unsigned long timeout)
>  {
> +	if (hr_dev->is_reset)
> +		return 0;
> +
>  	if (hr_dev->cmd.use_events)
>  		return hns_roce_cmd_mbox_wait(hr_dev, in_param, out_param,
>  					      in_modifier, op_modifier, op,
> diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
> index 412297d4..da8512b 100644
> +++ b/drivers/infiniband/hw/hns/hns_roce_device.h
> @@ -774,6 +774,8 @@ struct hns_roce_dev {
>  	const char		*irq_names[HNS_ROCE_MAX_IRQ_NUM];
>  	spinlock_t		sm_lock;
>  	spinlock_t		bt_cmd_lock;
> +	bool			active;
> +	bool			is_reset;
>  	struct hns_roce_ib_iboe iboe;
>  
>  	struct list_head        pgdir_list;
> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> index e0ab672..a70d07b 100644
> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> @@ -768,6 +768,9 @@ static int hns_roce_cmq_send(struct hns_roce_dev *hr_dev,
>  	int ret = 0;
>  	int ntc;
>  
> +	if (hr_dev->is_reset)
> +		return 0;
> +
>  	spin_lock_bh(&csq->lock);
>  
>  	if (num > hns_roce_cmq_space(csq)) {
> @@ -4790,14 +4793,87 @@ static void hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
>  {
>  	struct hns_roce_dev *hr_dev = (struct hns_roce_dev *)handle->priv;
>  
> +	if (!hr_dev)
> +		return;
> +
>  	hns_roce_exit(hr_dev);
>  	kfree(hr_dev->priv);
>  	ib_dealloc_device(&hr_dev->ib_dev);
>  }
>  
> +static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
> +{
> +	struct hns_roce_dev *hr_dev = (struct hns_roce_dev *)handle->priv;
> +	struct ib_event event;
> +
> +	if (!hr_dev) {
> +		dev_err(&handle->pdev->dev,
> +			"Input parameter handle->priv is NULL!\n");
> +		return -EINVAL;
> +	}
> +
> +	hr_dev->active = false;
> +	hr_dev->is_reset = true;
> +
> +	event.event = IB_EVENT_DEVICE_FATAL;
> +	event.device = &hr_dev->ib_dev;
> +	event.element.port_num = 1;
> +	ib_dispatch_event(&event);
> +
> +	return 0;
> +}
> +
> +static int hns_roce_hw_v2_reset_notify_init(struct hnae3_handle *handle)
> +{
> +	int ret;
> +
> +	ret = hns_roce_hw_v2_init_instance(handle);
> +	if (ret) {
> +		/* when reset notify type is HNAE3_INIT_CLIENT In reset notify
> +		 * callback function, RoCE Engine reinitialize. If RoCE reinit
> +		 * failed, we should inform NIC driver.
> +		 */
> +		handle->priv = NULL;
> +		dev_err(&handle->pdev->dev,
> +			"In reset process RoCE reinit failed %d.\n", ret);
> +	}
> +
> +	return ret;
> +}
> +
> +static int hns_roce_hw_v2_reset_notify_uninit(struct hnae3_handle *handle)
> +{
> +	msleep(100);
> +	hns_roce_hw_v2_uninit_instance(handle, false);
> +	return 0;
> +}
> +
> +static int hns_roce_hw_v2_reset_notify(struct hnae3_handle *handle,
> +				       enum hnae3_reset_notify_type type)
> +{
> +	int ret = 0;
> +
> +	switch (type) {
> +	case HNAE3_DOWN_CLIENT:
> +		ret = hns_roce_hw_v2_reset_notify_down(handle);
> +		break;
> +	case HNAE3_INIT_CLIENT:
> +		ret = hns_roce_hw_v2_reset_notify_init(handle);
> +		break;
> +	case HNAE3_UNINIT_CLIENT:
> +		ret = hns_roce_hw_v2_reset_notify_uninit(handle);
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	return ret;
> +}
> +
>  static const struct hnae3_client_ops hns_roce_hw_v2_ops = {
>  	.init_instance = hns_roce_hw_v2_init_instance,
>  	.uninit_instance = hns_roce_hw_v2_uninit_instance,
> +	.reset_notify = hns_roce_hw_v2_reset_notify,
>  };
>  
>  static struct hnae3_client hns_roce_hw_v2_client = {
> diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
> index 1b79a38..ac51372 100644
> +++ b/drivers/infiniband/hw/hns/hns_roce_main.c
> @@ -332,6 +332,9 @@ static struct ib_ucontext *hns_roce_alloc_ucontext(struct ib_device *ib_dev,
>  	struct hns_roce_ib_alloc_ucontext_resp resp = {};
>  	struct hns_roce_dev *hr_dev = to_hr_dev(ib_dev);
>  
> +	if (!hr_dev->active)
> +		return ERR_PTR(-EAGAIN);

This still doesn't make sense: ib_unregister_device already makes sure
that hns_roce_alloc_ucontext isn't running and won't be called before
returning, so you don't need another flag to do that.

Since this is the only place the active flag is tested, it can just be
deleted entirely.

> @@ -425,6 +428,7 @@ static void hns_roce_unregister_device(struct hns_roce_dev *hr_dev)
>  {
>  	struct hns_roce_ib_iboe *iboe = &hr_dev->iboe;
>  
> +	hr_dev->active = false;
>  	unregister_netdevice_notifier(&iboe->nb);
>  	ib_unregister_device(&hr_dev->ib_dev);

Jason

* Re: [PATCH V2 rdma-next 4/4] RDMA/hns: Fix the illegal memory operation when cross page
  2018-05-23 10:16   ` Wei Hu (Xavier)
  (?)
@ 2018-05-24 21:40   ` Jason Gunthorpe
  2018-05-25  5:56       ` Wei Hu (Xavier)
  -1 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2018-05-24 21:40 UTC (permalink / raw)
  To: Wei Hu (Xavier); +Cc: dledford, linux-rdma, linux-kernel

On Wed, May 23, 2018 at 06:16:30PM +0800, Wei Hu (Xavier) wrote:
> This patch fixed the potential illegal operation when using the
> extend sge buffer cross page in post send operation. The bug
> will cause the calltrace.

Should include the oops here for reference.

> Reported-by: Jie Chen <chenjie103@huawei.com>
> Reported-by: Xiping Zhang (Francis) <zhangxiping3@huawei.com>
> Fixes: b1c158350968("RDMA/hns: Get rid of virt_to_page and vmap calls after dma_alloc_coherent")
> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
> 
> v1->v2: Modify the Fixes statement according to Leon's comment.
>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 70 +++++++++++++++++++++---------
>  drivers/infiniband/hw/hns/hns_roce_hw_v2.h |  1 +
>  2 files changed, 51 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> index a70d07b..62c1eb5 100644
> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> @@ -52,6 +52,52 @@ static void set_data_seg_v2(struct hns_roce_v2_wqe_data_seg *dseg,
>  	dseg->len  = cpu_to_le32(sg->length);
>  }
>  
> +static void set_extend_sge(struct hns_roce_qp *qp, struct ib_send_wr *wr,
> +			   unsigned int *sge_ind)
> +{
> +	struct hns_roce_v2_wqe_data_seg *dseg;
> +	struct ib_sge *sg;
> +	int num_in_wqe = 0;
> +	int extend_sge_num;
> +	int fi_sge_num;
> +	int se_sge_num;
> +	int shift;
> +	int i;
> +
> +	if (qp->ibqp.qp_type == IB_QPT_RC || qp->ibqp.qp_type == IB_QPT_UC)
> +		num_in_wqe = HNS_ROCE_V2_UC_RC_SGE_NUM_IN_WQE;
> +	extend_sge_num = wr->num_sge - num_in_wqe;
> +	sg = wr->sg_list + num_in_wqe;
> +	shift = qp->hr_buf.page_shift;
> +
> +	/*
> +	 * Check whether wr->num_sge sges are in the same page. If not, we
> +	 * should calculate how many sges in the first page and the second
> +	 * page.
> +	 */
> +	dseg = get_send_extend_sge(qp, (*sge_ind) & (qp->sge.sge_cnt - 1));
> +	fi_sge_num = (((((u64)dseg >> shift) + 1) << shift) - (u64)dseg) /
> +		      sizeof(struct hns_roce_v2_wqe_data_seg);

dseg is a pointer; that u64 should be a uintptr_t.

And it is better written as

  (round_up((uintptr_t)dseg, 1 << shift) - (uintptr_t)dseg) / sizeof(struct hns_roce_v2_wqe_data_seg)

if I got it right.

Jason

* Re: [PATCH V2 rdma-next 0/4] Misc update for hns driver
  2018-05-23 10:16 ` Wei Hu (Xavier)
                   ` (4 preceding siblings ...)
  (?)
@ 2018-05-24 21:43 ` Jason Gunthorpe
  2018-05-26  7:40     ` Wei Hu (Xavier)
  -1 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2018-05-24 21:43 UTC (permalink / raw)
  To: Wei Hu (Xavier); +Cc: dledford, linux-rdma, linux-kernel

On Wed, May 23, 2018 at 06:16:26PM +0800, Wei Hu (Xavier) wrote:
> This patchset included fixing bug, some optimization, reset
> process for hns driver.
> 
> v1->v2: 1.pull the patch out of the series. The name of the patch is 
> 	  RDMA/hns: Implement the disassociate_ucontext API.
> 	2.Modify the fixes statement according to Leon's comment.
> 	3.Address the Jason's comment in reset relevant patch.
> 
> Wei Hu (Xavier) (4):
>   RDMA/hns: Modify uar allocation algorithm to avoid bitmap exhaust
>   RDMA/hns: Increase checking CMQ status timeout value

I took these two to for-next; the others have to be respun.

>   RDMA/hns: Add reset process for RoCE in hip08
>   RDMA/hns: Fix the illegal memory operation when cross page

Thanks,
Jason

* Re: [PATCH V2 rdma-next 3/4] RDMA/hns: Add reset process for RoCE in hip08
  2018-05-24 21:31   ` Jason Gunthorpe
@ 2018-05-25  5:54       ` Wei Hu (Xavier)
  0 siblings, 0 replies; 23+ messages in thread
From: Wei Hu (Xavier) @ 2018-05-25  5:54 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: dledford, linux-rdma, linux-kernel



On 2018/5/25 5:31, Jason Gunthorpe wrote:
> On Wed, May 23, 2018 at 06:16:29PM +0800, Wei Hu (Xavier) wrote:
>> This patch added reset process for RoCE in hip08.
>>
>> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
>>
>> v1->v2: 1.Delete handle->priv = NULL in hns_roce_hw_v2_uninit_instance.
>> 	2.Add hns_roce_hw_v2_reset_notify_init callback function,
>> 	  When RoCE reinit failed in this function, inform NIC driver.
>> 	The related link of Jason's commets:
>> 	https://www.spinics.net/lists/linux-rdma/msg65009.html
>>  drivers/infiniband/hw/hns/hns_roce_cmd.c    |  3 ++
>>  drivers/infiniband/hw/hns/hns_roce_device.h |  2 +
>>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c  | 76 +++++++++++++++++++++++++++++
>>  drivers/infiniband/hw/hns/hns_roce_main.c   |  7 +++
>>  4 files changed, 88 insertions(+)
>>
>> diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.c b/drivers/infiniband/hw/hns/hns_roce_cmd.c
>> index 9ebe839..a0ba19d 100644
>> +++ b/drivers/infiniband/hw/hns/hns_roce_cmd.c
>> @@ -176,6 +176,9 @@ int hns_roce_cmd_mbox(struct hns_roce_dev *hr_dev, u64 in_param, u64 out_param,
>>  		      unsigned long in_modifier, u8 op_modifier, u16 op,
>>  		      unsigned long timeout)
>>  {
>> +	if (hr_dev->is_reset)
>> +		return 0;
>> +
>>  	if (hr_dev->cmd.use_events)
>>  		return hns_roce_cmd_mbox_wait(hr_dev, in_param, out_param,
>>  					      in_modifier, op_modifier, op,
>> diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
>> index 412297d4..da8512b 100644
>> +++ b/drivers/infiniband/hw/hns/hns_roce_device.h
>> @@ -774,6 +774,8 @@ struct hns_roce_dev {
>>  	const char		*irq_names[HNS_ROCE_MAX_IRQ_NUM];
>>  	spinlock_t		sm_lock;
>>  	spinlock_t		bt_cmd_lock;
>> +	bool			active;
>> +	bool			is_reset;
>>  	struct hns_roce_ib_iboe iboe;
>>  
>>  	struct list_head        pgdir_list;
>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> index e0ab672..a70d07b 100644
>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> @@ -768,6 +768,9 @@ static int hns_roce_cmq_send(struct hns_roce_dev *hr_dev,
>>  	int ret = 0;
>>  	int ntc;
>>  
>> +	if (hr_dev->is_reset)
>> +		return 0;
>> +
>>  	spin_lock_bh(&csq->lock);
>>  
>>  	if (num > hns_roce_cmq_space(csq)) {
>> @@ -4790,14 +4793,87 @@ static void hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
>>  {
>>  	struct hns_roce_dev *hr_dev = (struct hns_roce_dev *)handle->priv;
>>  
>> +	if (!hr_dev)
>> +		return;
>> +
>>  	hns_roce_exit(hr_dev);
>>  	kfree(hr_dev->priv);
>>  	ib_dealloc_device(&hr_dev->ib_dev);
>>  }
>>  
>> +static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
>> +{
>> +	struct hns_roce_dev *hr_dev = (struct hns_roce_dev *)handle->priv;
>> +	struct ib_event event;
>> +
>> +	if (!hr_dev) {
>> +		dev_err(&handle->pdev->dev,
>> +			"Input parameter handle->priv is NULL!\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	hr_dev->active = false;
>> +	hr_dev->is_reset = true;
>> +
>> +	event.event = IB_EVENT_DEVICE_FATAL;
>> +	event.device = &hr_dev->ib_dev;
>> +	event.element.port_num = 1;
>> +	ib_dispatch_event(&event);
>> +
>> +	return 0;
>> +}
>> +
>> +static int hns_roce_hw_v2_reset_notify_init(struct hnae3_handle *handle)
>> +{
>> +	int ret;
>> +
>> +	ret = hns_roce_hw_v2_init_instance(handle);
>> +	if (ret) {
>> +		/* when reset notify type is HNAE3_INIT_CLIENT In reset notify
>> +		 * callback function, RoCE Engine reinitialize. If RoCE reinit
>> +		 * failed, we should inform NIC driver.
>> +		 */
>> +		handle->priv = NULL;
>> +		dev_err(&handle->pdev->dev,
>> +			"In reset process RoCE reinit failed %d.\n", ret);
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>> +static int hns_roce_hw_v2_reset_notify_uninit(struct hnae3_handle *handle)
>> +{
>> +	msleep(100);
>> +	hns_roce_hw_v2_uninit_instance(handle, false);
>> +	return 0;
>> +}
>> +
>> +static int hns_roce_hw_v2_reset_notify(struct hnae3_handle *handle,
>> +				       enum hnae3_reset_notify_type type)
>> +{
>> +	int ret = 0;
>> +
>> +	switch (type) {
>> +	case HNAE3_DOWN_CLIENT:
>> +		ret = hns_roce_hw_v2_reset_notify_down(handle);
>> +		break;
>> +	case HNAE3_INIT_CLIENT:
>> +		ret = hns_roce_hw_v2_reset_notify_init(handle);
>> +		break;
>> +	case HNAE3_UNINIT_CLIENT:
>> +		ret = hns_roce_hw_v2_reset_notify_uninit(handle);
>> +		break;
>> +	default:
>> +		break;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>>  static const struct hnae3_client_ops hns_roce_hw_v2_ops = {
>>  	.init_instance = hns_roce_hw_v2_init_instance,
>>  	.uninit_instance = hns_roce_hw_v2_uninit_instance,
>> +	.reset_notify = hns_roce_hw_v2_reset_notify,
>>  };
>>  
>>  static struct hnae3_client hns_roce_hw_v2_client = {
>> diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
>> index 1b79a38..ac51372 100644
>> +++ b/drivers/infiniband/hw/hns/hns_roce_main.c
>> @@ -332,6 +332,9 @@ static struct ib_ucontext *hns_roce_alloc_ucontext(struct ib_device *ib_dev,
>>  	struct hns_roce_ib_alloc_ucontext_resp resp = {};
>>  	struct hns_roce_dev *hr_dev = to_hr_dev(ib_dev);
>>  
>> +	if (!hr_dev->active)
>> +		return ERR_PTR(-EAGAIN);
> This still doesn't make sense, ib_unregister_device already makes sure
> that hns_roce_alloc_ucontext isn't running and won't be called before
> returning, don't need another flag to do that.
>
> Since this is the only place the active flag is tested it can just be deleted
> entirely.
Hi, Jason

    The RoCE reset process is:
    1. hr_dev->active = false;  /* make sure no process can call
       ibv_open_device */
    2. Call ib_dispatch_event() to report the IB_EVENT_DEVICE_FATAL event.
    3. msleep(100);  /* give applications time to free their resources */
    4. Call ib_unregister_device().
    5. ...
    6. ...

    There are two steps, as above, before ib_unregister_device() is called;
we set hr_dev->active to false so that no process can call ibv_open_device
during that window (see the sketch below).
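
A minimal sketch of that window, not code from the patch, condensing the
steps above into one place:

/* Between steps 1 and 4 the device is still registered, so without the
 * "active" flag a new ibv_open_device() could still reach
 * hns_roce_alloc_ucontext().
 */
static void reset_down_sequence(struct hns_roce_dev *hr_dev)
{
	struct ib_event event = {
		.event  = IB_EVENT_DEVICE_FATAL,
		.device = &hr_dev->ib_dev,
		.element.port_num = 1,
	};

	hr_dev->active = false;		/* 1. refuse new ucontexts from now on */
	ib_dispatch_event(&event);	/* 2. tell userspace the device is going away */
	msleep(100);			/* 3. grace period for applications */
	ib_unregister_device(&hr_dev->ib_dev);	/* 4. */
}
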
    Thanks     

    Regards
Wei Hu
>> @@ -425,6 +428,7 @@ static void hns_roce_unregister_device(struct hns_roce_dev *hr_dev)
>>  {
>>  	struct hns_roce_ib_iboe *iboe = &hr_dev->iboe;
>>  
>> +	hr_dev->active = false;
>>  	unregister_netdevice_notifier(&iboe->nb);
>>  	ib_unregister_device(&hr_dev->ib_dev);
> Jason
>
> .
>

* Re: [PATCH V2 rdma-next 4/4] RDMA/hns: Fix the illegal memory operation when cross page
  2018-05-24 21:40   ` Jason Gunthorpe
@ 2018-05-25  5:56       ` Wei Hu (Xavier)
  0 siblings, 0 replies; 23+ messages in thread
From: Wei Hu (Xavier) @ 2018-05-25  5:56 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: dledford, linux-rdma, linux-kernel



On 2018/5/25 5:40, Jason Gunthorpe wrote:
> On Wed, May 23, 2018 at 06:16:30PM +0800, Wei Hu (Xavier) wrote:
>> This patch fixed the potential illegal operation when using the
>> extend sge buffer cross page in post send operation. The bug
>> will cause the calltrace.
> Should include the oops for reference here..
OK, we will fix it in v3.
Thanks
>> Reported-by: Jie Chen <chenjie103@huawei.com>
>> Reported-by: Xiping Zhang (Francis) <zhangxiping3@huawei.com>
>> Fixes: b1c158350968("RDMA/hns: Get rid of virt_to_page and vmap calls after dma_alloc_coherent")
>> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
>>
>> v1->v2: Modify the Fixes statement according to Leon's comment.
>>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 70 +++++++++++++++++++++---------
>>  drivers/infiniband/hw/hns/hns_roce_hw_v2.h |  1 +
>>  2 files changed, 51 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> index a70d07b..62c1eb5 100644
>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> @@ -52,6 +52,52 @@ static void set_data_seg_v2(struct hns_roce_v2_wqe_data_seg *dseg,
>>  	dseg->len  = cpu_to_le32(sg->length);
>>  }
>>  
>> +static void set_extend_sge(struct hns_roce_qp *qp, struct ib_send_wr *wr,
>> +			   unsigned int *sge_ind)
>> +{
>> +	struct hns_roce_v2_wqe_data_seg *dseg;
>> +	struct ib_sge *sg;
>> +	int num_in_wqe = 0;
>> +	int extend_sge_num;
>> +	int fi_sge_num;
>> +	int se_sge_num;
>> +	int shift;
>> +	int i;
>> +
>> +	if (qp->ibqp.qp_type == IB_QPT_RC || qp->ibqp.qp_type == IB_QPT_UC)
>> +		num_in_wqe = HNS_ROCE_V2_UC_RC_SGE_NUM_IN_WQE;
>> +	extend_sge_num = wr->num_sge - num_in_wqe;
>> +	sg = wr->sg_list + num_in_wqe;
>> +	shift = qp->hr_buf.page_shift;
>> +
>> +	/*
>> +	 * Check whether wr->num_sge sges are in the same page. If not, we
>> +	 * should calculate how many sges in the first page and the second
>> +	 * page.
>> +	 */
>> +	dseg = get_send_extend_sge(qp, (*sge_ind) & (qp->sge.sge_cnt - 1));
>> +	fi_sge_num = (((((u64)dseg >> shift) + 1) << shift) - (u64)dseg) /
>> +		      sizeof(struct hns_roce_v2_wqe_data_seg);
> dseg is a pointer.. that u64 should be a uintptr_t
>
> And it is better written as
>
>   (round_up((uintptr_t)dseg, 1 << shift) - (uintptr_t)dseg) / sizeof(struct hns_roce_v2_wqe_data_seg)
>
> if I got it right..
OK, we will fix it in v3.
Thanks
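
For reference, the corrected first-page calculation could look like the
following (a sketch only, reusing the locals from the patch hunk above;
the actual v3 code may differ):

	/*
	 * Note: round_up() returns dseg unchanged when it is already
	 * page-aligned, so the final code may still need the "+ 1"
	 * behaviour of the original formula in that case.
	 */
	fi_sge_num = (round_up((uintptr_t)dseg, 1 << shift) -
		      (uintptr_t)dseg) /
		     sizeof(struct hns_roce_v2_wqe_data_seg);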

Wei Hu
> Jason
>
> .
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2 rdma-next 3/4] RDMA/hns: Add reset process for RoCE in hip08
  2018-05-25  5:54       ` Wei Hu (Xavier)
@ 2018-05-25 14:55       ` Jason Gunthorpe
  2018-05-26  1:47           ` Wei Hu (Xavier)
  -1 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2018-05-25 14:55 UTC (permalink / raw)
  To: Wei Hu (Xavier); +Cc: dledford, linux-rdma, linux-kernel

On Fri, May 25, 2018 at 01:54:31PM +0800, Wei Hu (Xavier) wrote:
> 
> 
> On 2018/5/25 5:31, Jason Gunthorpe wrote:
> >>  static const struct hnae3_client_ops hns_roce_hw_v2_ops = {
> >>  	.init_instance = hns_roce_hw_v2_init_instance,
> >>  	.uninit_instance = hns_roce_hw_v2_uninit_instance,
> >> +	.reset_notify = hns_roce_hw_v2_reset_notify,
> >>  };
> >>  
> >>  static struct hnae3_client hns_roce_hw_v2_client = {
> >> diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
> >> index 1b79a38..ac51372 100644
> >> +++ b/drivers/infiniband/hw/hns/hns_roce_main.c
> >> @@ -332,6 +332,9 @@ static struct ib_ucontext *hns_roce_alloc_ucontext(struct ib_device *ib_dev,
> >>  	struct hns_roce_ib_alloc_ucontext_resp resp = {};
> >>  	struct hns_roce_dev *hr_dev = to_hr_dev(ib_dev);
> >>  
> >> +	if (!hr_dev->active)
> >> +		return ERR_PTR(-EAGAIN);
> > This still doesn't make sense, ib_unregister_device already makes sure
> > that hns_roce_alloc_ucontext isn't running and won't be called before
> > returning, don't need another flag to do that.
> >
> > Since this is the only place the active flag is tested it can just be deleted
> > entirely.
> Hi, Jason
>    
>     roce reset process:
>     1. hr_dev->active = false;  //make sure no any process call
> ibv_open_device.   
>     2. call ib_dispatch_event() function to report IB_EVENT_DEVICE_FATAL
> event.
>     3. msleep(100);   // for some app to free resources
>     4. call ib_unregister_device().   
>     5. ...
>     6. ...
> 
>     There are 2 steps as above before calling ib_unregister_device(), we
> evaluate
> hr_dev->active with false to avoid no any process call
> ibv_open_device.

If you think this is the right flow then it is a core issue to block new
opens, not an individual driver issue, so send a core patch - eg add an
'ib_driver_fatal_error()' call that does the dispatch and call it from
all the drivers that use IB_EVENT_DEVICE_FATAL.

I'm not completely sure this makes sense though; it might be better
for the core code to force-stuff an IB_EVENT_DEVICE_FATAL into contexts
that open after the fatal event.

Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2 rdma-next 3/4] RDMA/hns: Add reset process for RoCE in hip08
  2018-05-25 14:55       ` Jason Gunthorpe
@ 2018-05-26  1:47           ` Wei Hu (Xavier)
  0 siblings, 0 replies; 23+ messages in thread
From: Wei Hu (Xavier) @ 2018-05-26  1:47 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: dledford, linux-rdma, linux-kernel



On 2018/5/25 22:55, Jason Gunthorpe wrote:
> On Fri, May 25, 2018 at 01:54:31PM +0800, Wei Hu (Xavier) wrote:
>>
>> On 2018/5/25 5:31, Jason Gunthorpe wrote:
>>>>  static const struct hnae3_client_ops hns_roce_hw_v2_ops = {
>>>>  	.init_instance = hns_roce_hw_v2_init_instance,
>>>>  	.uninit_instance = hns_roce_hw_v2_uninit_instance,
>>>> +	.reset_notify = hns_roce_hw_v2_reset_notify,
>>>>  };
>>>>  
>>>>  static struct hnae3_client hns_roce_hw_v2_client = {
>>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
>>>> index 1b79a38..ac51372 100644
>>>> +++ b/drivers/infiniband/hw/hns/hns_roce_main.c
>>>> @@ -332,6 +332,9 @@ static struct ib_ucontext *hns_roce_alloc_ucontext(struct ib_device *ib_dev,
>>>>  	struct hns_roce_ib_alloc_ucontext_resp resp = {};
>>>>  	struct hns_roce_dev *hr_dev = to_hr_dev(ib_dev);
>>>>  
>>>> +	if (!hr_dev->active)
>>>> +		return ERR_PTR(-EAGAIN);
>>> This still doesn't make sense, ib_unregister_device already makes sure
>>> that hns_roce_alloc_ucontext isn't running and won't be called before
>>> returning, don't need another flag to do that.
>>>
>>> Since this is the only place the active flag is tested it can just be deleted
>>> entirely.
>> Hi, Jason
>>    
>>     roce reset process:
>>     1. hr_dev->active = false;  //make sure no any process call
>> ibv_open_device.   
>>     2. call ib_dispatch_event() function to report IB_EVENT_DEVICE_FATAL
>> event.
>>     3. msleep(100);   // for some app to free resources
>>     4. call ib_unregister_device().   
>>     5. ...
>>     6. ...
>>
>>     There are 2 steps as above before calling ib_unregister_device(), we
>> evaluate
>> hr_dev->active with false to avoid no any process call
>> ibv_open_device.
> If you think this is the right flow then it is core issue to block new
> opens, not an individual driver issue, send a core patch - eg add a
> 'ib_driver_fatal_error()' call that does the dispatch and call it from
> all the drivers using this IB_EVENT_DEVICE_FATAL
Hi, Jason

There seems to be no difference between calling ib_driver_fatal_error()
and calling ib_dispatch_event() directly in the vendor driver.

void ib_driver_fatal_error(struct ib_device *ib_dev, u8 port_num)
{
	struct ib_event event;

	event.event = IB_EVENT_DEVICE_FATAL;
	event.device = ib_dev;
	event.element.port_num = port_num;
	ib_dispatch_event(&event);
}
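
A driver would then report the fatal condition with a single call, e.g.
(the hr_dev name and the port number here are just illustrative):

	ib_driver_fatal_error(&hr_dev->ib_dev, 1);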

    Regards
Wei Hu
> I'm not completely sure this makes sense though, it might be better
> for the core code to force stuff a IB_EVENT_DEVICE_FATAL to contexts
> that open after the fatal event..
>
> Jason
>
> .
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2 rdma-next 0/4] Misc update for hns driver
  2018-05-24 21:43 ` [PATCH V2 rdma-next 0/4] Misc update for hns driver Jason Gunthorpe
@ 2018-05-26  7:40     ` Wei Hu (Xavier)
  0 siblings, 0 replies; 23+ messages in thread
From: Wei Hu (Xavier) @ 2018-05-26  7:40 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: dledford, linux-rdma, linux-kernel



On 2018/5/25 5:43, Jason Gunthorpe wrote:
> On Wed, May 23, 2018 at 06:16:26PM +0800, Wei Hu (Xavier) wrote:
>> This patchset included fixing bug, some optimization, reset
>> process for hns driver.
>>
>> v1->v2: 1.pull the patch out of the series. The name of the patch is 
>> 	  RDMA/hns: Implement the disassociate_ucontext API.
>> 	2.Modify the fixes statement according to Leon's comment.
>> 	3.Address the Jason's comment in reset relevant patch.
>>
>> Wei Hu (Xavier) (4):
>>   RDMA/hns: Modify uar allocation algorithm to avoid bitmap exhaust
>>   RDMA/hns: Increase checking CMQ status timeout value
> I took these two to for-next, the others have to be respun.
Hi, Jason

    I will send V3 for these two patches and for the disassociate_ucontext
patch according to your comments.
    By the way, the patch below will still be in this series, and you and
other experts can continue giving comments on it and on the refactored
patches. Thanks
             RDMA/hns: Add reset process for RoCE in hip08

    Regards
Wei Hu
>
>>   RDMA/hns: Add reset process for RoCE in hip08
>>   RDMA/hns: Fix the illegal memory operation when cross page
> Thanks,
> Jason
>
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2 rdma-next 3/4] RDMA/hns: Add reset process for RoCE in hip08
  2018-05-26  1:47           ` Wei Hu (Xavier)
@ 2018-05-28 16:46           ` Jason Gunthorpe
  -1 siblings, 0 replies; 23+ messages in thread
From: Jason Gunthorpe @ 2018-05-28 16:46 UTC (permalink / raw)
  To: Wei Hu (Xavier); +Cc: dledford, linux-rdma, linux-kernel

On Sat, May 26, 2018 at 09:47:43AM +0800, Wei Hu (Xavier) wrote:
> 
> 
> On 2018/5/25 22:55, Jason Gunthorpe wrote:
> > On Fri, May 25, 2018 at 01:54:31PM +0800, Wei Hu (Xavier) wrote:
> >>
> >> On 2018/5/25 5:31, Jason Gunthorpe wrote:
> >>>>  static const struct hnae3_client_ops hns_roce_hw_v2_ops = {
> >>>>  	.init_instance = hns_roce_hw_v2_init_instance,
> >>>>  	.uninit_instance = hns_roce_hw_v2_uninit_instance,
> >>>> +	.reset_notify = hns_roce_hw_v2_reset_notify,
> >>>>  };
> >>>>  
> >>>>  static struct hnae3_client hns_roce_hw_v2_client = {
> >>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
> >>>> index 1b79a38..ac51372 100644
> >>>> +++ b/drivers/infiniband/hw/hns/hns_roce_main.c
> >>>> @@ -332,6 +332,9 @@ static struct ib_ucontext *hns_roce_alloc_ucontext(struct ib_device *ib_dev,
> >>>>  	struct hns_roce_ib_alloc_ucontext_resp resp = {};
> >>>>  	struct hns_roce_dev *hr_dev = to_hr_dev(ib_dev);
> >>>>  
> >>>> +	if (!hr_dev->active)
> >>>> +		return ERR_PTR(-EAGAIN);
> >>> This still doesn't make sense, ib_unregister_device already makes sure
> >>> that hns_roce_alloc_ucontext isn't running and won't be called before
> >>> returning, don't need another flag to do that.
> >>>
> >>> Since this is the only place the active flag is tested it can just be deleted
> >>> entirely.
> >> Hi, Jason
> >>    
> >>     roce reset process:
> >>     1. hr_dev->active = false;  //make sure no any process call
> >> ibv_open_device.   
> >>     2. call ib_dispatch_event() function to report IB_EVENT_DEVICE_FATAL
> >> event.
> >>     3. msleep(100);   // for some app to free resources
> >>     4. call ib_unregister_device().   
> >>     5. ...
> >>     6. ...
> >>
> >>     There are 2 steps as above before calling ib_unregister_device(), we
> >> evaluate
> >> hr_dev->active with false to avoid no any process call
> >> ibv_open_device.
> > If you think this is the right flow then it is core issue to block new
> > opens, not an individual driver issue, send a core patch - eg add a
> > 'ib_driver_fatal_error()' call that does the dispatch and call it from
> > all the drivers using this IB_EVENT_DEVICE_FATAL
> Hi, Jason
> 
> It seem to be no difference between calling ib_driver_fatal_error and
> calling ib_dispatch_event  directly in manufacturer driver.
> 
> void ib_driver_fatal_error(struct ib_device *ib_dev, u8 port_num)
>  {
>        struct ib_event event;
>                 
>   event.event = IB_EVENT_DEVICE_FATAL;
>     event.device = ib_dev;
>     event.element.port_num = port_num;
>     ib_dispatch_event(&event);
> }

My point was that the core code should block calling
hns_roce_alloc_ucontext() after DEVICE_FATAL, if we agree that is
correct; the driver shouldn't be doing it.
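
A hedged sketch of what that could look like in the core (the 'fatal'
flag and the location of the check are purely illustrative, not existing
uverbs code):

void ib_driver_fatal_error(struct ib_device *ib_dev, u8 port_num)
{
	struct ib_event event = {
		.event = IB_EVENT_DEVICE_FATAL,
		.device = ib_dev,
		.element.port_num = port_num,
	};

	WRITE_ONCE(ib_dev->fatal, true);	/* hypothetical flag on ib_device */
	ib_dispatch_event(&event);
}

/* ...and in the uverbs open path, before calling the driver's alloc_ucontext: */
if (READ_ONCE(ib_dev->fatal))
	return ERR_PTR(-EIO);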

Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread


Thread overview: 23+ messages
2018-05-23 10:16 [PATCH V2 rdma-next 0/4] Misc update for hns driver Wei Hu (Xavier)
2018-05-23 10:16 ` Wei Hu (Xavier)
2018-05-23 10:16 ` [PATCH V2 rdma-next 1/4] RDMA/hns: Modify uar allocation algorithm to avoid bitmap exhaust Wei Hu (Xavier)
2018-05-23 10:16   ` Wei Hu (Xavier)
2018-05-23 10:16 ` [PATCH V2 rdma-next 2/4] RDMA/hns: Increase checking CMQ status timeout value Wei Hu (Xavier)
2018-05-23 10:16   ` Wei Hu (Xavier)
2018-05-23 10:16 ` [PATCH V2 rdma-next 3/4] RDMA/hns: Add reset process for RoCE in hip08 Wei Hu (Xavier)
2018-05-23 10:16   ` Wei Hu (Xavier)
2018-05-24 21:31   ` Jason Gunthorpe
2018-05-25  5:54     ` Wei Hu (Xavier)
2018-05-25  5:54       ` Wei Hu (Xavier)
2018-05-25 14:55       ` Jason Gunthorpe
2018-05-26  1:47         ` Wei Hu (Xavier)
2018-05-26  1:47           ` Wei Hu (Xavier)
2018-05-28 16:46           ` Jason Gunthorpe
2018-05-23 10:16 ` [PATCH V2 rdma-next 4/4] RDMA/hns: Fix the illegal memory operation when cross page Wei Hu (Xavier)
2018-05-23 10:16   ` Wei Hu (Xavier)
2018-05-24 21:40   ` Jason Gunthorpe
2018-05-25  5:56     ` Wei Hu (Xavier)
2018-05-25  5:56       ` Wei Hu (Xavier)
2018-05-24 21:43 ` [PATCH V2 rdma-next 0/4] Misc update for hns driver Jason Gunthorpe
2018-05-26  7:40   ` Wei Hu (Xavier)
2018-05-26  7:40     ` Wei Hu (Xavier)
