linux-rdma.vger.kernel.org archive mirror
* [PATCH for-next] RDMA/hns: Restore mr status when rereg mr fails
@ 2021-08-19  1:53 Wenpeng Liang
  2021-08-19  1:53 ` [PATCH v2 for-next] RDMA/hns: Solve the problem that dma_pool is used during the reset Wenpeng Liang
  2021-08-19  2:16 ` [PATCH for-next] RDMA/hns: Restore mr status when rereg mr fails Wenpeng Liang
  0 siblings, 2 replies; 4+ messages in thread
From: Wenpeng Liang @ 2021-08-19  1:53 UTC (permalink / raw)
  To: dledford, jgg; +Cc: linux-rdma, linuxarm, liangwenpeng

From: Yangyang Li <liyangyang20@huawei.com>

If rereg_user_mr() fails, the MR must be left in a safe state. Back up
the MR state on entry to rereg_user_mr(), and restore it from the backup
if the rereg fails.
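
The hunk below adds copy_mr()/store_rereg_mr()/restore_rereg_mr() for this.
As a minimal userspace sketch of the same save-and-restore-on-failure
pattern (illustrative only: the struct, the field set, and the simulated
hardware step are assumptions, not the hns driver's definitions):

  #include <stdio.h>
  #include <stdint.h>

  struct fake_mr {
  	int enabled;
  	uint64_t iova;
  	uint64_t size;
  	int access;
  };

  /* Stand-in for the mailbox/MTPT update that may fail. */
  static int program_hw(struct fake_mr *mr, int simulate_failure)
  {
  	if (simulate_failure)
  		return -1;
  	mr->enabled = 1;
  	return 0;
  }

  static int rereg(struct fake_mr *mr, uint64_t new_iova, int fail)
  {
  	struct fake_mr backup = *mr;	/* store_rereg_mr() equivalent */

  	mr->enabled = 0;
  	mr->iova = new_iova;

  	if (program_hw(mr, fail)) {
  		*mr = backup;		/* restore_rereg_mr() equivalent */
  		return -1;
  	}
  	return 0;
  }

  int main(void)
  {
  	struct fake_mr mr = { .enabled = 1, .iova = 0x1000, .size = 4096 };

  	if (rereg(&mr, 0x2000, 1))
  		printf("rereg failed, iova back to 0x%llx, enabled=%d\n",
  		       (unsigned long long)mr.iova, mr.enabled);
  	return 0;
  }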

Fixes: 4e9fc1dae2a9 ("RDMA/hns: Optimize the MR registration process")
Reported-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_mr.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
index 006c84b..d0870b4 100644
--- a/drivers/infiniband/hw/hns/hns_roce_mr.c
+++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
@@ -285,6 +285,27 @@ struct ib_mr *hns_roce_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	return ERR_PTR(ret);
 }
 
+static void copy_mr(struct hns_roce_mr *dst, struct hns_roce_mr *src)
+{
+	dst->enabled = src->enabled;
+	dst->iova = src->iova;
+	dst->size = src->size;
+	dst->pd = src->pd;
+	dst->access = src->access;
+}
+
+static void store_rereg_mr(struct hns_roce_mr *mr_bak,
+				  struct hns_roce_mr *mr)
+{
+	copy_mr(mr_bak, mr);
+}
+
+static void restore_rereg_mr(struct hns_roce_mr *mr_bak,
+				    struct hns_roce_mr *mr)
+{
+	copy_mr(mr, mr_bak);
+}
+
 struct ib_mr *hns_roce_rereg_user_mr(struct ib_mr *ibmr, int flags, u64 start,
 				     u64 length, u64 virt_addr,
 				     int mr_access_flags, struct ib_pd *pd,
@@ -294,9 +315,12 @@ struct ib_mr *hns_roce_rereg_user_mr(struct ib_mr *ibmr, int flags, u64 start,
 	struct ib_device *ib_dev = &hr_dev->ib_dev;
 	struct hns_roce_mr *mr = to_hr_mr(ibmr);
 	struct hns_roce_cmd_mailbox *mailbox;
+	struct hns_roce_mr mr_bak;
 	unsigned long mtpt_idx;
 	int ret;
 
+	store_rereg_mr(&mr_bak, mr);
+
 	if (!mr->enabled)
 		return ERR_PTR(-EINVAL);
 
@@ -349,7 +373,11 @@ struct ib_mr *hns_roce_rereg_user_mr(struct ib_mr *ibmr, int flags, u64 start,
 
 	mr->enabled = 1;
 
+	hns_roce_free_cmd_mailbox(hr_dev, mailbox);
+	return NULL;
+
 free_cmd_mbox:
+	restore_rereg_mr(&mr_bak, mr);
 	hns_roce_free_cmd_mailbox(hr_dev, mailbox);
 
 	return ERR_PTR(ret);
-- 
2.8.1



* [PATCH v2 for-next] RDMA/hns: Solve the problem that dma_pool is used during the reset
  2021-08-19  1:53 [PATCH for-next] RDMA/hns: Restore mr status when rereg mr fails Wenpeng Liang
@ 2021-08-19  1:53 ` Wenpeng Liang
  2021-08-19  2:16   ` Wenpeng Liang
  2021-08-19  2:16 ` [PATCH for-next] RDMA/hns: Restore mr status when rereg mr fails Wenpeng Liang
  1 sibling, 1 reply; 4+ messages in thread
From: Wenpeng Liang @ 2021-08-19  1:53 UTC (permalink / raw)
  To: dledford, jgg; +Cc: linux-rdma, linuxarm, liangwenpeng

From: Lang Cheng <chenglang@huawei.com>

During reset, the driver calls dma_pool_destroy() to release the
dma_pool resources. If dma_pool_free() is called from a concurrent
modify_qp operation after the pool has been destroyed, a kernel paging
exception occurs:

[15834.440744] Unable to handle kernel paging request at virtual address
ffffa2cfc7725678
...
[15834.660596] Call trace:
[15834.663033]  queued_spin_lock_slowpath+0x224/0x308
[15834.667802]  _raw_spin_lock_irqsave+0x78/0x88
[15834.672140]  dma_pool_free+0x34/0x118
[15834.675799]  hns_roce_free_cmd_mailbox+0x54/0x88 [hns_roce_hw_v2]
[15834.681872]  hns_roce_v2_qp_modify.isra.57+0xcc/0x120 [hns_roce_hw_v2]
[15834.688376]  hns_roce_v2_modify_qp+0x4d4/0x1ef8 [hns_roce_hw_v2]
[15834.694362]  hns_roce_modify_qp+0x214/0x5a8 [hns_roce_hw_v2]
[15834.699996]  _ib_modify_qp+0xf0/0x308
[15834.703642]  ib_modify_qp+0x38/0x48
[15834.707118]  rt_ktest_modify_qp+0x14c/0x998 [rdma_test]
...
[15837.269216] Unable to handle kernel paging request at virtual address
000197c995a1d1b4
...
[15837.480898] Call trace:
[15837.483335]  __free_pages+0x28/0x78
[15837.486807]  dma_direct_free_pages+0xa0/0xe8
[15837.491058]  dma_direct_free+0x48/0x60
[15837.494790]  dma_free_attrs+0xa4/0xe8
[15837.498449]  hns_roce_buf_free+0xb0/0x150 [hns_roce_hw_v2]
[15837.503918]  mtr_free_bufs.isra.1+0x88/0xc0 [hns_roce_hw_v2]
[15837.509558]  hns_roce_mtr_destroy+0x60/0x80 [hns_roce_hw_v2]
[15837.515198]  hns_roce_v2_cleanup_eq_table+0x1d0/0x2a0 [hns_roce_hw_v2]
[15837.521701]  hns_roce_exit+0x108/0x1e0 [hns_roce_hw_v2]
[15837.526908]  __hns_roce_hw_v2_uninit_instance.isra.75+0x70/0xb8 [hns_roce_hw_v2]
[15837.534276]  hns_roce_hw_v2_uninit_instance+0x64/0x80 [hns_roce_hw_v2]
[15837.540786]  hclge_uninit_client_instance+0xe8/0x1e8 [hclge]
[15837.546419]  hnae3_uninit_client_instance+0xc4/0x118 [hnae3]
[15837.552052]  hnae3_unregister_client+0x16c/0x1f0 [hnae3]
[15837.557346]  hns_roce_hw_v2_exit+0x34/0x50 [hns_roce_hw_v2]
[15837.562895]  __arm64_sys_delete_module+0x208/0x268
[15837.567665]  el0_svc_common.constprop.4+0x110/0x200
[15837.572520]  do_el0_svc+0x34/0x98
[15837.575821]  el0_svc+0x14/0x40
[15837.578862]  el0_sync_handler+0xb0/0x2d0
[15837.582766]  el0_sync+0x140/0x180

It is caused by two concurrent processes:
	uninit_instance->dma_pool_destroy(cmdq)
	modify_qp->dma_pool_free(cmdq)
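
As a rough illustration of the locking scheme introduced below, here is a
userspace sketch with a pthread rwlock standing in for the kernel
rw_semaphore: mailbox users hold the lock for read from alloc until free,
and pool teardown takes it for write, so the pool cannot be destroyed while
a mailbox is still in flight. All names and the malloc-based "pool" are
stand-ins, not the driver's code.

  #include <pthread.h>
  #include <stdio.h>
  #include <stdlib.h>

  static pthread_rwlock_t mb_rwsem = PTHREAD_RWLOCK_INITIALIZER;
  static int pool_alive;			/* stand-in for hr_dev->cmd.pool */

  static void *alloc_mailbox(void)
  {
  	void *buf;

  	/* Read lock is held until free_mailbox(), like the patch's
  	 * down_read()/up_read() around the mailbox lifetime. */
  	pthread_rwlock_rdlock(&mb_rwsem);
  	if (!pool_alive) {
  		pthread_rwlock_unlock(&mb_rwsem);
  		return NULL;
  	}
  	buf = malloc(64);		/* dma_pool_alloc() stand-in */
  	if (!buf)
  		pthread_rwlock_unlock(&mb_rwsem);
  	return buf;
  }

  static void free_mailbox(void *buf)
  {
  	free(buf);			/* dma_pool_free() stand-in */
  	pthread_rwlock_unlock(&mb_rwsem);
  }

  static void destroy_pool(void)
  {
  	/* Writer waits for every in-flight mailbox to be freed,
  	 * like dma_pool_destroy() under down_write(). */
  	pthread_rwlock_wrlock(&mb_rwsem);
  	pool_alive = 0;
  	pthread_rwlock_unlock(&mb_rwsem);
  }

  int main(void)
  {
  	pool_alive = 1;
  	void *mb = alloc_mailbox();
  	if (mb)
  		free_mailbox(mb);
  	destroy_pool();
  	printf("pool destroyed only after all mailboxes were released\n");
  	return 0;
  }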

Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_cmd.c    | 12 ++++++++++++
 drivers/infiniband/hw/hns/hns_roce_device.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.c b/drivers/infiniband/hw/hns/hns_roce_cmd.c
index 8f68cc3..3dfb97a 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cmd.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.c
@@ -198,12 +198,16 @@ int hns_roce_cmd_init(struct hns_roce_dev *hr_dev)
 	if (!hr_dev->cmd.pool)
 		return -ENOMEM;

+	init_rwsem(&hr_dev->cmd.mb_rwsem);
+
 	return 0;
 }

 void hns_roce_cmd_cleanup(struct hns_roce_dev *hr_dev)
 {
+	down_write(&hr_dev->cmd.mb_rwsem);
 	dma_pool_destroy(hr_dev->cmd.pool);
+	up_write(&hr_dev->cmd.mb_rwsem);
 }

 int hns_roce_cmd_use_events(struct hns_roce_dev *hr_dev)
@@ -237,8 +241,10 @@ void hns_roce_cmd_use_polling(struct hns_roce_dev *hr_dev)
 {
 	struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;

+	down_write(&hr_dev->cmd.mb_rwsem);
 	kfree(hr_cmd->context);
 	hr_cmd->use_events = 0;
+	up_write(&hr_dev->cmd.mb_rwsem);

 	up(&hr_cmd->poll_sem);
 }
@@ -252,9 +258,12 @@ hns_roce_alloc_cmd_mailbox(struct hns_roce_dev *hr_dev)
 	if (!mailbox)
 		return ERR_PTR(-ENOMEM);

+	down_read(&hr_dev->cmd.mb_rwsem);
 	mailbox->buf =
 		dma_pool_alloc(hr_dev->cmd.pool, GFP_KERNEL, &mailbox->dma);
 	if (!mailbox->buf) {
+		up_read(&hr_dev->cmd.mb_rwsem);
+
 		kfree(mailbox);
 		return ERR_PTR(-ENOMEM);
 	}
@@ -269,5 +278,8 @@ void hns_roce_free_cmd_mailbox(struct hns_roce_dev *hr_dev,
 		return;

 	dma_pool_free(hr_dev->cmd.pool, mailbox->buf, mailbox->dma);
+
+	up_read(&hr_dev->cmd.mb_rwsem);
+
 	kfree(mailbox);
 }
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 0c3eb11..90d8ef8 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -571,6 +571,7 @@ struct hns_roce_cmdq {
 	 * close device, switch into poll mode(non event mode)
 	 */
 	u8			use_events;
+	struct rw_semaphore	mb_rwsem;
 };

 struct hns_roce_cmd_mailbox {
--
2.8.1



* Re: [PATCH for-next] RDMA/hns: Restore mr status when rereg mr fails
  2021-08-19  1:53 [PATCH for-next] RDMA/hns: Restore mr status when rereg mr fails Wenpeng Liang
  2021-08-19  1:53 ` [PATCH v2 for-next] RDMA/hns: Solve the problem that dma_pool is used during the reset Wenpeng Liang
@ 2021-08-19  2:16 ` Wenpeng Liang
  1 sibling, 0 replies; 4+ messages in thread
From: Wenpeng Liang @ 2021-08-19  2:16 UTC (permalink / raw)
  To: dledford, jgg; +Cc: linux-rdma, linuxarm

Please ignore this patch.
I mistakenly sent two unrelated patches together,
and I will send the two patches separately.


* Re: [PATCH v2 for-next] RDMA/hns: Solve the problem that dma_pool is used during the reset
  2021-08-19  1:53 ` [PATCH v2 for-next] RDMA/hns: Solve the problem that dma_pool is used during the reset Wenpeng Liang
@ 2021-08-19  2:16   ` Wenpeng Liang
  0 siblings, 0 replies; 4+ messages in thread
From: Wenpeng Liang @ 2021-08-19  2:16 UTC (permalink / raw)
  To: dledford, jgg; +Cc: linux-rdma, linuxarm

Please ignore this patch.
I mistakenly sent two unrelated patches together,
and I will send the two patches separately.

