[PATCH for-next 0/9] Bugfixes for 5.3-rc2

linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH for-next 0/9] Bugfixes for 5.3-rc2
@ 2019-08-09  9:40 Lijun Ou
  2019-08-09  9:40 ` [PATCH for-next 1/9] RDMA/hns: Logic optimization of wc_flags Lijun Ou
                   ` (9 more replies)
  0 siblings, 10 replies; 24+ messages in thread
From: Lijun Ou @ 2019-08-09  9:40 UTC (permalink / raw)
  To: dledford, jgg; +Cc: leon, linux-rdma, linuxarm

Here fixes some bugs for hip08

Lang Cheng (1):
  RDMA/hns: Remove unuseful member

Lijun Ou (2):
  RDMA/hns: Bugfix for creating qp attached to srq
  RDMA/hns: Copy some information of AV to user

Weihang Li (1):
  RDMA/hns: Logic optimization of wc_flags

Xi Wang (2):
  RDMA/hns: Bugfix for slab-out-of-bounds when unloading hip08 driver
  RDMA/hns: bugfix for slab-out-of-bounds when loading hip08 driver

Yangyang Li (3):
  RDMA/hns: Completely release qp resources when hw err
  RDMA/hns: Modify pi vlaue when cq overflows
  RDMA/hns: Kernel notify usr space to stop ring db

 drivers/infiniband/hw/hns/hns_roce_ah.c     | 22 ++++++--
 drivers/infiniband/hw/hns/hns_roce_cmd.c    |  1 -
 drivers/infiniband/hw/hns/hns_roce_device.h |  8 ++-
 drivers/infiniband/hw/hns/hns_roce_hem.c    | 19 ++++---
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c  | 78 +++++++++++++++++++++++------
 drivers/infiniband/hw/hns/hns_roce_hw_v2.h  |  4 ++
 drivers/infiniband/hw/hns/hns_roce_main.c   | 29 +++++++----
 drivers/infiniband/hw/hns/hns_roce_qp.c     | 25 ++++++---
 include/uapi/rdma/hns-abi.h                 |  7 +++
 9 files changed, 148 insertions(+), 45 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH for-next 1/9] RDMA/hns: Logic optimization of wc_flags
  2019-08-09  9:40 [PATCH for-next 0/9] Bugfixes for 5.3-rc2 Lijun Ou
@ 2019-08-09  9:40 ` Lijun Ou
  2019-08-09  9:40 ` [PATCH for-next 2/9] RDMA/hns: Bugfix for creating qp attached to srq Lijun Ou
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Lijun Ou @ 2019-08-09  9:40 UTC (permalink / raw)
  To: dledford, jgg; +Cc: leon, linux-rdma, linuxarm

From: Weihang Li <liweihang@hisilicon.com>

We should set IB_WC_WITH_VLAN only when VLAN is enabled.
In addition, move setting of IB_WC_WITH_SMAC below
setting of wc->smac.

Signed-off-by: Weihang Li <liweihang@hisilicon.com>
---
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index 87a1574..7a14f0b 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -2860,15 +2860,16 @@ static int hns_roce_v2_poll_one(struct hns_roce_cq *hr_cq,
 		wc->smac[5] = roce_get_field(cqe->byte_28,
 					     V2_CQE_BYTE_28_SMAC_5_M,
 					     V2_CQE_BYTE_28_SMAC_5_S);
+		wc->wc_flags |= IB_WC_WITH_SMAC;
 		if (roce_get_bit(cqe->byte_28, V2_CQE_BYTE_28_VID_VLD_S)) {
 			wc->vlan_id = (u16)roce_get_field(cqe->byte_28,
 							  V2_CQE_BYTE_28_VID_M,
 							  V2_CQE_BYTE_28_VID_S);
+			wc->wc_flags |= IB_WC_WITH_VLAN;
 		} else {
 			wc->vlan_id = 0xffff;
 		}
 
-		wc->wc_flags |= (IB_WC_WITH_VLAN | IB_WC_WITH_SMAC);
 		wc->network_hdr_type = roce_get_field(cqe->byte_28,
 						    V2_CQE_BYTE_28_PORT_TYPE_M,
 						    V2_CQE_BYTE_28_PORT_TYPE_S);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH for-next 2/9] RDMA/hns: Bugfix for creating qp attached to srq
  2019-08-09  9:40 [PATCH for-next 0/9] Bugfixes for 5.3-rc2 Lijun Ou
  2019-08-09  9:40 ` [PATCH for-next 1/9] RDMA/hns: Logic optimization of wc_flags Lijun Ou
@ 2019-08-09  9:40 ` Lijun Ou
  2019-08-12 15:29   ` Doug Ledford
  2019-08-09  9:41 ` [PATCH for-next 3/9] RDMA/hns: Completely release qp resources when hw err Lijun Ou
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Lijun Ou @ 2019-08-09  9:40 UTC (permalink / raw)
  To: dledford, jgg; +Cc: leon, linux-rdma, linuxarm

When create a qp and attached to srq, rq will no longer be used
and the members of rq will be set zero. As a result, the wrid
of rq will not be allocated and used.

Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC")
Signed-off-by: Lijun Ou <oulijun@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_qp.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c
index b729f8e..5f23c09 100644
--- a/drivers/infiniband/hw/hns/hns_roce_qp.c
+++ b/drivers/infiniband/hw/hns/hns_roce_qp.c
@@ -853,12 +853,19 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev,
 		}
 
 		hr_qp->sq.wrid = kcalloc(hr_qp->sq.wqe_cnt, sizeof(u64),
-					 GFP_KERNEL);
-		hr_qp->rq.wrid = kcalloc(hr_qp->rq.wqe_cnt, sizeof(u64),
-					 GFP_KERNEL);
-		if (!hr_qp->sq.wrid || !hr_qp->rq.wrid) {
+					       GFP_KERNEL);
+		if (ZERO_OR_NULL_PTR(hr_qp->sq.wrid)) {
 			ret = -ENOMEM;
-			goto err_wrid;
+			goto err_get_bufs;
+		}
+
+		if (hr_qp->rq.wqe_cnt) {
+			hr_qp->rq.wrid = kcalloc(hr_qp->rq.wqe_cnt, sizeof(u64),
+						 GFP_KERNEL);
+			if (ZERO_OR_NULL_PTR(hr_qp->rq.wrid)) {
+				ret = -ENOMEM;
+				goto err_sq_wrid;
+			}
 		}
 	}
 
@@ -944,8 +951,8 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev,
 		    hns_roce_qp_has_rq(init_attr))
 			hns_roce_db_unmap_user(uctx, &hr_qp->rdb);
 	} else {
-		kfree(hr_qp->sq.wrid);
-		kfree(hr_qp->rq.wrid);
+		if (hr_qp->rq.wqe_cnt)
+			kfree(hr_qp->rq.wrid);
 	}
 
 err_sq_dbmap:
@@ -956,6 +963,10 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev,
 		    hns_roce_qp_has_sq(init_attr))
 			hns_roce_db_unmap_user(uctx, &hr_qp->sdb);
 
+err_sq_wrid:
+	if (!udata)
+		kfree(hr_qp->sq.wrid);
+
 err_get_bufs:
 	hns_roce_free_buf_list(buf_list, hr_qp->region_cnt);
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH for-next 3/9] RDMA/hns: Completely release qp resources when hw err
  2019-08-09  9:40 [PATCH for-next 0/9] Bugfixes for 5.3-rc2 Lijun Ou
  2019-08-09  9:40 ` [PATCH for-next 1/9] RDMA/hns: Logic optimization of wc_flags Lijun Ou
  2019-08-09  9:40 ` [PATCH for-next 2/9] RDMA/hns: Bugfix for creating qp attached to srq Lijun Ou
@ 2019-08-09  9:41 ` Lijun Ou
  2019-08-12 15:29   ` Doug Ledford
  2019-08-09  9:41 ` [PATCH for-next 4/9] RDMA/hns: Modify pi vlaue when cq overflows Lijun Ou
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Lijun Ou @ 2019-08-09  9:41 UTC (permalink / raw)
  To: dledford, jgg; +Cc: leon, linux-rdma, linuxarm

From: Yangyang Li <liyangyang20@huawei.com>

Even if no response from hardware, make sure that qp related
resources are completely released.

Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index 7a14f0b..0409851 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -4562,16 +4562,14 @@ static int hns_roce_v2_destroy_qp_common(struct hns_roce_dev *hr_dev,
 {
 	struct hns_roce_cq *send_cq, *recv_cq;
 	struct ib_device *ibdev = &hr_dev->ib_dev;
-	int ret;
+	int ret = 0;
 
 	if (hr_qp->ibqp.qp_type == IB_QPT_RC && hr_qp->state != IB_QPS_RESET) {
 		/* Modify qp to reset before destroying qp */
 		ret = hns_roce_v2_modify_qp(&hr_qp->ibqp, NULL, 0,
 					    hr_qp->state, IB_QPS_RESET);
-		if (ret) {
+		if (ret)
 			ibdev_err(ibdev, "modify QP to Reset failed.\n");
-			return ret;
-		}
 	}
 
 	send_cq = to_hr_cq(hr_qp->ibqp.send_cq);
@@ -4627,7 +4625,7 @@ static int hns_roce_v2_destroy_qp_common(struct hns_roce_dev *hr_dev,
 		kfree(hr_qp->rq_inl_buf.wqe_list);
 	}
 
-	return 0;
+	return ret;
 }
 
 static int hns_roce_v2_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
@@ -4637,11 +4635,9 @@ static int hns_roce_v2_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
 	int ret;
 
 	ret = hns_roce_v2_destroy_qp_common(hr_dev, hr_qp, udata);
-	if (ret) {
+	if (ret)
 		ibdev_err(&hr_dev->ib_dev, "Destroy qp 0x%06lx failed(%d)\n",
 			  hr_qp->qpn, ret);
-		return ret;
-	}
 
 	if (hr_qp->ibqp.qp_type == IB_QPT_GSI)
 		kfree(hr_to_hr_sqp(hr_qp));
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH for-next 4/9] RDMA/hns: Modify pi vlaue when cq overflows
  2019-08-09  9:40 [PATCH for-next 0/9] Bugfixes for 5.3-rc2 Lijun Ou
                   ` (2 preceding siblings ...)
  2019-08-09  9:41 ` [PATCH for-next 3/9] RDMA/hns: Completely release qp resources when hw err Lijun Ou
@ 2019-08-09  9:41 ` Lijun Ou
  2019-08-09  9:41 ` [PATCH for-next 5/9] RDMA/hns: Bugfix for slab-out-of-bounds when unloading hip08 driver Lijun Ou
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Lijun Ou @ 2019-08-09  9:41 UTC (permalink / raw)
  To: dledford, jgg; +Cc: leon, linux-rdma, linuxarm

From: Yangyang Li <liyangyang20@huawei.com>

When exiting "for loop", the actual value of pi will be
increased by 1, which is compatible with the next calculation.
But when pi is equal to "ci + hr_cq-> ib_cq.cqe", the "break"
was called and the pi is actual value, it will lead one cqe
still existing, so the "==" should be modify to ">".

Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index 0409851..c6b55a1 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -2407,7 +2407,7 @@ static void __hns_roce_v2_cq_clean(struct hns_roce_cq *hr_cq, u32 qpn,
 
 	for (prod_index = hr_cq->cons_index; get_sw_cqe_v2(hr_cq, prod_index);
 	     ++prod_index) {
-		if (prod_index == hr_cq->cons_index + hr_cq->ib_cq.cqe)
+		if (prod_index > hr_cq->cons_index + hr_cq->ib_cq.cqe)
 			break;
 	}
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH for-next 5/9] RDMA/hns: Bugfix for slab-out-of-bounds when unloading hip08 driver
  2019-08-09  9:40 [PATCH for-next 0/9] Bugfixes for 5.3-rc2 Lijun Ou
                   ` (3 preceding siblings ...)
  2019-08-09  9:41 ` [PATCH for-next 4/9] RDMA/hns: Modify pi vlaue when cq overflows Lijun Ou
@ 2019-08-09  9:41 ` Lijun Ou
  2019-08-09  9:41 ` [PATCH for-next 6/9] RDMA/hns: bugfix for slab-out-of-bounds when loading " Lijun Ou
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Lijun Ou @ 2019-08-09  9:41 UTC (permalink / raw)
  To: dledford, jgg; +Cc: leon, linux-rdma, linuxarm

From: Xi Wang <wangxi11@huawei.com>

kasan will report a BUG when run command 'rmmod hns_roce_hw_v2', the calltrace
is as follows:

==================================================================
BUG: KASAN: slab-out-of-bounds in hns_roce_table_mhop_put+0x584/0x828
[hns_roce]
Read of size 8 at addr ffff802185e08300 by task rmmod/270

Call trace:
dump_backtrace+0x0/0x1e8
show_stack+0x14/0x20
dump_stack+0xc4/0xfc
print_address_description+0x60/0x270
__kasan_report+0x164/0x1b8
kasan_report+0xc/0x18
__asan_load8+0x84/0xa8
hns_roce_table_mhop_put+0x584/0x828 [hns_roce]
hns_roce_table_put+0x174/0x1a0 [hns_roce]
hns_roce_mr_free+0x124/0x210 [hns_roce]
hns_roce_dereg_mr+0x90/0xb8 [hns_roce]
ib_dealloc_pd_user+0x60/0xf0
ib_mad_port_close+0x128/0x1d8
ib_mad_remove_device+0x94/0x118
remove_client_context+0xa0/0xe0
disable_device+0xfc/0x1c0
__ib_unregister_device+0x60/0xe0
ib_unregister_device+0x24/0x38
hns_roce_exit+0x3c/0x138 [hns_roce]
__hns_roce_hw_v2_uninit_instance.isra.30+0x28/0x50 [hns_roce_hw_v2]
hns_roce_hw_v2_uninit_instance+0x44/0x60 [hns_roce_hw_v2]
hclge_uninit_client_instance+0x15c/0x238 [hclge]
hnae3_uninit_client_instance+0x84/0xa8 [hnae3]
hnae3_unregister_client+0x84/0x158 [hnae3]
hns_roce_hw_v2_exit+0x14/0x20 [hns_roce_hw_v2]
__arm64_sys_delete_module+0x20c/0x308
el0_svc_handler+0xbc/0x210
el0_svc+0x8/0xc

Allocated by task 255:
__kasan_kmalloc.isra.0+0xd0/0x180
kasan_kmalloc+0xc/0x18
__kmalloc+0x16c/0x328
hns_roce_init_hem_table+0x20c/0x428 [hns_roce]
hns_roce_init+0x214/0xfe0 [hns_roce]
__hns_roce_hw_v2_init_instance+0x284/0x330 [hns_roce_hw_v2]
hns_roce_hw_v2_init_instance+0xd0/0x1b8 [hns_roce_hw_v2]
hclge_init_roce_client_instance+0x180/0x310 [hclge]
hclge_init_client_instance+0xcc/0x508 [hclge]
hnae3_init_client_instance.part.3+0x3c/0x80 [hnae3]
hnae3_register_client+0x134/0x1a8 [hnae3]
0xffff200009c00014
do_one_initcall+0x9c/0x3e0
do_init_module+0xd4/0x2d8
load_module+0x3284/0x3690
__se_sys_init_module+0x274/0x308
__arm64_sys_init_module+0x40/0x50
el0_svc_handler+0xbc/0x210
el0_svc+0x8/0xc

Freed by task 0:
(stack is not available)

The buggy address belongs to the object at ffff802185e06300
which belongs to the cache kmalloc-8k of size 8192
The buggy address is located 0 bytes to the right of
8192-byte region [ffff802185e06300, ffff802185e08300)
The buggy address belongs to the page:
page:ffff7fe008617800 refcount:1 mapcount:0 mapping:ffff802340020e00 index:0x0
compound_mapcount: 0
flags: 0x5fffe00000010200(slab|head)
raw: 5fffe00000010200 dead000000000100 dead000000000200 ffff802340020e00
raw: 0000000000000000 00000000803e003e 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff802185e08200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff802185e08280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff802185e08300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff802185e08380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff802185e08400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Disabling lock debugging due to kernel taint

Fixes: a25d13cbe816 ("RDMA/hns: Add the interfaces to support multi hop addressing for the contexts in hip08")

Signed-off-by: Xi Wang <wangxi11@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_hem.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_hem.c b/drivers/infiniband/hw/hns/hns_roce_hem.c
index 0268c7a..f2c4fef 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hem.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hem.c
@@ -85,12 +85,13 @@ bool hns_roce_check_whether_mhop(struct hns_roce_dev *hr_dev, u32 type)
 }
 
 static bool hns_roce_check_hem_null(struct hns_roce_hem **hem, u64 start_idx,
-			    u32 bt_chunk_num)
+			    u32 bt_chunk_num, u64 hem_max_num)
 {
-	int i;
+	u64 check_max_num = start_idx + bt_chunk_num;
+	u64 i;
 
-	for (i = 0; i < bt_chunk_num; i++)
-		if (hem[start_idx + i])
+	for (i = start_idx; (i < check_max_num) && (i < hem_max_num); i++)
+		if (hem[i])
 			return false;
 
 	return true;
@@ -496,6 +497,12 @@ static int hns_roce_table_mhop_get(struct hns_roce_dev *hr_dev,
 		return -EINVAL;
 	}
 
+	if (unlikely(hem_idx >= table->num_hem)) {
+		dev_err(dev, "Table %d exceed hem limt idx = %llu,max = %lu!\n",
+			     table->type, hem_idx, table->num_hem);
+		return -EINVAL;
+	}
+
 	mutex_lock(&table->mutex);
 
 	if (table->hem[hem_idx]) {
@@ -732,7 +739,7 @@ static void hns_roce_table_mhop_put(struct hns_roce_dev *hr_dev,
 	if (check_whether_bt_num_2(table->type, hop_num)) {
 		start_idx = mhop.l0_idx * chunk_ba_num;
 		if (hns_roce_check_hem_null(table->hem, start_idx,
-					    chunk_ba_num)) {
+					    chunk_ba_num, table->num_hem)) {
 			if (table->type < HEM_TYPE_MTT &&
 			    hr_dev->hw->clear_hem(hr_dev, table, obj, 0))
 				dev_warn(dev, "Clear HEM base address failed.\n");
@@ -746,7 +753,7 @@ static void hns_roce_table_mhop_put(struct hns_roce_dev *hr_dev,
 		start_idx = mhop.l0_idx * chunk_ba_num * chunk_ba_num +
 			    mhop.l1_idx * chunk_ba_num;
 		if (hns_roce_check_hem_null(table->hem, start_idx,
-					    chunk_ba_num)) {
+					    chunk_ba_num, table->num_hem)) {
 			if (hr_dev->hw->clear_hem(hr_dev, table, obj, 1))
 				dev_warn(dev, "Clear HEM base address failed.\n");
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH for-next 6/9] RDMA/hns: bugfix for slab-out-of-bounds when loading hip08 driver
  2019-08-09  9:40 [PATCH for-next 0/9] Bugfixes for 5.3-rc2 Lijun Ou
                   ` (4 preceding siblings ...)
  2019-08-09  9:41 ` [PATCH for-next 5/9] RDMA/hns: Bugfix for slab-out-of-bounds when unloading hip08 driver Lijun Ou
@ 2019-08-09  9:41 ` Lijun Ou
  2019-08-09  9:41 ` [PATCH for-next 7/9] RDMA/hns: Remove unuseful member Lijun Ou
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Lijun Ou @ 2019-08-09  9:41 UTC (permalink / raw)
  To: dledford, jgg; +Cc: leon, linux-rdma, linuxarm

From: Xi Wang <wangxi11@huawei.com>

kasan will report a BUG when run command 'insmod hns_roce_hw_v2.ko', the
calltrace is as follows:

==================================================================
BUG: KASAN: slab-out-of-bounds in hns_roce_v2_init_eq_table+0x1324/0x1948
[hns_roce_hw_v2]
Read of size 8 at addr ffff8020e7a10608 by task insmod/256

CPU: 0 PID: 256 Comm: insmod Tainted: G           O      5.2.0-rc4 #1
Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0
Call trace:
dump_backtrace+0x0/0x1e8
show_stack+0x14/0x20
dump_stack+0xc4/0xfc
print_address_description+0x60/0x270
__kasan_report+0x164/0x1b8
kasan_report+0xc/0x18
__asan_load8+0x84/0xa8
hns_roce_v2_init_eq_table+0x1324/0x1948 [hns_roce_hw_v2]
hns_roce_init+0xf8/0xfe0 [hns_roce]
__hns_roce_hw_v2_init_instance+0x284/0x330 [hns_roce_hw_v2]
hns_roce_hw_v2_init_instance+0xd0/0x1b8 [hns_roce_hw_v2]
hclge_init_roce_client_instance+0x180/0x310 [hclge]
hclge_init_client_instance+0xcc/0x508 [hclge]
hnae3_init_client_instance.part.3+0x3c/0x80 [hnae3]
hnae3_register_client+0x134/0x1a8 [hnae3]
hns_roce_hw_v2_init+0x14/0x10000 [hns_roce_hw_v2]
do_one_initcall+0x9c/0x3e0
do_init_module+0xd4/0x2d8
load_module+0x3284/0x3690
__se_sys_init_module+0x274/0x308
__arm64_sys_init_module+0x40/0x50
el0_svc_handler+0xbc/0x210
el0_svc+0x8/0xc

Allocated by task 256:
__kasan_kmalloc.isra.0+0xd0/0x180
kasan_kmalloc+0xc/0x18
__kmalloc+0x16c/0x328
hns_roce_v2_init_eq_table+0x764/0x1948 [hns_roce_hw_v2]
hns_roce_init+0xf8/0xfe0 [hns_roce]
__hns_roce_hw_v2_init_instance+0x284/0x330 [hns_roce_hw_v2]
hns_roce_hw_v2_init_instance+0xd0/0x1b8 [hns_roce_hw_v2]
hclge_init_roce_client_instance+0x180/0x310 [hclge]
hclge_init_client_instance+0xcc/0x508 [hclge]
hnae3_init_client_instance.part.3+0x3c/0x80 [hnae3]
hnae3_register_client+0x134/0x1a8 [hnae3]
hns_roce_hw_v2_init+0x14/0x10000 [hns_roce_hw_v2]
do_one_initcall+0x9c/0x3e0
do_init_module+0xd4/0x2d8
load_module+0x3284/0x3690
__se_sys_init_module+0x274/0x308
__arm64_sys_init_module+0x40/0x50
el0_svc_handler+0xbc/0x210
el0_svc+0x8/0xc

Freed by task 0:
(stack is not available)

The buggy address belongs to the object at ffff8020e7a10600
which belongs to the cache kmalloc-128 of size 128
The buggy address is located 8 bytes inside of
128-byte region [ffff8020e7a10600, ffff8020e7a10680)
The buggy address belongs to the page:
page:ffff7fe00839e840 refcount:1 mapcount:0 mapping:ffff802340020200 index:0x0
flags: 0x5fffe00000000200(slab)
raw: 5fffe00000000200 dead000000000100 dead000000000200 ffff802340020200
raw: 0000000000000000 0000000081000100 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff8020e7a10500: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
ffff8020e7a10580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8020e7a10600: 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff8020e7a10680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8020e7a10700: 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Disabling lock debugging due to kernel taint

Fixes: a5073d6054f7 ("RDMA/hns: Add eq support of hip08")

Signed-off-by: Xi Wang <wangxi11@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index c6b55a1..d33341e 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -5543,7 +5543,8 @@ static int hns_roce_mhop_alloc_eq(struct hns_roce_dev *hr_dev,
 				break;
 		}
 		eq->cur_eqe_ba = eq->buf_dma[0];
-		eq->nxt_eqe_ba = eq->buf_dma[1];
+		if (ba_num > 1)
+			eq->nxt_eqe_ba = eq->buf_dma[1];
 
 	} else if (mhop_num == 2) {
 		/* alloc L1 BT and buf */
@@ -5584,7 +5585,8 @@ static int hns_roce_mhop_alloc_eq(struct hns_roce_dev *hr_dev,
 				break;
 		}
 		eq->cur_eqe_ba = eq->buf_dma[0];
-		eq->nxt_eqe_ba = eq->buf_dma[1];
+		if (ba_num > 1)
+			eq->nxt_eqe_ba = eq->buf_dma[1];
 	}
 
 	eq->l0_last_num = i + 1;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH for-next 7/9] RDMA/hns: Remove unuseful member
  2019-08-09  9:40 [PATCH for-next 0/9] Bugfixes for 5.3-rc2 Lijun Ou
                   ` (5 preceding siblings ...)
  2019-08-09  9:41 ` [PATCH for-next 6/9] RDMA/hns: bugfix for slab-out-of-bounds when loading " Lijun Ou
@ 2019-08-09  9:41 ` Lijun Ou
  2019-08-09  9:41 ` [PATCH for-next 8/9] RDMA/hns: Kernel notify usr space to stop ring db Lijun Ou
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Lijun Ou @ 2019-08-09  9:41 UTC (permalink / raw)
  To: dledford, jgg; +Cc: leon, linux-rdma, linuxarm

From: Lang Cheng <chenglang@huawei.com>

For struct hns_roce_cmdq, toggle is unused and
delete it.

Signed-off-by: Lang Cheng <chenglang@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_cmd.c    | 1 -
 drivers/infiniband/hw/hns/hns_roce_device.h | 1 -
 2 files changed, 2 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.c b/drivers/infiniband/hw/hns/hns_roce_cmd.c
index 0cd09bf..ade26fa 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cmd.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.c
@@ -211,7 +211,6 @@ int hns_roce_cmd_init(struct hns_roce_dev *hr_dev)
 	mutex_init(&hr_dev->cmd.hcr_mutex);
 	sema_init(&hr_dev->cmd.poll_sem, 1);
 	hr_dev->cmd.use_events = 0;
-	hr_dev->cmd.toggle = 1;
 	hr_dev->cmd.max_cmds = CMD_MAX_NUM;
 	hr_dev->cmd.pool = dma_pool_create("hns_roce_cmd", dev,
 					   HNS_ROCE_MAILBOX_SIZE,
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 8c4b120..32465f5 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -617,7 +617,6 @@ struct hns_roce_cmdq {
 	 * close device, switch into poll mode(non event mode)
 	 */
 	u8			use_events;
-	u8			toggle;
 };
 
 struct hns_roce_cmd_mailbox {
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH for-next 8/9] RDMA/hns: Kernel notify usr space to stop ring db
  2019-08-09  9:40 [PATCH for-next 0/9] Bugfixes for 5.3-rc2 Lijun Ou
                   ` (6 preceding siblings ...)
  2019-08-09  9:41 ` [PATCH for-next 7/9] RDMA/hns: Remove unuseful member Lijun Ou
@ 2019-08-09  9:41 ` Lijun Ou
  2019-08-12  5:52   ` Leon Romanovsky
  2019-08-09  9:41 ` [PATCH for-next 9/9] RDMA/hns: Copy some information of AV to user Lijun Ou
  2019-08-13 16:34 ` [PATCH for-next 0/9] Bugfixes for 5.3-rc2 Doug Ledford
  9 siblings, 1 reply; 24+ messages in thread
From: Lijun Ou @ 2019-08-09  9:41 UTC (permalink / raw)
  To: dledford, jgg; +Cc: leon, linux-rdma, linuxarm

From: Yangyang Li <liyangyang20@huawei.com>

In the reset scenario, if the kernel receives the reset signal,
it needs to notify the user space to stop ring doorbell.

Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_device.h |  4 +++
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c  | 52 ++++++++++++++++++++++++++++-
 drivers/infiniband/hw/hns/hns_roce_hw_v2.h  |  4 +++
 drivers/infiniband/hw/hns/hns_roce_main.c   | 22 ++++++------
 4 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 32465f5..be65fce 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -268,6 +268,8 @@ enum {
 
 #define PAGE_ADDR_SHIFT				12
 
+#define HNS_ROCE_IS_RESETTING			1
+
 struct hns_roce_uar {
 	u64		pfn;
 	unsigned long	index;
@@ -1043,6 +1045,8 @@ struct hns_roce_dev {
 	u32			odb_offset;
 	dma_addr_t		tptr_dma_addr;	/* only for hw v1 */
 	u32			tptr_size;	/* only for hw v1 */
+	struct page		*reset_page; /* store reset state */
+	void			*reset_kaddr; /* addr of reset page */
 	const struct hns_roce_hw *hw;
 	void			*priv;
 	struct workqueue_struct *irq_workq;
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index d33341e..138e5a8 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -1867,17 +1867,49 @@ static void hns_roce_free_link_table(struct hns_roce_dev *hr_dev,
 			  link_tbl->table.map);
 }
 
+static int hns_roce_v2_get_reset_page(struct hns_roce_dev *hr_dev)
+{
+	hr_dev->reset_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!hr_dev->reset_page)
+		return -ENOMEM;
+
+	hr_dev->reset_kaddr = vmap(&hr_dev->reset_page, 1, VM_MAP, PAGE_KERNEL);
+	if (!hr_dev->reset_kaddr)
+		goto err_with_vmap;
+
+	return 0;
+
+err_with_vmap:
+	put_page(hr_dev->reset_page);
+	return -ENOMEM;
+}
+
+static void hns_roce_v2_put_reset_page(struct hns_roce_dev *hr_dev)
+{
+	vunmap(hr_dev->reset_kaddr);
+	hr_dev->reset_kaddr = NULL;
+	put_page(hr_dev->reset_page);
+	hr_dev->reset_page = NULL;
+}
+
 static int hns_roce_v2_init(struct hns_roce_dev *hr_dev)
 {
 	struct hns_roce_v2_priv *priv = hr_dev->priv;
 	int qpc_count, cqc_count;
 	int ret, i;
 
+	ret = hns_roce_v2_get_reset_page(hr_dev);
+	if (ret) {
+		dev_err(hr_dev->dev,
+			"reset state init failed, ret = %d.\n", ret);
+		return ret;
+	}
+
 	/* TSQ includes SQ doorbell and ack doorbell */
 	ret = hns_roce_init_link_table(hr_dev, TSQ_LINK_TABLE);
 	if (ret) {
 		dev_err(hr_dev->dev, "TSQ init failed, ret = %d.\n", ret);
-		return ret;
+		goto err_tsq_init_failed;
 	}
 
 	ret = hns_roce_init_link_table(hr_dev, TPQ_LINK_TABLE);
@@ -1923,6 +1955,9 @@ static int hns_roce_v2_init(struct hns_roce_dev *hr_dev)
 err_tpq_init_failed:
 	hns_roce_free_link_table(hr_dev, &priv->tsq);
 
+err_tsq_init_failed:
+	hns_roce_v2_put_reset_page(hr_dev);
+
 	return ret;
 }
 
@@ -1935,6 +1970,7 @@ static void hns_roce_v2_exit(struct hns_roce_dev *hr_dev)
 
 	hns_roce_free_link_table(hr_dev, &priv->tpq);
 	hns_roce_free_link_table(hr_dev, &priv->tsq);
+	hns_roce_v2_put_reset_page(hr_dev);
 }
 
 static int hns_roce_query_mbox_status(struct hns_roce_dev *hr_dev)
@@ -6434,6 +6470,19 @@ static void hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
 
 	handle->rinfo.instance_state = HNS_ROCE_STATE_NON_INIT;
 }
+
+static void hns_roce_v2_reset_notify_user(struct hns_roce_dev *hr_dev)
+{
+	struct hns_roce_v2_reset_state *state;
+
+	state = (struct hns_roce_v2_reset_state *) hr_dev->reset_kaddr;
+
+	state->reset_state = HNS_ROCE_IS_RESETTING;
+
+	/* Ensure reset state was flushed in memory */
+	wmb();
+}
+
 static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
 {
 	struct hns_roce_dev *hr_dev;
@@ -6454,6 +6503,7 @@ static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
 	hr_dev->is_reset = true;
 	hr_dev->active = false;
 	hr_dev->dis_db = true;
+	hns_roce_v2_reset_notify_user(hr_dev);
 
 	event.event = IB_EVENT_DEVICE_FATAL;
 	event.device = &hr_dev->ib_dev;
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
index 58931b5..392bc03 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
@@ -1622,6 +1622,10 @@ struct hns_roce_link_table_entry {
 #define HNS_ROCE_LINK_TABLE_NXT_PTR_S 20
 #define HNS_ROCE_LINK_TABLE_NXT_PTR_M GENMASK(31, 20)
 
+struct hns_roce_v2_reset_state {
+	u32 reset_state; /* stored to use in user space */
+};
+
 struct hns_roce_v2_priv {
 	struct hnae3_handle *handle;
 	struct hns_roce_v2_cmq cmq;
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index 1e4ba48..1bda7a5 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -360,18 +360,18 @@ static int hns_roce_mmap(struct ib_ucontext *context,
 					 PAGE_SIZE,
 					 pgprot_noncached(vma->vm_page_prot));
 
-	/* vm_pgoff: 1 -- TPTR */
+	/* vm_pgoff: 1 -- TPTR(hw v1), reset_page(hw v2) */
 	case 1:
-		if (!hr_dev->tptr_dma_addr || !hr_dev->tptr_size)
-			return -EINVAL;
-		/*
-		 * FIXME: using io_remap_pfn_range on the dma address returned
-		 * by dma_alloc_coherent is totally wrong.
-		 */
-		return rdma_user_mmap_io(context, vma,
-					 hr_dev->tptr_dma_addr >> PAGE_SHIFT,
-					 hr_dev->tptr_size,
-					 vma->vm_page_prot);
+		if (hr_dev->tptr_dma_addr && hr_dev->tptr_size)
+			return rdma_user_mmap_io(context, vma,
+					    hr_dev->tptr_dma_addr >> PAGE_SHIFT,
+					    hr_dev->tptr_size,
+					    vma->vm_page_prot);
+
+		if (hr_dev->reset_page)
+			return remap_pfn_range(vma, vma->vm_start,
+					  page_to_pfn(hr_dev->reset_page),
+					  PAGE_SIZE, vma->vm_page_prot);
 
 	default:
 		return -EINVAL;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH for-next 9/9] RDMA/hns: Copy some information of AV to user
  2019-08-09  9:40 [PATCH for-next 0/9] Bugfixes for 5.3-rc2 Lijun Ou
                   ` (7 preceding siblings ...)
  2019-08-09  9:41 ` [PATCH for-next 8/9] RDMA/hns: Kernel notify usr space to stop ring db Lijun Ou
@ 2019-08-09  9:41 ` Lijun Ou
  2019-10-21 17:23   ` Doug Ledford
  2019-08-13 16:34 ` [PATCH for-next 0/9] Bugfixes for 5.3-rc2 Doug Ledford
  9 siblings, 1 reply; 24+ messages in thread
From: Lijun Ou @ 2019-08-09  9:41 UTC (permalink / raw)
  To: dledford, jgg; +Cc: leon, linux-rdma, linuxarm

When the driver support UD transport in user mode, it needs to
create the Address Handle(AH) and transfer Address Vector to
The hardware. The Address Vector includes the destination mac
and vlan inforation and it will be generated from the kernel
driver. As a result, we can copy this information to user
through ib_copy_to_udata function.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
---
 drivers/infiniband/hw/hns/hns_roce_ah.c     | 22 ++++++++++++++++++----
 drivers/infiniband/hw/hns/hns_roce_device.h |  3 ++-
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c  |  3 ++-
 drivers/infiniband/hw/hns/hns_roce_main.c   |  7 +++++++
 include/uapi/rdma/hns-abi.h                 |  7 +++++++
 5 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_ah.c b/drivers/infiniband/hw/hns/hns_roce_ah.c
index cdd2ac2..eee53be 100644
--- a/drivers/infiniband/hw/hns/hns_roce_ah.c
+++ b/drivers/infiniband/hw/hns/hns_roce_ah.c
@@ -31,6 +31,7 @@
  */
 
 #include <linux/platform_device.h>
+#include <rdma/hns-abi.h>
 #include <rdma/ib_addr.h>
 #include <rdma/ib_cache.h>
 #include "hns_roce_device.h"
@@ -43,13 +44,14 @@ int hns_roce_create_ah(struct ib_ah *ibah, struct rdma_ah_attr *ah_attr,
 		       u32 flags, struct ib_udata *udata)
 {
 	struct hns_roce_dev *hr_dev = to_hr_dev(ibah->device);
+	struct hns_roce_ib_create_ah_resp resp = {};
 	const struct ib_gid_attr *gid_attr;
 	struct device *dev = hr_dev->dev;
 	struct hns_roce_ah *ah = to_hr_ah(ibah);
 	u16 vlan_tag = 0xffff;
 	const struct ib_global_route *grh = rdma_ah_read_grh(ah_attr);
-	bool vlan_en = false;
-	int ret;
+	u8 vlan_en = 0;
+	int ret = 0;
 
 	gid_attr = ah_attr->grh.sgid_attr;
 	ret = rdma_read_gid_l2_fields(gid_attr, &vlan_tag, NULL);
@@ -60,7 +62,7 @@ int hns_roce_create_ah(struct ib_ah *ibah, struct rdma_ah_attr *ah_attr,
 	memcpy(ah->av.mac, ah_attr->roce.dmac, ETH_ALEN);
 
 	if (vlan_tag < VLAN_CFI_MASK) {
-		vlan_en = true;
+		vlan_en = 1;
 		vlan_tag |= (rdma_ah_get_sl(ah_attr) &
 			     HNS_ROCE_VLAN_SL_BIT_MASK) <<
 			     HNS_ROCE_VLAN_SL_SHIFT;
@@ -80,7 +82,19 @@ int hns_roce_create_ah(struct ib_ah *ibah, struct rdma_ah_attr *ah_attr,
 
 	memcpy(ah->av.dgid, grh->dgid.raw, HNS_ROCE_GID_SIZE);
 	ah->av.sl_tclass_flowlabel = cpu_to_le32(rdma_ah_get_sl(ah_attr) <<
-						 HNS_ROCE_SL_SHIFT);
+				i		 HNS_ROCE_SL_SHIFT);
+
+       if (udata) {
+               memcpy(resp.dmac, ah_attr->roce.dmac, ETH_ALEN);
+               resp.vlan = vlan_tag;
+               resp.vlan_en = vlan_en;
+               ret = ib_copy_to_udata(udata, &resp,
+                                      min(udata->outlen, sizeof(resp)));
+               if (ret) {
+                       kfree(ah);
+                       return ret;
+               }
+       }
 
 	return 0;
 }
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index be65fce..e703912 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -217,6 +217,7 @@ enum {
 	HNS_ROCE_CAP_FLAG_FRMR                  = BIT(8),
 	HNS_ROCE_CAP_FLAG_QP_FLOW_CTRL		= BIT(9),
 	HNS_ROCE_CAP_FLAG_ATOMIC		= BIT(10),
+	HNS_ROCE_CAP_FLAG_UD			= BIT(11),
 };
 
 enum hns_roce_mtt_type {
@@ -578,7 +579,7 @@ struct hns_roce_av {
 	u8          dgid[HNS_ROCE_GID_SIZE];
 	u8          mac[ETH_ALEN];
 	__le16      vlan;
-	bool	    vlan_en;
+	u8	    vlan_en;
 };
 
 struct hns_roce_ah {
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index 138e5a8..71166e7 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -1653,7 +1653,8 @@ static int hns_roce_v2_profile(struct hns_roce_dev *hr_dev)
 	if (hr_dev->pci_dev->revision == 0x21) {
 		caps->flags |= HNS_ROCE_CAP_FLAG_ATOMIC |
 			       HNS_ROCE_CAP_FLAG_SRQ |
-			       HNS_ROCE_CAP_FLAG_QP_FLOW_CTRL;
+			       HNS_ROCE_CAP_FLAG_QP_FLOW_CTRL |
+			       HNS_ROCE_CAP_FLAG_UD;
 
 		caps->num_qpc_timer	  = HNS_ROCE_V2_MAX_QPC_TIMER_NUM;
 		caps->qpc_timer_entry_sz  = HNS_ROCE_V2_QPC_TIMER_ENTRY_SZ;
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index 1bda7a5..ba9594c 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -550,6 +550,13 @@ static int hns_roce_register_device(struct hns_roce_dev *hr_dev)
 		if (ret)
 			return ret;
 	}
+
+	if (hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_UD) {
+		ib_dev->uverbs_cmd_mask |=
+					(1ULL << IB_USER_VERBS_CMD_CREATE_AH) |
+					(1ULL << IB_USER_VERBS_CMD_DESTROY_AH);
+	}
+
 	ret = ib_register_device(ib_dev, "hns_%d");
 	if (ret) {
 		dev_err(dev, "ib_register_device failed!\n");
diff --git a/include/uapi/rdma/hns-abi.h b/include/uapi/rdma/hns-abi.h
index eb76b38..6a2d994 100644
--- a/include/uapi/rdma/hns-abi.h
+++ b/include/uapi/rdma/hns-abi.h
@@ -80,4 +80,11 @@ struct hns_roce_ib_alloc_pd_resp {
 	__u32 pdn;
 };
 
+struct hns_roce_ib_create_ah_resp {
+	__u8	dmac[6];
+	__u16	vlan;
+	__u8	vlan_en;
+	__u8	reserved[7];
+};
+
 #endif /* HNS_ABI_USER_H */
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH for-next 8/9] RDMA/hns: Kernel notify usr space to stop ring db
  2019-08-09  9:41 ` [PATCH for-next 8/9] RDMA/hns: Kernel notify usr space to stop ring db Lijun Ou
@ 2019-08-12  5:52   ` Leon Romanovsky
  2019-08-12 13:14     ` Jason Gunthorpe
  0 siblings, 1 reply; 24+ messages in thread
From: Leon Romanovsky @ 2019-08-12  5:52 UTC (permalink / raw)
  To: Lijun Ou; +Cc: dledford, jgg, linux-rdma, linuxarm

On Fri, Aug 09, 2019 at 05:41:05PM +0800, Lijun Ou wrote:
> From: Yangyang Li <liyangyang20@huawei.com>
>
> In the reset scenario, if the kernel receives the reset signal,
> it needs to notify the user space to stop ring doorbell.

I doubt about it, it is racy like hell and relies on assumption that
userspace will honor such request to stop.

>
> Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
> ---
>  drivers/infiniband/hw/hns/hns_roce_device.h |  4 +++
>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c  | 52 ++++++++++++++++++++++++++++-
>  drivers/infiniband/hw/hns/hns_roce_hw_v2.h  |  4 +++
>  drivers/infiniband/hw/hns/hns_roce_main.c   | 22 ++++++------
>  4 files changed, 70 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
> index 32465f5..be65fce 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_device.h
> +++ b/drivers/infiniband/hw/hns/hns_roce_device.h
> @@ -268,6 +268,8 @@ enum {
>
>  #define PAGE_ADDR_SHIFT				12
>
> +#define HNS_ROCE_IS_RESETTING			1
> +
>  struct hns_roce_uar {
>  	u64		pfn;
>  	unsigned long	index;
> @@ -1043,6 +1045,8 @@ struct hns_roce_dev {
>  	u32			odb_offset;
>  	dma_addr_t		tptr_dma_addr;	/* only for hw v1 */
>  	u32			tptr_size;	/* only for hw v1 */
> +	struct page		*reset_page; /* store reset state */
> +	void			*reset_kaddr; /* addr of reset page */
>  	const struct hns_roce_hw *hw;
>  	void			*priv;
>  	struct workqueue_struct *irq_workq;
> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> index d33341e..138e5a8 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> @@ -1867,17 +1867,49 @@ static void hns_roce_free_link_table(struct hns_roce_dev *hr_dev,
>  			  link_tbl->table.map);
>  }
>
> +static int hns_roce_v2_get_reset_page(struct hns_roce_dev *hr_dev)
> +{
> +	hr_dev->reset_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> +	if (!hr_dev->reset_page)
> +		return -ENOMEM;
> +
> +	hr_dev->reset_kaddr = vmap(&hr_dev->reset_page, 1, VM_MAP, PAGE_KERNEL);
> +	if (!hr_dev->reset_kaddr)
> +		goto err_with_vmap;
> +
> +	return 0;
> +
> +err_with_vmap:
> +	put_page(hr_dev->reset_page);

Are you sure that pages allocated with alloc_page() are released with put_page()?

I would say with high confidence that whole this patch is wrong.

Thanks

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-next 8/9] RDMA/hns: Kernel notify usr space to stop ring db
  2019-08-12  5:52   ` Leon Romanovsky
@ 2019-08-12 13:14     ` Jason Gunthorpe
  2019-08-14  5:54       ` Yangyang Li
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2019-08-12 13:14 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Lijun Ou, dledford, linux-rdma, linuxarm

On Mon, Aug 12, 2019 at 08:52:20AM +0300, Leon Romanovsky wrote:
> On Fri, Aug 09, 2019 at 05:41:05PM +0800, Lijun Ou wrote:
> > From: Yangyang Li <liyangyang20@huawei.com>
> >
> > In the reset scenario, if the kernel receives the reset signal,
> > it needs to notify the user space to stop ring doorbell.
> 
> I doubt about it, it is racy like hell and relies on assumption that
> userspace will honor such request to stop.

Sounds like this is the device unplug flow we already have support
for, use the APIs to drop the VMA refering to the doorbell

> > Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
> >  drivers/infiniband/hw/hns/hns_roce_device.h |  4 +++
> >  drivers/infiniband/hw/hns/hns_roce_hw_v2.c  | 52 ++++++++++++++++++++++++++++-
> >  drivers/infiniband/hw/hns/hns_roce_hw_v2.h  |  4 +++
> >  drivers/infiniband/hw/hns/hns_roce_main.c   | 22 ++++++------
> >  4 files changed, 70 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
> > index 32465f5..be65fce 100644
> > +++ b/drivers/infiniband/hw/hns/hns_roce_device.h
> > @@ -268,6 +268,8 @@ enum {
> >
> >  #define PAGE_ADDR_SHIFT				12
> >
> > +#define HNS_ROCE_IS_RESETTING			1
> > +
> >  struct hns_roce_uar {
> >  	u64		pfn;
> >  	unsigned long	index;
> > @@ -1043,6 +1045,8 @@ struct hns_roce_dev {
> >  	u32			odb_offset;
> >  	dma_addr_t		tptr_dma_addr;	/* only for hw v1 */
> >  	u32			tptr_size;	/* only for hw v1 */
> > +	struct page		*reset_page; /* store reset state */
> > +	void			*reset_kaddr; /* addr of reset page */
> >  	const struct hns_roce_hw *hw;
> >  	void			*priv;
> >  	struct workqueue_struct *irq_workq;
> > diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> > index d33341e..138e5a8 100644
> > +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> > @@ -1867,17 +1867,49 @@ static void hns_roce_free_link_table(struct hns_roce_dev *hr_dev,
> >  			  link_tbl->table.map);
> >  }
> >
> > +static int hns_roce_v2_get_reset_page(struct hns_roce_dev *hr_dev)
> > +{
> > +	hr_dev->reset_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> > +	if (!hr_dev->reset_page)
> > +		return -ENOMEM;
> > +
> > +	hr_dev->reset_kaddr = vmap(&hr_dev->reset_page, 1, VM_MAP, PAGE_KERNEL);
> > +	if (!hr_dev->reset_kaddr)
> > +		goto err_with_vmap;

Yes, this vmap is nonsense too, get_zeroed_page() is the right API

Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-next 3/9] RDMA/hns: Completely release qp resources when hw err
  2019-08-09  9:41 ` [PATCH for-next 3/9] RDMA/hns: Completely release qp resources when hw err Lijun Ou
@ 2019-08-12 15:29   ` Doug Ledford
  2019-08-14  6:02     ` Yangyang Li
  0 siblings, 1 reply; 24+ messages in thread
From: Doug Ledford @ 2019-08-12 15:29 UTC (permalink / raw)
  To: Lijun Ou, jgg; +Cc: leon, linux-rdma, linuxarm

[-- Attachment #1: Type: text/plain, Size: 2910 bytes --]

On Fri, 2019-08-09 at 17:41 +0800, Lijun Ou wrote:
> From: Yangyang Li <liyangyang20@huawei.com>
> 
> Even if no response from hardware, make sure that qp related
> resources are completely released.
> 
> Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
> ---
>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 12 ++++--------
>  1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> index 7a14f0b..0409851 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> @@ -4562,16 +4562,14 @@ static int
> hns_roce_v2_destroy_qp_common(struct hns_roce_dev *hr_dev,
>  {
>  	struct hns_roce_cq *send_cq, *recv_cq;
>  	struct ib_device *ibdev = &hr_dev->ib_dev;
> -	int ret;
> +	int ret = 0;
>  
>  	if (hr_qp->ibqp.qp_type == IB_QPT_RC && hr_qp->state !=
> IB_QPS_RESET) {
>  		/* Modify qp to reset before destroying qp */
>  		ret = hns_roce_v2_modify_qp(&hr_qp->ibqp, NULL, 0,
>  					    hr_qp->state, IB_QPS_RESET);
> -		if (ret) {
> +		if (ret)
>  			ibdev_err(ibdev, "modify QP to Reset
> failed.\n");
> -			return ret;
> -		}
>  	}
>  
>  	send_cq = to_hr_cq(hr_qp->ibqp.send_cq);
> @@ -4627,7 +4625,7 @@ static int hns_roce_v2_destroy_qp_common(struct
> hns_roce_dev *hr_dev,
>  		kfree(hr_qp->rq_inl_buf.wqe_list);
>  	}
>  
> -	return 0;
> +	return ret;
>  }
>  
>  static int hns_roce_v2_destroy_qp(struct ib_qp *ibqp, struct ib_udata
> *udata)
> @@ -4637,11 +4635,9 @@ static int hns_roce_v2_destroy_qp(struct ib_qp
> *ibqp, struct ib_udata *udata)
>  	int ret;
>  
>  	ret = hns_roce_v2_destroy_qp_common(hr_dev, hr_qp, udata);
> -	if (ret) {
> +	if (ret)
>  		ibdev_err(&hr_dev->ib_dev, "Destroy qp 0x%06lx
> failed(%d)\n",
>  			  hr_qp->qpn, ret);
> -		return ret;
> -	}
>  
>  	if (hr_qp->ibqp.qp_type == IB_QPT_GSI)
>  		kfree(hr_to_hr_sqp(hr_qp));

I don't know your hardware, but this patch sounds wrong/dangerous to me.
As long as the resources this card might access are allocated by the
kernel, you can't get random data corruption by the card writing to
memory used elsewhere in the kernel.  So if your card is not responding
to your requests to free the resources, it would seem safer to leak
those resources permanently than to free them and risk the card coming
back to life long enough to corrupt memory reallocated to some other
task.

Only if you can guarantee me that there is no way your commands to the
card will fail and then the card start working again later would I
consider this patch safe.  And if it's possible for the card to hang
like this, should that be triggering a reset of the device?

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-next 2/9] RDMA/hns: Bugfix for creating qp attached to srq
  2019-08-09  9:40 ` [PATCH for-next 2/9] RDMA/hns: Bugfix for creating qp attached to srq Lijun Ou
@ 2019-08-12 15:29   ` Doug Ledford
  0 siblings, 0 replies; 24+ messages in thread
From: Doug Ledford @ 2019-08-12 15:29 UTC (permalink / raw)
  To: Lijun Ou, jgg; +Cc: leon, linux-rdma, linuxarm

[-- Attachment #1: Type: text/plain, Size: 618 bytes --]

On Fri, 2019-08-09 at 17:40 +0800, Lijun Ou wrote:
>                 hr_qp->sq.wrid = kcalloc(hr_qp->sq.wqe_cnt,
> sizeof(u64),
> -                                        GFP_KERNEL);
> -               hr_qp->rq.wrid = kcalloc(hr_qp->rq.wqe_cnt,
> sizeof(u64),
> -                                        GFP_KERNEL);
> -               if (!hr_qp->sq.wrid || !hr_qp->rq.wrid) {
> +                                              GFP_KERNEL);

Whitespace breakage.

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-next 0/9] Bugfixes for 5.3-rc2
  2019-08-09  9:40 [PATCH for-next 0/9] Bugfixes for 5.3-rc2 Lijun Ou
                   ` (8 preceding siblings ...)
  2019-08-09  9:41 ` [PATCH for-next 9/9] RDMA/hns: Copy some information of AV to user Lijun Ou
@ 2019-08-13 16:34 ` Doug Ledford
  2019-08-24  6:23   ` oulijun
  9 siblings, 1 reply; 24+ messages in thread
From: Doug Ledford @ 2019-08-13 16:34 UTC (permalink / raw)
  To: Lijun Ou, jgg; +Cc: leon, linux-rdma, linuxarm

[-- Attachment #1: Type: text/plain, Size: 1643 bytes --]

On Fri, 2019-08-09 at 17:40 +0800, Lijun Ou wrote:
> Here fixes some bugs for hip08
> 
> Lang Cheng (1):
>   RDMA/hns: Remove unuseful member
> 
> Lijun Ou (2):
>   RDMA/hns: Bugfix for creating qp attached to srq
>   RDMA/hns: Copy some information of AV to user
> 
> Weihang Li (1):
>   RDMA/hns: Logic optimization of wc_flags
> 
> Xi Wang (2):
>   RDMA/hns: Bugfix for slab-out-of-bounds when unloading hip08 driver
>   RDMA/hns: bugfix for slab-out-of-bounds when loading hip08 driver
> 
> Yangyang Li (3):
>   RDMA/hns: Completely release qp resources when hw err
>   RDMA/hns: Modify pi vlaue when cq overflows
>   RDMA/hns: Kernel notify usr space to stop ring db
> 
>  drivers/infiniband/hw/hns/hns_roce_ah.c     | 22 ++++++--
>  drivers/infiniband/hw/hns/hns_roce_cmd.c    |  1 -
>  drivers/infiniband/hw/hns/hns_roce_device.h |  8 ++-
>  drivers/infiniband/hw/hns/hns_roce_hem.c    | 19 ++++---
>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c  | 78
> +++++++++++++++++++++++------
>  drivers/infiniband/hw/hns/hns_roce_hw_v2.h  |  4 ++
>  drivers/infiniband/hw/hns/hns_roce_main.c   | 29 +++++++----
>  drivers/infiniband/hw/hns/hns_roce_qp.c     | 25 ++++++---
>  include/uapi/rdma/hns-abi.h                 |  7 +++
>  9 files changed, 148 insertions(+), 45 deletions(-)
> 

Patches 1, 2, 4, 5, 6, and 7 applied to for-next.

I have concerns about patches 3, 8, and 9.  I'm skipping them until you
can address the comments I made on and off list.

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-next 8/9] RDMA/hns: Kernel notify usr space to stop ring db
  2019-08-12 13:14     ` Jason Gunthorpe
@ 2019-08-14  5:54       ` Yangyang Li
  0 siblings, 0 replies; 24+ messages in thread
From: Yangyang Li @ 2019-08-14  5:54 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky; +Cc: Lijun Ou, dledford, linux-rdma, linuxarm

Hi, Leon & Jason
Thanks a lot for your reply.

在 2019/8/12 21:14, Jason Gunthorpe 写道:
> On Mon, Aug 12, 2019 at 08:52:20AM +0300, Leon Romanovsky wrote:
>> On Fri, Aug 09, 2019 at 05:41:05PM +0800, Lijun Ou wrote:
>>> From: Yangyang Li <liyangyang20@huawei.com>
>>>
>>> In the reset scenario, if the kernel receives the reset signal,
>>> it needs to notify the user space to stop ring doorbell.
>>
>> I doubt about it, it is racy like hell and relies on assumption that
>> userspace will honor such request to stop.
> 
> Sounds like this is the device unplug flow we already have support
> for, use the APIs to drop the VMA refering to the doorbell

Thanks for the suggestion, I have found this new unplug API,
and  I am trying to use it in the driver..

> 
>>> Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
>>>  drivers/infiniband/hw/hns/hns_roce_device.h |  4 +++
>>>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c  | 52 ++++++++++++++++++++++++++++-
>>>  drivers/infiniband/hw/hns/hns_roce_hw_v2.h  |  4 +++
>>>  drivers/infiniband/hw/hns/hns_roce_main.c   | 22 ++++++------
>>>  4 files changed, 70 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
>>> index 32465f5..be65fce 100644
>>> +++ b/drivers/infiniband/hw/hns/hns_roce_device.h
>>> @@ -268,6 +268,8 @@ enum {
>>>
>>>  #define PAGE_ADDR_SHIFT				12
>>>
>>> +#define HNS_ROCE_IS_RESETTING			1
>>> +
>>>  struct hns_roce_uar {
>>>  	u64		pfn;
>>>  	unsigned long	index;
>>> @@ -1043,6 +1045,8 @@ struct hns_roce_dev {
>>>  	u32			odb_offset;
>>>  	dma_addr_t		tptr_dma_addr;	/* only for hw v1 */
>>>  	u32			tptr_size;	/* only for hw v1 */
>>> +	struct page		*reset_page; /* store reset state */
>>> +	void			*reset_kaddr; /* addr of reset page */
>>>  	const struct hns_roce_hw *hw;
>>>  	void			*priv;
>>>  	struct workqueue_struct *irq_workq;
>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>> index d33341e..138e5a8 100644
>>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>> @@ -1867,17 +1867,49 @@ static void hns_roce_free_link_table(struct hns_roce_dev *hr_dev,
>>>  			  link_tbl->table.map);
>>>  }
>>>
>>> +static int hns_roce_v2_get_reset_page(struct hns_roce_dev *hr_dev)
>>> +{
>>> +	hr_dev->reset_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>> +	if (!hr_dev->reset_page)
>>> +		return -ENOMEM;
>>> +
>>> +	hr_dev->reset_kaddr = vmap(&hr_dev->reset_page, 1, VM_MAP, PAGE_KERNEL);
>>> +	if (!hr_dev->reset_kaddr)
>>> +		goto err_with_vmap;
> 
> Yes, this vmap is nonsense too, get_zeroed_page() is the right API
> 
> Jason
> 
> 
Thanks for the suggestion, put_page is not the correct usage,
I will use the appropriate API to fix it in the next patch.

Thanks


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-next 3/9] RDMA/hns: Completely release qp resources when hw err
  2019-08-12 15:29   ` Doug Ledford
@ 2019-08-14  6:02     ` Yangyang Li
  2019-08-14 15:05       ` Doug Ledford
  0 siblings, 1 reply; 24+ messages in thread
From: Yangyang Li @ 2019-08-14  6:02 UTC (permalink / raw)
  To: Doug Ledford, Lijun Ou, jgg; +Cc: leon, linux-rdma, linuxarm

Hi, Doug
Thanks a lot for your reply.

在 2019/8/12 23:29, Doug Ledford 写道:
> On Fri, 2019-08-09 at 17:41 +0800, Lijun Ou wrote:
>> From: Yangyang Li <liyangyang20@huawei.com>
>>
>> Even if no response from hardware, make sure that qp related
>> resources are completely released.
>>
>> Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
>> ---
>>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 12 ++++--------
>>  1 file changed, 4 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> index 7a14f0b..0409851 100644
>> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> @@ -4562,16 +4562,14 @@ static int
>> hns_roce_v2_destroy_qp_common(struct hns_roce_dev *hr_dev,
>>  {
>>  	struct hns_roce_cq *send_cq, *recv_cq;
>>  	struct ib_device *ibdev = &hr_dev->ib_dev;
>> -	int ret;
>> +	int ret = 0;
>>  
>>  	if (hr_qp->ibqp.qp_type == IB_QPT_RC && hr_qp->state !=
>> IB_QPS_RESET) {
>>  		/* Modify qp to reset before destroying qp */
>>  		ret = hns_roce_v2_modify_qp(&hr_qp->ibqp, NULL, 0,
>>  					    hr_qp->state, IB_QPS_RESET);
>> -		if (ret) {
>> +		if (ret)
>>  			ibdev_err(ibdev, "modify QP to Reset
>> failed.\n");
>> -			return ret;
>> -		}
>>  	}
>>  
>>  	send_cq = to_hr_cq(hr_qp->ibqp.send_cq);
>> @@ -4627,7 +4625,7 @@ static int hns_roce_v2_destroy_qp_common(struct
>> hns_roce_dev *hr_dev,
>>  		kfree(hr_qp->rq_inl_buf.wqe_list);
>>  	}
>>  
>> -	return 0;
>> +	return ret;
>>  }
>>  
>>  static int hns_roce_v2_destroy_qp(struct ib_qp *ibqp, struct ib_udata
>> *udata)
>> @@ -4637,11 +4635,9 @@ static int hns_roce_v2_destroy_qp(struct ib_qp
>> *ibqp, struct ib_udata *udata)
>>  	int ret;
>>  
>>  	ret = hns_roce_v2_destroy_qp_common(hr_dev, hr_qp, udata);
>> -	if (ret) {
>> +	if (ret)
>>  		ibdev_err(&hr_dev->ib_dev, "Destroy qp 0x%06lx
>> failed(%d)\n",
>>  			  hr_qp->qpn, ret);
>> -		return ret;
>> -	}
>>  
>>  	if (hr_qp->ibqp.qp_type == IB_QPT_GSI)
>>  		kfree(hr_to_hr_sqp(hr_qp));
> 
> I don't know your hardware, but this patch sounds wrong/dangerous to me.
> As long as the resources this card might access are allocated by the
> kernel, you can't get random data corruption by the card writing to
> memory used elsewhere in the kernel.  So if your card is not responding
> to your requests to free the resources, it would seem safer to leak
> those resources permanently than to free them and risk the card coming
> back to life long enough to corrupt memory reallocated to some other
> task.
> 
> Only if you can guarantee me that there is no way your commands to the
> card will fail and then the card start working again later would I
> consider this patch safe.  And if it's possible for the card to hang
> like this, should that be triggering a reset of the device?
> 

Thanks for your suggestion, I agree with you, it would seem safer to leak
those resources permanently than to free them. I will abandon this change
and consider cleaning up these leaked resources during uninstallation or reset.

Thanks


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-next 3/9] RDMA/hns: Completely release qp resources when hw err
  2019-08-14  6:02     ` Yangyang Li
@ 2019-08-14 15:05       ` Doug Ledford
  2019-08-14 18:47         ` Leon Romanovsky
  0 siblings, 1 reply; 24+ messages in thread
From: Doug Ledford @ 2019-08-14 15:05 UTC (permalink / raw)
  To: Yangyang Li, Lijun Ou, jgg; +Cc: leon, linux-rdma, linuxarm

[-- Attachment #1: Type: text/plain, Size: 1352 bytes --]

On Wed, 2019-08-14 at 14:02 +0800, Yangyang Li wrote:
> > I don't know your hardware, but this patch sounds wrong/dangerous to
> > me.
> > As long as the resources this card might access are allocated by the
> > kernel, you can't get random data corruption by the card writing to
> > memory used elsewhere in the kernel.  So if your card is not
> > responding
> > to your requests to free the resources, it would seem safer to leak
> > those resources permanently than to free them and risk the card
> > coming
> > back to life long enough to corrupt memory reallocated to some other
> > task.
> > 
> > Only if you can guarantee me that there is no way your commands to
> > the
> > card will fail and then the card start working again later would I
> > consider this patch safe.  And if it's possible for the card to hang
> > like this, should that be triggering a reset of the device?
> > 
> 
> Thanks for your suggestion, I agree with you, it would seem safer to
> leak
> those resources permanently than to free them. I will abandon this
> change
> and consider cleaning up these leaked resources during uninstallation
> or reset.

Ok, patch dropped from patchworks, thanks.

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-next 3/9] RDMA/hns: Completely release qp resources when hw err
  2019-08-14 15:05       ` Doug Ledford
@ 2019-08-14 18:47         ` Leon Romanovsky
  2019-08-19 17:39           ` Doug Ledford
  0 siblings, 1 reply; 24+ messages in thread
From: Leon Romanovsky @ 2019-08-14 18:47 UTC (permalink / raw)
  To: Doug Ledford; +Cc: Yangyang Li, Lijun Ou, jgg, linux-rdma, linuxarm

On Wed, Aug 14, 2019 at 11:05:04AM -0400, Doug Ledford wrote:
> On Wed, 2019-08-14 at 14:02 +0800, Yangyang Li wrote:
> > > I don't know your hardware, but this patch sounds wrong/dangerous to
> > > me.
> > > As long as the resources this card might access are allocated by the
> > > kernel, you can't get random data corruption by the card writing to
> > > memory used elsewhere in the kernel.  So if your card is not
> > > responding
> > > to your requests to free the resources, it would seem safer to leak
> > > those resources permanently than to free them and risk the card
> > > coming
> > > back to life long enough to corrupt memory reallocated to some other
> > > task.
> > >
> > > Only if you can guarantee me that there is no way your commands to
> > > the
> > > card will fail and then the card start working again later would I
> > > consider this patch safe.  And if it's possible for the card to hang
> > > like this, should that be triggering a reset of the device?
> > >
> >
> > Thanks for your suggestion, I agree with you, it would seem safer to
> > leak
> > those resources permanently than to free them. I will abandon this
> > change
> > and consider cleaning up these leaked resources during uninstallation
> > or reset.
>
> Ok, patch dropped from patchworks, thanks.

Sorry for being late, but I don't like the idea of having leaked memory.

All my allocation patches are actually trying to avoid such situation
by ensuring that no driver does crazy stuff like this. It means that
once I'll have time to work on QP allocations, I'll ensure that memory
is freed, so it is better to free such memory now.

Thanks

>
> --
> Doug Ledford <dledford@redhat.com>
>     GPG KeyID: B826A3330E572FDD
>     Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-next 3/9] RDMA/hns: Completely release qp resources when hw err
  2019-08-14 18:47         ` Leon Romanovsky
@ 2019-08-19 17:39           ` Doug Ledford
  2019-10-08  8:43             ` liweihang
  0 siblings, 1 reply; 24+ messages in thread
From: Doug Ledford @ 2019-08-19 17:39 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Yangyang Li, Lijun Ou, jgg, linux-rdma, linuxarm

[-- Attachment #1: Type: text/plain, Size: 2147 bytes --]

On Wed, 2019-08-14 at 21:47 +0300, Leon Romanovsky wrote:
> On Wed, Aug 14, 2019 at 11:05:04AM -0400, Doug Ledford wrote:
> > On Wed, 2019-08-14 at 14:02 +0800, Yangyang Li wrote:
> > > > I don't know your hardware, but this patch sounds
> > > > wrong/dangerous to
> > > > me.
> > > > As long as the resources this card might access are allocated by
> > > > the
> > > > kernel, you can't get random data corruption by the card writing
> > > > to
> > > > memory used elsewhere in the kernel.  So if your card is not
> > > > responding
> > > > to your requests to free the resources, it would seem safer to
> > > > leak
> > > > those resources permanently than to free them and risk the card
> > > > coming
> > > > back to life long enough to corrupt memory reallocated to some
> > > > other
> > > > task.
> > > > 
> > > > Only if you can guarantee me that there is no way your commands
> > > > to
> > > > the
> > > > card will fail and then the card start working again later would
> > > > I
> > > > consider this patch safe.  And if it's possible for the card to
> > > > hang
> > > > like this, should that be triggering a reset of the device?
> > > > 
> > > 
> > > Thanks for your suggestion, I agree with you, it would seem safer
> > > to
> > > leak
> > > those resources permanently than to free them. I will abandon this
> > > change
> > > and consider cleaning up these leaked resources during
> > > uninstallation
> > > or reset.
> > 
> > Ok, patch dropped from patchworks, thanks.
> 
> Sorry for being late, but I don't like the idea of having leaked
> memory.
> 
> All my allocation patches are actually trying to avoid such situation
> by ensuring that no driver does crazy stuff like this. It means that
> once I'll have time to work on QP allocations, I'll ensure that memory
> is freed, so it is better to free such memory now.

You can't free something if the card might still access it.  A leak is
always preferable to data corruption.

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-next 0/9] Bugfixes for 5.3-rc2
  2019-08-13 16:34 ` [PATCH for-next 0/9] Bugfixes for 5.3-rc2 Doug Ledford
@ 2019-08-24  6:23   ` oulijun
  0 siblings, 0 replies; 24+ messages in thread
From: oulijun @ 2019-08-24  6:23 UTC (permalink / raw)
  To: Doug Ledford, jgg; +Cc: leon, linux-rdma, linuxarm

在 2019/8/14 0:34, Doug Ledford 写道:
> On Fri, 2019-08-09 at 17:40 +0800, Lijun Ou wrote:
>> Here fixes some bugs for hip08
>>
>> Lang Cheng (1):
>>   RDMA/hns: Remove unuseful member
>>
>> Lijun Ou (2):
>>   RDMA/hns: Bugfix for creating qp attached to srq
>>   RDMA/hns: Copy some information of AV to user
>>
>> Weihang Li (1):
>>   RDMA/hns: Logic optimization of wc_flags
>>
>> Xi Wang (2):
>>   RDMA/hns: Bugfix for slab-out-of-bounds when unloading hip08 driver
>>   RDMA/hns: bugfix for slab-out-of-bounds when loading hip08 driver
>>
>> Yangyang Li (3):
>>   RDMA/hns: Completely release qp resources when hw err
>>   RDMA/hns: Modify pi vlaue when cq overflows
>>   RDMA/hns: Kernel notify usr space to stop ring db
>>
>>  drivers/infiniband/hw/hns/hns_roce_ah.c     | 22 ++++++--
>>  drivers/infiniband/hw/hns/hns_roce_cmd.c    |  1 -
>>  drivers/infiniband/hw/hns/hns_roce_device.h |  8 ++-
>>  drivers/infiniband/hw/hns/hns_roce_hem.c    | 19 ++++---
>>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c  | 78
>> +++++++++++++++++++++++------
>>  drivers/infiniband/hw/hns/hns_roce_hw_v2.h  |  4 ++
>>  drivers/infiniband/hw/hns/hns_roce_main.c   | 29 +++++++----
>>  drivers/infiniband/hw/hns/hns_roce_qp.c     | 25 ++++++---
>>  include/uapi/rdma/hns-abi.h                 |  7 +++
>>  9 files changed, 148 insertions(+), 45 deletions(-)
>>
> Patches 1, 2, 4, 5, 6, and 7 applied to for-next.
>
> I have concerns about patches 3, 8, and 9.  I'm skipping them until you
> can address the comments I made on and off list.
>
Thanks. We will talk about further analysis discussion for the comments before send the next patch.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH for-next 3/9] RDMA/hns: Completely release qp resources when hw err
  2019-08-19 17:39           ` Doug Ledford
@ 2019-10-08  8:43             ` liweihang
  0 siblings, 0 replies; 24+ messages in thread
From: liweihang @ 2019-10-08  8:43 UTC (permalink / raw)
  To: Doug Ledford, Leon Romanovsky; +Cc: linux-rdma, Linuxarm, jgg



> -----Original Message-----
> From: Linuxarm [mailto:linuxarm-bounces@huawei.com] On Behalf Of Doug
> Ledford
> Sent: Tuesday, August 20, 2019 1:40 AM
> To: Leon Romanovsky <leon@kernel.org>
> Cc: linux-rdma@vger.kernel.org; Linuxarm <linuxarm@huawei.com>;
> jgg@ziepe.ca
> Subject: Re: [PATCH for-next 3/9] RDMA/hns: Completely release qp
> resources when hw err
> 
> On Wed, 2019-08-14 at 21:47 +0300, Leon Romanovsky wrote:
> > On Wed, Aug 14, 2019 at 11:05:04AM -0400, Doug Ledford wrote:
> > > On Wed, 2019-08-14 at 14:02 +0800, Yangyang Li wrote:
> > > > > I don't know your hardware, but this patch sounds
> > > > > wrong/dangerous to me.
> > > > > As long as the resources this card might access are allocated by
> > > > > the kernel, you can't get random data corruption by the card
> > > > > writing to memory used elsewhere in the kernel.  So if your card
> > > > > is not responding to your requests to free the resources, it
> > > > > would seem safer to leak those resources permanently than to
> > > > > free them and risk the card coming back to life long enough to
> > > > > corrupt memory reallocated to some other task.
> > > > >
> > > > > Only if you can guarantee me that there is no way your commands
> > > > > to the card will fail and then the card start working again
> > > > > later would I consider this patch safe.  And if it's possible
> > > > > for the card to hang like this, should that be triggering a
> > > > > reset of the device?
> > > > >
> > > >
> > > > Thanks for your suggestion, I agree with you, it would seem safer
> > > > to leak those resources permanently than to free them. I will
> > > > abandon this change and consider cleaning up these leaked
> > > > resources during uninstallation or reset.
> > >
> > > Ok, patch dropped from patchworks, thanks.
> >
> > Sorry for being late, but I don't like the idea of having leaked
> > memory.
> >
> > All my allocation patches are actually trying to avoid such situation
> > by ensuring that no driver does crazy stuff like this. It means that
> > once I'll have time to work on QP allocations, I'll ensure that memory
> > is freed, so it is better to free such memory now.
> 
> You can't free something if the card might still access it.  A leak is always
> preferable to data corruption.
> 

Hi Doug,

We can confirm that hip08 hardware won't start working again when failed
to destroy after considering all possible situations. So it is safe to free qp
resources  even if the destroy command failed executed. I will resend this
patch to avoid memory leaks as Jason asked.

Thanks,
Weihang

> --
> Doug Ledford <dledford@redhat.com>
>     GPG KeyID: B826A3330E572FDD
>     Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-next 9/9] RDMA/hns: Copy some information of AV to user
  2019-08-09  9:41 ` [PATCH for-next 9/9] RDMA/hns: Copy some information of AV to user Lijun Ou
@ 2019-10-21 17:23   ` Doug Ledford
  2019-10-22  1:13     ` oulijun
  0 siblings, 1 reply; 24+ messages in thread
From: Doug Ledford @ 2019-10-21 17:23 UTC (permalink / raw)
  To: Lijun Ou, jgg; +Cc: leon, linux-rdma, linuxarm

[-- Attachment #1: Type: text/plain, Size: 1012 bytes --]

On Fri, 2019-08-09 at 17:41 +0800, Lijun Ou wrote:
> When the driver support UD transport in user mode, it needs to
> create the Address Handle(AH) and transfer Address Vector to
> The hardware. The Address Vector includes the destination mac
> and vlan inforation and it will be generated from the kernel
> driver. As a result, we can copy this information to user
> through ib_copy_to_udata function.
> 
> Signed-off-by: Lijun Ou <oulijun@huawei.com>

This patch is broken.  There are multiple instance of whitespace
breakage (spaces that should be tabs, etc.), and at least one instance
of a hanging character that makes no sense:

> 
>  	ah->av.sl_tclass_flowlabel = cpu_to_le32(rdma_ah_get_sl(ah_attr)
> <<
> -						 HNS_ROCE_SL_SHIFT);
> +				i		 HNS_ROCE_SL_SHIFT);
                                ^ ?
> 

This needs to be resubmitted.

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-next 9/9] RDMA/hns: Copy some information of AV to user
  2019-10-21 17:23   ` Doug Ledford
@ 2019-10-22  1:13     ` oulijun
  0 siblings, 0 replies; 24+ messages in thread
From: oulijun @ 2019-10-22  1:13 UTC (permalink / raw)
  To: Doug Ledford, jgg; +Cc: leon, linux-rdma, linuxarm

在 2019/10/22 1:23, Doug Ledford 写道:
> On Fri, 2019-08-09 at 17:41 +0800, Lijun Ou wrote:
>> When the driver support UD transport in user mode, it needs to
>> create the Address Handle(AH) and transfer Address Vector to
>> The hardware. The Address Vector includes the destination mac
>> and vlan inforation and it will be generated from the kernel
>> driver. As a result, we can copy this information to user
>> through ib_copy_to_udata function.
>>
>> Signed-off-by: Lijun Ou <oulijun@huawei.com>
> This patch is broken.  There are multiple instance of whitespace
> breakage (spaces that should be tabs, etc.), and at least one instance
> of a hanging character that makes no sense:
>
>>  	ah->av.sl_tclass_flowlabel = cpu_to_le32(rdma_ah_get_sl(ah_attr)
>> <<
>> -						 HNS_ROCE_SL_SHIFT);
>> +				i		 HNS_ROCE_SL_SHIFT);
>                                 ^ ?
> This needs to be resubmitted.
>
ok， I will do



^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2019-10-22  1:14 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-09  9:40 [PATCH for-next 0/9] Bugfixes for 5.3-rc2 Lijun Ou
2019-08-09  9:40 ` [PATCH for-next 1/9] RDMA/hns: Logic optimization of wc_flags Lijun Ou
2019-08-09  9:40 ` [PATCH for-next 2/9] RDMA/hns: Bugfix for creating qp attached to srq Lijun Ou
2019-08-12 15:29   ` Doug Ledford
2019-08-09  9:41 ` [PATCH for-next 3/9] RDMA/hns: Completely release qp resources when hw err Lijun Ou
2019-08-12 15:29   ` Doug Ledford
2019-08-14  6:02     ` Yangyang Li
2019-08-14 15:05       ` Doug Ledford
2019-08-14 18:47         ` Leon Romanovsky
2019-08-19 17:39           ` Doug Ledford
2019-10-08  8:43             ` liweihang
2019-08-09  9:41 ` [PATCH for-next 4/9] RDMA/hns: Modify pi vlaue when cq overflows Lijun Ou
2019-08-09  9:41 ` [PATCH for-next 5/9] RDMA/hns: Bugfix for slab-out-of-bounds when unloading hip08 driver Lijun Ou
2019-08-09  9:41 ` [PATCH for-next 6/9] RDMA/hns: bugfix for slab-out-of-bounds when loading " Lijun Ou
2019-08-09  9:41 ` [PATCH for-next 7/9] RDMA/hns: Remove unuseful member Lijun Ou
2019-08-09  9:41 ` [PATCH for-next 8/9] RDMA/hns: Kernel notify usr space to stop ring db Lijun Ou
2019-08-12  5:52   ` Leon Romanovsky
2019-08-12 13:14     ` Jason Gunthorpe
2019-08-14  5:54       ` Yangyang Li
2019-08-09  9:41 ` [PATCH for-next 9/9] RDMA/hns: Copy some information of AV to user Lijun Ou
2019-10-21 17:23   ` Doug Ledford
2019-10-22  1:13     ` oulijun
2019-08-13 16:34 ` [PATCH for-next 0/9] Bugfixes for 5.3-rc2 Doug Ledford
2019-08-24  6:23   ` oulijun

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).