From: Yangyang Li <liyangyang20@huawei.com>
To: Doug Ledford <dledford@redhat.com>, Lijun Ou <oulijun@huawei.com>,
<jgg@ziepe.ca>
Cc: <leon@kernel.org>, <linux-rdma@vger.kernel.org>, <linuxarm@huawei.com>
Subject: Re: [PATCH for-next 3/9] RDMA/hns: Completely release qp resources when hw err
Date: Wed, 14 Aug 2019 14:02:49 +0800
Message-ID: <0d325f78-a929-f088-cc29-e2c7af98fd40@huawei.com>
In-Reply-To: <f49c56933205d90d82ffd3fa55a951843e22cda1.camel@redhat.com>
Hi, Doug
Thanks a lot for your reply.
On 2019/8/12 23:29, Doug Ledford wrote:
> On Fri, 2019-08-09 at 17:41 +0800, Lijun Ou wrote:
>> From: Yangyang Li <liyangyang20@huawei.com>
>>
>> Even if no response from hardware, make sure that qp related
>> resources are completely released.
>>
>> Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
>> ---
>> drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 12 ++++--------
>> 1 file changed, 4 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> index 7a14f0b..0409851 100644
>> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> @@ -4562,16 +4562,14 @@ static int
>> hns_roce_v2_destroy_qp_common(struct hns_roce_dev *hr_dev,
>> {
>> struct hns_roce_cq *send_cq, *recv_cq;
>> struct ib_device *ibdev = &hr_dev->ib_dev;
>> - int ret;
>> + int ret = 0;
>>
>> if (hr_qp->ibqp.qp_type == IB_QPT_RC && hr_qp->state !=
>> IB_QPS_RESET) {
>> /* Modify qp to reset before destroying qp */
>> ret = hns_roce_v2_modify_qp(&hr_qp->ibqp, NULL, 0,
>> hr_qp->state, IB_QPS_RESET);
>> - if (ret) {
>> + if (ret)
>> ibdev_err(ibdev, "modify QP to Reset
>> failed.\n");
>> - return ret;
>> - }
>> }
>>
>> send_cq = to_hr_cq(hr_qp->ibqp.send_cq);
>> @@ -4627,7 +4625,7 @@ static int hns_roce_v2_destroy_qp_common(struct
>> hns_roce_dev *hr_dev,
>> kfree(hr_qp->rq_inl_buf.wqe_list);
>> }
>>
>> - return 0;
>> + return ret;
>> }
>>
>> static int hns_roce_v2_destroy_qp(struct ib_qp *ibqp, struct ib_udata
>> *udata)
>> @@ -4637,11 +4635,9 @@ static int hns_roce_v2_destroy_qp(struct ib_qp
>> *ibqp, struct ib_udata *udata)
>> int ret;
>>
>> ret = hns_roce_v2_destroy_qp_common(hr_dev, hr_qp, udata);
>> - if (ret) {
>> + if (ret)
>> ibdev_err(&hr_dev->ib_dev, "Destroy qp 0x%06lx
>> failed(%d)\n",
>> hr_qp->qpn, ret);
>> - return ret;
>> - }
>>
>> if (hr_qp->ibqp.qp_type == IB_QPT_GSI)
>> kfree(hr_to_hr_sqp(hr_qp));
>
> I don't know your hardware, but this patch sounds wrong/dangerous to me.
> As long as the resources this card might access are allocated by the
> kernel, you can't get random data corruption by the card writing to
> memory used elsewhere in the kernel. So if your card is not responding
> to your requests to free the resources, it would seem safer to leak
> those resources permanently than to free them and risk the card coming
> back to life long enough to corrupt memory reallocated to some other
> task.
>
> Only if you can guarantee me that there is no way your commands to the
> card will fail and then the card start working again later would I
> consider this patch safe. And if it's possible for the card to hang
> like this, should that be triggering a reset of the device?
>
Thanks for your suggestion. I agree that it is safer to leak those
resources permanently than to free them. I will drop this change and
consider cleaning up the leaked resources during driver uninstallation or reset.
Thanks
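
For readers following the thread, the approach agreed on above (leak the QP
memory when the hardware does not acknowledge the teardown, and reclaim it
only after a reset) might look roughly like the userspace sketch below. It is
not the hns driver code: every name in it (sketch_dev, sketch_qp,
sketch_destroy_qp, sketch_reclaim_after_reset) is hypothetical, the
"hardware" is a boolean flag, and it assumes that a completed device reset
guarantees the device can no longer touch the parked buffers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for driver objects; not the hns structures. */
struct sketch_qp {
	void *wqe_buf;            /* memory the device may still write to */
	struct sketch_qp *next;   /* link on the per-device leaked list */
};

struct sketch_dev {
	bool hw_responding;       /* simulates whether commands succeed */
	struct sketch_qp *leaked; /* QPs whose teardown was never acked */
	size_t nr_leaked;
};

/* Simulated "modify QP to RESET" command: 0 on success, -1 on failure. */
static int sketch_hw_reset_qp(struct sketch_dev *dev, struct sketch_qp *qp)
{
	(void)qp;
	return dev->hw_responding ? 0 : -1;
}

static void sketch_free_qp(struct sketch_qp *qp)
{
	free(qp->wqe_buf);
	free(qp);
}

struct sketch_qp *sketch_create_qp(size_t buf_size)
{
	struct sketch_qp *qp = calloc(1, sizeof(*qp));

	if (!qp)
		return NULL;
	qp->wqe_buf = malloc(buf_size);
	if (!qp->wqe_buf) {
		free(qp);
		return NULL;
	}
	return qp;
}

/*
 * Destroy a QP. If the device does not confirm the reset, the buffers are
 * deliberately NOT freed: they are parked on dev->leaked so the memory
 * cannot be reallocated while the device might still write to it.
 */
int sketch_destroy_qp(struct sketch_dev *dev, struct sketch_qp *qp)
{
	int ret = sketch_hw_reset_qp(dev, qp);

	if (ret) {
		qp->next = dev->leaked;
		dev->leaked = qp;
		dev->nr_leaked++;
		return ret;
	}
	sketch_free_qp(qp);
	return 0;
}

/*
 * Free everything parked on the leaked list. Only safe to call once the
 * device has been reset or removed (an assumption of this sketch).
 * Returns the number of QPs reclaimed.
 */
size_t sketch_reclaim_after_reset(struct sketch_dev *dev)
{
	size_t freed = 0;

	assert(dev != NULL);
	while (dev->leaked) {
		struct sketch_qp *qp = dev->leaked;

		dev->leaked = qp->next;
		sketch_free_qp(qp);
		freed++;
	}
	dev->nr_leaked = 0;
	return freed;
}
```

The point of the design is that a failed teardown keeps returning an error
to the caller, while ownership of the memory stays with the device object
until a reset makes it safe to release.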