From: "Liuyixian (Eason)" <liuyixian@huawei.com>
To: Leon Romanovsky <leon@kernel.org>
Cc: <dledford@redhat.com>, <jgg@ziepe.ca>,
	<linux-rdma@vger.kernel.org>, <linuxarm@huawei.com>
Subject: Re: [PATCH v4 for-next 1/2] RDMA/hns: Add the workqueue framework for flush cqe handler
Date: Fri, 27 Dec 2019 17:59:16 +0800
Message-ID: <a507777e-e1cb-1473-f0cf-35346b2706ee@huawei.com>
In-Reply-To: <20191226081923.GB6285@unreal>



On 2019/12/26 16:19, Leon Romanovsky wrote:
> On Tue, Dec 24, 2019 at 09:10:13PM +0800, Yixian Liu wrote:
>> HiP08 RoCE hardware lacks the ability (a known hardware problem) to
>> flush outstanding WQEs if the QP state gets into errored mode for some
>> reason. As a workaround, when the QP is detected to be in the errored
>> state during various legs such as post send, post receive, etc. [1],
>> the flush needs to be performed from the driver.
>>
>> The earlier patch [1], sent to solve the hardware limitation explained
>> in the cover letter, had a bug in the software flushing leg. It
>> acquired a mutex while modifying the QP state to the errored state and
>> while conveying it to the hardware using the mailbox. This caused the
>> leg to sleep while holding a spinlock, which caused a crash.
>>
>> Suggested solution:
>> We propose deferring the flushing of the QP in the errored state to
>> a workqueue to work around this limitation of our hardware.
>>
>> This patch adds the framework of the workqueue and the flush handler
>> function.
>>
>> [1] https://patchwork.kernel.org/patch/10534271/
>>
>> Signed-off-by: Yixian Liu <liuyixian@huawei.com>
>> Reviewed-by: Salil Mehta <salil.mehta@huawei.com>
>> ---
>>  drivers/infiniband/hw/hns/hns_roce_device.h |  2 ++
>>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c  |  4 +--
>>  drivers/infiniband/hw/hns/hns_roce_qp.c     | 43 +++++++++++++++++++++++++++++
>>  3 files changed, 47 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
>> index a1b712e..292b712 100644
>> --- a/drivers/infiniband/hw/hns/hns_roce_device.h
>> +++ b/drivers/infiniband/hw/hns/hns_roce_device.h
>> @@ -906,6 +906,7 @@ struct hns_roce_caps {
>>  struct hns_roce_work {
>>  	struct hns_roce_dev *hr_dev;
>>  	struct work_struct work;
>> +	struct hns_roce_qp *hr_qp;
>>  	u32 qpn;
>>  	u32 cqn;
>>  	int event_type;
>> @@ -1226,6 +1227,7 @@ struct ib_qp *hns_roce_create_qp(struct ib_pd *ib_pd,
>>  				 struct ib_udata *udata);
>>  int hns_roce_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
>>  		       int attr_mask, struct ib_udata *udata);
>> +void init_flush_work(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp);
>>  void *get_recv_wqe(struct hns_roce_qp *hr_qp, int n);
>>  void *get_send_wqe(struct hns_roce_qp *hr_qp, int n);
>>  void *get_send_extend_sge(struct hns_roce_qp *hr_qp, int n);
>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> index 907c951..ec48e7e 100644
>> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> @@ -5967,8 +5967,8 @@ static int hns_roce_v2_init_eq_table(struct hns_roce_dev *hr_dev)
>>  		goto err_request_irq_fail;
>>  	}
>>
>> -	hr_dev->irq_workq =
>> -		create_singlethread_workqueue("hns_roce_irq_workqueue");
>> +	hr_dev->irq_workq = alloc_workqueue("hns_roce_irq_workqueue",
>> +					    WQ_MEM_RECLAIM, 0);
> 
> The WQ_MEM_RECLAIM flag and the kzalloc() inside init_flush_work()
> can't both be correct at the same time.
> 
Thanks a lot for the reminder!
I will check the previous discussion on the WQ_MEM_RECLAIM flag and fix it in the next version.
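
For context, a sketch only. As the review comment is read here, WQ_MEM_RECLAIM marks a workqueue that must keep making forward progress under memory pressure, so a path that first has to allocate a work item with kzalloc() before it can queue anything sits uneasily with that flag. One common way to avoid the allocation is to embed the work item in the object it belongs to. The user-space C model below illustrates only that pattern. The names struct sketch_qp, struct sketch_wq, init_qp(), queue_flush() and run_pending() are hypothetical; this is neither the hns driver nor the kernel workqueue API, and it is not necessarily how the next version will address the comment.

```c
/* User-space model of "embed the work item, defer the sleeping part".
 * All names are hypothetical; this is not kernel or hns driver code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct work_item {
	void (*fn)(struct work_item *w);
	struct work_item *next;
	bool pending;			/* true while sitting on the queue */
};

struct sketch_qp {			/* stand-in for a queue pair object */
	unsigned int qpn;
	int flushed;			/* how many times the handler ran */
	struct work_item flush_work;	/* embedded: nothing to allocate later */
};

struct sketch_wq {			/* minimal FIFO standing in for a workqueue */
	struct work_item *head;
	struct work_item *tail;
};

/* Recover the owning QP from its embedded work item. */
#define work_to_qp(w) \
	((struct sketch_qp *)((char *)(w) - offsetof(struct sketch_qp, flush_work)))

/* Deferred handler: in a real driver this is where sleeping calls would go. */
void flush_handler(struct work_item *w)
{
	work_to_qp(w)->flushed++;
}

void init_qp(struct sketch_qp *qp, unsigned int qpn)
{
	qp->qpn = qpn;
	qp->flushed = 0;
	qp->flush_work.fn = flush_handler;
	qp->flush_work.next = NULL;
	qp->flush_work.pending = false;
}

/* Queue the flush: pointer updates only, no allocation and no sleeping.
 * Returns false if the work item is already queued. */
bool queue_flush(struct sketch_wq *wq, struct sketch_qp *qp)
{
	struct work_item *w = &qp->flush_work;

	if (w->pending)
		return false;
	w->pending = true;
	w->next = NULL;
	if (wq->tail)
		wq->tail->next = w;
	else
		wq->head = w;
	wq->tail = w;
	return true;
}

/* Worker side: drain the queue and run each handler.
 * Returns the number of work items executed. */
int run_pending(struct sketch_wq *wq)
{
	int n = 0;

	while (wq->head) {
		struct work_item *w = wq->head;

		wq->head = w->next;
		if (!wq->head)
			wq->tail = NULL;
		w->pending = false;	/* may be queued again from now on */
		w->fn(w);
		n++;
	}
	return n;
}
```

In this model the caller that detects the errored state only calls queue_flush(), and the handler runs later from run_pending(). A second queue_flush() on the same QP before the worker runs is rejected instead of needing a second work item.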


