Re: [PATCH for-rc] IB/isert: Fix hang in iscsit_wait_for_tag

* Re: [PATCH for-rc] IB/isert: Fix hang in iscsit_wait_for_tag
       [not found] <20230119210659.1871-1-shiraz.saleem@intel.com>
@ 2023-01-23  9:19 ` Sagi Grimberg
       [not found]   ` <SA2PR11MB495347CE35C9ED97CD80C422F3CC9@SA2PR11MB4953.namprd11.prod.outlook.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Sagi Grimberg @ 2023-01-23  9:19 UTC (permalink / raw)
  To: Shiraz Saleem, jgg, leon, linux-rdma, target-devel, Mike Christie
  Cc: Mustafa Ismail, Mike Marciniszyn

> From: Mustafa Ismail <mustafa.ismail@intel.com>
> 
> Running fio can occasionally cause a hang when sbitmap_queue_get() fails to
> return a tag in iscsit_allocate_cmd() and iscsit_wait_for_tag() is called
> and will never return from the schedule(). This is because the polling
> thread of the CQ is suspended, and will not poll for a SQ completion which
> would free up a tag.
> Fix this by creating a separate CQ for the SQ so that send completions are
> processed on a separate thread and are not blocked when the RQ CQ is
> stalled.
> 
> Fixes: 10e9cbb6b531 ("scsi: target: Convert target drivers to use sbitmap")

Is this the real offending commit? What prevented this from happening
before?

> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
> Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
> ---
>   drivers/infiniband/ulp/isert/ib_isert.c | 33 +++++++++++++++++++++++----------
>   drivers/infiniband/ulp/isert/ib_isert.h |  3 ++-
>   2 files changed, 25 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
> index 7540488..f827b91 100644
> --- a/drivers/infiniband/ulp/isert/ib_isert.c
> +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> @@ -109,19 +109,27 @@ static int isert_sg_tablesize_set(const char *val, const struct kernel_param *kp
>   	struct ib_qp_init_attr attr;
>   	int ret, factor;
>   
> -	isert_conn->cq = ib_cq_pool_get(ib_dev, cq_size, -1, IB_POLL_WORKQUEUE);
> -	if (IS_ERR(isert_conn->cq)) {
> -		isert_err("Unable to allocate cq\n");
> -		ret = PTR_ERR(isert_conn->cq);
> +	isert_conn->snd_cq = ib_cq_pool_get(ib_dev, cq_size, -1,
> +					    IB_POLL_WORKQUEUE);
> +	if (IS_ERR(isert_conn->snd_cq)) {
> +		isert_err("Unable to allocate send cq\n");
> +		ret = PTR_ERR(isert_conn->snd_cq);
>   		return ERR_PTR(ret);
>   	}
> +	isert_conn->rcv_cq = ib_cq_pool_get(ib_dev, cq_size, -1,
> +					    IB_POLL_WORKQUEUE);
> +	if (IS_ERR(isert_conn->rcv_cq)) {
> +		isert_err("Unable to allocate receive cq\n");
> +		ret = PTR_ERR(isert_conn->rcv_cq);
> +		goto create_cq_err;
> +	}

Does this have any noticeable performance implications?

Also I wander if there are any other assumptions in the code
for having a single context processing completions...

It'd be much easier if iscsi_allocate_cmd could accept
a timeout to fail...

CCing target-devel and Mike.

^ permalink raw reply	[flat|nested] 6+ messages in thread