* [PATCH v3] iwcm: don't hold the irq disabled lock on iw_rem_ref
@ 2019-09-04 21:25 Sagi Grimberg
2019-09-10 11:18 ` Krishnamraju Eraparaju
0 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2019-09-04 21:25 UTC (permalink / raw)
To: linux-rdma; +Cc: Jason Gunthorpe
This may be the final put on a qp, resulting in freeing
resources, and should not be done with interrupts disabled.
This produces the following warning:
--
[ 317.026048] WARNING: CPU: 1 PID: 443 at kernel/smp.c:425 smp_call_function_many+0xa0/0x260
[ 317.026131] Call Trace:
[ 317.026159] ? load_new_mm_cr3+0xe0/0xe0
[ 317.026161] on_each_cpu+0x28/0x50
[ 317.026183] __purge_vmap_area_lazy+0x72/0x150
[ 317.026200] free_vmap_area_noflush+0x7a/0x90
[ 317.026202] remove_vm_area+0x6f/0x80
[ 317.026203] __vunmap+0x71/0x210
[ 317.026211] siw_free_qp+0x8d/0x130 [siw]
[ 317.026217] destroy_cm_id+0xc3/0x200 [iw_cm]
[ 317.026222] rdma_destroy_id+0x224/0x2b0 [rdma_cm]
[ 317.026226] nvme_rdma_reset_ctrl_work+0x2c/0x70 [nvme_rdma]
[ 317.026235] process_one_work+0x1f4/0x3e0
[ 317.026249] worker_thread+0x221/0x3e0
[ 317.026252] ? process_one_work+0x3e0/0x3e0
[ 317.026256] kthread+0x117/0x130
[ 317.026264] ? kthread_create_worker_on_cpu+0x70/0x70
[ 317.026275] ret_from_fork+0x35/0x40
--
Fix this by exchanging the qp pointer early on and safely destroying
it.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
Changes from v2:
- store the qp locally so we don't need to unlock the cm_id_priv lock when
destroying the qp
Changes from v1:
- don't release the lock before qp pointer is cleared.
drivers/infiniband/core/iwcm.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
index 72141c5b7c95..c64707f68d22 100644
--- a/drivers/infiniband/core/iwcm.c
+++ b/drivers/infiniband/core/iwcm.c
@@ -373,8 +373,10 @@ static void destroy_cm_id(struct iw_cm_id *cm_id)
{
struct iwcm_id_private *cm_id_priv;
unsigned long flags;
+ struct ib_qp *qp;
cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
+ qp = xchg(&cm_id_priv->qp, NULL);
/*
* Wait if we're currently in a connect or accept downcall. A
* listening endpoint should never block here.
@@ -401,7 +403,7 @@ static void destroy_cm_id(struct iw_cm_id *cm_id)
cm_id_priv->state = IW_CM_STATE_DESTROYING;
spin_unlock_irqrestore(&cm_id_priv->lock, flags);
/* Abrupt close of the connection */
- (void)iwcm_modify_qp_err(cm_id_priv->qp);
+ (void)iwcm_modify_qp_err(qp);
spin_lock_irqsave(&cm_id_priv->lock, flags);
break;
case IW_CM_STATE_IDLE:
@@ -426,11 +428,9 @@ static void destroy_cm_id(struct iw_cm_id *cm_id)
BUG();
break;
}
- if (cm_id_priv->qp) {
- cm_id_priv->id.device->ops.iw_rem_ref(cm_id_priv->qp);
- cm_id_priv->qp = NULL;
- }
spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+ if (qp)
+ cm_id_priv->id.device->ops.iw_rem_ref(qp);
if (cm_id->mapped) {
iwpm_remove_mapinfo(&cm_id->local_addr, &cm_id->m_local_addr);
--
2.17.1
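The locking pattern in this patch, taking ownership of the qp pointer with an
atomic exchange before the critical section and dropping the reference only
after it, can be sketched as a userspace C model. This is illustrative only,
not the iwcm source: the names are made up and a pthread mutex stands in for
the irq-disabling spinlock.

```c
/*
 * Userspace model of the fix (illustrative names, not the iwcm
 * source): ownership of the qp pointer is taken with an atomic
 * exchange before the locked section, so the final reference drop,
 * which may sleep (e.g. in vfree()), never runs with the lock held.
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct qp { int refs; };

struct cm_id_priv {
	pthread_mutex_t lock;        /* stands in for the irq spinlock */
	_Atomic(struct qp *) qp;
};

/* returns 1 if this was the final put and the qp was freed */
static int rem_ref(struct qp *qp)
{
	if (--qp->refs == 0) {
		free(qp);            /* models the sleeping free path */
		return 1;
	}
	return 0;
}

int destroy_cm_id(struct cm_id_priv *priv)
{
	/* take ownership early: later readers of priv->qp see NULL */
	struct qp *qp = atomic_exchange(&priv->qp, NULL);
	int freed = 0;

	pthread_mutex_lock(&priv->lock);
	/* state machine transitions happen here; qp is no longer
	 * reachable through priv */
	pthread_mutex_unlock(&priv->lock);

	if (qp)                      /* safe: lock already released */
		freed = rem_ref(qp);
	return freed;
}
```

A second call finds the pointer already NULL, so the put happens exactly once
regardless of how many paths race through destroy.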
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v3] iwcm: don't hold the irq disabled lock on iw_rem_ref
2019-09-04 21:25 [PATCH v3] iwcm: don't hold the irq disabled lock on iw_rem_ref Sagi Grimberg
@ 2019-09-10 11:18 ` Krishnamraju Eraparaju
2019-09-10 16:53 ` Sagi Grimberg
0 siblings, 1 reply; 15+ messages in thread
From: Krishnamraju Eraparaju @ 2019-09-10 11:18 UTC (permalink / raw)
To: Sagi Grimberg; +Cc: linux-rdma, Jason Gunthorpe
On Wednesday, September 09/04/19, 2019 at 14:25:31 -0700, Sagi Grimberg wrote:
> This may be the final put on a qp, resulting in freeing
> resources, and should not be done with interrupts disabled.
Hi Sagi,
A few things to consider to fix this completely:
- There are some other places where iw_rem_ref() should be called
after the spinlock critical section, e.g. in cm_close_handler(),
iw_cm_connect(), ...
- Any modifications to "cm_id_priv" should be done within the spinlock
critical section; modifying cm_id_priv->qp outside spinlocks, even
with atomic xchg(), might be error-prone.
- The structure "siw_base_qp" is getting freed in siw_destroy_qp(),
but it should be freed at the end of siw_free_qp().
I am about to finish writing a patch that covers all the above issues.
I will test it and submit it here by EOD.
Regards,
Krishna.
>
> This produces the following warning:
> --
> [ 317.026048] WARNING: CPU: 1 PID: 443 at kernel/smp.c:425 smp_call_function_many+0xa0/0x260
> [ 317.026131] Call Trace:
> [ 317.026159] ? load_new_mm_cr3+0xe0/0xe0
> [ 317.026161] on_each_cpu+0x28/0x50
> [ 317.026183] __purge_vmap_area_lazy+0x72/0x150
> [ 317.026200] free_vmap_area_noflush+0x7a/0x90
> [ 317.026202] remove_vm_area+0x6f/0x80
> [ 317.026203] __vunmap+0x71/0x210
> [ 317.026211] siw_free_qp+0x8d/0x130 [siw]
> [ 317.026217] destroy_cm_id+0xc3/0x200 [iw_cm]
> [ 317.026222] rdma_destroy_id+0x224/0x2b0 [rdma_cm]
> [ 317.026226] nvme_rdma_reset_ctrl_work+0x2c/0x70 [nvme_rdma]
> [ 317.026235] process_one_work+0x1f4/0x3e0
> [ 317.026249] worker_thread+0x221/0x3e0
> [ 317.026252] ? process_one_work+0x3e0/0x3e0
> [ 317.026256] kthread+0x117/0x130
> [ 317.026264] ? kthread_create_worker_on_cpu+0x70/0x70
> [ 317.026275] ret_from_fork+0x35/0x40
> --
>
> Fix this by exchanging the qp pointer early on and safely destroying
> it.
>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
> Changes from v2:
> - store the qp locally so we don't need to unlock the cm_id_priv lock when
> destroying the qp
>
> Changes from v1:
> - don't release the lock before qp pointer is cleared.
>
> drivers/infiniband/core/iwcm.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
> index 72141c5b7c95..c64707f68d22 100644
> --- a/drivers/infiniband/core/iwcm.c
> +++ b/drivers/infiniband/core/iwcm.c
> @@ -373,8 +373,10 @@ static void destroy_cm_id(struct iw_cm_id *cm_id)
> {
> struct iwcm_id_private *cm_id_priv;
> unsigned long flags;
> + struct ib_qp *qp;
>
> cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> + qp = xchg(&cm_id_priv->qp, NULL);
> /*
> * Wait if we're currently in a connect or accept downcall. A
> * listening endpoint should never block here.
> @@ -401,7 +403,7 @@ static void destroy_cm_id(struct iw_cm_id *cm_id)
> cm_id_priv->state = IW_CM_STATE_DESTROYING;
> spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> /* Abrupt close of the connection */
> - (void)iwcm_modify_qp_err(cm_id_priv->qp);
> + (void)iwcm_modify_qp_err(qp);
> spin_lock_irqsave(&cm_id_priv->lock, flags);
> break;
> case IW_CM_STATE_IDLE:
> @@ -426,11 +428,9 @@ static void destroy_cm_id(struct iw_cm_id *cm_id)
> BUG();
> break;
> }
> - if (cm_id_priv->qp) {
> - cm_id_priv->id.device->ops.iw_rem_ref(cm_id_priv->qp);
> - cm_id_priv->qp = NULL;
> - }
> spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> + if (qp)
> + cm_id_priv->id.device->ops.iw_rem_ref(qp);
>
> if (cm_id->mapped) {
> iwpm_remove_mapinfo(&cm_id->local_addr, &cm_id->m_local_addr);
> --
> 2.17.1
>
* Re: [PATCH v3] iwcm: don't hold the irq disabled lock on iw_rem_ref
2019-09-10 11:18 ` Krishnamraju Eraparaju
@ 2019-09-10 16:53 ` Sagi Grimberg
2019-09-10 19:21 ` Krishnamraju Eraparaju
2019-09-11 9:38 ` Bernard Metzler
0 siblings, 2 replies; 15+ messages in thread
From: Sagi Grimberg @ 2019-09-10 16:53 UTC (permalink / raw)
To: Krishnamraju Eraparaju; +Cc: linux-rdma, Jason Gunthorpe
>> This may be the final put on a qp, resulting in freeing
>> resources, and should not be done with interrupts disabled.
>
> Hi Sagi,
>
> A few things to consider to fix this completely:
> - There are some other places where iw_rem_ref() should be called
> after the spinlock critical section, e.g. in cm_close_handler(),
> iw_cm_connect(), ...
> - Any modifications to "cm_id_priv" should be done within the spinlock
> critical section; modifying cm_id_priv->qp outside spinlocks, even
> with atomic xchg(), might be error-prone.
> - The structure "siw_base_qp" is getting freed in siw_destroy_qp(),
> but it should be freed at the end of siw_free_qp().
Not sure why you say that; at the end of this function ->qp will be
NULL anyway...
>
> I am about to finish writing a patch that covers all the above issues.
> I will test it and submit it here by EOD.
Sure, you take it. I just stumbled on it, so I thought I'd go ahead and
send a patch...
* Re: [PATCH v3] iwcm: don't hold the irq disabled lock on iw_rem_ref
2019-09-10 16:53 ` Sagi Grimberg
@ 2019-09-10 19:21 ` Krishnamraju Eraparaju
2019-09-11 9:38 ` Bernard Metzler
1 sibling, 0 replies; 15+ messages in thread
From: Krishnamraju Eraparaju @ 2019-09-10 19:21 UTC (permalink / raw)
To: Sagi Grimberg, Steve Wise, Bernard Metzler; +Cc: linux-rdma, Jason Gunthorpe
Please review the patch below; I will resubmit it as a patch series
after review.
- As the kref put handler (siw_free_qp) uses vfree(), iwcm can't call
iw_rem_ref() with spinlocks held. Doing so can cause vfree() to sleep
with irqs disabled.
Two possible solutions:
1) With the spinlock acquired, take a copy of "cm_id_priv->qp" and
update it to NULL. After releasing the lock, use the copied qp
pointer for rem_ref().
2) Replace the issue-causing vmalloc()/vfree() with kmalloc()/kfree()
in the siw driver.
Solution 2 may not be ideal, as allocating huge contiguous memory for
the SQ & RQ doesn't look appropriate.
- The structure "siw_base_qp" is getting freed in siw_destroy_qp(), but
if cm_close_handler() holds the last reference, then siw_free_qp(),
via cm_close_handler(), tries to get the already freed "siw_base_qp"
from "ib_qp".
Hence, "siw_base_qp" should be freed at the end of siw_free_qp().
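Solution 1 above can be sketched as a small userspace model. The names are
illustrative, not the iwcm source, and a pthread mutex stands in for the
irq-disabling spinlock; the point is that the snapshot and clear happen inside
the critical section, while the reference drop happens after it.

```c
/*
 * Userspace model of solution 1 (illustrative names): snapshot and
 * clear cm_id_priv->qp under the lock, so concurrent paths see NULL,
 * but drop the reference only after unlocking, since the final put
 * may sleep.
 */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct qp { int refs; };

struct cm_id_priv {
	pthread_mutex_t lock;   /* stands in for the irq spinlock */
	struct qp *qp;
};

/* returns 1 if this was the final put and the qp was freed */
static int rem_ref(struct qp *qp)
{
	if (--qp->refs == 0) {
		free(qp);       /* models vfree(): must not run locked */
		return 1;
	}
	return 0;
}

int cm_drop_qp(struct cm_id_priv *priv)
{
	struct qp *qp;
	int freed = 0;

	pthread_mutex_lock(&priv->lock);
	qp = priv->qp;          /* snapshot under the lock ... */
	priv->qp = NULL;        /* ... so other paths see NULL */
	/* state machine updates would happen here */
	pthread_mutex_unlock(&priv->lock);

	if (qp)                 /* reference dropped outside the lock */
		freed = rem_ref(qp);
	return freed;
}
```

Unlike the xchg() variant in the v3 patch, all writes to the shared field stay
inside the locked section, which addresses the review concern above.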
diff --git a/drivers/infiniband/core/iwcm.c
b/drivers/infiniband/core/iwcm.c
index 72141c5b7c95..d5ab69fa598a 100644
--- a/drivers/infiniband/core/iwcm.c
+++ b/drivers/infiniband/core/iwcm.c
@@ -373,6 +373,7 @@ static void destroy_cm_id(struct iw_cm_id *cm_id)
{
struct iwcm_id_private *cm_id_priv;
unsigned long flags;
+ struct ib_qp *qp;
cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
/*
@@ -389,6 +390,9 @@ static void destroy_cm_id(struct iw_cm_id *cm_id)
set_bit(IWCM_F_DROP_EVENTS, &cm_id_priv->flags);
spin_lock_irqsave(&cm_id_priv->lock, flags);
+ qp = cm_id_priv->qp;
+ cm_id_priv->qp = NULL;
+
switch (cm_id_priv->state) {
case IW_CM_STATE_LISTEN:
cm_id_priv->state = IW_CM_STATE_DESTROYING;
@@ -401,7 +405,7 @@ static void destroy_cm_id(struct iw_cm_id *cm_id)
cm_id_priv->state = IW_CM_STATE_DESTROYING;
spin_unlock_irqrestore(&cm_id_priv->lock, flags);
/* Abrupt close of the connection */
- (void)iwcm_modify_qp_err(cm_id_priv->qp);
+ (void)iwcm_modify_qp_err(qp);
spin_lock_irqsave(&cm_id_priv->lock, flags);
break;
case IW_CM_STATE_IDLE:
@@ -426,11 +430,9 @@ static void destroy_cm_id(struct iw_cm_id *cm_id)
BUG();
break;
}
- if (cm_id_priv->qp) {
- cm_id_priv->id.device->ops.iw_rem_ref(cm_id_priv->qp);
- cm_id_priv->qp = NULL;
- }
spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+ if (qp)
+ cm_id_priv->id.device->ops.iw_rem_ref(qp);
if (cm_id->mapped) {
iwpm_remove_mapinfo(&cm_id->local_addr,
&cm_id->m_local_addr);
@@ -671,11 +673,11 @@ int iw_cm_accept(struct iw_cm_id *cm_id,
BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_RECV);
cm_id_priv->state = IW_CM_STATE_IDLE;
spin_lock_irqsave(&cm_id_priv->lock, flags);
- if (cm_id_priv->qp) {
- cm_id->device->ops.iw_rem_ref(qp);
- cm_id_priv->qp = NULL;
- }
+ qp = cm_id_priv->qp;
+ cm_id_priv->qp = NULL;
spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+ if (qp)
+ cm_id->device->ops.iw_rem_ref(qp);
clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
wake_up_all(&cm_id_priv->connect_wait);
}
@@ -730,13 +732,13 @@ int iw_cm_connect(struct iw_cm_id *cm_id, struct
iw_cm_conn_param *iw_param)
return 0; /* success */
spin_lock_irqsave(&cm_id_priv->lock, flags);
- if (cm_id_priv->qp) {
- cm_id->device->ops.iw_rem_ref(qp);
- cm_id_priv->qp = NULL;
- }
+ qp = cm_id_priv->qp;
+ cm_id_priv->qp = NULL;
cm_id_priv->state = IW_CM_STATE_IDLE;
err:
spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+ if (qp)
+ cm_id->device->ops.iw_rem_ref(qp);
clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
wake_up_all(&cm_id_priv->connect_wait);
return ret;
@@ -880,6 +882,7 @@ static int cm_conn_rep_handler(struct
iwcm_id_private *cm_id_priv,
{
unsigned long flags;
int ret;
+ struct ib_qp *qp = NULL;
spin_lock_irqsave(&cm_id_priv->lock, flags);
/*
@@ -896,11 +899,13 @@ static int cm_conn_rep_handler(struct
iwcm_id_private *cm_id_priv,
cm_id_priv->state = IW_CM_STATE_ESTABLISHED;
} else {
/* REJECTED or RESET */
- cm_id_priv->id.device->ops.iw_rem_ref(cm_id_priv->qp);
+ qp = cm_id_priv->qp;
cm_id_priv->qp = NULL;
cm_id_priv->state = IW_CM_STATE_IDLE;
}
spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+ if (qp)
+ cm_id_priv->id.device->ops.iw_rem_ref(qp);
ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, iw_event);
if (iw_event->private_data_len)
@@ -944,12 +949,12 @@ static int cm_close_handler(struct iwcm_id_private
*cm_id_priv,
{
unsigned long flags;
int ret = 0;
+ struct ib_qp *qp;
+
spin_lock_irqsave(&cm_id_priv->lock, flags);
+ qp = cm_id_priv->qp;
+ cm_id_priv->qp = NULL;
- if (cm_id_priv->qp) {
- cm_id_priv->id.device->ops.iw_rem_ref(cm_id_priv->qp);
- cm_id_priv->qp = NULL;
- }
switch (cm_id_priv->state) {
case IW_CM_STATE_ESTABLISHED:
case IW_CM_STATE_CLOSING:
@@ -965,6 +970,8 @@ static int cm_close_handler(struct iwcm_id_private
*cm_id_priv,
}
spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+ if (qp)
+ cm_id_priv->id.device->ops.iw_rem_ref(qp);
return ret;
}
diff --git a/drivers/infiniband/sw/siw/siw_qp.c
b/drivers/infiniband/sw/siw/siw_qp.c
index 430314c8abd9..cb177688a49f 100644
--- a/drivers/infiniband/sw/siw/siw_qp.c
+++ b/drivers/infiniband/sw/siw/siw_qp.c
@@ -1307,6 +1307,7 @@ void siw_free_qp(struct kref *ref)
struct siw_qp *found, *qp = container_of(ref, struct siw_qp,
ref);
struct siw_device *sdev = qp->sdev;
unsigned long flags;
+ struct siw_base_qp *siw_base_qp = to_siw_base_qp(qp->ib_qp);
if (qp->cep)
siw_cep_put(qp->cep);
@@ -1327,4 +1328,5 @@ void siw_free_qp(struct kref *ref)
atomic_dec(&sdev->num_qp);
siw_dbg_qp(qp, "free QP\n");
kfree_rcu(qp, rcu);
+ kfree(siw_base_qp);
}
diff --git a/drivers/infiniband/sw/siw/siw_verbs.c
b/drivers/infiniband/sw/siw/siw_verbs.c
index da52c90e06d4..ac08d84d84cb 100644
--- a/drivers/infiniband/sw/siw/siw_verbs.c
+++ b/drivers/infiniband/sw/siw/siw_verbs.c
@@ -603,7 +603,6 @@ int siw_verbs_modify_qp(struct ib_qp *base_qp,
struct ib_qp_attr *attr,
int siw_destroy_qp(struct ib_qp *base_qp, struct ib_udata *udata)
{
struct siw_qp *qp = to_siw_qp(base_qp);
- struct siw_base_qp *siw_base_qp = to_siw_base_qp(base_qp);
struct siw_ucontext *uctx =
rdma_udata_to_drv_context(udata, struct siw_ucontext,
base_ucontext);
@@ -640,7 +639,6 @@ int siw_destroy_qp(struct ib_qp *base_qp, struct
ib_udata *udata)
qp->scq = qp->rcq = NULL;
siw_qp_put(qp);
- kfree(siw_base_qp);
return 0;
}
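The siw_base_qp lifetime change can likewise be modeled in userspace. The
names are illustrative, not the siw source; a counter stands in for verifying
that both the qp and its container are released exactly once, from whichever
path drops the last reference.

```c
/*
 * Userspace model of the siw_base_qp lifetime fix (illustrative
 * names): the container is freed from the kref-style release
 * function, so a late final put (e.g. from the close handler) never
 * reads an already-freed container.
 */
#include <assert.h>
#include <stdlib.h>

static int objects_freed;       /* instrumentation for the demo */

struct base_qp;                 /* container exposed to the core */
struct qp { int refs; struct base_qp *base; };
struct base_qp { struct qp *qp; };

static void qp_put(struct qp *qp)
{
	struct base_qp *base;

	if (--qp->refs)
		return;
	/* release path: free the inner qp, then its container,
	 * in one place */
	base = qp->base;
	free(qp);
	objects_freed++;
	free(base);
	objects_freed++;
}

int demo(void)
{
	struct base_qp *base = malloc(sizeof(*base));
	struct qp *qp = malloc(sizeof(*qp));

	base->qp = qp;
	qp->base = base;
	qp->refs = 2;   /* e.g. destroy path + close handler */

	qp_put(qp);     /* destroy path: not the last ref, frees nothing */
	qp_put(qp);     /* close handler drops the last ref */
	return objects_freed;
}
```

Freeing the container in the release function means the ordering of the two
put paths no longer matters, which is what moving kfree(siw_base_qp) into
siw_free_qp() achieves.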
On Tuesday, September 09/10/19, 2019 at 22:23:13 +0530, Sagi Grimberg wrote:
>
> >> This may be the final put on a qp, resulting in freeing
> >> resources, and should not be done with interrupts disabled.
> >
> > Hi Sagi,
> >
> > A few things to consider to fix this completely:
> > - There are some other places where iw_rem_ref() should be called
> > after the spinlock critical section, e.g. in cm_close_handler(),
> > iw_cm_connect(), ...
> > - Any modifications to "cm_id_priv" should be done within the
> > spinlock critical section; modifying cm_id_priv->qp outside
> > spinlocks, even with atomic xchg(), might be error-prone.
> > - The structure "siw_base_qp" is getting freed in siw_destroy_qp(),
> > but it should be freed at the end of siw_free_qp().
>
> Not sure why you say that; at the end of this function ->qp will be
> NULL anyway...
Hope the above description and patch answers this.
>
> >
> > I am about to finish writing a patch that covers all the above issues.
> > I will test it and submit it here by EOD.
>
> Sure, you take it. I just stumbled on it, so I thought I'd go ahead
> and send a patch...
* Re: Re: [PATCH v3] iwcm: don't hold the irq disabled lock on iw_rem_ref
2019-09-10 16:53 ` Sagi Grimberg
2019-09-10 19:21 ` Krishnamraju Eraparaju
@ 2019-09-11 9:38 ` Bernard Metzler
2019-09-11 14:42 ` Steve Wise
1 sibling, 1 reply; 15+ messages in thread
From: Bernard Metzler @ 2019-09-11 9:38 UTC (permalink / raw)
To: Krishnamraju Eraparaju
Cc: Sagi Grimberg, Steve Wise, linux-rdma, Jason Gunthorpe
-----"Krishnamraju Eraparaju" <krishna2@chelsio.com> wrote: -----
>To: "Sagi Grimberg" <sagi@grimberg.me>, "Steve Wise"
><larrystevenwise@gmail.com>, "Bernard Metzler" <BMT@zurich.ibm.com>
>From: "Krishnamraju Eraparaju" <krishna2@chelsio.com>
>Date: 09/10/2019 09:22PM
>Cc: "linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>, "Jason
>Gunthorpe" <jgg@ziepe.ca>
>Subject: [EXTERNAL] Re: [PATCH v3] iwcm: don't hold the irq disabled
>lock on iw_rem_ref
>
>Please review the patch below; I will resubmit it as a patch series
>after review.
>- As the kref put handler (siw_free_qp) uses vfree(), iwcm can't call
> iw_rem_ref() with spinlocks held. Doing so can cause vfree() to
> sleep with irqs disabled.
> Two possible solutions:
> 1) With the spinlock acquired, take a copy of "cm_id_priv->qp" and
> update it to NULL. After releasing the lock, use the copied qp
> pointer for rem_ref().
> 2) Replace the issue-causing vmalloc()/vfree() with kmalloc()/kfree()
> in the siw driver.
>
> Solution 2 may not be ideal, as allocating huge contiguous memory for
> the SQ & RQ doesn't look appropriate.
>
>- The structure "siw_base_qp" is getting freed in siw_destroy_qp(),
> but if cm_close_handler() holds the last reference, then
> siw_free_qp(), via cm_close_handler(), tries to get the already
> freed "siw_base_qp" from "ib_qp".
> Hence, "siw_base_qp" should be freed at the end of siw_free_qp().
>
Regarding the siw driver, I am fine with that proposed
change. Delaying freeing the base_qp is OK. In fact,
I'd expect the drivers will soon pass that responsibility
to the rdma core anyway -- like for the CQ/SRQ/PD/CTX objects,
which are already allocated and freed up there.
The iwcm changes look OK to me as well.
(some comments on re-formatting the changes
inlined below)
Thanks!
Bernard.
>
>diff --git a/drivers/infiniband/core/iwcm.c
>b/drivers/infiniband/core/iwcm.c
>index 72141c5b7c95..d5ab69fa598a 100644
>--- a/drivers/infiniband/core/iwcm.c
>+++ b/drivers/infiniband/core/iwcm.c
>@@ -373,6 +373,7 @@ static void destroy_cm_id(struct iw_cm_id *cm_id)
> {
> struct iwcm_id_private *cm_id_priv;
> unsigned long flags;
>+ struct ib_qp *qp;
move *qp declaration up one line - see comment in
siw driver change below.
>
> cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> /*
>@@ -389,6 +390,9 @@ static void destroy_cm_id(struct iw_cm_id *cm_id)
> set_bit(IWCM_F_DROP_EVENTS, &cm_id_priv->flags);
>
> spin_lock_irqsave(&cm_id_priv->lock, flags);
>+ qp = cm_id_priv->qp;
>+ cm_id_priv->qp = NULL;
>+
> switch (cm_id_priv->state) {
> case IW_CM_STATE_LISTEN:
> cm_id_priv->state = IW_CM_STATE_DESTROYING;
>@@ -401,7 +405,7 @@ static void destroy_cm_id(struct iw_cm_id *cm_id)
> cm_id_priv->state = IW_CM_STATE_DESTROYING;
> spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> /* Abrupt close of the connection */
>- (void)iwcm_modify_qp_err(cm_id_priv->qp);
>+ (void)iwcm_modify_qp_err(qp);
> spin_lock_irqsave(&cm_id_priv->lock, flags);
> break;
> case IW_CM_STATE_IDLE:
>@@ -426,11 +430,9 @@ static void destroy_cm_id(struct iw_cm_id
>*cm_id)
> BUG();
> break;
> }
>- if (cm_id_priv->qp) {
>-
>cm_id_priv->id.device->ops.iw_rem_ref(cm_id_priv->qp);
>- cm_id_priv->qp = NULL;
>- }
> spin_unlock_irqrestore(&cm_id_priv->lock, flags);
>+ if (qp)
>+ cm_id_priv->id.device->ops.iw_rem_ref(qp);
>
> if (cm_id->mapped) {
> iwpm_remove_mapinfo(&cm_id->local_addr,
>&cm_id->m_local_addr);
>@@ -671,11 +673,11 @@ int iw_cm_accept(struct iw_cm_id *cm_id,
> BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_RECV);
> cm_id_priv->state = IW_CM_STATE_IDLE;
> spin_lock_irqsave(&cm_id_priv->lock, flags);
>- if (cm_id_priv->qp) {
>- cm_id->device->ops.iw_rem_ref(qp);
>- cm_id_priv->qp = NULL;
>- }
>+ qp = cm_id_priv->qp;
>+ cm_id_priv->qp = NULL;
> spin_unlock_irqrestore(&cm_id_priv->lock, flags);
>+ if (qp)
>+ cm_id->device->ops.iw_rem_ref(qp);
> clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
> wake_up_all(&cm_id_priv->connect_wait);
> }
>@@ -730,13 +732,13 @@ int iw_cm_connect(struct iw_cm_id *cm_id,
>struct
>iw_cm_conn_param *iw_param)
> return 0; /* success */
>
> spin_lock_irqsave(&cm_id_priv->lock, flags);
>- if (cm_id_priv->qp) {
>- cm_id->device->ops.iw_rem_ref(qp);
>- cm_id_priv->qp = NULL;
>- }
>+ qp = cm_id_priv->qp;
>+ cm_id_priv->qp = NULL;
> cm_id_priv->state = IW_CM_STATE_IDLE;
> err:
> spin_unlock_irqrestore(&cm_id_priv->lock, flags);
>+ if (qp)
>+ cm_id->device->ops.iw_rem_ref(qp);
> clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
> wake_up_all(&cm_id_priv->connect_wait);
> return ret;
>@@ -880,6 +882,7 @@ static int cm_conn_rep_handler(struct
>iwcm_id_private *cm_id_priv,
> {
> unsigned long flags;
> int ret;
>+ struct ib_qp *qp = NULL;
>
> spin_lock_irqsave(&cm_id_priv->lock, flags);
> /*
>@@ -896,11 +899,13 @@ static int cm_conn_rep_handler(struct
>iwcm_id_private *cm_id_priv,
> cm_id_priv->state = IW_CM_STATE_ESTABLISHED;
> } else {
> /* REJECTED or RESET */
>-
>cm_id_priv->id.device->ops.iw_rem_ref(cm_id_priv->qp);
>+ qp = cm_id_priv->qp;
> cm_id_priv->qp = NULL;
> cm_id_priv->state = IW_CM_STATE_IDLE;
> }
> spin_unlock_irqrestore(&cm_id_priv->lock, flags);
>+ if (qp)
>+ cm_id_priv->id.device->ops.iw_rem_ref(qp);
> ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, iw_event);
>
> if (iw_event->private_data_len)
>@@ -944,12 +949,12 @@ static int cm_close_handler(struct
>iwcm_id_private
>*cm_id_priv,
> {
> unsigned long flags;
> int ret = 0;
>+ struct ib_qp *qp;
move *qp declaration up two lines - see comment on siw
driver change below.
>+
> spin_lock_irqsave(&cm_id_priv->lock, flags);
>+ qp = cm_id_priv->qp;
>+ cm_id_priv->qp = NULL;
>
>- if (cm_id_priv->qp) {
>-
>cm_id_priv->id.device->ops.iw_rem_ref(cm_id_priv->qp);
>- cm_id_priv->qp = NULL;
>- }
> switch (cm_id_priv->state) {
> case IW_CM_STATE_ESTABLISHED:
> case IW_CM_STATE_CLOSING:
>@@ -965,6 +970,8 @@ static int cm_close_handler(struct
>iwcm_id_private
>*cm_id_priv,
> }
> spin_unlock_irqrestore(&cm_id_priv->lock, flags);
>
>+ if (qp)
>+ cm_id_priv->id.device->ops.iw_rem_ref(qp);
> return ret;
> }
>
>diff --git a/drivers/infiniband/sw/siw/siw_qp.c
>b/drivers/infiniband/sw/siw/siw_qp.c
>index 430314c8abd9..cb177688a49f 100644
>--- a/drivers/infiniband/sw/siw/siw_qp.c
>+++ b/drivers/infiniband/sw/siw/siw_qp.c
>@@ -1307,6 +1307,7 @@ void siw_free_qp(struct kref *ref)
> struct siw_qp *found, *qp = container_of(ref, struct siw_qp,
>ref);
> struct siw_device *sdev = qp->sdev;
> unsigned long flags;
>+ struct siw_base_qp *siw_base_qp = to_siw_base_qp(qp->ib_qp);
Please move that two lines up if that's OK with you.
I always prefer to have structs and their pointers
declared before introducing simple helper variables
like int and long. Thanks!
>
> if (qp->cep)
> siw_cep_put(qp->cep);
>@@ -1327,4 +1328,5 @@ void siw_free_qp(struct kref *ref)
> atomic_dec(&sdev->num_qp);
> siw_dbg_qp(qp, "free QP\n");
> kfree_rcu(qp, rcu);
>+ kfree(siw_base_qp);
> }
>diff --git a/drivers/infiniband/sw/siw/siw_verbs.c
>b/drivers/infiniband/sw/siw/siw_verbs.c
>index da52c90e06d4..ac08d84d84cb 100644
>--- a/drivers/infiniband/sw/siw/siw_verbs.c
>+++ b/drivers/infiniband/sw/siw/siw_verbs.c
>@@ -603,7 +603,6 @@ int siw_verbs_modify_qp(struct ib_qp *base_qp,
>struct ib_qp_attr *attr,
> int siw_destroy_qp(struct ib_qp *base_qp, struct ib_udata *udata)
> {
> struct siw_qp *qp = to_siw_qp(base_qp);
>- struct siw_base_qp *siw_base_qp = to_siw_base_qp(base_qp);
> struct siw_ucontext *uctx =
> rdma_udata_to_drv_context(udata, struct siw_ucontext,
> base_ucontext);
>@@ -640,7 +639,6 @@ int siw_destroy_qp(struct ib_qp *base_qp, struct
>ib_udata *udata)
> qp->scq = qp->rcq = NULL;
>
> siw_qp_put(qp);
>- kfree(siw_base_qp);
>
> return 0;
> }
>
>
>On Tuesday, September 09/10/19, 2019 at 22:23:13 +0530, Sagi Grimberg
>wrote:
>>
>> >> This may be the final put on a qp, resulting in freeing
>> >> resources, and should not be done with interrupts disabled.
>> >
>> > Hi Sagi,
>> >
>> > A few things to consider to fix this completely:
>> > - There are some other places where iw_rem_ref() should be
>> > called after the spinlock critical section, e.g. in
>> > cm_close_handler(), iw_cm_connect(), ...
>> > - Any modifications to "cm_id_priv" should be done within the
>> > spinlock critical section; modifying cm_id_priv->qp outside
>> > spinlocks, even with atomic xchg(), might be error-prone.
>> > - The structure "siw_base_qp" is getting freed in
>> > siw_destroy_qp(), but it should be freed at the end of
>> > siw_free_qp().
>>
>> Not sure why you say that; at the end of this function ->qp will
>> be NULL anyway...
> Hope the above description and patch answers this.
>>
>> >
>> > I am about to finish writing a patch that covers all the above
>> > issues. I will test it and submit it here by EOD.
>>
>> Sure, you take it. I just stumbled on it, so I thought I'd go ahead
>> and send a patch...
>
>
* Re: Re: [PATCH v3] iwcm: don't hold the irq disabled lock on iw_rem_ref
2019-09-11 9:38 ` Bernard Metzler
@ 2019-09-11 14:42 ` Steve Wise
2019-09-11 15:58 ` Krishnamraju Eraparaju
0 siblings, 1 reply; 15+ messages in thread
From: Steve Wise @ 2019-09-11 14:42 UTC (permalink / raw)
To: Bernard Metzler
Cc: Krishnamraju Eraparaju, Sagi Grimberg, linux-rdma, Jason Gunthorpe
On Wed, Sep 11, 2019 at 4:38 AM Bernard Metzler <BMT@zurich.ibm.com> wrote:
>
> -----"Krishnamraju Eraparaju" <krishna2@chelsio.com> wrote: -----
>
> >To: "Sagi Grimberg" <sagi@grimberg.me>, "Steve Wise"
> ><larrystevenwise@gmail.com>, "Bernard Metzler" <BMT@zurich.ibm.com>
> >From: "Krishnamraju Eraparaju" <krishna2@chelsio.com>
> >Date: 09/10/2019 09:22PM
> >Cc: "linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>, "Jason
> >Gunthorpe" <jgg@ziepe.ca>
> >Subject: [EXTERNAL] Re: [PATCH v3] iwcm: don't hold the irq disabled
> >lock on iw_rem_ref
> >
> >Please review the patch below; I will resubmit it as a patch series
> >after review.
> >- As the kref put handler (siw_free_qp) uses vfree(), iwcm can't call
> > iw_rem_ref() with spinlocks held. Doing so can cause vfree() to
> > sleep with irqs disabled.
> > Two possible solutions:
> > 1) With the spinlock acquired, take a copy of "cm_id_priv->qp" and
> > update it to NULL. After releasing the lock, use the copied qp
> > pointer for rem_ref().
> > 2) Replace the issue-causing vmalloc()/vfree() with kmalloc()/kfree()
> > in the siw driver.
> >
> > Solution 2 may not be ideal, as allocating huge contiguous memory for
> > the SQ & RQ doesn't look appropriate.
> >
> >- The structure "siw_base_qp" is getting freed in siw_destroy_qp(),
> > but if cm_close_handler() holds the last reference, then
> > siw_free_qp(), via cm_close_handler(), tries to get the already
> > freed "siw_base_qp" from "ib_qp".
> > Hence, "siw_base_qp" should be freed at the end of siw_free_qp().
> >
>
> Regarding the siw driver, I am fine with that proposed
> change. Delaying freeing the base_qp is OK. In fact,
> I'd expect the drivers will soon pass that responsibility
> to the rdma core anyway -- like for the CQ/SRQ/PD/CTX objects,
> which are already allocated and freed up there.
>
> The iwcm changes look OK to me as well.
>
Hey Krishna, since the iwcm struct/state is still correctly being
manipulated under the lock, I think this patch is correct. Test
the heck out of it. :)
Steve.
* Re: Re: [PATCH v3] iwcm: don't hold the irq disabled lock on iw_rem_ref
2019-09-11 14:42 ` Steve Wise
@ 2019-09-11 15:58 ` Krishnamraju Eraparaju
2019-09-16 16:28 ` Jason Gunthorpe
2019-09-17 9:04 ` Bernard Metzler
0 siblings, 2 replies; 15+ messages in thread
From: Krishnamraju Eraparaju @ 2019-09-11 15:58 UTC (permalink / raw)
To: Steve Wise, Bernard Metzler; +Cc: Sagi Grimberg, linux-rdma, Jason Gunthorpe
Hi Steve & Bernard,
Thanks for the review comments.
I will do those formating changes.
Thanks,
Krishna.
On Wednesday, September 09/11/19, 2019 at 20:12:43 +0530, Steve Wise wrote:
> On Wed, Sep 11, 2019 at 4:38 AM Bernard Metzler <BMT@zurich.ibm.com> wrote:
> >
> > -----"Krishnamraju Eraparaju" <krishna2@chelsio.com> wrote: -----
> >
> > >To: "Sagi Grimberg" <sagi@grimberg.me>, "Steve Wise"
> > ><larrystevenwise@gmail.com>, "Bernard Metzler" <BMT@zurich.ibm.com>
> > >From: "Krishnamraju Eraparaju" <krishna2@chelsio.com>
> > >Date: 09/10/2019 09:22PM
> > >Cc: "linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>, "Jason
> > >Gunthorpe" <jgg@ziepe.ca>
> > >Subject: [EXTERNAL] Re: [PATCH v3] iwcm: don't hold the irq disabled
> > >lock on iw_rem_ref
> > >
> > >Please review the patch below; I will resubmit it as a patch series
> > >after review.
> > >- As the kref put handler (siw_free_qp) uses vfree(), iwcm can't
> > > call iw_rem_ref() with spinlocks held. Doing so can cause vfree()
> > > to sleep with irqs disabled.
> > > Two possible solutions:
> > > 1) With the spinlock acquired, take a copy of "cm_id_priv->qp" and
> > > update it to NULL. After releasing the lock, use the copied qp
> > > pointer for rem_ref().
> > > 2) Replace the issue-causing vmalloc()/vfree() with
> > > kmalloc()/kfree() in the siw driver.
> > >
> > > Solution 2 may not be ideal, as allocating huge contiguous memory
> > > for the SQ & RQ doesn't look appropriate.
> > >
> > >- The structure "siw_base_qp" is getting freed in siw_destroy_qp(),
> > > but if cm_close_handler() holds the last reference, then
> > > siw_free_qp(), via cm_close_handler(), tries to get the already
> > > freed "siw_base_qp" from "ib_qp".
> > > Hence, "siw_base_qp" should be freed at the end of siw_free_qp().
> > >
> >
> > Regarding the siw driver, I am fine with that proposed
> > change. Delaying freeing the base_qp is OK. In fact,
> > I'd expect the drivers will soon pass that responsibility
> > to the rdma core anyway -- like for the CQ/SRQ/PD/CTX objects,
> > which are already allocated and freed up there.
> >
> > The iwcm changes look OK to me as well.
> >
>
> Hey Krishna, since the iwcm struct/state is still correctly being
> manipulated under the lock, I think this patch is correct. Test
> the heck out of it. :)
>
> Steve.
* Re: Re: [PATCH v3] iwcm: don't hold the irq disabled lock on iw_rem_ref
2019-09-11 15:58 ` Krishnamraju Eraparaju
@ 2019-09-16 16:28 ` Jason Gunthorpe
2019-09-17 9:04 ` Bernard Metzler
1 sibling, 0 replies; 15+ messages in thread
From: Jason Gunthorpe @ 2019-09-16 16:28 UTC (permalink / raw)
To: Krishnamraju Eraparaju
Cc: Steve Wise, Bernard Metzler, Sagi Grimberg, linux-rdma
On Wed, Sep 11, 2019 at 09:28:16PM +0530, Krishnamraju Eraparaju wrote:
> Hi Steve & Bernard,
>
> Thanks for the review comments.
> I will do those formating changes.
I don't see anything in patchworks, but the consensus is to drop
Sagi's patch pending this future patch?
Jason
* Re: Re: Re: [PATCH v3] iwcm: don't hold the irq disabled lock on iw_rem_ref
2019-09-11 15:58 ` Krishnamraju Eraparaju
2019-09-16 16:28 ` Jason Gunthorpe
@ 2019-09-17 9:04 ` Bernard Metzler
2019-09-17 12:47 ` Krishnamraju Eraparaju
` (2 more replies)
1 sibling, 3 replies; 15+ messages in thread
From: Bernard Metzler @ 2019-09-17 9:04 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Krishnamraju Eraparaju, Steve Wise, Sagi Grimberg, linux-rdma
-----"Jason Gunthorpe" <jgg@ziepe.ca> wrote: -----
>To: "Krishnamraju Eraparaju" <krishna2@chelsio.com>
>From: "Jason Gunthorpe" <jgg@ziepe.ca>
>Date: 09/16/2019 06:28PM
>Cc: "Steve Wise" <larrystevenwise@gmail.com>, "Bernard Metzler"
><BMT@zurich.ibm.com>, "Sagi Grimberg" <sagi@grimberg.me>,
>"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
>Subject: [EXTERNAL] Re: Re: [PATCH v3] iwcm: don't hold the irq
>disabled lock on iw_rem_ref
>
>On Wed, Sep 11, 2019 at 09:28:16PM +0530, Krishnamraju Eraparaju
>wrote:
>> Hi Steve & Bernard,
>>
>> Thanks for the review comments.
>> I will do those formatting changes.
>
>I don't see anything in patchworks, but the consensus is to drop
>Sagi's patch pending this future patch?
>
>Jason
>
This is my impression as well. But consensus should be
explicit...Sagi, what do you think?
Best regards,
Bernard.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Re: Re: [PATCH v3] iwcm: don't hold the irq disabled lock on iw_rem_ref
2019-09-17 9:04 ` Bernard Metzler
@ 2019-09-17 12:47 ` Krishnamraju Eraparaju
2019-09-17 13:40 ` Bernard Metzler
2019-09-17 17:20 ` Sagi Grimberg
2 siblings, 0 replies; 15+ messages in thread
From: Krishnamraju Eraparaju @ 2019-09-17 12:47 UTC (permalink / raw)
To: Bernard Metzler, Steve Wise
Cc: Jason Gunthorpe, Sagi Grimberg, linux-rdma, Nirranjan Kirubaharan
On Tuesday, September 09/17/19, 2019 at 14:34:24 +0530, Bernard Metzler wrote:
> -----"Jason Gunthorpe" <jgg@ziepe.ca> wrote: -----
>
> >To: "Krishnamraju Eraparaju" <krishna2@chelsio.com>
> >From: "Jason Gunthorpe" <jgg@ziepe.ca>
> >Date: 09/16/2019 06:28PM
> >Cc: "Steve Wise" <larrystevenwise@gmail.com>, "Bernard Metzler"
> ><BMT@zurich.ibm.com>, "Sagi Grimberg" <sagi@grimberg.me>,
> >"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
> >Subject: [EXTERNAL] Re: Re: [PATCH v3] iwcm: don't hold the irq
> >disabled lock on iw_rem_ref
> >
> >On Wed, Sep 11, 2019 at 09:28:16PM +0530, Krishnamraju Eraparaju
> >wrote:
> >> Hi Steve & Bernard,
> >>
> >> Thanks for the review comments.
> >> I will do those formatting changes.
> >
> >I don't see anything in patchworks, but the consensus is to drop
> >Sagi's patch pending this future patch?
> >
> >Jason
> >
> This is my impression as well. But consensus should be
> explicit...Sagi, what do you think?
>
> Best regards,
> Bernard.
>
While testing iSER (with my proposed patch applied), I see the Chelsio
iWARP driver hitting the deadlock below. This is due to the iw_rem_ref
reordering changes in IWCM.
Bernard, how about replacing vmalloc/vfree with kmalloc/kfree,
so that the siw QP resources can be freed with the spinlock held?
That would fix the original vfree issue less invasively.
Steve, any suggestions?
[ 1230.161871] INFO: task kworker/u12:0:11291 blocked for more than 122
seconds.
[ 1230.162147] Not tainted 5.3.0-rc5+ #19
[ 1230.162417] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 1230.162911] kworker/u12:0 D13000 11291 2 0x80004080
[ 1230.163186] Workqueue: iw_cm_wq cm_work_handler
[ 1230.163456] Call Trace:
[ 1230.163718] ? __schedule+0x297/0x510
[ 1230.163986] schedule+0x2e/0x90
[ 1230.164253] schedule_timeout+0x1c0/0x280
[ 1230.164520] ? xas_store+0x23e/0x500
[ 1230.164789] wait_for_completion+0xa2/0x110
[ 1230.165067] ? wake_up_q+0x70/0x70
[ 1230.165336] c4iw_destroy_qp+0x141/0x260 [iw_cxgb4]
[ 1230.165611] ? xas_store+0x23e/0x500
[ 1230.165893] ? _cond_resched+0x10/0x20
[ 1230.166160] ? wait_for_completion+0x2e/0x110
[ 1230.166432] ib_destroy_qp_user+0x142/0x230
[ 1230.166699] rdma_destroy_qp+0x1f/0x40
[ 1230.166966] iser_free_ib_conn_res+0x52/0x190 [ib_iser]
[ 1230.167241] iser_cleanup_handler.isra.15+0x32/0x60 [ib_iser]
[ 1230.167510] iser_cma_handler+0x23b/0x730 [ib_iser]
[ 1230.167776] cma_iw_handler+0x154/0x1e0
[ 1230.168037] cm_work_handler+0xb4c/0xd60
[ 1230.168302] process_one_work+0x155/0x380
[ 1230.168564] worker_thread+0x41/0x3b0
[ 1230.168827] kthread+0xf3/0x130
[ 1230.169086] ? process_one_work+0x380/0x380
[ 1230.169350] ? kthread_bind+0x10/0x10
[ 1230.169615] ret_from_fork+0x35/0x40
[ 1230.169885] NMI backtrace for cpu 3
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Re: Re: Re: [PATCH v3] iwcm: don't hold the irq disabled lock on iw_rem_ref
2019-09-17 9:04 ` Bernard Metzler
2019-09-17 12:47 ` Krishnamraju Eraparaju
@ 2019-09-17 13:40 ` Bernard Metzler
2019-09-17 17:20 ` Sagi Grimberg
2 siblings, 0 replies; 15+ messages in thread
From: Bernard Metzler @ 2019-09-17 13:40 UTC (permalink / raw)
To: Krishnamraju Eraparaju
Cc: Steve Wise, Jason Gunthorpe, Sagi Grimberg, linux-rdma,
Nirranjan Kirubaharan
-----"Krishnamraju Eraparaju" <krishna2@chelsio.com> wrote: -----
>To: "Bernard Metzler" <BMT@zurich.ibm.com>, "Steve Wise"
><larrystevenwise@gmail.com>
>From: "Krishnamraju Eraparaju" <krishna2@chelsio.com>
>Date: 09/17/2019 02:48PM
>Cc: "Jason Gunthorpe" <jgg@ziepe.ca>, "Sagi Grimberg"
><sagi@grimberg.me>, "linux-rdma@vger.kernel.org"
><linux-rdma@vger.kernel.org>, "Nirranjan Kirubaharan"
><nirranjan@chelsio.com>
>Subject: [EXTERNAL] Re: Re: Re: [PATCH v3] iwcm: don't hold the irq
>disabled lock on iw_rem_ref
>
>On Tuesday, September 09/17/19, 2019 at 14:34:24 +0530, Bernard
>Metzler wrote:
>> -----"Jason Gunthorpe" <jgg@ziepe.ca> wrote: -----
>>
>> >To: "Krishnamraju Eraparaju" <krishna2@chelsio.com>
>> >From: "Jason Gunthorpe" <jgg@ziepe.ca>
>> >Date: 09/16/2019 06:28PM
>> >Cc: "Steve Wise" <larrystevenwise@gmail.com>, "Bernard Metzler"
>> ><BMT@zurich.ibm.com>, "Sagi Grimberg" <sagi@grimberg.me>,
>> >"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
>> >Subject: [EXTERNAL] Re: Re: [PATCH v3] iwcm: don't hold the irq
>> >disabled lock on iw_rem_ref
>> >
>> >On Wed, Sep 11, 2019 at 09:28:16PM +0530, Krishnamraju Eraparaju
>> >wrote:
>> >> Hi Steve & Bernard,
>> >>
>> >> Thanks for the review comments.
>> >> I will do those formatting changes.
>> >
>> >I don't see anything in patchworks, but the consensus is to drop
>> >Sagi's patch pending this future patch?
>> >
>> >Jason
>> >
>> This is my impression as well. But consensus should be
>> explicit...Sagi, what do you think?
>>
>> Best regards,
>> Bernard.
>>
>While testing iSER(with my proposed patch applied) I see Chelsio
>iwarp
>driver is hitting the below deadlock issue. This is due to iw_rem_ref
>reordering changes in IWCM.
>
>Bernard, how about replacing vmalloc/vfree with kmalloc/kfree,
>such that freeing of SIW qp resources can be done with spinlocks
>held?
>to fix the original vfree issue less invasively..
Well, I'd really like to avoid kmalloc on potentially
large data structures when there is no need to have
physically contiguous memory.
I could of course move that vfree out to a worker.
Simple, but not really nice though.
So it seems restructuring the Chelsio driver would not be
a good option?
Thanks
Bernard.
>
>Steve, any suggestions?
>
>
>[ 1230.161871] INFO: task kworker/u12:0:11291 blocked for more than
>122
>seconds.
>[ 1230.162147] Not tainted 5.3.0-rc5+ #19
>[ 1230.162417] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>disables this message.
>[ 1230.162911] kworker/u12:0 D13000 11291 2 0x80004080
>[ 1230.163186] Workqueue: iw_cm_wq cm_work_handler
>[ 1230.163456] Call Trace:
>[ 1230.163718] ? __schedule+0x297/0x510
>[ 1230.163986] schedule+0x2e/0x90
>[ 1230.164253] schedule_timeout+0x1c0/0x280
>[ 1230.164520] ? xas_store+0x23e/0x500
>[ 1230.164789] wait_for_completion+0xa2/0x110
>[ 1230.165067] ? wake_up_q+0x70/0x70
>[ 1230.165336] c4iw_destroy_qp+0x141/0x260 [iw_cxgb4]
>[ 1230.165611] ? xas_store+0x23e/0x500
>[ 1230.165893] ? _cond_resched+0x10/0x20
>[ 1230.166160] ? wait_for_completion+0x2e/0x110
>[ 1230.166432] ib_destroy_qp_user+0x142/0x230
>[ 1230.166699] rdma_destroy_qp+0x1f/0x40
>[ 1230.166966] iser_free_ib_conn_res+0x52/0x190 [ib_iser]
>[ 1230.167241] iser_cleanup_handler.isra.15+0x32/0x60 [ib_iser]
>[ 1230.167510] iser_cma_handler+0x23b/0x730 [ib_iser]
>[ 1230.167776] cma_iw_handler+0x154/0x1e0
>[ 1230.168037] cm_work_handler+0xb4c/0xd60
>[ 1230.168302] process_one_work+0x155/0x380
>[ 1230.168564] worker_thread+0x41/0x3b0
>[ 1230.168827] kthread+0xf3/0x130
>[ 1230.169086] ? process_one_work+0x380/0x380
>[ 1230.169350] ? kthread_bind+0x10/0x10
>[ 1230.169615] ret_from_fork+0x35/0x40
>[ 1230.169885] NMI backtrace for cpu 3
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] iwcm: don't hold the irq disabled lock on iw_rem_ref
2019-09-17 9:04 ` Bernard Metzler
2019-09-17 12:47 ` Krishnamraju Eraparaju
2019-09-17 13:40 ` Bernard Metzler
@ 2019-09-17 17:20 ` Sagi Grimberg
2019-09-18 13:43 ` Krishnamraju Eraparaju
2 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2019-09-17 17:20 UTC (permalink / raw)
To: Bernard Metzler, Jason Gunthorpe
Cc: Krishnamraju Eraparaju, Steve Wise, linux-rdma
>> To: "Krishnamraju Eraparaju" <krishna2@chelsio.com>
>> From: "Jason Gunthorpe" <jgg@ziepe.ca>
>> Date: 09/16/2019 06:28PM
>> Cc: "Steve Wise" <larrystevenwise@gmail.com>, "Bernard Metzler"
>> <BMT@zurich.ibm.com>, "Sagi Grimberg" <sagi@grimberg.me>,
>> "linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
>> Subject: [EXTERNAL] Re: Re: [PATCH v3] iwcm: don't hold the irq
>> disabled lock on iw_rem_ref
>>
>> On Wed, Sep 11, 2019 at 09:28:16PM +0530, Krishnamraju Eraparaju
>> wrote:
>>> Hi Steve & Bernard,
>>>
>>> Thanks for the review comments.
>>> I will do those formatting changes.
>>
>> I don't see anything in patchworks, but the consensus is to drop
>> Sagi's patch pending this future patch?
>>
>> Jason
>>
> This is my impression as well. But consensus should be
> explicit...Sagi, what do you think?
I don't really care, but given that the changes from Krishnamraju cause
other problems, I'd ask whether my version also breaks his test.
In general, I do not think that calling resource free routines (whether
explicitly or implicitly via a ref decrement) under a spinlock is a
robust design.
I would first make it clear and documented what cm_id_priv->lock is
protecting. In my mind, it should protect *its own* mutations of
cm_id_priv and by design leave all the ops calls outside the lock.
I don't understand what is causing the observed Chelsio issue, but
it looks like c4iw_destroy_qp blocks on a completion that depends on a
refcount that is also taken by iwcm, which means that I cannot call
ib_destroy_qp unless the cm is destroyed as well?
If that is mandatory, I'd say that instead of blocking on this
completion, we can simply convert c4iw_qp_rem_ref to use a kref,
which is not order dependent.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] iwcm: don't hold the irq disabled lock on iw_rem_ref
2019-09-17 17:20 ` Sagi Grimberg
@ 2019-09-18 13:43 ` Krishnamraju Eraparaju
2019-09-21 18:35 ` Krishnamraju Eraparaju
0 siblings, 1 reply; 15+ messages in thread
From: Krishnamraju Eraparaju @ 2019-09-18 13:43 UTC (permalink / raw)
To: Sagi Grimberg, Bernard Metzler
Cc: Jason Gunthorpe, Steve Wise, linux-rdma, Nirranjan Kirubaharan
On Tuesday, September 09/17/19, 2019 at 22:50:27 +0530, Sagi Grimberg wrote:
>
> >> To: "Krishnamraju Eraparaju" <krishna2@chelsio.com>
> >> From: "Jason Gunthorpe" <jgg@ziepe.ca>
> >> Date: 09/16/2019 06:28PM
> >> Cc: "Steve Wise" <larrystevenwise@gmail.com>, "Bernard Metzler"
> >> <BMT@zurich.ibm.com>, "Sagi Grimberg" <sagi@grimberg.me>,
> >> "linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
> >> Subject: [EXTERNAL] Re: Re: [PATCH v3] iwcm: don't hold the irq
> >> disabled lock on iw_rem_ref
> >>
> >> On Wed, Sep 11, 2019 at 09:28:16PM +0530, Krishnamraju Eraparaju
> >> wrote:
> >>> Hi Steve & Bernard,
> >>>
> >>> Thanks for the review comments.
> >>> I will do those formatting changes.
> >>
> >> I don't see anything in patchworks, but the consensus is to drop
> >> Sagi's patch pending this future patch?
> >>
> >> Jason
> >>
> > This is my impression as well. But consensus should be
> > explicit...Sagi, what do you think?
>
> I don't really care, but given the changes from Krishnamraju cause other
> problems I'd ask if my version is also offending his test.
Hi Sagi,
Your version holds good for rdma_destroy_id() path only, but there are
other places where iw_rem_ref() is being called within spinlocks, here is
the call trace when iw_rem_ref() is called in cm_close_handler() path:
[ 627.716339] Call Trace:
[ 627.716339] ? load_new_mm_cr3+0xc0/0xc0
[ 627.716339] on_each_cpu+0x23/0x40
[ 627.716339] flush_tlb_kernel_range+0x74/0x80
[ 627.716340] free_unmap_vmap_area+0x2a/0x40
[ 627.716340] remove_vm_area+0x59/0x60
[ 627.716340] __vunmap+0x46/0x210
[ 627.716340] siw_free_qp+0x88/0x120 [siw]
[ 627.716340] cm_work_handler+0x5b8/0xd00 <=====
[ 627.716340] process_one_work+0x155/0x380
[ 627.716341] worker_thread+0x41/0x3b0
[ 627.716341] kthread+0xf3/0x130
[ 627.716341] ? process_one_work+0x380/0x380
[ 627.716341] ? kthread_bind+0x10/0x10
[ 627.716341] ret_from_fork+0x35/0x40
Hence, I moved all the occurrences of iw_rem_ref() outside of the
spinlock critical sections.
>
> In general, I do not think that calling resource free routines (whether
> explicitly or implicitly via a ref decrement) under a spinlock is a
> robust design.
>
> I would first make it clear and documented what cm_id_priv->lock is
> protecting. In my mind, it should protect *its own* mutations of
> cm_id_priv and by design leave all the ops calls outside the lock.
>
> I don't understand what is causing the Chelsio issue observed, but
> it looks like c4iw_destroy_qp blocks on a completion that depends on a
> refcount that is taken also by iwcm, which means that I cannot call
> ib_destroy_qp if the cm is not destroyed as well?
>
> If that is mandatory, I'd say that instead of blocking on this
> completion, we can simply convert c4iw_qp_rem_ref to use a kref,
> which is not order dependent.
I agree.
I'm exploring the best possible ways to address this issue and will
post my final resolution by this Friday.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] iwcm: don't hold the irq disabled lock on iw_rem_ref
2019-09-18 13:43 ` Krishnamraju Eraparaju
@ 2019-09-21 18:35 ` Krishnamraju Eraparaju
2019-10-01 17:17 ` Krishnamraju Eraparaju
0 siblings, 1 reply; 15+ messages in thread
From: Krishnamraju Eraparaju @ 2019-09-21 18:35 UTC (permalink / raw)
To: Sagi Grimberg, Bernard Metzler
Cc: Jason Gunthorpe, Steve Wise, linux-rdma, Nirranjan Kirubaharan,
'Potnuri Bharat Teja'
On Wednesday, September 09/18/19, 2019 at 19:13:25 +0530, Krishnamraju Eraparaju wrote:
Hi,
I have restructured the iw_cxgb4 driver to support the proposed iwcm
changes. Issues addressed by this patch:
-all iw_rem_ref calls from iwcm are outside the spinlock critical
section.
-fixes the siw early freeing of siw_base_qp issue.
-ib_destroy_qp for iw_cxgb4 no longer blocks, even before the CM is
destroyed.
Currently we are testing these changes, and will submit the patch next week.
PATCH:
-----
diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
index 72141c5..7e1d834 100644
--- a/drivers/infiniband/core/iwcm.c
+++ b/drivers/infiniband/core/iwcm.c
@@ -372,6 +372,7 @@ int iw_cm_disconnect(struct iw_cm_id *cm_id, int abrupt)
static void destroy_cm_id(struct iw_cm_id *cm_id)
{
struct iwcm_id_private *cm_id_priv;
+ struct ib_qp *qp;
unsigned long flags;
cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
@@ -389,6 +390,9 @@ static void destroy_cm_id(struct iw_cm_id *cm_id)
set_bit(IWCM_F_DROP_EVENTS, &cm_id_priv->flags);
spin_lock_irqsave(&cm_id_priv->lock, flags);
+ qp = cm_id_priv->qp;
+ cm_id_priv->qp = NULL;
+
switch (cm_id_priv->state) {
case IW_CM_STATE_LISTEN:
cm_id_priv->state = IW_CM_STATE_DESTROYING;
@@ -401,7 +405,7 @@ static void destroy_cm_id(struct iw_cm_id *cm_id)
cm_id_priv->state = IW_CM_STATE_DESTROYING;
spin_unlock_irqrestore(&cm_id_priv->lock, flags);
/* Abrupt close of the connection */
- (void)iwcm_modify_qp_err(cm_id_priv->qp);
+ (void)iwcm_modify_qp_err(qp);
spin_lock_irqsave(&cm_id_priv->lock, flags);
break;
case IW_CM_STATE_IDLE:
@@ -426,11 +430,9 @@ static void destroy_cm_id(struct iw_cm_id *cm_id)
BUG();
break;
}
- if (cm_id_priv->qp) {
- cm_id_priv->id.device->ops.iw_rem_ref(cm_id_priv->qp);
- cm_id_priv->qp = NULL;
- }
spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+ if (qp)
+ cm_id_priv->id.device->ops.iw_rem_ref(qp);
if (cm_id->mapped) {
iwpm_remove_mapinfo(&cm_id->local_addr,
&cm_id->m_local_addr);
@@ -671,11 +673,11 @@ int iw_cm_accept(struct iw_cm_id *cm_id,
BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_RECV);
cm_id_priv->state = IW_CM_STATE_IDLE;
spin_lock_irqsave(&cm_id_priv->lock, flags);
- if (cm_id_priv->qp) {
- cm_id->device->ops.iw_rem_ref(qp);
- cm_id_priv->qp = NULL;
- }
+ qp = cm_id_priv->qp;
+ cm_id_priv->qp = NULL;
spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+ if (qp)
+ cm_id->device->ops.iw_rem_ref(qp);
clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
wake_up_all(&cm_id_priv->connect_wait);
}
@@ -696,7 +698,7 @@ int iw_cm_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *iw_param)
struct iwcm_id_private *cm_id_priv;
int ret;
unsigned long flags;
- struct ib_qp *qp;
+ struct ib_qp *qp = NULL;
cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
@@ -730,13 +732,13 @@ int iw_cm_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *iw_param)
return 0; /* success */
spin_lock_irqsave(&cm_id_priv->lock, flags);
- if (cm_id_priv->qp) {
- cm_id->device->ops.iw_rem_ref(qp);
- cm_id_priv->qp = NULL;
- }
+ qp = cm_id_priv->qp;
+ cm_id_priv->qp = NULL;
cm_id_priv->state = IW_CM_STATE_IDLE;
err:
spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+ if (qp)
+ cm_id->device->ops.iw_rem_ref(qp);
clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
wake_up_all(&cm_id_priv->connect_wait);
return ret;
@@ -878,6 +880,7 @@ static int cm_conn_est_handler(struct iwcm_id_private *cm_id_priv,
static int cm_conn_rep_handler(struct iwcm_id_private *cm_id_priv,
struct iw_cm_event *iw_event)
{
+ struct ib_qp *qp = NULL;
unsigned long flags;
int ret;
@@ -896,11 +899,13 @@ static int cm_conn_rep_handler(struct iwcm_id_private *cm_id_priv,
cm_id_priv->state = IW_CM_STATE_ESTABLISHED;
} else {
/* REJECTED or RESET */
- cm_id_priv->id.device->ops.iw_rem_ref(cm_id_priv->qp);
+ qp = cm_id_priv->qp;
cm_id_priv->qp = NULL;
cm_id_priv->state = IW_CM_STATE_IDLE;
}
spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+ if (qp)
+ cm_id_priv->id.device->ops.iw_rem_ref(qp);
ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, iw_event);
if (iw_event->private_data_len)
@@ -942,14 +947,13 @@ static void cm_disconnect_handler(struct iwcm_id_private *cm_id_priv,
static int cm_close_handler(struct iwcm_id_private *cm_id_priv,
struct iw_cm_event *iw_event)
{
+ struct ib_qp *qp;
unsigned long flags;
int ret = 0;
spin_lock_irqsave(&cm_id_priv->lock, flags);
+ qp = cm_id_priv->qp;
+ cm_id_priv->qp = NULL;
- if (cm_id_priv->qp) {
- cm_id_priv->id.device->ops.iw_rem_ref(cm_id_priv->qp);
- cm_id_priv->qp = NULL;
- }
switch (cm_id_priv->state) {
case IW_CM_STATE_ESTABLISHED:
case IW_CM_STATE_CLOSING:
@@ -965,6 +969,8 @@ static int cm_close_handler(struct iwcm_id_private *cm_id_priv,
}
spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+ if (qp)
+ cm_id_priv->id.device->ops.iw_rem_ref(qp);
return ret;
}
diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c
index a8b9548..330483d 100644
--- a/drivers/infiniband/hw/cxgb4/device.c
+++ b/drivers/infiniband/hw/cxgb4/device.c
@@ -747,6 +747,7 @@ void c4iw_release_dev_ucontext(struct c4iw_rdev *rdev,
struct list_head *pos, *nxt;
struct c4iw_qid_list *entry;
+ wait_event(uctx->wait, refcount_read(&uctx->refcnt) == 1);
mutex_lock(&uctx->lock);
list_for_each_safe(pos, nxt, &uctx->qpids) {
entry = list_entry(pos, struct c4iw_qid_list, entry);
@@ -775,6 +776,8 @@ void c4iw_init_dev_ucontext(struct c4iw_rdev *rdev,
INIT_LIST_HEAD(&uctx->qpids);
INIT_LIST_HEAD(&uctx->cqids);
mutex_init(&uctx->lock);
+ init_waitqueue_head(&uctx->wait);
+ refcount_set(&uctx->refcnt, 1);
}
/* Caller takes care of locking if needed */
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index 7d06b0f..441adc6 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -110,6 +110,8 @@ struct c4iw_dev_ucontext {
struct list_head cqids;
struct mutex lock;
struct kref kref;
+ wait_queue_head_t wait;
+ refcount_t refcnt;
};
enum c4iw_rdev_flags {
@@ -490,13 +492,13 @@ struct c4iw_qp {
struct t4_wq wq;
spinlock_t lock;
struct mutex mutex;
+ struct kref kref;
wait_queue_head_t wait;
int sq_sig_all;
struct c4iw_srq *srq;
+ struct work_struct free_work;
struct c4iw_ucontext *ucontext;
struct c4iw_wr_wait *wr_waitp;
- struct completion qp_rel_comp;
- refcount_t qp_refcnt;
};
static inline struct c4iw_qp *to_c4iw_qp(struct ib_qp *ibqp)
@@ -531,6 +533,7 @@ struct c4iw_ucontext {
spinlock_t mmap_lock;
struct list_head mmaps;
bool is_32b_cqe;
+ struct completion qp_comp;
};
static inline struct c4iw_ucontext *to_c4iw_ucontext(struct ib_ucontext *c)
diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
index eb9368b..ca61193 100644
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -889,17 +889,44 @@ static int build_inv_stag(union t4_wr *wqe, const struct ib_send_wr *wr,
return 0;
}
+static void free_qp_work(struct work_struct *work)
+{
+ struct c4iw_dev_ucontext *uctx;
+ struct c4iw_qp *qhp;
+ struct c4iw_dev *rhp;
+
+ qhp = container_of(work, struct c4iw_qp, free_work);
+ rhp = qhp->rhp;
+ uctx = qhp->ucontext ? &qhp->ucontext->uctx : &rhp->rdev.uctx;
+
+ pr_debug("qhp %p ucontext %p\n", qhp, qhp->ucontext);
+ destroy_qp(&rhp->rdev, &qhp->wq, uctx, !qhp->srq);
+
+ c4iw_put_wr_wait(qhp->wr_waitp);
+ kfree(qhp);
+ refcount_dec(&uctx->refcnt);
+ wake_up(&uctx->wait);
+}
+
+static void queue_qp_free(struct kref *kref)
+{
+ struct c4iw_qp *qhp;
+
+ qhp = container_of(kref, struct c4iw_qp, kref);
+ pr_debug("qhp %p\n", qhp);
+ queue_work(qhp->rhp->rdev.free_workq, &qhp->free_work);
+}
+
void c4iw_qp_add_ref(struct ib_qp *qp)
{
pr_debug("ib_qp %p\n", qp);
- refcount_inc(&to_c4iw_qp(qp)->qp_refcnt);
+ kref_get(&to_c4iw_qp(qp)->kref);
}
void c4iw_qp_rem_ref(struct ib_qp *qp)
{
pr_debug("ib_qp %p\n", qp);
- if (refcount_dec_and_test(&to_c4iw_qp(qp)->qp_refcnt))
- complete(&to_c4iw_qp(qp)->qp_rel_comp);
+ kref_put(&to_c4iw_qp(qp)->kref, queue_qp_free);
}
static void add_to_fc_list(struct list_head *head, struct list_head
*entry)
@@ -2071,12 +2098,10 @@ int c4iw_destroy_qp(struct ib_qp *ib_qp, struct ib_udata *udata)
{
struct c4iw_dev *rhp;
struct c4iw_qp *qhp;
- struct c4iw_ucontext *ucontext;
struct c4iw_qp_attributes attrs;
qhp = to_c4iw_qp(ib_qp);
rhp = qhp->rhp;
- ucontext = qhp->ucontext;
attrs.next_state = C4IW_QP_STATE_ERROR;
if (qhp->attr.state == C4IW_QP_STATE_TERMINATE)
@@ -2094,17 +2119,7 @@ int c4iw_destroy_qp(struct ib_qp *ib_qp, struct ib_udata *udata)
c4iw_qp_rem_ref(ib_qp);
- wait_for_completion(&qhp->qp_rel_comp);
-
pr_debug("ib_qp %p qpid 0x%0x\n", ib_qp, qhp->wq.sq.qid);
- pr_debug("qhp %p ucontext %p\n", qhp, ucontext);
-
- destroy_qp(&rhp->rdev, &qhp->wq,
- ucontext ? &ucontext->uctx : &rhp->rdev.uctx, !qhp->srq);
-
- c4iw_put_wr_wait(qhp->wr_waitp);
-
- kfree(qhp);
return 0;
}
@@ -2214,8 +2229,8 @@ struct ib_qp *c4iw_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *attrs,
spin_lock_init(&qhp->lock);
mutex_init(&qhp->mutex);
init_waitqueue_head(&qhp->wait);
- init_completion(&qhp->qp_rel_comp);
- refcount_set(&qhp->qp_refcnt, 1);
+ kref_init(&qhp->kref);
+ INIT_WORK(&qhp->free_work, free_qp_work);
ret = xa_insert_irq(&rhp->qps, qhp->wq.sq.qid, qhp, GFP_KERNEL);
if (ret)
@@ -2335,6 +2350,8 @@ struct ib_qp *c4iw_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *attrs,
if (attrs->srq)
qhp->srq = to_c4iw_srq(attrs->srq);
INIT_LIST_HEAD(&qhp->db_fc_entry);
+ refcount_inc(ucontext ? &ucontext->uctx.refcnt :
+ &rhp->rdev.uctx.refcnt);
pr_debug("sq id %u size %u memsize %zu num_entries %u rq id %u size %u memsize %zu num_entries %u\n",
qhp->wq.sq.qid, qhp->wq.sq.size, qhp->wq.sq.memsize,
attrs->cap.max_send_wr, qhp->wq.rq.qid, qhp->wq.rq.size,
diff --git a/drivers/infiniband/sw/siw/siw_qp.c b/drivers/infiniband/sw/siw/siw_qp.c
index 0990307..743d3d2 100644
--- a/drivers/infiniband/sw/siw/siw_qp.c
+++ b/drivers/infiniband/sw/siw/siw_qp.c
@@ -1305,6 +1305,7 @@ int siw_qp_add(struct siw_device *sdev, struct siw_qp *qp)
void siw_free_qp(struct kref *ref)
{
struct siw_qp *found, *qp = container_of(ref, struct siw_qp, ref);
+ struct siw_base_qp *siw_base_qp = to_siw_base_qp(qp->ib_qp);
struct siw_device *sdev = qp->sdev;
unsigned long flags;
@@ -1327,4 +1328,5 @@ void siw_free_qp(struct kref *ref)
atomic_dec(&sdev->num_qp);
siw_dbg_qp(qp, "free QP\n");
kfree_rcu(qp, rcu);
+ kfree(siw_base_qp);
}
diff --git a/drivers/infiniband/sw/siw/siw_verbs.c b/drivers/infiniband/sw/siw/siw_verbs.c
index 03176a3..35eee3f 100644
--- a/drivers/infiniband/sw/siw/siw_verbs.c
+++ b/drivers/infiniband/sw/siw/siw_verbs.c
@@ -605,7 +605,6 @@ int siw_verbs_modify_qp(struct ib_qp *base_qp, struct ib_qp_attr *attr,
int siw_destroy_qp(struct ib_qp *base_qp, struct ib_udata *udata)
{
struct siw_qp *qp = to_siw_qp(base_qp);
- struct siw_base_qp *siw_base_qp = to_siw_base_qp(base_qp);
struct siw_ucontext *uctx =
rdma_udata_to_drv_context(udata, struct siw_ucontext,
base_ucontext);
@@ -642,7 +641,6 @@ int siw_destroy_qp(struct ib_qp *base_qp, struct ib_udata *udata)
qp->scq = qp->rcq = NULL;
siw_qp_put(qp);
- kfree(siw_base_qp);
return 0;
}
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v3] iwcm: don't hold the irq disabled lock on iw_rem_ref
2019-09-21 18:35 ` Krishnamraju Eraparaju
@ 2019-10-01 17:17 ` Krishnamraju Eraparaju
0 siblings, 0 replies; 15+ messages in thread
From: Krishnamraju Eraparaju @ 2019-10-01 17:17 UTC (permalink / raw)
To: Sagi Grimberg, Bernard Metzler
Cc: Jason Gunthorpe, Steve Wise, linux-rdma, Nirranjan Kirubaharan,
'Potnuri Bharat Teja'
Submitted the final version of my proposed patch at:
https://patchwork.kernel.org/patch/11169293/
Please review.
Thanks,
Krishna.
On Sunday, September 09/22/19, 2019 at 00:05:08 +0530, Krishnamraju Eraparaju wrote:
> On Wednesday, September 09/18/19, 2019 at 19:13:25 +0530, Krishnamraju Eraparaju wrote:
> Hi,
>
> I have restructured iw_cxgb4 driver to support the proposed iwcm
> changes. Issues addressed by this patch:
> -all iw_rem_ref calls from iwcm are outside the spinlock critical
> section.
> -fixes siw early freeing of siw_base_qp issue.
> -the ib_destroy_qp for iw_cxgb4 does not block, even before CM getting
> destroyed.
> Currently we are testing these changes, will submit the patch next week.
>
> PATCH:
> -----
> diff --git a/drivers/infiniband/core/iwcm.c
> b/drivers/infiniband/core/iwcm.c
> index 72141c5..7e1d834 100644
> --- a/drivers/infiniband/core/iwcm.c
> +++ b/drivers/infiniband/core/iwcm.c
> @@ -372,6 +372,7 @@ int iw_cm_disconnect(struct iw_cm_id *cm_id, int
> abrupt)
> static void destroy_cm_id(struct iw_cm_id *cm_id)
> {
> struct iwcm_id_private *cm_id_priv;
> + struct ib_qp *qp;
> unsigned long flags;
>
> cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> @@ -389,6 +390,9 @@ static void destroy_cm_id(struct iw_cm_id *cm_id)
> set_bit(IWCM_F_DROP_EVENTS, &cm_id_priv->flags);
>
> spin_lock_irqsave(&cm_id_priv->lock, flags);
> + qp = cm_id_priv->qp;
> + cm_id_priv->qp = NULL;
> +
> switch (cm_id_priv->state) {
> case IW_CM_STATE_LISTEN:
> cm_id_priv->state = IW_CM_STATE_DESTROYING;
> @@ -401,7 +405,7 @@ static void destroy_cm_id(struct iw_cm_id *cm_id)
> cm_id_priv->state = IW_CM_STATE_DESTROYING;
> spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> /* Abrupt close of the connection */
> - (void)iwcm_modify_qp_err(cm_id_priv->qp);
> + (void)iwcm_modify_qp_err(qp);
> spin_lock_irqsave(&cm_id_priv->lock, flags);
> break;
> case IW_CM_STATE_IDLE:
> @@ -426,11 +430,9 @@ static void destroy_cm_id(struct iw_cm_id *cm_id)
> BUG();
> break;
> }
> - if (cm_id_priv->qp) {
> - cm_id_priv->id.device->ops.iw_rem_ref(cm_id_priv->qp);
> - cm_id_priv->qp = NULL;
> - }
> spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> + if (qp)
> + cm_id_priv->id.device->ops.iw_rem_ref(qp);
>
> if (cm_id->mapped) {
> iwpm_remove_mapinfo(&cm_id->local_addr,
> &cm_id->m_local_addr);
> @@ -671,11 +673,11 @@ int iw_cm_accept(struct iw_cm_id *cm_id,
> BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_RECV);
> cm_id_priv->state = IW_CM_STATE_IDLE;
> spin_lock_irqsave(&cm_id_priv->lock, flags);
> - if (cm_id_priv->qp) {
> - cm_id->device->ops.iw_rem_ref(qp);
> - cm_id_priv->qp = NULL;
> - }
> + qp = cm_id_priv->qp;
> + cm_id_priv->qp = NULL;
> spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> + if (qp)
> + cm_id->device->ops.iw_rem_ref(qp);
> clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
> wake_up_all(&cm_id_priv->connect_wait);
> }
> @@ -696,7 +698,7 @@ int iw_cm_connect(struct iw_cm_id *cm_id, struct
> iw_cm_conn_param *iw_param)
> struct iwcm_id_private *cm_id_priv;
> int ret;
> unsigned long flags;
> - struct ib_qp *qp;
> + struct ib_qp *qp = NULL;
>
> cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
>
> @@ -730,13 +732,13 @@ int iw_cm_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *iw_param)
> return 0; /* success */
>
> spin_lock_irqsave(&cm_id_priv->lock, flags);
> - if (cm_id_priv->qp) {
> - cm_id->device->ops.iw_rem_ref(qp);
> - cm_id_priv->qp = NULL;
> - }
> + qp = cm_id_priv->qp;
> + cm_id_priv->qp = NULL;
> cm_id_priv->state = IW_CM_STATE_IDLE;
> err:
> spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> + if (qp)
> + cm_id->device->ops.iw_rem_ref(qp);
> clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
> wake_up_all(&cm_id_priv->connect_wait);
> return ret;
> @@ -878,6 +880,7 @@ static int cm_conn_est_handler(struct iwcm_id_private *cm_id_priv,
> static int cm_conn_rep_handler(struct iwcm_id_private *cm_id_priv,
> struct iw_cm_event *iw_event)
> {
> + struct ib_qp *qp = NULL;
> unsigned long flags;
> int ret;
>
> @@ -896,11 +899,13 @@ static int cm_conn_rep_handler(struct iwcm_id_private *cm_id_priv,
> cm_id_priv->state = IW_CM_STATE_ESTABLISHED;
> } else {
> /* REJECTED or RESET */
> - cm_id_priv->id.device->ops.iw_rem_ref(cm_id_priv->qp);
> + qp = cm_id_priv->qp;
> cm_id_priv->qp = NULL;
> cm_id_priv->state = IW_CM_STATE_IDLE;
> }
> spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> + if (qp)
> + cm_id_priv->id.device->ops.iw_rem_ref(qp);
> ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, iw_event);
>
> if (iw_event->private_data_len)
> @@ -942,14 +947,13 @@ static void cm_disconnect_handler(struct iwcm_id_private *cm_id_priv,
> static int cm_close_handler(struct iwcm_id_private *cm_id_priv,
> struct iw_cm_event *iw_event)
> {
> + struct ib_qp *qp;
> unsigned long flags;
> int ret = 0;
> spin_lock_irqsave(&cm_id_priv->lock, flags);
> + qp = cm_id_priv->qp;
> + cm_id_priv->qp = NULL;
>
> - if (cm_id_priv->qp) {
> - cm_id_priv->id.device->ops.iw_rem_ref(cm_id_priv->qp);
> - cm_id_priv->qp = NULL;
> - }
> switch (cm_id_priv->state) {
> case IW_CM_STATE_ESTABLISHED:
> case IW_CM_STATE_CLOSING:
> @@ -965,6 +969,8 @@ static int cm_close_handler(struct iwcm_id_private *cm_id_priv,
> }
> spin_unlock_irqrestore(&cm_id_priv->lock, flags);
>
> + if (qp)
> + cm_id_priv->id.device->ops.iw_rem_ref(qp);
> return ret;
> }
>
> diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c
> index a8b9548..330483d 100644
> --- a/drivers/infiniband/hw/cxgb4/device.c
> +++ b/drivers/infiniband/hw/cxgb4/device.c
> @@ -747,6 +747,7 @@ void c4iw_release_dev_ucontext(struct c4iw_rdev *rdev,
> struct list_head *pos, *nxt;
> struct c4iw_qid_list *entry;
>
> + wait_event(uctx->wait, refcount_read(&uctx->refcnt) == 1);
> mutex_lock(&uctx->lock);
> list_for_each_safe(pos, nxt, &uctx->qpids) {
> entry = list_entry(pos, struct c4iw_qid_list, entry);
> @@ -775,6 +776,8 @@ void c4iw_init_dev_ucontext(struct c4iw_rdev *rdev,
> INIT_LIST_HEAD(&uctx->qpids);
> INIT_LIST_HEAD(&uctx->cqids);
> mutex_init(&uctx->lock);
> + init_waitqueue_head(&uctx->wait);
> + refcount_set(&uctx->refcnt, 1);
> }
>
> /* Caller takes care of locking if needed */
> diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
> index 7d06b0f..441adc6 100644
> --- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
> +++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
> @@ -110,6 +110,8 @@ struct c4iw_dev_ucontext {
> struct list_head cqids;
> struct mutex lock;
> struct kref kref;
> + wait_queue_head_t wait;
> + refcount_t refcnt;
> };
>
> enum c4iw_rdev_flags {
> @@ -490,13 +492,13 @@ struct c4iw_qp {
> struct t4_wq wq;
> spinlock_t lock;
> struct mutex mutex;
> + struct kref kref;
> wait_queue_head_t wait;
> int sq_sig_all;
> struct c4iw_srq *srq;
> + struct work_struct free_work;
> struct c4iw_ucontext *ucontext;
> struct c4iw_wr_wait *wr_waitp;
> - struct completion qp_rel_comp;
> - refcount_t qp_refcnt;
> };
>
> static inline struct c4iw_qp *to_c4iw_qp(struct ib_qp *ibqp)
> @@ -531,6 +533,7 @@ struct c4iw_ucontext {
> spinlock_t mmap_lock;
> struct list_head mmaps;
> bool is_32b_cqe;
> + struct completion qp_comp;
> };
>
> static inline struct c4iw_ucontext *to_c4iw_ucontext(struct ib_ucontext
> *c)
> diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
> index eb9368b..ca61193 100644
> --- a/drivers/infiniband/hw/cxgb4/qp.c
> +++ b/drivers/infiniband/hw/cxgb4/qp.c
> @@ -889,17 +889,44 @@ static int build_inv_stag(union t4_wr *wqe, const struct ib_send_wr *wr,
> return 0;
> }
>
> +static void free_qp_work(struct work_struct *work)
> +{
> + struct c4iw_dev_ucontext *uctx;
> + struct c4iw_qp *qhp;
> + struct c4iw_dev *rhp;
> +
> + qhp = container_of(work, struct c4iw_qp, free_work);
> + rhp = qhp->rhp;
> + uctx = qhp->ucontext ? &qhp->ucontext->uctx : &rhp->rdev.uctx;
> +
> + pr_debug("qhp %p ucontext %p\n", qhp, qhp->ucontext);
> + destroy_qp(&rhp->rdev, &qhp->wq, uctx, !qhp->srq);
> +
> + c4iw_put_wr_wait(qhp->wr_waitp);
> + kfree(qhp);
> + refcount_dec(&uctx->refcnt);
> + wake_up(&uctx->wait);
> +}
> +
> +static void queue_qp_free(struct kref *kref)
> +{
> + struct c4iw_qp *qhp;
> +
> + qhp = container_of(kref, struct c4iw_qp, kref);
> + pr_debug("qhp %p\n", qhp);
> + queue_work(qhp->rhp->rdev.free_workq, &qhp->free_work);
> +}
> +
> void c4iw_qp_add_ref(struct ib_qp *qp)
> {
> pr_debug("ib_qp %p\n", qp);
> - refcount_inc(&to_c4iw_qp(qp)->qp_refcnt);
> + kref_get(&to_c4iw_qp(qp)->kref);
> }
>
> void c4iw_qp_rem_ref(struct ib_qp *qp)
> {
> pr_debug("ib_qp %p\n", qp);
> - if (refcount_dec_and_test(&to_c4iw_qp(qp)->qp_refcnt))
> - complete(&to_c4iw_qp(qp)->qp_rel_comp);
> + kref_put(&to_c4iw_qp(qp)->kref, queue_qp_free);
> }
>
> static void add_to_fc_list(struct list_head *head, struct list_head
> *entry)
> @@ -2071,12 +2098,10 @@ int c4iw_destroy_qp(struct ib_qp *ib_qp, struct ib_udata *udata)
> {
> struct c4iw_dev *rhp;
> struct c4iw_qp *qhp;
> - struct c4iw_ucontext *ucontext;
> struct c4iw_qp_attributes attrs;
>
> qhp = to_c4iw_qp(ib_qp);
> rhp = qhp->rhp;
> - ucontext = qhp->ucontext;
>
> attrs.next_state = C4IW_QP_STATE_ERROR;
> if (qhp->attr.state == C4IW_QP_STATE_TERMINATE)
> @@ -2094,17 +2119,7 @@ int c4iw_destroy_qp(struct ib_qp *ib_qp, struct ib_udata *udata)
>
> c4iw_qp_rem_ref(ib_qp);
>
> - wait_for_completion(&qhp->qp_rel_comp);
> -
> pr_debug("ib_qp %p qpid 0x%0x\n", ib_qp, qhp->wq.sq.qid);
> - pr_debug("qhp %p ucontext %p\n", qhp, ucontext);
> -
> - destroy_qp(&rhp->rdev, &qhp->wq,
> - ucontext ? &ucontext->uctx : &rhp->rdev.uctx, !qhp->srq);
> -
> - c4iw_put_wr_wait(qhp->wr_waitp);
> -
> - kfree(qhp);
> return 0;
> }
>
> @@ -2214,8 +2229,8 @@ struct ib_qp *c4iw_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *attrs,
> spin_lock_init(&qhp->lock);
> mutex_init(&qhp->mutex);
> init_waitqueue_head(&qhp->wait);
> - init_completion(&qhp->qp_rel_comp);
> - refcount_set(&qhp->qp_refcnt, 1);
> + kref_init(&qhp->kref);
> + INIT_WORK(&qhp->free_work, free_qp_work);
>
> ret = xa_insert_irq(&rhp->qps, qhp->wq.sq.qid, qhp, GFP_KERNEL);
> if (ret)
> @@ -2335,6 +2350,8 @@ struct ib_qp *c4iw_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *attrs,
> if (attrs->srq)
> qhp->srq = to_c4iw_srq(attrs->srq);
> INIT_LIST_HEAD(&qhp->db_fc_entry);
> + refcount_inc(ucontext ? &ucontext->uctx.refcnt :
> + &rhp->rdev.uctx.refcnt);
> pr_debug("sq id %u size %u memsize %zu num_entries %u rq id %u
> size %u memsize %zu num_entries %u\n",
> qhp->wq.sq.qid, qhp->wq.sq.size, qhp->wq.sq.memsize,
> attrs->cap.max_send_wr, qhp->wq.rq.qid,
> qhp->wq.rq.size,
> diff --git a/drivers/infiniband/sw/siw/siw_qp.c b/drivers/infiniband/sw/siw/siw_qp.c
> index 0990307..743d3d2 100644
> --- a/drivers/infiniband/sw/siw/siw_qp.c
> +++ b/drivers/infiniband/sw/siw/siw_qp.c
> @@ -1305,6 +1305,7 @@ int siw_qp_add(struct siw_device *sdev, struct siw_qp *qp)
> void siw_free_qp(struct kref *ref)
> {
> struct siw_qp *found, *qp = container_of(ref, struct siw_qp, ref);
> + struct siw_base_qp *siw_base_qp = to_siw_base_qp(qp->ib_qp);
> struct siw_device *sdev = qp->sdev;
> unsigned long flags;
>
> @@ -1327,4 +1328,5 @@ void siw_free_qp(struct kref *ref)
> atomic_dec(&sdev->num_qp);
> siw_dbg_qp(qp, "free QP\n");
> kfree_rcu(qp, rcu);
> + kfree(siw_base_qp);
> }
> diff --git a/drivers/infiniband/sw/siw/siw_verbs.c b/drivers/infiniband/sw/siw/siw_verbs.c
> index 03176a3..35eee3f 100644
> --- a/drivers/infiniband/sw/siw/siw_verbs.c
> +++ b/drivers/infiniband/sw/siw/siw_verbs.c
> @@ -605,7 +605,6 @@ int siw_verbs_modify_qp(struct ib_qp *base_qp, struct ib_qp_attr *attr,
> int siw_destroy_qp(struct ib_qp *base_qp, struct ib_udata *udata)
> {
> struct siw_qp *qp = to_siw_qp(base_qp);
> - struct siw_base_qp *siw_base_qp = to_siw_base_qp(base_qp);
> struct siw_ucontext *uctx =
> rdma_udata_to_drv_context(udata, struct siw_ucontext,
> base_ucontext);
> @@ -642,7 +641,6 @@ int siw_destroy_qp(struct ib_qp *base_qp, struct ib_udata *udata)
> qp->scq = qp->rcq = NULL;
>
> siw_qp_put(qp);
> - kfree(siw_base_qp);
>
> return 0;
> }
>
> > On Tuesday, September 17, 2019 at 22:50:27 +0530, Sagi Grimberg wrote:
> > >
> > > >> To: "Krishnamraju Eraparaju" <krishna2@chelsio.com>
> > > >> From: "Jason Gunthorpe" <jgg@ziepe.ca>
> > > >> Date: 09/16/2019 06:28PM
> > > >> Cc: "Steve Wise" <larrystevenwise@gmail.com>, "Bernard Metzler"
> > > >> <BMT@zurich.ibm.com>, "Sagi Grimberg" <sagi@grimberg.me>,
> > > >> "linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
> > > >> Subject: [EXTERNAL] Re: Re: [PATCH v3] iwcm: don't hold the irq
> > > >> disabled lock on iw_rem_ref
> > > >>
> > > >> On Wed, Sep 11, 2019 at 09:28:16PM +0530, Krishnamraju Eraparaju
> > > >> wrote:
> > > >>> Hi Steve & Bernard,
> > > >>>
> > > >>> Thanks for the review comments.
> > > >> I will do those formatting changes.
> > > >>
> > > >> I don't see anything in patchworks, but the consensus is to drop
> > > >> Sagi's patch pending this future patch?
> > > >>
> > > >> Jason
> > > >>
> > > > This is my impression as well. But consensus should be
> > > > explicit... Sagi, what do you think?
> > >
> > > I don't really care, but given that the changes from Krishnamraju cause
> > > other problems, I'd ask whether my version also fails in his test.
> > Hi Sagi,
> > Your version holds good for rdma_destroy_id() path only, but there are
> > other places where iw_rem_ref() is being called within spinlocks, here is
> > the call trace when iw_rem_ref() is called in cm_close_handler() path:
> > [ 627.716339] Call Trace:
> > [ 627.716339] ? load_new_mm_cr3+0xc0/0xc0
> > [ 627.716339] on_each_cpu+0x23/0x40
> > [ 627.716339] flush_tlb_kernel_range+0x74/0x80
> > [ 627.716340] free_unmap_vmap_area+0x2a/0x40
> > [ 627.716340] remove_vm_area+0x59/0x60
> > [ 627.716340] __vunmap+0x46/0x210
> > [ 627.716340] siw_free_qp+0x88/0x120 [siw]
> > [ 627.716340] cm_work_handler+0x5b8/0xd00 <=====
> > [ 627.716340] process_one_work+0x155/0x380
> > [ 627.716341] worker_thread+0x41/0x3b0
> > [ 627.716341] kthread+0xf3/0x130
> > [ 627.716341] ? process_one_work+0x380/0x380
> > [ 627.716341] ? kthread_bind+0x10/0x10
> > [ 627.716341] ret_from_fork+0x35/0x40
> >
> > Hence, I moved all the occurrences of iw_rem_ref() outside of the
> > spinlock critical sections.
> > >
> > > In general, I do not think that calling resource-freeing routines (whether
> > > explicitly, or implicitly via a ref dec) under a spinlock is a robust
> > > design.
> > >
> > > I would first make it clear and document what cm_id_priv->lock is
> > > protecting. In my mind, it should protect *its own* mutations of
> > > cm_id_priv and, by design, leave all the ops calls outside the lock.
> > >
> > > I don't understand what is causing the observed Chelsio issue, but it
> > > looks like c4iw_destroy_qp blocks on a completion that depends on a
> > > refcount that is also taken by iwcm, which means that I cannot call
> > > ib_destroy_qp if the cm is not destroyed as well?
> > >
> > > If that is mandatory, I'd say that instead of blocking on this
> > > completion, we can simply convert c4iw_qp_rem_ref to use a kref,
> > > which is not order dependent.
> >
> > I agree,
> > I'm exploring the best possible ways to address this issue, and will share
> > my final resolution by this Friday.
Thread overview: 15+ messages
2019-09-04 21:25 [PATCH v3] iwcm: don't hold the irq disabled lock on iw_rem_ref Sagi Grimberg
2019-09-10 11:18 ` Krishnamraju Eraparaju
2019-09-10 16:53 ` Sagi Grimberg
2019-09-10 19:21 ` Krishnamraju Eraparaju
2019-09-11 9:38 ` Bernard Metzler
2019-09-11 14:42 ` Steve Wise
2019-09-11 15:58 ` Krishnamraju Eraparaju
2019-09-16 16:28 ` Jason Gunthorpe
2019-09-17 9:04 ` Bernard Metzler
2019-09-17 12:47 ` Krishnamraju Eraparaju
2019-09-17 13:40 ` Bernard Metzler
2019-09-17 17:20 ` Sagi Grimberg
2019-09-18 13:43 ` Krishnamraju Eraparaju
2019-09-21 18:35 ` Krishnamraju Eraparaju
2019-10-01 17:17 ` Krishnamraju Eraparaju