* [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
@ 2021-04-18 13:37 Leon Romanovsky
2021-04-20 12:39 ` Jason Gunthorpe
0 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2021-04-18 13:37 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe; +Cc: Shay Drory, linux-rdma
From: Shay Drory <shayd@nvidia.com>
Currently, in case of QP, the following use-after-free is possible:
cpu0 cpu1
---- ----
res_get_common_dumpit()
rdma_restrack_get()
fill_res_qp_entry()
ib_destroy_qp_user()
rdma_restrack_del()
qp->device->ops.destroy_qp()
ib_query_qp()
qp->device->ops.query_qp()
--> use-after-free-qp
This is because rdma_restrack_del(), in case of QP, isn't waiting until
all users are gone.
Fix it by making rdma_restrack_del() wait until all users are gone for
QPs as well.
Fixes: 13ef5539def7 ("RDMA/restrack: Count references to the verbs objects")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/core/restrack.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
index ffabaf327242..def0c5b0efe9 100644
--- a/drivers/infiniband/core/restrack.c
+++ b/drivers/infiniband/core/restrack.c
@@ -340,7 +340,7 @@ void rdma_restrack_del(struct rdma_restrack_entry *res)
rt = &dev->res[res->type];
old = xa_erase(&rt->xa, res->id);
- if (res->type == RDMA_RESTRACK_MR || res->type == RDMA_RESTRACK_QP)
+ if (res->type == RDMA_RESTRACK_MR)
return;
WARN_ON(old != res);
--
2.30.2
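For readers outside the kernel tree, the protocol this patch relies on can be modeled in plain userspace C. The following is a minimal, single-threaded sketch of the restrack get/put/wait idea, not the kernel implementation: the names entry_get()/entry_put()/entry_del() are invented for illustration, and the real code uses kref and wait_for_completion() to block until concurrent netlink dumpers drop their references.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a restrack entry: an integer stands in for res->kref,
 * a flag stands in for completion(&res->comp). */
struct entry {
	int refs;        /* creation reference + one per active dumper */
	bool completed;  /* set when the last reference is dropped */
};

static void entry_init(struct entry *e)
{
	e->refs = 1;            /* the creation reference */
	e->completed = false;
}

/* Models rdma_restrack_get(): a dumper tries to take a reference. */
static bool entry_get(struct entry *e)
{
	if (e->refs == 0)
		return false;   /* entry already torn down */
	e->refs++;
	return true;
}

/* Models rdma_restrack_put(): the last put signals the completion. */
static void entry_put(struct entry *e)
{
	if (--e->refs == 0)
		e->completed = true;
}

/* Models the fixed rdma_restrack_del(): drop the creation reference;
 * in the kernel this is followed by wait_for_completion(), so teardown
 * cannot proceed while a dumper still holds a reference.  Here we just
 * report whether all users are gone. */
static bool entry_del(struct entry *e)
{
	entry_put(e);
	return e->completed;
}
```

With this model, the use-after-free scenario from the commit message corresponds to freeing the object while entry_del() has not yet "completed" because a dumper still holds a reference.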
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
2021-04-18 13:37 [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone Leon Romanovsky
@ 2021-04-20 12:39 ` Jason Gunthorpe
2021-04-20 13:06 ` Leon Romanovsky
0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2021-04-20 12:39 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: Doug Ledford, Shay Drory, linux-rdma
On Sun, Apr 18, 2021 at 04:37:35PM +0300, Leon Romanovsky wrote:
> From: Shay Drory <shayd@nvidia.com>
>
> Currently, in case of QP, the following use-after-free is possible:
>
> cpu0 cpu1
> ---- ----
> res_get_common_dumpit()
> rdma_restrack_get()
> fill_res_qp_entry()
> ib_destroy_qp_user()
> rdma_restrack_del()
> qp->device->ops.destroy_qp()
> ib_query_qp()
> qp->device->ops.query_qp()
> --> use-after-free-qp
>
> This is because rdma_restrack_del(), in case of QP, isn't waiting until
> all users are gone.
>
> Fix it by making rdma_restrack_del() wait until all users are gone for
> QPs as well.
>
> Fixes: 13ef5539def7 ("RDMA/restrack: Count references to the verbs objects")
> Signed-off-by: Shay Drory <shayd@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
> drivers/infiniband/core/restrack.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
> index ffabaf327242..def0c5b0efe9 100644
> --- a/drivers/infiniband/core/restrack.c
> +++ b/drivers/infiniband/core/restrack.c
> @@ -340,7 +340,7 @@ void rdma_restrack_del(struct rdma_restrack_entry *res)
> rt = &dev->res[res->type];
>
> old = xa_erase(&rt->xa, res->id);
> - if (res->type == RDMA_RESTRACK_MR || res->type == RDMA_RESTRACK_QP)
> + if (res->type == RDMA_RESTRACK_MR)
> return;
Why is MR skipping this?
It also calls into the driver under its dumpit; at the very least this
needs a comment.
Jason
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
2021-04-20 12:39 ` Jason Gunthorpe
@ 2021-04-20 13:06 ` Leon Romanovsky
2021-04-20 15:25 ` Jason Gunthorpe
0 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2021-04-20 13:06 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, Shay Drory, linux-rdma
On Tue, Apr 20, 2021 at 09:39:50AM -0300, Jason Gunthorpe wrote:
> On Sun, Apr 18, 2021 at 04:37:35PM +0300, Leon Romanovsky wrote:
> > From: Shay Drory <shayd@nvidia.com>
> >
> > Currently, in case of QP, the following use-after-free is possible:
> >
> > cpu0 cpu1
> > ---- ----
> > res_get_common_dumpit()
> > rdma_restrack_get()
> > fill_res_qp_entry()
> > ib_destroy_qp_user()
> > rdma_restrack_del()
> > qp->device->ops.destroy_qp()
> > ib_query_qp()
> > qp->device->ops.query_qp()
> > --> use-after-free-qp
> >
> > This is because rdma_restrack_del(), in case of QP, isn't waiting until
> > all users are gone.
> >
> > Fix it by making rdma_restrack_del() wait until all users are gone for
> > QPs as well.
> >
> > Fixes: 13ef5539def7 ("RDMA/restrack: Count references to the verbs objects")
> > Signed-off-by: Shay Drory <shayd@nvidia.com>
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> > drivers/infiniband/core/restrack.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
> > index ffabaf327242..def0c5b0efe9 100644
> > --- a/drivers/infiniband/core/restrack.c
> > +++ b/drivers/infiniband/core/restrack.c
> > @@ -340,7 +340,7 @@ void rdma_restrack_del(struct rdma_restrack_entry *res)
> > rt = &dev->res[res->type];
> >
> > old = xa_erase(&rt->xa, res->id);
> > - if (res->type == RDMA_RESTRACK_MR || res->type == RDMA_RESTRACK_QP)
> > + if (res->type == RDMA_RESTRACK_MR)
> > return;
>
> Why is MR skipping this?
I don't remember the justification for the RDMA_RESTRACK_MR || RDMA_RESTRACK_QP check.
My guess is that it is related to the allocation flow (both are not converted yet) and
to objects that are internal to the drivers.
>
>
> It also calls into the driver under its dumpit, at the very least this
> needs a comment.
>
> Jason
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
2021-04-20 13:06 ` Leon Romanovsky
@ 2021-04-20 15:25 ` Jason Gunthorpe
2021-04-21 5:03 ` Leon Romanovsky
0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2021-04-20 15:25 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: Doug Ledford, Shay Drory, linux-rdma
On Tue, Apr 20, 2021 at 04:06:03PM +0300, Leon Romanovsky wrote:
> On Tue, Apr 20, 2021 at 09:39:50AM -0300, Jason Gunthorpe wrote:
> > On Sun, Apr 18, 2021 at 04:37:35PM +0300, Leon Romanovsky wrote:
> > > From: Shay Drory <shayd@nvidia.com>
> > >
> > > Currently, in case of QP, the following use-after-free is possible:
> > >
> > > cpu0 cpu1
> > > res_get_common_dumpit()
> > > rdma_restrack_get()
> > > fill_res_qp_entry()
> > > ib_destroy_qp_user()
> > > rdma_restrack_del()
> > > qp->device->ops.destroy_qp()
> > > ib_query_qp()
> > > qp->device->ops.query_qp()
> > >
> > > This is because rdma_restrack_del(), in case of QP, isn't waiting until
> > > all users are gone.
> > >
> > > Fix it by making rdma_restrack_del() wait until all users are gone for
> > > QPs as well.
> > >
> > > Fixes: 13ef5539def7 ("RDMA/restrack: Count references to the verbs objects")
> > > Signed-off-by: Shay Drory <shayd@nvidia.com>
> > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > drivers/infiniband/core/restrack.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
> > > index ffabaf327242..def0c5b0efe9 100644
> > > +++ b/drivers/infiniband/core/restrack.c
> > > @@ -340,7 +340,7 @@ void rdma_restrack_del(struct rdma_restrack_entry *res)
> > > rt = &dev->res[res->type];
> > >
> > > old = xa_erase(&rt->xa, res->id);
> > > - if (res->type == RDMA_RESTRACK_MR || res->type == RDMA_RESTRACK_QP)
> > > + if (res->type == RDMA_RESTRACK_MR)
> > > return;
> >
> > Why is MR skipping this?
>
> I don't remember the justification for the RDMA_RESTRACK_MR || RDMA_RESTRACK_QP check.
> My guess is that it is related to the allocation flow (both are not converted yet) and
> to objects that are internal to the drivers.
Well, I don't understand why QP has a bug but MR would not have the
same one??
Jason
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
2021-04-20 15:25 ` Jason Gunthorpe
@ 2021-04-21 5:03 ` Leon Romanovsky
2021-04-22 14:29 ` Jason Gunthorpe
0 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2021-04-21 5:03 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, Shay Drory, linux-rdma
On Tue, Apr 20, 2021 at 12:25:41PM -0300, Jason Gunthorpe wrote:
> On Tue, Apr 20, 2021 at 04:06:03PM +0300, Leon Romanovsky wrote:
> > On Tue, Apr 20, 2021 at 09:39:50AM -0300, Jason Gunthorpe wrote:
> > > On Sun, Apr 18, 2021 at 04:37:35PM +0300, Leon Romanovsky wrote:
> > > > From: Shay Drory <shayd@nvidia.com>
> > > >
> > > > Currently, in case of QP, the following use-after-free is possible:
> > > >
> > > > cpu0 cpu1
> > > > res_get_common_dumpit()
> > > > rdma_restrack_get()
> > > > fill_res_qp_entry()
> > > > ib_destroy_qp_user()
> > > > rdma_restrack_del()
> > > > qp->device->ops.destroy_qp()
> > > > ib_query_qp()
> > > > qp->device->ops.query_qp()
> > > >
> > > > This is because rdma_restrack_del(), in case of QP, isn't waiting until
> > > > all users are gone.
> > > >
> > > > Fix it by making rdma_restrack_del() wait until all users are gone for
> > > > QPs as well.
> > > >
> > > > Fixes: 13ef5539def7 ("RDMA/restrack: Count references to the verbs objects")
> > > > Signed-off-by: Shay Drory <shayd@nvidia.com>
> > > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > > drivers/infiniband/core/restrack.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
> > > > index ffabaf327242..def0c5b0efe9 100644
> > > > +++ b/drivers/infiniband/core/restrack.c
> > > > @@ -340,7 +340,7 @@ void rdma_restrack_del(struct rdma_restrack_entry *res)
> > > > rt = &dev->res[res->type];
> > > >
> > > > old = xa_erase(&rt->xa, res->id);
> > > > - if (res->type == RDMA_RESTRACK_MR || res->type == RDMA_RESTRACK_QP)
> > > > + if (res->type == RDMA_RESTRACK_MR)
> > > > return;
> > >
> > > Why is MR skipping this?
> >
> > I don't remember the justification for the RDMA_RESTRACK_MR || RDMA_RESTRACK_QP check.
> > My guess is that it is related to the allocation flow (both are not converted yet) and
> > to objects that are internal to the drivers.
>
> Well, I don't understand why QP has a bug but MR would not have the
> same one??
I didn't understand it when I reviewed it either, but decided to post it anyway
to get a possible explanation for this RDMA_RESTRACK_MR || RDMA_RESTRACK_QP
check.
Thanks
>
> Jason
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
2021-04-21 5:03 ` Leon Romanovsky
@ 2021-04-22 14:29 ` Jason Gunthorpe
2021-04-25 13:03 ` Leon Romanovsky
2021-05-02 11:28 ` Leon Romanovsky
0 siblings, 2 replies; 16+ messages in thread
From: Jason Gunthorpe @ 2021-04-22 14:29 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: Doug Ledford, Shay Drory, linux-rdma
On Wed, Apr 21, 2021 at 08:03:22AM +0300, Leon Romanovsky wrote:
> I didn't understand it when I reviewed it either, but decided to post it anyway
> to get a possible explanation for this RDMA_RESTRACK_MR || RDMA_RESTRACK_QP
> check.
I think the whole thing should look more like this and we delete the
if entirely.
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 2b9ffc21cbc4ad..479b16b8f6723a 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1860,6 +1860,7 @@ static void _destroy_id(struct rdma_id_private *id_priv,
iw_destroy_cm_id(id_priv->cm_id.iw);
}
cma_leave_mc_groups(id_priv);
+ rdma_restrack_del(&id_priv->res);
cma_release_dev(id_priv);
}
@@ -1873,7 +1874,6 @@ static void _destroy_id(struct rdma_id_private *id_priv,
kfree(id_priv->id.route.path_rec);
put_net(id_priv->id.route.addr.dev_addr.net);
- rdma_restrack_del(&id_priv->res);
kfree(id_priv);
}
diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
index 15493357cfef09..1808bc2533f155 100644
--- a/drivers/infiniband/core/counters.c
+++ b/drivers/infiniband/core/counters.c
@@ -176,6 +176,8 @@ static void rdma_counter_free(struct rdma_counter *counter)
{
struct rdma_port_counter *port_counter;
+ rdma_restrack_del(&counter->res);
+
port_counter = &counter->device->port_data[counter->port].port_counter;
mutex_lock(&port_counter->lock);
port_counter->num_counters--;
@@ -185,7 +187,6 @@ static void rdma_counter_free(struct rdma_counter *counter)
mutex_unlock(&port_counter->lock);
- rdma_restrack_del(&counter->res);
kfree(counter->stats);
kfree(counter);
}
diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index 433b426729d4ce..3884db637d33ab 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -339,11 +339,15 @@ void ib_free_cq(struct ib_cq *cq)
WARN_ON_ONCE(1);
}
+ rdma_restrack_prepare_del(&cq->res);
rdma_dim_destroy(cq);
trace_cq_free(cq);
ret = cq->device->ops.destroy_cq(cq, NULL);
- WARN_ONCE(ret, "Destroy of kernel CQ shouldn't fail");
- rdma_restrack_del(&cq->res);
+ if (WARN_ON(ret)) {
+ rdma_restrack_abort_del(&cq->res);
+ return;
+ }
+ rdma_restrack_finish_del(&cq->res);
kfree(cq->wc);
kfree(cq);
}
diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index 94d83b665a2fe0..f57234ced93ca1 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -854,6 +854,8 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
struct ib_ucontext *ucontext = ufile->ucontext;
struct ib_device *ib_dev = ucontext->device;
+ rdma_restrack_prepare_del(&ucontext->res);
+
/*
* If we are closing the FD then the user mmap VMAs must have
* already been destroyed as they hold on to the filep, otherwise
@@ -868,9 +870,8 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
ib_rdmacg_uncharge(&ucontext->cg_obj, ib_dev,
RDMACG_RESOURCE_HCA_HANDLE);
- rdma_restrack_del(&ucontext->res);
-
ib_dev->ops.dealloc_ucontext(ucontext);
+ rdma_restrack_finish_del(&ucontext->res);
WARN_ON(!xa_empty(&ucontext->mmap_xa));
kfree(ucontext);
diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
index 033207882c82ce..8aa1dae40f38a7 100644
--- a/drivers/infiniband/core/restrack.c
+++ b/drivers/infiniband/core/restrack.c
@@ -315,11 +315,49 @@ int rdma_restrack_put(struct rdma_restrack_entry *res)
}
EXPORT_SYMBOL(rdma_restrack_put);
-/**
- * rdma_restrack_del() - delete object from the reource tracking database
- * @res: resource entry
- */
-void rdma_restrack_del(struct rdma_restrack_entry *res)
+void rdma_restrack_prepare_del(struct rdma_restrack_entry *res)
+{
+ struct rdma_restrack_entry *old;
+ struct rdma_restrack_root *rt;
+ struct ib_device *dev;
+
+ if (!res->valid)
+ return;
+
+ dev = res_to_dev(res);
+ rt = &dev->res[res->type];
+
+ if (!res->no_track) {
+ old = xa_cmpxchg(&rt->xa, res->id, res, XA_ZERO_ENTRY,
+ GFP_KERNEL);
+ WARN_ON(old != res);
+ }
+
+ /* Fence access from all concurrent threads, like netlink */
+ rdma_restrack_put(res);
+ wait_for_completion(&res->comp);
+}
+EXPORT_SYMBOL(rdma_restrack_prepare_del);
+
+void rdma_restrack_abort_del(struct rdma_restrack_entry *res)
+{
+ struct rdma_restrack_entry *old;
+ struct rdma_restrack_root *rt;
+ struct ib_device *dev;
+
+ if (!res->valid || res->no_track)
+ return;
+
+ dev = res_to_dev(res);
+ rt = &dev->res[res->type];
+
+ kref_init(&res->kref);
+ old = xa_cmpxchg(&rt->xa, res->id, XA_ZERO_ENTRY, res, GFP_KERNEL);
+ WARN_ON(old != res);
+}
+EXPORT_SYMBOL(rdma_restrack_abort_del);
+
+void rdma_restrack_finish_del(struct rdma_restrack_entry *res)
{
struct rdma_restrack_entry *old;
struct rdma_restrack_root *rt;
@@ -332,24 +370,22 @@ void rdma_restrack_del(struct rdma_restrack_entry *res)
}
return;
}
-
- if (res->no_track)
- goto out;
+ res->valid = false;
dev = res_to_dev(res);
- if (WARN_ON(!dev))
- return;
-
rt = &dev->res[res->type];
-
old = xa_erase(&rt->xa, res->id);
- if (res->type == RDMA_RESTRACK_MR || res->type == RDMA_RESTRACK_QP)
- return;
- WARN_ON(old != res);
+ WARN_ON(old != NULL);
+}
+EXPORT_SYMBOL(rdma_restrack_finish_del);
-out:
- res->valid = false;
- rdma_restrack_put(res);
- wait_for_completion(&res->comp);
+/**
+ * rdma_restrack_del() - delete object from the reource tracking database
+ * @res: resource entry
+ */
+void rdma_restrack_del(struct rdma_restrack_entry *res)
+{
+ rdma_restrack_prepare_del(res);
+ rdma_restrack_finish_del(res);
}
EXPORT_SYMBOL(rdma_restrack_del);
diff --git a/drivers/infiniband/core/restrack.h b/drivers/infiniband/core/restrack.h
index 6a04fc41f73801..87a670334acabe 100644
--- a/drivers/infiniband/core/restrack.h
+++ b/drivers/infiniband/core/restrack.h
@@ -27,6 +27,9 @@ int rdma_restrack_init(struct ib_device *dev);
void rdma_restrack_clean(struct ib_device *dev);
void rdma_restrack_add(struct rdma_restrack_entry *res);
void rdma_restrack_del(struct rdma_restrack_entry *res);
+void rdma_restrack_prepare_del(struct rdma_restrack_entry *res);
+void rdma_restrack_finish_del(struct rdma_restrack_entry *res);
+void rdma_restrack_abort_del(struct rdma_restrack_entry *res);
void rdma_restrack_new(struct rdma_restrack_entry *res,
enum rdma_restrack_type type);
void rdma_restrack_set_name(struct rdma_restrack_entry *res,
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 2b0798151fb753..2e761a7eb92847 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -346,13 +346,16 @@ int ib_dealloc_pd_user(struct ib_pd *pd, struct ib_udata *udata)
*/
WARN_ON(atomic_read(&pd->usecnt));
+ rdma_restrack_prepare_del(&pd->res);
ret = pd->device->ops.dealloc_pd(pd, udata);
- if (ret)
+ if (ret) {
+ rdma_restrack_abort_del(&pd->res);
return ret;
+ }
- rdma_restrack_del(&pd->res);
+ rdma_restrack_finish_del(&pd->res);
kfree(pd);
- return ret;
+ return 0;
}
EXPORT_SYMBOL(ib_dealloc_pd_user);
@@ -1085,19 +1088,22 @@ int ib_destroy_srq_user(struct ib_srq *srq, struct ib_udata *udata)
if (atomic_read(&srq->usecnt))
return -EBUSY;
+ rdma_restrack_prepare_del(&srq->res);
ret = srq->device->ops.destroy_srq(srq, udata);
- if (ret)
+ if (ret) {
+ rdma_restrack_abort_del(&srq->res);
return ret;
+ }
+ rdma_restrack_finish_del(&srq->res);
atomic_dec(&srq->pd->usecnt);
if (srq->srq_type == IB_SRQT_XRC)
atomic_dec(&srq->ext.xrc.xrcd->usecnt);
if (ib_srq_has_cq(srq->srq_type))
atomic_dec(&srq->ext.cq->usecnt);
- rdma_restrack_del(&srq->res);
kfree(srq);
- return ret;
+ return 0;
}
EXPORT_SYMBOL(ib_destroy_srq_user);
@@ -1963,31 +1969,33 @@ int ib_destroy_qp_user(struct ib_qp *qp, struct ib_udata *udata)
rdma_rw_cleanup_mrs(qp);
rdma_counter_unbind_qp(qp, true);
- rdma_restrack_del(&qp->res);
+ rdma_restrack_prepare_del(&qp->res);
ret = qp->device->ops.destroy_qp(qp, udata);
- if (!ret) {
- if (alt_path_sgid_attr)
- rdma_put_gid_attr(alt_path_sgid_attr);
- if (av_sgid_attr)
- rdma_put_gid_attr(av_sgid_attr);
- if (pd)
- atomic_dec(&pd->usecnt);
- if (scq)
- atomic_dec(&scq->usecnt);
- if (rcq)
- atomic_dec(&rcq->usecnt);
- if (srq)
- atomic_dec(&srq->usecnt);
- if (ind_tbl)
- atomic_dec(&ind_tbl->usecnt);
- if (sec)
- ib_destroy_qp_security_end(sec);
- } else {
+ if (ret) {
+ rdma_restrack_abort_del(&qp->res);
if (sec)
ib_destroy_qp_security_abort(sec);
+ return ret;
}
- return ret;
+ rdma_restrack_finish_del(&qp->res);
+ if (alt_path_sgid_attr)
+ rdma_put_gid_attr(alt_path_sgid_attr);
+ if (av_sgid_attr)
+ rdma_put_gid_attr(av_sgid_attr);
+ if (pd)
+ atomic_dec(&pd->usecnt);
+ if (scq)
+ atomic_dec(&scq->usecnt);
+ if (rcq)
+ atomic_dec(&rcq->usecnt);
+ if (srq)
+ atomic_dec(&srq->usecnt);
+ if (ind_tbl)
+ atomic_dec(&ind_tbl->usecnt);
+ if (sec)
+ ib_destroy_qp_security_end(sec);
+ return 0;
}
EXPORT_SYMBOL(ib_destroy_qp_user);
@@ -2050,13 +2058,16 @@ int ib_destroy_cq_user(struct ib_cq *cq, struct ib_udata *udata)
if (atomic_read(&cq->usecnt))
return -EBUSY;
+ rdma_restrack_prepare_del(&cq->res);
ret = cq->device->ops.destroy_cq(cq, udata);
- if (ret)
+ if (ret) {
+ rdma_restrack_abort_del(&cq->res);
return ret;
+ }
- rdma_restrack_del(&cq->res);
+ rdma_restrack_finish_del(&cq->res);
kfree(cq);
- return ret;
+ return 0;
}
EXPORT_SYMBOL(ib_destroy_cq_user);
@@ -2126,16 +2137,19 @@ int ib_dereg_mr_user(struct ib_mr *mr, struct ib_udata *udata)
int ret;
trace_mr_dereg(mr);
- rdma_restrack_del(&mr->res);
+ rdma_restrack_prepare_del(&mr->res);
ret = mr->device->ops.dereg_mr(mr, udata);
- if (!ret) {
- atomic_dec(&pd->usecnt);
- if (dm)
- atomic_dec(&dm->usecnt);
- kfree(sig_attrs);
+ if (ret) {
+ rdma_restrack_abort_del(&mr->res);
+ return ret;
}
- return ret;
+ rdma_restrack_finish_del(&mr->res);
+ atomic_dec(&pd->usecnt);
+ if (dm)
+ atomic_dec(&dm->usecnt);
+ kfree(sig_attrs);
+ return 0;
}
EXPORT_SYMBOL(ib_dereg_mr_user);
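The prepare/abort/finish split in the diff above is a two-phase delete: hide the entry and fence out concurrent users, call the driver, then either re-expose the entry (abort) or erase it for good (finish). Here is a hedged, compilable sketch of just those state transitions; the xa_cmpxchg()/XA_ZERO_ENTRY hiding and the kref/completion fencing are reduced to an enum, and destroy_obj() and hw_ret are invented stand-ins for a verbs destroy path and its driver callback result.

```c
#include <assert.h>

/* Simplified visibility state of a tracked entry. */
enum del_state {
	VISIBLE,  /* in the xarray, dumpers may find it */
	HIDDEN,   /* xa_cmpxchg'd to XA_ZERO_ENTRY, users fenced out */
	GONE,     /* erased from the xarray */
};

struct entry {
	enum del_state st;
};

/* Models rdma_restrack_prepare_del(): hide the entry and (in the
 * kernel) wait for all concurrent users to drop their references. */
static void prepare_del(struct entry *e)
{
	assert(e->st == VISIBLE);
	e->st = HIDDEN;
}

/* Models rdma_restrack_abort_del(): the driver refused to destroy the
 * object, so re-expose it (the kernel also re-inits the kref). */
static void abort_del(struct entry *e)
{
	assert(e->st == HIDDEN);
	e->st = VISIBLE;
}

/* Models rdma_restrack_finish_del(): HW object is gone, erase it. */
static void finish_del(struct entry *e)
{
	assert(e->st == HIDDEN);
	e->st = GONE;
}

/* Invented destroy path mirroring the shape of ib_dealloc_pd_user():
 * hw_ret stands in for the driver callback's return value. */
static int destroy_obj(struct entry *e, int hw_ret)
{
	prepare_del(e);
	if (hw_ret) {
		abort_del(e);   /* destroy failed: entry visible again */
		return hw_ret;
	}
	finish_del(e);          /* safe to free the memory now */
	return 0;
}
```

The key property the pattern buys is that the driver callback always runs with the entry hidden, so a netlink dumper can never call into the driver against a half-destroyed object.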
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
2021-04-22 14:29 ` Jason Gunthorpe
@ 2021-04-25 13:03 ` Leon Romanovsky
2021-04-25 13:08 ` Jason Gunthorpe
2021-05-02 11:28 ` Leon Romanovsky
1 sibling, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2021-04-25 13:03 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, Shay Drory, linux-rdma
On Thu, Apr 22, 2021 at 11:29:39AM -0300, Jason Gunthorpe wrote:
> On Wed, Apr 21, 2021 at 08:03:22AM +0300, Leon Romanovsky wrote:
>
> > I didn't understand it when I reviewed it either, but decided to post it anyway
> > to get a possible explanation for this RDMA_RESTRACK_MR || RDMA_RESTRACK_QP
> > check.
>
> I think the whole thing should look more like this and we delete the
> if entirely.
I have mixed feelings about this approach. Before the "destroy can fail" disaster,
the restrack goal was to provide the following flow:
1. create new memory object - rdma_restrack_new()
2. create new HW object - .create_XXX() callback in the driver
3. add HW object to the DB - rdma_restrack_add()
....
4. wait for any work on this HW object to complete - rdma_restrack_del()
5. safely destroy HW object - .destroy_XXX()
I would really like to stay with this flow and block any access to an object that
failed to destruct - maybe by adding it to some zombie list.
The proposed prepare/abort/finish flow is much harder to implement correctly.
Let's take ib_destroy_qp_user() as an example: we call rdma_rw_cleanup_mrs(),
but we don't restore those MRs after a .destroy_qp() failure.
Thanks
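Leon's alternative - marking an object that failed to destroy as permanently unusable instead of re-exposing it - could look roughly like this. This is purely a hypothetical sketch of the "zombie list" idea from the email; none of these names exist in the kernel, and hw_ret is an invented stand-in for a driver destroy callback's return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry: once a destroy fails, the object is parked on a
 * zombie list and never becomes visible to dumpers again. */
struct zentry {
	bool valid;           /* visible to netlink dumpers */
	struct zentry *next;  /* link on the zombie list */
};

static struct zentry *zombie_head;

static int zentry_destroy(struct zentry *e, int hw_ret)
{
	e->valid = false;     /* block new users up front, as in the old flow */
	if (hw_ret) {
		/* HW refused to destroy: do NOT restore visibility;
		 * park the object so its memory is never freed or reused. */
		e->next = zombie_head;
		zombie_head = e;
		return hw_ret;
	}
	return 0;             /* caller may now free the memory */
}
```

Compared with prepare/abort/finish, this keeps deletion a one-way street - there is no abort path to get subtly wrong - at the cost of leaking the zombie objects until device teardown.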
>
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index 2b9ffc21cbc4ad..479b16b8f6723a 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -1860,6 +1860,7 @@ static void _destroy_id(struct rdma_id_private *id_priv,
> iw_destroy_cm_id(id_priv->cm_id.iw);
> }
> cma_leave_mc_groups(id_priv);
> + rdma_restrack_del(&id_priv->res);
> cma_release_dev(id_priv);
> }
>
> @@ -1873,7 +1874,6 @@ static void _destroy_id(struct rdma_id_private *id_priv,
> kfree(id_priv->id.route.path_rec);
>
> put_net(id_priv->id.route.addr.dev_addr.net);
> - rdma_restrack_del(&id_priv->res);
> kfree(id_priv);
> }
>
> diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
> index 15493357cfef09..1808bc2533f155 100644
> --- a/drivers/infiniband/core/counters.c
> +++ b/drivers/infiniband/core/counters.c
> @@ -176,6 +176,8 @@ static void rdma_counter_free(struct rdma_counter *counter)
> {
> struct rdma_port_counter *port_counter;
>
> + rdma_restrack_del(&counter->res);
> +
> port_counter = &counter->device->port_data[counter->port].port_counter;
> mutex_lock(&port_counter->lock);
> port_counter->num_counters--;
> @@ -185,7 +187,6 @@ static void rdma_counter_free(struct rdma_counter *counter)
>
> mutex_unlock(&port_counter->lock);
>
> - rdma_restrack_del(&counter->res);
> kfree(counter->stats);
> kfree(counter);
> }
> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> index 433b426729d4ce..3884db637d33ab 100644
> --- a/drivers/infiniband/core/cq.c
> +++ b/drivers/infiniband/core/cq.c
> @@ -339,11 +339,15 @@ void ib_free_cq(struct ib_cq *cq)
> WARN_ON_ONCE(1);
> }
>
> + rdma_restrack_prepare_del(&cq->res);
> rdma_dim_destroy(cq);
> trace_cq_free(cq);
> ret = cq->device->ops.destroy_cq(cq, NULL);
> - WARN_ONCE(ret, "Destroy of kernel CQ shouldn't fail");
> - rdma_restrack_del(&cq->res);
> + if (WARN_ON(ret)) {
> + rdma_restrack_abort_del(&cq->res);
> + return;
> + }
> + rdma_restrack_finish_del(&cq->res);
> kfree(cq->wc);
> kfree(cq);
> }
> diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
> index 94d83b665a2fe0..f57234ced93ca1 100644
> --- a/drivers/infiniband/core/rdma_core.c
> +++ b/drivers/infiniband/core/rdma_core.c
> @@ -854,6 +854,8 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
> struct ib_ucontext *ucontext = ufile->ucontext;
> struct ib_device *ib_dev = ucontext->device;
>
> + rdma_restrack_prepare_del(&ucontext->res);
> +
> /*
> * If we are closing the FD then the user mmap VMAs must have
> * already been destroyed as they hold on to the filep, otherwise
> @@ -868,9 +870,8 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
> ib_rdmacg_uncharge(&ucontext->cg_obj, ib_dev,
> RDMACG_RESOURCE_HCA_HANDLE);
>
> - rdma_restrack_del(&ucontext->res);
> -
> ib_dev->ops.dealloc_ucontext(ucontext);
> + rdma_restrack_finish_del(&ucontext->res);
> WARN_ON(!xa_empty(&ucontext->mmap_xa));
> kfree(ucontext);
>
> diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
> index 033207882c82ce..8aa1dae40f38a7 100644
> --- a/drivers/infiniband/core/restrack.c
> +++ b/drivers/infiniband/core/restrack.c
> @@ -315,11 +315,49 @@ int rdma_restrack_put(struct rdma_restrack_entry *res)
> }
> EXPORT_SYMBOL(rdma_restrack_put);
>
> -/**
> - * rdma_restrack_del() - delete object from the reource tracking database
> - * @res: resource entry
> - */
> -void rdma_restrack_del(struct rdma_restrack_entry *res)
> +void rdma_restrack_prepare_del(struct rdma_restrack_entry *res)
> +{
> + struct rdma_restrack_entry *old;
> + struct rdma_restrack_root *rt;
> + struct ib_device *dev;
> +
> + if (!res->valid)
> + return;
> +
> + dev = res_to_dev(res);
> + rt = &dev->res[res->type];
> +
> + if (!res->no_track) {
> + old = xa_cmpxchg(&rt->xa, res->id, res, XA_ZERO_ENTRY,
> + GFP_KERNEL);
> + WARN_ON(old != res);
> + }
> +
> + /* Fence access from all concurrent threads, like netlink */
> + rdma_restrack_put(res);
> + wait_for_completion(&res->comp);
> +}
> +EXPORT_SYMBOL(rdma_restrack_prepare_del);
> +
> +void rdma_restrack_abort_del(struct rdma_restrack_entry *res)
> +{
> + struct rdma_restrack_entry *old;
> + struct rdma_restrack_root *rt;
> + struct ib_device *dev;
> +
> + if (!res->valid || res->no_track)
> + return;
> +
> + dev = res_to_dev(res);
> + rt = &dev->res[res->type];
> +
> + kref_init(&res->kref);
> + old = xa_cmpxchg(&rt->xa, res->id, XA_ZERO_ENTRY, res, GFP_KERNEL);
> + WARN_ON(old != res);
> +}
> +EXPORT_SYMBOL(rdma_restrack_abort_del);
> +
> +void rdma_restrack_finish_del(struct rdma_restrack_entry *res)
> {
> struct rdma_restrack_entry *old;
> struct rdma_restrack_root *rt;
> @@ -332,24 +370,22 @@ void rdma_restrack_del(struct rdma_restrack_entry *res)
> }
> return;
> }
> -
> - if (res->no_track)
> - goto out;
> + res->valid = false;
>
> dev = res_to_dev(res);
> - if (WARN_ON(!dev))
> - return;
> -
> rt = &dev->res[res->type];
> -
> old = xa_erase(&rt->xa, res->id);
> - if (res->type == RDMA_RESTRACK_MR || res->type == RDMA_RESTRACK_QP)
> - return;
> - WARN_ON(old != res);
> + WARN_ON(old != NULL);
> +}
> +EXPORT_SYMBOL(rdma_restrack_finish_del);
>
> -out:
> - res->valid = false;
> - rdma_restrack_put(res);
> - wait_for_completion(&res->comp);
> +/**
> + * rdma_restrack_del() - delete object from the reource tracking database
> + * @res: resource entry
> + */
> +void rdma_restrack_del(struct rdma_restrack_entry *res)
> +{
> + rdma_restrack_prepare_del(res);
> + rdma_restrack_finish_del(res);
> }
> EXPORT_SYMBOL(rdma_restrack_del);
> diff --git a/drivers/infiniband/core/restrack.h b/drivers/infiniband/core/restrack.h
> index 6a04fc41f73801..87a670334acabe 100644
> --- a/drivers/infiniband/core/restrack.h
> +++ b/drivers/infiniband/core/restrack.h
> @@ -27,6 +27,9 @@ int rdma_restrack_init(struct ib_device *dev);
> void rdma_restrack_clean(struct ib_device *dev);
> void rdma_restrack_add(struct rdma_restrack_entry *res);
> void rdma_restrack_del(struct rdma_restrack_entry *res);
> +void rdma_restrack_prepare_del(struct rdma_restrack_entry *res);
> +void rdma_restrack_finish_del(struct rdma_restrack_entry *res);
> +void rdma_restrack_abort_del(struct rdma_restrack_entry *res);
> void rdma_restrack_new(struct rdma_restrack_entry *res,
> enum rdma_restrack_type type);
> void rdma_restrack_set_name(struct rdma_restrack_entry *res,
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index 2b0798151fb753..2e761a7eb92847 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -346,13 +346,16 @@ int ib_dealloc_pd_user(struct ib_pd *pd, struct ib_udata *udata)
> */
> WARN_ON(atomic_read(&pd->usecnt));
>
> + rdma_restrack_prepare_del(&pd->res);
> ret = pd->device->ops.dealloc_pd(pd, udata);
> - if (ret)
> + if (ret) {
> + rdma_restrack_abort_del(&pd->res);
> return ret;
> + }
>
> - rdma_restrack_del(&pd->res);
> + rdma_restrack_finish_del(&pd->res);
> kfree(pd);
> - return ret;
> + return 0;
> }
> EXPORT_SYMBOL(ib_dealloc_pd_user);
>
> @@ -1085,19 +1088,22 @@ int ib_destroy_srq_user(struct ib_srq *srq, struct ib_udata *udata)
> if (atomic_read(&srq->usecnt))
> return -EBUSY;
>
> + rdma_restrack_prepare_del(&srq->res);
> ret = srq->device->ops.destroy_srq(srq, udata);
> - if (ret)
> + if (ret) {
> + rdma_restrack_abort_del(&srq->res);
> return ret;
> + }
> + rdma_restrack_finish_del(&srq->res);
>
> atomic_dec(&srq->pd->usecnt);
> if (srq->srq_type == IB_SRQT_XRC)
> atomic_dec(&srq->ext.xrc.xrcd->usecnt);
> if (ib_srq_has_cq(srq->srq_type))
> atomic_dec(&srq->ext.cq->usecnt);
> - rdma_restrack_del(&srq->res);
> kfree(srq);
>
> - return ret;
> + return 0;
> }
> EXPORT_SYMBOL(ib_destroy_srq_user);
>
> @@ -1963,31 +1969,33 @@ int ib_destroy_qp_user(struct ib_qp *qp, struct ib_udata *udata)
> rdma_rw_cleanup_mrs(qp);
>
> rdma_counter_unbind_qp(qp, true);
> - rdma_restrack_del(&qp->res);
> + rdma_restrack_prepare_del(&qp->res);
> ret = qp->device->ops.destroy_qp(qp, udata);
> - if (!ret) {
> - if (alt_path_sgid_attr)
> - rdma_put_gid_attr(alt_path_sgid_attr);
> - if (av_sgid_attr)
> - rdma_put_gid_attr(av_sgid_attr);
> - if (pd)
> - atomic_dec(&pd->usecnt);
> - if (scq)
> - atomic_dec(&scq->usecnt);
> - if (rcq)
> - atomic_dec(&rcq->usecnt);
> - if (srq)
> - atomic_dec(&srq->usecnt);
> - if (ind_tbl)
> - atomic_dec(&ind_tbl->usecnt);
> - if (sec)
> - ib_destroy_qp_security_end(sec);
> - } else {
> + if (ret) {
> + rdma_restrack_abort_del(&qp->res);
> if (sec)
> ib_destroy_qp_security_abort(sec);
> + return ret;
> }
>
> - return ret;
> + rdma_restrack_finish_del(&qp->res);
> + if (alt_path_sgid_attr)
> + rdma_put_gid_attr(alt_path_sgid_attr);
> + if (av_sgid_attr)
> + rdma_put_gid_attr(av_sgid_attr);
> + if (pd)
> + atomic_dec(&pd->usecnt);
> + if (scq)
> + atomic_dec(&scq->usecnt);
> + if (rcq)
> + atomic_dec(&rcq->usecnt);
> + if (srq)
> + atomic_dec(&srq->usecnt);
> + if (ind_tbl)
> + atomic_dec(&ind_tbl->usecnt);
> + if (sec)
> + ib_destroy_qp_security_end(sec);
> + return 0;
> }
> EXPORT_SYMBOL(ib_destroy_qp_user);
>
> @@ -2050,13 +2058,16 @@ int ib_destroy_cq_user(struct ib_cq *cq, struct ib_udata *udata)
> if (atomic_read(&cq->usecnt))
> return -EBUSY;
>
> + rdma_restrack_prepare_del(&cq->res);
> ret = cq->device->ops.destroy_cq(cq, udata);
> - if (ret)
> + if (ret) {
> + rdma_restrack_abort_del(&cq->res);
> return ret;
> + }
>
> - rdma_restrack_del(&cq->res);
> + rdma_restrack_finish_del(&cq->res);
> kfree(cq);
> - return ret;
> + return 0;
> }
> EXPORT_SYMBOL(ib_destroy_cq_user);
>
> @@ -2126,16 +2137,19 @@ int ib_dereg_mr_user(struct ib_mr *mr, struct ib_udata *udata)
> int ret;
>
> trace_mr_dereg(mr);
> - rdma_restrack_del(&mr->res);
> + rdma_restrack_prepare_del(&mr->res);
> ret = mr->device->ops.dereg_mr(mr, udata);
> - if (!ret) {
> - atomic_dec(&pd->usecnt);
> - if (dm)
> - atomic_dec(&dm->usecnt);
> - kfree(sig_attrs);
> + if (ret) {
> + rdma_restrack_abort_del(&mr->res);
> + return ret;
> }
>
> - return ret;
> + rdma_restrack_finish_del(&mr->res);
> + atomic_dec(&pd->usecnt);
> + if (dm)
> + atomic_dec(&dm->usecnt);
> + kfree(sig_attrs);
> + return 0;
> }
> EXPORT_SYMBOL(ib_dereg_mr_user);
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
2021-04-25 13:03 ` Leon Romanovsky
@ 2021-04-25 13:08 ` Jason Gunthorpe
2021-04-25 13:44 ` Leon Romanovsky
0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2021-04-25 13:08 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: Doug Ledford, Shay Drory, linux-rdma
On Sun, Apr 25, 2021 at 04:03:47PM +0300, Leon Romanovsky wrote:
> On Thu, Apr 22, 2021 at 11:29:39AM -0300, Jason Gunthorpe wrote:
> > On Wed, Apr 21, 2021 at 08:03:22AM +0300, Leon Romanovsky wrote:
> >
> > > I didn't understand when reviewed either, but decided to post it anyway
> > > to get possible explanation for this RDMA_RESTRACK_MR || RDMA_RESTRACK_QP
> > > check.
> >
> > I think the whole thing should look more like this and we delete the
> > if entirely.
>
> I have mixed feelings about this approach. Before "destroy can fail disaster",
> the restrack goal was to provide the following flow:
> 1. create new memory object - rdma_restrack_new()
> 2. create new HW object - .create_XXX() callback in the driver
> 3. add HW object to the DB - rdma_restrack_add()
> ....
> 4. wait for any work on this HW object to complete - rdma_restrack_del()
> 5. safely destroy HW object - .destroy_XXX()
>
> I really would like to stay with this flow and block any access to the
> object that failed to destruct - maybe add to some zombie list.
That isn't the semantic we now have for destroy.
> The proposed prepare/abort/finish flow is much harder to implement correctly.
> Let's take ib_destroy_qp_user() as an example: we call rdma_rw_cleanup_mrs(),
> but don't restore the MRs after a .destroy_qp() failure.
I think it is a bug that we call rdma_rw code in a user path.
Jason
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
2021-04-25 13:08 ` Jason Gunthorpe
@ 2021-04-25 13:44 ` Leon Romanovsky
2021-04-25 17:22 ` Jason Gunthorpe
0 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2021-04-25 13:44 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, Shay Drory, linux-rdma
On Sun, Apr 25, 2021 at 10:08:57AM -0300, Jason Gunthorpe wrote:
> On Sun, Apr 25, 2021 at 04:03:47PM +0300, Leon Romanovsky wrote:
> > On Thu, Apr 22, 2021 at 11:29:39AM -0300, Jason Gunthorpe wrote:
> > > On Wed, Apr 21, 2021 at 08:03:22AM +0300, Leon Romanovsky wrote:
> > >
> > > > I didn't understand when reviewed either, but decided to post it anyway
> > > > to get possible explanation for this RDMA_RESTRACK_MR || RDMA_RESTRACK_QP
> > > > check.
> > >
> > > I think the whole thing should look more like this and we delete the
> > > if entirely.
> >
> > I have mixed feelings about this approach. Before "destroy can fail disaster",
> > the restrack goal was to provide the following flow:
> > 1. create new memory object - rdma_restrack_new()
> > 2. create new HW object - .create_XXX() callback in the driver
> > 3. add HW object to the DB - rdma_restrack_add()
> > ....
> > 4. wait for any work on this HW object to complete - rdma_restrack_del()
> > 5. safely destroy HW object - .destroy_XXX()
> >
> > I really would like to stay with this flow and block any access to the
> > object that failed to destruct - maybe add to some zombie list.
>
> That isn't the semantic we now have for destroy.
I would say that it is my mistake, introduced when I changed destroy to
return an error.
>
> > The proposed prepare/abort/finish flow is much harder to implement correctly.
> > Let's take ib_destroy_qp_user() as an example: we call rdma_rw_cleanup_mrs(),
> > but don't restore the MRs after a .destroy_qp() failure.
>
> I think it is a bug that we call rdma_rw code in a user path.
It was an example of a flow that wasn't restored properly.
The same goes for ib_dealloc_pd_user(), release of __internal_mr.
Of course, these flows shouldn't fail because they are kernel flows, but that
is not clear from the code.
Thanks
>
> Jason
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
2021-04-25 13:44 ` Leon Romanovsky
@ 2021-04-25 17:22 ` Jason Gunthorpe
2021-04-25 17:38 ` Leon Romanovsky
0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2021-04-25 17:22 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: Doug Ledford, Shay Drory, linux-rdma
On Sun, Apr 25, 2021 at 04:44:55PM +0300, Leon Romanovsky wrote:
> > > The proposed prepare/abort/finish flow is much harder to implement correctly.
> > > Let's take ib_destroy_qp_user() as an example: we call rdma_rw_cleanup_mrs(),
> > > but don't restore the MRs after a .destroy_qp() failure.
> >
> > I think it is a bug that we call rdma_rw code in a user path.
>
> It was an example of a flow that wasn't restored properly.
> The same goes for ib_dealloc_pd_user(), release of __internal_mr.
>
> Of course, these flows shouldn't fail because they are kernel flows, but that
> is not clear from the code.
Well, exactly, user flows are not allowed to do extra stuff before
calling the driver destroy
So the arrangement I gave is reasonable and makes sense; it is
certainly better than the hodgepodge of ordering that we have today
Jason
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
2021-04-25 17:22 ` Jason Gunthorpe
@ 2021-04-25 17:38 ` Leon Romanovsky
2021-04-26 12:03 ` Jason Gunthorpe
0 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2021-04-25 17:38 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, Shay Drory, linux-rdma
On Sun, Apr 25, 2021 at 02:22:54PM -0300, Jason Gunthorpe wrote:
> On Sun, Apr 25, 2021 at 04:44:55PM +0300, Leon Romanovsky wrote:
> > > > The proposed prepare/abort/finish flow is much harder to implement correctly.
> > > > Let's take ib_destroy_qp_user() as an example: we call rdma_rw_cleanup_mrs(),
> > > > but don't restore the MRs after a .destroy_qp() failure.
> > >
> > > I think it is a bug that we call rdma_rw code in a user path.
> >
> > It was an example of a flow that wasn't restored properly.
> > The same goes for ib_dealloc_pd_user(), release of __internal_mr.
> >
> > Of course, these flows shouldn't fail because they are kernel flows, but that
> > is not clear from the code.
>
> Well, exactly, user flows are not allowed to do extra stuff before
> calling the driver destroy
>
> So the arrangement I gave is reasonable and makes sense; it is
> certainly better than the hodgepodge of ordering that we have today
I thought about a simpler solution - move rdma_restrack_del() before the
.destroy() callbacks, together with an attempt to re-add the res object if
destroy fails.
Thanks
>
> Jason
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
2021-04-25 17:38 ` Leon Romanovsky
@ 2021-04-26 12:03 ` Jason Gunthorpe
2021-04-26 13:08 ` Leon Romanovsky
0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2021-04-26 12:03 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: Doug Ledford, Shay Drory, linux-rdma
On Sun, Apr 25, 2021 at 08:38:57PM +0300, Leon Romanovsky wrote:
> On Sun, Apr 25, 2021 at 02:22:54PM -0300, Jason Gunthorpe wrote:
> > On Sun, Apr 25, 2021 at 04:44:55PM +0300, Leon Romanovsky wrote:
> > > > > The proposed prepare/abort/finish flow is much harder to implement correctly.
> > > > > Let's take ib_destroy_qp_user() as an example: we call rdma_rw_cleanup_mrs(),
> > > > > but don't restore the MRs after a .destroy_qp() failure.
> > > >
> > > > I think it is a bug that we call rdma_rw code in a user path.
> > >
> > > It was an example of a flow that wasn't restored properly.
> > > The same goes for ib_dealloc_pd_user(), release of __internal_mr.
> > >
> > > Of course, these flows shouldn't fail because they are kernel flows, but that
> > > is not clear from the code.
> >
> > Well, exactly, user flows are not allowed to do extra stuff before
> > calling the driver destroy
> >
> > So the arrangement I gave is reasonable and makes sense; it is
> > certainly better than the hodgepodge of ordering that we have today
>
> I thought about a simpler solution - move rdma_restrack_del() before the
> .destroy() callbacks, together with an attempt to re-add the res object if
> destroy fails.
It isn't simpler; now add can fail and can't be recovered
Jason
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
2021-04-26 12:03 ` Jason Gunthorpe
@ 2021-04-26 13:08 ` Leon Romanovsky
2021-04-26 13:11 ` Jason Gunthorpe
0 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2021-04-26 13:08 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, Shay Drory, linux-rdma
On Mon, Apr 26, 2021 at 09:03:49AM -0300, Jason Gunthorpe wrote:
> On Sun, Apr 25, 2021 at 08:38:57PM +0300, Leon Romanovsky wrote:
> > On Sun, Apr 25, 2021 at 02:22:54PM -0300, Jason Gunthorpe wrote:
> > > On Sun, Apr 25, 2021 at 04:44:55PM +0300, Leon Romanovsky wrote:
> > > > > > The proposed prepare/abort/finish flow is much harder to implement correctly.
> > > > > > Let's take ib_destroy_qp_user() as an example: we call rdma_rw_cleanup_mrs(),
> > > > > > but don't restore the MRs after a .destroy_qp() failure.
> > > > >
> > > > > I think it is a bug that we call rdma_rw code in a user path.
> > > >
> > > > It was an example of a flow that wasn't restored properly.
> > > > The same goes for ib_dealloc_pd_user(), release of __internal_mr.
> > > >
> > > > Of course, these flows shouldn't fail because they are kernel flows, but that
> > > > is not clear from the code.
> > >
> > > Well, exactly, user flows are not allowed to do extra stuff before
> > > calling the driver destroy
> > >
> > > So the arrangement I gave is reasonable and makes sense; it is
> > > certainly better than the hodgepodge of ordering that we have today
> >
> > I thought about a simpler solution - move rdma_restrack_del() before the
> > .destroy() callbacks, together with an attempt to re-add the res object if
> > destroy fails.
>
> It isn't simpler; now add can fail and can't be recovered
It is no different from a failure during the first call to rdma_restrack_add().
You didn't like the idea of being strict when adding restrack entries, but
want to be strict on reinsert.
Thanks
>
> Jason
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
2021-04-26 13:08 ` Leon Romanovsky
@ 2021-04-26 13:11 ` Jason Gunthorpe
2021-04-27 4:45 ` Leon Romanovsky
0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2021-04-26 13:11 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: Doug Ledford, Shay Drory, linux-rdma
On Mon, Apr 26, 2021 at 04:08:42PM +0300, Leon Romanovsky wrote:
> On Mon, Apr 26, 2021 at 09:03:49AM -0300, Jason Gunthorpe wrote:
> > On Sun, Apr 25, 2021 at 08:38:57PM +0300, Leon Romanovsky wrote:
> > > On Sun, Apr 25, 2021 at 02:22:54PM -0300, Jason Gunthorpe wrote:
> > > > On Sun, Apr 25, 2021 at 04:44:55PM +0300, Leon Romanovsky wrote:
> > > > > > > The proposed prepare/abort/finish flow is much harder to implement correctly.
> > > > > > > Let's take ib_destroy_qp_user() as an example: we call rdma_rw_cleanup_mrs(),
> > > > > > > but don't restore the MRs after a .destroy_qp() failure.
> > > > > >
> > > > > > I think it is a bug that we call rdma_rw code in a user path.
> > > > >
> > > > > It was an example of a flow that wasn't restored properly.
> > > > > The same goes for ib_dealloc_pd_user(), release of __internal_mr.
> > > > >
> > > > > Of course, these flows shouldn't fail because they are kernel flows, but that
> > > > > is not clear from the code.
> > > >
> > > > Well, exactly, user flows are not allowed to do extra stuff before
> > > > calling the driver destroy
> > > >
> > > > So the arrangement I gave is reasonable and makes sense; it is
> > > > certainly better than the hodgepodge of ordering that we have today
> > >
> > > I thought about a simpler solution - move rdma_restrack_del() before the
> > > .destroy() callbacks, together with an attempt to re-add the res object if
> > > destroy fails.
> >
> > It isn't simpler; now add can fail and can't be recovered
>
> It is no different from a failure during the first call to rdma_restrack_add().
> You didn't like the idea of being strict when adding restrack entries, but
> want to be strict on reinsert.
It is ugly that we couldn't fix the add side; let's not repeat that ugliness
in other places
Jason
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
2021-04-26 13:11 ` Jason Gunthorpe
@ 2021-04-27 4:45 ` Leon Romanovsky
0 siblings, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2021-04-27 4:45 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, Shay Drory, linux-rdma
On Mon, Apr 26, 2021 at 10:11:07AM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 26, 2021 at 04:08:42PM +0300, Leon Romanovsky wrote:
> > On Mon, Apr 26, 2021 at 09:03:49AM -0300, Jason Gunthorpe wrote:
> > > On Sun, Apr 25, 2021 at 08:38:57PM +0300, Leon Romanovsky wrote:
> > > > On Sun, Apr 25, 2021 at 02:22:54PM -0300, Jason Gunthorpe wrote:
> > > > > On Sun, Apr 25, 2021 at 04:44:55PM +0300, Leon Romanovsky wrote:
> > > > > > > > The proposed prepare/abort/finish flow is much harder to implement correctly.
> > > > > > > > Let's take ib_destroy_qp_user() as an example: we call rdma_rw_cleanup_mrs(),
> > > > > > > > but don't restore the MRs after a .destroy_qp() failure.
> > > > > > >
> > > > > > > I think it is a bug that we call rdma_rw code in a user path.
> > > > > >
> > > > > > It was an example of a flow that wasn't restored properly.
> > > > > > The same goes for ib_dealloc_pd_user(), release of __internal_mr.
> > > > > >
> > > > > > Of course, these flows shouldn't fail because they are kernel flows, but that
> > > > > > is not clear from the code.
> > > > >
> > > > > Well, exactly, user flows are not allowed to do extra stuff before
> > > > > calling the driver destroy
> > > > >
> > > > > So the arrangement I gave is reasonable and makes sense; it is
> > > > > certainly better than the hodgepodge of ordering that we have today
> > > >
> > > > I thought about a simpler solution - move rdma_restrack_del() before the
> > > > .destroy() callbacks, together with an attempt to re-add the res object if
> > > > destroy fails.
> > >
> > > It isn't simpler; now add can fail and can't be recovered
> >
> > It is no different from a failure during the first call to rdma_restrack_add().
> > You didn't like the idea of being strict when adding restrack entries, but
> > want to be strict on reinsert.
>
> It is ugly that we couldn't fix the add side; let's not repeat that ugliness
> in other places
Why can't we fix _add?
Thanks
>
> Jason
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
2021-04-22 14:29 ` Jason Gunthorpe
2021-04-25 13:03 ` Leon Romanovsky
@ 2021-05-02 11:28 ` Leon Romanovsky
1 sibling, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2021-05-02 11:28 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, Shay Drory, linux-rdma
On Thu, Apr 22, 2021 at 11:29:39AM -0300, Jason Gunthorpe wrote:
> On Wed, Apr 21, 2021 at 08:03:22AM +0300, Leon Romanovsky wrote:
>
> > I didn't understand when reviewed either, but decided to post it anyway
> > to get possible explanation for this RDMA_RESTRACK_MR || RDMA_RESTRACK_QP
> > check.
>
> I think the whole thing should look more like this and we delete the
> if entirely.
Finally, I had time to work on it and got an immediate reminder of why we
have the "if ...".
>
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index 2b9ffc21cbc4ad..479b16b8f6723a 100644
<...>
> @@ -1963,31 +1969,33 @@ int ib_destroy_qp_user(struct ib_qp *qp, struct ib_udata *udata)
> rdma_rw_cleanup_mrs(qp);
>
> rdma_counter_unbind_qp(qp, true);
> - rdma_restrack_del(&qp->res);
> + rdma_restrack_prepare_del(&qp->res);
> ret = qp->device->ops.destroy_qp(qp, udata);
> - if (!ret) {
> - if (alt_path_sgid_attr)
> - rdma_put_gid_attr(alt_path_sgid_attr);
> - if (av_sgid_attr)
> - rdma_put_gid_attr(av_sgid_attr);
> - if (pd)
> - atomic_dec(&pd->usecnt);
> - if (scq)
> - atomic_dec(&scq->usecnt);
> - if (rcq)
> - atomic_dec(&rcq->usecnt);
> - if (srq)
> - atomic_dec(&srq->usecnt);
> - if (ind_tbl)
> - atomic_dec(&ind_tbl->usecnt);
> - if (sec)
> - ib_destroy_qp_security_end(sec);
> - } else {
> + if (ret) {
> + rdma_restrack_abort_del(&qp->res);
> if (sec)
> ib_destroy_qp_security_abort(sec);
> + return ret;
> }
>
> - return ret;
> + rdma_restrack_finish_del(&qp->res);
This causes a use-after-free because "qp" is released in the driver.
[leonro@vm ~]$ mkt test
.s....[ 24.350657] ==================================================================
[ 24.350917] BUG: KASAN: use-after-free in rdma_restrack_finish_del+0x1a7/0x210 [ib_core]
[ 24.351303] Read of size 1 at addr ffff88800c59f118 by task python3/243
[ 24.351416]
[ 24.351495] CPU: 4 PID: 243 Comm: python3 Not tainted 5.12.0-rc2+ #2923
[ 24.351644] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 24.351832] Call Trace:
[ 24.351886] dump_stack+0x93/0xc2
[ 24.351965] print_address_description.constprop.0+0x18/0x140
[ 24.352074] ? rdma_restrack_finish_del+0x1a7/0x210 [ib_core]
[ 24.352204] ? rdma_restrack_finish_del+0x1a7/0x210 [ib_core]
[ 24.352333] kasan_report.cold+0x7c/0xd8
[ 24.352413] ? rdma_restrack_finish_del+0x1a7/0x210 [ib_core]
[ 24.352543] rdma_restrack_finish_del+0x1a7/0x210 [ib_core]
[ 24.352654] ib_destroy_qp_user+0x297/0x540 [ib_core]
[ 24.352769] uverbs_free_qp+0x7d/0x150 [ib_uverbs]
[ 24.352864] uverbs_destroy_uobject+0x164/0x6c0 [ib_uverbs]
[ 24.352949] uobj_destroy+0x72/0xf0 [ib_uverbs]
[ 24.353041] ib_uverbs_cmd_verbs+0x19b9/0x2ee0 [ib_uverbs]
[ 24.353147] ? uverbs_free_qp+0x150/0x150 [ib_uverbs]
[ 24.353246] ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
[ 24.353340] ? uverbs_fill_udata+0x540/0x540 [ib_uverbs]
[ 24.353455] ? filemap_map_pages+0x2d9/0xf30
[ 24.353552] ? lock_acquire+0x1a9/0x6d0
[ 24.353620] ? ib_uverbs_ioctl+0x11e/0x260 [ib_uverbs]
[ 24.353720] ? __might_fault+0xba/0x160
[ 24.353790] ? lock_release+0x6c0/0x6c0
[ 24.353866] ib_uverbs_ioctl+0x169/0x260 [ib_uverbs]
[ 24.353958] ? ib_uverbs_ioctl+0x11e/0x260 [ib_uverbs]
[ 24.354049] ? ib_uverbs_cmd_verbs+0x2ee0/0x2ee0 [ib_uverbs]
[ 24.354158] __x64_sys_ioctl+0x7e6/0x1050
[ 24.354217] ? generic_block_fiemap+0x60/0x60
[ 24.354305] ? down_write_nested+0x150/0x150
[ 24.354400] ? do_user_addr_fault+0x219/0xdc0
[ 24.354497] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 24.354584] ? syscall_enter_from_user_mode+0x1d/0x50
[ 24.354678] do_syscall_64+0x2d/0x40
[ 24.354747] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 24.354837] RIP: 0033:0x7fbfd862b5db
[ 24.354911] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6d b8 0c 00 f7 d8 64 89 01 48
[ 24.355200] RSP: 002b:00007fff7e9fde58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 24.355336] RAX: ffffffffffffffda RBX: 00007fff7e9fdf88 RCX: 00007fbfd862b5db
[ 24.355468] RDX: 00007fff7e9fdf70 RSI: 00000000c0181b01 RDI: 0000000000000005
[ 24.355587] RBP: 00007fff7e9fdf50 R08: 000055d6a4f8b3f0 R09: 00007fbfd800d000
[ 24.355713] R10: 00007fbfd800d0f0 R11: 0000000000000246 R12: 00007fff7e9fdf50
[ 24.355836] R13: 00007fff7e9fdf2c R14: 000055d6a4e59d90 R15: 000055d6a4f8b530
[ 24.355973]
So let's finish the memory allocation conversion patches first.
Thanks
end of thread, other threads:[~2021-05-02 11:28 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-18 13:37 [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone Leon Romanovsky
2021-04-20 12:39 ` Jason Gunthorpe
2021-04-20 13:06 ` Leon Romanovsky
2021-04-20 15:25 ` Jason Gunthorpe
2021-04-21 5:03 ` Leon Romanovsky
2021-04-22 14:29 ` Jason Gunthorpe
2021-04-25 13:03 ` Leon Romanovsky
2021-04-25 13:08 ` Jason Gunthorpe
2021-04-25 13:44 ` Leon Romanovsky
2021-04-25 17:22 ` Jason Gunthorpe
2021-04-25 17:38 ` Leon Romanovsky
2021-04-26 12:03 ` Jason Gunthorpe
2021-04-26 13:08 ` Leon Romanovsky
2021-04-26 13:11 ` Jason Gunthorpe
2021-04-27 4:45 ` Leon Romanovsky
2021-05-02 11:28 ` Leon Romanovsky