Subject: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
From: Leon Romanovsky
Date: 2021-04-18 13:37 UTC
To: Doug Ledford, Jason Gunthorpe
Cc: Shay Drory, linux-rdma

From: Shay Drory <shayd@nvidia.com>

Currently, in the case of a QP, the following use-after-free is possible:

 cpu0                                 cpu1
 ----                                 ----
 res_get_common_dumpit()
   rdma_restrack_get()
   fill_res_qp_entry()
                                      ib_destroy_qp_user()
                                        rdma_restrack_del()
                                        qp->device->ops.destroy_qp()
   ib_query_qp()
     qp->device->ops.query_qp()
     --> use-after-free of the QP

This is because rdma_restrack_del(), in the case of a QP, doesn't wait
until all users are gone.

Fix it by making rdma_restrack_del() wait until all users are gone for
QPs as well.

Fixes: 13ef5539def7 ("RDMA/restrack: Count references to the verbs objects")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/restrack.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
index ffabaf327242..def0c5b0efe9 100644
--- a/drivers/infiniband/core/restrack.c
+++ b/drivers/infiniband/core/restrack.c
@@ -340,7 +340,7 @@ void rdma_restrack_del(struct rdma_restrack_entry *res)
 	rt = &dev->res[res->type];
 
 	old = xa_erase(&rt->xa, res->id);
-	if (res->type == RDMA_RESTRACK_MR || res->type == RDMA_RESTRACK_QP)
+	if (res->type == RDMA_RESTRACK_MR)
 		return;
 	WARN_ON(old != res);
-- 
2.30.2
Subject: Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
From: Jason Gunthorpe
Date: 2021-04-20 12:39 UTC
To: Leon Romanovsky
Cc: Doug Ledford, Shay Drory, linux-rdma

On Sun, Apr 18, 2021 at 04:37:35PM +0300, Leon Romanovsky wrote:
> From: Shay Drory <shayd@nvidia.com>
>
> This is because rdma_restrack_del(), in the case of a QP, doesn't wait
> until all users are gone.
>
> Fix it by making rdma_restrack_del() wait until all users are gone for
> QPs as well.
>
> @@ -340,7 +340,7 @@ void rdma_restrack_del(struct rdma_restrack_entry *res)
>  	rt = &dev->res[res->type];
> 
>  	old = xa_erase(&rt->xa, res->id);
> -	if (res->type == RDMA_RESTRACK_MR || res->type == RDMA_RESTRACK_QP)
> +	if (res->type == RDMA_RESTRACK_MR)
>  		return;

Why is MR skipping this? It also calls into the driver under its dumpit;
at the very least this needs a comment.

Jason
Subject: Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
From: Leon Romanovsky
Date: 2021-04-20 13:06 UTC
To: Jason Gunthorpe
Cc: Doug Ledford, Shay Drory, linux-rdma

On Tue, Apr 20, 2021 at 09:39:50AM -0300, Jason Gunthorpe wrote:
> On Sun, Apr 18, 2021 at 04:37:35PM +0300, Leon Romanovsky wrote:
> >  	old = xa_erase(&rt->xa, res->id);
> > -	if (res->type == RDMA_RESTRACK_MR || res->type == RDMA_RESTRACK_QP)
> > +	if (res->type == RDMA_RESTRACK_MR)
> >  		return;
>
> Why is MR skipping this?

I don't remember the justification for the RDMA_RESTRACK_MR ||
RDMA_RESTRACK_QP check. My guess is that it is related to the allocation
flow (both are not converted) and that both have objects internal to the
drivers.

> It also calls into the driver under its dumpit; at the very least this
> needs a comment.
>
> Jason
Subject: Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
From: Jason Gunthorpe
Date: 2021-04-20 15:25 UTC
To: Leon Romanovsky
Cc: Doug Ledford, Shay Drory, linux-rdma

On Tue, Apr 20, 2021 at 04:06:03PM +0300, Leon Romanovsky wrote:
> On Tue, Apr 20, 2021 at 09:39:50AM -0300, Jason Gunthorpe wrote:
> > Why is MR skipping this?
>
> I don't remember the justification for the RDMA_RESTRACK_MR ||
> RDMA_RESTRACK_QP check. My guess is that it is related to the allocation
> flow (both are not converted) and that both have objects internal to the
> drivers.

Well, I don't understand why QP has a bug but MR would not have the
same one??

Jason
Subject: Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
From: Leon Romanovsky
Date: 2021-04-21 5:03 UTC
To: Jason Gunthorpe
Cc: Doug Ledford, Shay Drory, linux-rdma

On Tue, Apr 20, 2021 at 12:25:41PM -0300, Jason Gunthorpe wrote:
> Well, I don't understand why QP has a bug but MR would not have the
> same one??

I didn't understand it when I reviewed it either, but decided to post the
patch anyway to get a possible explanation for the RDMA_RESTRACK_MR ||
RDMA_RESTRACK_QP check.

Thanks

> Jason
Subject: Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
From: Jason Gunthorpe
Date: 2021-04-22 14:29 UTC
To: Leon Romanovsky
Cc: Doug Ledford, Shay Drory, linux-rdma

On Wed, Apr 21, 2021 at 08:03:22AM +0300, Leon Romanovsky wrote:
> I didn't understand it when I reviewed it either, but decided to post the
> patch anyway to get a possible explanation for the RDMA_RESTRACK_MR ||
> RDMA_RESTRACK_QP check.

I think the whole thing should look more like this, and we delete the
if entirely.

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 2b9ffc21cbc4ad..479b16b8f6723a 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1860,6 +1860,7 @@ static void _destroy_id(struct rdma_id_private *id_priv,
 		iw_destroy_cm_id(id_priv->cm_id.iw);
 	}
 	cma_leave_mc_groups(id_priv);
+	rdma_restrack_del(&id_priv->res);
 	cma_release_dev(id_priv);
 }
 
@@ -1873,7 +1874,6 @@ static void _destroy_id(struct rdma_id_private *id_priv,
 	kfree(id_priv->id.route.path_rec);
 
 	put_net(id_priv->id.route.addr.dev_addr.net);
-	rdma_restrack_del(&id_priv->res);
 	kfree(id_priv);
 }
 
diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
index 15493357cfef09..1808bc2533f155 100644
--- a/drivers/infiniband/core/counters.c
+++ b/drivers/infiniband/core/counters.c
@@ -176,6 +176,8 @@ static void rdma_counter_free(struct rdma_counter *counter)
 {
 	struct rdma_port_counter *port_counter;
 
+	rdma_restrack_del(&counter->res);
+
 	port_counter = &counter->device->port_data[counter->port].port_counter;
 	mutex_lock(&port_counter->lock);
 	port_counter->num_counters--;
@@ -185,7 +187,6 @@ static void rdma_counter_free(struct rdma_counter *counter)
 
 	mutex_unlock(&port_counter->lock);
 
-	rdma_restrack_del(&counter->res);
 	kfree(counter->stats);
 	kfree(counter);
 }
diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index 433b426729d4ce..3884db637d33ab 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -339,11 +339,15 @@ void ib_free_cq(struct ib_cq *cq)
 		WARN_ON_ONCE(1);
 	}
 
+	rdma_restrack_prepare_del(&cq->res);
 	rdma_dim_destroy(cq);
 	trace_cq_free(cq);
 	ret = cq->device->ops.destroy_cq(cq, NULL);
-	WARN_ONCE(ret, "Destroy of kernel CQ shouldn't fail");
-	rdma_restrack_del(&cq->res);
+	if (WARN_ON(ret)) {
+		rdma_restrack_abort_del(&cq->res);
+		return;
+	}
+	rdma_restrack_finish_del(&cq->res);
 	kfree(cq->wc);
 	kfree(cq);
 }
diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index 94d83b665a2fe0..f57234ced93ca1 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -854,6 +854,8 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
 	struct ib_ucontext *ucontext = ufile->ucontext;
 	struct ib_device *ib_dev = ucontext->device;
 
+	rdma_restrack_prepare_del(&ucontext->res);
+
 	/*
 	 * If we are closing the FD then the user mmap VMAs must have
 	 * already been destroyed as they hold on to the filep, otherwise
@@ -868,9 +870,8 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
 	ib_rdmacg_uncharge(&ucontext->cg_obj, ib_dev,
 			   RDMACG_RESOURCE_HCA_HANDLE);
 
-	rdma_restrack_del(&ucontext->res);
-
 	ib_dev->ops.dealloc_ucontext(ucontext);
+	rdma_restrack_finish_del(&ucontext->res);
 	WARN_ON(!xa_empty(&ucontext->mmap_xa));
 	kfree(ucontext);
 
diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
index 033207882c82ce..8aa1dae40f38a7 100644
--- a/drivers/infiniband/core/restrack.c
+++ b/drivers/infiniband/core/restrack.c
@@ -315,11 +315,49 @@ int rdma_restrack_put(struct rdma_restrack_entry *res)
 }
 EXPORT_SYMBOL(rdma_restrack_put);
 
-/**
- * rdma_restrack_del() - delete object from the reource tracking database
- * @res: resource entry
- */
-void rdma_restrack_del(struct rdma_restrack_entry *res)
+void rdma_restrack_prepare_del(struct rdma_restrack_entry *res)
+{
+	struct rdma_restrack_entry *old;
+	struct rdma_restrack_root *rt;
+	struct ib_device *dev;
+
+	if (!res->valid)
+		return;
+
+	dev = res_to_dev(res);
+	rt = &dev->res[res->type];
+
+	if (!res->no_track) {
+		old = xa_cmpxchg(&rt->xa, res->id, res, XA_ZERO_ENTRY,
+				 GFP_KERNEL);
+		WARN_ON(old != res);
+	}
+
+	/* Fence access from all concurrent threads, like netlink */
+	rdma_restrack_put(res);
+	wait_for_completion(&res->comp);
+}
+EXPORT_SYMBOL(rdma_restrack_prepare_del);
+
+void rdma_restrack_abort_del(struct rdma_restrack_entry *res)
+{
+	struct rdma_restrack_entry *old;
+	struct rdma_restrack_root *rt;
+	struct ib_device *dev;
+
+	if (!res->valid || res->no_track)
+		return;
+
+	dev = res_to_dev(res);
+	rt = &dev->res[res->type];
+
+	kref_init(&res->kref);
+	old = xa_cmpxchg(&rt->xa, res->id, XA_ZERO_ENTRY, res, GFP_KERNEL);
+	WARN_ON(old != res);
+}
+EXPORT_SYMBOL(rdma_restrack_abort_del);
+
+void rdma_restrack_finish_del(struct rdma_restrack_entry *res)
 {
 	struct rdma_restrack_entry *old;
 	struct rdma_restrack_root *rt;
@@ -332,24 +370,22 @@ void rdma_restrack_del(struct rdma_restrack_entry *res)
 		}
 		return;
 	}
-
-	if (res->no_track)
-		goto out;
+	res->valid = false;
 
 	dev = res_to_dev(res);
-	if (WARN_ON(!dev))
-		return;
-
 	rt = &dev->res[res->type];
-
 	old = xa_erase(&rt->xa, res->id);
-	if (res->type == RDMA_RESTRACK_MR || res->type == RDMA_RESTRACK_QP)
-		return;
-	WARN_ON(old != res);
+	WARN_ON(old != NULL);
+}
+EXPORT_SYMBOL(rdma_restrack_finish_del);
 
-out:
-	res->valid = false;
-	rdma_restrack_put(res);
-	wait_for_completion(&res->comp);
+/**
+ * rdma_restrack_del() - delete object from the reource tracking database
+ * @res: resource entry
+ */
+void rdma_restrack_del(struct rdma_restrack_entry *res)
+{
+	rdma_restrack_prepare_del(res);
+	rdma_restrack_finish_del(res);
 }
 EXPORT_SYMBOL(rdma_restrack_del);
diff --git a/drivers/infiniband/core/restrack.h b/drivers/infiniband/core/restrack.h
index 6a04fc41f73801..87a670334acabe 100644
--- a/drivers/infiniband/core/restrack.h
+++ b/drivers/infiniband/core/restrack.h
@@ -27,6 +27,9 @@ int rdma_restrack_init(struct ib_device *dev);
 void rdma_restrack_clean(struct ib_device *dev);
 void rdma_restrack_add(struct rdma_restrack_entry *res);
 void rdma_restrack_del(struct rdma_restrack_entry *res);
+void rdma_restrack_prepare_del(struct rdma_restrack_entry *res);
+void rdma_restrack_finish_del(struct rdma_restrack_entry *res);
+void rdma_restrack_abort_del(struct rdma_restrack_entry *res);
 void rdma_restrack_new(struct rdma_restrack_entry *res,
 		       enum rdma_restrack_type type);
 void rdma_restrack_set_name(struct rdma_restrack_entry *res,
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 2b0798151fb753..2e761a7eb92847 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -346,13 +346,16 @@ int ib_dealloc_pd_user(struct ib_pd *pd, struct ib_udata *udata)
 	 */
 	WARN_ON(atomic_read(&pd->usecnt));
 
+	rdma_restrack_prepare_del(&pd->res);
 	ret = pd->device->ops.dealloc_pd(pd, udata);
-	if (ret)
+	if (ret) {
+		rdma_restrack_abort_del(&pd->res);
 		return ret;
+	}
 
-	rdma_restrack_del(&pd->res);
+	rdma_restrack_finish_del(&pd->res);
 	kfree(pd);
-	return ret;
+	return 0;
 }
 EXPORT_SYMBOL(ib_dealloc_pd_user);
 
@@ -1085,19 +1088,22 @@ int ib_destroy_srq_user(struct ib_srq *srq, struct ib_udata *udata)
 	if (atomic_read(&srq->usecnt))
 		return -EBUSY;
 
+	rdma_restrack_prepare_del(&srq->res);
 	ret = srq->device->ops.destroy_srq(srq, udata);
-	if (ret)
+	if (ret) {
+		rdma_restrack_abort_del(&srq->res);
 		return ret;
+	}
+	rdma_restrack_finish_del(&srq->res);
 
 	atomic_dec(&srq->pd->usecnt);
 	if (srq->srq_type == IB_SRQT_XRC)
 		atomic_dec(&srq->ext.xrc.xrcd->usecnt);
 	if (ib_srq_has_cq(srq->srq_type))
 		atomic_dec(&srq->ext.cq->usecnt);
-	rdma_restrack_del(&srq->res);
 	kfree(srq);
 
-	return ret;
+	return 0;
 }
 EXPORT_SYMBOL(ib_destroy_srq_user);
 
@@ -1963,31 +1969,33 @@ int ib_destroy_qp_user(struct ib_qp *qp, struct ib_udata *udata)
 		rdma_rw_cleanup_mrs(qp);
 
 	rdma_counter_unbind_qp(qp, true);
-	rdma_restrack_del(&qp->res);
+	rdma_restrack_prepare_del(&qp->res);
 	ret = qp->device->ops.destroy_qp(qp, udata);
-	if (!ret) {
-		if (alt_path_sgid_attr)
-			rdma_put_gid_attr(alt_path_sgid_attr);
-		if (av_sgid_attr)
-			rdma_put_gid_attr(av_sgid_attr);
-		if (pd)
-			atomic_dec(&pd->usecnt);
-		if (scq)
-			atomic_dec(&scq->usecnt);
-		if (rcq)
-			atomic_dec(&rcq->usecnt);
-		if (srq)
-			atomic_dec(&srq->usecnt);
-		if (ind_tbl)
-			atomic_dec(&ind_tbl->usecnt);
-		if (sec)
-			ib_destroy_qp_security_end(sec);
-	} else {
+	if (ret) {
+		rdma_restrack_abort_del(&qp->res);
 		if (sec)
 			ib_destroy_qp_security_abort(sec);
+		return ret;
 	}
 
-	return ret;
+	rdma_restrack_finish_del(&qp->res);
+	if (alt_path_sgid_attr)
+		rdma_put_gid_attr(alt_path_sgid_attr);
+	if (av_sgid_attr)
+		rdma_put_gid_attr(av_sgid_attr);
+	if (pd)
+		atomic_dec(&pd->usecnt);
+	if (scq)
+		atomic_dec(&scq->usecnt);
+	if (rcq)
+		atomic_dec(&rcq->usecnt);
+	if (srq)
+		atomic_dec(&srq->usecnt);
+	if (ind_tbl)
+		atomic_dec(&ind_tbl->usecnt);
+	if (sec)
+		ib_destroy_qp_security_end(sec);
+	return 0;
 }
 EXPORT_SYMBOL(ib_destroy_qp_user);
 
@@ -2050,13 +2058,16 @@ int ib_destroy_cq_user(struct ib_cq *cq, struct ib_udata *udata)
 	if (atomic_read(&cq->usecnt))
 		return -EBUSY;
 
+	rdma_restrack_prepare_del(&cq->res);
 	ret = cq->device->ops.destroy_cq(cq, udata);
-	if (ret)
+	if (ret) {
+		rdma_restrack_abort_del(&cq->res);
 		return ret;
+	}
 
-	rdma_restrack_del(&cq->res);
+	rdma_restrack_finish_del(&cq->res);
 	kfree(cq);
-	return ret;
+	return 0;
 }
 EXPORT_SYMBOL(ib_destroy_cq_user);
 
@@ -2126,16 +2137,19 @@ int ib_dereg_mr_user(struct ib_mr *mr, struct ib_udata *udata)
 	int ret;
 
 	trace_mr_dereg(mr);
-	rdma_restrack_del(&mr->res);
+	rdma_restrack_prepare_del(&mr->res);
 	ret = mr->device->ops.dereg_mr(mr, udata);
-	if (!ret) {
-		atomic_dec(&pd->usecnt);
-		if (dm)
-			atomic_dec(&dm->usecnt);
-		kfree(sig_attrs);
+	if (ret) {
+		rdma_restrack_abort_del(&mr->res);
+		return ret;
 	}
 
-	return ret;
+	rdma_restrack_finish_del(&mr->res);
+	atomic_dec(&pd->usecnt);
+	if (dm)
+		atomic_dec(&dm->usecnt);
+	kfree(sig_attrs);
+	return 0;
 }
 EXPORT_SYMBOL(ib_dereg_mr_user);
Subject: Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
From: Leon Romanovsky
Date: 2021-04-25 13:03 UTC
To: Jason Gunthorpe
Cc: Doug Ledford, Shay Drory, linux-rdma

On Thu, Apr 22, 2021 at 11:29:39AM -0300, Jason Gunthorpe wrote:
> On Wed, Apr 21, 2021 at 08:03:22AM +0300, Leon Romanovsky wrote:
> > I didn't understand it when I reviewed it either, but decided to post the
> > patch anyway to get a possible explanation for the RDMA_RESTRACK_MR ||
> > RDMA_RESTRACK_QP check.
>
> I think the whole thing should look more like this, and we delete the
> if entirely.

I have mixed feelings about this approach. Before the "destroy can fail"
disaster, the restrack goal was to provide the following flow:

1. Create the new memory object - rdma_restrack_new()
2. Create the new HW object - the .create_XXX() callback in the driver
3. Add the HW object to the DB - rdma_restrack_add()
....
4. Wait for any work on this HW object to complete - rdma_restrack_del()
5. Safely destroy the HW object - .destroy_XXX()

I really would like to stay with this flow and block any access to an
object that failed to destruct - maybe add it to some zombie list.

The proposed prepare/abort/finish flow is much harder to implement
correctly. Take ib_destroy_qp_user() as an example: we called
rdma_rw_cleanup_mrs(), but we don't restore those MRs after a
.destroy_qp() failure.
Thanks > > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c > index 2b9ffc21cbc4ad..479b16b8f6723a 100644 > --- a/drivers/infiniband/core/cma.c > +++ b/drivers/infiniband/core/cma.c > @@ -1860,6 +1860,7 @@ static void _destroy_id(struct rdma_id_private *id_priv, > iw_destroy_cm_id(id_priv->cm_id.iw); > } > cma_leave_mc_groups(id_priv); > + rdma_restrack_del(&id_priv->res); > cma_release_dev(id_priv); > } > > @@ -1873,7 +1874,6 @@ static void _destroy_id(struct rdma_id_private *id_priv, > kfree(id_priv->id.route.path_rec); > > put_net(id_priv->id.route.addr.dev_addr.net); > - rdma_restrack_del(&id_priv->res); > kfree(id_priv); > } > > diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c > index 15493357cfef09..1808bc2533f155 100644 > --- a/drivers/infiniband/core/counters.c > +++ b/drivers/infiniband/core/counters.c > @@ -176,6 +176,8 @@ static void rdma_counter_free(struct rdma_counter *counter) > { > struct rdma_port_counter *port_counter; > > + rdma_restrack_del(&counter->res); > + > port_counter = &counter->device->port_data[counter->port].port_counter; > mutex_lock(&port_counter->lock); > port_counter->num_counters--; > @@ -185,7 +187,6 @@ static void rdma_counter_free(struct rdma_counter *counter) > > mutex_unlock(&port_counter->lock); > > - rdma_restrack_del(&counter->res); > kfree(counter->stats); > kfree(counter); > } > diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c > index 433b426729d4ce..3884db637d33ab 100644 > --- a/drivers/infiniband/core/cq.c > +++ b/drivers/infiniband/core/cq.c > @@ -339,11 +339,15 @@ void ib_free_cq(struct ib_cq *cq) > WARN_ON_ONCE(1); > } > > + rdma_restrack_prepare_del(&cq->res); > rdma_dim_destroy(cq); > trace_cq_free(cq); > ret = cq->device->ops.destroy_cq(cq, NULL); > - WARN_ONCE(ret, "Destroy of kernel CQ shouldn't fail"); > - rdma_restrack_del(&cq->res); > + if (WARN_ON(ret)) { > + rdma_restrack_abort_del(&cq->res); > + return; > + } > + 
rdma_restrack_finish_del(&cq->res); > kfree(cq->wc); > kfree(cq); > } > diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c > index 94d83b665a2fe0..f57234ced93ca1 100644 > --- a/drivers/infiniband/core/rdma_core.c > +++ b/drivers/infiniband/core/rdma_core.c > @@ -854,6 +854,8 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile, > struct ib_ucontext *ucontext = ufile->ucontext; > struct ib_device *ib_dev = ucontext->device; > > + rdma_restrack_prepare_del(&ucontext->res); > + > /* > * If we are closing the FD then the user mmap VMAs must have > * already been destroyed as they hold on to the filep, otherwise > @@ -868,9 +870,8 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile, > ib_rdmacg_uncharge(&ucontext->cg_obj, ib_dev, > RDMACG_RESOURCE_HCA_HANDLE); > > - rdma_restrack_del(&ucontext->res); > - > ib_dev->ops.dealloc_ucontext(ucontext); > + rdma_restrack_finish_del(&ucontext->res); > WARN_ON(!xa_empty(&ucontext->mmap_xa)); > kfree(ucontext); > > diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c > index 033207882c82ce..8aa1dae40f38a7 100644 > --- a/drivers/infiniband/core/restrack.c > +++ b/drivers/infiniband/core/restrack.c > @@ -315,11 +315,49 @@ int rdma_restrack_put(struct rdma_restrack_entry *res) > } > EXPORT_SYMBOL(rdma_restrack_put); > > -/** > - * rdma_restrack_del() - delete object from the reource tracking database > - * @res: resource entry > - */ > -void rdma_restrack_del(struct rdma_restrack_entry *res) > +void rdma_restrack_prepare_del(struct rdma_restrack_entry *res) > +{ > + struct rdma_restrack_entry *old; > + struct rdma_restrack_root *rt; > + struct ib_device *dev; > + > + if (!res->valid) > + return; > + > + dev = res_to_dev(res); > + rt = &dev->res[res->type]; > + > + if (!res->no_track) { > + old = xa_cmpxchg(&rt->xa, res->id, res, XA_ZERO_ENTRY, > + GFP_KERNEL); > + WARN_ON(old != res); > + } > + > + /* Fence access from all concurrent 
threads, like netlink */ > + rdma_restrack_put(res); > + wait_for_completion(&res->comp); > +} > +EXPORT_SYMBOL(rdma_restrack_prepare_del); > + > +void rdma_restrack_abort_del(struct rdma_restrack_entry *res) > +{ > + struct rdma_restrack_entry *old; > + struct rdma_restrack_root *rt; > + struct ib_device *dev; > + > + if (!res->valid || res->no_track) > + return; > + > + dev = res_to_dev(res); > + rt = &dev->res[res->type]; > + > + kref_init(&res->kref); > + old = xa_cmpxchg(&rt->xa, res->id, XA_ZERO_ENTRY, res, GFP_KERNEL); > + WARN_ON(old != res); > +} > +EXPORT_SYMBOL(rdma_restrack_abort_del); > + > +void rdma_restrack_finish_del(struct rdma_restrack_entry *res) > { > struct rdma_restrack_entry *old; > struct rdma_restrack_root *rt; > @@ -332,24 +370,22 @@ void rdma_restrack_del(struct rdma_restrack_entry *res) > } > return; > } > - > - if (res->no_track) > - goto out; > + res->valid = false; > > dev = res_to_dev(res); > - if (WARN_ON(!dev)) > - return; > - > rt = &dev->res[res->type]; > - > old = xa_erase(&rt->xa, res->id); > - if (res->type == RDMA_RESTRACK_MR || res->type == RDMA_RESTRACK_QP) > - return; > - WARN_ON(old != res); > + WARN_ON(old != NULL); > +} > +EXPORT_SYMBOL(rdma_restrack_finish_del); > > -out: > - res->valid = false; > - rdma_restrack_put(res); > - wait_for_completion(&res->comp); > +/** > + * rdma_restrack_del() - delete object from the reource tracking database > + * @res: resource entry > + */ > +void rdma_restrack_del(struct rdma_restrack_entry *res) > +{ > + rdma_restrack_prepare_del(res); > + rdma_restrack_finish_del(res); > } > EXPORT_SYMBOL(rdma_restrack_del); > diff --git a/drivers/infiniband/core/restrack.h b/drivers/infiniband/core/restrack.h > index 6a04fc41f73801..87a670334acabe 100644 > --- a/drivers/infiniband/core/restrack.h > +++ b/drivers/infiniband/core/restrack.h > @@ -27,6 +27,9 @@ int rdma_restrack_init(struct ib_device *dev); > void rdma_restrack_clean(struct ib_device *dev); > void rdma_restrack_add(struct 
rdma_restrack_entry *res); > void rdma_restrack_del(struct rdma_restrack_entry *res); > +void rdma_restrack_prepare_del(struct rdma_restrack_entry *res); > +void rdma_restrack_finish_del(struct rdma_restrack_entry *res); > +void rdma_restrack_abort_del(struct rdma_restrack_entry *res); > void rdma_restrack_new(struct rdma_restrack_entry *res, > enum rdma_restrack_type type); > void rdma_restrack_set_name(struct rdma_restrack_entry *res, > diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c > index 2b0798151fb753..2e761a7eb92847 100644 > --- a/drivers/infiniband/core/verbs.c > +++ b/drivers/infiniband/core/verbs.c > @@ -346,13 +346,16 @@ int ib_dealloc_pd_user(struct ib_pd *pd, struct ib_udata *udata) > */ > WARN_ON(atomic_read(&pd->usecnt)); > > + rdma_restrack_prepare_del(&pd->res); > ret = pd->device->ops.dealloc_pd(pd, udata); > - if (ret) > + if (ret) { > + rdma_restrack_abort_del(&pd->res); > return ret; > + } > > - rdma_restrack_del(&pd->res); > + rdma_restrack_finish_del(&pd->res); > kfree(pd); > - return ret; > + return 0; > } > EXPORT_SYMBOL(ib_dealloc_pd_user); > > @@ -1085,19 +1088,22 @@ int ib_destroy_srq_user(struct ib_srq *srq, struct ib_udata *udata) > if (atomic_read(&srq->usecnt)) > return -EBUSY; > > + rdma_restrack_prepare_del(&srq->res); > ret = srq->device->ops.destroy_srq(srq, udata); > - if (ret) > + if (ret) { > + rdma_restrack_abort_del(&srq->res); > return ret; > + } > + rdma_restrack_finish_del(&srq->res); > > atomic_dec(&srq->pd->usecnt); > if (srq->srq_type == IB_SRQT_XRC) > atomic_dec(&srq->ext.xrc.xrcd->usecnt); > if (ib_srq_has_cq(srq->srq_type)) > atomic_dec(&srq->ext.cq->usecnt); > - rdma_restrack_del(&srq->res); > kfree(srq); > > - return ret; > + return 0; > } > EXPORT_SYMBOL(ib_destroy_srq_user); > > @@ -1963,31 +1969,33 @@ int ib_destroy_qp_user(struct ib_qp *qp, struct ib_udata *udata) > rdma_rw_cleanup_mrs(qp); > > rdma_counter_unbind_qp(qp, true); > - rdma_restrack_del(&qp->res); > + 
rdma_restrack_prepare_del(&qp->res);
> 	ret = qp->device->ops.destroy_qp(qp, udata);
> -	if (!ret) {
> -		if (alt_path_sgid_attr)
> -			rdma_put_gid_attr(alt_path_sgid_attr);
> -		if (av_sgid_attr)
> -			rdma_put_gid_attr(av_sgid_attr);
> -		if (pd)
> -			atomic_dec(&pd->usecnt);
> -		if (scq)
> -			atomic_dec(&scq->usecnt);
> -		if (rcq)
> -			atomic_dec(&rcq->usecnt);
> -		if (srq)
> -			atomic_dec(&srq->usecnt);
> -		if (ind_tbl)
> -			atomic_dec(&ind_tbl->usecnt);
> -		if (sec)
> -			ib_destroy_qp_security_end(sec);
> -	} else {
> +	if (ret) {
> +		rdma_restrack_abort_del(&qp->res);
>  		if (sec)
>  			ib_destroy_qp_security_abort(sec);
> +		return ret;
>  	}
>
> -	return ret;
> +	rdma_restrack_finish_del(&qp->res);
> +	if (alt_path_sgid_attr)
> +		rdma_put_gid_attr(alt_path_sgid_attr);
> +	if (av_sgid_attr)
> +		rdma_put_gid_attr(av_sgid_attr);
> +	if (pd)
> +		atomic_dec(&pd->usecnt);
> +	if (scq)
> +		atomic_dec(&scq->usecnt);
> +	if (rcq)
> +		atomic_dec(&rcq->usecnt);
> +	if (srq)
> +		atomic_dec(&srq->usecnt);
> +	if (ind_tbl)
> +		atomic_dec(&ind_tbl->usecnt);
> +	if (sec)
> +		ib_destroy_qp_security_end(sec);
> +	return 0;
>  }
>  EXPORT_SYMBOL(ib_destroy_qp_user);
>
> @@ -2050,13 +2058,16 @@ int ib_destroy_cq_user(struct ib_cq *cq, struct ib_udata *udata)
>  	if (atomic_read(&cq->usecnt))
>  		return -EBUSY;
>
> +	rdma_restrack_prepare_del(&cq->res);
>  	ret = cq->device->ops.destroy_cq(cq, udata);
> -	if (ret)
> +	if (ret) {
> +		rdma_restrack_abort_del(&cq->res);
>  		return ret;
> +	}
>
> -	rdma_restrack_del(&cq->res);
> +	rdma_restrack_finish_del(&cq->res);
>  	kfree(cq);
> -	return ret;
> +	return 0;
>  }
>  EXPORT_SYMBOL(ib_destroy_cq_user);
>
> @@ -2126,16 +2137,19 @@ int ib_dereg_mr_user(struct ib_mr *mr, struct ib_udata *udata)
>  	int ret;
>
>  	trace_mr_dereg(mr);
> -	rdma_restrack_del(&mr->res);
> +	rdma_restrack_prepare_del(&mr->res);
>  	ret = mr->device->ops.dereg_mr(mr, udata);
> -	if (!ret) {
> -		atomic_dec(&pd->usecnt);
> -		if (dm)
> -			atomic_dec(&dm->usecnt);
> -		kfree(sig_attrs);
> +	if (ret) {
> +		rdma_restrack_abort_del(&mr->res);
> +		return ret;
>  	}
>
> -	return ret;
> +	rdma_restrack_finish_del(&mr->res);
> +	atomic_dec(&pd->usecnt);
> +	if (dm)
> +		atomic_dec(&dm->usecnt);
> +	kfree(sig_attrs);
> +	return 0;
>  }
>  EXPORT_SYMBOL(ib_dereg_mr_user);

^ permalink raw reply	[flat|nested] 16+ messages in thread
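Read as an API, the prepare/abort/finish split in the diff above is a three-state protocol per tracked object. The sketch below is a hypothetical user-space model — the function names follow the diff, but the xarray and the wait-for-users machinery of the real restrack code are collapsed into a state field and a plain counter:

```c
#include <assert.h>

/* Hypothetical user-space model of the prepare/abort/finish split.
 * The real code erases the entry from an xarray and sleeps on a
 * completion; here that is a state enum and a busy-wait on a counter. */
enum res_state { RES_VISIBLE, RES_DEL_PREPARED, RES_DELETED };

struct restrack_entry {
	enum res_state state;
	int users;		/* readers still holding a reference */
};

/* Hide the entry from new lookups before calling the driver destroy. */
static void restrack_prepare_del(struct restrack_entry *res)
{
	assert(res->state == RES_VISIBLE);
	res->state = RES_DEL_PREPARED;	/* xa_erase() in the real code */
}

/* Driver destroy failed: make the entry reachable again. */
static void restrack_abort_del(struct restrack_entry *res)
{
	assert(res->state == RES_DEL_PREPARED);
	res->state = RES_VISIBLE;	/* re-insert into the xarray */
}

/* Driver destroy succeeded: wait out existing readers, then retire. */
static void restrack_finish_del(struct restrack_entry *res)
{
	assert(res->state == RES_DEL_PREPARED);
	while (res->users)
		;			/* wait_for_completion() stand-in */
	res->state = RES_DELETED;
}
```

The point of the ordering is that new readers can never find the entry anywhere in the window where destroy may still fail, while abort restores visibility if the driver refuses to destroy.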
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
  2021-04-25 13:03           ` Leon Romanovsky
@ 2021-04-25 13:08             ` Jason Gunthorpe
  2021-04-25 13:44               ` Leon Romanovsky
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2021-04-25 13:08 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: Doug Ledford, Shay Drory, linux-rdma

On Sun, Apr 25, 2021 at 04:03:47PM +0300, Leon Romanovsky wrote:
> On Thu, Apr 22, 2021 at 11:29:39AM -0300, Jason Gunthorpe wrote:
> > On Wed, Apr 21, 2021 at 08:03:22AM +0300, Leon Romanovsky wrote:
> >
> > > I didn't understand when reviewed either, but decided to post it anyway
> > > to get a possible explanation for this RDMA_RESTRACK_MR || RDMA_RESTRACK_QP
> > > check.
> >
> > I think the whole thing should look more like this and we delete the
> > if entirely.
>
> I have mixed feelings about this approach. Before the "destroy can fail"
> disaster, the restrack goal was to provide the following flow:
> 1. create new memory object - rdma_restrack_new()
> 2. create new HW object - .create_XXX() callback in the driver
> 3. add HW object to the DB - rdma_restrack_add()
> ....
> 4. wait for any work on this HW object to complete - rdma_restrack_del()
> 5. safely destroy HW object - .destroy_XXX()
>
> I really would like to stay with this flow and block any access to the
> object that failed to destruct - maybe add it to some zombie list.

That isn't the semantic we now have for destroy.

> The proposed prepare/abort/finish flow is much harder to implement correctly.
> Let's take ib_destroy_qp_user() as an example: we called rdma_rw_cleanup_mrs(),
> but didn't restore those MRs after a .destroy_qp() failure.

I think it is a bug that we call rdma_rw code in a user path.

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread
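The five-step lifecycle above can be condensed into a small model. This is a hypothetical user-space sketch, not kernel code — the tracking DB is a boolean, the "wait for any work" step is a busy-wait, and the driver callbacks are trivial stubs:

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the original restrack lifecycle: one rdma_restrack_del()
 * hides the object from the DB and waits for users before the HW
 * teardown runs.  All types here are hypothetical stand-ins. */
struct hw_object { bool alive; };

struct restrack_entry {
	bool in_db;
	int users;		/* readers that did rdma_restrack_get() */
	struct hw_object *hw;
};

static void restrack_new(struct restrack_entry *res, struct hw_object *hw)
{
	res->in_db = false;	/* step 1: memory object exists, hidden */
	res->users = 0;
	res->hw = hw;		/* step 2: HW object created by driver */
}

static void restrack_add(struct restrack_entry *res)
{
	res->in_db = true;	/* step 3: visible to dumpit and friends */
}

static void restrack_del(struct restrack_entry *res)
{
	res->in_db = false;	/* remove from the DB ... */
	while (res->users)
		;		/* ... step 4: wait for all users */
}

static void destroy_hw(struct restrack_entry *res)
{
	restrack_del(res);
	res->hw->alive = false;	/* step 5: safe, nobody can reach it */
}
```

The property the flow guarantees is that step 5 never races with a reader: by the time the driver teardown runs, the object is both invisible and unreferenced.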
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
  2021-04-25 13:08             ` Jason Gunthorpe
@ 2021-04-25 13:44               ` Leon Romanovsky
  2021-04-25 17:22                 ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2021-04-25 13:44 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, Shay Drory, linux-rdma

On Sun, Apr 25, 2021 at 10:08:57AM -0300, Jason Gunthorpe wrote:
> On Sun, Apr 25, 2021 at 04:03:47PM +0300, Leon Romanovsky wrote:
> > On Thu, Apr 22, 2021 at 11:29:39AM -0300, Jason Gunthorpe wrote:
> > > On Wed, Apr 21, 2021 at 08:03:22AM +0300, Leon Romanovsky wrote:
> > >
> > > > I didn't understand when reviewed either, but decided to post it anyway
> > > > to get a possible explanation for this RDMA_RESTRACK_MR || RDMA_RESTRACK_QP
> > > > check.
> > >
> > > I think the whole thing should look more like this and we delete the
> > > if entirely.
> >
> > I have mixed feelings about this approach. Before the "destroy can fail"
> > disaster, the restrack goal was to provide the following flow:
> > 1. create new memory object - rdma_restrack_new()
> > 2. create new HW object - .create_XXX() callback in the driver
> > 3. add HW object to the DB - rdma_restrack_add()
> > ....
> > 4. wait for any work on this HW object to complete - rdma_restrack_del()
> > 5. safely destroy HW object - .destroy_XXX()
> >
> > I really would like to stay with this flow and block any access to the
> > object that failed to destruct - maybe add it to some zombie list.
>
> That isn't the semantic we now have for destroy.

I would say that it is my mistake, introduced when destroy was changed to
return an error.

> > The proposed prepare/abort/finish flow is much harder to implement correctly.
> > Let's take ib_destroy_qp_user() as an example: we called rdma_rw_cleanup_mrs(),
> > but didn't restore those MRs after a .destroy_qp() failure.
>
> I think it is a bug that we call rdma_rw code in a user path.

It was an example of a flow that wasn't restored properly.
The same goes for ib_dealloc_pd_user() and the release of __internal_mr.

Of course, these flows shouldn't fail, because they are kernel flows, but
that is not clear from the code.

Thanks

>
> Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
  2021-04-25 13:44               ` Leon Romanovsky
@ 2021-04-25 17:22                 ` Jason Gunthorpe
  2021-04-25 17:38                   ` Leon Romanovsky
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2021-04-25 17:22 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: Doug Ledford, Shay Drory, linux-rdma

On Sun, Apr 25, 2021 at 04:44:55PM +0300, Leon Romanovsky wrote:
> > > The proposed prepare/abort/finish flow is much harder to implement correctly.
> > > Let's take ib_destroy_qp_user() as an example: we called rdma_rw_cleanup_mrs(),
> > > but didn't restore those MRs after a .destroy_qp() failure.
> >
> > I think it is a bug that we call rdma_rw code in a user path.
>
> It was an example of a flow that wasn't restored properly.
> The same goes for ib_dealloc_pd_user() and the release of __internal_mr.
>
> Of course, these flows shouldn't fail, because they are kernel flows, but
> that is not clear from the code.

Well, exactly - user flows are not allowed to do extra stuff before
calling the driver destroy.

So the arrangement I gave is reasonable and makes sense; it is
certainly better than the hodge-podge of ordering that we have today.

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
  2021-04-25 17:22                 ` Jason Gunthorpe
@ 2021-04-25 17:38                   ` Leon Romanovsky
  2021-04-26 12:03                     ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2021-04-25 17:38 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, Shay Drory, linux-rdma

On Sun, Apr 25, 2021 at 02:22:54PM -0300, Jason Gunthorpe wrote:
> On Sun, Apr 25, 2021 at 04:44:55PM +0300, Leon Romanovsky wrote:
> > > > The proposed prepare/abort/finish flow is much harder to implement correctly.
> > > > Let's take ib_destroy_qp_user() as an example: we called rdma_rw_cleanup_mrs(),
> > > > but didn't restore those MRs after a .destroy_qp() failure.
> > >
> > > I think it is a bug that we call rdma_rw code in a user path.
> >
> > It was an example of a flow that wasn't restored properly.
> > The same goes for ib_dealloc_pd_user() and the release of __internal_mr.
> >
> > Of course, these flows shouldn't fail, because they are kernel flows, but
> > that is not clear from the code.
>
> Well, exactly - user flows are not allowed to do extra stuff before
> calling the driver destroy.
>
> So the arrangement I gave is reasonable and makes sense; it is
> certainly better than the hodge-podge of ordering that we have today.

I thought about a simpler solution - move rdma_restrack_del() before the
.destroy() callbacks, together with an attempt to re-add the res object
if destroy fails.

Thanks

>
> Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
  2021-04-25 17:38                   ` Leon Romanovsky
@ 2021-04-26 12:03                     ` Jason Gunthorpe
  2021-04-26 13:08                       ` Leon Romanovsky
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2021-04-26 12:03 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: Doug Ledford, Shay Drory, linux-rdma

On Sun, Apr 25, 2021 at 08:38:57PM +0300, Leon Romanovsky wrote:
> On Sun, Apr 25, 2021 at 02:22:54PM -0300, Jason Gunthorpe wrote:
> > On Sun, Apr 25, 2021 at 04:44:55PM +0300, Leon Romanovsky wrote:
> > > > > The proposed prepare/abort/finish flow is much harder to implement correctly.
> > > > > Let's take ib_destroy_qp_user() as an example: we called rdma_rw_cleanup_mrs(),
> > > > > but didn't restore those MRs after a .destroy_qp() failure.
> > > >
> > > > I think it is a bug that we call rdma_rw code in a user path.
> > >
> > > It was an example of a flow that wasn't restored properly.
> > > The same goes for ib_dealloc_pd_user() and the release of __internal_mr.
> > >
> > > Of course, these flows shouldn't fail, because they are kernel flows, but
> > > that is not clear from the code.
> >
> > Well, exactly - user flows are not allowed to do extra stuff before
> > calling the driver destroy.
> >
> > So the arrangement I gave is reasonable and makes sense; it is
> > certainly better than the hodge-podge of ordering that we have today.
>
> I thought about a simpler solution - move rdma_restrack_del() before the
> .destroy() callbacks, together with an attempt to re-add the res object
> if destroy fails.

It isn't simpler - now the add can fail and can't be recovered.

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread
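The objection is concrete: once an entry has been erased, nothing guarantees a later re-add can succeed. A toy fixed-slot map (hypothetical — the kernel tracker is an xarray keyed by restrack ID) shows the failure mode: if the ID is handed out again while the object is untracked, the recovery insert has nowhere to go and the object is lost to tracking:

```c
#include <assert.h>
#include <stddef.h>

#define MAP_SLOTS 4

/* Toy fixed-ID map standing in for the restrack xarray. */
struct id_map { void *slot[MAP_SLOTS]; };

static int map_insert(struct id_map *m, int id, void *obj)
{
	if (id < 0 || id >= MAP_SLOTS || m->slot[id])
		return -1;	/* xa_insert() can fail in the same spirit */
	m->slot[id] = obj;
	return 0;
}

static void *map_erase(struct id_map *m, int id)
{
	void *old = m->slot[id];

	m->slot[id] = NULL;
	return old;
}

/* "erase, destroy, re-add on failure": the re-add is not guaranteed
 * to succeed, so a failed destroy can leave the object untracked. */
static int destroy_with_readd(struct id_map *m, int id, void *obj,
			      int destroy_ret)
{
	map_erase(m, id);
	if (destroy_ret)
		return map_insert(m, id, obj);	/* may itself fail */
	return 0;
}
```

A failing re-add has no caller that can do anything useful with the error, which is the asymmetry the thread is arguing about: abort-of-prepare must be infallible, and erase-then-insert is not.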
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
  2021-04-26 12:03                     ` Jason Gunthorpe
@ 2021-04-26 13:08                       ` Leon Romanovsky
  2021-04-26 13:11                         ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2021-04-26 13:08 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, Shay Drory, linux-rdma

On Mon, Apr 26, 2021 at 09:03:49AM -0300, Jason Gunthorpe wrote:
> On Sun, Apr 25, 2021 at 08:38:57PM +0300, Leon Romanovsky wrote:
> > On Sun, Apr 25, 2021 at 02:22:54PM -0300, Jason Gunthorpe wrote:
> > > On Sun, Apr 25, 2021 at 04:44:55PM +0300, Leon Romanovsky wrote:
> > > > > > The proposed prepare/abort/finish flow is much harder to implement correctly.
> > > > > > Let's take ib_destroy_qp_user() as an example: we called rdma_rw_cleanup_mrs(),
> > > > > > but didn't restore those MRs after a .destroy_qp() failure.
> > > > >
> > > > > I think it is a bug that we call rdma_rw code in a user path.
> > > >
> > > > It was an example of a flow that wasn't restored properly.
> > > > The same goes for ib_dealloc_pd_user() and the release of __internal_mr.
> > > >
> > > > Of course, these flows shouldn't fail, because they are kernel flows, but
> > > > that is not clear from the code.
> > >
> > > Well, exactly - user flows are not allowed to do extra stuff before
> > > calling the driver destroy.
> > >
> > > So the arrangement I gave is reasonable and makes sense; it is
> > > certainly better than the hodge-podge of ordering that we have today.
> >
> > I thought about a simpler solution - move rdma_restrack_del() before the
> > .destroy() callbacks, together with an attempt to re-add the res object
> > if destroy fails.
>
> It isn't simpler - now the add can fail and can't be recovered.

It is not different from a failure during the first call to rdma_restrack_add().
You didn't like the idea of being strict with the addition to restrack, but
want to be strict on the reinsert.

Thanks

>
> Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
  2021-04-26 13:08                       ` Leon Romanovsky
@ 2021-04-26 13:11                         ` Jason Gunthorpe
  2021-04-27  4:45                           ` Leon Romanovsky
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2021-04-26 13:11 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: Doug Ledford, Shay Drory, linux-rdma

On Mon, Apr 26, 2021 at 04:08:42PM +0300, Leon Romanovsky wrote:
> On Mon, Apr 26, 2021 at 09:03:49AM -0300, Jason Gunthorpe wrote:
> > On Sun, Apr 25, 2021 at 08:38:57PM +0300, Leon Romanovsky wrote:
> > > On Sun, Apr 25, 2021 at 02:22:54PM -0300, Jason Gunthorpe wrote:
> > > > On Sun, Apr 25, 2021 at 04:44:55PM +0300, Leon Romanovsky wrote:
> > > > > > > The proposed prepare/abort/finish flow is much harder to implement correctly.
> > > > > > > Let's take ib_destroy_qp_user() as an example: we called rdma_rw_cleanup_mrs(),
> > > > > > > but didn't restore those MRs after a .destroy_qp() failure.
> > > > > >
> > > > > > I think it is a bug that we call rdma_rw code in a user path.
> > > > >
> > > > > It was an example of a flow that wasn't restored properly.
> > > > > The same goes for ib_dealloc_pd_user() and the release of __internal_mr.
> > > > >
> > > > > Of course, these flows shouldn't fail, because they are kernel flows, but
> > > > > that is not clear from the code.
> > > >
> > > > Well, exactly - user flows are not allowed to do extra stuff before
> > > > calling the driver destroy.
> > > >
> > > > So the arrangement I gave is reasonable and makes sense; it is
> > > > certainly better than the hodge-podge of ordering that we have today.
> > >
> > > I thought about a simpler solution - move rdma_restrack_del() before the
> > > .destroy() callbacks, together with an attempt to re-add the res object
> > > if destroy fails.
> >
> > It isn't simpler - now the add can fail and can't be recovered.
>
> It is not different from a failure during the first call to rdma_restrack_add().
> You didn't like the idea of being strict with the addition to restrack, but
> want to be strict on the reinsert.

It is ugly that we couldn't fix the add side; let's not repeat that
ugliness in other places.

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
  2021-04-26 13:11                         ` Jason Gunthorpe
@ 2021-04-27  4:45                           ` Leon Romanovsky
  0 siblings, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2021-04-27  4:45 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, Shay Drory, linux-rdma

On Mon, Apr 26, 2021 at 10:11:07AM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 26, 2021 at 04:08:42PM +0300, Leon Romanovsky wrote:
> > On Mon, Apr 26, 2021 at 09:03:49AM -0300, Jason Gunthorpe wrote:
> > > On Sun, Apr 25, 2021 at 08:38:57PM +0300, Leon Romanovsky wrote:
> > > > On Sun, Apr 25, 2021 at 02:22:54PM -0300, Jason Gunthorpe wrote:
> > > > > On Sun, Apr 25, 2021 at 04:44:55PM +0300, Leon Romanovsky wrote:
> > > > > > > > The proposed prepare/abort/finish flow is much harder to implement correctly.
> > > > > > > > Let's take ib_destroy_qp_user() as an example: we called rdma_rw_cleanup_mrs(),
> > > > > > > > but didn't restore those MRs after a .destroy_qp() failure.
> > > > > > >
> > > > > > > I think it is a bug that we call rdma_rw code in a user path.
> > > > > >
> > > > > > It was an example of a flow that wasn't restored properly.
> > > > > > The same goes for ib_dealloc_pd_user() and the release of __internal_mr.
> > > > > >
> > > > > > Of course, these flows shouldn't fail, because they are kernel flows, but
> > > > > > that is not clear from the code.
> > > > >
> > > > > Well, exactly - user flows are not allowed to do extra stuff before
> > > > > calling the driver destroy.
> > > > >
> > > > > So the arrangement I gave is reasonable and makes sense; it is
> > > > > certainly better than the hodge-podge of ordering that we have today.
> > > >
> > > > I thought about a simpler solution - move rdma_restrack_del() before the
> > > > .destroy() callbacks, together with an attempt to re-add the res object
> > > > if destroy fails.
> > >
> > > It isn't simpler - now the add can fail and can't be recovered.
> >
> > It is not different from a failure during the first call to rdma_restrack_add().
> > You didn't like the idea of being strict with the addition to restrack, but
> > want to be strict on the reinsert.
>
> It is ugly that we couldn't fix the add side; let's not repeat that
> ugliness in other places.

Why can't we fix _add?

Thanks

>
> Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone
  2021-04-22 14:29         ` Jason Gunthorpe
  2021-04-25 13:03           ` Leon Romanovsky
@ 2021-05-02 11:28           ` Leon Romanovsky
  1 sibling, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2021-05-02 11:28 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, Shay Drory, linux-rdma

On Thu, Apr 22, 2021 at 11:29:39AM -0300, Jason Gunthorpe wrote:
> On Wed, Apr 21, 2021 at 08:03:22AM +0300, Leon Romanovsky wrote:
>
> > I didn't understand when reviewed either, but decided to post it anyway
> > to get a possible explanation for this RDMA_RESTRACK_MR || RDMA_RESTRACK_QP
> > check.
>
> I think the whole thing should look more like this and we delete the
> if entirely.

Finally, I had time to work on it and got an immediate reminder of why we
have that "if ...".

>
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index 2b9ffc21cbc4ad..479b16b8f6723a 100644

<...>

> @@ -1963,31 +1969,33 @@ int ib_destroy_qp_user(struct ib_qp *qp, struct ib_udata *udata)
>  		rdma_rw_cleanup_mrs(qp);
>
>  	rdma_counter_unbind_qp(qp, true);
> -	rdma_restrack_del(&qp->res);
> +	rdma_restrack_prepare_del(&qp->res);
>  	ret = qp->device->ops.destroy_qp(qp, udata);
> -	if (!ret) {
> -		if (alt_path_sgid_attr)
> -			rdma_put_gid_attr(alt_path_sgid_attr);
> -		if (av_sgid_attr)
> -			rdma_put_gid_attr(av_sgid_attr);
> -		if (pd)
> -			atomic_dec(&pd->usecnt);
> -		if (scq)
> -			atomic_dec(&scq->usecnt);
> -		if (rcq)
> -			atomic_dec(&rcq->usecnt);
> -		if (srq)
> -			atomic_dec(&srq->usecnt);
> -		if (ind_tbl)
> -			atomic_dec(&ind_tbl->usecnt);
> -		if (sec)
> -			ib_destroy_qp_security_end(sec);
> -	} else {
> +	if (ret) {
> +		rdma_restrack_abort_del(&qp->res);
>  		if (sec)
>  			ib_destroy_qp_security_abort(sec);
> +		return ret;
>  	}
>
> -	return ret;
> +	rdma_restrack_finish_del(&qp->res);

This causes a use-after-free, because "qp" is released in the driver:

[leonro@vm ~]$ mkt test
.s....[ 24.350657] ==================================================================
[ 24.350917] BUG: KASAN: use-after-free in rdma_restrack_finish_del+0x1a7/0x210 [ib_core]
[ 24.351303] Read of size 1 at addr ffff88800c59f118 by task python3/243
[ 24.351416]
[ 24.351495] CPU: 4 PID: 243 Comm: python3 Not tainted 5.12.0-rc2+ #2923
[ 24.351644] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 24.351832] Call Trace:
[ 24.351886]  dump_stack+0x93/0xc2
[ 24.351965]  print_address_description.constprop.0+0x18/0x140
[ 24.352074]  ? rdma_restrack_finish_del+0x1a7/0x210 [ib_core]
[ 24.352204]  ? rdma_restrack_finish_del+0x1a7/0x210 [ib_core]
[ 24.352333]  kasan_report.cold+0x7c/0xd8
[ 24.352413]  ? rdma_restrack_finish_del+0x1a7/0x210 [ib_core]
[ 24.352543]  rdma_restrack_finish_del+0x1a7/0x210 [ib_core]
[ 24.352654]  ib_destroy_qp_user+0x297/0x540 [ib_core]
[ 24.352769]  uverbs_free_qp+0x7d/0x150 [ib_uverbs]
[ 24.352864]  uverbs_destroy_uobject+0x164/0x6c0 [ib_uverbs]
[ 24.352949]  uobj_destroy+0x72/0xf0 [ib_uverbs]
[ 24.353041]  ib_uverbs_cmd_verbs+0x19b9/0x2ee0 [ib_uverbs]
[ 24.353147]  ? uverbs_free_qp+0x150/0x150 [ib_uverbs]
[ 24.353246]  ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
[ 24.353340]  ? uverbs_fill_udata+0x540/0x540 [ib_uverbs]
[ 24.353455]  ? filemap_map_pages+0x2d9/0xf30
[ 24.353552]  ? lock_acquire+0x1a9/0x6d0
[ 24.353620]  ? ib_uverbs_ioctl+0x11e/0x260 [ib_uverbs]
[ 24.353720]  ? __might_fault+0xba/0x160
[ 24.353790]  ? lock_release+0x6c0/0x6c0
[ 24.353866]  ib_uverbs_ioctl+0x169/0x260 [ib_uverbs]
[ 24.353958]  ? ib_uverbs_ioctl+0x11e/0x260 [ib_uverbs]
[ 24.354049]  ? ib_uverbs_cmd_verbs+0x2ee0/0x2ee0 [ib_uverbs]
[ 24.354158]  __x64_sys_ioctl+0x7e6/0x1050
[ 24.354217]  ? generic_block_fiemap+0x60/0x60
[ 24.354305]  ? down_write_nested+0x150/0x150
[ 24.354400]  ? do_user_addr_fault+0x219/0xdc0
[ 24.354497]  ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 24.354584]  ? syscall_enter_from_user_mode+0x1d/0x50
[ 24.354678]  do_syscall_64+0x2d/0x40
[ 24.354747]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 24.354837] RIP: 0033:0x7fbfd862b5db
[ 24.354911] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6d b8 0c 00 f7 d8 64 89 01 48
[ 24.355200] RSP: 002b:00007fff7e9fde58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 24.355336] RAX: ffffffffffffffda RBX: 00007fff7e9fdf88 RCX: 00007fbfd862b5db
[ 24.355468] RDX: 00007fff7e9fdf70 RSI: 00000000c0181b01 RDI: 0000000000000005
[ 24.355587] RBP: 00007fff7e9fdf50 R08: 000055d6a4f8b3f0 R09: 00007fbfd800d000
[ 24.355713] R10: 00007fbfd800d0f0 R11: 0000000000000246 R12: 00007fff7e9fdf50
[ 24.355836] R13: 00007fff7e9fdf2c R14: 000055d6a4e59d90 R15: 000055d6a4f8b530
[ 24.355973]

So let's finish the memory allocation conversion patches first.

Thanks

^ permalink raw reply	[flat|nested] 16+ messages in thread
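The splat above is the ordering problem in miniature: the restrack entry is embedded in the qp, and while drivers still allocate the qp themselves, ops.destroy_qp() frees that memory, so rdma_restrack_finish_del(&qp->res) afterwards reads freed bytes. A minimal model (hypothetical types — the real structs live in include/rdma/ib_verbs.h; the free is replaced by a flag so the model itself stays safe to run):

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the KASAN report: the restrack entry lives inside the qp,
 * so if the driver frees the qp in .destroy_qp(), any later touch of
 * qp->res is a use-after-free.  Modeled with a flag, not a real free(). */
struct ib_qp {
	bool freed;		/* true once the driver released the memory */
	bool res_finished;
};

/* Old world: the driver owns the allocation and frees it on destroy. */
static int driver_destroy_qp_driver_alloc(struct ib_qp *qp)
{
	qp->freed = true;
	return 0;
}

/* After the allocation-conversion work: the core owns the memory, so
 * the qp survives the driver callback and is freed by the core later. */
static int driver_destroy_qp_core_alloc(struct ib_qp *qp)
{
	(void)qp;
	return 0;
}

/* Returns true if the finish_del step would touch freed memory. */
static bool destroy_is_uaf(struct ib_qp *qp, int (*destroy)(struct ib_qp *))
{
	bool uaf;

	if (destroy(qp))
		return false;	/* abort path: entry restored, qp intact */
	uaf = qp->freed;	/* finish_del dereferences qp->res here */
	qp->res_finished = true;
	return uaf;
}
```

This is why the thread concludes that the prepare/abort/finish ordering only becomes viable once the qp allocation moves into the core, so the memory outlives the driver callback.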
end of thread, other threads:[~2021-05-02 11:28 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-18 13:37 [PATCH rdma-next] RDMA/restrack: Delay QP deletion till all users are gone Leon Romanovsky
2021-04-20 12:39 ` Jason Gunthorpe
2021-04-20 13:06   ` Leon Romanovsky
2021-04-20 15:25     ` Jason Gunthorpe
2021-04-21  5:03       ` Leon Romanovsky
2021-04-22 14:29         ` Jason Gunthorpe
2021-04-25 13:03           ` Leon Romanovsky
2021-04-25 13:08             ` Jason Gunthorpe
2021-04-25 13:44               ` Leon Romanovsky
2021-04-25 17:22                 ` Jason Gunthorpe
2021-04-25 17:38                   ` Leon Romanovsky
2021-04-26 12:03                     ` Jason Gunthorpe
2021-04-26 13:08                       ` Leon Romanovsky
2021-04-26 13:11                         ` Jason Gunthorpe
2021-04-27  4:45                           ` Leon Romanovsky
2021-05-02 11:28           ` Leon Romanovsky