All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V4] IB/uverbs: Fix race between uverbs_close and remove_one
@ 2016-03-12 15:18 Devesh Sharma
       [not found] ` <1457795927-16634-1-git-send-email-devesh.sharma-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Devesh Sharma @ 2016-03-12 15:18 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	Devesh Sharma

Fixes: 35d4a0b63dc0 ("IB/uverbs: Fix race between ib_uverbs_open and remove_one")

If "rmmod <vendor-driver>" is done while having rdma applications still running
on a host, the system crashes in the page-fault handler trying to fetch
physical address of an daggling device pointer.

During rmmod every vendor driver must call ib_unregister_device. As part of
this call, IB-stack tries to free-up all the resource associated with the
leaving driver. During the call to ib_uverbs_remove_one, a fatal-event is given
to all the alive rdma applications. The fatal-event causes applications to call
ib_uverbs_close(). Thus, causes two different cleanup context to run in parallel.
In the above scenario, it is possible that ib_uverbs_remove_one() completes and
unblock ib_unregister_device() while ib_uverbs_close() is still waiting for
some of the hardware specific firmware commands to finish. The unblocked
ib_unregister_device() context can actually proceed and free the ib_device
structure. At the same time, in ib_uverbs_close() context the firmware command
may complete and may try to dereference ib_device pointer. But ib_device
pointer is a daggling pointer. Dereference to this pointer causes kernel to
invoke the page_fault handler. It fails to fetch the physical address and
causes kernel panic.

This patch adds two solutions as a remedy:

A) In ib_uverbs_close() context a NULL pointer check on dev->ib_dev pointer is
   added. The check is under a srcu_read_lock. If dev->ib_dev is NULL, the
   check prevents ib_uverbs_close() to enter into ib_uverbs_cleanup_ucontext()
   if ib_uverbs_remove_one has already started. If dev->ib_dev is not NULL,
   ib_uverbs_close() will continue as it is today.

With solution 'A' in place, it is still possible that after reading dev->ib_dev
NULL ib_uverbs_close() context go ahaed and put reference to
ib_uverbs_release_file, even before ib_uverbs_remove_one() reaches to this file
pointer traversing the entire file list one by one. Thus, again to synchronize
these two independent contexts we add solution 'B'

B) If ib_uverbs_close() context reads dev->ib_dev as NULL then, drop the
   srcu_read_lock() and wait for ib_uverbs_remove_one() context to reach
   to the stage where all the resources attached to this file pointer are
   freed. Now, allow ib_uverbs_close() context to put the reference of
   ib_uverbs_release_file. This behaviour is achived with the help of a
   completion signaling.

CC: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Reviewed-by: Julia Lawall <julia.lawall-L2FTfq7BK8M@public.gmane.org>
Signed-off-by: Devesh Sharma <devesh.sharma-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs.h      |  1 +
 drivers/infiniband/core/uverbs_main.c | 16 +++++++++++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 612ccfd..94a7339 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -121,6 +121,7 @@ struct ib_uverbs_file {
 	struct ib_event_handler			event_handler;
 	struct ib_uverbs_event_file	       *async_file;
 	struct list_head			list;
+	struct completion			fcomp;
 	int					is_closed;
 };
 
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 39680ae..da1fed2 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -928,6 +928,7 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)
 	file->async_file = NULL;
 	kref_init(&file->ref);
 	mutex_init(&file->mutex);
+	init_completion(&file->fcomp);
 
 	filp->private_data = file;
 	kobject_get(&dev->kobj);
@@ -954,6 +955,17 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp)
 	struct ib_uverbs_file *file = filp->private_data;
 	struct ib_uverbs_device *dev = file->device;
 	struct ib_ucontext *ucontext = NULL;
+	struct ib_device *ib_dev;
+	int srcu_key;
+
+	srcu_key = srcu_read_lock(&dev->disassociate_srcu);
+	ib_dev = srcu_dereference(dev->ib_dev,
+				  &dev->disassociate_srcu);
+	if (!ib_dev) {
+		srcu_read_unlock(&dev->disassociate_srcu, srcu_key);
+		wait_for_completion(&file->fcomp);
+		goto out;
+	}
 
 	mutex_lock(&file->device->lists_mutex);
 	ucontext = file->ucontext;
@@ -965,10 +977,11 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp)
 	mutex_unlock(&file->device->lists_mutex);
 	if (ucontext)
 		ib_uverbs_cleanup_ucontext(file, ucontext);
+	srcu_read_unlock(&dev->disassociate_srcu, srcu_key);
 
 	if (file->async_file)
 		kref_put(&file->async_file->ref, ib_uverbs_release_event_file);
-
+out:
 	kref_put(&file->ref, ib_uverbs_release_file);
 	kobject_put(&dev->kobj);
 
@@ -1199,6 +1212,7 @@ static void ib_uverbs_free_hw_resources(struct ib_uverbs_device *uverbs_dev,
 		}
 
 		mutex_lock(&uverbs_dev->lists_mutex);
+		complete(&file->fcomp);
 		kref_put(&file->ref, ib_uverbs_release_file);
 	}
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH V4] IB/uverbs: Fix race between uverbs_close and remove_one
       [not found] ` <1457795927-16634-1-git-send-email-devesh.sharma-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
@ 2016-03-12 20:45   ` Jason Gunthorpe
       [not found]     ` <20160312204502.GA8346-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2016-03-12 20:45 UTC (permalink / raw)
  To: Devesh Sharma
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w

On Sat, Mar 12, 2016 at 10:18:47AM -0500, Devesh Sharma wrote:
> CC: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

I'm still pretty convinced this is wrong... But even still:

> @@ -954,6 +955,17 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp)
>  	struct ib_uverbs_file *file = filp->private_data;
>  	struct ib_uverbs_device *dev = file->device;
>  	struct ib_ucontext *ucontext = NULL;
> +	struct ib_device *ib_dev;
> +	int srcu_key;
> +
> +	srcu_key = srcu_read_lock(&dev->disassociate_srcu);
> +	ib_dev = srcu_dereference(dev->ib_dev,
> +				  &dev->disassociate_srcu);
> +	if (!ib_dev) {
> +		srcu_read_unlock(&dev->disassociate_srcu, srcu_key);
> +		wait_for_completion(&file->fcomp);
> +		goto out;

This jumps over this:

>  	if (file->async_file)
>  		kref_put(&file->async_file->ref, ib_uverbs_release_event_file);

Which doesn't seem right.

As I've said, I'm not sure how this is any different from using
lists_mutex. The wait_for_completion will still block and deadlock if
ib_uverbs_close is entered during the disassociate flow.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V4] IB/uverbs: Fix race between uverbs_close and remove_one
       [not found]     ` <20160312204502.GA8346-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-03-14  5:40       ` Devesh Sharma
       [not found]         ` <CANjDDBiN8X-Efgp6s2wyT1G6fpQZjdreW0pnnBG71E9jjhy-YA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Devesh Sharma @ 2016-03-14  5:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yishai Hadas

On Sun, Mar 13, 2016 at 2:15 AM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>
> On Sat, Mar 12, 2016 at 10:18:47AM -0500, Devesh Sharma wrote:
> > CC: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>
> I'm still pretty convinced this is wrong... But even still:
>
> > @@ -954,6 +955,17 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp)
> >       struct ib_uverbs_file *file = filp->private_data;
> >       struct ib_uverbs_device *dev = file->device;
> >       struct ib_ucontext *ucontext = NULL;
> > +     struct ib_device *ib_dev;
> > +     int srcu_key;
> > +
> > +     srcu_key = srcu_read_lock(&dev->disassociate_srcu);
> > +     ib_dev = srcu_dereference(dev->ib_dev,
> > +                               &dev->disassociate_srcu);
> > +     if (!ib_dev) {
> > +             srcu_read_unlock(&dev->disassociate_srcu, srcu_key);
> > +             wait_for_completion(&file->fcomp);
> > +             goto out;
>
> This jumps over this:

I am not sure, to my mind kref_put(...,ib_uverbs_release_file) should
also be skipped. Because,
by putting an wait_for_completion(), this context is effectively
waiting for ib_uverbs_cleanup_ucontext
to finish cleaning up this file pointer. if the other thread is
cleaning up, why do I need to put the kerf again?

>
> >       if (file->async_file)
> >               kref_put(&file->async_file->ref, ib_uverbs_release_event_file);
>
> Which doesn't seem right.
>
> As I've said, I'm not sure how this is any different from using
> lists_mutex. The wait_for_completion will still block and deadlock if
> ib_uverbs_close is entered during the disassociate flow.

Taking list-mutex is not stopping ib_dev pointer to become NULL, while
if we split the mutex
and wait_for_completion(), then effectively we are trying to sync
ib_uverb_close() and
remove_one() in such a way that no ib_uverb_close context hit ib_dev = NULL

I hope I was able to put the situation in simpler words?

Can you please explain how it can lead to a deadlock?


>
> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V4] IB/uverbs: Fix race between uverbs_close and remove_one
       [not found]         ` <CANjDDBiN8X-Efgp6s2wyT1G6fpQZjdreW0pnnBG71E9jjhy-YA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-03-14 17:48           ` Jason Gunthorpe
       [not found]             ` <20160314174814.GB5240-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2016-03-14 17:48 UTC (permalink / raw)
  To: Devesh Sharma
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yishai Hadas

On Mon, Mar 14, 2016 at 11:10:33AM +0530, Devesh Sharma wrote:
> On Sun, Mar 13, 2016 at 2:15 AM, Jason Gunthorpe
> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> >
> > On Sat, Mar 12, 2016 at 10:18:47AM -0500, Devesh Sharma wrote:
> > > CC: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >
> > I'm still pretty convinced this is wrong... But even still:
> >
> > > @@ -954,6 +955,17 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp)
> > >       struct ib_uverbs_file *file = filp->private_data;
> > >       struct ib_uverbs_device *dev = file->device;
> > >       struct ib_ucontext *ucontext = NULL;
> > > +     struct ib_device *ib_dev;
> > > +     int srcu_key;
> > > +
> > > +     srcu_key = srcu_read_lock(&dev->disassociate_srcu);
> > > +     ib_dev = srcu_dereference(dev->ib_dev,
> > > +                               &dev->disassociate_srcu);
> > > +     if (!ib_dev) {
> > > +             srcu_read_unlock(&dev->disassociate_srcu, srcu_key);
> > > +             wait_for_completion(&file->fcomp);
> > > +             goto out;
> >
> > This jumps over this:
> 
> I am not sure, to my mind kref_put(...,ib_uverbs_release_file) should
> also be skipped.

I am talking about ib_uverbs_release_event_file

> Because, by putting an wait_for_completion(), this context is
> effectively waiting for ib_uverbs_cleanup_ucontext to finish
> cleaning up this file pointer. if the other thread is cleaning up,
> why do I need to put the kerf again?

The other context doesn't do a balancing kref_put(..,ib_uverbs_release_event_file).

> > As I've said, I'm not sure how this is any different from using
> > lists_mutex. The wait_for_completion will still block and deadlock if
> > ib_uverbs_close is entered during the disassociate flow.
> 
> Taking list-mutex is not stopping ib_dev pointer to become NULL, while
> if we split the mutex

I don't think you understand the problem. ib_uverbs_device->ib_dev can
be NULL just fine. In fact, it is always NULL when
ib_uverbs_free_hw_resources calls ib_uverbs_cleanup_ucontext - that is
obviously fine (or if it isn't fine today, there is yet another bug).

The issue appears to be that ib_uverbs_free_hw_resources does not wait
for ib_uverbs_cleanup_ucontext to complete before it goes ahead and
wrecks the ib_dev, resulting in use-after-free/etc on copies of ib_dev
pointer that are used by the still running ib_uverbs_free_hw_resources.

> and wait_for_completion(), then effectively we are trying to sync
> ib_uverb_close() and
> remove_one() in such a way that no ib_uverb_close context hit ib_dev = NULL

No, that is the wrong problem statement and wrong solution.

The solution is to strong fence ib_uverbs_cleanup_ucontext so that
ib_uverbs_free_hw_resources does not exit until it is completed,
either by its thread or in the close thread.

I prefer a mutex, but perhaps there are other ways to build the
fence (eg uverbs_dev->refcount springs to mind)

> Can you please explain how it can lead to a deadlock?

Yishai's note outlines it:

		/* We must release the mutex before going ahead and calling
		 * disassociate_ucontext. disassociate_ucontext might end up
		 * indirectly calling uverbs_close, for example due to freeing
		 * the resources (e.g mmput).

Meaning when we call ib_uverbs_cleanup_ucontext from within
ib_uverbs_free_hw_resources it may recurse down into ib_uverbs_close.

If this happens, with your patch ib_uverbs_close will wait on the
completion in the same thread that is supposed to trigger it. That is
the same deadlock as would happen if the lists_mutex would be used.

The last patch I sent is much closer to what is needed, it is
just completely wrong to try and use the srcu for this.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V4] IB/uverbs: Fix race between uverbs_close and remove_one
       [not found]             ` <20160314174814.GB5240-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-03-15  9:00               ` Devesh Sharma
       [not found]                 ` <CANjDDBhVH10=0nSr0q4P4imAX6YrFBhE7QEd4ccRe2yUhN0pQQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Devesh Sharma @ 2016-03-15  9:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yishai Hadas

On Mon, Mar 14, 2016 at 11:18 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Mon, Mar 14, 2016 at 11:10:33AM +0530, Devesh Sharma wrote:
>> On Sun, Mar 13, 2016 at 2:15 AM, Jason Gunthorpe
>> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>> >
>> > On Sat, Mar 12, 2016 at 10:18:47AM -0500, Devesh Sharma wrote:
>> > > CC: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> >
>> > I'm still pretty convinced this is wrong... But even still:
>> >
>> > > @@ -954,6 +955,17 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp)
>> > >       struct ib_uverbs_file *file = filp->private_data;
>> > >       struct ib_uverbs_device *dev = file->device;
>> > >       struct ib_ucontext *ucontext = NULL;
>> > > +     struct ib_device *ib_dev;
>> > > +     int srcu_key;
>> > > +
>> > > +     srcu_key = srcu_read_lock(&dev->disassociate_srcu);
>> > > +     ib_dev = srcu_dereference(dev->ib_dev,
>> > > +                               &dev->disassociate_srcu);
>> > > +     if (!ib_dev) {
>> > > +             srcu_read_unlock(&dev->disassociate_srcu, srcu_key);
>> > > +             wait_for_completion(&file->fcomp);
>> > > +             goto out;
>> >
>> > This jumps over this:
>>
>> I am not sure, to my mind kref_put(...,ib_uverbs_release_file) should
>> also be skipped.
>
> I am talking about ib_uverbs_release_event_file

And I am adding ib_uverbs_release_file() as well. the other thread is already
cleaning it up and if ib_uverbs_close() reads it NULL then why to put krefs?

>
>> Because, by putting an wait_for_completion(), this context is
>> effectively waiting for ib_uverbs_cleanup_ucontext to finish
>> cleaning up this file pointer. if the other thread is cleaning up,
>> why do I need to put the kerf again?
>
> The other context doesn't do a balancing kref_put(..,ib_uverbs_release_event_file).

Actually it does, in the very next while loop on event_file list.

>
>> > As I've said, I'm not sure how this is any different from using
>> > lists_mutex. The wait_for_completion will still block and deadlock if
>> > ib_uverbs_close is entered during the disassociate flow.
>>
>> Taking list-mutex is not stopping ib_dev pointer to become NULL, while
>> if we split the mutex
>
> I don't think you understand the problem. ib_uverbs_device->ib_dev can
> be NULL just fine. In fact, it is always NULL when
> ib_uverbs_free_hw_resources calls ib_uverbs_cleanup_ucontext - that is
> obviously fine (or if it isn't fine today, there is yet another bug).

Okay, I meant it is being freed. before entring
ib_uverb_free_hw_resource() it is obviously set to NULL
which is well understood.

>
> The issue appears to be that ib_uverbs_free_hw_resources does not wait
> for ib_uverbs_cleanup_ucontext to complete before it goes ahead and
> wrecks the ib_dev, resulting in use-after-free/etc on copies of ib_dev
> pointer that are used by the still running ib_uverbs_free_hw_resources.

Exactly, that is the real problem. I have explained it in the description of V4.

>
>> and wait_for_completion(), then effectively we are trying to sync
>> ib_uverb_close() and
>> remove_one() in such a way that no ib_uverb_close context hit ib_dev = NULL
>
> No, that is the wrong problem statement and wrong solution.
>
> The solution is to strong fence ib_uverbs_cleanup_ucontext so that
> ib_uverbs_free_hw_resources does not exit until it is completed,
> either by its thread or in the close thread.
>
> I prefer a mutex, but perhaps there are other ways to build the
> fence (eg uverbs_dev->refcount springs to mind)

uverbs_dev->refcount is already there, we can choose to wait for
ref_count to become zero

static void ib_uverbs_remove_one(struct ib_device *device, void *client_data)
.
.
.
.
.
                rcu_assign_pointer(uverbs_dev->ib_dev, NULL);
                ib_uverbs_free_hw_resources(uverbs_dev, device);
                wait_clients = 0;
        }

wait for zero here instead of dec_and_test, I think things will
automatically fall in place after that

        if (atomic_dec_and_test(&uverbs_dev->refcount))
                ib_uverbs_comp_dev(uverbs_dev);
        if (wait_clients)
                wait_for_completion(&uverbs_dev->comp);
        kobject_put(&uverbs_dev->kobj);
}

>
>> Can you please explain how it can lead to a deadlock?
>
> Yishai's note outlines it:
>
>                 /* We must release the mutex before going ahead and calling
>                  * disassociate_ucontext. disassociate_ucontext might end up
>                  * indirectly calling uverbs_close, for example due to freeing
>                  * the resources (e.g mmput).
>
> Meaning when we call ib_uverbs_cleanup_ucontext from within
> ib_uverbs_free_hw_resources it may recurse down into ib_uverbs_close.
>
> If this happens, with your patch ib_uverbs_close will wait on the
> completion in the same thread that is supposed to trigger it. That is
> the same deadlock as would happen if the lists_mutex would be used.
>
> The last patch I sent is much closer to what is needed, it is
> just completely wrong to try and use the srcu for this.
>
> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V4] IB/uverbs: Fix race between uverbs_close and remove_one
       [not found]                 ` <CANjDDBhVH10=0nSr0q4P4imAX6YrFBhE7QEd4ccRe2yUhN0pQQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-03-15  9:03                   ` Devesh Sharma
  2016-03-15 20:31                   ` Jason Gunthorpe
  1 sibling, 0 replies; 15+ messages in thread
From: Devesh Sharma @ 2016-03-15  9:03 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yishai Hadas

Sorry, I pressed send too early, just one more point I wanted to make,
listed below

On Tue, Mar 15, 2016 at 2:30 PM, Devesh Sharma
<devesh.sharma-dY08KVG/lbpWk0Htik3J/w@public.gmane.org> wrote:
> On Mon, Mar 14, 2016 at 11:18 PM, Jason Gunthorpe
> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>> On Mon, Mar 14, 2016 at 11:10:33AM +0530, Devesh Sharma wrote:
>>> On Sun, Mar 13, 2016 at 2:15 AM, Jason Gunthorpe
>>> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>>> >
>>> > On Sat, Mar 12, 2016 at 10:18:47AM -0500, Devesh Sharma wrote:
>>> > > CC: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>> >
>>> > I'm still pretty convinced this is wrong... But even still:
>>> >
>>> > > @@ -954,6 +955,17 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp)
>>> > >       struct ib_uverbs_file *file = filp->private_data;
>>> > >       struct ib_uverbs_device *dev = file->device;
>>> > >       struct ib_ucontext *ucontext = NULL;
>>> > > +     struct ib_device *ib_dev;
>>> > > +     int srcu_key;
>>> > > +
>>> > > +     srcu_key = srcu_read_lock(&dev->disassociate_srcu);
>>> > > +     ib_dev = srcu_dereference(dev->ib_dev,
>>> > > +                               &dev->disassociate_srcu);
>>> > > +     if (!ib_dev) {
>>> > > +             srcu_read_unlock(&dev->disassociate_srcu, srcu_key);
>>> > > +             wait_for_completion(&file->fcomp);
>>> > > +             goto out;
>>> >
>>> > This jumps over this:
>>>
>>> I am not sure, to my mind kref_put(...,ib_uverbs_release_file) should
>>> also be skipped.
>>
>> I am talking about ib_uverbs_release_event_file
>
> And I am adding ib_uverbs_release_file() as well. the other thread is already
> cleaning it up and if ib_uverbs_close() reads it NULL then why to put krefs?
>
>>
>>> Because, by putting an wait_for_completion(), this context is
>>> effectively waiting for ib_uverbs_cleanup_ucontext to finish
>>> cleaning up this file pointer. if the other thread is cleaning up,
>>> why do I need to put the kerf again?
>>
>> The other context doesn't do a balancing kref_put(..,ib_uverbs_release_event_file).
>
> Actually it does, in the very next while loop on event_file list.
>
>>
>>> > As I've said, I'm not sure how this is any different from using
>>> > lists_mutex. The wait_for_completion will still block and deadlock if
>>> > ib_uverbs_close is entered during the disassociate flow.
>>>
>>> Taking list-mutex is not stopping ib_dev pointer to become NULL, while
>>> if we split the mutex
>>
>> I don't think you understand the problem. ib_uverbs_device->ib_dev can
>> be NULL just fine. In fact, it is always NULL when
>> ib_uverbs_free_hw_resources calls ib_uverbs_cleanup_ucontext - that is
>> obviously fine (or if it isn't fine today, there is yet another bug).
>
> Okay, I meant it is being freed. before entring
> ib_uverb_free_hw_resource() it is obviously set to NULL
> which is well understood.
>
>>
>> The issue appears to be that ib_uverbs_free_hw_resources does not wait
>> for ib_uverbs_cleanup_ucontext to complete before it goes ahead and
>> wrecks the ib_dev, resulting in use-after-free/etc on copies of ib_dev
>> pointer that are used by the still running ib_uverbs_free_hw_resources.
>
> Exactly, that is the real problem. I have explained it in the description of V4.
>
>>
>>> and wait_for_completion(), then effectively we are trying to sync
>>> ib_uverb_close() and
>>> remove_one() in such a way that no ib_uverb_close context hit ib_dev = NULL
>>
>> No, that is the wrong problem statement and wrong solution.
>>
>> The solution is to strong fence ib_uverbs_cleanup_ucontext so that
>> ib_uverbs_free_hw_resources does not exit until it is completed,
>> either by its thread or in the close thread.
>>
>> I prefer a mutex, but perhaps there are other ways to build the
>> fence (eg uverbs_dev->refcount springs to mind)
>
> uverbs_dev->refcount is already there, we can choose to wait for
> ref_count to become zero
>
> static void ib_uverbs_remove_one(struct ib_device *device, void *client_data)
> .
> .
> .
> .
> .
>                 rcu_assign_pointer(uverbs_dev->ib_dev, NULL);
>                 ib_uverbs_free_hw_resources(uverbs_dev, device);
>                 wait_clients = 0;
>         }
>
> wait for zero here instead of dec_and_test, I think things will
> automatically fall in place after that

Off-course ib_uverb_close() will have to take the ref-count on entry and
drop on exit.

>
>         if (atomic_dec_and_test(&uverbs_dev->refcount))
>                 ib_uverbs_comp_dev(uverbs_dev);
>         if (wait_clients)
>                 wait_for_completion(&uverbs_dev->comp);
>         kobject_put(&uverbs_dev->kobj);
> }
>
>>
>>> Can you please explain how it can lead to a deadlock?
>>
>> Yishai's note outlines it:
>>
>>                 /* We must release the mutex before going ahead and calling
>>                  * disassociate_ucontext. disassociate_ucontext might end up
>>                  * indirectly calling uverbs_close, for example due to freeing
>>                  * the resources (e.g mmput).
>>
>> Meaning when we call ib_uverbs_cleanup_ucontext from within
>> ib_uverbs_free_hw_resources it may recurse down into ib_uverbs_close.
>>
>> If this happens, with your patch ib_uverbs_close will wait on the
>> completion in the same thread that is supposed to trigger it. That is
>> the same deadlock as would happen if the lists_mutex would be used.
>>
>> The last patch I sent is much closer to what is needed, it is
>> just completely wrong to try and use the srcu for this.
>>
>> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V4] IB/uverbs: Fix race between uverbs_close and remove_one
       [not found]                 ` <CANjDDBhVH10=0nSr0q4P4imAX6YrFBhE7QEd4ccRe2yUhN0pQQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2016-03-15  9:03                   ` Devesh Sharma
@ 2016-03-15 20:31                   ` Jason Gunthorpe
       [not found]                     ` <20160315203112.GA2786-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2016-03-15 20:31 UTC (permalink / raw)
  To: Devesh Sharma
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yishai Hadas

On Tue, Mar 15, 2016 at 02:30:43PM +0530, Devesh Sharma wrote:
> >> I am not sure, to my mind kref_put(...,ib_uverbs_release_file) should
> >> also be skipped.
> >
> > I am talking about ib_uverbs_release_event_file
> 
> And I am adding ib_uverbs_release_file() as well. the other thread is already
> cleaning it up and if ib_uverbs_close() reads it NULL then why to put krefs?

> > The other context doesn't do a balancing kref_put(..,ib_uverbs_release_event_file).
> 
> Actually it does, in the very next while loop on event_file list.

I really don't understand your comments.

Maybe you can find some code snippits to explain this.

The krefs in ib_uverbs_event_close are unconditional. If that isn't
right, then there is another bug that needs to be explained. It sure
looks right to me as is.

The patch makes them conditional without any matching change
elsewhere, so it just simply must be wrong.

> > I prefer a mutex, but perhaps there are other ways to build the
> > fence (eg uverbs_dev->refcount springs to mind)
> 
> uverbs_dev->refcount is already there, we can choose to wait for
> ref_count to become zero

> wait for zero here instead of dec_and_test, I think things will
> automatically fall in place after that

More than that would be needed, the refcount is currently always being
held for the full life time of the ib_uverbs_device and the wait is
conditional. You'd have to restructure things so the wait becomes
unconditional and the refcount is conditionally held across the proper
times - eg only during close for the disassociate_ucontext mode.

Note sure this is prettier than a new mutex...

The fundamental thing is that the ib_uverbs_free_hw_resources thread
must wait for ib_uverbs_close's ib_uverbs_cleanup_ucontext to finish,
if necessary, not the other way around.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V4] IB/uverbs: Fix race between uverbs_close and remove_one
       [not found]                     ` <20160315203112.GA2786-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-03-17 16:08                       ` Devesh Sharma
       [not found]                         ` <CANjDDBj8aZTeAfwxFuyk9r=kdihfxpxDA69-c4uVxkvcAfXViw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Devesh Sharma @ 2016-03-17 16:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yishai Hadas

On Wed, Mar 16, 2016 at 2:01 AM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>
> The patch makes them conditional without any matching change
> elsewhere, so it just simply must be wrong.
>
> > > I prefer a mutex, but perhaps there are other ways to build the
> > > fence (eg uverbs_dev->refcount springs to mind)
> >
> > uverbs_dev->refcount is already there, we can choose to wait for
> > ref_count to become zero
>
> > wait for zero here instead of dec_and_test, I think things will
> > automatically fall in place after that
>
> More than that would be needed, the refcount is currently always being
> held for the full life time of the ib_uverbs_device and the wait is
> conditional. You'd have to restructure things so the wait becomes
> unconditional and the refcount is conditionally held across the proper
> times - eg only during close for the disassociate_ucontext mode.

Yes, I also have pretty much the same understanding.

>
> Note sure this is prettier than a new mutex...

To my mind mutex is *not* solving the problem completely unless we
make it a coarser grained lock. The possible deadlock problem still
lingers around it.

>
> The fundamental thing is that the ib_uverbs_free_hw_resources thread
> must wait for ib_uverbs_close's ib_uverbs_cleanup_ucontext to finish,
> if necessary, not the other way around.

Makes sense.

>
> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V4] IB/uverbs: Fix race between uverbs_close and remove_one
       [not found]                         ` <CANjDDBj8aZTeAfwxFuyk9r=kdihfxpxDA69-c4uVxkvcAfXViw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-03-17 16:12                           ` Jason Gunthorpe
       [not found]                             ` <20160317161237.GB19501-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2016-03-17 16:12 UTC (permalink / raw)
  To: Devesh Sharma
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yishai Hadas

On Thu, Mar 17, 2016 at 09:38:30PM +0530, Devesh Sharma wrote:

> To my mind mutex is *not* solving the problem completely unless we
> make it a coarser grained lock. The possible deadlock problem still
> lingers around it.

Review the last version I sent, with this statement in mind:

> > The fundamental thing is that the ib_uverbs_free_hw_resources thread
> > must wait for ib_uverbs_close's ib_uverbs_cleanup_ucontext to finish,
> > if necessary, not the other way around.

It sure doesn't look like it can deadlock...

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V4] IB/uverbs: Fix race between uverbs_close and remove_one
       [not found]                             ` <20160317161237.GB19501-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-03-17 16:31                               ` Devesh Sharma
       [not found]                                 ` <CANjDDBhYH7HsUyP8-Ko7G0tnWxYDGYCGgaC4HK0Eod_isvoWAA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Devesh Sharma @ 2016-03-17 16:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yishai Hadas

On Thu, Mar 17, 2016 at 9:42 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Thu, Mar 17, 2016 at 09:38:30PM +0530, Devesh Sharma wrote:
>
>> To my mind mutex is *not* solving the problem completely unless we
>> make it a coarser grained lock. The possible deadlock problem still
>> lingers around it.
>
> Review the last version I sent, with this statement in mind:

I am sorry I lost the track of it, which one you are point..we have
been discussion for a quite some time now!

>
>> > The fundamental thing is that the ib_uverbs_free_hw_resources thread
>> > must wait for ib_uverbs_close's ib_uverbs_cleanup_ucontext to finish,
>> > if necessary, not the other way around.
>
> It sure doesn't look like it can deadlock...
>
> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V4] IB/uverbs: Fix race between uverbs_close and remove_one
       [not found]                                 ` <CANjDDBhYH7HsUyP8-Ko7G0tnWxYDGYCGgaC4HK0Eod_isvoWAA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-03-17 16:48                                   ` Jason Gunthorpe
       [not found]                                     ` <20160317164831.GI19501-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2016-03-17 16:48 UTC (permalink / raw)
  To: Devesh Sharma
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yishai Hadas

On Thu, Mar 17, 2016 at 10:01:55PM +0530, Devesh Sharma wrote:
> On Thu, Mar 17, 2016 at 9:42 PM, Jason Gunthorpe
> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> > On Thu, Mar 17, 2016 at 09:38:30PM +0530, Devesh Sharma wrote:
> >
> >> To my mind mutex is *not* solving the problem completely unless we
> >> make it a coarser grained lock. The possible deadlock problem still
> >> lingers around it.
> >
> > Review the last version I sent, with this statement in mind:
> 
> I am sorry I lost the track of it, which one you are point..we have
> been discussion for a quite some time now!

Not saying it is perfect, but it should be close:

--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -953,18 +953,20 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp)
 {
 	struct ib_uverbs_file *file = filp->private_data;
 	struct ib_uverbs_device *dev = file->device;
-	struct ib_ucontext *ucontext = NULL;
+
+	mutex_lock(&file->cleanup_mutex);
+	if (file->ucontext) {
+		ib_uverbs_cleanup_ucontext(file, file->ucontext);
+		file->ucontext = NULL;
+	}
+	mutex_unlock(&file->cleanup_mutex);
 
 	mutex_lock(&file->device->lists_mutex);
-	ucontext = file->ucontext;
-	file->ucontext = NULL;
 	if (!file->is_closed) {
 		list_del(&file->list);
 		file->is_closed = 1;
 	}
 	mutex_unlock(&file->device->lists_mutex);
-	if (ucontext)
-		ib_uverbs_cleanup_ucontext(file, ucontext);
 
 	if (file->async_file)
 		kref_put(&file->async_file->ref, ib_uverbs_release_event_file);
@@ -1177,26 +1179,28 @@ static void ib_uverbs_free_hw_resources(struct ib_uverbs_device *uverbs_dev,
 
 	mutex_lock(&uverbs_dev->lists_mutex);
 	while (!list_empty(&uverbs_dev->uverbs_file_list)) {
-		struct ib_ucontext *ucontext;
-
 		file = list_first_entry(&uverbs_dev->uverbs_file_list,
 					struct ib_uverbs_file, list);
 		file->is_closed = 1;
-		ucontext = file->ucontext;
 		list_del(&file->list);
-		file->ucontext = NULL;
 		kref_get(&file->ref);
 		mutex_unlock(&uverbs_dev->lists_mutex);
+
 		/* We must release the mutex before going ahead and calling
 		 * disassociate_ucontext. disassociate_ucontext might end up
 		 * indirectly calling uverbs_close, for example due to freeing
 		 * the resources (e.g mmput).
 		 */
 		ib_uverbs_event_handler(&file->event_handler, &event);
-		if (ucontext) {
-			ib_dev->disassociate_ucontext(ucontext);
-			ib_uverbs_cleanup_ucontext(file, ucontext);
+		mutex_lock(&file->cleanup_mutex);
+		if (file->ucontext) {
+			file->ucontext = NULL;
+			mutex_unlock(&file->cleanup_mutex);
+			ib_dev->disassociate_ucontext(file->ucontext);
+			ib_uverbs_cleanup_ucontext(file, file->ucontext);
 		}
+		else
+			mutex_unlock(&file->cleanup_mutex);
 
 		mutex_lock(&uverbs_dev->lists_mutex);
 		kref_put(&file->ref, ib_uverbs_release_file);
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V4] IB/uverbs: Fix race between uverbs_close and remove_one
       [not found]                                     ` <20160317164831.GI19501-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-03-21  8:59                                       ` Haggai Eran
  2016-04-26 14:33                                       ` Doug Ledford
  1 sibling, 0 replies; 15+ messages in thread
From: Haggai Eran @ 2016-03-21  8:59 UTC (permalink / raw)
  To: Jason Gunthorpe, Devesh Sharma
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yishai Hadas

On 3/17/2016 6:48 PM, Jason Gunthorpe wrote:
> +		if (file->ucontext) {
> +			file->ucontext = NULL;
> +			mutex_unlock(&file->cleanup_mutex);
> +			ib_dev->disassociate_ucontext(file->ucontext);
> +			ib_uverbs_cleanup_ucontext(file, file->ucontext);
>  		}

The general locking scheme sounds good to me, but obviously you need to keep
the original value of file->ucontext before clearing it.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V4] IB/uverbs: Fix race between uverbs_close and remove_one
       [not found]                                     ` <20160317164831.GI19501-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2016-03-21  8:59                                       ` Haggai Eran
@ 2016-04-26 14:33                                       ` Doug Ledford
       [not found]                                         ` <571F7C41.70700-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 15+ messages in thread
From: Doug Ledford @ 2016-04-26 14:33 UTC (permalink / raw)
  To: Jason Gunthorpe, Devesh Sharma
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yishai Hadas


[-- Attachment #1.1: Type: text/plain, Size: 3294 bytes --]

On 3/17/2016 12:48 PM, Jason Gunthorpe wrote:
> On Thu, Mar 17, 2016 at 10:01:55PM +0530, Devesh Sharma wrote:
>> On Thu, Mar 17, 2016 at 9:42 PM, Jason Gunthorpe
>> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>>> On Thu, Mar 17, 2016 at 09:38:30PM +0530, Devesh Sharma wrote:
>>>
>>>> To my mind mutex is *not* solving the problem completely unless we
>>>> make it a coarser grained lock. The possible deadlock problem still
>>>> lingers around it.
>>>
>>> Review the last version I sent, with this statement in mind:
>>
>> I am sorry I lost the track of it, which one you are point..we have
>> been discussion for a quite some time now!
> 
> Not saying it is perfect, but it should be close:

Jason, this is your patch since you completely tossed Devesh's work and
did your own.  Can you please post a corrected version of this with a
proper changelog and Signed-off-by.  Thanks.

> --- a/drivers/infiniband/core/uverbs_main.c
> +++ b/drivers/infiniband/core/uverbs_main.c
> @@ -953,18 +953,20 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp)
>  {
>  	struct ib_uverbs_file *file = filp->private_data;
>  	struct ib_uverbs_device *dev = file->device;
> -	struct ib_ucontext *ucontext = NULL;
> +
> +	mutex_lock(&file->cleanup_mutex);
> +	if (file->ucontext) {
> +		ib_uverbs_cleanup_ucontext(file, file->ucontext);
> +		file->ucontext = NULL;
> +	}
> +	mutex_unlock(&file->cleanup_mutex);
>  
>  	mutex_lock(&file->device->lists_mutex);
> -	ucontext = file->ucontext;
> -	file->ucontext = NULL;
>  	if (!file->is_closed) {
>  		list_del(&file->list);
>  		file->is_closed = 1;
>  	}
>  	mutex_unlock(&file->device->lists_mutex);
> -	if (ucontext)
> -		ib_uverbs_cleanup_ucontext(file, ucontext);
>  
>  	if (file->async_file)
>  		kref_put(&file->async_file->ref, ib_uverbs_release_event_file);
> @@ -1177,26 +1179,28 @@ static void ib_uverbs_free_hw_resources(struct ib_uverbs_device *uverbs_dev,
>  
>  	mutex_lock(&uverbs_dev->lists_mutex);
>  	while (!list_empty(&uverbs_dev->uverbs_file_list)) {
> -		struct ib_ucontext *ucontext;
> -
>  		file = list_first_entry(&uverbs_dev->uverbs_file_list,
>  					struct ib_uverbs_file, list);
>  		file->is_closed = 1;
> -		ucontext = file->ucontext;
>  		list_del(&file->list);
> -		file->ucontext = NULL;
>  		kref_get(&file->ref);
>  		mutex_unlock(&uverbs_dev->lists_mutex);
> +
>  		/* We must release the mutex before going ahead and calling
>  		 * disassociate_ucontext. disassociate_ucontext might end up
>  		 * indirectly calling uverbs_close, for example due to freeing
>  		 * the resources (e.g mmput).
>  		 */
>  		ib_uverbs_event_handler(&file->event_handler, &event);
> -		if (ucontext) {
> -			ib_dev->disassociate_ucontext(ucontext);
> -			ib_uverbs_cleanup_ucontext(file, ucontext);
> +		mutex_lock(&file->cleanup_mutex);
> +		if (file->ucontext) {
> +			file->ucontext = NULL;
> +			mutex_unlock(&file->cleanup_mutex);
> +			ib_dev->disassociate_ucontext(file->ucontext);
> +			ib_uverbs_cleanup_ucontext(file, file->ucontext);
>  		}
> +		else
> +			mutex_unlock(&file->cleanup_mutex);
>  
>  		mutex_lock(&uverbs_dev->lists_mutex);
>  		kref_put(&file->ref, ib_uverbs_release_file);
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V4] IB/uverbs: Fix race between uverbs_close and remove_one
       [not found]                                         ` <571F7C41.70700-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-04-26 15:18                                           ` Jason Gunthorpe
       [not found]                                             ` <20160426151851.GC24104-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2016-04-26 15:18 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Devesh Sharma, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yishai Hadas

On Tue, Apr 26, 2016 at 10:33:37AM -0400, Doug Ledford wrote:
> On 3/17/2016 12:48 PM, Jason Gunthorpe wrote:
> > On Thu, Mar 17, 2016 at 10:01:55PM +0530, Devesh Sharma wrote:
> >> On Thu, Mar 17, 2016 at 9:42 PM, Jason Gunthorpe
> >> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> >>> On Thu, Mar 17, 2016 at 09:38:30PM +0530, Devesh Sharma wrote:
> >>>
> >>>> To my mind mutex is *not* solving the problem completely unless we
> >>>> make it a coarser grained lock. The possible deadlock problem still
> >>>> lingers around it.
> >>>
> >>> Review the last version I sent, with this statement in mind:
> >>
> >> I am sorry I lost the track of it, which one you are point..we have
> >> been discussion for a quite some time now!
> > 
> > Not saying it is perfect, but it should be close:
> 
> Jason, this is your patch since you completely tossed Devesh's work and
> did your own.  Can you please post a corrected version of this with a
> proper changelog and Signed-off-by.  Thanks.

This was just a sketch/hint how to do it properly, it doesn't even
compile, and Devesh needs to test it before going any further.

Since this seems stuck, here is a complete version that does compile
and might even work..

>From 5b19ced7a4076807284fe76a76b63a9093b590aa Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
Date: Tue, 26 Apr 2016 09:16:01 -0600
Subject: [PATCH] IB/uverbs: Fix race between uverbs_close and remove_one

Fixes an oops that can happen if uverbs_close races with remove_one:

[67140.260665]  [<ffffffff810c16a0>] ? prepare_to_wait_event+0xf0/0xf0
[67140.268337]  [<ffffffffa04cabc3>] ? ib_dereg_mr+0x23/0x30 [ib_core]
[67140.276009]  [<ffffffffa03ee5f0>] ? ib_uverbs_cleanup_ucontext+0x320/0x440 [ib_uverbs]
[67140.285550]  [<ffffffffa03ee9e9>] ? ib_uverbs_close+0x59/0xb0 [ib_uverbs]
[67140.293807]  [<ffffffff811ff744>] ? __fput+0xe4/0x210

This is because both contexts are running ib_uverbs_cleanup_ucontext
concurrently as the locking scheme was not adaquate.

Directly protect ib_uverbs_cleanup_ucontext against concurrency with a
new lock.

Fixes: 35d4a0b63dc0 ("IB/uverbs: Fix race between ib_uverbs_open and remove_one")
Reported-by: Devesh Sharma <devesh.sharma-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
Signed-off-by: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
---
 drivers/infiniband/core/uverbs.h      |  1 +
 drivers/infiniband/core/uverbs_main.c | 37 +++++++++++++++++++++++------------
 2 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 612ccfd39bf9..f0f6c8fa4b5f 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -116,6 +116,7 @@ struct ib_uverbs_event_file {
 struct ib_uverbs_file {
 	struct kref				ref;
 	struct mutex				mutex;
+	struct mutex                            cleanup_mutex;
 	struct ib_uverbs_device		       *device;
 	struct ib_ucontext		       *ucontext;
 	struct ib_event_handler			event_handler;
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 39680aed99dd..43f268f2deb5 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -928,6 +928,7 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)
 	file->async_file = NULL;
 	kref_init(&file->ref);
 	mutex_init(&file->mutex);
+	mutex_init(&file->cleanup_mutex);
 
 	filp->private_data = file;
 	kobject_get(&dev->kobj);
@@ -953,18 +954,20 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp)
 {
 	struct ib_uverbs_file *file = filp->private_data;
 	struct ib_uverbs_device *dev = file->device;
-	struct ib_ucontext *ucontext = NULL;
+
+	mutex_lock(&file->cleanup_mutex);
+	if (file->ucontext) {
+		ib_uverbs_cleanup_ucontext(file, file->ucontext);
+		file->ucontext = NULL;
+	}
+	mutex_unlock(&file->cleanup_mutex);
 
 	mutex_lock(&file->device->lists_mutex);
-	ucontext = file->ucontext;
-	file->ucontext = NULL;
 	if (!file->is_closed) {
 		list_del(&file->list);
 		file->is_closed = 1;
 	}
 	mutex_unlock(&file->device->lists_mutex);
-	if (ucontext)
-		ib_uverbs_cleanup_ucontext(file, ucontext);
 
 	if (file->async_file)
 		kref_put(&file->async_file->ref, ib_uverbs_release_event_file);
@@ -1178,22 +1181,30 @@ static void ib_uverbs_free_hw_resources(struct ib_uverbs_device *uverbs_dev,
 	mutex_lock(&uverbs_dev->lists_mutex);
 	while (!list_empty(&uverbs_dev->uverbs_file_list)) {
 		struct ib_ucontext *ucontext;
-
 		file = list_first_entry(&uverbs_dev->uverbs_file_list,
 					struct ib_uverbs_file, list);
 		file->is_closed = 1;
-		ucontext = file->ucontext;
 		list_del(&file->list);
-		file->ucontext = NULL;
 		kref_get(&file->ref);
 		mutex_unlock(&uverbs_dev->lists_mutex);
-		/* We must release the mutex before going ahead and calling
-		 * disassociate_ucontext. disassociate_ucontext might end up
-		 * indirectly calling uverbs_close, for example due to freeing
-		 * the resources (e.g mmput).
-		 */
+
 		ib_uverbs_event_handler(&file->event_handler, &event);
+
+		mutex_lock(&file->cleanup_mutex);
+		ucontext = file->ucontext;
+		file->ucontext = NULL;
+		mutex_unlock(&file->cleanup_mutex);
+
+		/* At this point ib_uverbs_close cannnot be running
+		 * ib_uverbs_cleanup_ucontext
+		 */
 		if (ucontext) {
+			/* We must release the mutex before going ahead and
+			 * calling
+			 * disassociate_ucontext. disassociate_ucontext might
+			 * end up indirectly calling uverbs_close, for example
+			 * due to freeing the resources (e.g mmput).
+			 */
 			ib_dev->disassociate_ucontext(ucontext);
 			ib_uverbs_cleanup_ucontext(file, ucontext);
 		}
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH V4] IB/uverbs: Fix race between uverbs_close and remove_one
       [not found]                                             ` <20160426151851.GC24104-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-04-26 15:27                                               ` Doug Ledford
  0 siblings, 0 replies; 15+ messages in thread
From: Doug Ledford @ 2016-04-26 15:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Devesh Sharma, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yishai Hadas


[-- Attachment #1.1: Type: text/plain, Size: 6317 bytes --]

On 4/26/2016 11:18 AM, Jason Gunthorpe wrote:
> On Tue, Apr 26, 2016 at 10:33:37AM -0400, Doug Ledford wrote:
>> On 3/17/2016 12:48 PM, Jason Gunthorpe wrote:
>>> On Thu, Mar 17, 2016 at 10:01:55PM +0530, Devesh Sharma wrote:
>>>> On Thu, Mar 17, 2016 at 9:42 PM, Jason Gunthorpe
>>>> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>>>>> On Thu, Mar 17, 2016 at 09:38:30PM +0530, Devesh Sharma wrote:
>>>>>
>>>>>> To my mind mutex is *not* solving the problem completely unless we
>>>>>> make it a coarser grained lock. The possible deadlock problem still
>>>>>> lingers around it.
>>>>>
>>>>> Review the last version I sent, with this statement in mind:
>>>>
>>>> I am sorry I lost the track of it, which one you are point..we have
>>>> been discussion for a quite some time now!
>>>
>>> Not saying it is perfect, but it should be close:
>>
>> Jason, this is your patch since you completely tossed Devesh's work and
>> did your own.  Can you please post a corrected version of this with a
>> proper changelog and Signed-off-by.  Thanks.
> 
> This was just a sketch/hint how to do it properly, it doesn't even
> compile,

Yeah, that's what I was referring to when I said "corrected version".
Not the log, but the use-after-free issue.

> and Devesh needs to test it before going any further.
> 
> Since this seems stuck, here is a complete version that does compile
> and might even work..
> 
> From 5b19ced7a4076807284fe76a76b63a9093b590aa Mon Sep 17 00:00:00 2001
> From: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
> Date: Tue, 26 Apr 2016 09:16:01 -0600
> Subject: [PATCH] IB/uverbs: Fix race between uverbs_close and remove_one
> 
> Fixes an oops that can happen if uverbs_close races with remove_one:
> 
> [67140.260665]  [<ffffffff810c16a0>] ? prepare_to_wait_event+0xf0/0xf0
> [67140.268337]  [<ffffffffa04cabc3>] ? ib_dereg_mr+0x23/0x30 [ib_core]
> [67140.276009]  [<ffffffffa03ee5f0>] ? ib_uverbs_cleanup_ucontext+0x320/0x440 [ib_uverbs]
> [67140.285550]  [<ffffffffa03ee9e9>] ? ib_uverbs_close+0x59/0xb0 [ib_uverbs]
> [67140.293807]  [<ffffffff811ff744>] ? __fput+0xe4/0x210
> 
> This is because both contexts are running ib_uverbs_cleanup_ucontext
> concurrently as the locking scheme was not adaquate.
> 
> Directly protect ib_uverbs_cleanup_ucontext against concurrency with a
> new lock.
> 
> Fixes: 35d4a0b63dc0 ("IB/uverbs: Fix race between ib_uverbs_open and remove_one")
> Reported-by: Devesh Sharma <devesh.sharma-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
> ---
>  drivers/infiniband/core/uverbs.h      |  1 +
>  drivers/infiniband/core/uverbs_main.c | 37 +++++++++++++++++++++++------------
>  2 files changed, 25 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
> index 612ccfd39bf9..f0f6c8fa4b5f 100644
> --- a/drivers/infiniband/core/uverbs.h
> +++ b/drivers/infiniband/core/uverbs.h
> @@ -116,6 +116,7 @@ struct ib_uverbs_event_file {
>  struct ib_uverbs_file {
>  	struct kref				ref;
>  	struct mutex				mutex;
> +	struct mutex                            cleanup_mutex;
>  	struct ib_uverbs_device		       *device;
>  	struct ib_ucontext		       *ucontext;
>  	struct ib_event_handler			event_handler;
> diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
> index 39680aed99dd..43f268f2deb5 100644
> --- a/drivers/infiniband/core/uverbs_main.c
> +++ b/drivers/infiniband/core/uverbs_main.c
> @@ -928,6 +928,7 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)
>  	file->async_file = NULL;
>  	kref_init(&file->ref);
>  	mutex_init(&file->mutex);
> +	mutex_init(&file->cleanup_mutex);
>  
>  	filp->private_data = file;
>  	kobject_get(&dev->kobj);
> @@ -953,18 +954,20 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp)
>  {
>  	struct ib_uverbs_file *file = filp->private_data;
>  	struct ib_uverbs_device *dev = file->device;
> -	struct ib_ucontext *ucontext = NULL;
> +
> +	mutex_lock(&file->cleanup_mutex);
> +	if (file->ucontext) {
> +		ib_uverbs_cleanup_ucontext(file, file->ucontext);
> +		file->ucontext = NULL;
> +	}
> +	mutex_unlock(&file->cleanup_mutex);
>  
>  	mutex_lock(&file->device->lists_mutex);
> -	ucontext = file->ucontext;
> -	file->ucontext = NULL;
>  	if (!file->is_closed) {
>  		list_del(&file->list);
>  		file->is_closed = 1;
>  	}
>  	mutex_unlock(&file->device->lists_mutex);
> -	if (ucontext)
> -		ib_uverbs_cleanup_ucontext(file, ucontext);
>  
>  	if (file->async_file)
>  		kref_put(&file->async_file->ref, ib_uverbs_release_event_file);
> @@ -1178,22 +1181,30 @@ static void ib_uverbs_free_hw_resources(struct ib_uverbs_device *uverbs_dev,
>  	mutex_lock(&uverbs_dev->lists_mutex);
>  	while (!list_empty(&uverbs_dev->uverbs_file_list)) {
>  		struct ib_ucontext *ucontext;
> -
>  		file = list_first_entry(&uverbs_dev->uverbs_file_list,
>  					struct ib_uverbs_file, list);
>  		file->is_closed = 1;
> -		ucontext = file->ucontext;
>  		list_del(&file->list);
> -		file->ucontext = NULL;
>  		kref_get(&file->ref);
>  		mutex_unlock(&uverbs_dev->lists_mutex);
> -		/* We must release the mutex before going ahead and calling
> -		 * disassociate_ucontext. disassociate_ucontext might end up
> -		 * indirectly calling uverbs_close, for example due to freeing
> -		 * the resources (e.g mmput).
> -		 */
> +
>  		ib_uverbs_event_handler(&file->event_handler, &event);
> +
> +		mutex_lock(&file->cleanup_mutex);
> +		ucontext = file->ucontext;
> +		file->ucontext = NULL;
> +		mutex_unlock(&file->cleanup_mutex);
> +
> +		/* At this point ib_uverbs_close cannnot be running
> +		 * ib_uverbs_cleanup_ucontext
> +		 */
>  		if (ucontext) {
> +			/* We must release the mutex before going ahead and
> +			 * calling
> +			 * disassociate_ucontext. disassociate_ucontext might
> +			 * end up indirectly calling uverbs_close, for example
> +			 * due to freeing the resources (e.g mmput).
> +			 */
>  			ib_dev->disassociate_ucontext(ucontext);
>  			ib_uverbs_cleanup_ucontext(file, ucontext);
>  		}
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-04-26 15:27 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-12 15:18 [PATCH V4] IB/uverbs: Fix race between uverbs_close and remove_one Devesh Sharma
     [not found] ` <1457795927-16634-1-git-send-email-devesh.sharma-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
2016-03-12 20:45   ` Jason Gunthorpe
     [not found]     ` <20160312204502.GA8346-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-03-14  5:40       ` Devesh Sharma
     [not found]         ` <CANjDDBiN8X-Efgp6s2wyT1G6fpQZjdreW0pnnBG71E9jjhy-YA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-03-14 17:48           ` Jason Gunthorpe
     [not found]             ` <20160314174814.GB5240-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-03-15  9:00               ` Devesh Sharma
     [not found]                 ` <CANjDDBhVH10=0nSr0q4P4imAX6YrFBhE7QEd4ccRe2yUhN0pQQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-03-15  9:03                   ` Devesh Sharma
2016-03-15 20:31                   ` Jason Gunthorpe
     [not found]                     ` <20160315203112.GA2786-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-03-17 16:08                       ` Devesh Sharma
     [not found]                         ` <CANjDDBj8aZTeAfwxFuyk9r=kdihfxpxDA69-c4uVxkvcAfXViw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-03-17 16:12                           ` Jason Gunthorpe
     [not found]                             ` <20160317161237.GB19501-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-03-17 16:31                               ` Devesh Sharma
     [not found]                                 ` <CANjDDBhYH7HsUyP8-Ko7G0tnWxYDGYCGgaC4HK0Eod_isvoWAA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-03-17 16:48                                   ` Jason Gunthorpe
     [not found]                                     ` <20160317164831.GI19501-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-03-21  8:59                                       ` Haggai Eran
2016-04-26 14:33                                       ` Doug Ledford
     [not found]                                         ` <571F7C41.70700-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-04-26 15:18                                           ` Jason Gunthorpe
     [not found]                                             ` <20160426151851.GC24104-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-04-26 15:27                                               ` Doug Ledford

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.