linux-rdma.vger.kernel.org archive mirror
* [PATCH v2] IB/umem: ib_ucontext already have tgid, remove pid from ib_umem structure
From: Lidong Chen @ 2018-05-08  8:50 UTC
  To: dledford, jgg, akpm, qing.huang, leon, artemyko, dan.j.williams
  Cc: linux-rdma, linux-kernel, adido, galsha, aviadye, Lidong Chen

Userspace may call ibv_reg_mr and ibv_dereg_mr from different threads.
If the thread that called ibv_reg_mr has already exited when
ibv_dereg_mr is called, get_pid_task() returns NULL and ib_umem_release()
does not decrease mm->pinned_vm. This patch fixes it by using the tgid
in struct ib_ucontext.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 [v2]
 - use the ib_ucontext tgid instead of a tgid in the ib_umem structure
 
 drivers/infiniband/core/umem.c | 7 +------
 include/rdma/ib_umem.h         | 1 -
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 9a4e899..2b6c9b5 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -119,7 +119,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	umem->length     = size;
 	umem->address    = addr;
 	umem->page_shift = PAGE_SHIFT;
-	umem->pid	 = get_task_pid(current, PIDTYPE_PID);
 	/*
 	 * We ask for writable memory if any of the following
 	 * access flags are set.  "Local write" and "remote write"
@@ -132,7 +131,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 		 IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
 
 	if (access & IB_ACCESS_ON_DEMAND) {
-		put_pid(umem->pid);
 		ret = ib_umem_odp_get(context, umem, access);
 		if (ret) {
 			kfree(umem);
@@ -148,7 +146,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 
 	page_list = (struct page **) __get_free_page(GFP_KERNEL);
 	if (!page_list) {
-		put_pid(umem->pid);
 		kfree(umem);
 		return ERR_PTR(-ENOMEM);
 	}
@@ -231,7 +228,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	if (ret < 0) {
 		if (need_release)
 			__ib_umem_release(context->device, umem, 0);
-		put_pid(umem->pid);
 		kfree(umem);
 	} else
 		current->mm->pinned_vm = locked;
@@ -274,8 +270,7 @@ void ib_umem_release(struct ib_umem *umem)
 
 	__ib_umem_release(umem->context->device, umem, 1);
 
-	task = get_pid_task(umem->pid, PIDTYPE_PID);
-	put_pid(umem->pid);
+	task = get_pid_task(umem->context->tgid, PIDTYPE_PID);
 	if (!task)
 		goto out;
 	mm = get_task_mm(task);
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 23159dd..a1fd638 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -48,7 +48,6 @@ struct ib_umem {
 	int                     writable;
 	int                     hugetlb;
 	struct work_struct	work;
-	struct pid             *pid;
 	struct mm_struct       *mm;
 	unsigned long		diff;
 	struct ib_umem_odp     *odp_data;
-- 
1.8.3.1

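The accounting bug described in the commit message can be reproduced in a small userspace model. It is a sketch, not kernel code: `struct model_task`, `model_mm_by_thread_pid()`, `model_mm_by_tgid()` and the other `model_*` names are hypothetical stand-ins for `get_pid_task()`/`get_task_mm()`. As a simplification, it assumes that a lookup by an exited thread's PID fails, while a lookup by the thread-group ID succeeds as long as any thread of the process is still running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: hypothetical stand-ins, not kernel API. */
struct model_mm {
	unsigned long pinned_vm;	/* pages currently charged as pinned */
};

struct model_task {
	int pid;		/* per-thread ID */
	int tgid;		/* thread-group (process) ID */
	bool alive;
	struct model_mm *mm;	/* threads of one process share one mm */
};

#define MODEL_MAX_TASKS 8
static struct model_task model_tasks[MODEL_MAX_TASKS];
static int model_ntasks;

static struct model_task *model_spawn_thread(int pid, int tgid, struct model_mm *mm)
{
	struct model_task *t;

	assert(model_ntasks < MODEL_MAX_TASKS);
	t = &model_tasks[model_ntasks++];
	t->pid = pid;
	t->tgid = tgid;
	t->alive = true;
	t->mm = mm;
	return t;
}

/* ib_umem_get() analogue: charge the calling thread's (shared) mm. */
static void model_charge(struct model_task *caller, unsigned long npages)
{
	caller->mm->pinned_vm += npages;
}

/* Old scheme: resolve the exact thread that registered the MR. */
static struct model_mm *model_mm_by_thread_pid(int pid)
{
	for (int i = 0; i < model_ntasks; i++)
		if (model_tasks[i].pid == pid && model_tasks[i].alive)
			return model_tasks[i].mm;
	return NULL;	/* thread exited: like get_pid_task() returning NULL */
}

/* Patched scheme (simplified assumption): any live thread of the group resolves the mm. */
static struct model_mm *model_mm_by_tgid(int tgid)
{
	for (int i = 0; i < model_ntasks; i++)
		if (model_tasks[i].tgid == tgid && model_tasks[i].alive)
			return model_tasks[i].mm;
	return NULL;
}

/* ib_umem_release() analogue: uncharge only if the lookup found an mm. */
static void model_release(struct model_mm *mm, unsigned long npages)
{
	if (!mm)
		return;		/* mirrors "if (!task) goto out;": counter is never decreased */
	mm->pinned_vm -= npages;
}
```

Registering on thread 101, letting that thread exit and releasing from thread 102 leaves `pinned_vm` at 16 with the per-thread lookup, while the tgid lookup brings it back to 0.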

* Re: [PATCH v2] IB/umem: ib_ucontext already have tgid, remove pid from ib_umem structure
From: Jason Gunthorpe @ 2018-05-15 23:14 UTC
  To: Lidong Chen
  Cc: dledford, akpm, qing.huang, leon, artemyko, dan.j.williams,
	linux-rdma, linux-kernel, adido, galsha, aviadye, Lidong Chen

On Tue, May 08, 2018 at 04:50:16PM +0800, Lidong Chen wrote:
> Userspace may call ibv_reg_mr and ibv_dereg_mr from different threads.
> If the thread that called ibv_reg_mr has already exited when
> ibv_dereg_mr is called, get_pid_task() returns NULL and ib_umem_release()
> does not decrease mm->pinned_vm. This patch fixes it by using the tgid
> in struct ib_ucontext.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> ---
>  [v2]
>  - use the ib_ucontext tgid instead of a tgid in the ib_umem structure
>  
>  drivers/infiniband/core/umem.c | 7 +------
>  include/rdma/ib_umem.h         | 1 -
>  2 files changed, 1 insertion(+), 7 deletions(-)

Applied to for-rc, thanks.

It would be nice to send a cleanup so that all the users of tgid that
follow this pattern

	task = get_pid_task(umem->context->tgid, PIDTYPE_PID);
	if (!task)
		goto out;
	mm = get_task_mm(task);

call some kind of common function like ib_get_mr_mm() instead, just to
make it really clear what is happening.

Jason
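A helper in the spirit of the suggested `ib_get_mr_mm()` can be sketched as a userspace model of the reference counting it would have to hide. `ib_get_mr_mm()` is only a name floated in this mail, so the function below is called `model_get_mr_mm()`, and all `model_*` types and functions are hypothetical stand-ins for `get_pid_task()`, `get_task_mm()` and `put_task_struct()`. The idea it illustrates: the caller gets back either NULL or an mm with a reference held (to be dropped later with something like `mmput()`), and the temporary task reference never leaves the helper.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the lookup pattern quoted above; not kernel code. */
struct model_mm {
	int users;		/* stands in for the mm reference count */
};

struct model_task {
	int usage;		/* stands in for the task_struct reference count */
	struct model_mm *mm;	/* NULL once the task no longer has an mm */
};

/* Stand-in for get_pid_task(): NULL models a lookup that finds no task. */
static struct model_task *model_get_pid_task(struct model_task *t)
{
	if (t)
		t->usage++;
	return t;
}

/* Stand-in for put_task_struct(). */
static void model_put_task(struct model_task *t)
{
	t->usage--;
}

/* Stand-in for get_task_mm(): takes an mm reference if the task still has one. */
static struct model_mm *model_get_task_mm(struct model_task *t)
{
	if (t->mm)
		t->mm->users++;
	return t->mm;
}

/*
 * Hypothetical helper: hide the task lookup, hand back only a referenced
 * mm (or NULL), and never leak the temporary task reference.
 */
static struct model_mm *model_get_mr_mm(struct model_task *owner)
{
	struct model_task *task = model_get_pid_task(owner);
	struct model_mm *mm;

	if (!task)
		return NULL;
	mm = model_get_task_mm(task);
	model_put_task(task);	/* the caller only needs the mm */
	return mm;
}
```

With this shape, a missing task and a task that no longer has an mm both come back as NULL, so each caller needs a single NULL check.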


* Re: [PATCH v2] IB/umem: ib_ucontext already have tgid, remove pid from ib_umem structure
From: 858585 jemmy @ 2018-05-16  7:32 UTC
  To: Jason Gunthorpe
  Cc: dledford, akpm, qing.huang, Leon Romanovsky, artemyko,
	dan.j.williams, linux-rdma, linux-kernel, adido, Gal Shachaf,
	Aviad Yehezkel, Lidong Chen

On Wed, May 16, 2018 at 7:14 AM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Tue, May 08, 2018 at 04:50:16PM +0800, Lidong Chen wrote:
>> Userspace may call ibv_reg_mr and ibv_dereg_mr from different threads.
>> If the thread that called ibv_reg_mr has already exited when
>> ibv_dereg_mr is called, get_pid_task() returns NULL and ib_umem_release()
>> does not decrease mm->pinned_vm. This patch fixes it by using the tgid
>> in struct ib_ucontext.
>>
>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> ---
>>  [v2]
>>  - use the ib_ucontext tgid instead of a tgid in the ib_umem structure
>>
>>  drivers/infiniband/core/umem.c | 7 +------
>>  include/rdma/ib_umem.h         | 1 -
>>  2 files changed, 1 insertion(+), 7 deletions(-)
>
> Applied to for-rc, thanks.
>
> It would be nice to send a cleanup so that all the users of tgid that
> follow this pattern
>
>         task = get_pid_task(umem->context->tgid, PIDTYPE_PID);
>         if (!task)
>                 goto out;
>         mm = get_task_mm(task);
>
> call some kind of common function like ib_get_mr_mm() instead, just to
> make it really clear what is happening.

OK, I will submit a patch for this.

>
> Jason


* Re: [PATCH v2] IB/umem: ib_ucontext already have tgid, remove pid from ib_umem structure
From: Jason Gunthorpe @ 2018-06-13  4:36 UTC
  To: Lidong Chen
  Cc: dledford, akpm, qing.huang, leon, artemyko, dan.j.williams,
	linux-rdma, linux-kernel, adido, galsha, aviadye, Lidong Chen

On Tue, May 08, 2018 at 04:50:16PM +0800, Lidong Chen wrote:
> Userspace may call ibv_reg_mr and ibv_dereg_mr from different threads.
> If the thread that called ibv_reg_mr has already exited when
> ibv_dereg_mr is called, get_pid_task() returns NULL and ib_umem_release()
> does not decrease mm->pinned_vm. This patch fixes it by using the tgid
> in struct ib_ucontext.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> ---
>  [v2]
>  - use the ib_ucontext tgid instead of a tgid in the ib_umem structure

I'm looking at this again, and it doesn't seem quite right.

> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index 9a4e899..2b6c9b5 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -119,7 +119,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>  	umem->length     = size;
>  	umem->address    = addr;
>  	umem->page_shift = PAGE_SHIFT;
> -	umem->pid	 = get_task_pid(current, PIDTYPE_PID);
>  	/*
>  	 * We ask for writable memory if any of the following
>  	 * access flags are set.  "Local write" and "remote write"
> @@ -132,7 +131,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>  		 IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
>  
>  	if (access & IB_ACCESS_ON_DEMAND) {
> -		put_pid(umem->pid);
>  		ret = ib_umem_odp_get(context, umem, access);
>  		if (ret) {
>  			kfree(umem);
> @@ -148,7 +146,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>  
>  	page_list = (struct page **) __get_free_page(GFP_KERNEL);
>  	if (!page_list) {
> -		put_pid(umem->pid);
>  		kfree(umem);
>  		return ERR_PTR(-ENOMEM);
>  	}

In ib_umem_get() we do this:

	down_write(&current->mm->mmap_sem);
	locked     = npages + current->mm->pinned_vm;

And then in release we now do:

	task = get_pid_task(umem->context->tgid, PIDTYPE_PID);
	if (!task)
		goto out;
	mm = get_task_mm(task);
	mm->pinned_vm -= diff;

But there is no guarantee that context->tgid and 'current' refer to the
same process during ib_umem_get().

So in the dysfunctional case where someone forks and keeps the context
FD open on both sides of the fork, they can make the pinned_vm
counters of those processes go wrong. Sounds bad.

Thus, I think we need to go back to storing the tgid in the ib_umem,
and just fix it to store the thread group leader rather than the thread PID?
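The fork scenario, and the idea of remembering the owner at pin time, can be modelled the same way. This is a hypothetical userspace sketch: `model_ucontext.owner_mm` stands for whatever mm `context->tgid` resolves to, and `model_umem.charged_mm` simplifies the proposal of storing the group leader in the ib_umem by recording the mm that lookup would return. `pinned_vm` is signed here only so that an over-decrement shows up as a negative number; in the kernel the field is an unsigned long.

```c
#include <assert.h>

/* Userspace model of the fork scenario; all names are hypothetical. */
struct model_mm {
	long pinned_vm;		/* signed so an over-decrement is visible */
};

struct model_ucontext {
	struct model_mm *owner_mm;	/* mm that context->tgid resolves to */
};

struct model_umem {
	unsigned long npages;
	struct model_mm *charged_mm;	/* proposal: remember the owner at pin time */
};

/* ib_umem_get() analogue: the charge always lands on current's mm. */
static struct model_umem model_umem_get(struct model_mm *current_mm, unsigned long npages)
{
	struct model_umem umem = { .npages = npages, .charged_mm = current_mm };

	current_mm->pinned_vm += (long)npages;
	return umem;
}

/* Patched release: uncharge whichever mm context->tgid resolves to. */
static void release_via_context(const struct model_ucontext *ctx,
				const struct model_umem *umem)
{
	ctx->owner_mm->pinned_vm -= (long)umem->npages;
}

/* Proposed release: uncharge the mm recorded in the umem itself. */
static void release_via_umem(const struct model_umem *umem)
{
	umem->charged_mm->pinned_vm -= (long)umem->npages;
}
```

In the model, a child that registers memory through the inherited context FD is charged in its own mm, but the context-based release uncharges the parent, leaving the child at 8 and the parent at -8. Releasing against the mm recorded in the umem keeps both at 0.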

And then we need the ib_get_mr_mm() helper even more to make sense of
this, because all the drivers are also doing the wrong thing by using
context->tgid.

Is that all right?

Jason


* Re: [PATCH v2] IB/umem: ib_ucontext already have tgid, remove pid from ib_umem structure
From: 858585 jemmy @ 2018-06-13  9:25 UTC
  To: Jason Gunthorpe
  Cc: dledford, akpm, qing.huang, Leon Romanovsky, artemyko,
	dan.j.williams, linux-rdma, linux-kernel, Adi Dotan, Gal Shachaf,
	Aviad Yehezkel, Lidong Chen

On Wed, Jun 13, 2018 at 12:36 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Tue, May 08, 2018 at 04:50:16PM +0800, Lidong Chen wrote:
>> Userspace may call ibv_reg_mr and ibv_dereg_mr from different threads.
>> If the thread that called ibv_reg_mr has already exited when
>> ibv_dereg_mr is called, get_pid_task() returns NULL and ib_umem_release()
>> does not decrease mm->pinned_vm. This patch fixes it by using the tgid
>> in struct ib_ucontext.
>>
>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> ---
>>  [v2]
>>  - use the ib_ucontext tgid instead of a tgid in the ib_umem structure
>
> I'm looking at this again, and it doesn't seem quite right.
>
>> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>> index 9a4e899..2b6c9b5 100644
>> --- a/drivers/infiniband/core/umem.c
>> +++ b/drivers/infiniband/core/umem.c
>> @@ -119,7 +119,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>>       umem->length     = size;
>>       umem->address    = addr;
>>       umem->page_shift = PAGE_SHIFT;
>> -     umem->pid        = get_task_pid(current, PIDTYPE_PID);
>>       /*
>>        * We ask for writable memory if any of the following
>>        * access flags are set.  "Local write" and "remote write"
>> @@ -132,7 +131,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>>                IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
>>
>>       if (access & IB_ACCESS_ON_DEMAND) {
>> -             put_pid(umem->pid);
>>               ret = ib_umem_odp_get(context, umem, access);
>>               if (ret) {
>>                       kfree(umem);
>> @@ -148,7 +146,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>>
>>       page_list = (struct page **) __get_free_page(GFP_KERNEL);
>>       if (!page_list) {
>> -             put_pid(umem->pid);
>>               kfree(umem);
>>               return ERR_PTR(-ENOMEM);
>>       }
>
> In ib_umem_get() we do this:
>
>         down_write(&current->mm->mmap_sem);
>         locked     = npages + current->mm->pinned_vm;
>
> And then in release we now do:
>
>         task = get_pid_task(umem->context->tgid, PIDTYPE_PID);
>         if (!task)
>                 goto out;
>         mm = get_task_mm(task);
>         mm->pinned_vm -= diff;
>
> But there is no guarantee that context->tgid and 'current' refer to the
> same process during ib_umem_get().

context->tgid and current may be different, but different threads in one
process share one mm structure, so it should work for multiple threads.

>
> So in the dysfunctional case where someone forks and keeps the context
> FD open on both sides of the fork, they can make the pinned_vm
> counters of those processes go wrong. Sounds bad.

I am not sure about fork support; I will look into this problem.

>
> Thus, I think we need to go back to storing the tgid in the ib_umem,
> and just fix it to store the thread group leader rather than the thread PID?
>
> And then we need the ib_get_mr_mm() helper even more to make sense of
> this, because all the drivers are also doing the wrong thing by using
> context->tgid.
>
> Is that all right?
>
> Jason

