linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
@ 2018-05-03 14:04 Lidong Chen
  2018-05-03 15:33 ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: Lidong Chen @ 2018-05-03 14:04 UTC (permalink / raw)
  To: dledford, jgg, akpm, qing.huang, leon, artemyko, dan.j.williams
  Cc: linux-rdma, linux-kernel, adido, galsha, aviadye, Lidong Chen

The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
exited, get_pid_task will return NULL, ib_umem_release does not decrease
mm->pinned_vm. This patch fixes it by use tgid.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 drivers/infiniband/core/umem.c | 12 ++++++------
 include/rdma/ib_umem.h         |  2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 9a4e899..8813ba5 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -119,7 +119,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	umem->length     = size;
 	umem->address    = addr;
 	umem->page_shift = PAGE_SHIFT;
-	umem->pid	 = get_task_pid(current, PIDTYPE_PID);
+	umem->tgid	 = get_task_pid(current->group_leader, PIDTYPE_PID);
 	/*
 	 * We ask for writable memory if any of the following
 	 * access flags are set.  "Local write" and "remote write"
@@ -132,7 +132,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 		 IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
 
 	if (access & IB_ACCESS_ON_DEMAND) {
-		put_pid(umem->pid);
+		put_pid(umem->tgid);
 		ret = ib_umem_odp_get(context, umem, access);
 		if (ret) {
 			kfree(umem);
@@ -148,7 +148,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 
 	page_list = (struct page **) __get_free_page(GFP_KERNEL);
 	if (!page_list) {
-		put_pid(umem->pid);
+		put_pid(umem->tgid);
 		kfree(umem);
 		return ERR_PTR(-ENOMEM);
 	}
@@ -231,7 +231,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	if (ret < 0) {
 		if (need_release)
 			__ib_umem_release(context->device, umem, 0);
-		put_pid(umem->pid);
+		put_pid(umem->tgid);
 		kfree(umem);
 	} else
 		current->mm->pinned_vm = locked;
@@ -274,8 +274,8 @@ void ib_umem_release(struct ib_umem *umem)
 
 	__ib_umem_release(umem->context->device, umem, 1);
 
-	task = get_pid_task(umem->pid, PIDTYPE_PID);
-	put_pid(umem->pid);
+	task = get_pid_task(umem->tgid, PIDTYPE_PID);
+	put_pid(umem->tgid);
 	if (!task)
 		goto out;
 	mm = get_task_mm(task);
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 23159dd..2398849 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -48,7 +48,7 @@ struct ib_umem {
 	int                     writable;
 	int                     hugetlb;
 	struct work_struct	work;
-	struct pid             *pid;
+	struct pid             *tgid;
 	struct mm_struct       *mm;
 	unsigned long		diff;
 	struct ib_umem_odp     *odp_data;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
  2018-05-03 14:04 [PATCH] IB/umem: use tgid instead of pid in ib_umem structure Lidong Chen
@ 2018-05-03 15:33 ` Jason Gunthorpe
  2018-05-03 18:12   ` Leon Romanovsky
                     ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Jason Gunthorpe @ 2018-05-03 15:33 UTC (permalink / raw)
  To: Lidong Chen
  Cc: dledford, akpm, qing.huang, leon, artemyko, dan.j.williams,
	linux-rdma, linux-kernel, adido, galsha, aviadye, Lidong Chen

On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
> The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
> If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
> exited, get_pid_task will return NULL, ib_umem_release does not decrease
> mm->pinned_vm. This patch fixes it by use tgid.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> ---
>  drivers/infiniband/core/umem.c | 12 ++++++------
>  include/rdma/ib_umem.h         |  2 +-
>  2 files changed, 7 insertions(+), 7 deletions(-)

Why are we even using a struct pid for this? Does anyone know?

I'm surprised that struct task isn't held in the struct ib_umem..

Is group_leader really OK and always guarenteed to return the same
struct mm?? For some reason I have this recollection that the leader
can change under some situation..

> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index 9a4e899..8813ba5 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -119,7 +119,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>  	umem->length     = size;
>  	umem->address    = addr;
>  	umem->page_shift = PAGE_SHIFT;
> -	umem->pid	 = get_task_pid(current, PIDTYPE_PID);
> +	umem->tgid	 = get_task_pid(current->group_leader, PIDTYPE_PID);
>  	/*
>  	 * We ask for writable memory if any of the following
>  	 * access flags are set.  "Local write" and "remote write"
> @@ -132,7 +132,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>  		 IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
>  
>  	if (access & IB_ACCESS_ON_DEMAND) {
> -		put_pid(umem->pid);
> +		put_pid(umem->tgid);
>  		ret = ib_umem_odp_get(context, umem, access);
>  		if (ret) {
>  			kfree(umem);
> @@ -148,7 +148,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>  
>  	page_list = (struct page **) __get_free_page(GFP_KERNEL);
>  	if (!page_list) {
> -		put_pid(umem->pid);
> +		put_pid(umem->tgid);
>  		kfree(umem);
>  		return ERR_PTR(-ENOMEM);
>  	}
> @@ -231,7 +231,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>  	if (ret < 0) {
>  		if (need_release)
>  			__ib_umem_release(context->device, umem, 0);
> -		put_pid(umem->pid);
> +		put_pid(umem->tgid);
>  		kfree(umem);
>  	} else
>  		current->mm->pinned_vm = locked;
> @@ -274,8 +274,8 @@ void ib_umem_release(struct ib_umem *umem)
>  
>  	__ib_umem_release(umem->context->device, umem, 1);
>  
> -	task = get_pid_task(umem->pid, PIDTYPE_PID);
> -	put_pid(umem->pid);
> +	task = get_pid_task(umem->tgid, PIDTYPE_PID);
> +	put_pid(umem->tgid);
>  	if (!task)
>  		goto out;
>  	mm = get_task_mm(task);
> diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
> index 23159dd..2398849 100644
> --- a/include/rdma/ib_umem.h
> +++ b/include/rdma/ib_umem.h
> @@ -48,7 +48,7 @@ struct ib_umem {
>  	int                     writable;
>  	int                     hugetlb;
>  	struct work_struct	work;
> -	struct pid             *pid;
> +	struct pid             *tgid;
>  	struct mm_struct       *mm;
>  	unsigned long		diff;
>  	struct ib_umem_odp     *odp_data;
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
  2018-05-03 15:33 ` Jason Gunthorpe
@ 2018-05-03 18:12   ` Leon Romanovsky
  2018-05-03 18:26     ` Jason Gunthorpe
  2018-05-04  2:41   ` 858585 jemmy
  2018-05-04  3:14   ` 858585 jemmy
  2 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2018-05-03 18:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Lidong Chen, dledford, akpm, qing.huang, artemyko,
	dan.j.williams, linux-rdma, linux-kernel, adido, galsha, aviadye,
	Lidong Chen

[-- Attachment #1: Type: text/plain, Size: 919 bytes --]

On Thu, May 03, 2018 at 09:33:10AM -0600, Jason Gunthorpe wrote:
> On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
> > The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
> > If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
> > exited, get_pid_task will return NULL, ib_umem_release does not decrease
> > mm->pinned_vm. This patch fixes it by use tgid.
> >
> > Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> > ---
> >  drivers/infiniband/core/umem.c | 12 ++++++------
> >  include/rdma/ib_umem.h         |  2 +-
> >  2 files changed, 7 insertions(+), 7 deletions(-)
>
> Why are we even using a struct pid for this? Does anyone know?
>

Can it be related to "fork" support?

> I'm surprised that struct task isn't held in the struct ib_umem..
>

I think that this code can be removed and all accesses to mm_struct can
be done with "current->mm".

Thanks

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
  2018-05-03 18:12   ` Leon Romanovsky
@ 2018-05-03 18:26     ` Jason Gunthorpe
  2018-05-03 18:43       ` Leon Romanovsky
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2018-05-03 18:26 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Lidong Chen, dledford, akpm, qing.huang, artemyko,
	dan.j.williams, linux-rdma, linux-kernel, adido, galsha, aviadye,
	Lidong Chen

On Thu, May 03, 2018 at 09:12:35PM +0300, Leon Romanovsky wrote:
> On Thu, May 03, 2018 at 09:33:10AM -0600, Jason Gunthorpe wrote:
> > On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
> > > The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
> > > If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
> > > exited, get_pid_task will return NULL, ib_umem_release does not decrease
> > > mm->pinned_vm. This patch fixes it by use tgid.
> > >
> > > Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> > >  drivers/infiniband/core/umem.c | 12 ++++++------
> > >  include/rdma/ib_umem.h         |  2 +-
> > >  2 files changed, 7 insertions(+), 7 deletions(-)
> >
> > Why are we even using a struct pid for this? Does anyone know?
> >
> 
> Can it be related to "fork" support?

Not sure..

Ideally we want to hold the struct mm, but we can't hold it long
term, so pid is a surrogate for that.

> > I'm surprised that struct task isn't held in the struct ib_umem..
> >
> 
> I think that this code can be removed and all accesses to mm_struct can
> be done with "current->mm".

That sounds wrong for fork support, as the mm used in destroy MUST
exactly match the mm used in create..

How does this accounting work in fork anyhow?

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
  2018-05-03 18:26     ` Jason Gunthorpe
@ 2018-05-03 18:43       ` Leon Romanovsky
  2018-05-03 22:01         ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2018-05-03 18:43 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Lidong Chen, dledford, akpm, qing.huang, artemyko,
	dan.j.williams, linux-rdma, linux-kernel, adido, galsha, aviadye,
	Lidong Chen

[-- Attachment #1: Type: text/plain, Size: 1491 bytes --]

On Thu, May 03, 2018 at 12:26:56PM -0600, Jason Gunthorpe wrote:
> On Thu, May 03, 2018 at 09:12:35PM +0300, Leon Romanovsky wrote:
> > On Thu, May 03, 2018 at 09:33:10AM -0600, Jason Gunthorpe wrote:
> > > On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
> > > > The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
> > > > If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
> > > > exited, get_pid_task will return NULL, ib_umem_release does not decrease
> > > > mm->pinned_vm. This patch fixes it by use tgid.
> > > >
> > > > Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> > > >  drivers/infiniband/core/umem.c | 12 ++++++------
> > > >  include/rdma/ib_umem.h         |  2 +-
> > > >  2 files changed, 7 insertions(+), 7 deletions(-)
> > >
> > > Why are we even using a struct pid for this? Does anyone know?
> > >
> >
> > Can it be related to "fork" support?
>
> Not sure..
>
> Ideally we want to hold the struct mm, but we can't hold it long
> term, so pid is a surrogate for that.
>
> > > I'm surprised that struct task isn't held in the struct ib_umem..
> > >
> >
> > I think that this code can be removed and all accesses to mm_struct can
> > be done with "current->mm".
>
> That sounds wrong for fork support, as the mm used in destroy MUST
> exactly match the mm used in create..
>
> How does this accounting work in fork anyhow?

We are not supporting fork, so this is why I proposed to remove it.

Thanks

>
> Jason

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
  2018-05-03 18:43       ` Leon Romanovsky
@ 2018-05-03 22:01         ` Jason Gunthorpe
  2018-05-04  8:32           ` 858585 jemmy
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2018-05-03 22:01 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Lidong Chen, dledford, akpm, qing.huang, artemyko,
	dan.j.williams, linux-rdma, linux-kernel, adido, galsha, aviadye,
	Lidong Chen

On Thu, May 03, 2018 at 09:43:01PM +0300, Leon Romanovsky wrote:
> On Thu, May 03, 2018 at 12:26:56PM -0600, Jason Gunthorpe wrote:
> > On Thu, May 03, 2018 at 09:12:35PM +0300, Leon Romanovsky wrote:
> > > On Thu, May 03, 2018 at 09:33:10AM -0600, Jason Gunthorpe wrote:
> > > > On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
> > > > > The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
> > > > > If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
> > > > > exited, get_pid_task will return NULL, ib_umem_release does not decrease
> > > > > mm->pinned_vm. This patch fixes it by use tgid.
> > > > >
> > > > > Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> > > > >  drivers/infiniband/core/umem.c | 12 ++++++------
> > > > >  include/rdma/ib_umem.h         |  2 +-
> > > > >  2 files changed, 7 insertions(+), 7 deletions(-)
> > > >
> > > > Why are we even using a struct pid for this? Does anyone know?
> > > >
> > >
> > > Can it be related to "fork" support?
> >
> > Not sure..
> >
> > Ideally we want to hold the struct mm, but we can't hold it long
> > term, so pid is a surrogate for that.
> >
> > > > I'm surprised that struct task isn't held in the struct ib_umem..
> > > >
> > >
> > > I think that this code can be removed and all accesses to mm_struct can
> > > be done with "current->mm".
> >
> > That sounds wrong for fork support, as the mm used in destroy MUST
> > exactly match the mm used in create..
> >
> > How does this accounting work in fork anyhow?
> 
> We are not supporting fork, so this is why I proposed to remove it.

Er, the new kabi certainly can call reg and dereg across a fork

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
  2018-05-03 15:33 ` Jason Gunthorpe
  2018-05-03 18:12   ` Leon Romanovsky
@ 2018-05-04  2:41   ` 858585 jemmy
  2018-05-04  3:14   ` 858585 jemmy
  2 siblings, 0 replies; 16+ messages in thread
From: 858585 jemmy @ 2018-05-04  2:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dledford, akpm, qing.huang, leon, artemyko, dan.j.williams,
	linux-rdma, linux-kernel, adido, Gal Shachaf, Aviad Yehezkel,
	Lidong Chen

On Thu, May 3, 2018 at 11:33 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
>> The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
>> If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
>> exited, get_pid_task will return NULL, ib_umem_release does not decrease
>> mm->pinned_vm. This patch fixes it by use tgid.
>>
>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> ---
>>  drivers/infiniband/core/umem.c | 12 ++++++------
>>  include/rdma/ib_umem.h         |  2 +-
>>  2 files changed, 7 insertions(+), 7 deletions(-)
>
> Why are we even using a struct pid for this? Does anyone know?
>
> I'm surprised that struct task isn't held in the struct ib_umem..
>
> Is group_leader really OK and always guarenteed to return the same
> struct mm?? For some reason I have this recollection that the leader
> can change under some situation..

I read kernel code, it seems that the userspace thread can guarantee
to return the same struct mm.
in the copy_process function, it checks the CLONE_THREAD,
CLONE_SIGHAND, CLONE_VM must use
at the same time.
And I do not find any other part code change the mm and group_leader
for userspace thread.

>
>> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>> index 9a4e899..8813ba5 100644
>> --- a/drivers/infiniband/core/umem.c
>> +++ b/drivers/infiniband/core/umem.c
>> @@ -119,7 +119,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>>       umem->length     = size;
>>       umem->address    = addr;
>>       umem->page_shift = PAGE_SHIFT;
>> -     umem->pid        = get_task_pid(current, PIDTYPE_PID);
>> +     umem->tgid       = get_task_pid(current->group_leader, PIDTYPE_PID);
>>       /*
>>        * We ask for writable memory if any of the following
>>        * access flags are set.  "Local write" and "remote write"
>> @@ -132,7 +132,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>>                IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
>>
>>       if (access & IB_ACCESS_ON_DEMAND) {
>> -             put_pid(umem->pid);
>> +             put_pid(umem->tgid);
>>               ret = ib_umem_odp_get(context, umem, access);
>>               if (ret) {
>>                       kfree(umem);
>> @@ -148,7 +148,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>>
>>       page_list = (struct page **) __get_free_page(GFP_KERNEL);
>>       if (!page_list) {
>> -             put_pid(umem->pid);
>> +             put_pid(umem->tgid);
>>               kfree(umem);
>>               return ERR_PTR(-ENOMEM);
>>       }
>> @@ -231,7 +231,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>>       if (ret < 0) {
>>               if (need_release)
>>                       __ib_umem_release(context->device, umem, 0);
>> -             put_pid(umem->pid);
>> +             put_pid(umem->tgid);
>>               kfree(umem);
>>       } else
>>               current->mm->pinned_vm = locked;
>> @@ -274,8 +274,8 @@ void ib_umem_release(struct ib_umem *umem)
>>
>>       __ib_umem_release(umem->context->device, umem, 1);
>>
>> -     task = get_pid_task(umem->pid, PIDTYPE_PID);
>> -     put_pid(umem->pid);
>> +     task = get_pid_task(umem->tgid, PIDTYPE_PID);
>> +     put_pid(umem->tgid);
>>       if (!task)
>>               goto out;
>>       mm = get_task_mm(task);
>> diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
>> index 23159dd..2398849 100644
>> --- a/include/rdma/ib_umem.h
>> +++ b/include/rdma/ib_umem.h
>> @@ -48,7 +48,7 @@ struct ib_umem {
>>       int                     writable;
>>       int                     hugetlb;
>>       struct work_struct      work;
>> -     struct pid             *pid;
>> +     struct pid             *tgid;
>>       struct mm_struct       *mm;
>>       unsigned long           diff;
>>       struct ib_umem_odp     *odp_data;
>> --
>> 1.8.3.1
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
  2018-05-03 15:33 ` Jason Gunthorpe
  2018-05-03 18:12   ` Leon Romanovsky
  2018-05-04  2:41   ` 858585 jemmy
@ 2018-05-04  3:14   ` 858585 jemmy
  2018-05-04  8:51     ` 858585 jemmy
  2 siblings, 1 reply; 16+ messages in thread
From: 858585 jemmy @ 2018-05-04  3:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dledford, akpm, qing.huang, leon, artemyko, dan.j.williams,
	linux-rdma, linux-kernel, adido, Gal Shachaf, Aviad Yehezkel,
	Lidong Chen

On Thu, May 3, 2018 at 11:33 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
>> The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
>> If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
>> exited, get_pid_task will return NULL, ib_umem_release does not decrease
>> mm->pinned_vm. This patch fixes it by use tgid.
>>
>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> ---
>>  drivers/infiniband/core/umem.c | 12 ++++++------
>>  include/rdma/ib_umem.h         |  2 +-
>>  2 files changed, 7 insertions(+), 7 deletions(-)
>
> Why are we even using a struct pid for this? Does anyone know?

commit 87773dd56d5405ac28119fcfadacefd35877c18f add pid in ib_umem structure.

and the comment has such information:
Later a different process with a different mm_struct than the one that
allocated the ib_umem struct
ends up releasing it which results in decrementing the new processes
mm->pinned_vm count past
zero and wrapping.

>
> I'm surprised that struct task isn't held in the struct ib_umem..
>
> Is group_leader really OK and always guarenteed to return the same
> struct mm?? For some reason I have this recollection that the leader
> can change under some situation..
>
>> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>> index 9a4e899..8813ba5 100644
>> --- a/drivers/infiniband/core/umem.c
>> +++ b/drivers/infiniband/core/umem.c
>> @@ -119,7 +119,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>>       umem->length     = size;
>>       umem->address    = addr;
>>       umem->page_shift = PAGE_SHIFT;
>> -     umem->pid        = get_task_pid(current, PIDTYPE_PID);
>> +     umem->tgid       = get_task_pid(current->group_leader, PIDTYPE_PID);
>>       /*
>>        * We ask for writable memory if any of the following
>>        * access flags are set.  "Local write" and "remote write"
>> @@ -132,7 +132,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>>                IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
>>
>>       if (access & IB_ACCESS_ON_DEMAND) {
>> -             put_pid(umem->pid);
>> +             put_pid(umem->tgid);
>>               ret = ib_umem_odp_get(context, umem, access);
>>               if (ret) {
>>                       kfree(umem);
>> @@ -148,7 +148,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>>
>>       page_list = (struct page **) __get_free_page(GFP_KERNEL);
>>       if (!page_list) {
>> -             put_pid(umem->pid);
>> +             put_pid(umem->tgid);
>>               kfree(umem);
>>               return ERR_PTR(-ENOMEM);
>>       }
>> @@ -231,7 +231,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>>       if (ret < 0) {
>>               if (need_release)
>>                       __ib_umem_release(context->device, umem, 0);
>> -             put_pid(umem->pid);
>> +             put_pid(umem->tgid);
>>               kfree(umem);
>>       } else
>>               current->mm->pinned_vm = locked;
>> @@ -274,8 +274,8 @@ void ib_umem_release(struct ib_umem *umem)
>>
>>       __ib_umem_release(umem->context->device, umem, 1);
>>
>> -     task = get_pid_task(umem->pid, PIDTYPE_PID);
>> -     put_pid(umem->pid);
>> +     task = get_pid_task(umem->tgid, PIDTYPE_PID);
>> +     put_pid(umem->tgid);
>>       if (!task)
>>               goto out;
>>       mm = get_task_mm(task);
>> diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
>> index 23159dd..2398849 100644
>> --- a/include/rdma/ib_umem.h
>> +++ b/include/rdma/ib_umem.h
>> @@ -48,7 +48,7 @@ struct ib_umem {
>>       int                     writable;
>>       int                     hugetlb;
>>       struct work_struct      work;
>> -     struct pid             *pid;
>> +     struct pid             *tgid;
>>       struct mm_struct       *mm;
>>       unsigned long           diff;
>>       struct ib_umem_odp     *odp_data;
>> --
>> 1.8.3.1
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
  2018-05-03 22:01         ` Jason Gunthorpe
@ 2018-05-04  8:32           ` 858585 jemmy
  2018-05-04 13:39             ` Leon Romanovsky
  0 siblings, 1 reply; 16+ messages in thread
From: 858585 jemmy @ 2018-05-04  8:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, dledford, akpm, qing.huang, artemyko,
	dan.j.williams, linux-rdma, linux-kernel, adido, Gal Shachaf,
	Aviad Yehezkel, Lidong Chen

On Fri, May 4, 2018 at 6:01 AM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Thu, May 03, 2018 at 09:43:01PM +0300, Leon Romanovsky wrote:
>> On Thu, May 03, 2018 at 12:26:56PM -0600, Jason Gunthorpe wrote:
>> > On Thu, May 03, 2018 at 09:12:35PM +0300, Leon Romanovsky wrote:
>> > > On Thu, May 03, 2018 at 09:33:10AM -0600, Jason Gunthorpe wrote:
>> > > > On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
>> > > > > The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
>> > > > > If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
>> > > > > exited, get_pid_task will return NULL, ib_umem_release does not decrease
>> > > > > mm->pinned_vm. This patch fixes it by use tgid.
>> > > > >
>> > > > > Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> > > > >  drivers/infiniband/core/umem.c | 12 ++++++------
>> > > > >  include/rdma/ib_umem.h         |  2 +-
>> > > > >  2 files changed, 7 insertions(+), 7 deletions(-)
>> > > >
>> > > > Why are we even using a struct pid for this? Does anyone know?
>> > > >
>> > >
>> > > Can it be related to "fork" support?
>> >
>> > Not sure..
>> >
>> > Ideally we want to hold the struct mm, but we can't hold it long
>> > term, so pid is a surrogate for that.
>> >
>> > > > I'm surprised that struct task isn't held in the struct ib_umem..
>> > > >
>> > >
>> > > I think that this code can be removed and all accesses to mm_struct can
>> > > be done with "current->mm".
>> >
>> > That sounds wrong for fork support, as the mm used in destroy MUST
>> > exactly match the mm used in create..
>> >
>> > How does this accounting work in fork anyhow?
>>
>> We are not supporting fork, so this is why I proposed to remove it.
>
> Er, the new kabi certainly can call reg and dereg across a fork

what is the expect behavior after fork?
I write a test code, the dereg just return EACCES in the child
process. and have no effect.

>
> Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
  2018-05-04  3:14   ` 858585 jemmy
@ 2018-05-04  8:51     ` 858585 jemmy
  2018-05-04 18:23       ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: 858585 jemmy @ 2018-05-04  8:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dledford, akpm, qing.huang, Leon Romanovsky, artemyko,
	dan.j.williams, linux-rdma, linux-kernel, adido, Gal Shachaf,
	Aviad Yehezkel, Lidong Chen

On Fri, May 4, 2018 at 11:14 AM, 858585 jemmy <jemmy858585@gmail.com> wrote:
> On Thu, May 3, 2018 at 11:33 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>> On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
>>> The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
>>> If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
>>> exited, get_pid_task will return NULL, ib_umem_release does not decrease
>>> mm->pinned_vm. This patch fixes it by use tgid.
>>>
>>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>>> ---
>>>  drivers/infiniband/core/umem.c | 12 ++++++------
>>>  include/rdma/ib_umem.h         |  2 +-
>>>  2 files changed, 7 insertions(+), 7 deletions(-)
>>
>> Why are we even using a struct pid for this? Does anyone know?
>
> commit 87773dd56d5405ac28119fcfadacefd35877c18f add pid in ib_umem structure.
>
> and the comment has such information:
> Later a different process with a different mm_struct than the one that
> allocated the ib_umem struct
> ends up releasing it which results in decrementing the new processes
> mm->pinned_vm count past
> zero and wrapping.
>

I think a different process should not have the permission to release ib_umem.
so maybe the reason is not a different process?
can ib_umem_release be invoked in interrupt context?

>>
>> I'm surprised that struct task isn't held in the struct ib_umem..
>>
>> Is group_leader really OK and always guarenteed to return the same
>> struct mm?? For some reason I have this recollection that the leader
>> can change under some situation..
>>
>>> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>>> index 9a4e899..8813ba5 100644
>>> --- a/drivers/infiniband/core/umem.c
>>> +++ b/drivers/infiniband/core/umem.c
>>> @@ -119,7 +119,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>>>       umem->length     = size;
>>>       umem->address    = addr;
>>>       umem->page_shift = PAGE_SHIFT;
>>> -     umem->pid        = get_task_pid(current, PIDTYPE_PID);
>>> +     umem->tgid       = get_task_pid(current->group_leader, PIDTYPE_PID);
>>>       /*
>>>        * We ask for writable memory if any of the following
>>>        * access flags are set.  "Local write" and "remote write"
>>> @@ -132,7 +132,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>>>                IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
>>>
>>>       if (access & IB_ACCESS_ON_DEMAND) {
>>> -             put_pid(umem->pid);
>>> +             put_pid(umem->tgid);
>>>               ret = ib_umem_odp_get(context, umem, access);
>>>               if (ret) {
>>>                       kfree(umem);
>>> @@ -148,7 +148,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>>>
>>>       page_list = (struct page **) __get_free_page(GFP_KERNEL);
>>>       if (!page_list) {
>>> -             put_pid(umem->pid);
>>> +             put_pid(umem->tgid);
>>>               kfree(umem);
>>>               return ERR_PTR(-ENOMEM);
>>>       }
>>> @@ -231,7 +231,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>>>       if (ret < 0) {
>>>               if (need_release)
>>>                       __ib_umem_release(context->device, umem, 0);
>>> -             put_pid(umem->pid);
>>> +             put_pid(umem->tgid);
>>>               kfree(umem);
>>>       } else
>>>               current->mm->pinned_vm = locked;
>>> @@ -274,8 +274,8 @@ void ib_umem_release(struct ib_umem *umem)
>>>
>>>       __ib_umem_release(umem->context->device, umem, 1);
>>>
>>> -     task = get_pid_task(umem->pid, PIDTYPE_PID);
>>> -     put_pid(umem->pid);
>>> +     task = get_pid_task(umem->tgid, PIDTYPE_PID);
>>> +     put_pid(umem->tgid);
>>>       if (!task)
>>>               goto out;
>>>       mm = get_task_mm(task);
>>> diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
>>> index 23159dd..2398849 100644
>>> --- a/include/rdma/ib_umem.h
>>> +++ b/include/rdma/ib_umem.h
>>> @@ -48,7 +48,7 @@ struct ib_umem {
>>>       int                     writable;
>>>       int                     hugetlb;
>>>       struct work_struct      work;
>>> -     struct pid             *pid;
>>> +     struct pid             *tgid;
>>>       struct mm_struct       *mm;
>>>       unsigned long           diff;
>>>       struct ib_umem_odp     *odp_data;
>>> --
>>> 1.8.3.1
>>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
  2018-05-04  8:32           ` 858585 jemmy
@ 2018-05-04 13:39             ` Leon Romanovsky
  2018-05-04 15:14               ` lidongchen(陈立东)
  0 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2018-05-04 13:39 UTC (permalink / raw)
  To: 858585 jemmy
  Cc: Jason Gunthorpe, dledford, akpm, qing.huang, artemyko,
	dan.j.williams, linux-rdma, linux-kernel, adido, Gal Shachaf,
	Aviad Yehezkel, Lidong Chen

[-- Attachment #1: Type: text/plain, Size: 2314 bytes --]

On Fri, May 04, 2018 at 04:32:38PM +0800, 858585 jemmy wrote:
> On Fri, May 4, 2018 at 6:01 AM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Thu, May 03, 2018 at 09:43:01PM +0300, Leon Romanovsky wrote:
> >> On Thu, May 03, 2018 at 12:26:56PM -0600, Jason Gunthorpe wrote:
> >> > On Thu, May 03, 2018 at 09:12:35PM +0300, Leon Romanovsky wrote:
> >> > > On Thu, May 03, 2018 at 09:33:10AM -0600, Jason Gunthorpe wrote:
> >> > > > On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
> >> > > > > The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
> >> > > > > If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
> >> > > > > exited, get_pid_task will return NULL, ib_umem_release does not decrease
> >> > > > > mm->pinned_vm. This patch fixes it by use tgid.
> >> > > > >
> >> > > > > Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> >> > > > >  drivers/infiniband/core/umem.c | 12 ++++++------
> >> > > > >  include/rdma/ib_umem.h         |  2 +-
> >> > > > >  2 files changed, 7 insertions(+), 7 deletions(-)
> >> > > >
> >> > > > Why are we even using a struct pid for this? Does anyone know?
> >> > > >
> >> > >
> >> > > Can it be related to "fork" support?
> >> >
> >> > Not sure..
> >> >
> >> > Ideally we want to hold the struct mm, but we can't hold it long
> >> > term, so pid is a surrogate for that.
> >> >
> >> > > > I'm surprised that struct task isn't held in the struct ib_umem..
> >> > > >
> >> > >
> >> > > I think that this code can be removed and all accesses to mm_struct can
> >> > > be done with "current->mm".
> >> >
> >> > That sounds wrong for fork support, as the mm used in destroy MUST
> >> > exactly match the mm used in create..
> >> >
> >> > How does this accounting work in fork anyhow?
> >>
> >> We are not supporting fork, so this is why I proposed to remove it.
> >
> > Er, the new kabi certainly can call reg and dereg across a fork
>
> what is the expect behavior after fork?
> I write a test code, the dereg just return EACCES in the child
> process. and have no effect.

Did you do reg/dereg over write() interface? If yes, this is expected
behaviour of "not-supported fork()". A couple of months/years ago, your
test program would work, but we closed this option due to security
constraints.

Thanks

>
> >
> > Jason

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
  2018-05-04 13:39             ` Leon Romanovsky
@ 2018-05-04 15:14               ` lidongchen(陈立东)
  0 siblings, 0 replies; 16+ messages in thread
From: lidongchen(陈立东) @ 2018-05-04 15:14 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: 858585 jemmy, Jason Gunthorpe, dledford, akpm, qing.huang,
	artemyko, dan.j.williams, linux-rdma, linux-kernel, adido,
	Gal Shachaf, Aviad Yehezkel



> 在 2018年5月4日,21:39,Leon Romanovsky <leon@kernel.org> 写道:
> 
>> On Fri, May 04, 2018 at 04:32:38PM +0800, 858585 jemmy wrote:
>>> On Fri, May 4, 2018 at 6:01 AM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>>>> On Thu, May 03, 2018 at 09:43:01PM +0300, Leon Romanovsky wrote:
>>>>> On Thu, May 03, 2018 at 12:26:56PM -0600, Jason Gunthorpe wrote:
>>>>>> On Thu, May 03, 2018 at 09:12:35PM +0300, Leon Romanovsky wrote:
>>>>>>> On Thu, May 03, 2018 at 09:33:10AM -0600, Jason Gunthorpe wrote:
>>>>>>>> On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
>>>>>>>> The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
>>>>>>>> If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
>>>>>>>> exited, get_pid_task will return NULL, ib_umem_release does not decrease
>>>>>>>> mm->pinned_vm. This patch fixes it by use tgid.
>>>>>>>> 
>>>>>>>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>>>>>>>> drivers/infiniband/core/umem.c | 12 ++++++------
>>>>>>>> include/rdma/ib_umem.h         |  2 +-
>>>>>>>> 2 files changed, 7 insertions(+), 7 deletions(-)
>>>>>>> 
>>>>>>> Why are we even using a struct pid for this? Does anyone know?
>>>>>>> 
>>>>>> 
>>>>>> Can it be related to "fork" support?
>>>>> 
>>>>> Not sure..
>>>>> 
>>>>> Ideally we want to hold the struct mm, but we can't hold it long
>>>>> term, so pid is a surrogate for that.
>>>>> 
>>>>>>> I'm surprised that struct task isn't held in the struct ib_umem..
>>>>>>> 
>>>>>> 
>>>>>> I think that this code can be removed and all accesses to mm_struct can
>>>>>> be done with "current->mm".
>>>>> 
>>>>> That sounds wrong for fork support, as the mm used in destroy MUST
>>>>> exactly match the mm used in create..
>>>>> 
>>>>> How does this accounting work in fork anyhow?
>>>> 
>>>> We are not supporting fork, so this is why I proposed to remove it.
>>> 
>>> Er, the new kabi certainly can call reg and dereg across a fork
>> 
>> what is the expect behavior after fork?
>> I write a test code, the dereg just return EACCES in the child
>> process. and have no effect.
> 
> Did you do reg/dereg over write() interface? If yes, this is expected
> behaviour of "not-supported fork()". A couple of months/years ago, your
> test program would work, but we closed this option due to security
> constraints.

the parent process call ibv_reg_mr, and the child process call ibv_dereg_mr.

If fork is not supported now, so use tgid to get mm structure is fine for multithread.

> 
> Thanks
> 
>> 
>>> 
>>> Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
  2018-05-04  8:51     ` 858585 jemmy
@ 2018-05-04 18:23       ` Jason Gunthorpe
  2018-05-07  1:38         ` 858585 jemmy
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2018-05-04 18:23 UTC (permalink / raw)
  To: 858585 jemmy
  Cc: dledford, akpm, qing.huang, Leon Romanovsky, artemyko,
	dan.j.williams, linux-rdma, linux-kernel, adido, Gal Shachaf,
	Aviad Yehezkel, Lidong Chen

On Fri, May 04, 2018 at 04:51:15PM +0800, 858585 jemmy wrote:
> On Fri, May 4, 2018 at 11:14 AM, 858585 jemmy <jemmy858585@gmail.com> wrote:
> > On Thu, May 3, 2018 at 11:33 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >> On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
> >>> The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
> >>> If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
> >>> exited, get_pid_task will return NULL, ib_umem_release does not decrease
> >>> mm->pinned_vm. This patch fixes it by use tgid.
> >>>
> >>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> >>>  drivers/infiniband/core/umem.c | 12 ++++++------
> >>>  include/rdma/ib_umem.h         |  2 +-
> >>>  2 files changed, 7 insertions(+), 7 deletions(-)
> >>
> >> Why are we even using a struct pid for this? Does anyone know?
> >
> > commit 87773dd56d5405ac28119fcfadacefd35877c18f add pid in ib_umem structure.
> >
> > and the comment has such information:
> > Later a different process with a different mm_struct than the one that
> > allocated the ib_umem struct
> > ends up releasing it which results in decrementing the new processes
> > mm->pinned_vm count past
> > zero and wrapping.
> 
> I think a different process should not have the permission to release ib_umem.
> so maybe the reason is not a different process?
> can ib_umem_release be invoked in interrupt context?

We plan to restore fork support and add some way to share MRs between
processes, so we must consider having a different process release the
umem than acquired it.

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
  2018-05-04 18:23       ` Jason Gunthorpe
@ 2018-05-07  1:38         ` 858585 jemmy
  2018-05-08  6:30           ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: 858585 jemmy @ 2018-05-07  1:38 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dledford, akpm, qing.huang, Leon Romanovsky, artemyko,
	dan.j.williams, linux-rdma, linux-kernel, adido, Gal Shachaf,
	Aviad Yehezkel, Lidong Chen

On Sat, May 5, 2018 at 2:23 AM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Fri, May 04, 2018 at 04:51:15PM +0800, 858585 jemmy wrote:
>> On Fri, May 4, 2018 at 11:14 AM, 858585 jemmy <jemmy858585@gmail.com> wrote:
>> > On Thu, May 3, 2018 at 11:33 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>> >> On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
>> >>> The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
>> >>> If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
>> >>> exited, get_pid_task will return NULL, ib_umem_release does not decrease
>> >>> mm->pinned_vm. This patch fixes it by use tgid.
>> >>>
>> >>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> >>>  drivers/infiniband/core/umem.c | 12 ++++++------
>> >>>  include/rdma/ib_umem.h         |  2 +-
>> >>>  2 files changed, 7 insertions(+), 7 deletions(-)
>> >>
>> >> Why are we even using a struct pid for this? Does anyone know?
>> >
>> > commit 87773dd56d5405ac28119fcfadacefd35877c18f add pid in ib_umem structure.
>> >
>> > and the comment has such information:
>> > Later a different process with a different mm_struct than the one that
>> > allocated the ib_umem struct
>> > ends up releasing it which results in decrementing the new processes
>> > mm->pinned_vm count past
>> > zero and wrapping.
>>
>> I think a different process should not have the permission to release ib_umem.
>> so maybe the reason is not a different process?
>> can ib_umem_release be invoked in interrupt context?
>
> We plan to restore fork support and add some way to share MRs between
> processes, so we must consider having a different process release the
> umem than acquired it.

If restore fork support, what is the expected behavior?
If parent process pinned_vm is x, what is the child process pinned_vm
value after fork? It reset to zero now.
If the parent process call ibv_dereg_mr after fork, should the child
process decrease pinned_vm?
If the child process call  ibv_dereg_mr after fork, should the parent
process decrease pinned_vm?


>
> Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
  2018-05-07  1:38         ` 858585 jemmy
@ 2018-05-08  6:30           ` Jason Gunthorpe
  2018-05-08  8:32             ` 858585 jemmy
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2018-05-08  6:30 UTC (permalink / raw)
  To: 858585 jemmy
  Cc: dledford, akpm, qing.huang, Leon Romanovsky, artemyko,
	dan.j.williams, linux-rdma, linux-kernel, adido, Gal Shachaf,
	Aviad Yehezkel, Lidong Chen

On Mon, May 07, 2018 at 09:38:53AM +0800, 858585 jemmy wrote:
> On Sat, May 5, 2018 at 2:23 AM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Fri, May 04, 2018 at 04:51:15PM +0800, 858585 jemmy wrote:
> >> On Fri, May 4, 2018 at 11:14 AM, 858585 jemmy <jemmy858585@gmail.com> wrote:
> >> > On Thu, May 3, 2018 at 11:33 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >> >> On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
> >> >>> The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
> >> >>> If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
> >> >>> exited, get_pid_task will return NULL, ib_umem_release does not decrease
> >> >>> mm->pinned_vm. This patch fixes it by use tgid.
> >> >>>
> >> >>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> >> >>>  drivers/infiniband/core/umem.c | 12 ++++++------
> >> >>>  include/rdma/ib_umem.h         |  2 +-
> >> >>>  2 files changed, 7 insertions(+), 7 deletions(-)
> >> >>
> >> >> Why are we even using a struct pid for this? Does anyone know?
> >> >
> >> > commit 87773dd56d5405ac28119fcfadacefd35877c18f add pid in ib_umem structure.
> >> >
> >> > and the comment has such information:
> >> > Later a different process with a different mm_struct than the one that
> >> > allocated the ib_umem struct
> >> > ends up releasing it which results in decrementing the new processes
> >> > mm->pinned_vm count past
> >> > zero and wrapping.
> >>
> >> I think a different process should not have the permission to release ib_umem.
> >> so maybe the reason is not a different process?
> >> can ib_umem_release be invoked in interrupt context?
> >
> > We plan to restore fork support and add some way to share MRs between
> > processes, so we must consider having a different process release the
> > umem than acquired it.
> 
> If restore fork support, what is the expected behavior?
> If parent process pinned_vm is x, what is the child process pinned_vm
> value after fork? It reset to zero now.
> If the parent process call ibv_dereg_mr after fork, should the child
> process decrease pinned_vm?
> If the child process call  ibv_dereg_mr after fork, should the parent
> process decrease pinned_vm?

If I recall the purpose of accessing the MM during de-register is to
undo the pinned pages change (pinned_vm) that register performed.

So, the semantic is simple, during deregister we must access excatly
the same MM that was used during register and undo the change to
pinned_vm.

The approach should be to find the most reliably way to hold a
reference to the MM that was used during register.

Apparently we can't just hold a ref on the mm (according to mm_get's
comment at least)

tgid is clearly a better indirect reference to the mm than pid (pid is
so obviously wrong)

But I am wondering why not just hold struct task here instead of tgid?
Isn't task->mm going to be more reliably than tgid->task->mm ??

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
  2018-05-08  6:30           ` Jason Gunthorpe
@ 2018-05-08  8:32             ` 858585 jemmy
  0 siblings, 0 replies; 16+ messages in thread
From: 858585 jemmy @ 2018-05-08  8:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dledford, akpm, qing.huang, Leon Romanovsky, artemyko,
	dan.j.williams, linux-rdma, linux-kernel, adido, Gal Shachaf,
	Aviad Yehezkel, Lidong Chen

On Tue, May 8, 2018 at 2:30 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Mon, May 07, 2018 at 09:38:53AM +0800, 858585 jemmy wrote:
>> On Sat, May 5, 2018 at 2:23 AM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>> > On Fri, May 04, 2018 at 04:51:15PM +0800, 858585 jemmy wrote:
>> >> On Fri, May 4, 2018 at 11:14 AM, 858585 jemmy <jemmy858585@gmail.com> wrote:
>> >> > On Thu, May 3, 2018 at 11:33 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>> >> >> On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
>> >> >>> The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
>> >> >>> If when ibv_dereg_mr invoke and the thread which invoked ibv_reg_mr has
>> >> >>> exited, get_pid_task will return NULL, ib_umem_release does not decrease
>> >> >>> mm->pinned_vm. This patch fixes it by use tgid.
>> >> >>>
>> >> >>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> >> >>>  drivers/infiniband/core/umem.c | 12 ++++++------
>> >> >>>  include/rdma/ib_umem.h         |  2 +-
>> >> >>>  2 files changed, 7 insertions(+), 7 deletions(-)
>> >> >>
>> >> >> Why are we even using a struct pid for this? Does anyone know?
>> >> >
>> >> > commit 87773dd56d5405ac28119fcfadacefd35877c18f add pid in ib_umem structure.
>> >> >
>> >> > and the comment has such information:
>> >> > Later a different process with a different mm_struct than the one that
>> >> > allocated the ib_umem struct
>> >> > ends up releasing it which results in decrementing the new processes
>> >> > mm->pinned_vm count past
>> >> > zero and wrapping.
>> >>
>> >> I think a different process should not have the permission to release ib_umem.
>> >> so maybe the reason is not a different process?
>> >> can ib_umem_release be invoked in interrupt context?
>> >
>> > We plan to restore fork support and add some way to share MRs between
>> > processes, so we must consider having a different process release the
>> > umem than acquired it.
>>
>> If restore fork support, what is the expected behavior?
>> If parent process pinned_vm is x, what is the child process pinned_vm
>> value after fork? It reset to zero now.
>> If the parent process call ibv_dereg_mr after fork, should the child
>> process decrease pinned_vm?
>> If the child process call  ibv_dereg_mr after fork, should the parent
>> process decrease pinned_vm?
>
> If I recall the purpose of accessing the MM during de-register is to
> undo the pinned pages change (pinned_vm) that register performed.
>
> So, the semantic is simple, during deregister we must access excatly
> the same MM that was used during register and undo the change to
> pinned_vm.
>
> The approach should be to find the most reliably way to hold a
> reference to the MM that was used during register.
>
> Apparently we can't just hold a ref on the mm (according to mm_get's
> comment at least)
>
> tgid is clearly a better indirect reference to the mm than pid (pid is
> so obviously wrong)
>
> But I am wondering why not just hold struct task here instead of tgid?
> Isn't task->mm going to be more reliably than tgid->task->mm ??

I think get_task_struct(current->group_leader) is also work.
But I find ib_ucontext structure already have a tgid field, so I think this not
necessary to ib_umem have tgid again. we can use ib_ucontext->tgid.

I will send a v2 patch.


>
> Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2018-05-08  8:32 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-03 14:04 [PATCH] IB/umem: use tgid instead of pid in ib_umem structure Lidong Chen
2018-05-03 15:33 ` Jason Gunthorpe
2018-05-03 18:12   ` Leon Romanovsky
2018-05-03 18:26     ` Jason Gunthorpe
2018-05-03 18:43       ` Leon Romanovsky
2018-05-03 22:01         ` Jason Gunthorpe
2018-05-04  8:32           ` 858585 jemmy
2018-05-04 13:39             ` Leon Romanovsky
2018-05-04 15:14               ` lidongchen(陈立东)
2018-05-04  2:41   ` 858585 jemmy
2018-05-04  3:14   ` 858585 jemmy
2018-05-04  8:51     ` 858585 jemmy
2018-05-04 18:23       ` Jason Gunthorpe
2018-05-07  1:38         ` 858585 jemmy
2018-05-08  6:30           ` Jason Gunthorpe
2018-05-08  8:32             ` 858585 jemmy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).