From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S932929AbeEHIc5 (ORCPT ); Tue, 8 May 2018 04:32:57 -0400
Received: from mail-io0-f196.google.com ([209.85.223.196]:38867 "EHLO
        mail-io0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S932784AbeEHIcw (ORCPT );
        Tue, 8 May 2018 04:32:52 -0400
X-Google-Smtp-Source: AB8JxZpOpfYFVe+7hnnYPCn8aZ7jJMwS6II/JRDoVXDiSnwssVPPXzMy8ygX22UZYa8vuhnpTxsSRFsiY0lhUJswze0=
MIME-Version: 1.0
In-Reply-To: <20180508063006.aicgwalirnkjmeuf@ziepe.ca>
References: <1525356274-736-1-git-send-email-lidongchen@tencent.com>
 <20180503153310.GA9738@ziepe.ca>
 <20180504182302.zunfryk2czge5adx@ziepe.ca>
 <20180508063006.aicgwalirnkjmeuf@ziepe.ca>
From: 858585 jemmy
Date: Tue, 8 May 2018 16:32:51 +0800
Message-ID:
Subject: Re: [PATCH] IB/umem: use tgid instead of pid in ib_umem structure
To: Jason Gunthorpe
Cc: dledford@redhat.com, akpm@linux-foundation.org, qing.huang@oracle.com,
        Leon Romanovsky, artemyko@mellanox.com, dan.j.williams@intel.com,
        linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
        adido@mellanox.com, Gal Shachaf, Aviad Yehezkel, Lidong Chen
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 8, 2018 at 2:30 PM, Jason Gunthorpe wrote:
> On Mon, May 07, 2018 at 09:38:53AM +0800, 858585 jemmy wrote:
>> On Sat, May 5, 2018 at 2:23 AM, Jason Gunthorpe wrote:
>> > On Fri, May 04, 2018 at 04:51:15PM +0800, 858585 jemmy wrote:
>> >> On Fri, May 4, 2018 at 11:14 AM, 858585 jemmy wrote:
>> >> > On Thu, May 3, 2018 at 11:33 PM, Jason Gunthorpe wrote:
>> >> >> On Thu, May 03, 2018 at 10:04:34PM +0800, Lidong Chen wrote:
>> >> >>> The userspace may invoke ibv_reg_mr and ibv_dereg_mr by different threads.
>> >> >>> If ibv_dereg_mr is invoked after the thread which invoked ibv_reg_mr has
>> >> >>> exited, get_pid_task will return NULL and ib_umem_release does not
>> >> >>> decrease mm->pinned_vm. This patch fixes it by using tgid.
>> >> >>>
>> >> >>> Signed-off-by: Lidong Chen
>> >> >>> drivers/infiniband/core/umem.c | 12 ++++++------
>> >> >>> include/rdma/ib_umem.h | 2 +-
>> >> >>> 2 files changed, 7 insertions(+), 7 deletions(-)
>> >> >>
>> >> >> Why are we even using a struct pid for this? Does anyone know?
>> >> >
>> >> > commit 87773dd56d5405ac28119fcfadacefd35877c18f added pid to the
>> >> > ib_umem structure.
>> >> >
>> >> > and the comment has this information:
>> >> > Later a different process with a different mm_struct than the one
>> >> > that allocated the ib_umem struct ends up releasing it which results
>> >> > in decrementing the new process's mm->pinned_vm count past zero and
>> >> > wrapping.
>> >>
>> >> I think a different process should not have permission to release ib_umem,
>> >> so maybe the reason is not a different process?
>> >> Can ib_umem_release be invoked in interrupt context?
>> >
>> > We plan to restore fork support and add some way to share MRs between
>> > processes, so we must consider having a different process release the
>> > umem than the one that acquired it.
>>
>> If fork support is restored, what is the expected behavior?
>> If the parent process's pinned_vm is x, what is the child process's
>> pinned_vm value after fork? It is reset to zero now.
>> If the parent process calls ibv_dereg_mr after fork, should the child
>> process decrease pinned_vm?
>> If the child process calls ibv_dereg_mr after fork, should the parent
>> process decrease pinned_vm?
>
> If I recall, the purpose of accessing the MM during de-register is to
> undo the pinned pages change (pinned_vm) that register performed.
>
> So, the semantic is simple: during deregister we must access exactly
> the same MM that was used during register and undo the change to
> pinned_vm.
>
> The approach should be to find the most reliable way to hold a
> reference to the MM that was used during register.
>
> Apparently we can't just hold a ref on the mm (according to mm_get's
> comment at least)
>
> tgid is clearly a better indirect reference to the mm than pid (pid is
> so obviously wrong)
>
> But I am wondering why not just hold struct task here instead of tgid?
> Isn't task->mm going to be more reliable than tgid->task->mm ??

I think get_task_struct(current->group_leader) would also work.
But the ib_ucontext structure already has a tgid field, so I don't think
ib_umem needs a tgid of its own; we can use ib_ucontext->tgid.
I will send a v2 patch.

>
> Jason