linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yuval Shaia <yuval.shaia@oracle.com>
To: Doug Ledford <dledford@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>,
	jgg@ziepe.ca, leon@kernel.org, monis@mellanox.com,
	parav@mellanox.com, danielj@mellanox.com, kamalheib1@gmail.com,
	markz@mellanox.com, johannes.berg@intel.com, willy@infradead.org,
	michaelgur@mellanox.com, markb@mellanox.com,
	dan.carpenter@oracle.com, bvanassche@acm.org, maxg@mellanox.com,
	israelr@mellanox.com, galpress@amazon.com, denisd@mellanox.com,
	yuvalav@mellanox.com, dennis.dalessandro@intel.com,
	will@kernel.org, ereza@mellanox.com, jgg@mellanox.com,
	linux-rdma@vger.kernel.org,
	Shamir Rabinovitch <srabinov7@gmail.com>
Subject: Re: [PATCH v1 00/24] Shared PD and MR
Date: Mon, 26 Aug 2019 12:35:04 +0300	[thread overview]
Message-ID: <20190826093504.GB3698@lap1> (raw)
In-Reply-To: <fbc950ac651d49e7f88dc483570b1ea3e56b980f.camel@redhat.com>

On Thu, Aug 22, 2019 at 10:15:11AM -0400, Doug Ledford wrote:
> On Thu, 2019-08-22 at 11:41 +0300, Yuval Shaia wrote:
> > On Wed, Aug 21, 2019 at 04:37:37PM -0700, Ira Weiny wrote:
> > > On Wed, Aug 21, 2019 at 05:21:01PM +0300, Yuval Shaia wrote:
> > > > Following patch-set introduce the shared object feature.
> > > > 
> > > > A shared object feature allows one process to create HW objects
> > > > (currently
> > > > PD and MR) so that a second process can import.
> > 
> > Hi Ira,
> > 
> > > For something this fundamental I think the cover letter should be
> > > more
> > > detailed than this.  Questions I have without digging into the code:
> > > 
> > > What is the use case?
> > 
> > I have only one use case but i didn't added it to commit log just not
> > to
> > limit the usage of this feature but you are right, cover letter is
> > great
> > for such things, will add it for v2.
> > 
> > Anyway, here is our use case: Consider a case of server with huge
> > amount
> > of memory and some hundreds or even thousands processes are using it
> > to
> > serves clients requests. In this case the HCA will have to manage
> > hundreds
> > or thousands MRs. A better design maybe would be that one process will
> > create one (or several) MR(s) which will be shared with the other
> > processes. This will reduce the number of address translation entries
> > and
> > cache miss dramatically.
> 
> Unless I'm misreading you here, it will be at the expense of pretty much
> all inter-process memory security.  You're talking about one process

Isn't it already there with the use of Linux shared memory?

> creating some large MRs just to cover the overall memory in the machine,
> then sharing that among processes, and all using it to reduce the MR
> workload of the card.  This sounds like going back to the days of MSDos.

Well, too many MRs can lead to serious bottleneck, we are currently dealing
with such issue when many VMs are trying to re-register their MRs at once,
but since it is out of the scope of $subject i will not expand, just
mentioning it because *it is* an issue and educing the number of MRs could
help.

> It also sounds like a programming error in one process could expose
> potentially all processes data buffers across all processes sharing this
> PD and MR.

Again, this is already possible with shared memory and some designs trusts
on that.

> 
> I get the idea, and the problem you are trying to solve, but I'm not
> sure that going down this path is wise.
> 
> Maybe....maybe if you limit a queue pair to send/recv only and no
> rdma_{read,write}, then this wouldn't be quite as bad.  But even then
> I'm still very leary of this "feature".

How about if all the processes are considered as one unit of trust? anyway
this could be done in a multi threaded application or when one process
forks child processes.

> 
> > 
> > > What is the "key" that allows a MR to be shared among 2
> > > processes?  Do you
> > > introduce some PD identifier?  And then some {PDID, lkey} tuple is
> > > used to ID
> > > the MR?
> > > 
> > > I assume you have to share the PD first and then any MR in the
> > > shared PD can be
> > > shared?  If so how does the MR get shared?
> > 
> > Sorry, i'm not following.
> > I think the term 'share' is somehow mistake, it is actually a process
> > 'imports' objects into it's context. And yes, the workflow is first to
> > import the PD and then import the MR.
> > 
> > > Again I'm concerned with how this will interact with the RDMA and
> > > file system
> > > interaction we have been trying to fix.
> > 
> > I'm not aware of this file-system thing, can you point me to some
> > discussion on that so i'll see how this patch-set affect it.
> > 
> > > Why is SCM_RIGHTS on the rdma context FD not sufficient to share the
> > > entire
> > > context, PD, and all MR's?
> > 
> > Well, this SCM_RIGHTS is great, one can share the IB context with
> > another.
> > But it is not enough, because:
> > - What API the second process can use to get his hands on one of the
> > PDs or
> >   MRs from this context?
> > - What mechanism takes care of the destruction of such objects
> > (SCM_RIGHTS
> >   takes care for the ref counting of the context but i'm referring to
> > the
> >   PDs and MRs objects)?
> > 
> > The entire purpose of this patch set is to address these two
> > questions.
> > 
> > Yuval
> > 
> > > Ira
> > > 
> > > > Patch-set is logically splits to 4 parts as the following:
> > > > - patches 1 to 7 and 18 are preparation steps.
> > > > - patches 8 to 14 are the implementation of import PD
> > > > - patches 15 to 17 are the implementation of the verb
> > > > - patches 19 to 24 are the implementation of import MR
> > > > 
> > > > v0 -> v1:
> > > > 	* Delete the patch "IB/uverbs: ufile must be freed only when not
> > > > 	  used anymore". The process can die, the ucontext remains until
> > > > 	  last reference to it is closed.
> > > > 	* Rebase to latest for-next branch
> > > > 
> > > > Shamir Rabinovitch (16):
> > > >   RDMA/uverbs: uobj_get_obj_read should return the ib_uobject
> > > >   RDMA/uverbs: Delete the macro uobj_put_obj_read
> > > >   RDMA/nldev: ib_pd can be pointed by multiple ib_ucontext
> > > >   IB/{core,hw}: ib_pd should not have ib_uobject pointer
> > > >   IB/core: ib_uobject need HW object reference count
> > > >   IB/uverbs: Helper function to initialize ufile member of
> > > >     uverbs_attr_bundle
> > > >   IB/uverbs: Add context import lock/unlock helper
> > > >   IB/verbs: Prototype of HW object clone callback
> > > >   IB/mlx4: Add implementation of clone_pd callback
> > > >   IB/mlx5: Add implementation of clone_pd callback
> > > >   RDMA/rxe: Add implementation of clone_pd callback
> > > >   IB/uverbs: Add clone reference counting to ib_pd
> > > >   IB/uverbs: Add PD import verb
> > > >   IB/mlx4: Enable import from FD verb
> > > >   IB/mlx5: Enable import from FD verb
> > > >   RDMA/rxe: Enable import from FD verb
> > > > 
> > > > Yuval Shaia (8):
> > > >   IB/core: Install clone ib_pd in device ops
> > > >   IB/core: ib_mr should not have ib_uobject pointer
> > > >   IB/core: Install clone ib_mr in device ops
> > > >   IB/mlx4: Add implementation of clone_pd callback
> > > >   IB/mlx5: Add implementation of clone_pd callback
> > > >   RDMA/rxe: Add implementation of clone_pd callback
> > > >   IB/uverbs: Add clone reference counting to ib_mr
> > > >   IB/uverbs: Add MR import verb
> > > > 
> > > >  drivers/infiniband/core/device.c              |   2 +
> > > >  drivers/infiniband/core/nldev.c               | 127 ++++-
> > > >  drivers/infiniband/core/rdma_core.c           |  23 +-
> > > >  drivers/infiniband/core/uverbs.h              |   2 +
> > > >  drivers/infiniband/core/uverbs_cmd.c          | 489
> > > > +++++++++++++++---
> > > >  drivers/infiniband/core/uverbs_main.c         |   1 +
> > > >  drivers/infiniband/core/uverbs_std_types_mr.c |   1 -
> > > >  drivers/infiniband/core/verbs.c               |   4 -
> > > >  drivers/infiniband/hw/hns/hns_roce_hw_v1.c    |   1 -
> > > >  drivers/infiniband/hw/mlx4/main.c             |  18 +-
> > > >  drivers/infiniband/hw/mlx5/main.c             |  34 +-
> > > >  drivers/infiniband/hw/mthca/mthca_qp.c        |   3 +-
> > > >  drivers/infiniband/sw/rxe/rxe_verbs.c         |   5 +
> > > >  include/rdma/ib_verbs.h                       |  43 +-
> > > >  include/rdma/uverbs_std_types.h               |  11 +-
> > > >  include/uapi/rdma/ib_user_verbs.h             |  15 +
> > > >  include/uapi/rdma/rdma_netlink.h              |   3 +
> > > >  17 files changed, 669 insertions(+), 113 deletions(-)
> > > > 
> > > > -- 
> > > > 2.20.1
> > > > 
> 
> -- 
> Doug Ledford <dledford@redhat.com>
>     GPG KeyID: B826A3330E572FDD
>     Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD



  reply	other threads:[~2019-08-26  9:36 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-21 14:21 [PATCH v1 00/24] Shared PD and MR Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 01/24] RDMA/uverbs: uobj_get_obj_read should return the ib_uobject Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 02/24] RDMA/uverbs: Delete the macro uobj_put_obj_read Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 03/24] RDMA/nldev: ib_pd can be pointed by multiple ib_ucontext Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 04/24] IB/{core,hw}: ib_pd should not have ib_uobject pointer Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 05/24] IB/core: ib_uobject need HW object reference count Yuval Shaia
2019-08-21 14:53   ` Jason Gunthorpe
2019-08-27 16:28     ` Yuval Shaia
2019-08-27 18:18       ` Jason Gunthorpe
2019-08-21 14:21 ` [PATCH v1 06/24] IB/uverbs: Helper function to initialize ufile member of uverbs_attr_bundle Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 07/24] IB/uverbs: Add context import lock/unlock helper Yuval Shaia
2019-08-21 14:57   ` Jason Gunthorpe
2019-08-21 14:21 ` [PATCH v1 08/24] IB/verbs: Prototype of HW object clone callback Yuval Shaia
2019-08-21 14:59   ` Jason Gunthorpe
2019-08-21 14:21 ` [PATCH v1 09/24] IB/core: Install clone ib_pd in device ops Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 10/24] IB/mlx4: Add implementation of clone_pd callback Yuval Shaia
2019-08-21 14:59   ` Jason Gunthorpe
2019-08-21 14:21 ` [PATCH v1 11/24] IB/mlx5: " Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 12/24] RDMA/rxe: " Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 13/24] IB/uverbs: Add clone reference counting to ib_pd Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 14/24] IB/uverbs: Add PD import verb Yuval Shaia
2019-08-21 15:00   ` Jason Gunthorpe
2019-08-21 14:21 ` [PATCH v1 15/24] IB/mlx4: Enable import from FD verb Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 16/24] IB/mlx5: " Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 17/24] RDMA/rxe: " Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 18/24] IB/core: ib_mr should not have ib_uobject pointer Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 19/24] IB/core: Install clone ib_mr in device ops Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 20/24] IB/mlx4: Add implementation of clone_pd callback Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 21/24] IB/mlx5: " Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 22/24] RDMA/rxe: " Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 23/24] IB/uverbs: Add clone reference counting to ib_mr Yuval Shaia
2019-08-21 14:21 ` [PATCH v1 24/24] IB/uverbs: Add MR import verb Yuval Shaia
2019-08-21 14:50 ` [PATCH v1 00/24] Shared PD and MR Jason Gunthorpe
2019-08-22  8:50   ` Yuval Shaia
2019-08-21 23:37 ` Ira Weiny
2019-08-22  8:41   ` Yuval Shaia
2019-08-22 14:15     ` Doug Ledford
2019-08-26  9:35       ` Yuval Shaia [this message]
2019-08-22 16:58     ` Ira Weiny
2019-08-22 17:03       ` Jason Gunthorpe
2019-08-22 20:10         ` Weiny, Ira
2019-08-23 11:57           ` Jason Gunthorpe
2019-08-23 21:33             ` Weiny, Ira
2019-08-26 10:58               ` Yuval Shaia
2019-08-26 10:29         ` Yuval Shaia
2019-08-26 12:26           ` Jason Gunthorpe
2019-08-26  9:51       ` Yuval Shaia
2019-08-26 10:04       ` Yuval Shaia
2019-08-26 10:10       ` Yuval Shaia

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190826093504.GB3698@lap1 \
    --to=yuval.shaia@oracle.com \
    --cc=bvanassche@acm.org \
    --cc=dan.carpenter@oracle.com \
    --cc=danielj@mellanox.com \
    --cc=denisd@mellanox.com \
    --cc=dennis.dalessandro@intel.com \
    --cc=dledford@redhat.com \
    --cc=ereza@mellanox.com \
    --cc=galpress@amazon.com \
    --cc=ira.weiny@intel.com \
    --cc=israelr@mellanox.com \
    --cc=jgg@mellanox.com \
    --cc=jgg@ziepe.ca \
    --cc=johannes.berg@intel.com \
    --cc=kamalheib1@gmail.com \
    --cc=leon@kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=markb@mellanox.com \
    --cc=markz@mellanox.com \
    --cc=maxg@mellanox.com \
    --cc=michaelgur@mellanox.com \
    --cc=monis@mellanox.com \
    --cc=parav@mellanox.com \
    --cc=srabinov7@gmail.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=yuvalav@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).