linux-rdma.vger.kernel.org archive mirror
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: "Gal Pressman" <galpress@amazon.com>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Christian König" <christian.koenig@amd.com>,
	"Doug Ledford" <dledford@redhat.com>,
	"open list:DMA BUFFER SHARING FRAMEWORK"
	<linux-media@vger.kernel.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	linux-rdma <linux-rdma@vger.kernel.org>,
	"Oded Gabbay" <ogabbay@habana.ai>,
	"Tomer Tayar" <ttayar@habana.ai>,
	"Yossi Leybovich" <sleybo@amazon.com>,
	"Alexander Matushevsky" <matua@amazon.com>,
	"Leon Romanovsky" <leonro@nvidia.com>,
	"Jianxin Xiong" <jianxin.xiong@intel.com>,
	"John Hubbard" <jhubbard@nvidia.com>
Subject: Re: [RFC] Make use of non-dynamic dmabuf in RDMA
Date: Fri, 20 Aug 2021 09:33:16 -0300
Message-ID: <20210820123316.GV543798@ziepe.ca>
In-Reply-To: <CAKMK7uGgQWcs4Va6TGN9akHSSkmTs1i0Kx+6WpeiXWhJKpasLA@mail.gmail.com>

On Fri, Aug 20, 2021 at 09:25:30AM +0200, Daniel Vetter wrote:
> On Fri, Aug 20, 2021 at 1:06 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Wed, Aug 18, 2021 at 11:34:51AM +0200, Daniel Vetter wrote:
> > > On Wed, Aug 18, 2021 at 9:45 AM Gal Pressman <galpress@amazon.com> wrote:
> > > >
> > > > Hey all,
> > > >
> > > > Currently, the RDMA subsystem can only work with dynamic dmabuf
> > > > attachments, which requires the RDMA device to support on-demand-paging
> > > > (ODP) which is not common on most devices (only supported by mlx5).
> > > >
> > > > While the dynamic requirement makes sense for certain GPUs, some devices
> > > > (such as habanalabs) have device memory that is always "pinned" and do
> > > > not need/use the move_notify operation.
> > > >
> > > > The motivation of this RFC is to use habanalabs as the dmabuf exporter,
> > > > and EFA as the importer to allow for peer2peer access through libibverbs.
> > > >
> > > > This draft patch changes the dmabuf driver to differentiate between
> > > > static/dynamic attachments by looking at the move_notify op instead of
> > > > the importer_ops struct, and allowing the peer2peer flag to be enabled
> > > > in case of a static exporter.
> > > >
> > > > Thanks
> > > >
> > > > Signed-off-by: Gal Pressman <galpress@amazon.com>
> > >
> > > Given that habanalabs dma-buf support is very firmly in limbo (at
> > > least it's not yet in linux-next or anywhere else) I think you want to
> > > solve that problem first before we tackle the additional issue of
> > > making p2p work without dynamic dma-buf. Without that it just doesn't
> > > make a lot of sense really to talk about solutions here.
> >
> > I have been thinking about adding a dmabuf exporter to VFIO, for
> > basically the same reason habana labs wants to do it.
> >
> > In that situation we'd want to see an approach similar to this as well
> > to have a broad usability.
> >
> > The GPU drivers also want this for certain sophisticated scenarios
> > with RDMA, the intree drivers just haven't quite got there yet.
> >
> > So, I think it is worthwhile to start thinking about this regardless
> > of habana labs.
> 
> Oh sure, I've been having these for a while. I think there's two options:
> - some kind of soft-pin, where the contract is that we only revoke
> when absolutely necessary, and it's expected to be catastrophic on the
> importer's side. 

Honestly, I'm not very keen on this. We don't really have HW support
in several RDMA scenarios for even catastrophic unpin.

Gal, can EFA even do this for an MR? You basically have to resize the
rkey/lkey to zero length (or invalidate it like an FMR) under the
catastrophic revoke. The rkey/lkey cannot just be destroyed as that
opens a security problem with rkey/lkey re-use.

I think I saw EFA's current out of tree implementations had this bug.
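
To make the re-use concern concrete, here is a toy userspace model. It is
not EFA code or any real driver; every toy_* name is made up for this
sketch, and a lowest-free-index allocator is assumed purely for
illustration. A key that is destroyed can be handed to a new owner, so a
peer still holding the old value reaches the new owner's memory, while a
key resized to zero length stays allocated and rejects every access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NKEYS 4

/* Hypothetical key table entry; not a real verbs or driver structure. */
struct toy_key_slot {
	bool in_use;
	int owner;	/* illustrative id of the MR that owns this key */
	size_t length;	/* 0 means every access is rejected */
};

static struct toy_key_slot toy_keys[TOY_NKEYS];

/* Hand out the lowest free key index, or -1 if the table is full. */
static int toy_key_alloc(int owner, size_t length)
{
	for (int k = 0; k < TOY_NKEYS; k++) {
		if (!toy_keys[k].in_use) {
			toy_keys[k] = (struct toy_key_slot){ true, owner, length };
			return k;
		}
	}
	return -1;
}

/* Destroy: the key value becomes available for re-use. */
static void toy_key_destroy(int key)
{
	assert(key >= 0 && key < TOY_NKEYS);
	toy_keys[key].in_use = false;
}

/* Revoke: keep the key allocated but shrink it to zero length. */
static void toy_key_revoke(int key)
{
	assert(key >= 0 && key < TOY_NKEYS);
	toy_keys[key].length = 0;
}

/* Return the owner a remote access would hit, or -1 if it is rejected. */
static int toy_remote_access(int key, size_t offset)
{
	if (key < 0 || key >= TOY_NKEYS || !toy_keys[key].in_use)
		return -1;
	if (offset >= toy_keys[key].length)
		return -1;
	return toy_keys[key].owner;
}
```

In this model, destroy followed by a fresh allocation lets a stale key
resolve to the new owner, whereas a revoked key keeps failing and its
value is never handed out again until it is really destroyed.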

> to do is mmap revoke), and I think that model of exclusive device
> ownership with the option to revoke fits pretty well for at least some
> of the accelerators floating around. In that case importers would
> never get a move_notify (maybe we should call this revoke_notify to
> make it clear it's a bit different) callback, except when the entire
> thing has been yanked. I think that would fit pretty well for VFIO,
> and I think we should be able to make it work for rdma too as some
> kind of auto-deregister. The locking might be fun with both of these
> since I expect some inversions compared to the register path, we'll
> have to figure these out.

It fits nicely semantically; VFIO also has a revoke semantic for BAR
mappings.

The challenge is the RDMA side which doesn't have a 'dma disabled
error state' for objects as part of the spec.

Some HW, like mlx5, can implement this for MR objects (see revoke_mr),
but I don't know if anything else can, and even mlx5 currently can't
do a revoke for any other object type.

I don't know how useful it would be; I need to check on some of the use
cases.

The locking is tricky as we have to issue a device command, but that
device command cannot run concurrently with destruction or the tail
part of creation.
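
One possible shape for that ordering, as a userspace sketch only: the
toy_mr type, its states, and the revoke_pending flag are all assumptions
invented here, not an existing kernel interface. A per-object lock
serializes the revoke against the tail of creation and against
destruction, and a revoke that arrives too early is deferred until
creation completes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical lifecycle states for a toy MR object. */
enum toy_mr_state {
	TOY_MR_CREATING,
	TOY_MR_LIVE,
	TOY_MR_REVOKED,
	TOY_MR_DESTROYED,
};

struct toy_mr {
	pthread_mutex_t lock;
	enum toy_mr_state state;
	bool revoke_pending;		/* revoke arrived during creation */
	unsigned int device_cmds;	/* counts simulated revoke commands */
};

static void toy_mr_init(struct toy_mr *mr)
{
	pthread_mutex_init(&mr->lock, NULL);
	mr->state = TOY_MR_CREATING;
	mr->revoke_pending = false;
	mr->device_cmds = 0;
}

/* Tail of creation: publish the object, honoring a deferred revoke. */
static void toy_mr_finish_create(struct toy_mr *mr)
{
	pthread_mutex_lock(&mr->lock);
	assert(mr->state == TOY_MR_CREATING);
	if (mr->revoke_pending) {
		mr->device_cmds++;	/* stand-in for the device command */
		mr->state = TOY_MR_REVOKED;
	} else {
		mr->state = TOY_MR_LIVE;
	}
	pthread_mutex_unlock(&mr->lock);
}

/* Revoke: returns true only if the device command was issued right now. */
static bool toy_mr_revoke(struct toy_mr *mr)
{
	bool issued = false;

	pthread_mutex_lock(&mr->lock);
	if (mr->state == TOY_MR_LIVE) {
		mr->device_cmds++;	/* stand-in for the device command */
		mr->state = TOY_MR_REVOKED;
		issued = true;
	} else if (mr->state == TOY_MR_CREATING) {
		mr->revoke_pending = true;	/* handled by finish_create */
	}
	pthread_mutex_unlock(&mr->lock);
	return issued;
}

/* Destruction: after this, a late revoke is a no-op. */
static void toy_mr_destroy(struct toy_mr *mr)
{
	pthread_mutex_lock(&mr->lock);
	mr->state = TOY_MR_DESTROYED;
	pthread_mutex_unlock(&mr->lock);
}
```

The sketch ignores the inversion question raised above, since it only
has a single lock; a real implementation would also have to order this
lock against whatever the exporter holds when it calls into the importer.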

Jason
