From: "Xiong, Jianxin" <jianxin.xiong@intel.com>
To: Daniel Vetter <daniel@ffwll.ch>, Jason Gunthorpe <jgg@ziepe.ca>
Cc: "linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"Leon Romanovsky" <leon@kernel.org>,
	Doug Ledford <dledford@redhat.com>,
	"Vetter, Daniel" <daniel.vetter@intel.com>,
	Christian Koenig <christian.koenig@amd.com>
Subject: RE: [PATCH rdma-core v3 4/6] pyverbs: Add dma-buf based MR support
Date: Mon, 30 Nov 2020 18:03:47 +0000	[thread overview]
Message-ID: <MW3PR11MB4555FE23BDECC607C5C7A16CE5F50@MW3PR11MB4555.namprd11.prod.outlook.com> (raw)
In-Reply-To: <20201130165535.GW401619@phenom.ffwll.local>

> -----Original Message-----
> From: Daniel Vetter <daniel@ffwll.ch>
> Sent: Monday, November 30, 2020 8:56 AM
> To: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Daniel Vetter <daniel@ffwll.ch>; Xiong, Jianxin <jianxin.xiong@intel.com>; linux-rdma@vger.kernel.org; dri-
> devel@lists.freedesktop.org; Leon Romanovsky <leon@kernel.org>; Doug Ledford <dledford@redhat.com>; Vetter, Daniel
> <daniel.vetter@intel.com>; Christian Koenig <christian.koenig@amd.com>
> Subject: Re: [PATCH rdma-core v3 4/6] pyverbs: Add dma-buf based MR support
> 
> On Mon, Nov 30, 2020 at 12:36:42PM -0400, Jason Gunthorpe wrote:
> > On Mon, Nov 30, 2020 at 05:04:43PM +0100, Daniel Vetter wrote:
> > > On Mon, Nov 30, 2020 at 11:55:44AM -0400, Jason Gunthorpe wrote:
> > > > On Mon, Nov 30, 2020 at 03:57:41PM +0100, Daniel Vetter wrote:
> > > > > > +	err = ioctl(dri->fd, DRM_IOCTL_AMDGPU_GEM_CREATE, &gem_create);
> > > > > > +	if (err)
> > > > > > +		return err;
> > > > > > +
> > > > > > +	*handle = gem_create.out.handle;
> > > > > > +	return 0;
> > > > > > +}
> > > > > > +
> > > > > > +static int radeon_alloc(struct dri *dri, size_t size, uint32_t *handle)
> > > > >
> > > > > Tbh radeon chips are old enough that I wouldn't care. Also radeon doesn't
> > > > > support p2p dma-buf, so the buffer is always going to be in system memory
> > > > > when you share it. Plus you also need some more flags like I suggested above, I think.
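
For illustration, a minimal sketch of a VRAM-only allocation through
DRM_IOCTL_AMDGPU_GEM_CREATE could look like the following. The helper name,
alignment and domain_flags choices are assumptions for the sketch, not
necessarily the additional flags referred to above:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <libdrm/amdgpu_drm.h>

/* Hypothetical helper: allocate a buffer that can only live in VRAM. */
static int amdgpu_alloc_vram_only(int drm_fd, size_t size, uint32_t *handle)
{
	union drm_amdgpu_gem_create gem_create;

	memset(&gem_create, 0, sizeof(gem_create));
	gem_create.in.bo_size = size;
	gem_create.in.alignment = 4096;
	/* VRAM only, no GTT fallback, so eviction is actually observable. */
	gem_create.in.domains = AMDGPU_GEM_DOMAIN_VRAM;
	/* Additional AMDGPU_GEM_CREATE_* flags would go here if needed. */
	gem_create.in.domain_flags = 0;

	if (ioctl(drm_fd, DRM_IOCTL_AMDGPU_GEM_CREATE, &gem_create))
		return -1;

	*handle = gem_create.out.handle;
	return 0;
}
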
> > > >
> > > > What about nouveau?
> > >
> > > Realistically the chances that someone wants to use RDMA together with
> > > the upstream nouveau driver are roughly nil. Imo it also needs someone
> > > with the right hardware to make sure it works (since the flags are
> > > all kinda arcane driver-specific stuff, testing is really needed).
> >
> > Well, it would be helpful if we could test the mlx5 part of the
> > implementation, and I have a lab stocked with nouveau-compatible HW.
> >
> > But you are right, someone needs to test/etc, so this does not seem
> > like something Jianxin should worry about.
> 
> Ah yes, sounds good. I can help with trying to find how to allocate vram with nouveau if you don't find it. Caveat is that nouveau doesn't do
> dynamic dma-buf exports and hence none of the interesting flows, and also not p2p. Not sure how much work it would be to roll that out (iirc
> it wasn't that much amdgpu code really, just endless discussions on the interface semantics and how to roll it out without breaking any of
> the existing dma-buf users).
> 
> Another thing that just crossed my mind: Do we have a testcase for forcing the eviction? Should be fairly easy to provoke with something
> like this:
> 
> - register a vram-only buffer with mlx5 and do something that binds it
> - allocate enough vram-only buffers to overfill vram (again, figuring out
>   how much vram you have is driver specific)
> - touch each buffer with mmap. that should force the mlx5 buffer out. it
>   might be that eviction isn't lru but preferentially idle buffers (i.e.
>   not used by hw, so anything registered to mlx5 won't qualify as a first
>   victim). so we might need to instead register a ton of buffers with
>   mlx5 and access them through ibverbs
> - do something with mlx5 again to force the rebinding and test that it all
>   keeps working
> 
> That entire invalidate/buffer move flow is the most complex interaction I think.

Right now on my side the eviction scenario is tested with the "timeout" feature of the
AMD GPU: the driver moves all VRAM allocations to system memory after a certain
period of "inactivity" (10 s by default). VRAM being accessed by peer DMA does not
count as activity from the GPU's point of view, so I can observe the
invalidation/remapping sequence by running an RDMA test for a long enough time.

I agree that having a more generic mechanism to force this scenario is going to be useful.
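
A rough sketch of such a forced-eviction test, following the steps outlined above.
Here run_traffic() is a placeholder for whatever binds and exercises the MR,
amdgpu_alloc_vram_only() is the hypothetical VRAM-only allocator sketched earlier,
NUM_FILLER stands in for "enough buffers to overfill VRAM", and the dma-buf MR
registration call is written against the verb this series adds, so the exact name
and signature may differ:

#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <infiniband/verbs.h>
#include <libdrm/amdgpu_drm.h>

#define BUF_SIZE	(64ULL << 20)
#define NUM_FILLER	256	/* assumption: enough buffers to overflow VRAM */

/* Hypothetical VRAM-only allocator, as sketched earlier in the thread. */
int amdgpu_alloc_vram_only(int drm_fd, size_t size, uint32_t *handle);
/* Placeholder: post and complete some RDMA work through this MR. */
int run_traffic(struct ibv_mr *mr);

static int export_dmabuf(int drm_fd, uint32_t handle)
{
	struct drm_prime_handle prime = {
		.handle = handle,
		.flags = DRM_RDWR | DRM_CLOEXEC,
	};

	if (ioctl(drm_fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &prime))
		return -1;
	return prime.fd;
}

/* Fault in every page of a BO through the CPU to generate eviction pressure. */
static void touch_bo(int drm_fd, uint32_t handle, size_t size)
{
	union drm_amdgpu_gem_mmap args = { .in.handle = handle };
	volatile char *map;

	if (ioctl(drm_fd, DRM_IOCTL_AMDGPU_GEM_MMAP, &args))
		return;
	map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
		   drm_fd, args.out.addr_ptr);
	if (map == MAP_FAILED)
		return;
	for (size_t off = 0; off < size; off += 4096)
		map[off] = 1;
	munmap((void *)map, size);
}

int eviction_test(int drm_fd, struct ibv_pd *pd)
{
	uint32_t handle, filler[NUM_FILLER];
	struct ibv_mr *mr;
	int fd;

	/* 1. Register a VRAM-only buffer and do something that binds it. */
	if (amdgpu_alloc_vram_only(drm_fd, BUF_SIZE, &handle))
		return -1;
	fd = export_dmabuf(drm_fd, handle);
	mr = ibv_reg_dmabuf_mr(pd, 0, BUF_SIZE, 0, fd,
			       IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);
	if (!mr)
		return -1;
	if (run_traffic(mr))
		return -1;

	/* 2.-3. Overfill VRAM with filler buffers and touch each one via mmap
	 * so that the registered buffer becomes an eviction candidate. */
	for (int i = 0; i < NUM_FILLER; i++) {
		if (amdgpu_alloc_vram_only(drm_fd, BUF_SIZE, &filler[i]))
			break;
		touch_bo(drm_fd, filler[i], BUF_SIZE);
	}

	/* 4. Use the MR again: this should exercise the invalidate/rebind path. */
	return run_traffic(mr);
}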

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

Thread overview: 22+ messages
2020-11-27 20:55 [PATCH rdma-core v3 0/6] Add user space dma-buf support Jianxin Xiong
2020-11-27 20:55 ` [PATCH rdma-core v3 1/6] Update kernel headers Jianxin Xiong
2020-11-27 20:55 ` [PATCH rdma-core v3 2/6] verbs: Support dma-buf based memory region Jianxin Xiong
2020-12-02 16:33   ` Yishai Hadas
2020-11-27 20:55 ` [PATCH rdma-core v3 3/6] mlx5: " Jianxin Xiong
2020-11-27 20:55 ` [PATCH rdma-core v3 4/6] pyverbs: Add dma-buf based MR support Jianxin Xiong
2020-11-30 14:57   ` Daniel Vetter
2020-11-30 15:55     ` Jason Gunthorpe
2020-11-30 16:04       ` Daniel Vetter
2020-11-30 16:36         ` Jason Gunthorpe
2020-11-30 16:55           ` Daniel Vetter
2020-11-30 18:03             ` Xiong, Jianxin [this message]
2020-11-30 18:13     ` Xiong, Jianxin
2020-11-30 16:08   ` Jason Gunthorpe
2020-11-30 17:53     ` Xiong, Jianxin
2020-12-02  0:39       ` Jason Gunthorpe
2020-12-02  1:12         ` Xiong, Jianxin
2020-11-27 20:55 ` [PATCH rdma-core v3 5/6] tests: Add tests for dma-buf based memory regions Jianxin Xiong
2020-11-30 15:00   ` Daniel Vetter
2020-11-30 17:35     ` Xiong, Jianxin
2020-11-30 17:46       ` Daniel Vetter
2020-11-27 20:55 ` [PATCH rdma-core v3 6/6] tests: Bug fix for get_access_flags() Jianxin Xiong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MW3PR11MB4555FE23BDECC607C5C7A16CE5F50@MW3PR11MB4555.namprd11.prod.outlook.com \
    --to=jianxin.xiong@intel.com \
    --cc=christian.koenig@amd.com \
    --cc=daniel.vetter@intel.com \
    --cc=daniel@ffwll.ch \
    --cc=dledford@redhat.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=jgg@ziepe.ca \
    --cc=leon@kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line
before the message body.