Re: RFC: unpinned DMA-buf exporting v2

From: Daniel Vetter <daniel@ffwll.ch>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>
Cc: linaro-mm-sig@lists.linaro.org, linux-media@vger.kernel.org,
	dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org
Subject: Re: RFC: unpinned DMA-buf exporting v2
Date: Mon, 19 Mar 2018 15:09:36 +0100	[thread overview]
Message-ID: <20180319140936.GO14155@phenom.ffwll.local> (raw)
In-Reply-To: <20180316132049.1748-1-christian.koenig@amd.com>

On Fri, Mar 16, 2018 at 02:20:44PM +0100, Christian König wrote:
> Hi everybody,
> 
> since I've got positive feedback from Daniel I continued working on this approach.
> 
> A few issues are still open:
> 1. Daniel suggested that I make the invalidate_mappings callback a parameter of dma_buf_attach().
> 
> This approach unfortunately won't work because when the attachment is
> created the importer is not necessarily ready to handle invalidation
> events.

Why do you have this constraint? This sounds a bit like inverted
create/teardown sequence troubles, where you make an object "life" before
the thing is fully set up.

Can't we fix this by creating the entire ttm scaffolding you'll need for a
dma-buf upfront, and only once you have everything we grab the dma_buf
attachment? At that point you really should be able to evict buffers
again.

Not requiring invalidate_mapping to be set together with the attachment
means we can't ever require importers to support it (e.g. to address your
concern with the userspace dma-buf userptr magic).

> E.g. in the amdgpu example we first need to setup the imported GEM/TMM
> objects and install that in the attachment.
> 
> My solution is to introduce a separate function to grab the locks and
> set the callback, this function could then be used to pin the buffer
> later on if that turns out to be necessary after all.
> 
> 2. With my example setup this currently results in a ping/pong situation
> because the exporter prefers a VRAM placement while the importer prefers
> a GTT placement.
> 
> This results in quite a performance drop, but can be fixed by a simple
> mesa patch which allows shred BOs to be placed in both VRAM and GTT.
> 
> Question is what should we do in the meantime? Accept the performance
> drop or only allow unpinned sharing with new Mesa?

Maybe the exporter should not try to move stuff back into VRAM as long as
there's an active dma-buf? I mean it's really cool that it works, but
maybe let's just do this for a tech demo :-)

Of course if it then runs out of TT then it could still try to move it
back in. And "let's not move it when it's imported" is probably too stupid
too, and will need to be improved again with more heuristics, but would at
least get it off the ground.

Long term you might want to move perhaps once per 10 seconds or so, to get
idle importers to detach. Adjust 10s to match whatever benchmark/workload
you care about.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch