Re: [Intel-gfx] [RFC PATCH] drm/ttm: Add a private member to the struct ttm_resource

From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: "Christian König" <christian.koenig@amd.com>,
	intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: maarten.lankhorst@linux.intel.com, matthew.auld@intel.com,
	Matthew Auld <matthew.william.auld@gmail.com>
Subject: Re: [Intel-gfx] [RFC PATCH] drm/ttm: Add a private member to the struct ttm_resource
Date: Tue, 14 Sep 2021 10:27:47 +0200	[thread overview]
Message-ID: <4add643ae0b1a1daa4657106f5554894145a9778.camel@linux.intel.com> (raw)
In-Reply-To: <0a0f1b45-a668-e0a8-dcd0-d4413ec3b39b@amd.com>

On Tue, 2021-09-14 at 09:40 +0200, Christian König wrote:
> Am 13.09.21 um 14:41 schrieb Thomas Hellström:
> > [SNIP]
> > > > > Let's say you have a struct ttm_object_vram and a struct 
> > > > > ttm_object_gtt, both subclassing drm_gem_object. Then I'd say
> > > > > a 
> > > > > driver would want to subclass those to attach identical data,
> > > > > extend functionality and provide a single i915_gem_object to
> > > > > the 
> > > > > rest of the driver, which couldn't care less whether it's
> > > > > vram or 
> > > > > gtt? Wouldn't you say having separate struct ttm_object_vram
> > > > > and a 
> > > > > struct ttm_object_gtt in this case would be awkward?. We
> > > > > *want* to 
> > > > > allow common handling.
> > > > 
> > > > Yeah, but that's a bad idea. This is like diamond inheritance
> > > > in C++.
> > > > 
> > > > When you need the same functionality in different backends you 
> > > > implement that as separate object and then add a parent class.
> > > > 
> > > > > 
> > > > > It's the exact same situation here. With struct ttm_resource
> > > > > you 
> > > > > let *different* implementation flavours subclass it, which
> > > > > makes it 
> > > > > awkward for the driver to extend the functionality in a
> > > > > common way 
> > > > > by subclassing, unless the driver only uses a single
> > > > > implementation.
> > > > 
> > > > Well the driver should use separate implementations for their 
> > > > different domains as much as possible.
> > > > 
> > > Hmm, Now you lost me a bit. Are you saying that the way we do
> > > dynamic 
> > > backends in the struct ttm_buffer_object to facilitate driver 
> > > subclassing is a bad idea or that the RFC with backpointer is a
> > > bad 
> > > idea?
> > > 
> > > 
> > Or if you mean diamond inheritance is bad, yes that's basically my
> > point.
> 
> That diamond inheritance is a bad idea. What I don't understand is
> why 
> you need that in the first place?
> 
> Information that you attach to a resource are specific to the domain 
> where the resource is allocated from. So why do you want to attach
> the 
> same information to a resources from different domains?

Again, for the same reason that we do that with struct i915_gem_objects
and struct ttm_tts, to extend the functionality. I mean information
that we attach when we subclass a struct ttm_buffer_object doesn't
necessarily care about whether it's a VRAM or a GTT object. In exactly
the same way, information that we want to attach to a struct
ttm_resource doesn't necessarily care whether it's a system or a VRAM
resource, and need not be specific to any of those.

In this particular case, as memory management becomes asynchronous, you
can't attach things like sg-tables and gpu binding information to the
gem object anymore, because the object may have a number of migrations
in the pipeline. Such things need to be attached to the structure that
abstracts the memory allocation, and which may have a completely
different lifetime than the object itself.

In our particular case we want to attach information for cached page
lookup and and sg-table, and moving forward probably the gpu binding
(vma) information, and that is the same information for any
ttm_resource regardless where it's allocated from.

Typical example: A pipelined GPU operation happening before an async
eviction goes wrong. We need to error capture and reset. But if we look
at the object for error capturing, it's already updated pointing to an
after-eviction resource, and the resource sits on a ghost object (or in
the future when ghost objects go away perhaps in limbo somewhere).

We need to capture the memory pointed to by the struct ttm_resource the
GPU was referencing, and to be able to do that we need to cache driver-
specific info on the resource. Typically an sg-list and GPU binding
information. 

Anyway, that cached information needs to be destroyed together with the
resource and thus we need to be able to access that information from
the resource in some way, regardless whether it's a pointer or whether
we embed the struct resource.

I think it's pretty important here that we (using the inheritance
diagram below) recognize the need for D to inherit from A, just like we
do for objects or ttm_tts.

> 
> > 
> > Looking at
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FMultiple_inheritance%23%2Fmedia%2FFile%3ADiamond_inheritance.svg&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7Cece4bd8aab644feacc1808d976b3ca56%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637671336950757656%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=LPMnfvC1px0bW8o420vP72oBbkm1v76A%2B0PDUw7urQY%3D&amp;reserved=0
> >  
> > 
> > 
> > 1)
> > 
> > A would be the struct ttm_resource itself,
> > D would be struct i915_resource,
> > B would be struct ttm_range_mgr_node,
> > C would be struct i915_ttm_buddy_resource
> > 
> > And we need to resolve the ambiguity using the awkward union 
> > construct, iff we need to derive from both B and C.
> > 
> > Struct ttm_buffer_object and struct ttm_tt instead have B) and C) 
> > being dynamic backends of A) or a single type derived from A) Hence
> > the problem doesn't exist for these types.
> > 
> > So the question from last email remains, if ditching this RFC, can
> > we 
> > have B) and C) implemented by helpers that can be used from D) and 
> > that don't derive from A?
> 
> Well we already have that in the form of drm_mm. I mean the 
> ttm_range_manager is just a relatively small glue code which
> implements 
> the TTMs resource interface using the drm_mm object and a spinlock.
> IIRC 
> that less than 200 lines of code.
> 
> So you should already have the necessary helpers and just need to 
> implement the resource manager as far as I can see.
> 
> I mean I reused the ttm_range_manager_node in for amdgpu_gtt_mgr and 
> could potentially reuse a bit more of the ttm_range_manager code. But
> I 
> don't see that as much of an issue, the extra functionality there is 
> just minimal.

Sure but that would give up the prereq of having reusable resource
manager implementations. What happens if someone would like to reuse
the buddy manager? And to complicate things even more, the information
we attach to VRAM resources also needs to be attached to system
resources. Sure we could probably re-implement a combined system-buddy-
range manager, but that seems like something overly complex.

The other object examples resolve the diamond inheritance with a
pointer to the specialization (BC) and let D derive from A.

TTM resources do it backwards. If we can just recognize that and ponder
what's the easiest way to resolve this given the current design, I
actually think we'd arrive at a backpointer to allow downcasting from A
to D.

Thanks,
Thomas

> 
> Regards,
> Christian.
> 
> > 
> > Thanks,
> > 
> > Thomas
> > 
> > 
> > 
>