From: "Christian König" <christian.koenig@amd.com>
To: Danilo Krummrich <dakr@redhat.com>,
Matthew Brost <matthew.brost@intel.com>
Cc: daniel@ffwll.ch, airlied@redhat.com, bskeggs@redhat.com,
jason@jlekstrand.net, tzimmermann@suse.de, mripard@kernel.org,
corbet@lwn.net, nouveau@lists.freedesktop.org,
linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
linux-doc@vger.kernel.org
Subject: Re: [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
Date: Mon, 30 Jan 2023 14:02:37 +0100
Message-ID: <1840e9fb-fd1b-79b7-4238-54ae97333d0b@amd.com>
In-Reply-To: <15fb0179-c7c5-8a64-ed08-841189919f5e@redhat.com>
On 29.01.23 19:46, Danilo Krummrich wrote:
> On 1/27/23 22:09, Danilo Krummrich wrote:
>> On 1/27/23 16:17, Christian König wrote:
>>> On 27.01.23 15:44, Danilo Krummrich wrote:
>>>> [SNIP]
>>>>>>>
>>>>>>> What you want is one component for tracking the VA allocations
>>>>>>> (drm_mm based) and a different component/interface for tracking
>>>>>>> the VA mappings (probably rb tree based).
>>>>>>
>>>>>> That's what the GPUVA manager is doing. There are gpuva_regions
>>>>>> which correspond to VA allocations and gpuvas which represent the
>>>>>> mappings. Both are tracked separately (currently both with a
>>>>>> separate drm_mm, though). However, the GPUVA manager needs to
>>>>>> take regions into account when dealing with mappings to make sure
>>>>>> it doesn't propose that drivers merge mappings across region
>>>>>> boundaries. Speaking from a userspace PoV, the kernel wouldn't
>>>>>> merge mappings from different VKBuffer objects even if they're
>>>>>> virtually and physically contiguous.
>>>>>
>>>>> Those are two completely different things and shouldn't be handled
>>>>> in a single component.
>>>>
>>>> They are different things, but they're related in the sense that,
>>>> for handling the mappings (in particular merging and sparse), the
>>>> GPUVA manager needs to know the VA allocation (or region) boundaries.
>>>>
>>>> I have the feeling there might be a misunderstanding. Userspace is
>>>> in charge of actually allocating a portion of the VA space and
>>>> managing it.
>>>> The GPUVA manager just needs to know about those VA space
>>>> allocations and hence keeps track of them.
>>>>
>>>> The GPUVA manager is not meant to be an allocator in the sense of
>>>> finding and providing a hole for a given request.
>>>>
>>>> Maybe the non-ideal choice of using drm_mm implied something else.
>>>
>>> Uff, well, long story short: that doesn't even remotely match the
>>> requirements. This way the GPUVA manager won't be usable for a whole
>>> bunch of use cases.
>>>
>>> What we have are mappings which say that address X needs to point to
>>> Y, with this or that set of hw-dependent flags.
>>>
>>> The whole idea of having ranges is not going to fly. Not with AMD
>>> GPUs, and I strongly think not with Intel's Xe either.
>>
>> A range in the sense of the GPUVA manager simply represents a VA
>> space allocation (which, in the case of Nouveau, is made in
>> userspace). Userspace allocates the portion of VA space and lets the
>> kernel know about it. The current implementation needs that for the
>> reasons named above. So, I think there is no reason why this would
>> work with one GPU but not with another. It's just a design choice of
>> the manager.
>>
>> I'm absolutely happy to discuss the details of the manager
>> implementation, though.
>>
>>>
>>>>> We should probably talk about the design of the GPUVA manager once
>>>>> more if it is supposed to be applicable to all GPU drivers.
>>>>
>>>> That's what I'm trying to figure out with this RFC: how to make it
>>>> applicable to all GPU drivers. So I'm happy to discuss this. :-)
>>>
>>> Yeah, that was a really good idea :) The proposal here is really far
>>> away from the actual requirements.
>>>
>>
>> And those are the ones I'm looking for. Do you mind sharing the
>> requirements for amdgpu in particular?
>>
>>>>>> For sparse residency the kernel also needs to know the region
>>>>>> boundaries to make sure that it keeps sparse mappings around.
>>>>>
>>>>> What?
>>>>
>>>> When userspace creates a new VKBuffer with the
>>>> VK_BUFFER_CREATE_SPARSE_BINDING_BIT set, the kernel may need to
>>>> create sparse mappings in order to ensure that using this buffer
>>>> without any memory backed mappings doesn't cause a GPU fault.
>>>>
>>>> Currently, the implementation does this the following way:
>>>>
>>>> 1. Userspace creates a new VKBuffer and hence allocates a portion
>>>> of the VA space for it. It calls into the kernel indicating the new
>>>> VA space region and the fact that the region is sparse.
>>>>
>>>> 2. The kernel picks up the region and stores it in the GPUVA
>>>> manager, the driver creates the corresponding sparse mappings /
>>>> page table entries.
>>>>
>>>> 3. Userspace might ask the driver to create a couple of memory
>>>> backed mappings for this particular VA region. The GPUVA manager
>>>> stores the mapping parameters, the driver creates the corresponding
>>>> page table entries.
>>>>
>>>> 4. Userspace might ask to unmap all the memory backed mappings from
>>>> this particular VA region. The GPUVA manager removes the mapping
>>>> parameters, the driver cleans up the corresponding page table
>>>> entries. However, since it's a sparse buffer, the driver also needs
>>>> to re-create the sparse mappings, and hence needs to know the
>>>> boundaries of the region to create the sparse mappings in.
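>>>>
>>>> As a rough sketch in code (illustrative only -- all struct and
>>>> helper names here are made up, this is not the actual uvmm code):
>>>>
>>>>     #include <stdint.h>
>>>>     #include <stdbool.h>
>>>>
>>>>     /* VA space allocation made by userspace, e.g. for a VKBuffer */
>>>>     struct region {
>>>>             uint64_t addr;
>>>>             uint64_t range;
>>>>             bool sparse;  /* VK_BUFFER_CREATE_SPARSE_BINDING_BIT */
>>>>     };
>>>>
>>>>     /* hypothetical stand-ins for driver/manager internals */
>>>>     void gpuva_remove_mappings(struct region *reg);
>>>>     void driver_unmap(uint64_t addr, uint64_t range);
>>>>     void driver_map_sparse(uint64_t addr, uint64_t range);
>>>>
>>>>     /* step 4: unmap all memory backed mappings of a region */
>>>>     static void region_unmap_all(struct region *reg)
>>>>     {
>>>>             gpuva_remove_mappings(reg);  /* drop mapping parameters */
>>>>             driver_unmap(reg->addr, reg->range);
>>>>             if (reg->sparse)
>>>>                     /* re-deploy the sparse PTEs; this is why the
>>>>                      * region boundaries must be known */
>>>>                     driver_map_sparse(reg->addr, reg->range);
>>>>     }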
>>>
>>> Again, this is not how things are working. First of all, the kernel
>>> absolutely should *NOT* know about those regions.
>>>
>>> What we have inside the kernel is the information about what happens
>>> when an address X is accessed. On AMD HW this can be:
>>>
>>> 1. Route to the PCIe bus because the mapped BO is stored in system
>>> memory.
>>> 2. Route to the internal MC because the mapped BO is stored in local
>>> memory.
>>> 3. Route to other GPUs in the same hive.
>>> 4. Route to some doorbell to kick off other work.
>>> ...
>>> x. Ignore writes, return 0 on reads (this is what is used for sparse
>>> mappings).
>>> x+1. Trigger a recoverable page fault. This is used for things like
>>> SVA.
>>> x+2. Trigger a non-recoverable page fault. This is used for things
>>> like unmapped regions where access is illegal.
>>>
>>> All of this, plus some hw specific caching flags.
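>>>
>>> As an illustrative enum (invented names, not the actual amdgpu
>>> definitions), the per-address behavior is roughly:
>>>
>>>     enum va_access_behavior {
>>>             VA_ROUTE_SYSTEM_MEM,   /* 1. PCIe, BO in system memory */
>>>             VA_ROUTE_LOCAL_MEM,    /* 2. internal MC, BO in VRAM */
>>>             VA_ROUTE_PEER_GPU,     /* 3. other GPUs in the same hive */
>>>             VA_ROUTE_DOORBELL,     /* 4. kicks off other work */
>>>             VA_SPARSE,             /* x. ignore writes, read 0 */
>>>             VA_FAULT_RECOVERABLE,  /* x+1. e.g. SVA */
>>>             VA_FAULT_FATAL,        /* x+2. access is illegal */
>>>     };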
>>>
>>> When Vulkan allocates a sparse VKBuffer what should happen is the
>>> following:
>>>
>>> 1. The Vulkan driver somehow figures out a VA region A..B for the
>>> buffer. This can be in userspace (libdrm_amdgpu) or in the kernel
>>> (drm_mm), but currently it is essentially driver specific.
>>
>> Right, for Nouveau we have this in userspace as well.
>>
>>>
>>> 2. The kernel gets a request to map the VA range A..B as sparse,
>>> meaning that it updates the page tables from A..B with the sparse
>>> setting.
>>>
>>> 3. Userspace asks the kernel to map a couple of memory backings at
>>> locations A+1, A+10, A+15, etc.
>>>
>>> 4. The VKBuffer is de-allocated, and userspace asks the kernel to
>>> update region A..B to not map anything (which usually triggers a
>>> non-recoverable fault).
>>
>> Up to here this seems to be identical to what I'm doing.
>>
>> It'd be interesting to know how amdgpu handles everything that
>> potentially happens between your 3) and 4). More specifically, how
>> are the page tables changed when memory backed mappings are mapped
>> into a sparse range? What happens when the memory backed mappings are
>> unmapped, but the VKBuffer isn't de-allocated, and hence sparse
>> mappings need to be re-deployed?
>>
>> Let's assume the sparse VKBuffer (and hence the VA space allocation)
>> is pretty large. In Nouveau the corresponding PTEs would have a
>> rather huge page size to cover this. Now, if small memory backed
>> mappings are mapped into this huge sparse buffer, in Nouveau we'd
>> allocate a new PT with a correspondingly smaller page size overlaying
>> the sparse mappings' PTEs.
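>>
>> Simplified, with invented names and constants, something like this:
>>
>>     #include <stdint.h>
>>
>>     #define PTE_VALID   (1ull << 0)
>>     #define PTE_SPARSE  (1ull << 1)
>>     #define PT_ENTRIES  512
>>
>>     uint64_t *alloc_page_table(void);   /* hypothetical */
>>     uint64_t pt_to_pde(uint64_t *pt);   /* hypothetical */
>>
>>     static void overlay_small_mapping(uint64_t *pd, int huge_idx,
>>                                       int map_idx, uint64_t pfn)
>>     {
>>             uint64_t *pt = alloc_page_table();  /* smaller page size */
>>             int i;
>>
>>             for (i = 0; i < PT_ENTRIES; i++)
>>                     pt[i] = PTE_SPARSE;     /* sparse by default */
>>             pt[map_idx] = pfn | PTE_VALID;  /* memory backed mapping */
>>             pd[huge_idx] = pt_to_pde(pt);   /* replace huge sparse PTE */
>>     }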
>>
>> How would this look in amdgpu?
>>
>>>
>>> If you want to unify this across hw drivers, I strongly suggest
>>> starting completely from scratch once more.
>>>
>
> I just took some time digging into amdgpu and, surprisingly, aside
> from the gpuva_regions it seems like amdgpu basically does exactly the
> same as I do in the GPU VA manager. As explained, those region
> boundaries are needed for merging only and, depending on the driver,
> might be useful for sparse mappings.
>
> For drivers that don't intend to merge at all and (somehow) are
> capable of dealing with sparse regions without knowing the sparse
> region's boundaries, it'd be easy to make those gpuva_regions optional.
Yeah, but this then defeats the purpose of having the same hw-independent
interface/implementation for all drivers.

Let me ask the other way around: what does the hw implementation of a
sparse mapping look like on NVidia-based hardware?

For newer AMD hw it's a flag in the page tables; for older hw it's a
register where you can specify ranges A..B. We don't really support the
latter with AMDGPU any more, but from this interface I would guess you
have the second variant, right?
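
In code terms, roughly (invented PTE and register names, just to
illustrate the two variants):

    #include <stdint.h>

    #define PTE_SPARSE (1ull << 1)                   /* invented bit */

    void write_reg(unsigned int reg, uint64_t val);  /* hypothetical */
    enum { SPARSE_RANGE_START, SPARSE_RANGE_END };   /* invented regs */

    /* variant 1 (newer AMD hw): per-PTE flag */
    static void map_sparse_pte(uint64_t *pt, unsigned int idx)
    {
            pt[idx] = PTE_SPARSE;   /* writes ignored, reads return 0 */
    }

    /* variant 2 (older hw): one range A..B in a register pair */
    static void map_sparse_range(uint64_t a, uint64_t b)
    {
            write_reg(SPARSE_RANGE_START, a);
            write_reg(SPARSE_RANGE_END, b);
    }
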
Christian.
>
>>> First of all, don't think about those mappings as VMAs; that won't
>>> work, because VMAs are usually something large. Think of this as
>>> individual PTEs controlled by the application, similar to how COW
>>> mappings and struct pages are handled inside the kernel.
>>
>> Why do you consider tracking single PTEs superior to tracking VMAs?
>> All the properties for a page you mentioned above should be equal
>> across all pages of a whole (memory backed) mapping, shouldn't they?
>>
>>>
>>> Then I would start with the VA allocation manager. You could
>>> probably base that on drm_mm. We handle it differently in amdgpu
>>> currently, but I think this is something we could change.
>>
>> It was not my intention to come up with an actual allocator for the
>> VA space, in the sense of finding a free and fitting hole in the VA
>> space.
>>
>> For Nouveau (and Xe, I think) we have this in userspace, and from what
>> you've written previously I thought the same applies to amdgpu?
>>
>>>
>>> Then come up with something close to the amdgpu VM system. I'm
>>> pretty sure that should work for Nouveau and Intel Xe as well. In
>>> other words, you just have a bunch of very, very small structures
>>> which represent mappings, and a larger structure which combines all
>>> mappings of a specific type, e.g. all mappings of a BO or all sparse
>>> mappings, etc.
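>>>
>>> In simplified code (invented names, kernel-style struct list_head
>>> from <linux/list.h>, not the actual amdgpu structures):
>>>
>>>     /* very small structure, one per mapping */
>>>     struct vm_mapping {
>>>             struct list_head list;  /* entry in the set below */
>>>             uint64_t start;
>>>             uint64_t last;
>>>             uint64_t flags;         /* hw specific mapping flags */
>>>     };
>>>
>>>     /* larger structure combining all mappings of a specific type,
>>>      * e.g. all mappings of a BO or all sparse mappings */
>>>     struct vm_mapping_set {
>>>             struct list_head mappings;
>>>     };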
>>
>> Considering what you wrote above I assume that small structures /
>> mappings in this paragraph refer to PTEs.
>>
>> Off the top of my head, I don't really see how this fine-grained
>> resolution of single PTEs would help with implementing this in
>> Nouveau. Actually, I think it would even complicate the handling of
>> PTs, but I would need to think about this a bit more.
>>
>>>
>>> Merging of regions is actually not mandatory. We don't do it in
>>> amdgpu and can live with the additional mappings pretty well. But I
>>> think this can differ between drivers.
>>>
>>> Regards,
>>> Christian.
>>>
>