linux-doc.vger.kernel.org archive mirror
From: "Christian König" <christian.koenig@amd.com>
To: Danilo Krummrich <dakr@redhat.com>,
	Matthew Brost <matthew.brost@intel.com>
Cc: daniel@ffwll.ch, airlied@redhat.com, bskeggs@redhat.com,
	jason@jlekstrand.net, tzimmermann@suse.de, mripard@kernel.org,
	corbet@lwn.net, nouveau@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-doc@vger.kernel.org
Subject: Re: [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces
Date: Mon, 30 Jan 2023 14:02:37 +0100	[thread overview]
Message-ID: <1840e9fb-fd1b-79b7-4238-54ae97333d0b@amd.com> (raw)
In-Reply-To: <15fb0179-c7c5-8a64-ed08-841189919f5e@redhat.com>

Am 29.01.23 um 19:46 schrieb Danilo Krummrich:
> On 1/27/23 22:09, Danilo Krummrich wrote:
>> On 1/27/23 16:17, Christian König wrote:
>>> Am 27.01.23 um 15:44 schrieb Danilo Krummrich:
>>>> [SNIP]
>>>>>>>
>>>>>>> What you want is one component for tracking the VA allocations 
>>>>>>> (drm_mm based) and a different component/interface for tracking 
>>>>>>> the VA mappings (probably rb tree based).
>>>>>>
>>>>>> That's what the GPUVA manager is doing. There are gpuva_regions 
>>>>>> which correspond to VA allocations and gpuvas which represent the 
>>>>>> mappings. Both are tracked separately (currently both with a 
>>>>>> separate drm_mm, though). However, the GPUVA manager needs to 
>>>>>> take regions into account when dealing with mappings to make sure 
>>>>>> it doesn't suggest that drivers merge across region 
>>>>>> boundaries. Speaking from a userspace PoV, the kernel wouldn't 
>>>>>> merge mappings from different VKBuffer objects even if they're 
>>>>>> virtually and physically contiguous.
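To make the region/mapping split concrete, here is a minimal sketch; the names are invented for illustration and are not the actual GPUVA manager API:

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative only: a VA space allocation (e.g. one VKBuffer)... */
struct gpuva_region {
    unsigned long start, end;
    bool sparse;
};

/* ...and an actual mapping tracked within such a region. */
struct gpuva {
    unsigned long start, end;
    struct gpuva_region *region;
};

/* Two adjacent mappings may only be merged when they belong to the
 * same region, i.e. originate from the same VKBuffer. */
static bool gpuva_can_merge(const struct gpuva *a, const struct gpuva *b)
{
    return a->region == b->region && a->end == b->start;
}
```

The merge helper is the point of the sketch: without the region pointer there is nothing stopping a merge across VKBuffer boundaries.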
>>>>>
>>>>> Those are two completely different things and shouldn't be handled 
>>>>> in a single component.
>>>>
>>>> They are different things, but they're related in a way that for 
>>>> handling the mappings (in particular merging and sparse) the GPUVA 
>>>> manager needs to know the VA allocation (or region) boundaries.
>>>>
>>>> I have the feeling there might be a misunderstanding. Userspace is 
>>>> in charge of actually allocating a portion of the VA space and managing it. 
>>>> The GPUVA manager just needs to know about those VA space 
>>>> allocations and hence keeps track of them.
>>>>
>>>> The GPUVA manager is not meant to be an allocator in the sense of 
>>>> finding and providing a hole for a given request.
>>>>
>>>> Maybe the non-ideal choice of using drm_mm was implying something 
>>>> else.
>>>
>>> Uff, well long story short that doesn't even remotely match the 
>>> requirements. This way the GPUVA manager won't be usable for a whole 
>>> bunch of use cases.
>>>
>>> What we have are mappings which say X needs to point to Y, with 
>>> these hw-dependent flags.
>>>
>>> The whole idea of having ranges is not going to fly. Neither with 
>>> AMD GPUs and, I strongly suspect, not with Intel's XA either.
>>
>> A range in the sense of the GPUVA manager simply represents a VA 
>> space allocation (which in case of Nouveau is taken in userspace). 
>> Userspace allocates the portion of VA space and lets the kernel know 
>> about it. The current implementation needs that for the named 
>> reasons. So, I think there is no reason why this would work with one 
>> GPU, but not with another. It's just part of the design choice of the 
>> manager.
>>
>> And I'm absolutely happy to discuss the details of the manager 
>> implementation though.
>>
>>>
>>>>> We should probably talk about the design of the GPUVA manager once 
>>>>> more when this should be applicable to all GPU drivers.
>>>>
>>>> That's what I'm trying to figure out with this RFC: how to make it 
>>>> applicable to all GPU drivers. So I'm happy to discuss this. :-)
>>>
>>> Yeah, that was a really good idea :) The proposal here is really far 
>>> away from the actual requirements.
>>>
>>
>> And those are the ones I'm looking for. Do you mind sharing the 
>> requirements for amdgpu in particular?
>>
>>>>>> For sparse residency the kernel also needs to know the region 
>>>>>> boundaries to make sure that it keeps sparse mappings around.
>>>>>
>>>>> What?
>>>>
>>>> When userspace creates a new VKBuffer with the 
>>>> VK_BUFFER_CREATE_SPARSE_BINDING_BIT the kernel may need to create 
>>>> sparse mappings in order to ensure that using this buffer without 
>>>> any memory backed mappings doesn't fault the GPU.
>>>>
>>>> Currently, the implementation does this the following way:
>>>>
>>>> 1. Userspace creates a new VKBuffer and hence allocates a portion 
>>>> of the VA space for it. It calls into the kernel indicating the new 
>>>> VA space region and the fact that the region is sparse.
>>>>
>>>> 2. The kernel picks up the region and stores it in the GPUVA 
>>>> manager, the driver creates the corresponding sparse mappings / 
>>>> page table entries.
>>>>
>>>> 3. Userspace might ask the driver to create a couple of memory 
>>>> backed mappings for this particular VA region. The GPUVA manager 
>>>> stores the mapping parameters, the driver creates the corresponding 
>>>> page table entries.
>>>>
>>>> 4. Userspace might ask to unmap all the memory backed mappings from 
>>>> this particular VA region. The GPUVA manager removes the mapping 
>>>> parameters, the driver cleans up the corresponding page table 
>>>> entries. However, the driver also needs to re-create the sparse 
>>>> mappings, since it's a sparse buffer, hence it needs to know the 
>>>> boundaries of the region it needs to create the sparse mappings in.
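The four steps above could be modeled roughly like this; the names and the one-state-per-region simplification are purely illustrative, not the actual uAPI:

```c
#include <stdbool.h>

/* Hypothetical model of steps 1-4 above. */
enum pte_state { PTE_NONE, PTE_SPARSE, PTE_BACKED };

struct va_region {
    unsigned long start, end;   /* boundaries from step 1 */
    bool sparse;
    enum pte_state pte;         /* simplified: one state per region */
};

/* Step 2: the kernel picks up the region; the driver writes sparse
 * PTEs over the whole range. */
static void region_create(struct va_region *r, unsigned long start,
                          unsigned long end, bool sparse)
{
    r->start = start;
    r->end = end;
    r->sparse = sparse;
    r->pte = sparse ? PTE_SPARSE : PTE_NONE;
}

/* Step 3: memory backed mappings replace the sparse entries. */
static void region_map_backed(struct va_region *r)
{
    r->pte = PTE_BACKED;
}

/* Step 4: on unmap the driver must re-create the sparse mappings,
 * which is why it needs the region boundaries r->start..r->end. */
static void region_unmap_backed(struct va_region *r)
{
    r->pte = r->sparse ? PTE_SPARSE : PTE_NONE;
}
```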
>>>
>>> Again, this is not how things work. First of all, the kernel 
>>> absolutely should *NOT* know about those regions.
>>>
>>> What we have inside the kernel is the information what happens if an 
>>> address X is accessed. On AMD HW this can be:
>>>
>>> 1. Route to the PCIe bus because the mapped BO is stored in system 
>>> memory.
>>> 2. Route to the internal MC because the mapped BO is stored in local 
>>> memory.
>>> 3. Route to other GPUs in the same hive.
>>> 4. Route to some doorbell to kick off other work.
>>> ...
>>> x. Ignore writes, return 0 on reads (this is what is used for sparse 
>>> mappings).
>>> x+1. Trigger a recoverable page fault. This is used for things like 
>>> SVA.
>>> x+2. Trigger a non-recoverable page fault. This is used for things 
>>> like unmapped regions where access is illegal.
>>>
>>> All this is plus some hw specific caching flags.
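One way to picture this list of outcomes is as a per-address action enum; the identifiers are invented for this sketch and are not amdgpu's:

```c
/* Illustrative encoding of the possible outcomes listed above. */
enum va_access_action {
    VA_ROUTE_SYSTEM_MEM,    /* 1. PCIe bus, BO in system memory */
    VA_ROUTE_LOCAL_MEM,     /* 2. internal MC, BO in local memory */
    VA_ROUTE_PEER_GPU,      /* 3. other GPU in the same hive */
    VA_ROUTE_DOORBELL,      /* 4. kick off other work */
    VA_SPARSE,              /* x. ignore writes, return 0 on reads */
    VA_FAULT_RECOVERABLE,   /* x+1. e.g. SVA */
    VA_FAULT_FATAL,         /* x+2. unmapped, access illegal */
};

/* What the kernel actually tracks: what happens if address X is
 * accessed, plus some hw specific caching flags. */
struct va_entry {
    unsigned long addr;
    enum va_access_action action;
    unsigned int hw_cache_flags;
};
```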
>>>
>>> When Vulkan allocates a sparse VKBuffer what should happen is the 
>>> following:
>>>
>>> 1. The Vulkan driver somehow figures out a VA region A..B for the 
>>> buffer. This can be in userspace (libdrm_amdgpu) or kernel (drm_mm), 
>>> but essentially is currently driver specific.
>>
>> Right, for Nouveau we have this in userspace as well.
>>
>>>
>>> 2. The kernel gets a request to map the VA range A..B as sparse, 
>>> meaning that it updates the page tables from A..B with the sparse 
>>> setting.
>>>
>>> 3. User space asks kernel to map a couple of memory backings at 
>>> location A+1, A+10, A+15 etc....
>>>
>>> 4. The VKBuffer is de-allocated, userspace asks kernel to update 
>>> region A..B to not map anything (usually triggers a non-recoverable 
>>> fault).
>>
>> Until here this seems to be identical to what I'm doing.
>>
>> It'd be interesting to know how amdgpu handles everything that 
>> potentially happens between your 3) and 4). More specifically, how 
>> are the page tables changed when memory backed mappings are mapped on 
>> a sparse range? What happens when the memory backed mappings are 
>> unmapped, but the VKBuffer isn't de-allocated, and hence sparse 
>> mappings need to be re-deployed?
>>
>> Let's assume the sparse VKBuffer (and hence the VA space allocation) 
>> is pretty large. In Nouveau the corresponding PTEs would have a 
>> rather huge page size to cover this. Now, if small memory backed 
>> mappings are mapped to this huge sparse buffer, in Nouveau we'd 
>> allocate a new PT with a corresponding smaller page size overlaying 
>> the sparse mappings PTEs.
>>
>> What would this look like in amdgpu?
>>
>>>
>>> When you want to unify this between hw drivers I strongly suggest to 
>>> completely start from scratch once more.
>>>
>
> I just took some time digging into amdgpu and, surprisingly, aside 
> from the gpuva_regions it seems like amdgpu basically does exactly the 
> same as I do in the GPU VA manager. As explained, those region 
> boundaries are needed for merging only and, depending on the driver, 
> might be useful for sparse mappings.
>
> For drivers that don't intend to merge at all and (somehow) are 
> capable of dealing with sparse regions without knowing the sparse 
> region's boundaries, it'd be easy to make those gpuva_regions optional.

Yeah, but this then defeats the goal of having the same hw-independent 
interface/implementation for all drivers.

Let me ask the other way around: what does the hw implementation of a 
sparse mapping look like on NVIDIA-based hardware?

For newer AMD hw it's a flag in the page tables; for older hw it's a 
register where you can specify ranges A..B. We don't really support the 
latter with AMDGPU any more, but from this interface I would guess you 
have the second variant, right?
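For illustration, the two hw variants could be sketched like this; the names are invented and not actual driver code:

```c
/* Variant 1 (newer AMD hw): sparse is a per-PTE flag. */
#define PTE_VALID  (1u << 0)
#define PTE_SPARSE (1u << 1)    /* reads return 0, writes are ignored */

static unsigned int pte_make_sparse(void)
{
    return PTE_SPARSE;          /* sparse, deliberately not valid */
}

/* Variant 2 (older hw): a register pair declaring a range A..B that
 * the MMU treats as sparse. */
struct sparse_range_reg {
    unsigned long start;        /* A */
    unsigned long end;          /* B */
};
```

In the first variant sparseness lives entirely in the page tables, so no range bookkeeping is needed in hw; in the second, the range A..B itself is the unit the hw understands.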

Christian.

>
>>> First of all, don't think of those mappings as VMAs; that won't 
>>> work because VMAs are usually something large. Think of this as 
>>> individual PTEs controlled by the application, similar to how COW 
>>> mappings and struct pages are handled inside the kernel.
>>
>> Why do you consider tracking single PTEs superior to tracking VMAs? 
>> All the page properties you mentioned above should be identical across 
>> all pages of a whole (memory backed) mapping, shouldn't they?
>>
>>>
>>> Then I would start with the VA allocation manager. You could 
>>> probably base that on drm_mm. We handle it differently in amdgpu 
>>> currently, but I think this is something we could change.
>>
>> It was not my intention to come up with an actual allocator for the 
>> VA space in the sense of actually finding a free and fitting hole in 
>> the VA space.
>>
>> For Nouveau (and XE, I think) we have this in userspace and from what 
>> you've written previously I thought the same applies to amdgpu?
>>
>>>
>>> Then come up with something close to the amdgpu VM system. I'm 
>>> pretty sure that should work for Nouveau and Intel XA as well. In 
>>> other words you just have a bunch of very very small structures 
>>> which represents mappings and a larger structure which combine all 
>>> mappings of a specific type, e.g. all mappings of a BO or all sparse 
>>> mappings etc...
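A rough sketch of that layout — hypothetical names, only loosely modeled on the amdgpu VM code:

```c
#include <stddef.h>

/* Very small structure, one instance per mapping. */
struct gpu_mapping {
    unsigned long start, last;   /* VA range covered */
    unsigned long offset;        /* offset into the backing BO */
    unsigned long flags;         /* hw dependent flags */
    struct gpu_mapping *next;    /* link within the containing set */
};

/* Larger structure combining all mappings of a specific type, e.g.
 * all mappings of one BO, or all sparse mappings. */
struct gpu_mapping_set {
    struct gpu_mapping *head;
};

static void mapping_set_add(struct gpu_mapping_set *set,
                            struct gpu_mapping *m)
{
    m->next = set->head;
    set->head = m;
}
```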
>>
>> Considering what you wrote above I assume that small structures / 
>> mappings in this paragraph refer to PTEs.
>>
>> Immediately, I don't really see how this fine grained resolution of 
>> single PTEs would help implementing this in Nouveau. Actually, I 
>> think it would even complicate the handling of PTs, but I would need 
>> to think about this a bit more.
>>
>>>
>>> Merging of regions is actually not mandatory. We don't do it in 
>>> amdgpu and can live with the additional mappings pretty well. But I 
>>> think this can differ between drivers.
>>>
>>> Regards,
>>> Christian.
>>>
>



Thread overview: 75+ messages
2023-01-18  6:12 [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Danilo Krummrich
2023-01-18  6:12 ` [PATCH drm-next 01/14] drm: execution context for GEM buffers Danilo Krummrich
2023-01-18  6:12 ` [PATCH drm-next 02/14] drm/exec: fix memory leak in drm_exec_prepare_obj() Danilo Krummrich
2023-01-18  8:51   ` Christian König
2023-01-18 19:00     ` Danilo Krummrich
2023-01-18  6:12 ` [PATCH drm-next 03/14] drm: manager to keep track of GPUs VA mappings Danilo Krummrich
2023-01-19  4:14   ` Bagas Sanjaya
2023-01-20 18:32     ` Danilo Krummrich
2023-01-23 23:23   ` Niranjana Vishwanathapura
2023-01-26 23:43   ` Matthew Brost
2023-01-27  0:24   ` Matthew Brost
2023-02-03 17:37   ` Matthew Brost
2023-02-06 13:35     ` Christian König
2023-02-06 13:46       ` Danilo Krummrich
2023-02-14 11:52     ` Danilo Krummrich
2023-01-18  6:12 ` [PATCH drm-next 04/14] drm: debugfs: provide infrastructure to dump a DRM GPU VA space Danilo Krummrich
2023-01-18 13:55   ` kernel test robot
2023-01-18 15:47   ` kernel test robot
2023-01-18  6:12 ` [PATCH drm-next 05/14] drm/nouveau: new VM_BIND uapi interfaces Danilo Krummrich
2023-01-27  1:05   ` Matthew Brost
2023-01-27  1:26     ` Danilo Krummrich
2023-01-27  7:55       ` Christian König
2023-01-27 13:12         ` Danilo Krummrich
2023-01-27 13:23           ` Christian König
2023-01-27 14:44             ` Danilo Krummrich
2023-01-27 15:17               ` Christian König
2023-01-27 20:25                 ` David Airlie
2023-01-30 12:58                   ` Christian König
2023-01-27 21:09                 ` Danilo Krummrich
2023-01-29 18:46                   ` Danilo Krummrich
2023-01-30 13:02                     ` Christian König [this message]
2023-01-30 23:38                       ` Danilo Krummrich
2023-02-01  8:10                       ` [Nouveau] " Dave Airlie
2023-02-02 11:53                         ` Christian König
2023-02-02 18:31                           ` Danilo Krummrich
2023-02-06  9:48                             ` Christian König
2023-02-06 13:27                               ` Danilo Krummrich
2023-02-06 16:14                                 ` Christian König
2023-02-06 18:20                                   ` Danilo Krummrich
2023-02-07  9:35                                     ` Christian König
2023-02-07 10:50                                       ` Danilo Krummrich
2023-02-10 11:50                                         ` Christian König
2023-02-10 12:47                                           ` Danilo Krummrich
2023-01-27  1:43     ` Danilo Krummrich
2023-01-27  3:21       ` Matthew Brost
2023-01-27  3:33         ` Danilo Krummrich
2023-01-18  6:12 ` [PATCH drm-next 06/14] drm/nouveau: get vmm via nouveau_cli_vmm() Danilo Krummrich
2023-01-18  6:12 ` [PATCH drm-next 07/14] drm/nouveau: bo: initialize GEM GPU VA interface Danilo Krummrich
2023-01-18  6:12 ` [PATCH drm-next 08/14] drm/nouveau: move usercopy helpers to nouveau_drv.h Danilo Krummrich
2023-01-18  6:12 ` [PATCH drm-next 09/14] drm/nouveau: fence: fail to emit when fence context is killed Danilo Krummrich
2023-01-18  6:12 ` [PATCH drm-next 10/14] drm/nouveau: chan: provide nouveau_channel_kill() Danilo Krummrich
2023-01-18  6:12 ` [PATCH drm-next 11/14] drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm Danilo Krummrich
2023-01-20  3:37   ` kernel test robot
2023-01-18  6:12 ` [PATCH drm-next 12/14] drm/nouveau: implement uvmm for user mode bindings Danilo Krummrich
2023-01-18  6:12 ` [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI Danilo Krummrich
2023-01-18 20:37   ` Thomas Hellström (Intel)
2023-01-19  3:44     ` Danilo Krummrich
2023-01-19  4:58       ` Matthew Brost
2023-01-19  7:32         ` Thomas Hellström (Intel)
2023-01-20 10:08         ` Boris Brezillon
2023-01-18  6:12 ` [PATCH drm-next 14/14] drm/nouveau: debugfs: implement DRM GPU VA debugfs Danilo Krummrich
2023-01-18  8:53 ` [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Christian König
2023-01-18 15:34   ` Danilo Krummrich
2023-01-18 15:37     ` Christian König
2023-01-18 16:19       ` Danilo Krummrich
2023-01-18 16:30         ` Alex Deucher
2023-01-18 16:50           ` Danilo Krummrich
2023-01-18 16:54             ` Alex Deucher
2023-01-18 19:17               ` Dave Airlie
2023-01-18 19:48                 ` Christian König
2023-01-19  4:04                   ` Danilo Krummrich
2023-01-19  5:23                     ` Matthew Brost
2023-01-19 11:33                       ` drm_gpuva_manager requirements (was Re: [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI) Christian König
2023-02-06 14:48                       ` [PATCH drm-next 00/14] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI Oded Gabbay
2023-03-16 16:39                         ` Danilo Krummrich
