From: "Zeng, Oak" <oak.zeng@intel.com>
To: "Daniel Vetter" <daniel@ffwll.ch>,
"Christian König" <christian.koenig@amd.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>,
"dri-devel@lists.freedesktop.org"
<dri-devel@lists.freedesktop.org>,
Danilo Krummrich <dakr@redhat.com>,
Dave Airlie <airlied@redhat.com>,
"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>
Subject: RE: Making drm_gpuvm work across gpu devices
Date: Thu, 25 Jan 2024 21:02:02 +0000 [thread overview]
Message-ID: <SA1PR11MB6991B8FA2291D676B66DC865927A2@SA1PR11MB6991.namprd11.prod.outlook.com> (raw)
In-Reply-To: <ZbKpWpOGuNKLJ6sA@phenom.ffwll.local>
> -----Original Message-----
> From: Daniel Vetter <daniel@ffwll.ch>
> Sent: Thursday, January 25, 2024 1:33 PM
> To: Christian König <christian.koenig@amd.com>
> Cc: Zeng, Oak <oak.zeng@intel.com>; Danilo Krummrich <dakr@redhat.com>;
> Dave Airlie <airlied@redhat.com>; Daniel Vetter <daniel@ffwll.ch>; Felix
> Kuehling <felix.kuehling@amd.com>; Welty, Brian <brian.welty@intel.com>; dri-
> devel@lists.freedesktop.org; intel-xe@lists.freedesktop.org; Bommu, Krishnaiah
> <krishnaiah.bommu@intel.com>; Ghimiray, Himal Prasad
> <himal.prasad.ghimiray@intel.com>; Thomas.Hellstrom@linux.intel.com;
> Vishwanathapura, Niranjana <niranjana.vishwanathapura@intel.com>; Brost,
> Matthew <matthew.brost@intel.com>; Gupta, saurabhg
> <saurabhg.gupta@intel.com>
> Subject: Re: Making drm_gpuvm work across gpu devices
>
> On Wed, Jan 24, 2024 at 09:33:12AM +0100, Christian König wrote:
> > Am 23.01.24 um 20:37 schrieb Zeng, Oak:
> > > [SNIP]
> > > Yes most API are per device based.
> > >
> > > One exception I know is actually the kfd SVM API. If you look at the svm_ioctl
> function, it is per-process based. Each kfd_process represent a process across N
> gpu devices.
> >
> > Yeah and that was a big mistake in my opinion. We should really not do that
> > ever again.
> >
> > > Need to say, kfd SVM represent a shared virtual address space across CPU
> and all GPU devices on the system. This is by the definition of SVM (shared virtual
> memory). This is very different from our legacy gpu *device* driver which works
> for only one device (i.e., if you want one device to access another device's
> memory, you will have to use dma-buf export/import etc).
> >
> > Exactly that thinking is what we have currently found as blocker for a
> > virtualization projects. Having SVM as device independent feature which
> > somehow ties to the process address space turned out to be an extremely bad
> > idea.
> >
> > The background is that this only works for some use cases but not all of
> > them.
> >
> > What's working much better is to just have a mirror functionality which says
> > that a range A..B of the process address space is mapped into a range C..D
> > of the GPU address space.
> >
> > Those ranges can then be used to implement the SVM feature required for
> > higher level APIs and not something you need at the UAPI or even inside the
> > low level kernel memory management.
> >
> > When you talk about migrating memory to a device you also do this on a per
> > device basis and *not* tied to the process address space. If you then get
> > crappy performance because userspace gave contradicting information where
> to
> > migrate memory then that's a bug in userspace and not something the kernel
> > should try to prevent somehow.
> >
> > [SNIP]
> > > > I think if you start using the same drm_gpuvm for multiple devices you
> > > > will sooner or later start to run into the same mess we have seen with
> > > > KFD, where we moved more and more functionality from the KFD to the
> DRM
> > > > render node because we found that a lot of the stuff simply doesn't work
> > > > correctly with a single object to maintain the state.
> > > As I understand it, KFD is designed to work across devices. A single pseudo
> /dev/kfd device represent all hardware gpu devices. That is why during kfd open,
> many pdd (process device data) is created, each for one hardware device for this
> process.
> >
> > Yes, I'm perfectly aware of that. And I can only repeat myself that I see
> > this design as a rather extreme failure. And I think it's one of the reasons
> > why NVidia is so dominant with Cuda.
> >
> > This whole approach KFD takes was designed with the idea of extending the
> > CPU process into the GPUs, but this idea only works for a few use cases and
> > is not something we should apply to drivers in general.
> >
> > A very good example are virtualization use cases where you end up with CPU
> > address != GPU address because the VAs are actually coming from the guest
> VM
> > and not the host process.
> >
> > SVM is a high level concept of OpenCL, Cuda, ROCm etc.. This should not have
> > any influence on the design of the kernel UAPI.
> >
> > If you want to do something similar as KFD for Xe I think you need to get
> > explicit permission to do this from Dave and Daniel and maybe even Linus.
>
> I think the one and only one exception where an SVM uapi like in kfd makes
> sense, is if the _hardware_ itself, not the software stack defined
> semantics that you've happened to build on top of that hw, enforces a 1:1
> mapping with the cpu process address space.
>
> Which means your hardware is using PASID, IOMMU based translation, PCI-ATS
> (address translation services) or whatever your hw calls it and has _no_
> device-side pagetables on top. Which from what I've seen all devices with
> device-memory have, simply because they need some place to store whether
> that memory is currently in device memory or should be translated using
> PASID. Currently there's no gpu that works with PASID only, but there are
> some on-cpu-die accelerator things that do work like that.
>
> Maybe in the future there will be some accelerators that are fully cpu
> cache coherent (including atomics) with something like CXL, and the
> on-device memory is managed as normal system memory with struct page as
> ZONE_DEVICE and accelerator va -> physical address translation is only
> done with PASID ... but for now I haven't seen that, definitely not in
> upstream drivers.
>
> And the moment you have some per-device pagetables or per-device memory
> management of some sort (like using gpuva mgr) then I'm 100% agreeing with
> Christian that the kfd SVM model is too strict and not a great idea.
>
GPU is nothing more than a piece of HW to accelerate part of a program, just like an extra CPU core. From this perspective, a unified virtual address space across CPU and all GPU devices (and any other accelerators) is always more convenient to program than split address space b/t devices.
In reality, GPU program started from split address space. HMM is designed to provide unified virtual address space w/o a lot of advanced hardware feature you listed above.
I am aware Nvidia's new hardware platforms such as Grace Hopper natively support the Unified Memory programming model through hardware-based memory coherence among all CPUs and GPUs. For such systems, HMM is not required.
You can think HMM as a software based solution to provide unified address space b/t cpu and devices. Both AMD and Nvidia have been providing unified address space through hmm. I think it is still valuable.
Regards,
Oak
> Cheers, Sima
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
next prev parent reply other threads:[~2024-01-25 21:02 UTC|newest]
Thread overview: 126+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-17 22:12 [PATCH 00/23] XeKmd basic SVM support Oak Zeng
2024-01-17 22:12 ` [PATCH 01/23] drm/xe/svm: Add SVM document Oak Zeng
2024-01-17 22:12 ` [PATCH 02/23] drm/xe/svm: Add svm key data structures Oak Zeng
2024-01-17 22:12 ` [PATCH 03/23] drm/xe/svm: create xe svm during vm creation Oak Zeng
2024-01-17 22:12 ` [PATCH 04/23] drm/xe/svm: Trace svm creation Oak Zeng
2024-01-17 22:12 ` [PATCH 05/23] drm/xe/svm: add helper to retrieve svm range from address Oak Zeng
2024-01-17 22:12 ` [PATCH 06/23] drm/xe/svm: Introduce a helper to build sg table from hmm range Oak Zeng
2024-04-05 0:39 ` Jason Gunthorpe
2024-04-05 3:33 ` Zeng, Oak
2024-04-05 12:37 ` Jason Gunthorpe
2024-04-05 16:42 ` Zeng, Oak
2024-04-05 18:02 ` Jason Gunthorpe
2024-04-09 16:45 ` Zeng, Oak
2024-04-09 17:24 ` Jason Gunthorpe
2024-04-23 21:17 ` Zeng, Oak
2024-04-24 2:31 ` Matthew Brost
2024-04-24 13:57 ` Jason Gunthorpe
2024-04-24 16:35 ` Matthew Brost
2024-04-24 16:44 ` Jason Gunthorpe
2024-04-24 16:56 ` Matthew Brost
2024-04-24 17:48 ` Jason Gunthorpe
2024-04-24 13:48 ` Jason Gunthorpe
2024-04-24 23:59 ` Zeng, Oak
2024-04-25 1:05 ` Jason Gunthorpe
2024-04-26 9:55 ` Thomas Hellström
2024-04-26 12:00 ` Jason Gunthorpe
2024-04-26 14:49 ` Thomas Hellström
2024-04-26 16:35 ` Jason Gunthorpe
2024-04-29 8:25 ` Thomas Hellström
2024-04-30 17:30 ` Jason Gunthorpe
2024-04-30 18:57 ` Daniel Vetter
2024-05-01 0:09 ` Jason Gunthorpe
2024-05-02 8:04 ` Daniel Vetter
2024-05-02 9:11 ` Thomas Hellström
2024-05-02 12:46 ` Jason Gunthorpe
2024-05-02 15:01 ` Thomas Hellström
2024-05-02 19:25 ` Zeng, Oak
2024-05-03 13:37 ` Jason Gunthorpe
2024-05-03 14:43 ` Zeng, Oak
2024-05-03 16:28 ` Jason Gunthorpe
2024-05-03 20:29 ` Zeng, Oak
2024-05-04 1:03 ` Dave Airlie
2024-05-06 13:04 ` Daniel Vetter
2024-05-06 23:50 ` Matthew Brost
2024-05-07 11:56 ` Jason Gunthorpe
2024-05-06 13:33 ` Jason Gunthorpe
2024-04-09 17:33 ` Matthew Brost
2024-01-17 22:12 ` [PATCH 07/23] drm/xe/svm: Add helper for binding hmm range to gpu Oak Zeng
2024-01-17 22:12 ` [PATCH 08/23] drm/xe/svm: Add helper to invalidate svm range from GPU Oak Zeng
2024-01-17 22:12 ` [PATCH 09/23] drm/xe/svm: Remap and provide memmap backing for GPU vram Oak Zeng
2024-01-17 22:12 ` [PATCH 10/23] drm/xe/svm: Introduce svm migration function Oak Zeng
2024-01-17 22:12 ` [PATCH 11/23] drm/xe/svm: implement functions to allocate and free device memory Oak Zeng
2024-01-17 22:12 ` [PATCH 12/23] drm/xe/svm: Trace buddy block allocation and free Oak Zeng
2024-01-17 22:12 ` [PATCH 13/23] drm/xe/svm: Handle CPU page fault Oak Zeng
2024-01-17 22:12 ` [PATCH 14/23] drm/xe/svm: trace svm range migration Oak Zeng
2024-01-17 22:12 ` [PATCH 15/23] drm/xe/svm: Implement functions to register and unregister mmu notifier Oak Zeng
2024-01-17 22:12 ` [PATCH 16/23] drm/xe/svm: Implement the mmu notifier range invalidate callback Oak Zeng
2024-01-17 22:12 ` [PATCH 17/23] drm/xe/svm: clean up svm range during process exit Oak Zeng
2024-01-17 22:12 ` [PATCH 18/23] drm/xe/svm: Move a few structures to xe_gt.h Oak Zeng
2024-01-17 22:12 ` [PATCH 19/23] drm/xe/svm: migrate svm range to vram Oak Zeng
2024-01-17 22:12 ` [PATCH 20/23] drm/xe/svm: Populate svm range Oak Zeng
2024-01-17 22:12 ` [PATCH 21/23] drm/xe/svm: GPU page fault support Oak Zeng
2024-01-23 2:06 ` Welty, Brian
2024-01-23 3:09 ` Zeng, Oak
2024-01-23 3:21 ` Making drm_gpuvm work across gpu devices Zeng, Oak
2024-01-23 11:13 ` Christian König
2024-01-23 19:37 ` Zeng, Oak
2024-01-23 20:17 ` Felix Kuehling
2024-01-25 1:39 ` Zeng, Oak
2024-01-23 23:56 ` Danilo Krummrich
2024-01-24 3:57 ` Zeng, Oak
2024-01-24 4:14 ` Zeng, Oak
2024-01-24 6:48 ` Christian König
2024-01-25 22:13 ` Danilo Krummrich
2024-01-24 8:33 ` Christian König
2024-01-25 1:17 ` Zeng, Oak
2024-01-25 1:25 ` David Airlie
2024-01-25 5:25 ` Zeng, Oak
2024-01-26 10:09 ` Christian König
2024-01-26 20:13 ` Zeng, Oak
2024-01-29 10:10 ` Christian König
2024-01-29 20:09 ` Zeng, Oak
2024-01-25 11:00 ` 回复:Making " 周春明(日月)
2024-01-25 17:00 ` Zeng, Oak
2024-01-25 17:15 ` Making " Felix Kuehling
2024-01-25 18:37 ` Zeng, Oak
2024-01-26 13:23 ` Christian König
2024-01-25 16:42 ` Zeng, Oak
2024-01-25 18:32 ` Daniel Vetter
2024-01-25 21:02 ` Zeng, Oak [this message]
2024-01-26 8:21 ` Thomas Hellström
2024-01-26 12:52 ` Christian König
2024-01-27 2:21 ` Zeng, Oak
2024-01-29 10:19 ` Christian König
2024-01-30 0:21 ` Zeng, Oak
2024-01-30 8:39 ` Christian König
2024-01-30 22:29 ` Zeng, Oak
2024-01-30 23:12 ` David Airlie
2024-01-31 9:15 ` Daniel Vetter
2024-01-31 20:17 ` Zeng, Oak
2024-01-31 20:59 ` Zeng, Oak
2024-02-01 8:52 ` Christian König
2024-02-29 18:22 ` Zeng, Oak
2024-03-08 4:43 ` Zeng, Oak
2024-03-08 10:07 ` Christian König
2024-01-30 8:43 ` Thomas Hellström
2024-01-29 15:03 ` Felix Kuehling
2024-01-29 15:33 ` Christian König
2024-01-29 16:24 ` Felix Kuehling
2024-01-29 16:28 ` Christian König
2024-01-29 17:52 ` Felix Kuehling
2024-01-29 19:03 ` Christian König
2024-01-29 20:24 ` Felix Kuehling
2024-02-23 20:12 ` Zeng, Oak
2024-02-27 6:54 ` Christian König
2024-02-27 15:58 ` Zeng, Oak
2024-02-28 19:51 ` Zeng, Oak
2024-02-29 9:41 ` Christian König
2024-02-29 16:05 ` Zeng, Oak
2024-02-29 17:12 ` Thomas Hellström
2024-03-01 7:01 ` Christian König
2024-01-17 22:12 ` [PATCH 22/23] drm/xe/svm: Add DRM_XE_SVM kernel config entry Oak Zeng
2024-01-17 22:12 ` [PATCH 23/23] drm/xe/svm: Add svm memory hints interface Oak Zeng
2024-01-18 2:45 ` ✓ CI.Patch_applied: success for XeKmd basic SVM support Patchwork
2024-01-18 2:46 ` ✗ CI.checkpatch: warning " Patchwork
2024-01-18 2:46 ` ✗ CI.KUnit: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=SA1PR11MB6991B8FA2291D676B66DC865927A2@SA1PR11MB6991.namprd11.prod.outlook.com \
--to=oak.zeng@intel.com \
--cc=airlied@redhat.com \
--cc=christian.koenig@amd.com \
--cc=dakr@redhat.com \
--cc=daniel@ffwll.ch \
--cc=dri-devel@lists.freedesktop.org \
--cc=felix.kuehling@amd.com \
--cc=intel-xe@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).