From: Jason Gunthorpe <jgg@nvidia.com>
To: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: "Zeng, Oak" <oak.zeng@intel.com>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
	"Brost, Matthew" <matthew.brost@intel.com>,
	"Welty, Brian" <brian.welty@intel.com>,
	"Ghimiray, Himal Prasad" <himal.prasad.ghimiray@intel.com>,
	"Bommu, Krishnaiah" <krishnaiah.bommu@intel.com>,
	"Vishwanathapura,
	Niranjana" <niranjana.vishwanathapura@intel.com>,
	Leon Romanovsky <leon@kernel.org>
Subject: Re: [PATCH 06/23] drm/xe/svm: Introduce a helper to build sg table from hmm range
Date: Fri, 26 Apr 2024 13:35:19 -0300	[thread overview]
Message-ID: <20240426163519.GZ941030@nvidia.com> (raw)
In-Reply-To: <ad82f95ee29ada403459416d4c97c2b9083b5a0f.camel@linux.intel.com>

On Fri, Apr 26, 2024 at 04:49:26PM +0200, Thomas Hellström wrote:
> On Fri, 2024-04-26 at 09:00 -0300, Jason Gunthorpe wrote:
> > On Fri, Apr 26, 2024 at 11:55:05AM +0200, Thomas Hellström wrote:
> > > First, the gpu_vma structure is something that partitions the
> > > gpu_vm
> > > that holds gpu-related range metadata, like what to mirror, desired
> > > gpu
> > > caching policies etc. These are managed (created, removed and
> > > split)
> > > mainly from user-space. These are stored and looked up from an rb-
> > > tree.
> > 
> > Except we are talking about SVA here, so all of this should not be
> > exposed to userspace.
> 
> I think you are misreading. this is on the level "Mirror this region of
> the cpu_vm", "prefer this region placed in VRAM", "GPU will do atomic
> accesses on this region", very similar to cpu mmap / munmap and
> madvise. What I'm trying to say here is that this does not directly
> affect the SVA except whether to do SVA or not, and in that case what
> region of the CPU mm will be mirrored, and in addition, any gpu
> attributes for the mirrored region.

With SVA you bind the whole MM, and device faults dynamically populate
the mirror page table. There are no non-SVA regions. Metadata like you
describe is metadata for the allocation/migration mechanism, not for
the page table, and it has nothing to do with the SVA mirror
operation.
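
For reference, the canonical mirroring loop (as documented in
Documentation/mm/hmm.rst) looks roughly like the minimal sketch below;
the notifier covers whatever the driver mirrors and each device fault
simply re-runs the loop over the faulting range. struct my_mirror, its
fields and the device page-table update are hypothetical placeholders,
not code from any real driver:

#include <linux/hmm.h>
#include <linux/mmu_notifier.h>
#include <linux/mm.h>
#include <linux/sched/mm.h>
#include <linux/slab.h>
#include <linux/dma-mapping.h>

struct my_mirror {
	struct mmu_interval_notifier notifier;
	struct mutex pt_lock;		/* serializes device page-table updates */
	struct device *dev;		/* DMA-capable device (used later) */
	unsigned long start, end;	/* VA range this mirror object tracks */
	dma_addr_t *dma_addrs;		/* one slot per 4K page in [start, end) */
	void *owner;			/* dev_private_owner token */
};

static int my_mirror_populate(struct my_mirror *m, unsigned long start,
			      unsigned long end)
{
	struct mm_struct *mm = m->notifier.mm;
	struct hmm_range range = {
		.notifier = &m->notifier,
		.start = start,
		.end = end,
		.default_flags = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE,
		.dev_private_owner = m->owner,
	};
	unsigned long *pfns;
	int ret;

	pfns = kvcalloc((end - start) >> PAGE_SHIFT, sizeof(*pfns),
			GFP_KERNEL);
	if (!pfns)
		return -ENOMEM;
	range.hmm_pfns = pfns;

	if (!mmget_not_zero(mm)) {
		ret = -EFAULT;
		goto out_free;
	}

again:
	range.notifier_seq = mmu_interval_read_begin(&m->notifier);
	mmap_read_lock(mm);
	ret = hmm_range_fault(&range);
	mmap_read_unlock(mm);
	if (ret) {
		if (ret == -EBUSY)
			goto again;	/* collided with an invalidation */
		goto out_put;
	}

	mutex_lock(&m->pt_lock);
	if (mmu_interval_read_retry(&m->notifier, range.notifier_seq)) {
		mutex_unlock(&m->pt_lock);
		goto again;
	}

	/*
	 * Program the device page table from range.hmm_pfns here, page by
	 * page, under the same lock the invalidate callback takes.
	 * (Device-specific, omitted.)
	 */
	mutex_unlock(&m->pt_lock);

out_put:
	mmput(mm);
out_free:
	kvfree(pfns);
	return ret;
}

The retry against the notifier sequence, not any VMA or window
bookkeeping, is what keeps the mirror coherent.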

Yes, there is another common scheme where you bind a window of the CPU
address space to a window on the device and mirror a fixed range, but
that is quite a different thing. It is not SVA: it has a fixed range,
and it is probably bound to a single GPU VMA in a multi-VMA device page
table.

SVA is not just a whole bunch of windows being dynamically created by
the OS; that is entirely the wrong mental model. It would be horrible
to expose something like that to userspace as uAPI. Any hidden SVA
granules and other implementation-specific artifacts must not be made
visible to userspace!!

> > If you use dma_map_sg you get into the world of wrongness where you
> > have to track ranges and invalidation has to wipe an entire range -
> > because you cannot do a dma unmap of a single page from a dma_map_sg
> > mapping. This is all the wrong way to use hmm_range_fault.
> > 
> > hmm_range_fault() is page table mirroring, it fundamentally must be
> > page-by-page. The target page table structure must have similar
> > properties to the MM page table - especially page by page
> > validate/invalidate. Meaning you cannot use dma_map_sg().
> 
> To me this is purely an optimization to make the driver page-table and
> hence the GPU TLB benefit from iommu coalescing / large pages and large
> driver PTEs.

This is a different topic. Leon is working on improving the DMA API to
get these kinds of benefits for HMM users; dma_map_sg is not the path
to get them. Leon's work should be significantly better in terms of
optimizing IOVA contiguity for a GPU use case. You can get guaranteed
DMA contiguity at your chosen granule level, even up to something like
512M.

> It is true that invalidation will sometimes shoot down
> large gpu ptes unnecessarily but it will not put any additional burden
> on the core AFAICT. 

In my experience people doing performance workloads don't enable the
IOMMU due to the high performance cost. So while optimizing IOMMU
coalescing is sort of interesting, it is not as important as using the
APIs properly and not harming the much more common situation where
there is no IOMMU and no artificial contiguity.

> on invalidation since zapping the gpu PTEs effectively stops any dma
> accesses. The dma mappings are rebuilt on the next gpu pagefault,
> which, as you mention, are considered slow anyway, but will probably
> still reuse the same prefault region, hence needing to rebuild the dma
> mappings anyway.

This is bad too. The DMA must not remain mapped after the pages have
been freed; it completely destroys the concept of IOMMU-enforced DMA
security and the ACPI notion of untrusted external devices.
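
Sketching the other half of the same hypothetical my_mirror example
(the dev, start/end and dma_addrs[] fields are again made up): if the
driver maps page by page, the interval-notifier invalidate callback
can zap the device PTEs and drop every DMA mapping for the invalidated
range right there, so nothing stays mapped once the core mm frees the
pages - something a single dma_map_sg mapping cannot give you.

static bool my_mirror_invalidate(struct mmu_interval_notifier *mni,
				 const struct mmu_notifier_range *range,
				 unsigned long cur_seq)
{
	struct my_mirror *m = container_of(mni, struct my_mirror, notifier);
	unsigned long start = max(range->start, m->start);
	unsigned long end = min(range->end, m->end);
	unsigned long i;

	/* We need to sleep for the mutex; ask non-blockable callers to retry. */
	if (!mmu_notifier_range_blockable(range))
		return false;

	mutex_lock(&m->pt_lock);
	mmu_interval_set_seq(mni, cur_seq);

	/* First zap the device PTEs so no new DMA can hit the range ... */
	my_device_zap_ptes(m, start, end);	/* hypothetical helper */

	/*
	 * ... then drop the per-page DMA mappings, so nothing stays mapped
	 * once the core mm frees the pages behind this invalidation.  (If
	 * larger mappings were created, the mapped size would have to be
	 * tracked and used here too.)
	 */
	for (i = (start - m->start) >> PAGE_SHIFT;
	     i < (end - m->start) >> PAGE_SHIFT; i++) {
		if (!m->dma_addrs[i])
			continue;
		dma_unmap_page(m->dev, m->dma_addrs[i], PAGE_SIZE,
			       DMA_BIDIRECTIONAL);
		m->dma_addrs[i] = 0;
	}

	mutex_unlock(&m->pt_lock);
	return true;
}

static const struct mmu_interval_notifier_ops my_mirror_ops = {
	.invalidate = my_mirror_invalidate,
};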

> So as long as we are correct and do not adversely affect core mm, If
> the gpu performance (for whatever reason) is severely hampered if
> large gpu page-table-entries are not used, couldn't this be considered
> left to the driver?

Please use the APIs properly. We are trying to improve the DMA API to
better support HMM users, and doing unnecessary things like this in
drivers is only harmful to that kind of consolidation.

There is nothing stopping getting large GPU page table entries for
large CPU page table entries.

> And a related question. What about THP pages? OK to set up a single
> dma-mapping to those?

Yes, THP is still a page and dma_map_page() will map it.
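
To illustrate, a hedged sketch of the per-page mapping loop, again
using the made-up my_mirror naming and assuming the faulted range is
aligned to the orders it encounters: hmm_range_fault() reports the CPU
mapping order via hmm_pfn_to_map_order(), so a THP can be mapped with
a single dma_map_page() call and turned into one large device PTE,
without any dma_map_sg.

static int my_mirror_map_range(struct my_mirror *m, struct hmm_range *range)
{
	unsigned long npages = (range->end - range->start) >> PAGE_SHIFT;
	unsigned long i = 0;

	while (i < npages) {
		unsigned long hmm_pfn = range->hmm_pfns[i];
		struct page *page = hmm_pfn_to_page(hmm_pfn);
		/* 0 for a 4K PTE, 9 for a PMD-mapped 2M THP on x86 */
		unsigned int order = hmm_pfn_to_map_order(hmm_pfn);
		size_t size = PAGE_SIZE << order;
		dma_addr_t dma;

		if (!(hmm_pfn & HMM_PFN_VALID))
			return -EFAULT;

		/*
		 * Assumes range->start and npages are aligned to any order
		 * reported here (e.g. a 2M-aligned chunk); a real driver
		 * would clamp the order at the range boundaries.
		 */
		dma = dma_map_page(m->dev, page, 0, size, DMA_BIDIRECTIONAL);
		if (dma_mapping_error(m->dev, dma))
			return -EIO;

		/*
		 * One large device PTE (or 1 << order small ones) covering
		 * 'size' bytes at 'dma'.  Device-specific, omitted.
		 */

		i += 1UL << order;
	}
	return 0;
}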
 
> > > That's why finer-granularity mmu_interval notifiers might be
> > > beneficial
> > > (and then cached for future re-use of the same prefault range).
> > > This
> > > leads me to the next question:
> > 
> > It is not the design, please don't invent crazy special Intel things
> > on top of hmm_range_fault.
> 
> For the record, this is not a "crazy special Intel" invention. It's the
> way all GPU implementations do this so far.

"all GPU implementations" you mean AMD, and AMD predates alot of the
modern versions of this infrastructure IIRC.

> > Why would a prefetch have anything to do with a VMA? Ie your app
> > calls
> > malloc() and gets a little allocation out of a giant mmap() arena -
> > you want to prefault the entire arena? Does that really make any
> > sense?
> 
> Personally, no it doesn't. I'd rather use some sort of fixed-size
> chunk. But to rephrase, the question was more into the strong "drivers
> should not be aware of the cpu mm vma structures" comment. 

But this is essentially why - there is nothing useful the driver can
possibly learn from the CPU VMA to drive hmm_range_fault().
hmm_range_fault() already has to walk the VMAs. If someday something is
actually needed, it needs to be integrated in a general way, not by
having the driver touch VMAs directly.

Jason

Thread overview: 126+ messages
2024-01-17 22:12 [PATCH 00/23] XeKmd basic SVM support Oak Zeng
2024-01-17 22:12 ` [PATCH 01/23] drm/xe/svm: Add SVM document Oak Zeng
2024-01-17 22:12 ` [PATCH 02/23] drm/xe/svm: Add svm key data structures Oak Zeng
2024-01-17 22:12 ` [PATCH 03/23] drm/xe/svm: create xe svm during vm creation Oak Zeng
2024-01-17 22:12 ` [PATCH 04/23] drm/xe/svm: Trace svm creation Oak Zeng
2024-01-17 22:12 ` [PATCH 05/23] drm/xe/svm: add helper to retrieve svm range from address Oak Zeng
2024-01-17 22:12 ` [PATCH 06/23] drm/xe/svm: Introduce a helper to build sg table from hmm range Oak Zeng
2024-04-05  0:39   ` Jason Gunthorpe
2024-04-05  3:33     ` Zeng, Oak
2024-04-05 12:37       ` Jason Gunthorpe
2024-04-05 16:42         ` Zeng, Oak
2024-04-05 18:02           ` Jason Gunthorpe
2024-04-09 16:45             ` Zeng, Oak
2024-04-09 17:24               ` Jason Gunthorpe
2024-04-23 21:17                 ` Zeng, Oak
2024-04-24  2:31                   ` Matthew Brost
2024-04-24 13:57                     ` Jason Gunthorpe
2024-04-24 16:35                       ` Matthew Brost
2024-04-24 16:44                         ` Jason Gunthorpe
2024-04-24 16:56                           ` Matthew Brost
2024-04-24 17:48                             ` Jason Gunthorpe
2024-04-24 13:48                   ` Jason Gunthorpe
2024-04-24 23:59                     ` Zeng, Oak
2024-04-25  1:05                       ` Jason Gunthorpe
2024-04-26  9:55                         ` Thomas Hellström
2024-04-26 12:00                           ` Jason Gunthorpe
2024-04-26 14:49                             ` Thomas Hellström
2024-04-26 16:35                               ` Jason Gunthorpe [this message]
2024-04-29  8:25                                 ` Thomas Hellström
2024-04-30 17:30                                   ` Jason Gunthorpe
2024-04-30 18:57                                     ` Daniel Vetter
2024-05-01  0:09                                       ` Jason Gunthorpe
2024-05-02  8:04                                         ` Daniel Vetter
2024-05-02  9:11                                           ` Thomas Hellström
2024-05-02 12:46                                             ` Jason Gunthorpe
2024-05-02 15:01                                               ` Thomas Hellström
2024-05-02 19:25                                                 ` Zeng, Oak
2024-05-03 13:37                                                   ` Jason Gunthorpe
2024-05-03 14:43                                                     ` Zeng, Oak
2024-05-03 16:28                                                       ` Jason Gunthorpe
2024-05-03 20:29                                                         ` Zeng, Oak
2024-05-04  1:03                                                           ` Dave Airlie
2024-05-06 13:04                                                             ` Daniel Vetter
2024-05-06 23:50                                                               ` Matthew Brost
2024-05-07 11:56                                                                 ` Jason Gunthorpe
2024-05-06 13:33                                                           ` Jason Gunthorpe
2024-04-09 17:33               ` Matthew Brost
2024-01-17 22:12 ` [PATCH 07/23] drm/xe/svm: Add helper for binding hmm range to gpu Oak Zeng
2024-01-17 22:12 ` [PATCH 08/23] drm/xe/svm: Add helper to invalidate svm range from GPU Oak Zeng
2024-01-17 22:12 ` [PATCH 09/23] drm/xe/svm: Remap and provide memmap backing for GPU vram Oak Zeng
2024-01-17 22:12 ` [PATCH 10/23] drm/xe/svm: Introduce svm migration function Oak Zeng
2024-01-17 22:12 ` [PATCH 11/23] drm/xe/svm: implement functions to allocate and free device memory Oak Zeng
2024-01-17 22:12 ` [PATCH 12/23] drm/xe/svm: Trace buddy block allocation and free Oak Zeng
2024-01-17 22:12 ` [PATCH 13/23] drm/xe/svm: Handle CPU page fault Oak Zeng
2024-01-17 22:12 ` [PATCH 14/23] drm/xe/svm: trace svm range migration Oak Zeng
2024-01-17 22:12 ` [PATCH 15/23] drm/xe/svm: Implement functions to register and unregister mmu notifier Oak Zeng
2024-01-17 22:12 ` [PATCH 16/23] drm/xe/svm: Implement the mmu notifier range invalidate callback Oak Zeng
2024-01-17 22:12 ` [PATCH 17/23] drm/xe/svm: clean up svm range during process exit Oak Zeng
2024-01-17 22:12 ` [PATCH 18/23] drm/xe/svm: Move a few structures to xe_gt.h Oak Zeng
2024-01-17 22:12 ` [PATCH 19/23] drm/xe/svm: migrate svm range to vram Oak Zeng
2024-01-17 22:12 ` [PATCH 20/23] drm/xe/svm: Populate svm range Oak Zeng
2024-01-17 22:12 ` [PATCH 21/23] drm/xe/svm: GPU page fault support Oak Zeng
2024-01-23  2:06   ` Welty, Brian
2024-01-23  3:09     ` Zeng, Oak
2024-01-23  3:21       ` Making drm_gpuvm work across gpu devices Zeng, Oak
2024-01-23 11:13         ` Christian König
2024-01-23 19:37           ` Zeng, Oak
2024-01-23 20:17             ` Felix Kuehling
2024-01-25  1:39               ` Zeng, Oak
2024-01-23 23:56             ` Danilo Krummrich
2024-01-24  3:57               ` Zeng, Oak
2024-01-24  4:14                 ` Zeng, Oak
2024-01-24  6:48                   ` Christian König
2024-01-25 22:13                 ` Danilo Krummrich
2024-01-24  8:33             ` Christian König
2024-01-25  1:17               ` Zeng, Oak
2024-01-25  1:25                 ` David Airlie
2024-01-25  5:25                   ` Zeng, Oak
2024-01-26 10:09                     ` Christian König
2024-01-26 20:13                       ` Zeng, Oak
2024-01-29 10:10                         ` Christian König
2024-01-29 20:09                           ` Zeng, Oak
2024-01-25 11:00                 ` 回复:Making " 周春明(日月)
2024-01-25 17:00                   ` Zeng, Oak
2024-01-25 17:15                 ` Making " Felix Kuehling
2024-01-25 18:37                   ` Zeng, Oak
2024-01-26 13:23                     ` Christian König
2024-01-25 16:42               ` Zeng, Oak
2024-01-25 18:32               ` Daniel Vetter
2024-01-25 21:02                 ` Zeng, Oak
2024-01-26  8:21                 ` Thomas Hellström
2024-01-26 12:52                   ` Christian König
2024-01-27  2:21                     ` Zeng, Oak
2024-01-29 10:19                       ` Christian König
2024-01-30  0:21                         ` Zeng, Oak
2024-01-30  8:39                           ` Christian König
2024-01-30 22:29                             ` Zeng, Oak
2024-01-30 23:12                               ` David Airlie
2024-01-31  9:15                                 ` Daniel Vetter
2024-01-31 20:17                                   ` Zeng, Oak
2024-01-31 20:59                                     ` Zeng, Oak
2024-02-01  8:52                                     ` Christian König
2024-02-29 18:22                                       ` Zeng, Oak
2024-03-08  4:43                                         ` Zeng, Oak
2024-03-08 10:07                                           ` Christian König
2024-01-30  8:43                           ` Thomas Hellström
2024-01-29 15:03                 ` Felix Kuehling
2024-01-29 15:33                   ` Christian König
2024-01-29 16:24                     ` Felix Kuehling
2024-01-29 16:28                       ` Christian König
2024-01-29 17:52                         ` Felix Kuehling
2024-01-29 19:03                           ` Christian König
2024-01-29 20:24                             ` Felix Kuehling
2024-02-23 20:12               ` Zeng, Oak
2024-02-27  6:54                 ` Christian König
2024-02-27 15:58                   ` Zeng, Oak
2024-02-28 19:51                     ` Zeng, Oak
2024-02-29  9:41                       ` Christian König
2024-02-29 16:05                         ` Zeng, Oak
2024-02-29 17:12                         ` Thomas Hellström
2024-03-01  7:01                           ` Christian König
2024-01-17 22:12 ` [PATCH 22/23] drm/xe/svm: Add DRM_XE_SVM kernel config entry Oak Zeng
2024-01-17 22:12 ` [PATCH 23/23] drm/xe/svm: Add svm memory hints interface Oak Zeng
2024-01-18  2:45 ` ✓ CI.Patch_applied: success for XeKmd basic SVM support Patchwork
2024-01-18  2:46 ` ✗ CI.checkpatch: warning " Patchwork
2024-01-18  2:46 ` ✗ CI.KUnit: failure " Patchwork
