intel-xe.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: "Zeng, Oak" <oak.zeng@intel.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: "dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
	"Brost, Matthew" <matthew.brost@intel.com>,
	"Thomas.Hellstrom@linux.intel.com"
	<Thomas.Hellstrom@linux.intel.com>,
	"Welty, Brian" <brian.welty@intel.com>,
	"Ghimiray, Himal Prasad" <himal.prasad.ghimiray@intel.com>,
	"Bommu, Krishnaiah" <krishnaiah.bommu@intel.com>,
	"Vishwanathapura,
	Niranjana" <niranjana.vishwanathapura@intel.com>,
	Leon Romanovsky <leon@kernel.org>
Subject: RE: [PATCH 06/23] drm/xe/svm: Introduce a helper to build sg table from hmm range
Date: Tue, 23 Apr 2024 21:17:03 +0000	[thread overview]
Message-ID: <SA1PR11MB6991EDB4351D99B4E76EBC2992112@SA1PR11MB6991.namprd11.prod.outlook.com> (raw)
In-Reply-To: <20240409172418.GA5383@nvidia.com>

Hi Jason,

Sorry for a late reply. I have been working on a v2 of this series: https://patchwork.freedesktop.org/series/132229/. This version addressed some of your concerns, such as removing the global character device, removing svm process concept (need further clean up per Matt's feedback).

But the main concern you raised is not addressed yet. I need to further make sure I understand your concerns, See inline.



> -----Original Message-----
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, April 9, 2024 1:24 PM
> To: Zeng, Oak <oak.zeng@intel.com>
> Cc: dri-devel@lists.freedesktop.org; intel-xe@lists.freedesktop.org; Brost, Matthew
> <matthew.brost@intel.com>; Thomas.Hellstrom@linux.intel.com; Welty, Brian
> <brian.welty@intel.com>; Ghimiray, Himal Prasad
> <himal.prasad.ghimiray@intel.com>; Bommu, Krishnaiah
> <krishnaiah.bommu@intel.com>; Vishwanathapura, Niranjana
> <niranjana.vishwanathapura@intel.com>; Leon Romanovsky <leon@kernel.org>
> Subject: Re: [PATCH 06/23] drm/xe/svm: Introduce a helper to build sg table from
> hmm range
> 
> On Tue, Apr 09, 2024 at 04:45:22PM +0000, Zeng, Oak wrote:
> 
> > > I saw, I am saying this should not be done. You cannot unmap bits of
> > > a sgl mapping if an invalidation comes in.
> >
> > You are right, if we register a huge mmu interval notifier to cover
> > the whole address space, then we should use dma map/unmap pages to
> > map bits of the address space. We will explore this approach.
> >
> > Right now, in xe driver, mmu interval notifier is dynamically
> > registered with small address range. We map/unmap the whole small
> > address ranges each time. So functionally it still works. But it
> > might not be as performant as the method you said.
> 
> Please don't do this, it is not how hmm_range_fault() should be
> used.
> 
> It is intended to be page by page and there is no API surface
> available to safely try to construct covering ranges. Drivers
> definately should not try to invent such a thing.

I need your help to understand this comment. Our gpu mirrors the whole CPU virtual address space. It is the first design pattern in your previous reply (entire exclusive owner of a single device private page table and fully mirrors the mm page table into the device table.) 

What do you mean by "page by page"/" no API surface available to safely try to construct covering ranges"? As I understand it, hmm_range_fault take a virtual address range (defined in hmm_range struct), and walk cpu page table in this range. It is a range based API.

From your previous reply ("So I find it a quite strange that this RFC is creating VMA's, notifiers and ranges on the fly "), it seems you are concerning why we are creating vma and register mmu interval notifier on the fly. Let me try to explain it. Xe_vma is a very fundamental concept in xe driver. The gpu page table update, invalidation are all vma-based. This concept exists before this svm work. For svm, we create a 2M (the size is user configurable) vma during gpu page fault handler and register this 2M range to mmu interval notifier.

Now I try to figure out if we don't create vma, what can we do? We can map one page (which contains the gpu fault address) to gpu page table. But that doesn't work for us because the GPU cache and TLB would not be performant for 4K page each time. One way to think of the vma is a chunk size which is good for GPU HW performance.

And the mmu notifier... if we don't register the mmu notifier on the fly, do we register one mmu notifier to cover the whole CPU virtual address space (which would be huge, e.g., 0~2^56 on a 57 bit machine, if we have half half user space kernel space split)? That won't be performant as well because for any address range that is unmapped from cpu program, but even if they are never touched by GPU, gpu driver still got a invalidation callback. In our approach, we only register a mmu notifier for address range that we know gpu would touch it. 

> 
> > > Please don't use sg list at all for this.
> >
> > As explained, we use sg list for device private pages so we can
> > re-used the gpu page table update codes.
> 
> I'm asking you not to use SGL lists for that too. SGL lists are not
> generic data structures to hold DMA lists.

Matt mentioned to use drm_buddy_block. I will see how that works out.

> 
> > > This is not what I'm talking about. The GPU VMA is bound to a specific
> > > MM VA, it should not be created on demand.
> >
> > Today we have two places where we create gpu vma: 1) create gpu vma
> > during a vm_bind ioctl 2) create gpu vma during a page fault of the
> > system allocator range (this will be in v2 of this series).
> 
> Don't do 2.

As said, we will try the approach of one gigantic gpu vma with N page table states. We will create page table states in page fault handling. But this is only planned for stage 2. 

> 
> > I suspect something dynamic is still necessary, either a vma or a
> > page table state (1 vma but many page table state created
> > dynamically, as planned in our stage 2).
> 
> I'm expecting you'd populate the page table memory on demand.

We do populate gpu page table on demand. When gpu access a virtual address, we populate the gpu page table.


> 
> > The reason is, we still need some gpu corresponding structure to
> > match the cpu vm_area_struct.
> 
> Definately not.

See explanation above.

> 
> > For example, when gpu page fault happens, you look
> > up the cpu vm_area_struct for the fault address and create a
> > corresponding state/struct. And people can have as many cpu
> > vm_area_struct as they want.
> 
> No you don't.

See explains above. I need your help to understand how we can do it without a vma (or chunk). One thing GPU driver is different from RDMA driver is, RDMA doesn't have device private memory so no migration. It only need to dma-map the system memory pages and use them to fill RDMA page table. so RDMA don't need another memory manager such as our buddy. RDMA only deal with system memory which is completely struct page based management. Page by page make 100 % sense for RDMA. 

But for gpu, we need a way to use device local memory efficiently. This is the main reason we have vma/chunk concept.

Thanks,
Oak


> 
> You call hmm_range_fault() and it does everything for you. A driver
> should never touch CPU VMAs and must not be aware of them in any away.
> 
> Jason

  reply	other threads:[~2024-04-23 21:17 UTC|newest]

Thread overview: 126+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-17 22:12 [PATCH 00/23] XeKmd basic SVM support Oak Zeng
2024-01-17 22:12 ` [PATCH 01/23] drm/xe/svm: Add SVM document Oak Zeng
2024-01-17 22:12 ` [PATCH 02/23] drm/xe/svm: Add svm key data structures Oak Zeng
2024-01-17 22:12 ` [PATCH 03/23] drm/xe/svm: create xe svm during vm creation Oak Zeng
2024-01-17 22:12 ` [PATCH 04/23] drm/xe/svm: Trace svm creation Oak Zeng
2024-01-17 22:12 ` [PATCH 05/23] drm/xe/svm: add helper to retrieve svm range from address Oak Zeng
2024-01-17 22:12 ` [PATCH 06/23] drm/xe/svm: Introduce a helper to build sg table from hmm range Oak Zeng
2024-04-05  0:39   ` Jason Gunthorpe
2024-04-05  3:33     ` Zeng, Oak
2024-04-05 12:37       ` Jason Gunthorpe
2024-04-05 16:42         ` Zeng, Oak
2024-04-05 18:02           ` Jason Gunthorpe
2024-04-09 16:45             ` Zeng, Oak
2024-04-09 17:24               ` Jason Gunthorpe
2024-04-23 21:17                 ` Zeng, Oak [this message]
2024-04-24  2:31                   ` Matthew Brost
2024-04-24 13:57                     ` Jason Gunthorpe
2024-04-24 16:35                       ` Matthew Brost
2024-04-24 16:44                         ` Jason Gunthorpe
2024-04-24 16:56                           ` Matthew Brost
2024-04-24 17:48                             ` Jason Gunthorpe
2024-04-24 13:48                   ` Jason Gunthorpe
2024-04-24 23:59                     ` Zeng, Oak
2024-04-25  1:05                       ` Jason Gunthorpe
2024-04-26  9:55                         ` Thomas Hellström
2024-04-26 12:00                           ` Jason Gunthorpe
2024-04-26 14:49                             ` Thomas Hellström
2024-04-26 16:35                               ` Jason Gunthorpe
2024-04-29  8:25                                 ` Thomas Hellström
2024-04-30 17:30                                   ` Jason Gunthorpe
2024-04-30 18:57                                     ` Daniel Vetter
2024-05-01  0:09                                       ` Jason Gunthorpe
2024-05-02  8:04                                         ` Daniel Vetter
2024-05-02  9:11                                           ` Thomas Hellström
2024-05-02 12:46                                             ` Jason Gunthorpe
2024-05-02 15:01                                               ` Thomas Hellström
2024-05-02 19:25                                                 ` Zeng, Oak
2024-05-03 13:37                                                   ` Jason Gunthorpe
2024-05-03 14:43                                                     ` Zeng, Oak
2024-05-03 16:28                                                       ` Jason Gunthorpe
2024-05-03 20:29                                                         ` Zeng, Oak
2024-05-04  1:03                                                           ` Dave Airlie
2024-05-06 13:04                                                             ` Daniel Vetter
2024-05-06 23:50                                                               ` Matthew Brost
2024-05-07 11:56                                                                 ` Jason Gunthorpe
2024-05-06 13:33                                                           ` Jason Gunthorpe
2024-04-09 17:33               ` Matthew Brost
2024-01-17 22:12 ` [PATCH 07/23] drm/xe/svm: Add helper for binding hmm range to gpu Oak Zeng
2024-01-17 22:12 ` [PATCH 08/23] drm/xe/svm: Add helper to invalidate svm range from GPU Oak Zeng
2024-01-17 22:12 ` [PATCH 09/23] drm/xe/svm: Remap and provide memmap backing for GPU vram Oak Zeng
2024-01-17 22:12 ` [PATCH 10/23] drm/xe/svm: Introduce svm migration function Oak Zeng
2024-01-17 22:12 ` [PATCH 11/23] drm/xe/svm: implement functions to allocate and free device memory Oak Zeng
2024-01-17 22:12 ` [PATCH 12/23] drm/xe/svm: Trace buddy block allocation and free Oak Zeng
2024-01-17 22:12 ` [PATCH 13/23] drm/xe/svm: Handle CPU page fault Oak Zeng
2024-01-17 22:12 ` [PATCH 14/23] drm/xe/svm: trace svm range migration Oak Zeng
2024-01-17 22:12 ` [PATCH 15/23] drm/xe/svm: Implement functions to register and unregister mmu notifier Oak Zeng
2024-01-17 22:12 ` [PATCH 16/23] drm/xe/svm: Implement the mmu notifier range invalidate callback Oak Zeng
2024-01-17 22:12 ` [PATCH 17/23] drm/xe/svm: clean up svm range during process exit Oak Zeng
2024-01-17 22:12 ` [PATCH 18/23] drm/xe/svm: Move a few structures to xe_gt.h Oak Zeng
2024-01-17 22:12 ` [PATCH 19/23] drm/xe/svm: migrate svm range to vram Oak Zeng
2024-01-17 22:12 ` [PATCH 20/23] drm/xe/svm: Populate svm range Oak Zeng
2024-01-17 22:12 ` [PATCH 21/23] drm/xe/svm: GPU page fault support Oak Zeng
2024-01-23  2:06   ` Welty, Brian
2024-01-23  3:09     ` Zeng, Oak
2024-01-23  3:21       ` Making drm_gpuvm work across gpu devices Zeng, Oak
2024-01-23 11:13         ` Christian König
2024-01-23 19:37           ` Zeng, Oak
2024-01-23 20:17             ` Felix Kuehling
2024-01-25  1:39               ` Zeng, Oak
2024-01-23 23:56             ` Danilo Krummrich
2024-01-24  3:57               ` Zeng, Oak
2024-01-24  4:14                 ` Zeng, Oak
2024-01-24  6:48                   ` Christian König
2024-01-25 22:13                 ` Danilo Krummrich
2024-01-24  8:33             ` Christian König
2024-01-25  1:17               ` Zeng, Oak
2024-01-25  1:25                 ` David Airlie
2024-01-25  5:25                   ` Zeng, Oak
2024-01-26 10:09                     ` Christian König
2024-01-26 20:13                       ` Zeng, Oak
2024-01-29 10:10                         ` Christian König
2024-01-29 20:09                           ` Zeng, Oak
2024-01-25 11:00                 ` 回复:Making " 周春明(日月)
2024-01-25 17:00                   ` Zeng, Oak
2024-01-25 17:15                 ` Making " Felix Kuehling
2024-01-25 18:37                   ` Zeng, Oak
2024-01-26 13:23                     ` Christian König
2024-01-25 16:42               ` Zeng, Oak
2024-01-25 18:32               ` Daniel Vetter
2024-01-25 21:02                 ` Zeng, Oak
2024-01-26  8:21                 ` Thomas Hellström
2024-01-26 12:52                   ` Christian König
2024-01-27  2:21                     ` Zeng, Oak
2024-01-29 10:19                       ` Christian König
2024-01-30  0:21                         ` Zeng, Oak
2024-01-30  8:39                           ` Christian König
2024-01-30 22:29                             ` Zeng, Oak
2024-01-30 23:12                               ` David Airlie
2024-01-31  9:15                                 ` Daniel Vetter
2024-01-31 20:17                                   ` Zeng, Oak
2024-01-31 20:59                                     ` Zeng, Oak
2024-02-01  8:52                                     ` Christian König
2024-02-29 18:22                                       ` Zeng, Oak
2024-03-08  4:43                                         ` Zeng, Oak
2024-03-08 10:07                                           ` Christian König
2024-01-30  8:43                           ` Thomas Hellström
2024-01-29 15:03                 ` Felix Kuehling
2024-01-29 15:33                   ` Christian König
2024-01-29 16:24                     ` Felix Kuehling
2024-01-29 16:28                       ` Christian König
2024-01-29 17:52                         ` Felix Kuehling
2024-01-29 19:03                           ` Christian König
2024-01-29 20:24                             ` Felix Kuehling
2024-02-23 20:12               ` Zeng, Oak
2024-02-27  6:54                 ` Christian König
2024-02-27 15:58                   ` Zeng, Oak
2024-02-28 19:51                     ` Zeng, Oak
2024-02-29  9:41                       ` Christian König
2024-02-29 16:05                         ` Zeng, Oak
2024-02-29 17:12                         ` Thomas Hellström
2024-03-01  7:01                           ` Christian König
2024-01-17 22:12 ` [PATCH 22/23] drm/xe/svm: Add DRM_XE_SVM kernel config entry Oak Zeng
2024-01-17 22:12 ` [PATCH 23/23] drm/xe/svm: Add svm memory hints interface Oak Zeng
2024-01-18  2:45 ` ✓ CI.Patch_applied: success for XeKmd basic SVM support Patchwork
2024-01-18  2:46 ` ✗ CI.checkpatch: warning " Patchwork
2024-01-18  2:46 ` ✗ CI.KUnit: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=SA1PR11MB6991EDB4351D99B4E76EBC2992112@SA1PR11MB6991.namprd11.prod.outlook.com \
    --to=oak.zeng@intel.com \
    --cc=Thomas.Hellstrom@linux.intel.com \
    --cc=brian.welty@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=himal.prasad.ghimiray@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=jgg@nvidia.com \
    --cc=krishnaiah.bommu@intel.com \
    --cc=leon@kernel.org \
    --cc=matthew.brost@intel.com \
    --cc=niranjana.vishwanathapura@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).