From: "Zeng, Oak" <oak.zeng@intel.com>
To: "Christian König" <christian.koenig@amd.com>,
	"David Airlie" <airlied@redhat.com>,
	"jglisse@redhat.com" <jglisse@redhat.com>,
	"rcampbell@nvidia.com" <rcampbell@nvidia.com>,
	"apopple@nvidia.com" <apopple@nvidia.com>
Cc: "Winiarski, Michal" <michal.winiarski@intel.com>,
	Felix Kuehling <felix.kuehling@amd.com>,
	"Shah,  Ankur N" <ankur.n.shah@intel.com>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	Daniel Vetter <daniel@ffwll.ch>,
	"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
	Danilo Krummrich <dakr@redhat.com>
Subject: RE: Making drm_gpuvm work across gpu devices
Date: Mon, 29 Jan 2024 20:09:24 +0000
Message-ID: <SA1PR11MB6991DAF836BEC82564024956927E2@SA1PR11MB6991.namprd11.prod.outlook.com>
In-Reply-To: <ac844c7e-dd3b-426e-bfa4-87dc8aeaffcf@amd.com>


Hi Christian,

Even though this email thread was started to discuss a shared virtual address space between multiple GPU devices, I eventually realized that you don't even agree with a shared virtual address space between a CPU program and a GPU program. So let's set multiple GPU devices aside for now. I will try to explain the shared address space between the CPU and one GPU.

HMM was designed to solve the GPU programmability problem with one fundamental assumption: the GPU program shares the same virtual address space as the CPU program. For example, with HMM any CPU pointer (malloc'ed memory, stack variables, globals) can be used directly in your GPU shader program. Are you against this design goal? HMM is already part of the Linux core MM and Linus approved this design. CC'ed Jérôme.

Here is an example of how an application can use the system allocator (HMM); I copied it from https://developer.nvidia.com/blog/simplifying-gpu-application-development-with-heterogeneous-memory-management/. CC'ed a few Nvidia folks.

#include <stdio.h>
#include <stdlib.h>

void sortfile(FILE* fp, int N) {
  char* data = (char*)malloc(N);

  fread(data, 1, N, fp);
  // Launch the GPU sort kernel directly on the malloc'ed pointer;
  // no userptr ioctl and no vm_bind are needed.
  qsort<<<...>>>(data, N, 1, cmp);
  cudaDeviceSynchronize();

  use_data(data);
  free(data);
}

As you can see, the malloc'ed pointer is used directly in the GPU program: no userptr ioctl, no vm_bind. This is the model Intel wants to support as well, alongside AMD and Nvidia.

Lastly, nouveau in the kernel already supports HMM and the system allocator, including a shared virtual address space between CPU and GPU programs. All of that code is already merged upstream.
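To give a flavor of what that kernel-side support looks like, below is a rough sketch of the pattern HMM-based drivers follow to mirror the CPU page table into the GPU. The hmm_range_fault / mmu_interval_notifier calls are the real core-MM APIs, but the my_mirror structure and the function name are made up for illustration; this is not code lifted from nouveau or from our xe series.

#include <linux/errno.h>
#include <linux/hmm.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>

/* Hypothetical per-range mirror object; real drivers keep something similar. */
struct my_mirror {
	struct mmu_interval_notifier notifier;
	struct mm_struct *mm;
};

/* Snapshot the CPU pages backing [start, end) so the GPU can map the same VAs. */
static int my_mirror_populate(struct my_mirror *m, unsigned long start,
			      unsigned long end, unsigned long *pfns)
{
	struct hmm_range range = {
		.notifier = &m->notifier,
		.start = start,
		.end = end,
		.hmm_pfns = pfns,
		.default_flags = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE,
	};
	int ret;

	while (true) {
		range.notifier_seq = mmu_interval_read_begin(&m->notifier);
		mmap_read_lock(m->mm);
		ret = hmm_range_fault(&range);
		mmap_read_unlock(m->mm);
		if (ret == -EBUSY)
			continue;	/* CPU mapping changed, take a new snapshot */
		if (ret)
			return ret;
		/*
		 * Here the driver takes its own notifier lock and writes the GPU
		 * page tables from pfns[]; if the interval notifier fired in the
		 * meantime the snapshot is stale and we start over.
		 */
		if (!mmu_interval_read_retry(&m->notifier, range.notifier_seq))
			return 0;
	}
}

A routine like this is what a GPU page-fault handler ends up calling on the faulting address range, which is why plain malloc'ed pointers can work on the GPU without any userptr or vm_bind call.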


See also my inline comments on your questions below.

I will address your other email separately.

Regards,
Oak

From: Christian König <christian.koenig@amd.com>
Sent: Monday, January 29, 2024 5:11 AM
To: Zeng, Oak <oak.zeng@intel.com>; David Airlie <airlied@redhat.com>
Cc: Ghimiray, Himal Prasad <himal.prasad.ghimiray@intel.com>; Thomas.Hellstrom@linux.intel.com; Winiarski, Michal <michal.winiarski@intel.com>; Felix Kuehling <felix.kuehling@amd.com>; Welty, Brian <brian.welty@intel.com>; Shah, Ankur N <ankur.n.shah@intel.com>; dri-devel@lists.freedesktop.org; intel-xe@lists.freedesktop.org; Gupta, saurabhg <saurabhg.gupta@intel.com>; Danilo Krummrich <dakr@redhat.com>; Daniel Vetter <daniel@ffwll.ch>; Brost, Matthew <matthew.brost@intel.com>; Bommu, Krishnaiah <krishnaiah.bommu@intel.com>; Vishwanathapura, Niranjana <niranjana.vishwanathapura@intel.com>
Subject: Re: Making drm_gpuvm work across gpu devices

On 26.01.24 at 21:13, Zeng, Oak wrote:

-----Original Message-----
From: Christian König <christian.koenig@amd.com>
Sent: Friday, January 26, 2024 5:10 AM
To: Zeng, Oak <oak.zeng@intel.com>; David Airlie <airlied@redhat.com>
Cc: Ghimiray, Himal Prasad <himal.prasad.ghimiray@intel.com>; Thomas.Hellstrom@linux.intel.com; Winiarski, Michal <michal.winiarski@intel.com>; Felix Kuehling <felix.kuehling@amd.com>; Welty, Brian <brian.welty@intel.com>; Shah, Ankur N <ankur.n.shah@intel.com>; dri-devel@lists.freedesktop.org; intel-xe@lists.freedesktop.org; Gupta, saurabhg <saurabhg.gupta@intel.com>; Danilo Krummrich <dakr@redhat.com>; Daniel Vetter <daniel@ffwll.ch>; Brost, Matthew <matthew.brost@intel.com>; Bommu, Krishnaiah <krishnaiah.bommu@intel.com>; Vishwanathapura, Niranjana <niranjana.vishwanathapura@intel.com>
Subject: Re: Making drm_gpuvm work across gpu devices



Hi Oak,



you can still use SVM, but it should not be a design criteria for the kernel UAPI. In other words the UAPI should be designed in such a way that the GPU virtual address can be equal to the CPU virtual address of a buffer, but can also be different to support use cases where this isn't the case.



Terminology:

SVM: any technology that achieves a shared virtual address space between the CPU and devices. The virtual address space can be managed by user space or by kernel space. Intel implemented an SVM based on the BO-centric GPU driver (gem-create, vm-bind), where the virtual address space is managed by the UMD.

System allocator: another way of implementing SVM. The user just uses malloc'ed memory for GPU submission; the virtual address space is managed by the Linux core MM. In practice, we leverage HMM to implement the system allocator.

This article describes the details of all these models: https://developer.nvidia.com/blog/simplifying-gpu-application-development-with-heterogeneous-memory-management/

Our programming model allows mixed use of the system allocator (even though system allocator is ) and traditional vm_bind (where the CPU address can != the GPU address). Let me re-post the pseudo code:



 1. Fd0 = open("/dev/dri/render0")
 2. Fd1 = open("/dev/dri/render1")
 3. Fd3 = open("/dev/dri/xe-svm")
 4. Gpu_Vm0 = xe_vm_create(fd0)
 5. Gpu_Vm1 = xe_vm_create(fd1)
 6. Queue0 = xe_exec_queue_create(fd0, gpu_vm0)
 7. Queue1 = xe_exec_queue_create(fd1, gpu_vm1)
 8. ptr = malloc()
 9. bo = xe_bo_create(fd0)
 10. Vm_bind(bo, gpu_vm0, va) // va comes from the UMD; the CPU can access the bo at the same or a different VA. It is the UMD's responsibility that va doesn't conflict with malloc'ed pointers.
 11. Xe_exec(queue0, ptr) // submit a GPU job that uses ptr, on card0
 12. Xe_exec(queue1, ptr) // submit a GPU job that uses ptr, on card1
 13. Xe_exec(queue0, va)  // submit a GPU job that uses va, on card0



In the above code, the va used in vm_bind (line 10; vm_bind is Intel's API to bind an object to a va for GPU access) can be different from the CPU address at which the CPU accesses the same object. But whenever the user uses a malloc'ed ptr for GPU submission (lines 11 and 12, the so-called system allocator), it implies that the CPU and the GPUs access it through the same ptr.



In the above vm_bind, it is the user's/UMD's responsibility to guarantee that the vm_bind va doesn't conflict with any malloc'ed ptr; otherwise it is treated as a programming error.
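As an aside, one common way for a UMD to uphold that guarantee is to reserve the whole vm_bind VA region from the CPU allocator up front with an anonymous PROT_NONE mapping, so that malloc and later mmap calls can never hand out addresses inside it. The sketch below only illustrates that general technique; it is not necessarily what our UMD does.

#include <stdio.h>
#include <sys/mman.h>

/*
 * Reserve a chunk of the process address space for GPU-only VAs.
 * PROT_NONE + MAP_NORESERVE consumes no memory, but the C library's malloc
 * and any later mmap() will stay out of this range, so vm_bind VAs carved
 * out of it cannot collide with malloc'ed pointers.
 */
static void *reserve_gpu_va_range(size_t size)
{
	void *base = mmap(NULL, size, PROT_NONE,
			  MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	return base == MAP_FAILED ? NULL : base;
}

int main(void)
{
	/* 1 GiB of VA reserved for vm_bind use; the size here is arbitrary. */
	void *gpu_va_base = reserve_gpu_va_range(1ull << 30);

	if (!gpu_va_base)
		return 1;
	/* The UMD would now sub-allocate its vm_bind VAs from gpu_va_base. */
	printf("GPU VA range reserved at %p\n", gpu_va_base);
	return 0;
}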



I think this design still meets your design restrictions.

Well why do you need this "Fd3 = open("/dev/dri/xe-svm")" ?

As far as I see fd3 isn't used anywhere.

We use fd3 for the memory-hint ioctls (I didn't show them in the program above). In the system-allocator picture, a memory hint applies to a virtual address range of the process, not to one specific GPU device, so we can't use the per-device fds (fd0, fd1) for this purpose. For example, you can set the preferred memory location of an address range to be GPU device1's memory.
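For comparison only: CUDA exposes the same idea as a hint attached to a process virtual address range rather than to a device file descriptor. The snippet below uses CUDA's public cudaMemAdvise API purely to illustrate the shape of such a hint; the xe memory-hint ioctls proposed in this series are a separate, driver-specific interface and are not shown here.

#include <cuda_runtime.h>
#include <stdlib.h>

int main(void)
{
	size_t size = 1 << 20;
	/* Plain system-allocated memory; on HMM-capable systems with full SVM
	 * the GPU can dereference this pointer directly. */
	char *ptr = (char *)malloc(size);

	/* Hint that this VA range should preferably live in device 1's memory.
	 * The hint is per address range of the process, not per device fd. */
	cudaMemAdvise(ptr, size, cudaMemAdviseSetPreferredLocation, 1);

	/* ... launch kernels on device 1 that read/write ptr ... */

	free(ptr);
	return 0;
}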


What you can do is to bind parts of your process address space to your driver connections (fd1, fd2 etc..) with a vm_bind(), but this should *not* come because of implicitely using some other file descriptor in the process.


We already have a vm_bind API which is used for a split CPU and GPU virtual address space (meaning the GPU virtual address space can differ from the CPU virtual address space) in the KMD. In this case it is the UMD's responsibility to manage the whole virtual address space; the UMD can make CPU VA == GPU VA or CPU VA != GPU VA, it doesn't matter to the KMD. We already have this working, and we have also used this approach to achieve a shared virtual address space between CPU and GPU by having the UMD arrange for CPU VA == GPU VA.

All the discussion in this email thread was triggered by our effort to support the system allocator, which means an application can use CPU pointers directly in a GPU shader program *without* any extra driver ioctl call. The purpose of this programming model is to further simplify GPU programming across all programming languages. By the definition of the system allocator, GPU VA is always == CPU VA.

Our API/xeKMD is designed to work for both of the above programming models.


As far as I can see this design is exactly what failed so badly with KFD.

Regards,
Christian.











Additionally to what Dave wrote I can summarize a few things I have learned while working on the AMD GPU drivers in the last decade or so:

1. Userspace requirements are *not* relevant for UAPI or even more general kernel driver design.

2. What should be done is to look at the hardware capabilities and try to expose those in a save manner to userspace.

3. The userspace requirements are then used to validate the kernel driver and especially the UAPI design to ensure that nothing was missed.

The consequence of this is that nobody should ever use things like Cuda, Vulkan, OpenCL, OpenGL etc.. as argument to propose a certain UAPI design.

What should be done instead is to say: My hardware works in this and that way -> we want to expose it like this -> because that enables us to implement the high level API in this and that way.

Only this gives then a complete picture of how things interact together and allows the kernel community to influence and validate the design.



What you described above is mainly a bottom-up approach. I know other people do top-down, or whole-system vertical HW/SW co-design. I don't have a strong opinion here.



Regards,

Oak





This doesn't mean that you need to throw away everything, but it gives a clear restriction that designs are not nailed in stone and for example you can't use something like a waterfall model.

Going to answer on your other questions separately.

Regards,
Christian.



On 25.01.24 at 06:25, Zeng, Oak wrote:

Hi Dave,



Let me step back. When I wrote "shared virtual address space b/t cpu and all gpu devices is a hard requirement for our system allocator design", I meant this is not only Intel's design requirement. Rather this is a common requirement for both Intel, AMD and Nvidia. Take a look at cuda driver API definition of cuMemAllocManaged (search this API on https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM), it said:

"The pointer is valid on the CPU and on all GPUs in the system that support managed memory."

This means the program virtual address space is shared b/t CPU and all GPU devices on the system. The system allocator we are discussing is just one step advanced than cuMemAllocManaged: it allows malloc'ed memory to be shared b/t CPU and all GPU devices.

I hope we all agree with this point.



With that, I agree with Christian that in kmd we should make driver code per-device based instead of managing all devices in one driver instance. Our system allocator (and generally xekmd) design follows this rule: we make xe_vm per device based - one device is *not* aware of other device's address space, as I explained in previous email. I started this email seeking a one drm_gpuvm instance to cover all GPU devices. I gave up this approach (at least for now) per Danilo and Christian's feedback: We will continue to have per device based drm_gpuvm. I hope this is aligned with Christian but I will have to wait for Christian's reply to my previous email.

I hope this clarify thing a little.



Regards,

Oak



-----Original Message-----
From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of David Airlie
Sent: Wednesday, January 24, 2024 8:25 PM
To: Zeng, Oak <oak.zeng@intel.com>
Cc: Ghimiray, Himal Prasad <himal.prasad.ghimiray@intel.com>; Thomas.Hellstrom@linux.intel.com; Winiarski, Michal <michal.winiarski@intel.com>; Felix Kuehling <felix.kuehling@amd.com>; Welty, Brian <brian.welty@intel.com>; Shah, Ankur N <ankur.n.shah@intel.com>; dri-devel@lists.freedesktop.org; intel-xe@lists.freedesktop.org; Gupta, saurabhg <saurabhg.gupta@intel.com>; Danilo Krummrich <dakr@redhat.com>; Daniel Vetter <daniel@ffwll.ch>; Brost, Matthew <matthew.brost@intel.com>; Bommu, Krishnaiah <krishnaiah.bommu@intel.com>; Vishwanathapura, Niranjana <niranjana.vishwanathapura@intel.com>; Christian König <christian.koenig@amd.com>
Subject: Re: Making drm_gpuvm work across gpu devices





For us, Xekmd doesn't need to know it is running under bare metal or virtualized environment. Xekmd is always a guest driver. All the virtual address used in xekmd is guest virtual address. For SVM, we require all the VF devices share one single shared address space with guest CPU program. So all the design works in bare metal environment can automatically work under virtualized environment. +@Shah, Ankur N +@Winiarski, Michal to backup me if I am wrong.

Again, shared virtual address space b/t cpu and all gpu devices is a hard requirement for our system allocator design (which means malloc'ed memory, cpu stack variables, globals can be directly used in gpu program. Same requirement as kfd SVM design). This was aligned with our user space software stack.



Just to make a very general point here (I'm hoping you listen to Christian a bit more and hoping he replies in more detail), but just because you have a system allocator design done, it doesn't in any way enforce the requirements on the kernel driver to accept that design. Bad system design should be pushed back on, not enforced in implementation stages. It's a trap Intel falls into regularly since they say well we already agreed this design with the userspace team and we can't change it now. This isn't acceptable. Design includes upstream discussion and feedback, if you say misdesigned the system allocator (and I'm not saying you definitely have), and this is pushing back on that, then you have to go fix your system architecture.

KFD was an experiment like this, I pushed back on AMD at the start saying it was likely a bad plan, we let it go and got a lot of experience in why it was a bad design.

Dave.





