* Regarding HMM
From: Valmiki @ 2020-08-18 7:15 UTC
To: linux-mm; +Cc: jglisse

Hi All,

I'm trying to understand heterogeneous memory management (HMM), and I
have the following doubts.

If HMM is being used, do we not have to use the DMA controller on the
device for memory transfers?
Without DMA, if software is managing page faults and migrations, will
there be any performance impact?

Is HMM targeted at any specific use cases where there is no DMA
controller on the device?

Regards,
Valmiki

* Re: Regarding HMM
From: Ralph Campbell @ 2020-08-18 17:06 UTC
To: Valmiki, linux-mm; +Cc: jglisse

On 8/18/20 12:15 AM, Valmiki wrote:
> Hi All,
>
> I'm trying to understand heterogeneous memory management (HMM), and I
> have the following doubts.
>
> If HMM is being used, do we not have to use the DMA controller on the
> device for memory transfers?
> Without DMA, if software is managing page faults and migrations, will
> there be any performance impact?
>
> Is HMM targeted at any specific use cases where there is no DMA
> controller on the device?
>
> Regards,
> Valmiki

There are two APIs that are part of "HMM" and are independent of each
other.

hmm_range_fault() is for getting the physical address of a
system-resident memory page that a device can map, without pinning the
page in the usual way (I/O normally increases the page reference count
to pin the page). Instead, the device driver has to handle invalidation
callbacks to remove the device mapping. This lets the device access the
page without moving it.

migrate_vma_setup(), migrate_vma_pages(), and migrate_vma_finalize() are
used by the device driver to migrate data to device private memory.
After migration, the system memory is freed and the CPU page table holds
an invalid PTE that points to the device private struct page (similar to
a swap PTE). If the CPU process faults on that address, there is a
callback to the driver to migrate it back to system memory. This is
where device DMA engines can be used to copy data between system memory
and device private memory.

The use case for the above is to be able to run code such as OpenCL on
GPUs and CPUs using the same virtual addresses without having to call
special memory allocators. In other words, just use mmap() and malloc()
and not clSVMAlloc().

There is a performance consideration here. If the GPU accesses data in
system memory over PCIe, it has much less bandwidth than when accessing
local GPU memory. If the data is to be accessed many times, it can be
more efficient to migrate the data to local GPU memory. If the data is
only accessed a few times, then it is probably more efficient to map
system memory.
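
The trade-off between mapping and migrating described above can be put into a back-of-the-envelope cost model. This is only a sketch under stated assumptions: the function names are made up for illustration, the bandwidth figures are placeholders rather than measurements of any real hardware, and the model ignores latency, caching, and page-fault handling other than through a single optional `overhead` term.

```python
def map_cost(size, passes, pcie_bw):
    """Seconds spent if the device reads system memory over PCIe on every pass."""
    return passes * size / pcie_bw


def migrate_cost(size, passes, pcie_bw, local_bw, overhead=0.0, copy_back=True):
    """Seconds spent if the data is first migrated to device private memory.

    overhead  -- assumed fixed cost of the migration itself (hypothetical knob)
    copy_back -- whether the CPU later touches the data and forces a copy back
    """
    copies = 2 if copy_back else 1  # copy to the device, and optionally back
    return overhead + copies * size / pcie_bw + passes * size / local_bw


def cheaper_strategy(size, passes, pcie_bw, local_bw, overhead=0.0, copy_back=True):
    """Return "map" or "migrate", whichever the model says is cheaper."""
    mapped = map_cost(size, passes, pcie_bw)
    migrated = migrate_cost(size, passes, pcie_bw, local_bw, overhead, copy_back)
    return "map" if mapped <= migrated else "migrate"


if __name__ == "__main__":
    GIB = 1 << 30
    PCIE_BW = 16e9    # bytes/s, assumed placeholder value
    LOCAL_BW = 400e9  # bytes/s, assumed placeholder value
    for passes in (1, 2, 3, 10):
        print(passes, cheaper_strategy(GIB, passes, PCIE_BW, LOCAL_BW))
```

With these placeholder numbers and a copy back to system memory included, the model prefers mapping for one or two passes over the data and migration from three passes on. The crossover moves with the bandwidth ratio and the assumed overhead, which is why the choice is left to the driver or the application.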

* Re: Regarding HMM
From: John Hubbard @ 2020-08-18 20:35 UTC
To: Ralph Campbell, Valmiki, linux-mm; +Cc: jglisse

On 8/18/20 10:06 AM, Ralph Campbell wrote:
> On 8/18/20 12:15 AM, Valmiki wrote:
>> If HMM is being used, do we not have to use the DMA controller on the
>> device for memory transfers?

Hi,

Nothing about HMM either requires or prevents using DMA controllers.

> There are two APIs that are part of "HMM" and are independent of each
> other.
>
> [...]

Ralph, that's a good write-up!

Valmiki, did you already read Documentation/vm/hmm.rst before posting
your question?

It's OK to say "no"--I'm not asking in order to criticize, but in order
to calibrate the documentation. We should consider merging Ralph's
write-up above into hmm.rst, depending on whether it helps (which I
expect it does, but I'm tainted by reading hmm.rst too many times and
now I can't see what might be missing).

Any time someone new tries to understand the system, it's an opportunity
to "unit test" the documentation. Ideally, hmm.rst would answer many of
a first-time reader's questions; that's where we'd like to end up.

thanks,
--
John Hubbard
NVIDIA

* Re: Regarding HMM
From: Valmiki @ 2020-08-23 13:21 UTC
To: John Hubbard, Ralph Campbell, linux-mm; +Cc: jglisse

On 19-08-2020 02:05 am, John Hubbard wrote:
> [...]
>
> Valmiki, did you already read Documentation/vm/hmm.rst before posting
> your question?
>
> It's OK to say "no"--I'm not asking in order to criticize, but in order
> to calibrate the documentation. We should consider merging Ralph's
> write-up above into hmm.rst, depending on whether it helps.
>
> [...]

Hi John,

I did give it an initial reading, but Ralph's details on the
migrate_vma_*() APIs above helped a lot to clarify things. Yes, adding
those details to hmm.rst would help beginners.

Regards,
Valmiki

* Re: Regarding HMM
From: Valmiki @ 2020-08-23 13:08 UTC
To: Ralph Campbell, linux-mm; +Cc: jglisse

On 18-08-2020 10:36 pm, Ralph Campbell wrote:
> There are two APIs that are part of "HMM" and are independent of each
> other.
>
> [...]

Thanks, Ralph, for the clarification.
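
The device-private migration life cycle that Ralph describes earlier in the thread (migrate to the device, free the system page, fault on CPU access, migrate back) can be illustrated with a small userspace toy model. This is a sketch only: `ToyProcess` and all of its methods are hypothetical names invented for this example, the dictionaries stand in for page tables and memory, and none of it is the kernel API or the driver callback interface.

```python
class ToyProcess:
    """Toy model of one virtual address space shared by a CPU and a device."""

    def __init__(self):
        self.system_mem = {}  # vaddr -> data resident in system memory
        self.device_mem = {}  # vaddr -> data resident in device private memory
        self.copies = 0       # bulk copies, i.e. where a DMA engine could be used

    def cpu_write(self, vaddr, data):
        """CPU store: pull the page back first if the device owns it."""
        if vaddr in self.device_mem:
            self._migrate_to_system(vaddr)
        self.system_mem[vaddr] = data

    def cpu_read(self, vaddr):
        """CPU load: touching a migrated page 'faults' and migrates it back."""
        if vaddr in self.device_mem:
            self._migrate_to_system(vaddr)
        return self.system_mem[vaddr]

    def device_read(self, vaddr):
        """Device load: use the private copy if present, else the mapped system page."""
        if vaddr in self.device_mem:
            return self.device_mem[vaddr]
        return self.system_mem[vaddr]

    def migrate_to_device(self, vaddr):
        """Move a page to device private memory; the system copy is freed."""
        if vaddr in self.device_mem:
            return
        self.device_mem[vaddr] = self.system_mem.pop(vaddr)
        self.copies += 1

    def _migrate_to_system(self, vaddr):
        """Stand-in for the driver callback that migrates a page back on a CPU fault."""
        self.system_mem[vaddr] = self.device_mem.pop(vaddr)
        self.copies += 1


if __name__ == "__main__":
    proc = ToyProcess()
    proc.cpu_write(0x1000, b"payload")
    proc.migrate_to_device(0x1000)
    print("device sees:", proc.device_read(0x1000))
    print("cpu sees:", proc.cpu_read(0x1000))
    print("bulk copies:", proc.copies)
```

The same virtual address is used on both sides throughout, and every change of ownership costs one bulk copy. Those copies are the point at which a real driver may choose to use a DMA engine, which is why HMM neither requires nor rules one out.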

Thread overview: 5+ messages (end of thread, newest 2020-08-23 13:21 UTC)
2020-08-18  7:15 Regarding HMM Valmiki
2020-08-18 17:06 ` Ralph Campbell
2020-08-18 20:35   ` John Hubbard
2020-08-23 13:21     ` Valmiki
2020-08-23 13:08   ` Valmiki