* Re: [RESEND PATCH 2/3] nouveau: fix mixed normal and device private page migration

From: Ralph Campbell @ 2020-06-25 17:25 UTC
To: Christoph Hellwig
Cc: nouveau, linux-kernel, Jerome Glisse, John Hubbard, Jason Gunthorpe,
    Ben Skeggs, Andrew Morton, linux-mm, Bharata B Rao

Making sure to include linux-mm and Bharata B Rao for IBM's use of
migrate_vma*().

On 6/24/20 11:10 AM, Ralph Campbell wrote:
> On 6/24/20 12:23 AM, Christoph Hellwig wrote:
>> On Mon, Jun 22, 2020 at 04:38:53PM -0700, Ralph Campbell wrote:
>>> The OpenCL function clEnqueueSVMMigrateMem(), without any flags, will
>>> migrate memory in the given address range to device private memory.
>>> The source pages might already have been migrated to device private
>>> memory. In that case, the source struct page is not checked to see if
>>> it is a device private page, so the GPU's physical address of local
>>> memory is computed incorrectly, leading to data corruption.
>>> Fix this by checking the source struct page and computing the correct
>>> physical address.
>>
>> I'm really worried about all this delicate code to fix the mixed
>> ranges. Can't we make it clear at the migrate_vma_* level if we want
>> to migrate from or to device private memory, and then skip all the
>> work for regions of memory that already are in the right place? This
>> might be a little more work initially, but I think it leads to a much
>> better API.
>
> The current code does encode the direction: src_owner != NULL means
> device private to system memory, and src_owner == NULL means system
> memory to device private memory. This patch would obviously defeat
> that, so perhaps a flag could be added to struct migrate_vma to
> indicate the direction, but I'm unclear how that makes things less
> delicate. Can you expand on what you are worried about?
>
> The issue with invalidations might be better addressed by letting the
> device driver handle device private page TLB invalidations when
> migrating to system memory, and changing migrate_vma_setup() to only
> invalidate CPU TLB entries for normal pages being migrated to device
> private memory. If a page isn't migrating, it seems inefficient to
> invalidate those TLB entries.
>
> Any other suggestions?

After a night's sleep, I think this might work. What do others think?

1) Add a new MMU_NOTIFY_MIGRATE enum to mmu_notifier_event.

2) Change migrate_vma_collect() to use the new MMU_NOTIFY_MIGRATE event
   type.

3) Modify nouveau_svmm_invalidate_range_start() to simply return (no
   invalidations) for MMU_NOTIFY_MIGRATE mmu notifier callbacks.

4) Leave the src_owner check in migrate_vma_collect_pmd() for normal
   pages: if the device driver is migrating normal pages to device
   private memory, the driver would set src_owner = NULL, and already
   migrated device private pages would be skipped. Since the mmu
   notifier callback did nothing, the device private entries remain
   valid in the device's MMU. migrate_vma_collect_pmd() would still
   invalidate the CPU page tables for migrated normal pages.

   If the driver is migrating device private pages to system memory, it
   would set src_owner != NULL; normal pages would be skipped, but now
   the device driver has to invalidate device MMU mappings in the
   "alloc and copy" step before doing the copy. This would be after
   migrate_vma_setup() returns, so the list of migrating device pages
   is known to the driver.

The rest of migrate_vma_pages() and migrate_vma_finalize() remains the
same.
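[Editor's note: the skip and invalidate decisions in steps 3 and 4 of the
proposal above can be modeled as plain C predicates. This is a hypothetical
userspace sketch, not the kernel code under discussion; MMU_NOTIFY_MIGRATE
does not exist yet at this point in the thread, and the struct/field names
here are illustrative only.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of mmu_notifier_event with the proposed new value (step 1). */
enum mmu_notifier_event_model {
	MMU_NOTIFY_UNMAP,
	MMU_NOTIFY_CLEAR,
	MMU_NOTIFY_MIGRATE,	/* proposed addition */
};

/*
 * Step 3: the driver's invalidate_range_start() callback would skip
 * device MMU invalidation for migration-originated callbacks, since the
 * driver initiated the migration and handles those pages itself.
 */
static bool svmm_should_invalidate(enum mmu_notifier_event_model event)
{
	return event != MMU_NOTIFY_MIGRATE;
}

/*
 * Step 4: direction stays encoded in src_owner.  A simplified model of
 * the collect-time skip logic: device private pages are collected only
 * for device-to-system migration (src_owner != NULL and matching the
 * page's owner); normal pages only for system-to-device migration
 * (src_owner == NULL).
 */
static bool collect_page(bool page_is_device_private,
			 const void *page_owner, const void *src_owner)
{
	if (page_is_device_private)
		return src_owner != NULL && src_owner == page_owner;
	return src_owner == NULL;
}
```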
* Re: [RESEND PATCH 2/3] nouveau: fix mixed normal and device private page migration

From: Jason Gunthorpe @ 2020-06-25 17:31 UTC
To: Ralph Campbell
Cc: Christoph Hellwig, nouveau, linux-kernel, Jerome Glisse, John Hubbard,
    Ben Skeggs, Andrew Morton, linux-mm, Bharata B Rao

On Thu, Jun 25, 2020 at 10:25:38AM -0700, Ralph Campbell wrote:
> [...]
>
> After a night's sleep, I think this might work. What do others think?
>
> 1) Add a new MMU_NOTIFY_MIGRATE enum to mmu_notifier_event.
>
> 2) Change migrate_vma_collect() to use the new MMU_NOTIFY_MIGRATE
>    event type.
>
> 3) Modify nouveau_svmm_invalidate_range_start() to simply return (no
>    invalidations) for MMU_NOTIFY_MIGRATE mmu notifier callbacks.

Isn't it a bit of an assumption that migrate_vma_collect() is only
used by nouveau itself?

What if some other devices' device_private pages are being migrated?

Jason
* Re: [RESEND PATCH 2/3] nouveau: fix mixed normal and device private page migration

From: Ralph Campbell @ 2020-06-25 17:42 UTC
To: Jason Gunthorpe
Cc: Christoph Hellwig, nouveau, linux-kernel, Jerome Glisse, John Hubbard,
    Ben Skeggs, Andrew Morton, linux-mm, Bharata B Rao

On 6/25/20 10:31 AM, Jason Gunthorpe wrote:
> On Thu, Jun 25, 2020 at 10:25:38AM -0700, Ralph Campbell wrote:
>> [...]
>>
>> 3) Modify nouveau_svmm_invalidate_range_start() to simply return (no
>>    invalidations) for MMU_NOTIFY_MIGRATE mmu notifier callbacks.
>
> Isn't it a bit of an assumption that migrate_vma_collect() is only
> used by nouveau itself?
>
> What if some other devices' device_private pages are being migrated?
>
> Jason

Good point. The driver needs a way of knowing the callback is due to
its call to migrate_vma_setup() and not some other migration
invalidation.

How about adding a void pointer to struct mmu_notifier_range which
migrate_vma_collect() can set to src_owner? If the event is
MMU_NOTIFY_MIGRATE and the src_owner matches the void pointer, then the
callback should be the one the driver initiated.
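[Editor's note: Ralph's suggestion could look roughly like the sketch below.
The struct shape, the field name `migrate_pgmap_owner`, and the enum value
are assumptions at this point in the discussion, not merged kernel API.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum mmu_notifier_event_model {
	MMU_NOTIFY_UNMAP,
	MMU_NOTIFY_MIGRATE,	/* proposed event type */
};

/*
 * Hypothetical shape of the range passed to the notifier callback, with
 * the proposed void pointer that migrate_vma_collect() would set to the
 * caller's src_owner (field name is illustrative).
 */
struct mmu_notifier_range_model {
	enum mmu_notifier_event_model event;
	const void *migrate_pgmap_owner;
};

/*
 * A driver can recognize a callback triggered by its own call to
 * migrate_vma_setup(): the event is MMU_NOTIFY_MIGRATE and the owner
 * pointer matches the src_owner the driver passed in.  Any other
 * combination is someone else's migration (or a non-migration event)
 * and must be invalidated normally.
 */
static bool range_is_own_migration(const struct mmu_notifier_range_model *r,
				   const void *my_owner)
{
	return r->event == MMU_NOTIFY_MIGRATE &&
	       r->migrate_pgmap_owner == my_owner;
}
```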
Thread overview: 3+ messages (newest: 2020-06-25 17:42 UTC)

  [not found] <20200622233854.10889-1-rcampbell@nvidia.com>
    [not found] ` <20200622233854.10889-3-rcampbell@nvidia.com>
      [not found] ` <20200624072355.GB18609@lst.de>
        [not found] ` <330f6a82-d01d-db97-1dec-69346f41e707@nvidia.com>
          2020-06-25 17:25 ` [RESEND PATCH 2/3] nouveau: fix mixed normal and device private page migration (Ralph Campbell)
            2020-06-25 17:31 ` Jason Gunthorpe
              2020-06-25 17:42 ` Ralph Campbell