From: Jerome Glisse <jglisse@redhat.com> To: Jason Gunthorpe <jgg@ziepe.ca> Cc: "Daniel Vetter" <daniel.vetter@ffwll.ch>, "Michal Hocko" <mhocko@kernel.org>, "Andrew Morton" <akpm@linux-foundation.org>, LKML <linux-kernel@vger.kernel.org>, "Linux MM" <linux-mm@kvack.org>, "DRI Development" <dri-devel@lists.freedesktop.org>, "Intel Graphics Development" <intel-gfx@lists.freedesktop.org>, "Peter Zijlstra" <peterz@infradead.org>, "Ingo Molnar" <mingo@redhat.com>, "David Rientjes" <rientjes@google.com>, "Christian König" <christian.koenig@amd.com>, "Masahiro Yamada" <yamada.masahiro@socionext.com>, "Wei Wang" <wvw@google.com>, "Andy Shevchenko" <andriy.shevchenko@linux.intel.com>, "Thomas Gleixner" <tglx@linutronix.de>, "Jann Horn" <jannh@google.com>, "Feng Tang" <feng.tang@intel.com>, "Kees Cook" <keescook@chromium.org>, "Randy Dunlap" <rdunlap@infradead.org>, "Daniel Vetter" <daniel.vetter@intel.com> Subject: Re: [PATCH 2/5] kernel.h: Add non_block_start/end() Date: Thu, 15 Aug 2019 14:27:19 -0400 [thread overview] Message-ID: <20190815182719.GB4920@redhat.com> (raw) In-Reply-To: <20190815180159.GO21596@ziepe.ca> On Thu, Aug 15, 2019 at 03:01:59PM -0300, Jason Gunthorpe wrote: > On Thu, Aug 15, 2019 at 01:39:22PM -0400, Jerome Glisse wrote: > > On Thu, Aug 15, 2019 at 02:35:57PM -0300, Jason Gunthorpe wrote: > > > On Thu, Aug 15, 2019 at 06:25:16PM +0200, Daniel Vetter wrote: > > > > > > > I'm not really well versed in the details of our userptr, but both > > > > amdgpu and i915 wait for the gpu to complete from > > > > invalidate_range_start. Jerome has at least looked a lot at the amdgpu > > > > one, so maybe he can explain what exactly it is we're doing ... > > > > > > amdgpu is (wrongly) using hmm for something, I can't really tell what > > > it is trying to do. The calls to dma_fence under the > > > invalidate_range_start do not give me a good feeling. > > > > > > However, i915 shows all the signs of trying to follow the registration > > > cache model, it even has a nice comment in > > > i915_gem_userptr_get_pages() explaining that the races it has don't > > > matter because it is a user space bug to change the VA mapping in the > > > first place. That just screams registration cache to me. > > > > > > So it is fine to run HW that way, but if you do, there is no reason to > > > fence inside the invalidate_range end. Just orphan the DMA buffer and > > > clean it up & release the page pins when all DMA buffer refs go to > > > zero. The next access to that VA should get a new DMA buffer with the > > > right mapping. > > > > > > In other words the invalidation should be very simple without > > > complicated locking, or wait_event's. Look at hfi1 for example. > > > > This would break the today usage model of uptr and it will > > break userspace expectation ie if GPU is writting to that > > memory and that memory then the userspace want to make sure > > that it will see what the GPU write. > > How exactly? This is holding the page pin, so the only way the VA > mapping can be changed is via explicit user action. > > ie: > > gpu_write_something(va, size) > mmap(.., va, size, MMAP_FIXED); > gpu_wait_done() > > This is racy and indeterminate with both models. > > Based on the comment in i915 it appears to be going on the model that > changes to the mmap by userspace when the GPU is working on it is a > programming bug. This is reasonable, lots of systems use this kind of > consistency model. Well userspace process doing munmap(), mremap(), fork() and things like that are a bug from the i915 kernel and userspace contract point of view. But things like migration or reclaim are not cover under that contract and for those the expectation is that CPU access to the same virtual address should allow to get what was last written to it either by the GPU or the CPU. > > Since the driver seems to rely on a dma_fence to block DMA access, it > looks to me like the kernel has full visibility to the > 'gpu_write_something()' and 'gpu_wait_done()' markers. So let's only consider the case where GPU wants to write to the memory (the read only case is obviously right and not need any notifier in fact) and like above the only thing we care about is reclaim or migration (for instance because of some numa compaction) as the rest is cover by i915 userspace contract. So in the write case we do GUPfast(write=true) which will be "safe" from any concurrent CPU page table update ie if GUPfast get a reference for the page then any racing CPU page table update will not be able to migrate or reclaim the page and thus the virtual address to page association will be preserve. This is only true because of GUPfast(), now if GUPfast() fails it will fallback to the slow GUP case which make the same thing safe by taking the page table lock. Because of the reference on the page the i915 driver can forego the mmu notifier end callback. The thing here is that taking a page reference is pointless if we have better synchronization and tracking of mmu notifier. Hence converting to hmm mirror allows to avoid taking a ref on the page while still keeping the same functionality as of today. > I think trying to use hmm_range_fault on HW that can't do HW page > faulting and HW 'TLB shootdown' is a very, very bad idea. I fear that > is what amd gpu is trying to do. > > I haven't yet seen anything that looks like 'TLB shootdown' in i915?? GPU driver have complex usage pattern the tlb shootdown is implicit once the GEM object associated with the uptr is invalidated it means next time userspace submit command against that GEM object it will have to re-validate it which means re-program the GPU page table to point to the proper address (and re-call GUP). So the whole GPU page table update is all hidden behind GEM function like i915_gem_object_unbind() (or ttm invalidate for amd/radeon). Technicaly those GPU do not support page faulting but because of the way GPU works you know that you have frequent checkpoint where you can unbind things. This is what is happening here. Also not all GPU engines can handle page fault, this is true of all GPU with page fault AFAIK (AMD and NVidia so far). This is why uptr is a limited form of SVM (share virtual memory) that can be use on all GPUs engine including the dma engine. When using the full SVM power (like hmm mirror with nouveau) this is only use in GPU engine that supports page fault (but updating the page table still require locking and waiting on acknowledge). Cheers, Jérôme
WARNING: multiple messages have this Message-ID (diff)
From: Jerome Glisse <jglisse@redhat.com> To: Jason Gunthorpe <jgg@ziepe.ca> Cc: "Daniel Vetter" <daniel.vetter@ffwll.ch>, "Michal Hocko" <mhocko@kernel.org>, "Andrew Morton" <akpm@linux-foundation.org>, LKML <linux-kernel@vger.kernel.org>, "Linux MM" <linux-mm@kvack.org>, "DRI Development" <dri-devel@lists.freedesktop.org>, "Intel Graphics Development" <intel-gfx@lists.freedesktop.org>, "Peter Zijlstra" <peterz@infradead.org>, "Ingo Molnar" <mingo@redhat.com>, "David Rientjes" <rientjes@google.com>, "Christian König" <christian.koenig@amd.com>, "Masahiro Yamada" <yamada.masahiro@socionext.com>, "Wei Wang" <wvw@google.com>, "Andy Shevchenko" <andriy.shevchenko@linux.intel.com>, "Thomas Gleixner" <tglx@linutronix.de>, "Jann Horn" <jannh@google.com>, "Feng Tang" <feng.tang@intel.com>, "Kees Cook" <keescook@chromium.org>, "Randy Dunlap" <rdunlap@inf> Subject: Re: [PATCH 2/5] kernel.h: Add non_block_start/end() Date: Thu, 15 Aug 2019 14:27:19 -0400 [thread overview] Message-ID: <20190815182719.GB4920@redhat.com> (raw) In-Reply-To: <20190815180159.GO21596@ziepe.ca> On Thu, Aug 15, 2019 at 03:01:59PM -0300, Jason Gunthorpe wrote: > On Thu, Aug 15, 2019 at 01:39:22PM -0400, Jerome Glisse wrote: > > On Thu, Aug 15, 2019 at 02:35:57PM -0300, Jason Gunthorpe wrote: > > > On Thu, Aug 15, 2019 at 06:25:16PM +0200, Daniel Vetter wrote: > > > > > > > I'm not really well versed in the details of our userptr, but both > > > > amdgpu and i915 wait for the gpu to complete from > > > > invalidate_range_start. Jerome has at least looked a lot at the amdgpu > > > > one, so maybe he can explain what exactly it is we're doing ... > > > > > > amdgpu is (wrongly) using hmm for something, I can't really tell what > > > it is trying to do. The calls to dma_fence under the > > > invalidate_range_start do not give me a good feeling. > > > > > > However, i915 shows all the signs of trying to follow the registration > > > cache model, it even has a nice comment in > > > i915_gem_userptr_get_pages() explaining that the races it has don't > > > matter because it is a user space bug to change the VA mapping in the > > > first place. That just screams registration cache to me. > > > > > > So it is fine to run HW that way, but if you do, there is no reason to > > > fence inside the invalidate_range end. Just orphan the DMA buffer and > > > clean it up & release the page pins when all DMA buffer refs go to > > > zero. The next access to that VA should get a new DMA buffer with the > > > right mapping. > > > > > > In other words the invalidation should be very simple without > > > complicated locking, or wait_event's. Look at hfi1 for example. > > > > This would break the today usage model of uptr and it will > > break userspace expectation ie if GPU is writting to that > > memory and that memory then the userspace want to make sure > > that it will see what the GPU write. > > How exactly? This is holding the page pin, so the only way the VA > mapping can be changed is via explicit user action. > > ie: > > gpu_write_something(va, size) > mmap(.., va, size, MMAP_FIXED); > gpu_wait_done() > > This is racy and indeterminate with both models. > > Based on the comment in i915 it appears to be going on the model that > changes to the mmap by userspace when the GPU is working on it is a > programming bug. This is reasonable, lots of systems use this kind of > consistency model. Well userspace process doing munmap(), mremap(), fork() and things like that are a bug from the i915 kernel and userspace contract point of view. But things like migration or reclaim are not cover under that contract and for those the expectation is that CPU access to the same virtual address should allow to get what was last written to it either by the GPU or the CPU. > > Since the driver seems to rely on a dma_fence to block DMA access, it > looks to me like the kernel has full visibility to the > 'gpu_write_something()' and 'gpu_wait_done()' markers. So let's only consider the case where GPU wants to write to the memory (the read only case is obviously right and not need any notifier in fact) and like above the only thing we care about is reclaim or migration (for instance because of some numa compaction) as the rest is cover by i915 userspace contract. So in the write case we do GUPfast(write=true) which will be "safe" from any concurrent CPU page table update ie if GUPfast get a reference for the page then any racing CPU page table update will not be able to migrate or reclaim the page and thus the virtual address to page association will be preserve. This is only true because of GUPfast(), now if GUPfast() fails it will fallback to the slow GUP case which make the same thing safe by taking the page table lock. Because of the reference on the page the i915 driver can forego the mmu notifier end callback. The thing here is that taking a page reference is pointless if we have better synchronization and tracking of mmu notifier. Hence converting to hmm mirror allows to avoid taking a ref on the page while still keeping the same functionality as of today. > I think trying to use hmm_range_fault on HW that can't do HW page > faulting and HW 'TLB shootdown' is a very, very bad idea. I fear that > is what amd gpu is trying to do. > > I haven't yet seen anything that looks like 'TLB shootdown' in i915?? GPU driver have complex usage pattern the tlb shootdown is implicit once the GEM object associated with the uptr is invalidated it means next time userspace submit command against that GEM object it will have to re-validate it which means re-program the GPU page table to point to the proper address (and re-call GUP). So the whole GPU page table update is all hidden behind GEM function like i915_gem_object_unbind() (or ttm invalidate for amd/radeon). Technicaly those GPU do not support page faulting but because of the way GPU works you know that you have frequent checkpoint where you can unbind things. This is what is happening here. Also not all GPU engines can handle page fault, this is true of all GPU with page fault AFAIK (AMD and NVidia so far). This is why uptr is a limited form of SVM (share virtual memory) that can be use on all GPUs engine including the dma engine. When using the full SVM power (like hmm mirror with nouveau) this is only use in GPU engine that supports page fault (but updating the page table still require locking and waiting on acknowledge). Cheers, Jérôme
next prev parent reply other threads:[~2019-08-15 18:27 UTC|newest] Thread overview: 130+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-08-14 20:20 [PATCH 0/5] hmm & mmu_notifier debug/lockdep annotations Daniel Vetter 2019-08-14 20:20 ` [PATCH 1/5] mm: Check if mmu notifier callbacks are allowed to fail Daniel Vetter 2019-08-14 22:14 ` Andrew Morton 2019-08-14 23:22 ` Jason Gunthorpe 2019-08-14 23:34 ` Ralph Campbell 2019-08-16 17:19 ` Jason Gunthorpe 2019-08-14 20:20 ` [PATCH 2/5] kernel.h: Add non_block_start/end() Daniel Vetter 2019-08-14 20:20 ` Daniel Vetter 2019-08-14 20:45 ` Andrew Morton 2019-08-14 20:45 ` Andrew Morton 2019-08-15 6:52 ` Daniel Vetter 2019-08-15 6:52 ` Daniel Vetter 2019-08-15 8:44 ` Michal Hocko 2019-08-15 8:44 ` Michal Hocko 2019-08-15 13:04 ` Jason Gunthorpe 2019-08-15 13:04 ` Jason Gunthorpe 2019-08-15 13:12 ` Daniel Vetter 2019-08-15 13:12 ` Daniel Vetter 2019-08-15 14:37 ` Jason Gunthorpe 2019-08-15 14:37 ` Jason Gunthorpe 2019-08-15 14:43 ` Daniel Vetter 2019-08-15 14:43 ` Daniel Vetter 2019-08-15 15:10 ` Jason Gunthorpe 2019-08-15 15:10 ` Jason Gunthorpe 2019-08-15 16:25 ` Daniel Vetter 2019-08-15 16:25 ` Daniel Vetter 2019-08-15 17:35 ` Jason Gunthorpe 2019-08-15 17:35 ` Jason Gunthorpe 2019-08-15 17:39 ` Jerome Glisse 2019-08-15 17:39 ` Jerome Glisse 2019-08-15 18:01 ` Jason Gunthorpe 2019-08-15 18:01 ` Jason Gunthorpe 2019-08-15 18:27 ` Jerome Glisse [this message] 2019-08-15 18:27 ` Jerome Glisse 2019-08-15 18:57 ` Jason Gunthorpe 2019-08-15 18:57 ` Jason Gunthorpe 2019-08-15 16:32 ` Jerome Glisse 2019-08-15 16:32 ` Jerome Glisse 2019-08-15 17:16 ` Jason Gunthorpe 2019-08-15 17:16 ` Jason Gunthorpe 2019-08-15 17:21 ` Daniel Vetter 2019-08-15 17:21 ` Daniel Vetter 2019-08-15 17:35 ` Jerome Glisse 2019-08-15 17:35 ` Jerome Glisse 2019-08-15 13:24 ` Michal Hocko 2019-08-15 13:24 ` Michal Hocko 2019-08-15 22:15 ` Andrew Morton 2019-08-15 22:15 ` Andrew Morton 2019-08-16 8:24 ` Michal Hocko 2019-08-16 8:24 ` Michal Hocko 2019-08-14 23:58 ` Jason Gunthorpe 2019-08-14 23:58 ` Jason Gunthorpe 2019-08-15 6:58 ` Daniel Vetter 2019-08-15 6:58 ` Daniel Vetter 2019-08-15 12:23 ` Jason Gunthorpe 2019-08-15 12:23 ` Jason Gunthorpe 2019-08-15 13:21 ` Michal Hocko 2019-08-15 13:21 ` Michal Hocko 2019-08-15 14:12 ` Jason Gunthorpe 2019-08-15 14:12 ` Jason Gunthorpe 2019-08-15 16:00 ` Michal Hocko 2019-08-15 16:00 ` Michal Hocko 2019-08-15 16:56 ` Jason Gunthorpe 2019-08-15 16:56 ` Jason Gunthorpe 2019-08-15 17:11 ` Jerome Glisse 2019-08-15 17:17 ` Jason Gunthorpe 2019-08-15 17:42 ` Michal Hocko 2019-08-15 17:42 ` Michal Hocko 2019-08-15 17:57 ` Jerome Glisse 2019-08-15 18:24 ` Jason Gunthorpe 2019-08-15 18:24 ` Jason Gunthorpe 2019-08-15 19:05 ` Michal Hocko 2019-08-15 19:05 ` Michal Hocko 2019-08-15 19:18 ` Jason Gunthorpe 2019-08-15 19:18 ` Jason Gunthorpe 2019-08-15 19:35 ` Michal Hocko 2019-08-15 19:35 ` Michal Hocko 2019-08-15 20:13 ` Jason Gunthorpe 2019-08-15 20:13 ` Jason Gunthorpe 2019-08-16 8:10 ` Michal Hocko 2019-08-16 8:10 ` Michal Hocko 2019-08-16 12:19 ` Jason Gunthorpe 2019-08-16 12:19 ` Jason Gunthorpe 2019-08-16 12:26 ` Michal Hocko 2019-08-16 12:26 ` Michal Hocko 2019-08-16 14:31 ` Jason Gunthorpe 2019-08-16 14:31 ` Jason Gunthorpe 2019-08-16 15:05 ` Jerome Glisse 2019-08-16 15:05 ` Jerome Glisse 2019-08-20 8:18 ` Michal Hocko 2019-08-20 8:18 ` Michal Hocko 2019-08-15 20:16 ` [Intel-gfx] " Daniel Vetter 2019-08-15 20:16 ` Daniel Vetter 2019-08-15 20:27 ` Jason Gunthorpe 2019-08-15 20:27 ` Jason Gunthorpe 2019-08-15 20:49 ` Daniel Vetter 2019-08-15 20:49 ` Daniel Vetter 2019-08-16 1:00 ` Jason Gunthorpe 2019-08-16 1:00 ` Jason Gunthorpe 2019-08-16 6:20 ` Daniel Vetter 2019-08-16 6:20 ` Daniel Vetter 2019-08-16 12:12 ` Jason Gunthorpe 2019-08-16 12:12 ` Jason Gunthorpe 2019-08-16 14:11 ` Daniel Vetter 2019-08-16 14:11 ` Daniel Vetter 2019-08-16 14:38 ` Jason Gunthorpe 2019-08-16 14:38 ` Jason Gunthorpe 2019-08-16 16:36 ` Daniel Vetter 2019-08-16 16:36 ` Daniel Vetter 2019-08-16 16:54 ` Jason Gunthorpe 2019-08-16 16:54 ` Jason Gunthorpe 2019-08-16 8:27 ` Michal Hocko 2019-08-16 8:27 ` Michal Hocko 2019-08-14 20:20 ` [PATCH 3/5] mm, notifier: Catch sleeping/blocking for !blockable Daniel Vetter 2019-08-15 0:00 ` Jason Gunthorpe 2019-08-15 7:02 ` Daniel Vetter 2019-08-15 12:35 ` Jason Gunthorpe 2019-08-17 16:09 ` Daniel Vetter 2019-08-17 16:09 ` Daniel Vetter 2019-08-14 20:20 ` [PATCH 4/5] mm, notifier: Add a lockdep map for invalidate_range_start Daniel Vetter 2019-08-15 0:09 ` Jason Gunthorpe 2019-08-15 7:10 ` Daniel Vetter 2019-08-15 7:10 ` Daniel Vetter 2019-08-15 12:53 ` Jason Gunthorpe 2019-08-14 20:20 ` [PATCH 5/5] mm/hmm: WARN on illegal ->sync_cpu_device_pagetables errors Daniel Vetter 2019-08-15 0:11 ` Jason Gunthorpe 2019-08-15 7:14 ` Daniel Vetter 2019-08-15 7:14 ` Daniel Vetter 2019-08-14 21:29 ` ✗ Fi.CI.CHECKPATCH: warning for hmm & mmu_notifier debug/lockdep annotations Patchwork 2019-08-14 21:56 ` ✓ Fi.CI.BAT: success " Patchwork
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20190815182719.GB4920@redhat.com \ --to=jglisse@redhat.com \ --cc=akpm@linux-foundation.org \ --cc=andriy.shevchenko@linux.intel.com \ --cc=christian.koenig@amd.com \ --cc=daniel.vetter@ffwll.ch \ --cc=daniel.vetter@intel.com \ --cc=dri-devel@lists.freedesktop.org \ --cc=feng.tang@intel.com \ --cc=intel-gfx@lists.freedesktop.org \ --cc=jannh@google.com \ --cc=jgg@ziepe.ca \ --cc=keescook@chromium.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=mhocko@kernel.org \ --cc=mingo@redhat.com \ --cc=peterz@infradead.org \ --cc=rdunlap@infradead.org \ --cc=rientjes@google.com \ --cc=tglx@linutronix.de \ --cc=wvw@google.com \ --cc=yamada.masahiro@socionext.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.