* [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
From: Daniel Vetter @ 2022-11-22 17:08 UTC (permalink / raw)
  To: DRI Development
  Cc: Intel Graphics Development, Daniel Vetter, Christian König,
	Thomas Zimmermann, Jason Gunthorpe, Suren Baghdasaryan,
	Matthew Wilcox, John Stultz, Daniel Vetter, Sumit Semwal,
	linux-media, linaro-mm-sig

tldr; DMA buffers aren't normal memory; expecting that you can use
them like normal memory (e.g. that calling get_user_pages works, or
that they're accounted like any other normal memory) cannot be
guaranteed.

Since some userspace only runs on integrated devices, where all
buffers are actually resident system memory, there's a huge
temptation to assume that a struct page is always present and usable
like for any other pagecache-backed mmap. This has the potential to
result in a uapi nightmare.

To close this gap, require that DMA buffer mmaps are VM_PFNMAP, which
blocks get_user_pages and all the other struct page based
infrastructure for everyone. In spirit this is the uapi counterpart to
the kernel-internal CONFIG_DMABUF_DEBUG.

Motivated by a recent patch which wanted to switch the system dma-buf
heap to vm_insert_page instead of vm_insert_pfn.

v2:

Jason brought up that we also want to guarantee that all ptes have the
pte_special flag set, to catch fast get_user_pages (on architectures
that support this). Allowing VM_MIXEDMAP (like VM_SPECIAL does) would
still allow vm_insert_page, but limiting to VM_PFNMAP will catch that.

From auditing the various functions that insert pfn pte entries
(vm_insert_pfn_prot, remap_pfn_range and all its callers like
dma_mmap_wc) it looks like VM_PFNMAP is already required anyway, so
this should be the correct flag to check for.

v3: Change to WARN_ON_ONCE (Thomas Zimmermann)

References: https://lore.kernel.org/lkml/CAKMK7uHi+mG0z0HUmNt13QCCvutuRVjpcR0NjRL12k-WbWzkRg@mail.gmail.com/
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
--
Ok I entirely forgot about this patch but stumbled over it and checked
what's up with it now. I think it's ready for merging:
- shmem helper patches to fix up vgem landed
- ttm has been fixed for a while
- I don't think we've had any other open issues

Time to lock down this uapi contract for real?
-Daniel
---
 drivers/dma-buf/dma-buf.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index b6c36914e7c6..88718665c3c3 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -150,6 +150,8 @@ static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
 	ret = dmabuf->ops->mmap(dmabuf, vma);
 	dma_resv_unlock(dmabuf->resv);
 
+	WARN_ON_ONCE(!(vma->vm_flags & VM_PFNMAP));
+
 	return ret;
 }
 
@@ -1495,6 +1497,8 @@ int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma,
 	ret = dmabuf->ops->mmap(dmabuf, vma);
 	dma_resv_unlock(dmabuf->resv);
 
+	WARN_ON_ONCE(!(vma->vm_flags & VM_PFNMAP));
+
 	return ret;
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_mmap, DMA_BUF);
-- 
2.37.2
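
A minimal sketch of a conforming exporter mmap callback, for
illustration only (hypothetical my_heap_* names, and it assumes the
buffer is physically contiguous): remap_pfn_range() itself marks the
vma VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP, so the
WARN_ON_ONCE added above stays quiet.

#include <linux/dma-buf.h>
#include <linux/mm.h>

/* Sketch, not part of the patch: an exporter mmap that honours the
 * VM_PFNMAP contract, so gup/gup_fast can never grab these pages. */
static int my_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
{
	struct my_heap_buffer *buf = dmabuf->priv; /* hypothetical exporter data */

	/* map the whole (contiguous) buffer as special pfn ptes */
	return remap_pfn_range(vma, vma->vm_start,
			       page_to_pfn(buf->pages[0]),
			       vma->vm_end - vma->vm_start,
			       vma->vm_page_prot);
}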



* Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
From: Jason Gunthorpe @ 2022-11-22 18:03 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, Intel Graphics Development,
	Christian König, Thomas Zimmermann, Suren Baghdasaryan,
	Matthew Wilcox, John Stultz, Daniel Vetter, Sumit Semwal,
	linux-media, linaro-mm-sig

On Tue, Nov 22, 2022 at 06:08:00PM +0100, Daniel Vetter wrote:
> [...]
>
> From auditing the various functions that insert pfn pte entries
> (vm_insert_pfn_prot, remap_pfn_range and all its callers like
> dma_mmap_wc) it looks like VM_PFNMAP is already required anyway, so
> this should be the correct flag to check for.

I didn't look at how this actually gets used, but it is a bit of a
pain to insert a lifetime controlled object like a struct page as a
special PTE/VM_PFNMAP.

How is the lifetime model implemented here? How do you know when
userspace has finally unmapped the page?

Jason


* Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
From: Daniel Vetter @ 2022-11-22 18:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: DRI Development, Intel Graphics Development,
	Christian König, Thomas Zimmermann, Suren Baghdasaryan,
	Matthew Wilcox, John Stultz, Daniel Vetter, Sumit Semwal,
	linux-media, linaro-mm-sig

On Tue, 22 Nov 2022 at 19:04, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> [...]
>
> I didn't look at how this actually gets used, but it is a bit of a
> pain to insert a lifetime controlled object like a struct page as a
> special PTE/VM_PFNMAP.
>
> How is the lifetime model implemented here? How do you know when
> userspace has finally unmapped the page?

The vma has a filp which is the refcounted dma_buf. With dma_buf you
never get an individual page; it's always the entire object. And it's
up to the allocator how exactly it wants to use or not use the page's
refcount. So if gup goes in and elevates the refcount, you can break
stuff, which is why I'm doing this.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
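
A simplified sketch of that lifetime tie, mirroring what the exported
dma_buf_mmap() helper does with vma_set_file() (locking and error
handling omitted):

/* Sketch: the vma's vm_file is the dma-buf's own struct file, so the
 * mapping pins the whole buffer object, never an individual page. */
static int sketch_dmabuf_mmap(struct dma_buf *dmabuf,
			      struct vm_area_struct *vma,
			      unsigned long pgoff)
{
	/* swap in the dma-buf file; takes a reference, drops the old one */
	vma_set_file(vma, dmabuf->file);
	vma->vm_pgoff = pgoff;

	/* the vma now keeps dmabuf->file (and thus the dma_buf) alive
	 * until the mapping is torn down */
	return dmabuf->ops->mmap(dmabuf, vma);
}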


* Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
From: Jason Gunthorpe @ 2022-11-22 18:50 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, Intel Graphics Development,
	Christian König, Thomas Zimmermann, Suren Baghdasaryan,
	Matthew Wilcox, John Stultz, Daniel Vetter, Sumit Semwal,
	linux-media, linaro-mm-sig

On Tue, Nov 22, 2022 at 07:08:25PM +0100, Daniel Vetter wrote:
> On Tue, 22 Nov 2022 at 19:04, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > [...]
> >
> > How is the lifetime model implemented here? How do you know when
> > userspace has finally unmapped the page?
>
> The vma has a filp which is the refcounted dma_buf. With dma_buf you
> never get an individual page; it's always the entire object. And it's
> up to the allocator how exactly it wants to use or not use the page's
> refcount. So if gup goes in and elevates the refcount, you can break
> stuff, which is why I'm doing this.

But how does move work?

Jason


* Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
From: Daniel Vetter @ 2022-11-22 19:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: DRI Development, Intel Graphics Development,
	Christian König, Thomas Zimmermann, Suren Baghdasaryan,
	Matthew Wilcox, John Stultz, Daniel Vetter, Sumit Semwal,
	linux-media, linaro-mm-sig

On Tue, 22 Nov 2022 at 19:50, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Nov 22, 2022 at 07:08:25PM +0100, Daniel Vetter wrote:
> > [...]
> >
> > The vma has a filp which is the refcounted dma_buf. With dma_buf you
> > never get an individual page; it's always the entire object. And it's
> > up to the allocator how exactly it wants to use or not use the page's
> > refcount. So if gup goes in and elevates the refcount, you can break
> > stuff, which is why I'm doing this.
>
> But how does move work?

You nuke all the ptes. Drivers that move have slightly more than a
bare struct file, they also have a struct address_space so that
unmap_mapping_range() works. Refaulting and any coherency issues
when a refault races against a dma-buf migration are up to the
driver/exporter to handle correctly. None of them rely on struct page
the way mm/ does when moving stuff around for
compaction/ksm/numa-balancing/whatever.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
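
As an illustrative sketch (hypothetical exporter, not lifted from any
specific driver), the CPU-side half of a move boils down to zapping
every pte through the buffer's address_space before the storage
migrates:

/* Sketch: drop all userspace ptes so the next CPU access refaults and
 * the exporter's fault handler resolves against the new location. */
static void sketch_exporter_begin_move(struct dma_buf *dmabuf)
{
	/* last argument 1: zap private COW copies too */
	unmap_mapping_range(dmabuf->file->f_mapping, 0, dmabuf->size, 1);
}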


* Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
From: Jason Gunthorpe @ 2022-11-22 19:34 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, Intel Graphics Development,
	Christian König, Thomas Zimmermann, Suren Baghdasaryan,
	Matthew Wilcox, John Stultz, Daniel Vetter, Sumit Semwal,
	linux-media, linaro-mm-sig

On Tue, Nov 22, 2022 at 08:29:05PM +0100, Daniel Vetter wrote:

> You nuke all the ptes. Drivers that move have slightly more than a
> bare struct file, they also have a struct address_space so that
> unmap_mapping_range() works.

Okay, this is one of the ways that this can be made to work correctly,
as long as you never allow GUP/GUP_fast to succeed on the PTEs. (this
was the DAX mistake)

Jason


* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for dma-buf: Require VM_PFNMAP vma for mmap
From: Patchwork @ 2022-11-22 19:43 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

== Series Details ==

Series: dma-buf: Require VM_PFNMAP vma for mmap
URL   : https://patchwork.freedesktop.org/series/111210/
State : warning

== Summary ==

Error: dim checkpatch failed
354674e79934 dma-buf: Require VM_PFNMAP vma for mmap
-:15: WARNING:TYPO_SPELLING: 'useable' may be misspelled - perhaps 'usable'?
#15: 
temptation to assume that a struct page is always present and useable
                                                              ^^^^^^^

-:34: WARNING:TYPO_SPELLING: 'entires' may be misspelled - perhaps 'entries'?
#34: 
From auditing the various functions to insert pfn pte entires
                                                      ^^^^^^^

-:41: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#41: 
References: https://lore.kernel.org/lkml/CAKMK7uHi+mG0z0HUmNt13QCCvutuRVjpcR0NjRL12k-WbWzkRg@mail.gmail.com/

-:76: WARNING:FROM_SIGN_OFF_MISMATCH: From:/Signed-off-by: email address mismatch: 'From: Daniel Vetter <daniel.vetter@ffwll.ch>' != 'Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>'

total: 0 errors, 4 warnings, 0 checks, 16 lines checked




* Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
From: Daniel Vetter @ 2022-11-22 19:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: DRI Development, Intel Graphics Development,
	Christian König, Thomas Zimmermann, Suren Baghdasaryan,
	Matthew Wilcox, John Stultz, Daniel Vetter, Sumit Semwal,
	linux-media, linaro-mm-sig

On Tue, 22 Nov 2022 at 20:34, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Tue, Nov 22, 2022 at 08:29:05PM +0100, Daniel Vetter wrote:
> > You nuke all the ptes. Drivers that move have slightly more than a
> > bare struct file, they also have a struct address_space so that
> > unmap_mapping_range() works.
>
> Okay, this is one of the ways that this can be made to work correctly,
> as long as you never allow GUP/GUP_fast to succeed on the PTEs. (this
> was the DAX mistake)

Hence this patch, to enforce that no dma-buf exporter gets this wrong.
Which some did, and then blamed bug reporters for the resulting splats
:-) One of the things we've reverted was the ttm huge pte support,
since that doesn't have the pmd_special flag (yet) and so would let
gup_fast through.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
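
Roughly why the special bit is the whole story for the fast path (a
simplified sketch of the mm/gup.c logic, not verbatim): gup_fast walks
page tables locklessly and never sees the vma or its VM_PFNMAP flag,
so the pte itself has to say "hands off". A huge-page path without an
equivalent pmd_special bit therefore can't be allowed.

/* Sketch: the only gup_fast defence for special mappings */
static bool sketch_gup_fast_pte_ok(pte_t pte)
{
	if (pte_special(pte))
		return false;	/* bail to the slow, vma-aware gup path */
	return true;		/* fast path may try to pin this page */
}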


* [Intel-gfx] ✓ Fi.CI.BAT: success for dma-buf: Require VM_PFNMAP vma for mmap
From: Patchwork @ 2022-11-22 20:08 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx


== Series Details ==

Series: dma-buf: Require VM_PFNMAP vma for mmap
URL   : https://patchwork.freedesktop.org/series/111210/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12418 -> Patchwork_111210v1
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/index.html

Participating hosts (33 -> 32)
------------------------------

  Missing    (1): fi-pnv-d510 

Known issues
------------

  Here are the changes found in Patchwork_111210v1 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_selftest@live@migrate:
    - bat-adlp-4:         [PASS][1] -> [INCOMPLETE][2] ([i915#7308] / [i915#7348])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/bat-adlp-4/igt@i915_selftest@live@migrate.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/bat-adlp-4/igt@i915_selftest@live@migrate.html

  * igt@runner@aborted:
    - bat-adlp-4:         NOTRUN -> [FAIL][3] ([i915#4312])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/bat-adlp-4/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@hangcheck:
    - {bat-adlm-1}:       [INCOMPLETE][4] ([i915#4983]) -> [PASS][5]
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/bat-adlm-1/igt@i915_selftest@live@hangcheck.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/bat-adlm-1/igt@i915_selftest@live@hangcheck.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions:
    - fi-bsw-kefka:       [FAIL][6] ([i915#6298]) -> [PASS][7]
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/fi-bsw-kefka/igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/fi-bsw-kefka/igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
  [i915#4258]: https://gitlab.freedesktop.org/drm/intel/issues/4258
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983
  [i915#6298]: https://gitlab.freedesktop.org/drm/intel/issues/6298
  [i915#6434]: https://gitlab.freedesktop.org/drm/intel/issues/6434
  [i915#7308]: https://gitlab.freedesktop.org/drm/intel/issues/7308
  [i915#7346]: https://gitlab.freedesktop.org/drm/intel/issues/7346
  [i915#7348]: https://gitlab.freedesktop.org/drm/intel/issues/7348


Build changes
-------------

  * Linux: CI_DRM_12418 -> Patchwork_111210v1

  CI-20190529: 20190529
  CI_DRM_12418: 22789b788bcaf35826550836b0ad6872d6e85ca6 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7071: 0801475083ccb938b1d3b358502ff97fdb435585 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_111210v1: 22789b788bcaf35826550836b0ad6872d6e85ca6 @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

1638e830bb57 dma-buf: Require VM_PFNMAP vma for mmap

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/index.html



* Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
From: Thomas Zimmermann @ 2022-11-23  8:07 UTC (permalink / raw)
  To: Daniel Vetter, DRI Development
  Cc: Intel Graphics Development, Matthew Wilcox, Sumit Semwal,
	linaro-mm-sig, Jason Gunthorpe, John Stultz, Daniel Vetter,
	Suren Baghdasaryan, Christian König, linux-media



Hi

Am 22.11.22 um 18:08 schrieb Daniel Vetter:
> [...]
> ---
>   drivers/dma-buf/dma-buf.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index b6c36914e7c6..88718665c3c3 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -150,6 +150,8 @@ static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
>   	ret = dmabuf->ops->mmap(dmabuf, vma);
>   	dma_resv_unlock(dmabuf->resv);
>   
> +	WARN_ON_ONCE(!(vma->vm_flags & VM_PFNMAP));

Well, I already acked this, but does it work with the DMA helpers? I'm
asking because of [1].

Best regards
Thomas

[1] 
https://elixir.bootlin.com/linux/v6.1-rc6/source/drivers/gpu/drm/drm_gem_dma_helper.c#L533

> +
>   	return ret;
>   }
>   
> @@ -1495,6 +1497,8 @@ int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma,
>   	ret = dmabuf->ops->mmap(dmabuf, vma);
>   	dma_resv_unlock(dmabuf->resv);
>   
> +	WARN_ON_ONCE(!(vma->vm_flags & VM_PFNMAP));
> +
>   	return ret;
>   }
>   EXPORT_SYMBOL_NS_GPL(dma_buf_mmap, DMA_BUF);

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
  2022-11-22 19:50             ` Daniel Vetter
  (?)
@ 2022-11-23  9:06               ` Christian König
  -1 siblings, 0 replies; 72+ messages in thread
From: Christian König @ 2022-11-23  9:06 UTC (permalink / raw)
  To: Daniel Vetter, Jason Gunthorpe
  Cc: Intel Graphics Development, Matthew Wilcox, linaro-mm-sig,
	John Stultz, DRI Development, Thomas Zimmermann, Daniel Vetter,
	Suren Baghdasaryan, Sumit Semwal, linux-media

Am 22.11.22 um 20:50 schrieb Daniel Vetter:
> On Tue, 22 Nov 2022 at 20:34, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>> On Tue, Nov 22, 2022 at 08:29:05PM +0100, Daniel Vetter wrote:
>>> You nuke all the ptes. Drivers that move have slightly more than a
>>> bare struct file, they also have a struct address_space so that
>>> invalidate_mapping_range() works.
>> Okay, this is one of the ways that this can be made to work correctly,
>> as long as you never allow GUP/GUP_fast to succeed on the PTEs. (this
>> was the DAX mistake)
> Hence this patch, to enforce that no dma-buf exporter gets this wrong.
> Which some did, and then blamed bug reporters for the resulting splats
> :-) One of the things we've reverted was the ttm huge pte support,
> since that doesn't have the pmd_special flag (yet) and so would let
> gup_fast through.

The problem is not only gup: a lot of people seem to assume that once
they are able to grab a reference to a page, the ptes pointing to that
page can't change any more. And that's obviously incorrect.

I witnessed tons of discussions about that already. Some customers even 
modified our code assuming that and then wondered why the heck they ran 
into data corruption.
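 
A sketch of that broken pattern (hypothetical driver code, not from any
real tree; start_dma_to() is made up for illustration):
 
    static int pin_and_dma(unsigned long uaddr)
    {
        struct page *page;
 
        if (get_user_pages_fast(uaddr, 1, FOLL_WRITE, &page) != 1)
            return -EFAULT;
 
        /*
         * BROKEN: the reference only keeps the struct page alive.
         * Nothing stops the exporter from moving the buffer and
         * rewriting the ptes, so uaddr may no longer map this page
         * by the time the DMA actually happens.
         */
        start_dma_to(page_to_phys(page));  /* made-up helper */
 
        put_page(page);
        return 0;
    }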

It's gotten so bad that I've even proposed intentionally mangling the 
page reference count on TTM allocated pages: 
https://patchwork.kernel.org/project/dri-devel/patch/20220927143529.135689-1-christian.koenig@amd.com/

I think it would be better if, instead of having special flags in the
ptes and vmas saying that you can't follow them to a page structure, we
added something to the page itself indicating that you can't grab a
reference to it. But this might break some use cases as well.

Regards,
Christian.

> -Daniel


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
  2022-11-23  9:06               ` [Intel-gfx] " Christian König
  (?)
@ 2022-11-23  9:30                 ` Daniel Vetter
  -1 siblings, 0 replies; 72+ messages in thread
From: Daniel Vetter @ 2022-11-23  9:30 UTC (permalink / raw)
  To: Christian König
  Cc: Intel Graphics Development, Matthew Wilcox, linaro-mm-sig,
	Jason Gunthorpe, John Stultz, DRI Development, Thomas Zimmermann,
	Daniel Vetter, Suren Baghdasaryan, Sumit Semwal, linux-media

On Wed, 23 Nov 2022 at 10:06, Christian König <christian.koenig@amd.com> wrote:
> Am 22.11.22 um 20:50 schrieb Daniel Vetter:
> > On Tue, 22 Nov 2022 at 20:34, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >> On Tue, Nov 22, 2022 at 08:29:05PM +0100, Daniel Vetter wrote:
> >>> You nuke all the ptes. Drivers that move have slightly more than a
> >>> bare struct file, they also have a struct address_space so that
> >>> invalidate_mapping_range() works.
> >> Okay, this is one of the ways that this can be made to work correctly,
> >> as long as you never allow GUP/GUP_fast to succeed on the PTEs. (this
> >> was the DAX mistake)
> > Hence this patch, to enforce that no dma-buf exporter gets this wrong.
> > Which some did, and then blamed bug reporters for the resulting splats
> > :-) One of the things we've reverted was the ttm huge pte support,
> > since that doesn't have the pmd_special flag (yet) and so would let
> > gup_fast through.
>
> The problem is not only gup, a lot of people seem to assume that when
> you are able to grab a reference to a page that the ptes pointing to
> that page can't change any more. And that's obviously incorrect.
>
> I witnessed tons of discussions about that already. Some customers even
> modified our code assuming that and then wondered why the heck they ran
> into data corruption.
>
> It's gotten so bad that I've even proposed intentionally mangling the
> page reference count on TTM allocated pages:
> https://patchwork.kernel.org/project/dri-devel/patch/20220927143529.135689-1-christian.koenig@amd.com/

Yeah maybe something like this could be applied after we land this
patch here. Well maybe should have the same check in gem mmap code to
make sure no driver

> I think it would be better that instead of having special flags in the
> ptes and vmas that you can't follow them to a page structure we would
> add something to the page indicating that you can't grab a reference to
> it. But this might break some use cases as well.

Afaik the problem with that is that there are no free page bits left for
these debug checks. Plus the pte+vma flags already exist to make this
clear.
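 
For reference, those are exactly the checks that keep GUP out of such
mappings today; abridged from mm/gup.c in v6.1:
 
    /* check_vma_flags(), slow-path get_user_pages(): */
    if (vm_flags & (VM_IO | VM_PFNMAP))
        return -EFAULT;
 
    /* gup_pte_range(), the get_user_pages_fast() path, on
     * architectures with ARCH_HAS_PTE_SPECIAL: */
    if (pte_special(pte))
        goto pte_unmap;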
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
  2022-11-23  8:07   ` Thomas Zimmermann
  (?)
@ 2022-11-23  9:33     ` Daniel Vetter
  -1 siblings, 0 replies; 72+ messages in thread
From: Daniel Vetter @ 2022-11-23  9:33 UTC (permalink / raw)
  To: Thomas Zimmermann, Laurent Pinchart
  Cc: Intel Graphics Development, Matthew Wilcox, Christian König,
	linaro-mm-sig, Jason Gunthorpe, John Stultz, DRI Development,
	Daniel Vetter, Suren Baghdasaryan, Sumit Semwal, linux-media

On Wed, 23 Nov 2022 at 09:07, Thomas Zimmermann <tzimmermann@suse.de> wrote:
>
> Hi
>
> Am 22.11.22 um 18:08 schrieb Daniel Vetter:
> > tldr; DMA buffers aren't normal memory, expecting that you can use
> > them like that (like calling get_user_pages works, or that they're
> > accounted like any other normal memory) cannot be guaranteed.
> >
> > Since some userspace only runs on integrated devices, where all
> > buffers are actually all resident system memory, there's a huge
> > temptation to assume that a struct page is always present and usable
> > like for any other pagecache-backed mmap. This has the potential to
> > result in a uapi nightmare.
> >
> > To close this gap, require that DMA buffer mmaps are VM_PFNMAP, which
> > blocks get_user_pages and all the other struct page based
> > infrastructure for everyone. In spirit this is the uapi counterpart to
> > the kernel-internal CONFIG_DMABUF_DEBUG.
> >
> > Motivated by a recent patch which wanted to switch the system dma-buf
> > heap to vm_insert_page instead of vm_insert_pfn.
> >
> > v2:
> >
> > Jason brought up that we also want to guarantee that all ptes have the
> > pte_special flag set, to catch fast get_user_pages (on architectures
> > that support this). Allowing VM_MIXEDMAP (like VM_SPECIAL does) would
> > still allow vm_insert_page, but limiting to VM_PFNMAP will catch that.
> >
> > From auditing the various functions to insert pfn pte entries
> > (vm_insert_pfn_prot, remap_pfn_range and all its callers like
> > dma_mmap_wc) it looks like VM_PFNMAP is already required anyway, so
> > this should be the correct flag to check for.
> >
> > v3: Change to WARN_ON_ONCE (Thomas Zimmermann)
> >
> > References: https://lore.kernel.org/lkml/CAKMK7uHi+mG0z0HUmNt13QCCvutuRVjpcR0NjRL12k-WbWzkRg@mail.gmail.com/
> > Acked-by: Christian König <christian.koenig@amd.com>
> > Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
> > Cc: Thomas Zimmermann <tzimmermann@suse.de>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Suren Baghdasaryan <surenb@google.com>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Cc: John Stultz <john.stultz@linaro.org>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > Cc: "Christian König" <christian.koenig@amd.com>
> > Cc: linux-media@vger.kernel.org
> > Cc: linaro-mm-sig@lists.linaro.org
> > --
> > Ok I entirely forgot about this patch but stumbled over it and checked
> > what's up with it now. I think it's ready for merging:
> > - shmem helper patches to fix up vgem landed
> > - ttm has been fixed for a while now
> > - I don't think we've had any other open issues
> >
> > Time to lock down this uapi contract for real?
> > -Daniel
> > ---
> >   drivers/dma-buf/dma-buf.c | 4 ++++
> >   1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > index b6c36914e7c6..88718665c3c3 100644
> > --- a/drivers/dma-buf/dma-buf.c
> > +++ b/drivers/dma-buf/dma-buf.c
> > @@ -150,6 +150,8 @@ static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
> >       ret = dmabuf->ops->mmap(dmabuf, vma);
> >       dma_resv_unlock(dmabuf->resv);
> >
> > +     WARN_ON_ONCE(!(vma->vm_flags & VM_PFNMAP));
>
> Well, I already a-b'ed this, but does it work with the DMA helpers? I'm
> asking because of [1].
>
> Best regards
> Thomas
>
> [1]
> https://elixir.bootlin.com/linux/v6.1-rc6/source/drivers/gpu/drm/drm_gem_dma_helper.c#L533

This one is entertaining, but also doesn't matter, because the
remap_pfn_range that the various dma_mmap functions boil down to sets
the VM_PFNMAP and a pile of other flags. See

https://elixir.bootlin.com/linux/v6.1-rc6/source/mm/memory.c#L2518
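 
That is, roughly (abridged):
 
    /* mm/memory.c, remap_pfn_range_notrack() in v6.1, called by
     * remap_pfn_range() and thus by dma_mmap_wc() & friends: */
    vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;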

I have no idea why Laurent cleared this flag here when he added that
code, just for it to be set again a bit later. Laurent?
-Daniel

> > +
> >       return ret;
> >   }
> >
> > @@ -1495,6 +1497,8 @@ int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma,
> >       ret = dmabuf->ops->mmap(dmabuf, vma);
> >       dma_resv_unlock(dmabuf->resv);
> >
> > +     WARN_ON_ONCE(!(vma->vm_flags & VM_PFNMAP));
> > +
> >       return ret;
> >   }
> >   EXPORT_SYMBOL_NS_GPL(dma_buf_mmap, DMA_BUF);
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Ivo Totev



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
  2022-11-23  9:30                 ` [Intel-gfx] " Daniel Vetter
  (?)
@ 2022-11-23  9:39                   ` Christian König
  -1 siblings, 0 replies; 72+ messages in thread
From: Christian König @ 2022-11-23  9:39 UTC (permalink / raw)
  To: Daniel Vetter, Christian König
  Cc: Intel Graphics Development, Matthew Wilcox, linaro-mm-sig,
	Jason Gunthorpe, John Stultz, DRI Development, Thomas Zimmermann,
	Daniel Vetter, Suren Baghdasaryan, Sumit Semwal, linux-media

Am 23.11.22 um 10:30 schrieb Daniel Vetter:
> On Wed, 23 Nov 2022 at 10:06, Christian König <christian.koenig@amd.com> wrote:
>> Am 22.11.22 um 20:50 schrieb Daniel Vetter:
>>> On Tue, 22 Nov 2022 at 20:34, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>>>> On Tue, Nov 22, 2022 at 08:29:05PM +0100, Daniel Vetter wrote:
>>>>> You nuke all the ptes. Drivers that move have slightly more than a
>>>>> bare struct file, they also have a struct address_space so that
>>>>> invalidate_mapping_range() works.
>>>> Okay, this is one of the ways that this can be made to work correctly,
>>>> as long as you never allow GUP/GUP_fast to succeed on the PTEs. (this
>>>> was the DAX mistake)
>>> Hence this patch, to enforce that no dma-buf exporter gets this wrong.
>>> Which some did, and then blamed bug reporters for the resulting splats
>>> :-) One of the things we've reverted was the ttm huge pte support,
>>> since that doesn't have the pmd_special flag (yet) and so would let
>>> gup_fast through.
>> The problem is not only gup, a lot of people seem to assume that when
>> you are able to grab a reference to a page that the ptes pointing to
>> that page can't change any more. And that's obviously incorrect.
>>
>> I witnessed tons of discussions about that already. Some customers even
>> modified our code assuming that and then wondered why the heck they ran
>> into data corruption.
>>
>> It's gotten so bad that I've even proposed intentionally mangling the
>> page reference count on TTM allocated pages:
>> https://patchwork.kernel.org/project/dri-devel/patch/20220927143529.135689-1-christian.koenig@amd.com/
> Yeah maybe something like this could be applied after we land this
> patch here.

I think both should land ASAP. We don't have any other way than to 
clearly point out incorrect approaches if we want to prevent the 
resulting data corruption.

> Well maybe should have the same check in gem mmap code to
> make sure no driver

Reads like the sentence is somehow cut off?

>
>> I think it would be better that instead of having special flags in the
>> ptes and vmas that you can't follow them to a page structure we would
>> add something to the page indicating that you can't grab a reference to
>> it. But this might break some use cases as well.
> Afaik the problem with that is that there's no free page bits left for
> these debug checks. Plus the pte+vma flags are the flags to make this
> clear already.

Maybe a GFP flag to set the page reference count to zero or something 
like this?
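 
Purely as a sketch of the idea (no such flag exists today;
__GFP_NO_PAGE_REF is made up here): the allocator would hand back pages
whose refcount stays poisoned, so any get_page() on them splats
immediately instead of silently "pinning" memory the exporter may move.
 
    /* hypothetical flag, for illustration only */
    page = alloc_pages(GFP_KERNEL | __GFP_NO_PAGE_REF, order);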

Christian.

> -Daniel


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [Intel-gfx] ✗ Fi.CI.IGT: failure for dma-buf: Require VM_PFNMAP vma for mmap
  2022-11-22 17:08 ` Daniel Vetter
                   ` (5 preceding siblings ...)
  (?)
@ 2022-11-23  9:58 ` Patchwork
  -1 siblings, 0 replies; 72+ messages in thread
From: Patchwork @ 2022-11-23  9:58 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

== Series Details ==

Series: dma-buf: Require VM_PFNMAP vma for mmap
URL   : https://patchwork.freedesktop.org/series/111210/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_12418_full -> Patchwork_111210v1_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_111210v1_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_111210v1_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (11 -> 9)
------------------------------

  Missing    (2): pig-skl-6260u pig-glk-j5005 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_111210v1_full:

### IGT changes ###

#### Possible regressions ####

  * igt@kms_mmap_write_crc@main@pipe-a-edp-1:
    - shard-tglb:         [PASS][1] -> [DMESG-WARN][2] +3 similar issues
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-tglb3/igt@kms_mmap_write_crc@main@pipe-a-edp-1.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-tglb2/igt@kms_mmap_write_crc@main@pipe-a-edp-1.html

  * igt@kms_mmap_write_crc@main@pipe-a-hdmi-a-1:
    - shard-glk:          [PASS][3] -> [DMESG-WARN][4] +3 similar issues
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-glk3/igt@kms_mmap_write_crc@main@pipe-a-hdmi-a-1.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-glk9/igt@kms_mmap_write_crc@main@pipe-a-hdmi-a-1.html

  * igt@prime_mmap_coherency@ioctl-errors:
    - shard-snb:          [PASS][5] -> [DMESG-WARN][6] +3 similar issues
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-snb7/igt@prime_mmap_coherency@ioctl-errors.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-snb5/igt@prime_mmap_coherency@ioctl-errors.html
    - shard-apl:          [PASS][7] -> [INCOMPLETE][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-apl1/igt@prime_mmap_coherency@ioctl-errors.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-apl7/igt@prime_mmap_coherency@ioctl-errors.html
    - shard-skl:          [PASS][9] -> [INCOMPLETE][10] +1 similar issue
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-skl10/igt@prime_mmap_coherency@ioctl-errors.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-skl7/igt@prime_mmap_coherency@ioctl-errors.html

  * igt@prime_mmap_coherency@read:
    - shard-iclb:         [PASS][11] -> [DMESG-WARN][12] +3 similar issues
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-iclb1/igt@prime_mmap_coherency@read.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-iclb6/igt@prime_mmap_coherency@read.html
    - shard-apl:          NOTRUN -> [DMESG-WARN][13]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-apl1/igt@prime_mmap_coherency@read.html

  * igt@prime_mmap_kms@buffer-sharing:
    - shard-apl:          [PASS][14] -> [DMESG-WARN][15] +1 similar issue
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-apl7/igt@prime_mmap_kms@buffer-sharing.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-apl3/igt@prime_mmap_kms@buffer-sharing.html
    - shard-skl:          [PASS][16] -> [DMESG-WARN][17]
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-skl4/igt@prime_mmap_kms@buffer-sharing.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-skl10/igt@prime_mmap_kms@buffer-sharing.html

  
#### Warnings ####

  * igt@runner@aborted:
    - shard-apl:          ([FAIL][18], [FAIL][19], [FAIL][20]) ([i915#180] / [i915#3002] / [i915#4312]) -> ([FAIL][21], [FAIL][22], [FAIL][23], [FAIL][24], [FAIL][25], [FAIL][26], [FAIL][27], [FAIL][28], [FAIL][29], [FAIL][30], [FAIL][31], [FAIL][32], [FAIL][33], [FAIL][34]) ([i915#3002] / [i915#4312])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-apl1/igt@runner@aborted.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-apl7/igt@runner@aborted.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-apl6/igt@runner@aborted.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-apl7/igt@runner@aborted.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-apl7/igt@runner@aborted.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-apl6/igt@runner@aborted.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-apl3/igt@runner@aborted.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-apl3/igt@runner@aborted.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-apl8/igt@runner@aborted.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-apl8/igt@runner@aborted.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-apl7/igt@runner@aborted.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-apl1/igt@runner@aborted.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-apl1/igt@runner@aborted.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-apl3/igt@runner@aborted.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-apl3/igt@runner@aborted.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-apl1/igt@runner@aborted.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-apl3/igt@runner@aborted.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@prime_mmap_coherency@ioctl-errors:
    - {shard-rkl}:        [PASS][35] -> [DMESG-WARN][36]
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-rkl-5/igt@prime_mmap_coherency@ioctl-errors.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-rkl-3/igt@prime_mmap_coherency@ioctl-errors.html

  
Known issues
------------

  Here are the changes found in Patchwork_111210v1_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_balancer@parallel-bb-first:
    - shard-iclb:         [PASS][37] -> [SKIP][38] ([i915#4525])
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-iclb2/igt@gem_exec_balancer@parallel-bb-first.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-iclb7/igt@gem_exec_balancer@parallel-bb-first.html

  * igt@gem_exec_fair@basic-none@vcs1:
    - shard-iclb:         NOTRUN -> [FAIL][39] ([i915#2842])
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-iclb1/igt@gem_exec_fair@basic-none@vcs1.html

  * igt@gem_lmem_swapping@heavy-verify-random-ccs:
    - shard-tglb:         NOTRUN -> [SKIP][40] ([i915#4613])
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-tglb6/igt@gem_lmem_swapping@heavy-verify-random-ccs.html

  * igt@gem_pread@exhaustion:
    - shard-skl:          NOTRUN -> [INCOMPLETE][41] ([i915#7248])
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-skl7/igt@gem_pread@exhaustion.html

  * igt@gem_pwrite@basic-exhaustion:
    - shard-apl:          NOTRUN -> [INCOMPLETE][42] ([i915#7248])
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-apl1/igt@gem_pwrite@basic-exhaustion.html

  * igt@gem_userptr_blits@dmabuf-sync:
    - shard-skl:          NOTRUN -> [SKIP][43] ([fdo#109271] / [i915#3323])
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-skl7/igt@gem_userptr_blits@dmabuf-sync.html

  * igt@i915_module_load@resize-bar:
    - shard-tglb:         NOTRUN -> [SKIP][44] ([i915#6412])
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-tglb6/igt@i915_module_load@resize-bar.html

  * igt@i915_pipe_stress@stress-xrgb8888-untiled:
    - shard-apl:          NOTRUN -> [FAIL][45] ([i915#7036])
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-apl1/igt@i915_pipe_stress@stress-xrgb8888-untiled.html

  * igt@i915_pm_dc@dc6-dpms:
    - shard-skl:          NOTRUN -> [FAIL][46] ([i915#3989] / [i915#454])
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-skl1/igt@i915_pm_dc@dc6-dpms.html

  * igt@i915_pm_dc@dc9-dpms:
    - shard-iclb:         [PASS][47] -> [SKIP][48] ([i915#4281])
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-iclb7/igt@i915_pm_dc@dc9-dpms.html
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-iclb3/igt@i915_pm_dc@dc9-dpms.html

  * igt@i915_pm_rpm@system-suspend:
    - shard-skl:          [PASS][49] -> [INCOMPLETE][50] ([i915#7253])
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-skl6/igt@i915_pm_rpm@system-suspend.html
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-skl7/igt@i915_pm_rpm@system-suspend.html

  * igt@i915_suspend@forcewake:
    - shard-tglb:         [PASS][51] -> [INCOMPLETE][52] ([i915#2411])
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-tglb3/igt@i915_suspend@forcewake.html
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-tglb5/igt@i915_suspend@forcewake.html

  * igt@kms_big_fb@4-tiled-64bpp-rotate-0:
    - shard-apl:          NOTRUN -> [SKIP][53] ([fdo#109271]) +4 similar issues
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-apl1/igt@kms_big_fb@4-tiled-64bpp-rotate-0.html

  * igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-async-flip:
    - shard-skl:          NOTRUN -> [FAIL][54] ([i915#3763])
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-skl1/igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-async-flip.html

  * igt@kms_ccs@pipe-a-bad-pixel-format-y_tiled_gen12_mc_ccs:
    - shard-skl:          NOTRUN -> [SKIP][55] ([fdo#109271] / [i915#3886]) +1 similar issue
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-skl7/igt@kms_ccs@pipe-a-bad-pixel-format-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-a-bad-rotation-90-4_tiled_dg2_mc_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][56] ([i915#6095])
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-tglb6/igt@kms_ccs@pipe-a-bad-rotation-90-4_tiled_dg2_mc_ccs.html

  * igt@kms_ccs@pipe-b-crc-sprite-planes-basic-y_tiled_gen12_mc_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][57] ([i915#3689] / [i915#3886])
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-tglb6/igt@kms_ccs@pipe-b-crc-sprite-planes-basic-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-c-crc-primary-basic-y_tiled_gen12_mc_ccs:
    - shard-apl:          NOTRUN -> [SKIP][58] ([fdo#109271] / [i915#3886])
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-apl1/igt@kms_ccs@pipe-c-crc-primary-basic-y_tiled_gen12_mc_ccs.html

  * igt@kms_chamelium@dp-crc-single:
    - shard-skl:          NOTRUN -> [SKIP][59] ([fdo#109271] / [fdo#111827]) +1 similar issue
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-skl7/igt@kms_chamelium@dp-crc-single.html

  * igt@kms_cursor_crc@cursor-offscreen-32x10:
    - shard-skl:          NOTRUN -> [SKIP][60] ([fdo#109271]) +60 similar issues
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-skl7/igt@kms_cursor_crc@cursor-offscreen-32x10.html

  * igt@kms_cursor_crc@cursor-rapid-movement-max-size:
    - shard-tglb:         NOTRUN -> [SKIP][61] ([i915#3555])
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-tglb6/igt@kms_cursor_crc@cursor-rapid-movement-max-size.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@b-edp1:
    - shard-skl:          [PASS][62] -> [FAIL][63] ([i915#79])
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-skl9/igt@kms_flip@flip-vs-expired-vblank-interruptible@b-edp1.html
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-skl10/igt@kms_flip@flip-vs-expired-vblank-interruptible@b-edp1.html

  * igt@kms_flip@plain-flip-ts-check@a-edp1:
    - shard-skl:          [PASS][64] -> [FAIL][65] ([i915#2122]) +2 similar issues
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-skl1/igt@kms_flip@plain-flip-ts-check@a-edp1.html
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-skl1/igt@kms_flip@plain-flip-ts-check@a-edp1.html

  * igt@kms_flip_scaled_crc@flip-32bpp-yftileccs-to-64bpp-yftile-upscaling@pipe-a-valid-mode:
    - shard-iclb:         NOTRUN -> [SKIP][66] ([i915#2587] / [i915#2672]) +2 similar issues
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-iclb7/igt@kms_flip_scaled_crc@flip-32bpp-yftileccs-to-64bpp-yftile-upscaling@pipe-a-valid-mode.html

  * igt@kms_flip_scaled_crc@flip-64bpp-4tile-to-16bpp-4tile-upscaling@pipe-a-default-mode:
    - shard-iclb:         NOTRUN -> [SKIP][67] ([i915#2672]) +3 similar issues
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-iclb2/igt@kms_flip_scaled_crc@flip-64bpp-4tile-to-16bpp-4tile-upscaling@pipe-a-default-mode.html

  * igt@kms_flip_scaled_crc@flip-64bpp-yftile-to-32bpp-yftile-downscaling@pipe-a-valid-mode:
    - shard-tglb:         NOTRUN -> [SKIP][68] ([i915#2587] / [i915#2672])
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-tglb6/igt@kms_flip_scaled_crc@flip-64bpp-yftile-to-32bpp-yftile-downscaling@pipe-a-valid-mode.html

  * igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytile-downscaling@pipe-a-default-mode:
    - shard-iclb:         NOTRUN -> [SKIP][69] ([i915#3555])
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-iclb2/igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytile-downscaling@pipe-a-default-mode.html

  * igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilercccs-downscaling@pipe-a-default-mode:
    - shard-iclb:         NOTRUN -> [SKIP][70] ([i915#2672] / [i915#3555])
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-iclb3/igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilercccs-downscaling@pipe-a-default-mode.html

  * igt@kms_plane_alpha_blend@constant-alpha-min@pipe-c-edp-1:
    - shard-skl:          NOTRUN -> [FAIL][71] ([i915#4573]) +2 similar issues
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-skl1/igt@kms_plane_alpha_blend@constant-alpha-min@pipe-c-edp-1.html

  * igt@kms_psr2_su@page_flip-nv12:
    - shard-iclb:         NOTRUN -> [SKIP][72] ([fdo#109642] / [fdo#111068] / [i915#658])
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-iclb7/igt@kms_psr2_su@page_flip-nv12.html

  * igt@kms_psr@psr2_sprite_plane_move:
    - shard-iclb:         [PASS][73] -> [SKIP][74] ([fdo#109441]) +1 similar issue
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-iclb2/igt@kms_psr@psr2_sprite_plane_move.html
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-iclb7/igt@kms_psr@psr2_sprite_plane_move.html

  * igt@kms_writeback@writeback-pixel-formats:
    - shard-skl:          NOTRUN -> [SKIP][75] ([fdo#109271] / [i915#2437])
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-skl7/igt@kms_writeback@writeback-pixel-formats.html

  * igt@perf_pmu@module-unload:
    - shard-skl:          [PASS][76] -> [DMESG-WARN][77] ([i915#1982])
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-skl4/igt@perf_pmu@module-unload.html
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-skl9/igt@perf_pmu@module-unload.html

  
#### Possible fixes ####

  * igt@drm_read@empty-nonblock:
    - {shard-rkl}:        [SKIP][78] ([i915#4098]) -> [PASS][79] +1 similar issue
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-rkl-3/igt@drm_read@empty-nonblock.html
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-rkl-6/igt@drm_read@empty-nonblock.html

  * igt@feature_discovery@psr1:
    - {shard-rkl}:        [SKIP][80] ([i915#658]) -> [PASS][81]
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-rkl-3/igt@feature_discovery@psr1.html
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-rkl-6/igt@feature_discovery@psr1.html

  * igt@gem_ctx_persistence@hang:
    - {shard-rkl}:        [SKIP][82] ([i915#6252]) -> [PASS][83]
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-rkl-5/igt@gem_ctx_persistence@hang.html
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-rkl-4/igt@gem_ctx_persistence@hang.html

  * igt@gem_exec_balancer@fairslice:
    - {shard-rkl}:        [SKIP][84] ([i915#6259]) -> [PASS][85]
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-rkl-5/igt@gem_exec_balancer@fairslice.html
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-rkl-4/igt@gem_exec_balancer@fairslice.html

  * igt@gem_exec_fair@basic-flow@rcs0:
    - shard-tglb:         [FAIL][86] ([i915#2842]) -> [PASS][87] +1 similar issue
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-tglb7/igt@gem_exec_fair@basic-flow@rcs0.html
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-tglb5/igt@gem_exec_fair@basic-flow@rcs0.html

  * igt@gem_exec_fair@basic-pace@rcs0:
    - shard-glk:          [FAIL][88] ([i915#2842]) -> [PASS][89]
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-glk1/igt@gem_exec_fair@basic-pace@rcs0.html
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-glk8/igt@gem_exec_fair@basic-pace@rcs0.html

  * igt@gem_exec_fair@basic-pace@vcs0:
    - shard-iclb:         [FAIL][90] ([i915#2842]) -> [PASS][91]
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-iclb2/igt@gem_exec_fair@basic-pace@vcs0.html
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-iclb7/igt@gem_exec_fair@basic-pace@vcs0.html

  * igt@gem_exec_reloc@basic-write-read-noreloc:
    - {shard-rkl}:        [SKIP][92] ([i915#3281]) -> [PASS][93] +3 similar issues
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-rkl-1/igt@gem_exec_reloc@basic-write-read-noreloc.html
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-rkl-5/igt@gem_exec_reloc@basic-write-read-noreloc.html

  * igt@gem_set_tiling_vs_pwrite:
    - {shard-rkl}:        [SKIP][94] ([i915#3282]) -> [PASS][95] +2 similar issues
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-rkl-1/igt@gem_set_tiling_vs_pwrite.html
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-rkl-5/igt@gem_set_tiling_vs_pwrite.html

  * igt@gem_softpin@evict-single-offset:
    - {shard-rkl}:        [FAIL][96] ([i915#4171]) -> [PASS][97]
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-rkl-5/igt@gem_softpin@evict-single-offset.html
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-rkl-4/igt@gem_softpin@evict-single-offset.html

  * igt@gen9_exec_parse@bb-secure:
    - {shard-rkl}:        [SKIP][98] ([i915#2527]) -> [PASS][99]
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-rkl-1/igt@gen9_exec_parse@bb-secure.html
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-rkl-5/igt@gen9_exec_parse@bb-secure.html

  * igt@i915_hangman@gt-engine-error@bcs0:
    - {shard-rkl}:        [SKIP][100] ([i915#6258]) -> [PASS][101]
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-rkl-5/igt@i915_hangman@gt-engine-error@bcs0.html
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-rkl-4/igt@i915_hangman@gt-engine-error@bcs0.html

  * igt@i915_pm_rc6_residency@rc6-idle@vcs0:
    - {shard-rkl}:        [WARN][102] ([i915#2681]) -> [PASS][103]
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-rkl-5/igt@i915_pm_rc6_residency@rc6-idle@vcs0.html
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-rkl-4/igt@i915_pm_rc6_residency@rc6-idle@vcs0.html

  * igt@i915_pm_rpm@dpms-mode-unset-lpsp:
    - {shard-rkl}:        [SKIP][104] ([i915#1397]) -> [PASS][105] +1 similar issue
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-rkl-3/igt@i915_pm_rpm@dpms-mode-unset-lpsp.html
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-rkl-6/igt@i915_pm_rpm@dpms-mode-unset-lpsp.html

  * igt@kms_color@ctm-green-to-red@pipe-c-edp-1:
    - shard-skl:          [DMESG-WARN][106] ([i915#1982]) -> [PASS][107]
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-skl1/igt@kms_color@ctm-green-to-red@pipe-c-edp-1.html
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-skl1/igt@kms_color@ctm-green-to-red@pipe-c-edp-1.html

  * igt@kms_cursor_legacy@flip-vs-cursor@atomic-transitions:
    - shard-iclb:         [FAIL][108] ([i915#2346]) -> [PASS][109]
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-iclb7/igt@kms_cursor_legacy@flip-vs-cursor@atomic-transitions.html
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-iclb1/igt@kms_cursor_legacy@flip-vs-cursor@atomic-transitions.html

  * igt@kms_cursor_legacy@flip-vs-cursor@atomic-transitions-varying-size:
    - shard-glk:          [FAIL][110] ([i915#2346]) -> [PASS][111]
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-glk2/igt@kms_cursor_legacy@flip-vs-cursor@atomic-transitions-varying-size.html
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-glk5/igt@kms_cursor_legacy@flip-vs-cursor@atomic-transitions-varying-size.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@c-edp1:
    - shard-skl:          [FAIL][112] ([i915#79]) -> [PASS][113] +1 similar issue
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-skl9/igt@kms_flip@flip-vs-expired-vblank-interruptible@c-edp1.html
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-skl10/igt@kms_flip@flip-vs-expired-vblank-interruptible@c-edp1.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@c-hdmi-a2:
    - shard-glk:          [FAIL][114] ([i915#79]) -> [PASS][115]
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-glk3/igt@kms_flip@flip-vs-expired-vblank-interruptible@c-hdmi-a2.html
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-glk8/igt@kms_flip@flip-vs-expired-vblank-interruptible@c-hdmi-a2.html

  * igt@kms_flip@flip-vs-suspend-interruptible@c-dp1:
    - shard-apl:          [DMESG-WARN][116] ([i915#180]) -> [PASS][117]
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-apl7/igt@kms_flip@flip-vs-suspend-interruptible@c-dp1.html
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-apl1/igt@kms_flip@flip-vs-suspend-interruptible@c-dp1.html

  * igt@kms_frontbuffer_tracking@fbc-modesetfrombusy:
    - shard-glk:          [FAIL][118] ([i915#2546]) -> [PASS][119]
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-glk2/igt@kms_frontbuffer_tracking@fbc-modesetfrombusy.html
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-glk1/igt@kms_frontbuffer_tracking@fbc-modesetfrombusy.html

  * igt@kms_frontbuffer_tracking@fbc-shrfb-scaledprimary:
    - {shard-rkl}:        [SKIP][120] ([i915#1849] / [i915#4098]) -> [PASS][121] +14 similar issues
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-rkl-3/igt@kms_frontbuffer_tracking@fbc-shrfb-scaledprimary.html
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-rkl-6/igt@kms_frontbuffer_tracking@fbc-shrfb-scaledprimary.html

  * igt@kms_psr@dpms:
    - {shard-rkl}:        [SKIP][122] ([i915#1072]) -> [PASS][123]
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-rkl-3/igt@kms_psr@dpms.html
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-rkl-6/igt@kms_psr@dpms.html

  * igt@kms_psr@psr2_cursor_mmap_gtt:
    - shard-iclb:         [SKIP][124] ([fdo#109441]) -> [PASS][125] +1 similar issue
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-iclb7/igt@kms_psr@psr2_cursor_mmap_gtt.html
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-iclb2/igt@kms_psr@psr2_cursor_mmap_gtt.html

  * igt@kms_psr_stress_test@flip-primary-invalidate-overlay:
    - {shard-rkl}:        [SKIP][126] ([i915#5461]) -> [PASS][127]
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-rkl-3/igt@kms_psr_stress_test@flip-primary-invalidate-overlay.html
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-rkl-6/igt@kms_psr_stress_test@flip-primary-invalidate-overlay.html

  * igt@kms_vblank@pipe-b-query-idle:
    - {shard-rkl}:        [SKIP][128] ([i915#1845] / [i915#4098]) -> [PASS][129] +27 similar issues
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-rkl-3/igt@kms_vblank@pipe-b-query-idle.html
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-rkl-6/igt@kms_vblank@pipe-b-query-idle.html

  * igt@perf@gen12-unprivileged-single-ctx-counters:
    - {shard-rkl}:        [SKIP][130] ([fdo#109289]) -> [PASS][131]
   [130]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-rkl-5/igt@perf@gen12-unprivileged-single-ctx-counters.html
   [131]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-rkl-4/igt@perf@gen12-unprivileged-single-ctx-counters.html

  * igt@perf@mi-rpc:
    - {shard-rkl}:        [SKIP][132] ([i915#2434]) -> [PASS][133]
   [132]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-rkl-3/igt@perf@mi-rpc.html
   [133]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-rkl-5/igt@perf@mi-rpc.html

  * igt@prime_vgem@basic-read:
    - {shard-rkl}:        [SKIP][134] ([fdo#109295] / [i915#3291] / [i915#3708]) -> [PASS][135]
   [134]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-rkl-3/igt@prime_vgem@basic-read.html
   [135]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-rkl-5/igt@prime_vgem@basic-read.html

  
#### Warnings ####

  * igt@i915_pm_rc6_residency@rc6-idle@vecs0:
    - shard-iclb:         [WARN][136] ([i915#2684]) -> [FAIL][137] ([i915#2684])
   [136]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12418/shard-iclb7/igt@i915_pm_rc6_residency@rc6-idle@vecs0.html
   [137]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/shard-iclb1/igt@i915_pm_rc6_residency@rc6-idle@vecs0.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109279]: https://bugs.freedesktop.org/show_bug.cgi?id=109279
  [fdo#109283]: https://bugs.freedesktop.org/show_bug.cgi?id=109283
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109289]: https://bugs.freedesktop.org/show_bug.cgi?id=109289
  [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#109642]: https://bugs.freedesktop.org/show_bug.cgi?id=109642
  [fdo#110723]: https://bugs.freedesktop.org/show_bug.cgi?id=110723
  [fdo#111068]: https://bugs.freedesktop.org/show_bug.cgi?id=111068
  [fdo#111614]: https://bugs.freedesktop.org/show_bug.cgi?id=111614
  [fdo#111615]: https://bugs.freedesktop.org/show_bug.cgi?id=111615
  [fdo#111825]: https://bugs.freedesktop.org/show_bug.cgi?id=111825
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [fdo#112283]: https://bugs.freedesktop.org/show_bug.cgi?id=112283
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#132]: https://gitlab.freedesktop.org/drm/intel/issues/132
  [i915#1397]: https://gitlab.freedesktop.org/drm/intel/issues/1397
  [i915#1722]: https://gitlab.freedesktop.org/drm/intel/issues/1722
  [i915#1755]: https://gitlab.freedesktop.org/drm/intel/issues/1755
  [i915#1769]: https://gitlab.freedesktop.org/drm/intel/issues/1769
  [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180
  [i915#1825]: https://gitlab.freedesktop.org/drm/intel/issues/1825
  [i915#1839]: https://gitlab.freedesktop.org/drm/intel/issues/1839
  [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
  [i915#1849]: https://gitlab.freedesktop.org/drm/intel/issues/1849
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2122]: https://gitlab.freedesktop.org/drm/intel/issues/2122
  [i915#2346]: https://gitlab.freedesktop.org/drm/intel/issues/2346
  [i915#2411]: https://gitlab.freedesktop.org/drm/intel/issues/2411
  [i915#2434]: https://gitlab.freedesktop.org/drm/intel/issues/2434
  [i915#2437]: https://gitlab.freedesktop.org/drm/intel/issues/2437
  [i915#2527]: https://gitlab.freedesktop.org/drm/intel/issues/2527
  [i915#2546]: https://gitlab.freedesktop.org/drm/intel/issues/2546
  [i915#2587]: https://gitlab.freedesktop.org/drm/intel/issues/2587
  [i915#2672]: https://gitlab.freedesktop.org/drm/intel/issues/2672
  [i915#2681]: https://gitlab.freedesktop.org/drm/intel/issues/2681
  [i915#2684]: https://gitlab.freedesktop.org/drm/intel/issues/2684
  [i915#2705]: https://gitlab.freedesktop.org/drm/intel/issues/2705
  [i915#280]: https://gitlab.freedesktop.org/drm/intel/issues/280
  [i915#284]: https://gitlab.freedesktop.org/drm/intel/issues/284
  [i915#2842]: https://gitlab.freedesktop.org/drm/intel/issues/2842
  [i915#2920]: https://gitlab.freedesktop.org/drm/intel/issues/2920
  [i915#2994]: https://gitlab.freedesktop.org/drm/intel/issues/2994
  [i915#3002]: https://gitlab.freedesktop.org/drm/intel/issues/3002
  [i915#315]: https://gitlab.freedesktop.org/drm/intel/issues/315
  [i915#3281]: https://gitlab.freedesktop.org/drm/intel/issues/3281
  [i915#3282]: https://gitlab.freedesktop.org/drm/intel/issues/3282
  [i915#3291]: https://gitlab.freedesktop.org/drm/intel/issues/3291
  [i915#3323]: https://gitlab.freedesktop.org/drm/intel/issues/3323
  [i915#3359]: https://gitlab.freedesktop.org/drm/intel/issues/3359
  [i915#3458]: https://gitlab.freedesktop.org/drm/intel/issues/3458
  [i915#3528]: https://gitlab.freedesktop.org/drm/intel/issues/3528
  [i915#3539]: https://gitlab.freedesktop.org/drm/intel/issues/3539
  [i915#3546]: https://gitlab.freedesktop.org/drm/intel/issues/3546
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3558]: https://gitlab.freedesktop.org/drm/intel/issues/3558
  [i915#3591]: https://gitlab.freedesktop.org/drm/intel/issues/3591
  [i915#3637]: https://gitlab.freedesktop.org/drm/intel/issues/3637
  [i915#3638]: https://gitlab.freedesktop.org/drm/intel/issues/3638
  [i915#3689]: https://gitlab.freedesktop.org/drm/intel/issues/3689
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#3734]: https://gitlab.freedesktop.org/drm/intel/issues/3734
  [i915#3742]: https://gitlab.freedesktop.org/drm/intel/issues/3742
  [i915#3763]: https://gitlab.freedesktop.org/drm/intel/issues/3763
  [i915#3840]: https://gitlab.freedesktop.org/drm/intel/issues/3840
  [i915#3886]: https://gitlab.freedesktop.org/drm/intel/issues/3886
  [i915#3989]: https://gitlab.freedesktop.org/drm/intel/issues/3989
  [i915#4036]: https://gitlab.freedesktop.org/drm/intel/issues/4036
  [i915#4070]: https://gitlab.freedesktop.org/drm/intel/issues/4070
  [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077
  [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079
  [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083
  [i915#4098]: https://gitlab.freedesktop.org/drm/intel/issues/4098
  [i915#4171]: https://gitlab.freedesktop.org/drm/intel/issues/4171
  [i915#4270]: https://gitlab.freedesktop.org/drm/intel/issues/4270
  [i915#4281]: https://gitlab.freedesktop.org/drm/intel/issues/4281
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4525]: https://gitlab.freedesktop.org/drm/intel/issues/4525
  [i915#4538]: https://gitlab.freedesktop.org/drm/intel/issues/4538
  [i915#454]: https://gitlab.freedesktop.org/drm/intel/issues/454
  [i915#4573]: https://gitlab.freedesktop.org/drm/intel/issues/4573
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4812]: https://gitlab.freedesktop.org/drm/intel/issues/4812
  [i915#4833]: https://gitlab.freedesktop.org/drm/intel/issues/4833
  [i915#4852]: https://gitlab.freedesktop.org/drm/intel/issues/4852
  [i915#4885]: https://gitlab.freedesktop.org/drm/intel/issues/4885
  [i915#4991]: https://gitlab.freedesktop.org/drm/intel/issues/4991
  [i915#5176]: https://gitlab.freedesktop.org/drm/intel/issues/5176
  [i915#5235]: https://gitlab.freedesktop.org/drm/intel/issues/5235
  [i915#5286]: https://gitlab.freedesktop.org/drm/intel/issues/5286
  [i915#5325]: https://gitlab.freedesktop.org/drm/intel/issues/5325
  [i915#533]: https://gitlab.freedesktop.org/drm/intel/issues/533
  [i915#5439]: https://gitlab.freedesktop.org/drm/intel/issues/5439
  [i915#5461]: https://gitlab.freedesktop.org/drm/intel/issues/5461
  [i915#5563]: https://gitlab.freedesktop.org/drm/intel/issues/5563
  [i915#5566]: https://gitlab.freedesktop.org/drm/intel/issues/5566
  [i915#6095]: https://gitlab.freedesktop.org/drm/intel/issues/6095
  [i915#6245]: https://gitlab.freedesktop.org/drm/intel/issues/6245
  [i915#6247]: https://gitlab.freedesktop.org/drm/intel/issues/6247
  [i915#6248]: https://gitlab.freedesktop.org/drm/intel/issues/6248
  [i915#6252]: https://gitlab.freedesktop.org/drm/intel/issues/6252
  [i915#6258]: https://gitlab.freedesktop.org/drm/intel/issues/6258
  [i915#6259]: https://gitlab.freedesktop.org/drm/intel/issues/6259
  [i915#6268]: https://gitlab.freedesktop.org/drm/intel/issues/6268
  [i915#6412]: https://gitlab.freedesktop.org/drm/intel/issues/6412
  [i915#6497]: https://gitlab.freedesktop.org/drm/intel/issues/6497
  [i915#658]: https://gitlab.freedesktop.org/drm/intel/issues/658
  [i915#6621]: https://gitlab.freedesktop.org/drm/intel/issues/6621
  [i915#6768]: https://gitlab.freedesktop.org/drm/intel/issues/6768
  [i915#6946]: https://gitlab.freedesktop.org/drm/intel/issues/6946
  [i915#7036]: https://gitlab.freedesktop.org/drm/intel/issues/7036
  [i915#7037]: https://gitlab.freedesktop.org/drm/intel/issues/7037
  [i915#7248]: https://gitlab.freedesktop.org/drm/intel/issues/7248
  [i915#7253]: https://gitlab.freedesktop.org/drm/intel/issues/7253
  [i915#7276]: https://gitlab.freedesktop.org/drm/intel/issues/7276
  [i915#79]: https://gitlab.freedesktop.org/drm/intel/issues/79


Build changes
-------------

  * Linux: CI_DRM_12418 -> Patchwork_111210v1

  CI-20190529: 20190529
  CI_DRM_12418: 22789b788bcaf35826550836b0ad6872d6e85ca6 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7071: 0801475083ccb938b1d3b358502ff97fdb435585 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_111210v1: 22789b788bcaf35826550836b0ad6872d6e85ca6 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_111210v1/index.html

[-- Attachment #2: Type: text/html, Size: 36436 bytes --]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
  2022-11-23  9:39                   ` [Intel-gfx] " Christian König
@ 2022-11-23 10:06                     ` Daniel Vetter
  -1 siblings, 0 replies; 72+ messages in thread
From: Daniel Vetter @ 2022-11-23 10:06 UTC (permalink / raw)
  To: Christian König
  Cc: Intel Graphics Development, DRI Development, Sumit Semwal,
	linaro-mm-sig, Jason Gunthorpe, John Stultz, Matthew Wilcox,
	Thomas Zimmermann, Daniel Vetter, Suren Baghdasaryan,
	Christian König, linux-media

On Wed, 23 Nov 2022 at 10:39, Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> On 23.11.22 at 10:30, Daniel Vetter wrote:
> > On Wed, 23 Nov 2022 at 10:06, Christian König <christian.koenig@amd.com> wrote:
> >> On 22.11.22 at 20:50, Daniel Vetter wrote:
> >>> On Tue, 22 Nov 2022 at 20:34, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >>>> On Tue, Nov 22, 2022 at 08:29:05PM +0100, Daniel Vetter wrote:
> >>>>> You nuke all the ptes. Drivers that move have slightly more than a
> >>>>> bare struct file, they also have a struct address_space so that
> >>>>> invalidate_mapping_range() works.
> >>>> Okay, this is one of the ways that this can be made to work correctly,
> >>>> as long as you never allow GUP/GUP_fast to succeed on the PTEs. (this
> >>>> was the DAX mistake)
> >>> Hence this patch, to enforce that no dma-buf exporter gets this wrong.
> >>> Which some did, and then blamed bug reporters for the resulting splats
> >>> :-) One of the things we've reverted was the ttm huge pte support,
> >>> since that doesn't have the pmd_special flag (yet) and so would let
> >>> gup_fast through.
> >> The problem is not only gup, a lot of people seem to assume that when
> >> you are able to grab a reference to a page that the ptes pointing to
> >> that page can't change any more. And that's obviously incorrect.
> >>
> >> I witnessed tons of discussions about that already. Some customers even
> >> modified our code assuming that and then wondered why the heck they ran
> >> into data corruption.
> >>
> >> It's gotten so bad that I've even proposed intentionally mangling the
> >> page reference count on TTM allocated pages:
> >> https://patchwork.kernel.org/project/dri-devel/patch/20220927143529.135689-1-christian.koenig@amd.com/
> > Yeah maybe something like this could be applied after we land this
> > patch here.
>
> I think both should land ASAP. We don't have any other way than to
> clearly point out incorrect approaches if we want to prevent the
> resulting data corruption.
>
> > Well maybe should have the same check in gem mmap code to
> > make sure no driver
>
> Reads like the sentence is somehow cut off?

Yeah, just wanted to say that we need to make sure in as many places
as possible that VM_PFNMAP is set for bo mmaps.
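
Something like this minimal sketch is what I have in mind (the helper
name and its placement are made up for illustration, not a tested
patch):

#include <linux/bug.h>
#include <linux/mm.h>

/* sketch: a common helper drivers would call on the return value of
 * their mmap hook, mirroring the check in dma_buf_mmap() */
static inline int drm_gem_check_pfnmap(struct vm_area_struct *vma,
				       int ret)
{
	WARN_ON_ONCE(ret == 0 && !(vma->vm_flags & VM_PFNMAP));
	return ret;
}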

> >> I think it would be better that instead of having special flags in the
> >> ptes and vmas that you can't follow them to a page structure we would
> >> add something to the page indicating that you can't grab a reference to
> >> it. But this might break some use cases as well.
> > Afaik the problem with that is that there's no free page bits left for
> > these debug checks. Plus the pte+vma flags are the flags to make this
> > clear already.
>
> Maybe a GFP flag to set the page reference count to zero or something
> like this?

Hm yeah that might work. I'm not sure what it will all break though?
And we'd need to make sure that underflowing the page refcount dies in
a backtrace.
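
Very rough sketch of how I read the idea (the helper names here are
invented; the real proposal is Christian's TTM pool patch linked above):

#include <linux/gfp.h>
#include <linux/page_ref.h>

/* Hand out pages with a zero refcount: any illegal get_page() then
 * trips the VM_BUG_ON in get_page() when CONFIG_DEBUG_VM is set. */
static struct page *drv_alloc_page(gfp_t gfp)
{
	struct page *p = alloc_page(gfp);

	if (p)
		set_page_count(p, 0);
	return p;
}

static void drv_free_page(struct page *p)
{
	set_page_count(p, 1);	/* restore the ref __free_page() drops */
	__free_page(p);
}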
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
  2022-11-23 10:06                     ` [Intel-gfx] " Daniel Vetter
@ 2022-11-23 12:46                       ` Jason Gunthorpe
  -1 siblings, 0 replies; 72+ messages in thread
From: Jason Gunthorpe @ 2022-11-23 12:46 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Christian König, Intel Graphics Development,
	DRI Development, Sumit Semwal, linaro-mm-sig, John Stultz,
	Matthew Wilcox, Thomas Zimmermann, Daniel Vetter,
	Suren Baghdasaryan, Christian König, linux-media

On Wed, Nov 23, 2022 at 11:06:55AM +0100, Daniel Vetter wrote:

> > Maybe a GFP flag to set the page reference count to zero or something
> > like this?
> 
> Hm yeah that might work. I'm not sure what it will all break though?
> And we'd need to make sure that underflowing the page refcount dies in
> a backtrace.

Mucking with the refcount like this to protect against crazy out of
tree drivers seems horrible..

The WARN_ON(page_count(p) != 1) seems like a reasonable thing to do
though, though you must combine this with the special PTE flag..

Jason

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
  2022-11-23 12:46                       ` Jason Gunthorpe
@ 2022-11-23 12:49                         ` Christian König
  -1 siblings, 0 replies; 72+ messages in thread
From: Christian König @ 2022-11-23 12:49 UTC (permalink / raw)
  To: Jason Gunthorpe, Daniel Vetter
  Cc: Intel Graphics Development, Matthew Wilcox, Sumit Semwal,
	linaro-mm-sig, John Stultz, DRI Development, Thomas Zimmermann,
	Daniel Vetter, Suren Baghdasaryan, Christian König,
	linux-media

On 23.11.22 at 13:46, Jason Gunthorpe wrote:
> On Wed, Nov 23, 2022 at 11:06:55AM +0100, Daniel Vetter wrote:
>
>>> Maybe a GFP flag to set the page reference count to zero or something
>>> like this?
>> Hm yeah that might work. I'm not sure what it will all break though?
>> And we'd need to make sure that underflowing the page refcount dies in
>> a backtrace.
> Mucking with the refcount like this to protect against crazy out of
> tree drivers seems horrible..

Well, not only out-of-tree drivers. The in-tree KVM got that horribly
wrong as well; those were the latest guys complaining about it.

>
> The WARN_ON(page_count(p) != 1) seems like a reasonable thing to do
> though, though you must combine this with the special PTE flag..

That's not sufficient. The pages are released much later than the point
where things actually go wrong. In most cases this WARN_ON won't hit.

Christian.

>
> Jason


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
  2022-11-23 12:49                         ` [Intel-gfx] " Christian König
@ 2022-11-23 12:53                           ` Jason Gunthorpe
  -1 siblings, 0 replies; 72+ messages in thread
From: Jason Gunthorpe @ 2022-11-23 12:53 UTC (permalink / raw)
  To: Christian König
  Cc: Daniel Vetter, Intel Graphics Development, DRI Development,
	Sumit Semwal, linaro-mm-sig, John Stultz, Matthew Wilcox,
	Thomas Zimmermann, Daniel Vetter, Suren Baghdasaryan,
	Christian König, linux-media

On Wed, Nov 23, 2022 at 01:49:41PM +0100, Christian König wrote:
> On 23.11.22 at 13:46, Jason Gunthorpe wrote:
> > On Wed, Nov 23, 2022 at 11:06:55AM +0100, Daniel Vetter wrote:
> > 
> > > > Maybe a GFP flag to set the page reference count to zero or something
> > > > like this?
> > > Hm yeah that might work. I'm not sure what it will all break though?
> > > And we'd need to make sure that underflowing the page refcount dies in
> > > a backtrace.
> > Mucking with the refcount like this to protect against crazy out of
> > tree drivers seems horrible..
> 
> Well, not only out-of-tree drivers. The in-tree KVM got that horribly
> wrong as well; those were the latest guys complaining about it.

KVM was taking refs on special PTEs? That seems really unlikely?

> > The WARN_ON(page_count(p) != 1) seems like a reasonable thing to do
> > though, though you must combine this with the special PTE flag..
> 
> That's not sufficient. The pages are released much later than the point
> where things actually go wrong. In most cases this WARN_ON won't hit.

How so? As long as the page is mapped into the PTE there is no issue
with corruption. If dmabuf checks the refcount after it does the
unmap_mapping_range() call it should catch any bogus pin that might be
confused about address coherency.
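
As a sketch (exporter-side only; the pages[]/npages bookkeeping is an
assumption for illustration, it is not dma-buf core API):

	/* in the exporter's move/invalidate path */
	unmap_mapping_range(dmabuf->file->f_mapping, 0, dmabuf->size, 1);
	for (i = 0; i < npages; i++)
		/* any extra ref still held here was taken illegally */
		WARN_ON(page_count(pages[i]) != 1);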

Jason

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
  2022-11-23 12:53                           ` Jason Gunthorpe
@ 2022-11-23 13:12                             ` Christian König
  -1 siblings, 0 replies; 72+ messages in thread
From: Christian König @ 2022-11-23 13:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Daniel Vetter, Intel Graphics Development, DRI Development,
	Sumit Semwal, linaro-mm-sig, John Stultz, Matthew Wilcox,
	Thomas Zimmermann, Daniel Vetter, Suren Baghdasaryan,
	Christian König, linux-media

On 23.11.22 at 13:53, Jason Gunthorpe wrote:
> On Wed, Nov 23, 2022 at 01:49:41PM +0100, Christian König wrote:
>> On 23.11.22 at 13:46, Jason Gunthorpe wrote:
>>> On Wed, Nov 23, 2022 at 11:06:55AM +0100, Daniel Vetter wrote:
>>>
>>>>> Maybe a GFP flag to set the page reference count to zero or something
>>>>> like this?
>>>> Hm yeah that might work. I'm not sure what it will all break though?
>>>> And we'd need to make sure that underflowing the page refcount dies in
>>>> a backtrace.
>>> Mucking with the refcount like this to protect against crazy out of
>>> tree drivers seems horrible..
>> Well, not only out-of-tree drivers. The in-tree KVM got that horribly
>> wrong as well; those were the latest guys complaining about it.
> KVM was taking refs on special PTEs? That seems really unlikely?

Well then look at this code here:

commit add6a0cd1c5ba51b201e1361b05a5df817083618
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Tue Jun 7 17:51:18 2016 +0200

     KVM: MMU: try to fix up page faults before giving up

     The vGPU folks would like to trap the first access to a BAR by setting
     vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault
     handler then can use remap_pfn_range to place some non-reserved pages
     in the VMA.

     This kind of VM_PFNMAP mapping is not handled by KVM, but follow_pfn
     and fixup_user_fault together help supporting it.  The patch also
     supports VM_MIXEDMAP vmas where the pfns are not reserved and thus
     subject to reference counting.

     Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
     Cc: Andrea Arcangeli <aarcange@redhat.com>
     Cc: Radim Krčmář <rkrcmar@redhat.com>
     Tested-by: Neo Jia <cjia@nvidia.com>
     Reported-by: Kirti Wankhede <kwankhede@nvidia.com>
     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

And see also the discussion here: 
https://patchwork.freedesktop.org/patch/414123/

as well as here: https://patchwork.freedesktop.org/patch/499190/

I can't count how often I have pointed out that this is absolutely 
illegal and KVM can't touch pages in VMAs with VM_PFNMAP.
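
Spelled out as a sketch, the invariant is roughly what GUP itself
enforces (this is the rule, not actual KVM code):

	/* pfn-only mapping: no get_page()/pin_user_pages() on whatever
	 * backs it, the pfns behind it can change at any time */
	if (vma->vm_flags & (VM_IO | VM_PFNMAP))
		return -EFAULT;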

>>> The WARN_ON(page_count(p) != 1) seems like a reasonable thing to do
>>> though, though you must combine this with the special PTE flag..
>> That's not sufficient. The pages are released much later than the point
>> where things actually go wrong. In most cases this WARN_ON won't hit.
> How so? As long as the page is mapped into the PTE there is no issue
> with corruption. If dmabuf checks the refcount after it does the
> unmap_mapping_range() call it should catch any bogus pin that might be
> confused about address coherency.

Yeah, that would work. The problem is this WARN_ON() comes much later.

The device drivers usually keep the page around for a while even after 
it is unmapped. IIRC the cleanup worker only runs every 10ms or so.

Christian.

>
> Jason


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
  2022-11-23 13:12                             ` [Intel-gfx] " Christian König
@ 2022-11-23 13:28                               ` Jason Gunthorpe
  -1 siblings, 0 replies; 72+ messages in thread
From: Jason Gunthorpe @ 2022-11-23 13:28 UTC (permalink / raw)
  To: Christian König
  Cc: Daniel Vetter, Intel Graphics Development, DRI Development,
	Sumit Semwal, linaro-mm-sig, John Stultz, Matthew Wilcox,
	Thomas Zimmermann, Daniel Vetter, Suren Baghdasaryan,
	Christian König, linux-media

On Wed, Nov 23, 2022 at 02:12:25PM +0100, Christian König wrote:
> Am 23.11.22 um 13:53 schrieb Jason Gunthorpe:
> > On Wed, Nov 23, 2022 at 01:49:41PM +0100, Christian König wrote:
> > > Am 23.11.22 um 13:46 schrieb Jason Gunthorpe:
> > > > On Wed, Nov 23, 2022 at 11:06:55AM +0100, Daniel Vetter wrote:
> > > > 
> > > > > > Maybe a GFP flag to set the page reference count to zero or something
> > > > > > like this?
> > > > > Hm yeah that might work. I'm not sure what it will all break though?
> > > > > And we'd need to make sure that underflowing the page refcount dies in
> > > > > a backtrace.
> > > > Mucking with the refcount like this to protect against crazy out of
> > > > tree drivers seems horrible..
> > > Well, not only out of tree drivers. The in-tree KVM got that horribly
> > > wrong as well; those were the latest guys complaining about it.
> > kvm was taking refs on special PTEs? That seems really unlikely?
> 
> Well then look at this code here:
> 
> commit add6a0cd1c5ba51b201e1361b05a5df817083618
> Author: Paolo Bonzini <pbonzini@redhat.com>
> Date:   Tue Jun 7 17:51:18 2016 +0200
> 
>     KVM: MMU: try to fix up page faults before giving up
> 
>     The vGPU folks would like to trap the first access to a BAR by setting
>     vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault
> handler
>     then can use remap_pfn_range to place some non-reserved pages in the
> VMA.
> 
>     This kind of VM_PFNMAP mapping is not handled by KVM, but follow_pfn
>     and fixup_user_fault together help supporting it.  The patch also
> supports
>     VM_MIXEDMAP vmas where the pfns are not reserved and thus subject to
>     reference counting.
> 
>     Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>     Cc: Andrea Arcangeli <aarcange@redhat.com>
>     Cc: Radim Krčmář <rkrcmar@redhat.com>
>     Tested-by: Neo Jia <cjia@nvidia.com>
>     Reported-by: Kirti Wankhede <kwankhede@nvidia.com>
>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

This patch is known to be broken in so many ways. It also has a major
security hole in that it ignores the PTE flags that make the page
RO. Ignoring the special bit is somehow not surprising :(

This probably doesn't work, but is the general idea of what KVM needs
to do:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1376a47fedeedb..4161241fc3228c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2598,6 +2598,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			return r;
 	}
 
+	/*
+	 * Special PTEs are never convertible into a struct page, even if the
+	 * driver that owns them might have put a PFN with a struct page into
+	 * the PFNMAP. If the arch doesn't support special then we cannot
+	 * safely process these pages.
+	 */
+#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
+	if (pte_special(*ptep))
+		return -EINVAL;
+#else
+	return -EINVAL;
+#endif
+
 	if (write_fault && !pte_write(*ptep)) {
 		pfn = KVM_PFN_ERR_RO_FAULT;
 		goto out;

Jason

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
  2022-11-23 13:28                               ` Jason Gunthorpe
@ 2022-11-23 14:28                                 ` Daniel Vetter
  -1 siblings, 0 replies; 72+ messages in thread
From: Daniel Vetter @ 2022-11-23 14:28 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christian König, Intel Graphics Development,
	DRI Development, Sumit Semwal, linaro-mm-sig, John Stultz,
	Matthew Wilcox, Thomas Zimmermann, Daniel Vetter,
	Suren Baghdasaryan, Christian König, linux-media

On Wed, 23 Nov 2022 at 14:28, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Nov 23, 2022 at 02:12:25PM +0100, Christian König wrote:
> > Am 23.11.22 um 13:53 schrieb Jason Gunthorpe:
> > > On Wed, Nov 23, 2022 at 01:49:41PM +0100, Christian König wrote:
> > > > Am 23.11.22 um 13:46 schrieb Jason Gunthorpe:
> > > > > On Wed, Nov 23, 2022 at 11:06:55AM +0100, Daniel Vetter wrote:
> > > > >
> > > > > > > Maybe a GFP flag to set the page reference count to zero or something
> > > > > > > like this?
> > > > > > Hm yeah that might work. I'm not sure what it will all break though?
> > > > > > And we'd need to make sure that underflowing the page refcount dies in
> > > > > > a backtrace.
> > > > > Mucking with the refcount like this to protect against crazy out of
> > > > > tree drivers seems horrible..
> > > > Well, not only out of tree drivers. The in-tree KVM got that horribly
> > > > wrong as well; those were the latest guys complaining about it.
> > > kvm was taking refs on special PTEs? That seems really unlikely?
> >
> > Well then look at this code here:
> >
> > commit add6a0cd1c5ba51b201e1361b05a5df817083618
> > Author: Paolo Bonzini <pbonzini@redhat.com>
> > Date:   Tue Jun 7 17:51:18 2016 +0200
> >
> >     KVM: MMU: try to fix up page faults before giving up
> >
> >     The vGPU folks would like to trap the first access to a BAR by setting
> >     vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault
> > handler
> >     then can use remap_pfn_range to place some non-reserved pages in the
> > VMA.
> >
> >     This kind of VM_PFNMAP mapping is not handled by KVM, but follow_pfn
> >     and fixup_user_fault together help supporting it.  The patch also
> > supports
> >     VM_MIXEDMAP vmas where the pfns are not reserved and thus subject to
> >     reference counting.
> >
> >     Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> >     Cc: Andrea Arcangeli <aarcange@redhat.com>
> >     Cc: Radim Krčmář <rkrcmar@redhat.com>
> >     Tested-by: Neo Jia <cjia@nvidia.com>
> >     Reported-by: Kirti Wankhede <kwankhede@nvidia.com>
> >     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>
> This patch is known to be broken in so many ways. It also has a major
> security hole in that it ignores the PTE flags that make the page
> RO. Ignoring the special bit is somehow not surprising :(
>
> This probably doesn't work, but is the general idea of what KVM needs
> to do:

Oh dear, when I dug around in there I entirely missed that
kvm_try_get_pfn exists, and it's very broken indeed. kvm really needs
to grow a proper mmu notifier.

Another thing I'm wondering right now, the follow_pte();
fixup_user_fault(); follow_pte(); approach does not make any
guarantees of actually being right. If you're sufficiently unlucky you
might race against an immediate pte invalidate between the fixup and
the 2nd follow_pte(). But you can also not loop, because that would
fail to catch permanent faults.

I think the iommu fault drivers have a similar pattern.

What am I missing here? Or is that also just broken? gup works around
this with the slow path that takes the mmap sem and walks the vma
tree; follow_pte/fixup_user_fault users don't. Maybe mmu notifier based
restarting would help with this too, if done properly.
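
For reference, the pattern in question, roughly as kvm's
hva_to_pfn_remapped() does it today (a condensed sketch, not the
literal code; write_fault is assumed to come from the caller):

	pte_t *ptep;
	spinlock_t *ptl;
	bool unlocked = false;
	unsigned int fault_flags = write_fault ? FAULT_FLAG_WRITE : 0;
	int r;

	r = follow_pte(vma->vm_mm, addr, &ptep, &ptl);
	if (r) {
		/* Fault the page in, then redo the lookup once ... */
		r = fixup_user_fault(vma->vm_mm, addr, fault_flags,
				     &unlocked);
		if (r)
			return r;
		/*
		 * ... but nothing stops the pte from being invalidated
		 * again before this second lookup runs.
		 */
		r = follow_pte(vma->vm_mm, addr, &ptep, &ptl);
		if (r)
			return r;
	}
	/* consume *ptep under ptl, then pte_unmap_unlock(ptep, ptl) */
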
-Daniel

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 1376a47fedeedb..4161241fc3228c 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2598,6 +2598,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
>                         return r;
>         }
>
> +       /*
> +        * Special PTEs are never convertible into a struct page, even if the
> +        * driver that owns them might have put a PFN with a struct page into
> +        * the PFNMAP. If the arch doesn't support special then we cannot
> +        * safely process these pages.
> +        */
> +#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
> +       if (pte_special(*ptep))
> +               return -EINVAL;
> +#else
> +       return -EINVAL;
> +#endif
> +
>         if (write_fault && !pte_write(*ptep)) {
>                 pfn = KVM_PFN_ERR_RO_FAULT;
>                 goto out;
>
> Jason



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
  2022-11-23 13:28                               ` Jason Gunthorpe
@ 2022-11-23 14:34                                 ` Daniel Vetter
  -1 siblings, 0 replies; 72+ messages in thread
From: Daniel Vetter @ 2022-11-23 14:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christian König, Christian König, DRI Development,
	Intel Graphics Development, Thomas Zimmermann,
	Suren Baghdasaryan, Matthew Wilcox, John Stultz, Daniel Vetter,
	Sumit Semwal, linux-media, linaro-mm-sig

On Wed, 23 Nov 2022 at 14:28, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Nov 23, 2022 at 02:12:25PM +0100, Christian König wrote:
> > Am 23.11.22 um 13:53 schrieb Jason Gunthorpe:
> > > On Wed, Nov 23, 2022 at 01:49:41PM +0100, Christian König wrote:
> > > > Am 23.11.22 um 13:46 schrieb Jason Gunthorpe:
> > > > > On Wed, Nov 23, 2022 at 11:06:55AM +0100, Daniel Vetter wrote:
> > > > >
> > > > > > > Maybe a GFP flag to set the page reference count to zero or something
> > > > > > > like this?
> > > > > > Hm yeah that might work. I'm not sure what it will all break though?
> > > > > > And we'd need to make sure that underflowing the page refcount dies in
> > > > > > a backtrace.
> > > > > Mucking with the refcount like this to protect against crazy out of
> > > > > tree drivers seems horrible..
> > > > Well, not only out of tree drivers. The in-tree KVM got that horribly
> > > > wrong as well; those were the latest guys complaining about it.
> > > kvm was taking refs on special PTEs? That seems really unlikely?
> >
> > Well then look at this code here:
> >
> > commit add6a0cd1c5ba51b201e1361b05a5df817083618
> > Author: Paolo Bonzini <pbonzini@redhat.com>
> > Date:   Tue Jun 7 17:51:18 2016 +0200
> >
> >     KVM: MMU: try to fix up page faults before giving up
> >
> >     The vGPU folks would like to trap the first access to a BAR by setting
> >     vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault
> > handler
> >     then can use remap_pfn_range to place some non-reserved pages in the
> > VMA.
> >
> >     This kind of VM_PFNMAP mapping is not handled by KVM, but follow_pfn
> >     and fixup_user_fault together help supporting it.  The patch also
> > supports
> >     VM_MIXEDMAP vmas where the pfns are not reserved and thus subject to
> >     reference counting.
> >
> >     Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> >     Cc: Andrea Arcangeli <aarcange@redhat.com>
> >     Cc: Radim Krčmář <rkrcmar@redhat.com>
> >     Tested-by: Neo Jia <cjia@nvidia.com>
> >     Reported-by: Kirti Wankhede <kwankhede@nvidia.com>
> >     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>
> This patch is known to be broken in so many ways. It also has a major
> security hole in that it ignores the PTE flags that make the page
> RO. Ignoring the special bit is somehow not surprising :(
>
> This probably doesn't work, but is the general idea of what KVM needs
> to do:
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 1376a47fedeedb..4161241fc3228c 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2598,6 +2598,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
>                         return r;
>         }
>
> +       /*
> +        * Special PTEs are never convertible into a struct page, even if the
> +        * driver that owns them might have put a PFN with a struct page into
> +        * the PFNMAP. If the arch doesn't support special then we cannot
> +        * safely process these pages.
> +        */
> +#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
> +       if (pte_special(*ptep))
> +               return -EINVAL;

On second thought this won't work, because it completely defeats the
point of why this code here exists. remap_pfn_range() (which is what
the various dma_mmap functions and the ioremap functions are built on
top of too) sets VM_PFNMAP too, so this check would even catch the
static mappings.
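
For reference, remap_pfn_range() tags every vma it touches like this in
mm/memory.c, which is what makes the check above trip on all of them:

	vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;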

Plus these static mappings aren't all that static either, e.g. PCI
access can also revoke BAR mappings nowadays.

I think nothing except full mmu_notifier will actually fix this.
-Daniel

> +#else
> +       return -EINVAL;
> +#endif
> +
>         if (write_fault && !pte_write(*ptep)) {
>                 pfn = KVM_PFN_ERR_RO_FAULT;
>                 goto out;
>
> Jason



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
  2022-11-23 14:28                                 ` [Intel-gfx] " Daniel Vetter
@ 2022-11-23 15:04                                   ` Jason Gunthorpe
  -1 siblings, 0 replies; 72+ messages in thread
From: Jason Gunthorpe @ 2022-11-23 15:04 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Christian König, Christian König, DRI Development,
	Intel Graphics Development, Thomas Zimmermann,
	Suren Baghdasaryan, Matthew Wilcox, John Stultz, Daniel Vetter,
	Sumit Semwal, linux-media, linaro-mm-sig

On Wed, Nov 23, 2022 at 03:28:27PM +0100, Daniel Vetter wrote:

> > This patch is known to be broken in so many ways. It also has a major
> > security hole in that it ignores the PTE flags that make the page
> > RO. Ignoring the special bit is somehow not surprising :(
> >
> > This probably doesn't work, but is the general idea of what KVM needs
> > to do:
> 
> Oh dear, when I dug around in there I entirely missed that
> kvm_try_get_pfn exists, and it's very broken indeed. kvm really needs
> to grow a proper mmu notifier.
> 
> Another thing I'm wondering right now, the follow_pte();
> fixup_user_fault(); follow_pte(); approach does not make any
> guarantees of actually being right. If you're sufficiently unlucky you
> might race against an immediate pte invalidate between the fixup and
> the 2nd follow_pte(). But you can also not loop, because that would
> fail to catch permanent faults.

Yes, it is pretty broken.

kvm already has support for mmu notifiers and uses it for other
stuff. I can't remember what exactly this code path was for, IIRC
Paolo talked about having a big rework/fix for it when we last talked
about the missing write protect. I also vaguely recall he had some
explanation why this might be safe.

> I think the iommu fault drivers have a similar pattern.

Where? It shouldn't

The common code for SVA just calls handle_mm_fault() and restarts the
PRI. Since the page table is physically shared there is no issue with
a stale copy.
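
That common SVA code does roughly this per fault (a condensed sketch;
locking and error details elided, prm->addr and fault_flags come from
the PRI message):

	mmap_read_lock(mm);
	vma = find_extend_vma(mm, prm->addr);
	if (!vma || (handle_mm_fault(vma, prm->addr, fault_flags, NULL) &
		     VM_FAULT_ERROR))
		status = IOMMU_PAGE_RESP_INVALID;
	else
		status = IOMMU_PAGE_RESP_SUCCESS;
	mmap_read_unlock(mm);
	/* reply to the PRI with status; the device simply retries */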

> What am I missing here? Or is that also just broken? gup works around
> this with the slow path that takes the mmap sem and walks the vma
> tree; follow_pte/fixup_user_fault users don't.

follow_pte() is just fundamentally broken, things must not use it.

> Maybe mmu notifier based restarting would help with this too, if
> done properly.

That is called hmm_range_fault()
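
For reference, the retry loop hmm_range_fault() users are expected to
implement, per Documentation/mm/hmm.rst (condensed sketch; driver_lock
stands in for whatever lock the driver uses against invalidation):

	unsigned long hmm_pfn;
	struct hmm_range range = {
		.notifier	= &notifier,	/* an mmu_interval_notifier */
		.start		= addr,
		.end		= addr + PAGE_SIZE,
		.hmm_pfns	= &hmm_pfn,
		.default_flags	= HMM_PFN_REQ_FAULT,
	};
	int ret;

again:
	range.notifier_seq = mmu_interval_read_begin(&notifier);
	mmap_read_lock(mm);
	ret = hmm_range_fault(&range);
	mmap_read_unlock(mm);
	if (ret == -EBUSY)
		goto again;
	if (ret)
		return ret;

	mutex_lock(&driver_lock);
	if (mmu_interval_read_retry(&notifier, range.notifier_seq)) {
		/* An invalidation raced in, the pfn is stale: restart. */
		mutex_unlock(&driver_lock);
		goto again;
	}
	/* hmm_pfn stays valid until driver_lock is dropped. */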

Jason

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
  2022-11-23 14:34                                 ` Daniel Vetter
@ 2022-11-23 15:08                                   ` Jason Gunthorpe
  -1 siblings, 0 replies; 72+ messages in thread
From: Jason Gunthorpe @ 2022-11-23 15:08 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Christian König, Intel Graphics Development,
	DRI Development, Sumit Semwal, linaro-mm-sig, John Stultz,
	Matthew Wilcox, Thomas Zimmermann, Daniel Vetter,
	Suren Baghdasaryan, Christian König, linux-media

On Wed, Nov 23, 2022 at 03:34:54PM +0100, Daniel Vetter wrote:
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 1376a47fedeedb..4161241fc3228c 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -2598,6 +2598,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
> >                         return r;
> >         }
> >
> > +       /*
> > +        * Special PTEs are never convertible into a struct page, even if the
> > +        * driver that owns them might have put a PFN with a struct page into
> > +        * the PFNMAP. If the arch doesn't support special then we cannot
> > +        * safely process these pages.
> > +        */
> > +#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
> > +       if (pte_special(*ptep))
> > +               return -EINVAL;
> 
> On second thought this won't work, because it completely defeats the
> point of why this code here exists. remap_pfn_range() (which is what
> the various dma_mmap functions and the ioremap functions are built on
> top of too) sets VM_PFNMAP too, so this check would even catch the
> static mappings.

The problem with the way this code is designed is how it allows
returning the pfn without taking any reference based on things like
!pfn_valid or page_reserved. This allows it to then conditionally put
back the reference based on the same reasoning. It is impossible to
thread pte special into that since it is a PTE flag, not a property of
the PFN.

I don't entirely understand why it needs the page reference at all,
even if it is available - so I can't guess why it is OK to ignore the
page reference in other cases, or why it is OK to be racy..

Eg hmm_range_fault() does not obtain page references and implements a
very similar algorithm to kvm.

> Plus these static mappings aren't all that static either, e.g. PCI
> access can also revoke BAR mappings nowadays.

And there are already mmu notifiers to handle that, AFAIK.

Jason

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
  2022-11-23 15:08                                   ` Jason Gunthorpe
@ 2022-11-23 15:15                                     ` Christian König
  -1 siblings, 0 replies; 72+ messages in thread
From: Christian König @ 2022-11-23 15:15 UTC (permalink / raw)
  To: Jason Gunthorpe, Daniel Vetter
  Cc: Christian König, DRI Development,
	Intel Graphics Development, Thomas Zimmermann,
	Suren Baghdasaryan, Matthew Wilcox, John Stultz, Daniel Vetter,
	Sumit Semwal, linux-media, linaro-mm-sig

Am 23.11.22 um 16:08 schrieb Jason Gunthorpe:
> On Wed, Nov 23, 2022 at 03:34:54PM +0100, Daniel Vetter wrote:
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>> index 1376a47fedeedb..4161241fc3228c 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -2598,6 +2598,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
>>>                          return r;
>>>          }
>>>
>>> +       /*
>>> +        * Special PTEs are never convertible into a struct page, even if the
>>> +        * driver that owns them might have put a PFN with a struct page into
>>> +        * the PFNMAP. If the arch doesn't support special then we cannot
>>> +        * safely process these pages.
>>> +        */
>>> +#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
>>> +       if (pte_special(*ptep))
>>> +               return -EINVAL;
>> On second thought this won't work, because it completely defeats the
>> point of why this code here exists. remap_pfn_range() (which is what
>> the various dma_mmap functions and the ioremap functions are built on
>> top of too) sets VM_PFNMAP too, so this check would even catch the
>> static mappings.
> The problem with the way this code is designed is how it allows
> returning the pfn without taking any reference based on things like
> !pfn_valid or page_reserved. This allows it to then conditionally put
> back the reference based on the same reasoning. It is impossible to
> thread pte special into that since it is a PTE flag, not a property of
> the PFN.
>
> I don't entirely understand why it needs the page reference at all,

That's exactly what I've pointed out in the previous discussion about 
that code as well.

As far as I can see, this is just another case where people assumed 
that grabbing a page reference somehow magically prevents the pte from 
changing.

I have not the slightest idea how people got this impression, but I have 
heard it so many times from so many different sources that there must be 
some common cause. Is there maybe some book or tutorial on how to break 
the kernel in sophisticated ways, or something like this?

Anyway, as far as I can see the only correct approach would be to use an 
MMU notifier, or the more high-level hmm_range_fault() plus a sequence 
number.
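
A sketch of the notifier half of that scheme, to go with the
hmm_range_fault() loop sketched earlier in the thread (driver_invalidate
and the driver lock are placeholders; locking details elided):

	static bool driver_invalidate(struct mmu_interval_notifier *ni,
				      const struct mmu_notifier_range *range,
				      unsigned long cur_seq)
	{
		/* Bump the seq so concurrent faults restart ... */
		mmu_interval_set_seq(ni, cur_seq);
		/* ... and zap the device's mapping of the range here. */
		return true;
	}

	static const struct mmu_interval_notifier_ops driver_mni_ops = {
		.invalidate = driver_invalidate,
	};

	/* One-time registration against the range of interest: */
	ret = mmu_interval_notifier_insert(&notifier, current->mm,
					   addr, size, &driver_mni_ops);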

Regards,
Christian.

> even if it is available - so I can't guess why it is OK to ignore the
> page reference in other cases, or why it is OK to be racy..
>
> Eg hmm_range_fault() does not obtain page references and implements a
> very similar algorithm to kvm.
>
>> Plus these static mappings aren't all that static either, e.g. pci
>> access also can revoke bar mappings nowadays.
> And there are already mmu notifiers to handle that, AFAIK.
>
> Jason


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
  2022-11-23 15:04                                   ` Jason Gunthorpe
  (?)
@ 2022-11-23 16:22                                     ` Daniel Vetter
  -1 siblings, 0 replies; 72+ messages in thread
From: Daniel Vetter @ 2022-11-23 16:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christian König, Christian König, DRI Development,
	Intel Graphics Development, Thomas Zimmermann,
	Suren Baghdasaryan, Matthew Wilcox, John Stultz, Daniel Vetter,
	Sumit Semwal, linux-media, linaro-mm-sig

On Wed, 23 Nov 2022 at 16:04, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Nov 23, 2022 at 03:28:27PM +0100, Daniel Vetter wrote:
>
> > > This patch is known to be broken in so many ways. It also has a major
> > > security hole in that it ignores the PTE flags that make the page
> > > RO. Ignoring the special bit is somehow not surprising :(
> > >
> > > This probably doesn't work, but is the general idea of what KVM needs
> > > to do:
> >
> > Oh dear, when I dug around in there I entirely missed that
> > kvm_try_get_pfn exists, and it's very broken indeed. kvm really needs
> > to grow a proper mmu notifier.
> >
> > Another thing I'm wondering about right now: the follow_pte();
> > fixup_user_fault(); follow_pte(); approach does not make any
> > guarantees of actually being right. If you're sufficiently unlucky you
> > might race against an immediate pte invalidate between the fixup and
> > the 2nd follow_pte(). But you also can't just loop, because that would
> > fail to catch permanent faults.
>
> Yes, it is pretty broken.
>
> kvm already has support for mmu notifiers and uses it for other
> stuff. I can't remember what exactly this code path was for; IIRC
> Paolo talked about having a big rework/fix for it when we last talked
> about the missing write protect. I also vaguely recall he had some
> explanation why this might be safe.
>
> > I think the iommu fault drivers have a similar pattern.
>
> Where? It shouldn't.
>
> The common code for SVA just calls handle_mm_fault() and restarts the
> PRI. Since the page table is physically shared there is no issue with
> a stale copy.
>
> > What am I missing here? Or is that also just broken? gup works around
> > this with the slow path that takes the mmap sem and walks the vma
> > tree; follow_pte()/fixup_user_fault() users don't.
>
> follow_pte() is just fundamentally broken, things must not use it.
>
> > Maybe mmu notifier based restarting would help with this too, if
> > done properly.
>
> That is called hmm_range_fault()

Ah right, I mixed that up on a quick grep; thanks for pointing me in
the right direction. Worries appeased.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
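
To spell out the race described above, the lookup in kvm's 
hva_to_pfn_remapped() boils down to roughly the following (a hedged 
sketch, not the literal kvm code; vma, addr, write_fault and pfn are the 
surrounding context, and the mmap lock is held by the caller as both 
follow_pte() and fixup_user_fault() require):

	pte_t *ptep;
	spinlock_t *ptl;
	int r;

	r = follow_pte(vma->vm_mm, addr, &ptep, &ptl);
	if (r) {
		/* fault the mapping in, then look the pte up again ... */
		r = fixup_user_fault(vma->vm_mm, addr,
				     write_fault ? FAULT_FLAG_WRITE : 0,
				     NULL);
		if (r)
			return r;
		r = follow_pte(vma->vm_mm, addr, &ptep, &ptl);
		if (r)
			return r;
	}
	/*
	 * ... but an invalidation can land between fixup_user_fault() and
	 * the second follow_pte(), and once the pte lock is dropped below
	 * nothing pins the pfn at all.
	 */
	pfn = pte_pfn(*ptep);
	pte_unmap_unlock(ptep, ptl);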

* Re: [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
  2022-11-23 15:15                                     ` Christian König
@ 2022-11-23 16:26                                       ` Daniel Vetter
  0 siblings, 0 replies; 72+ messages in thread
From: Daniel Vetter @ 2022-11-23 16:26 UTC (permalink / raw)
  To: Christian König
  Cc: Jason Gunthorpe, DRI Development, Intel Graphics Development,
	Thomas Zimmermann, Suren Baghdasaryan, Matthew Wilcox,
	John Stultz, Daniel Vetter, Sumit Semwal, linux-media,
	linaro-mm-sig

On Wed, 23 Nov 2022 at 16:15, Christian König <christian.koenig@amd.com> wrote:
>
> On 23.11.22 at 16:08, Jason Gunthorpe wrote:
> > On Wed, Nov 23, 2022 at 03:34:54PM +0100, Daniel Vetter wrote:
> >>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >>> index 1376a47fedeedb..4161241fc3228c 100644
> >>> --- a/virt/kvm/kvm_main.c
> >>> +++ b/virt/kvm/kvm_main.c
> >>> @@ -2598,6 +2598,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
> >>>                          return r;
> >>>          }
> >>>
> >>> +       /*
> >>> +        * Special PTEs are never convertible into a struct page, even if the
> >>> +        * driver that owns them might have put a PFN with a struct page into
> >>> +        * the PFNMAP. If the arch doesn't support special then we cannot
> >>> +        * safely process these pages.
> >>> +        */
> >>> +#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
> >>> +       if (pte_special(*ptep))
> >>> +               return -EINVAL;
> >> On second thought this won't work, because it completely defeats the
> >> point of why this code here exists. remap_pfn_range() (which is what
> >> the various dma_mmap functions and the ioremap functions are built on
> >> top of too) sets VM_PFNMAP too, so this check would even catch the
> >> static mappings.
> > The problem with the way this code is designed is how it allows
> > returning the pfn without taking any reference based on things like
> > !pfn_valid or page_reserved. This allows it to then conditionally put
> > back the reference based on the same reasoning. It is impossible to
> > thread pte special into that since it is a PTE flag, not a property of
> > the PFN.
> >
> > I don't entirely understand why it needs the page reference at all,
>
> That's exactly what I've pointed out in the previous discussion about
> that code as well.
>
> As far as I can see, this is just another case where people assumed
> that grabbing a page reference somehow magically prevents the pte from
> changing.
>
> I have not the slightest idea how people got this impression, but I have
> heard it so many times from so many different sources that there must be
> some common cause. Is there maybe some book or tutorial on how to break
> the kernel in sophisticated ways, or something like this?

It's what get_user_pages does, so it does "work". Except this path 
here is the fallback for when get_user_pages does not work (because of 
the pte_special/VM_SPECIAL case). So essentially it's just a rather 
broken get_user_pages that hand-rolls a bunch of things, with bugs and 
races.

I have no idea why people don't realize they're just reinventing gup
without using gup, but that's essentially what's going on.
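
For contrast, on a normal vma the whole dance collapses to something 
like this (a sketch; addr and write_fault from the surrounding context, 
and gup will refuse exactly the VM_PFNMAP/pte_special mappings this 
fallback path exists for):

	struct page *page;
	int n;

	n = pin_user_pages_fast(addr, 1, write_fault ? FOLL_WRITE : 0,
				&page);
	if (n != 1)
		return -EFAULT;
	/* use page_to_pfn(page) while the pin is held, then drop it */
	unpin_user_page(page);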

> Anyway, as far as I can see the only correct approach would be to use an
> MMU notifier, or the more high-level hmm_range_fault() plus a sequence
> number.

Yeah, plus if you go through ptes you really have to obey all the 
flags or things will break, especially the RO pte flag.
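
Concretely, at minimum something like this before handing out a pfn for 
a write (a sketch; write_fault, ptep and ptl as in hva_to_pfn_remapped()):

	/* never hand out a writable pfn from a read-only pte */
	if (write_fault && !pte_write(*ptep)) {
		pte_unmap_unlock(ptep, ptl);
		return -EFAULT;
	}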
-Daniel

>
> Regards,
> Christian.
>
> > even if it is available - so I can't guess why it is OK to ignore the
> > page reference in other cases, or why it is OK to be racy..
> >
> > Eg hmm_range_fault() does not obtain page references and implements a
> > very similar algorithm to kvm.
> >
> >> Plus these static mappings aren't all that static either, e.g. pci
> >> access also can revoke bar mappings nowadays.
> > And there are already mmu notifiers to handle that, AFAIK.
> >
> > Jason
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
  2022-11-23 15:15                                     ` Christian König
@ 2022-11-23 16:26                                       ` Jason Gunthorpe
  0 siblings, 0 replies; 72+ messages in thread
From: Jason Gunthorpe @ 2022-11-23 16:26 UTC (permalink / raw)
  To: Christian König
  Cc: Daniel Vetter, DRI Development, Intel Graphics Development,
	Thomas Zimmermann, Suren Baghdasaryan, Matthew Wilcox,
	John Stultz, Daniel Vetter, Sumit Semwal, linux-media,
	linaro-mm-sig

On Wed, Nov 23, 2022 at 04:15:05PM +0100, Christian König wrote:

> I have not the slightest idea how people got this impression, but I have
> heard it so many times from so many different sources that there must be
> some common cause. Is there maybe some book or tutorial on how to break
> the kernel in sophisticated ways, or something like this?
> 
> Anyway, as far as I can see the only correct approach would be to use an
> MMU notifier, or the more high-level hmm_range_fault() plus a sequence
> number.

It already uses an mmu notifier; this is why I have no idea what the
page ref is doing.

Jason
