All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Vetter <daniel.vetter@ffwll.ch>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>
Cc: "Intel Graphics Development" <intel-gfx@lists.freedesktop.org>,
	"DRI Development" <dri-devel@lists.freedesktop.org>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	linaro-mm-sig@lists.linaro.org, "Jason Gunthorpe" <jgg@ziepe.ca>,
	"John Stultz" <john.stultz@linaro.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"Daniel Vetter" <daniel.vetter@intel.com>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Christian König" <christian.koenig@amd.com>,
	linux-media@vger.kernel.org
Subject: Re: [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
Date: Wed, 23 Nov 2022 11:06:55 +0100	[thread overview]
Message-ID: <CAKMK7uFjmzewqv3r4hL9hvLADwV536n2n6xbAWaUvmAcStr5KQ@mail.gmail.com> (raw)
In-Reply-To: <b05e6091-4e07-1e32-773d-f603ac9ac98b@gmail.com>

On Wed, 23 Nov 2022 at 10:39, Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 23.11.22 um 10:30 schrieb Daniel Vetter:
> > On Wed, 23 Nov 2022 at 10:06, Christian König <christian.koenig@amd.com> wrote:
> >> Am 22.11.22 um 20:50 schrieb Daniel Vetter:
> >>> On Tue, 22 Nov 2022 at 20:34, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >>>> On Tue, Nov 22, 2022 at 08:29:05PM +0100, Daniel Vetter wrote:
> >>>>> You nuke all the ptes. Drivers that move have slightly more than a
> >>>>> bare struct file, they also have a struct address_space so that
> >>>>> invalidate_mapping_range() works.
> >>>> Okay, this is one of the ways that this can be made to work correctly,
> >>>> as long as you never allow GUP/GUP_fast to succeed on the PTEs. (this
> >>>> was the DAX mistake)
> >>> Hence this patch, to enforce that no dma-buf exporter gets this wrong.
> >>> Which some did, and then blamed bug reporters for the resulting splats
> >>> :-) One of the things we've reverted was the ttm huge pte support,
> >>> since that doesn't have the pmd_special flag (yet) and so would let
> >>> gup_fast through.
> >> The problem is not only gup, a lot of people seem to assume that when
> >> you are able to grab a reference to a page that the ptes pointing to
> >> that page can't change any more. And that's obviously incorrect.
> >>
> >> I witnessed tons of discussions about that already. Some customers even
> >> modified our code assuming that and then wondered why the heck they ran
> >> into data corruption.
> >>
> >> It's gotten so bad that I've even proposed intentionally mangling the
> >> page reference count on TTM allocated pages:
> >> https://patchwork.kernel.org/project/dri-devel/patch/20220927143529.135689-1-christian.koenig@amd.com/
> > Yeah maybe something like this could be applied after we land this
> > patch here.
>
> I think both should land ASAP. We don't have any other way than to
> clearly point out incorrect approaches if we want to prevent the
> resulting data corruption.
>
> > Well maybe should have the same check in gem mmap code to
> > make sure no driver
>
> Reads like the sentence is somehow cut of?

Yeah, just wanted to say that we need to make sure in as many places
as possible that VM_PFNMAP is set for bo mmaps.

> >> I think it would be better that instead of having special flags in the
> >> ptes and vmas that you can't follow them to a page structure we would
> >> add something to the page indicating that you can't grab a reference to
> >> it. But this might break some use cases as well.
> > Afaik the problem with that is that there's no free page bits left for
> > these debug checks. Plus the pte+vma flags are the flags to make this
> > clear already.
>
> Maybe a GFP flag to set the page reference count to zero or something
> like this?

Hm yeah that might work. I'm not sure what it will all break though?
And we'd need to make sure that underflowing the page refcount dies in
a backtrace.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

WARNING: multiple messages have this Message-ID (diff)
From: Daniel Vetter <daniel.vetter@ffwll.ch>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>
Cc: "Intel Graphics Development" <intel-gfx@lists.freedesktop.org>,
	"DRI Development" <dri-devel@lists.freedesktop.org>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	linaro-mm-sig@lists.linaro.org, "Jason Gunthorpe" <jgg@ziepe.ca>,
	"John Stultz" <john.stultz@linaro.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"Daniel Vetter" <daniel.vetter@intel.com>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Christian König" <christian.koenig@amd.com>,
	linux-media@vger.kernel.org
Subject: Re: [Intel-gfx] [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
Date: Wed, 23 Nov 2022 11:06:55 +0100	[thread overview]
Message-ID: <CAKMK7uFjmzewqv3r4hL9hvLADwV536n2n6xbAWaUvmAcStr5KQ@mail.gmail.com> (raw)
In-Reply-To: <b05e6091-4e07-1e32-773d-f603ac9ac98b@gmail.com>

On Wed, 23 Nov 2022 at 10:39, Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 23.11.22 um 10:30 schrieb Daniel Vetter:
> > On Wed, 23 Nov 2022 at 10:06, Christian König <christian.koenig@amd.com> wrote:
> >> Am 22.11.22 um 20:50 schrieb Daniel Vetter:
> >>> On Tue, 22 Nov 2022 at 20:34, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >>>> On Tue, Nov 22, 2022 at 08:29:05PM +0100, Daniel Vetter wrote:
> >>>>> You nuke all the ptes. Drivers that move have slightly more than a
> >>>>> bare struct file, they also have a struct address_space so that
> >>>>> invalidate_mapping_range() works.
> >>>> Okay, this is one of the ways that this can be made to work correctly,
> >>>> as long as you never allow GUP/GUP_fast to succeed on the PTEs. (this
> >>>> was the DAX mistake)
> >>> Hence this patch, to enforce that no dma-buf exporter gets this wrong.
> >>> Which some did, and then blamed bug reporters for the resulting splats
> >>> :-) One of the things we've reverted was the ttm huge pte support,
> >>> since that doesn't have the pmd_special flag (yet) and so would let
> >>> gup_fast through.
> >> The problem is not only gup, a lot of people seem to assume that when
> >> you are able to grab a reference to a page that the ptes pointing to
> >> that page can't change any more. And that's obviously incorrect.
> >>
> >> I witnessed tons of discussions about that already. Some customers even
> >> modified our code assuming that and then wondered why the heck they ran
> >> into data corruption.
> >>
> >> It's gotten so bad that I've even proposed intentionally mangling the
> >> page reference count on TTM allocated pages:
> >> https://patchwork.kernel.org/project/dri-devel/patch/20220927143529.135689-1-christian.koenig@amd.com/
> > Yeah maybe something like this could be applied after we land this
> > patch here.
>
> I think both should land ASAP. We don't have any other way than to
> clearly point out incorrect approaches if we want to prevent the
> resulting data corruption.
>
> > Well maybe should have the same check in gem mmap code to
> > make sure no driver
>
> Reads like the sentence is somehow cut of?

Yeah, just wanted to say that we need to make sure in as many places
as possible that VM_PFNMAP is set for bo mmaps.

> >> I think it would be better that instead of having special flags in the
> >> ptes and vmas that you can't follow them to a page structure we would
> >> add something to the page indicating that you can't grab a reference to
> >> it. But this might break some use cases as well.
> > Afaik the problem with that is that there's no free page bits left for
> > these debug checks. Plus the pte+vma flags are the flags to make this
> > clear already.
>
> Maybe a GFP flag to set the page reference count to zero or something
> like this?

Hm yeah that might work. I'm not sure what it will all break though?
And we'd need to make sure that underflowing the page refcount dies in
a backtrace.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

WARNING: multiple messages have this Message-ID (diff)
From: Daniel Vetter <daniel.vetter@ffwll.ch>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>
Cc: "Christian König" <christian.koenig@amd.com>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"DRI Development" <dri-devel@lists.freedesktop.org>,
	"Intel Graphics Development" <intel-gfx@lists.freedesktop.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"John Stultz" <john.stultz@linaro.org>,
	"Daniel Vetter" <daniel.vetter@intel.com>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org
Subject: Re: [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap
Date: Wed, 23 Nov 2022 11:06:55 +0100	[thread overview]
Message-ID: <CAKMK7uFjmzewqv3r4hL9hvLADwV536n2n6xbAWaUvmAcStr5KQ@mail.gmail.com> (raw)
In-Reply-To: <b05e6091-4e07-1e32-773d-f603ac9ac98b@gmail.com>

On Wed, 23 Nov 2022 at 10:39, Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 23.11.22 um 10:30 schrieb Daniel Vetter:
> > On Wed, 23 Nov 2022 at 10:06, Christian König <christian.koenig@amd.com> wrote:
> >> Am 22.11.22 um 20:50 schrieb Daniel Vetter:
> >>> On Tue, 22 Nov 2022 at 20:34, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >>>> On Tue, Nov 22, 2022 at 08:29:05PM +0100, Daniel Vetter wrote:
> >>>>> You nuke all the ptes. Drivers that move have slightly more than a
> >>>>> bare struct file, they also have a struct address_space so that
> >>>>> invalidate_mapping_range() works.
> >>>> Okay, this is one of the ways that this can be made to work correctly,
> >>>> as long as you never allow GUP/GUP_fast to succeed on the PTEs. (this
> >>>> was the DAX mistake)
> >>> Hence this patch, to enforce that no dma-buf exporter gets this wrong.
> >>> Which some did, and then blamed bug reporters for the resulting splats
> >>> :-) One of the things we've reverted was the ttm huge pte support,
> >>> since that doesn't have the pmd_special flag (yet) and so would let
> >>> gup_fast through.
> >> The problem is not only gup, a lot of people seem to assume that when
> >> you are able to grab a reference to a page that the ptes pointing to
> >> that page can't change any more. And that's obviously incorrect.
> >>
> >> I witnessed tons of discussions about that already. Some customers even
> >> modified our code assuming that and then wondered why the heck they ran
> >> into data corruption.
> >>
> >> It's gotten so bad that I've even proposed intentionally mangling the
> >> page reference count on TTM allocated pages:
> >> https://patchwork.kernel.org/project/dri-devel/patch/20220927143529.135689-1-christian.koenig@amd.com/
> > Yeah maybe something like this could be applied after we land this
> > patch here.
>
> I think both should land ASAP. We don't have any other way than to
> clearly point out incorrect approaches if we want to prevent the
> resulting data corruption.
>
> > Well maybe should have the same check in gem mmap code to
> > make sure no driver
>
> Reads like the sentence is somehow cut of?

Yeah, just wanted to say that we need to make sure in as many places
as possible that VM_PFNMAP is set for bo mmaps.

> >> I think it would be better that instead of having special flags in the
> >> ptes and vmas that you can't follow them to a page structure we would
> >> add something to the page indicating that you can't grab a reference to
> >> it. But this might break some use cases as well.
> > Afaik the problem with that is that there's no free page bits left for
> > these debug checks. Plus the pte+vma flags are the flags to make this
> > clear already.
>
> Maybe a GFP flag to set the page reference count to zero or something
> like this?

Hm yeah that might work. I'm not sure what it will all break though?
And we'd need to make sure that underflowing the page refcount dies in
a backtrace.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

  reply	other threads:[~2022-11-23 10:07 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-22 17:08 [PATCH] dma-buf: Require VM_PFNMAP vma for mmap Daniel Vetter
2022-11-22 17:08 ` [Intel-gfx] " Daniel Vetter
2022-11-22 17:08 ` Daniel Vetter
2022-11-22 18:03 ` Jason Gunthorpe
2022-11-22 18:03   ` Jason Gunthorpe
2022-11-22 18:08   ` Daniel Vetter
2022-11-22 18:08     ` [Intel-gfx] " Daniel Vetter
2022-11-22 18:08     ` Daniel Vetter
2022-11-22 18:50     ` Jason Gunthorpe
2022-11-22 18:50       ` Jason Gunthorpe
2022-11-22 19:29       ` Daniel Vetter
2022-11-22 19:29         ` [Intel-gfx] " Daniel Vetter
2022-11-22 19:29         ` Daniel Vetter
2022-11-22 19:34         ` Jason Gunthorpe
2022-11-22 19:34           ` Jason Gunthorpe
2022-11-22 19:50           ` Daniel Vetter
2022-11-22 19:50             ` [Intel-gfx] " Daniel Vetter
2022-11-22 19:50             ` Daniel Vetter
2022-11-23  9:06             ` Christian König
2022-11-23  9:06               ` Christian König
2022-11-23  9:06               ` [Intel-gfx] " Christian König
2022-11-23  9:30               ` Daniel Vetter
2022-11-23  9:30                 ` Daniel Vetter
2022-11-23  9:30                 ` [Intel-gfx] " Daniel Vetter
2022-11-23  9:39                 ` [Linaro-mm-sig] " Christian König
2022-11-23  9:39                   ` Christian König
2022-11-23  9:39                   ` [Intel-gfx] " Christian König
2022-11-23 10:06                   ` Daniel Vetter [this message]
2022-11-23 10:06                     ` Daniel Vetter
2022-11-23 10:06                     ` [Intel-gfx] " Daniel Vetter
2022-11-23 12:46                     ` Jason Gunthorpe
2022-11-23 12:46                       ` Jason Gunthorpe
2022-11-23 12:49                       ` Christian König
2022-11-23 12:49                         ` Christian König
2022-11-23 12:49                         ` [Intel-gfx] " Christian König
2022-11-23 12:53                         ` Jason Gunthorpe
2022-11-23 12:53                           ` Jason Gunthorpe
2022-11-23 13:12                           ` Christian König
2022-11-23 13:12                             ` Christian König
2022-11-23 13:12                             ` [Intel-gfx] " Christian König
2022-11-23 13:28                             ` Jason Gunthorpe
2022-11-23 13:28                               ` Jason Gunthorpe
2022-11-23 14:28                               ` Daniel Vetter
2022-11-23 14:28                                 ` Daniel Vetter
2022-11-23 14:28                                 ` [Intel-gfx] " Daniel Vetter
2022-11-23 15:04                                 ` Jason Gunthorpe
2022-11-23 15:04                                   ` Jason Gunthorpe
2022-11-23 16:22                                   ` Daniel Vetter
2022-11-23 16:22                                     ` [Intel-gfx] " Daniel Vetter
2022-11-23 16:22                                     ` Daniel Vetter
2022-11-23 14:34                               ` Daniel Vetter
2022-11-23 14:34                                 ` [Intel-gfx] " Daniel Vetter
2022-11-23 14:34                                 ` Daniel Vetter
2022-11-23 15:08                                 ` Jason Gunthorpe
2022-11-23 15:08                                   ` Jason Gunthorpe
2022-11-23 15:15                                   ` Christian König
2022-11-23 15:15                                     ` [Intel-gfx] " Christian König
2022-11-23 15:15                                     ` Christian König
2022-11-23 16:26                                     ` Daniel Vetter
2022-11-23 16:26                                       ` [Intel-gfx] " Daniel Vetter
2022-11-23 16:26                                       ` Daniel Vetter
2022-11-23 16:26                                     ` Jason Gunthorpe
2022-11-23 16:26                                       ` Jason Gunthorpe
2022-11-22 19:43 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
2022-11-22 20:08 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-11-23  8:07 ` [PATCH] " Thomas Zimmermann
2022-11-23  8:07   ` [Intel-gfx] " Thomas Zimmermann
2022-11-23  8:07   ` Thomas Zimmermann
2022-11-23  9:33   ` Daniel Vetter
2022-11-23  9:33     ` Daniel Vetter
2022-11-23  9:33     ` [Intel-gfx] " Daniel Vetter
2022-11-23  9:58 ` [Intel-gfx] ✗ Fi.CI.IGT: failure for " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKMK7uFjmzewqv3r4hL9hvLADwV536n2n6xbAWaUvmAcStr5KQ@mail.gmail.com \
    --to=daniel.vetter@ffwll.ch \
    --cc=christian.koenig@amd.com \
    --cc=ckoenig.leichtzumerken@gmail.com \
    --cc=daniel.vetter@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=jgg@ziepe.ca \
    --cc=john.stultz@linaro.org \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-media@vger.kernel.org \
    --cc=sumit.semwal@linaro.org \
    --cc=surenb@google.com \
    --cc=tzimmermann@suse.de \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.