KVM Archive on lore.kernel.org
 help / color / Atom feed
From: Daniel Vetter <daniel.vetter@ffwll.ch>
To: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: "Jason Gunthorpe" <jgg@ziepe.ca>,
	"DRI Development" <dri-devel@lists.freedesktop.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"KVM list" <kvm@vger.kernel.org>, "Linux MM" <linux-mm@kvack.org>,
	"Linux ARM" <linux-arm-kernel@lists.infradead.org>,
	linux-samsung-soc <linux-samsung-soc@vger.kernel.org>,
	"open list:DMA BUFFER SHARING FRAMEWORK"
	<linux-media@vger.kernel.org>,
	linux-s390 <linux-s390@vger.kernel.org>,
	"Daniel Vetter" <daniel.vetter@intel.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Jérôme Glisse" <jglisse@redhat.com>, "Jan Kara" <jack@suse.cz>,
	"Linus Torvalds" <torvalds@linux-foundation.org>
Subject: Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
Date: Sat, 10 Oct 2020 12:53:49 +0200
Message-ID: <CAKMK7uEKP5UMKeQHkTHWYUJkp=mz-Hvh-fJZy1KP3kT2xHpHrg@mail.gmail.com> (raw)
In-Reply-To: <20201010112122.587f9945@coco.lan>

Hi Mauro,

You might want to read the patches more carefully, because what you're
demanding is what my patches do. Short summary:

- if STRICT_FOLLOW_PFN is set:
a) normal memory is handled as-is (i.e. your example works) through
the addition of FOLL_LONGTERM. This is the "pin the pages correctly"
approach you're demanding
b) for non-page memory (zerocopy sharing before dma-buf was upstreamed
is the only use-case for this) it is correctly rejected with -EINVAL

- if you do have blobby userspace which requires the zero-copy using
userptr to work, and doesn't have any of the fallbacks implemented
that you describe, this would be a regression. That's why
STRICT_FOLLOW_PFN can be unset. And yes there's a real security issue
in this usage, Marek also confirmed that the removal of the vma copy
code a few years ago essentially broke even the weak assumptions that
made the code work 10+ years ago when it was merged.

so tdlr; Everything you described will keep working even with the new
flag set, and everything you demand must be implemented _is_
implemented in this patch series.

Also please keep in mind that we are _not_ talking about the general
userptr support that was merge ~20 years ago. This patch series here
is _only_ about the zerocpy userptr support merged with 50ac952d2263
("[media] videobuf2-dma-sg: Support io userptr operations on io
memory") in 2013.

Why this hack was merged in 2013 when we merged dma-buf almost 2 years
before that I have no idea about. Imo that patch simply should never
have landed, and instead dma-buf support prioritized.

Cheers, Daniel


On Sat, Oct 10, 2020 at 11:21 AM Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Em Fri, 9 Oct 2020 19:52:05 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
>
> > On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> > >
> > > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> > >
> > > No, it is unconditionally unsafe. The CMA movable mappings are
> > > specific VMAs that will have bad issues here, but there are other
> > > types too.
>
> I didn't check the mm dirty details, but I strongly suspect that the mm
> code has a way to prevent moving a mmapped page while it is still in usage.
>
> If not, then the entire movable pages concept sounds broken to me, and
> has to be fixed at mm subsystem.
>
> > >
> > > The only way to do something at a VMA level is to have a list of OK
> > > VMAs, eg because they were creatd via a special mmap helper from the
> > > media subsystem.
>
> I'm not sure if I'm following you on that. The media API can work with
> different ways of sending buffers to userspace:
>
>         - read();
>
>         - using the overlay mode. This interface is deprecated in
>           practice, being replaced by DMABUF. Only a few older hardware
>           supports it, and it depends on an special X11 helper driver
>           for it to work.
>
>         - via DMABUF:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html
>
>         - via mmap, using a mmap helper:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/mmap.html
>
>         - via mmap, using userspace-provided pointers:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html
>
> The existing open-source programs usually chose one or more of the above
> modes. if the Kernel driver returns -EINVAL when an mmapped streaming I/O
> mode is not supported, userspace has to select a different method.
>
> Most userspace open source programs have fallback support: if one
> mmap I/O method fails, it selects another one, although this is not
> a mandatory requirement. I found (and fixed) a few ones that were
> relying exclusively on userptr support, but I didn't make a
> comprehensive check.
>
> Also there are a number of relevant closed-source apps that we have no
> idea about what methods they use, like Skype, and other similar
> videoconferencing programs. Breaking support for those, specially at
> a time where people are relying on it in order to participate on
> conferences and doing internal meetings is a **very bad** idea.
>
> So, whatever solution is taken, it should not be dumping warning
> messages at the system and tainting the Kernel, but, instead, checking
> if the userspace request is valid or not. If it is invalid, return the
> proper error code via the right V4L2 ioctl, in order to let userspace
> choose a different method. I the request is valid, refcount the pages
> for them to not be moved while they're still in usage.
>
> -
>
> Let me provide some background about how things work at the media
> subsytem. If you want to know more, the userspace-provided memory
> mapped pointers work is described here:
>
>         https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html#userp
>
> Basically, userspace calls either one of those ioctls:
>
>         VIDIOC_CREATE_BUFS:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-create-bufs.html
>
> Which is translated into a videobuf2 call to: vb2_ioctl_create_bufs()
>
>         VIDIOC_REQBUFS
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-reqbufs.html#vidioc-reqbufs
>
> Which is translated into a videobuf2 call to: vb2_ioctl_reqbufs()
>
> Both internally calls vb2_verify_memory_type(), which is responsible
> for checking if the provided pointers are OK for the usage and/or do
> all necessary page allocations, and taking care of any special
> requirements. This could easily have some additional checks to
> verify if the requested VMA address has pages that are movable or
> not, ensuring that ensure that the VMA is OK, and locking them in
> order to prevent the mm code to move such pages while they are in
> usage by the media subsystem.
>
> Now, as I said before, I don't know the dirty details about how
> to lock those pages at the mm subsystem in order to avoid it
> to move the used pages. Yet, when vb2_create_framevec()
> is called, the check if VMA is OK should already be happened
> at vb2_verify_memory_type().
>
> -
>
> It should be noticed that the dirty hack added by patch 09/17
> and 10/17 affects *all* types of memory allocations at V4L2,
> as this kAPI is called by the 3 different memory models
> supported at the media subsystem:
>
>         drivers/media/common/videobuf2/videobuf2-vmalloc.c
>         drivers/media/common/videobuf2/videobuf2-dma-contig.c
>         drivers/media/common/videobuf2/videobuf2-dma-sg.c
>
> In other words, with this code:
>
>         int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
>                 unsigned long *pfn)
>         {
>         #ifdef CONFIG_STRICT_FOLLOW_PFN
>                 pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
>                 return -EINVAL;
>         #else
>                 WARN_ONCE(1, "unsafe follow_pfn usage\n");
>                 add_taint(TAINT_USER, LOCKDEP_STILL_OK);
>
>                 return follow_pfn(vma, address, pfn);
>         #endif
>
> you're unconditionally breaking the media userspace API support not
> only for embedded systems that could be using userptr instead of
> DMA_BUF, but also for *all* video devices, including USB cameras.
>
> This is **NOT** an acceptable solution.
>
> So, I stand my NACK to this approach.
>
> > > > Well, no drivers inside the media subsystem uses such flag, although
> > > > they may rely on some infrastructure that could be using it behind
> > > > the bars.
> > >
> > > It doesn't matter, nothing prevents the user from calling media APIs
> > > on mmaps it gets from other subsystems.
> >
> > I think a good first step would be to disable userptr of non struct
> > page backed storage going forward for any new hw support. Even on
> > existing drivers. dma-buf sharing has been around for long enough now
> > that this shouldn't be a problem. Unfortunately right now this doesn't
> > seem to exist, so the entire problem keeps getting perpetuated.
>
> Well, the media uAPI does support DMABUF (both import and export):
>
>         https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html
>         https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-expbuf.html#vidioc-expbuf
>
> And I fully agree that newer programs should use DMABUF when sharing
> buffers with DRM subsystem, but that's not my main concern.
>
> What I do want is to not break userspace support nor to taint the Kernel
> due to a valid uAPI call.
>
> A valid uAPI call should check if the parameters passed though it are
> valid. If they are, it should handle. Otherwise, it should return
> -EINVAL, without tainting the Kernel or printing warning messages.
>
> The approach took by patches 09/17 and 10/17 doesn't do that.
> Instead, they just unconditionally breaks the media uAPI.
>
> What should be done, instead, is to drop patch 10/17, and work on
> a way for the code inside vb2_create_framevec() to ensure that, if
> USERPTR is used, the memory pages will be properly locked while the
> driver is using, returning -EINVAL only if there's no way to proceed,
> without tainting the Kernel.
>
> >
> > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> > >
> > > We can't tell from the VMA these kinds of details..
> > >
> > > It has to go the other direction, evey mmap that might be used as a
> > > userptr here has to be found and the VMA specially created to allow
> > > its use. At least that is a kernel only change, but will need people
> > > with the HW to do this work.
> >
> > I think the only reasonable way to keep this working is:
> > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
>
> Not sure how an userspace buffer could be mapped to be using it,
> specially since the buffer may not even be imported/exported
> from the DRM subsystem, but it could simply be allocated
> via glibc calloc()/malloc().
>
> > - add dma-buf export support to fbdev and v4l
>
> V4L has support for it already.
>
> > - roll this out everywhere we still need it.
>
> That's where things are hard. This is not like DRM, where the APIs
> are called via some open source libraries that are also managed
> by DRM upstream developers.
>
> In the case of the media subsystem, while we added a libv4l sometime
> ago, not all userspace apps use it, as a large part of them used
> to exist before the addition of the libraries. Also, we're currently
> trying to deprecate libv4l, at least for embedded systems, in favor
> of libcamera.
>
> On media, there are lots of closed source apps that uses the media API
> directly. Even talking about open source ones, there are lots of
> different apps, including not only media-specific apps, but also
> generic ones, like web browsers, which don't use the libraries we
> wrote.
>
> An userspace API breakage would take *huge* efforts and will take
> lots of time to have it merged everywhere. It will cause lots of
> troubles everywhere.
>
> > Realistically this just isn't going to happen.
>
> Why not? Any app developer could already use DMA-BUF if required,
> as the upstream support was added several years ago.
>
> > And anything else just
> > reimplements half of dma-buf,
>
> It is just the opposite: those uAPIs came years before dma-buf.
> In other words, it was dma-buf that re-implemented it ;-)
>
> Now, I agree with you that dma-buf is a way cooler than the past
> alternatives.
>
> -
>
> It sounds to me that you're considering on only one use case of
> USERPTR: to pass a buffer created from DRM. As far as I'm aware,
> only embedded userspace applications actually do that.
>
> Yet, there are a number of other applications that do something like
> the userptr_capture() function on this code:
>
>         https://git.linuxtv.org/v4l-utils.git/tree/contrib/test/v4l2grab.c
>
> E. g. using glibc alloc functions like calloc() to allocate memory,
> passing the user-allocated data to the Kernel via something like this:
>
>         struct v4l2_requestbuffers req;
>         struct v4l2_buffer buf;
>         int n_buffers = 2;
>
>         req.count  = 2;
>         req.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
>         req.memory = V4L2_MEMORY_USERPTR;
>         if (ioctl(fd, VIDIOC_REQBUFS, &req))
>                 return errno;
>
>         for (i = 0; i < req.count; ++i) {
>                 buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
>                 buf.memory = V4L2_MEMORY_USERPTR;
>                 buf.index = i;
>                 buf.m.userptr = (unsigned long)buffers[i].start;
>                 buf.length = buffers[i].length;
>                 if (ioctl(fd, VIDIOC_QBUF, &buf))
>                         return errno;
>         }
>         if (ioctl(fd, VIDIOC_STREAMON, &req.type))
>                 return errno;
>
>         /* some capture loop */
>
>         ioctl(fd, VIDIOC_STREAMOFF, &req.type);
>
> I can't possibly see *any* security issues with the above code.
>
> As I said before, VIDIOC_REQBUFS should be checking if the userspace
> buffers are OK and ensure that their refcounts will be incremented,
> in order to avoid mm to move the pages used there, freeing the
> refconts when VIDIOC_STREAMOFF - or close(fd) - is called.
>
> > which is kinda pointless (you need
> > minimally refcounting and some way to get at a promise of a permanent
> > sg list for dma. Plus probably the vmap for kernel cpu access.
>
> Yeah, refcounting needs to happen.
>
> Thanks,
> Mauro



--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

  reply index

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-09  7:59 [PATCH v2 00/17] follow_pfn and other iomap races Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 01/17] drm/exynos: Stop using frame_vector helpers Daniel Vetter
2020-10-16  7:42   ` John Hubbard
2020-10-09  7:59 ` [PATCH v2 02/17] drm/exynos: Use FOLL_LONGTERM for g2d cmdlists Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers Daniel Vetter
2020-10-10 20:26   ` Oded Gabbay
2020-10-10 21:32     ` Daniel Vetter
2020-10-10 21:41       ` Daniel Vetter
2020-10-10 21:47         ` Oded Gabbay
2020-10-16  7:45   ` John Hubbard
2020-10-09  7:59 ` [PATCH v2 04/17] misc/habana: Use FOLL_LONGTERM for userptr Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 05/17] mm/frame-vector: Use FOLL_LONGTERM Daniel Vetter
2020-10-16  7:54   ` John Hubbard
2020-10-16  8:03     ` Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 06/17] media: videobuf2: Move frame_vector into media subsystem Daniel Vetter
2020-10-09 10:14   ` Mauro Carvalho Chehab
2020-10-09 16:57     ` Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 07/17] mm: Close race in generic_access_phys Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 08/17] s390/pci: Remove races against pte updates Daniel Vetter
2020-10-12 14:03   ` Niklas Schnelle
2020-10-12 14:19     ` Daniel Vetter
2020-10-12 14:39       ` Niklas Schnelle
2020-10-21  7:55       ` Niklas Schnelle
2020-10-22  7:39         ` Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 09/17] mm: Add unsafe_follow_pfn Daniel Vetter
2020-10-09 10:34   ` Mauro Carvalho Chehab
2020-10-09 12:21     ` Jason Gunthorpe
2020-10-09 12:37       ` Mauro Carvalho Chehab
2020-10-09 12:39         ` Mauro Carvalho Chehab
2020-10-09 12:48         ` Jason Gunthorpe
2020-10-09 17:52           ` Daniel Vetter
2020-10-09 18:01             ` Jason Gunthorpe
2020-10-09 19:31               ` Daniel Vetter
2020-10-10  9:21             ` Mauro Carvalho Chehab
2020-10-10 10:53               ` Daniel Vetter [this message]
2020-10-10 11:39                 ` Mauro Carvalho Chehab
2020-10-10 11:56                   ` Daniel Vetter
2020-10-10 17:22             ` Tomasz Figa
2020-10-10 21:35               ` Laurent Pinchart
2020-10-10 21:50                 ` Daniel Vetter
2020-10-11  6:27                   ` Mauro Carvalho Chehab
2020-10-11  6:36                     ` Mauro Carvalho Chehab
2020-10-10 21:11             ` Laurent Pinchart
2020-10-12 10:46           ` Marek Szyprowski
2020-10-12 13:49             ` Daniel Vetter
2020-10-10 17:30         ` Tomasz Figa
2020-10-09  7:59 ` [PATCH v2 10/17] media/videbuf1|2: Mark follow_pfn usage as unsafe Daniel Vetter
2020-10-10  9:24   ` Mauro Carvalho Chehab
2020-10-09  7:59 ` [PATCH v2 11/17] vfio/type1: Mark follow_pfn " Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 12/17] PCI: Obey iomem restrictions for procfs mmap Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 13/17] /dev/mem: Only set filp->f_mapping Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 14/17] resource: Move devmem revoke code to resource framework Daniel Vetter
2020-10-09 10:59   ` Greg Kroah-Hartman
2020-10-09 12:31   ` Jason Gunthorpe
2020-10-09 14:24     ` Daniel Vetter
2020-10-09 14:32       ` Jason Gunthorpe
2020-10-09 18:28         ` Dan Williams
2020-10-15  0:09           ` Jason Gunthorpe
2020-10-15  7:52             ` Daniel Vetter
2020-10-15  7:55               ` Daniel Vetter
2020-10-15 15:29             ` Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 15/17] sysfs: Support zapping of binary attr mmaps Daniel Vetter
2020-10-09 10:58   ` Greg Kroah-Hartman
2020-10-09  7:59 ` [PATCH v2 16/17] PCI: Revoke mappings like devmem Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 17/17] drm/i915: Properly request PCI BARs Daniel Vetter
2020-10-09  9:47   ` Ville Syrjälä
2020-10-09 10:01     ` Daniel Vetter
2020-10-09 10:41       ` Ville Syrjälä
2020-10-09 14:18         ` Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKMK7uEKP5UMKeQHkTHWYUJkp=mz-Hvh-fJZy1KP3kT2xHpHrg@mail.gmail.com' \
    --to=daniel.vetter@ffwll.ch \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=daniel.vetter@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=jack@suse.cz \
    --cc=jgg@ziepe.ca \
    --cc=jglisse@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=keescook@chromium.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux-samsung-soc@vger.kernel.org \
    --cc=mchehab+huawei@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git