KVM Archive on lore.kernel.org
 help / color / Atom feed
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Jason Gunthorpe" <jgg@ziepe.ca>,
	"DRI Development" <dri-devel@lists.freedesktop.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"KVM list" <kvm@vger.kernel.org>, "Linux MM" <linux-mm@kvack.org>,
	"Linux ARM" <linux-arm-kernel@lists.infradead.org>,
	linux-samsung-soc <linux-samsung-soc@vger.kernel.org>,
	"open list:DMA BUFFER SHARING FRAMEWORK"
	<linux-media@vger.kernel.org>,
	linux-s390 <linux-s390@vger.kernel.org>,
	"Daniel Vetter" <daniel.vetter@intel.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Jérôme Glisse" <jglisse@redhat.com>, "Jan Kara" <jack@suse.cz>,
	"Linus Torvalds" <torvalds@linux-foundation.org>
Subject: Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
Date: Sat, 10 Oct 2020 11:21:22 +0200
Message-ID: <20201010112122.587f9945@coco.lan> (raw)
In-Reply-To: <CAKMK7uF-hrSwzFQkp6qEP88hM1Qg8TMQOunuRHh=f2+D8MaMRg@mail.gmail.com>

Em Fri, 9 Oct 2020 19:52:05 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> >  
> > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.  
> >
> > No, it is unconditionally unsafe. The CMA movable mappings are
> > specific VMAs that will have bad issues here, but there are other
> > types too.

I didn't check the mm dirty details, but I strongly suspect that the mm
code has a way to prevent moving a mmapped page while it is still in usage.

If not, then the entire movable pages concept sounds broken to me, and
has to be fixed at mm subsystem.

> >
> > The only way to do something at a VMA level is to have a list of OK
> > VMAs, eg because they were creatd via a special mmap helper from the
> > media subsystem.

I'm not sure if I'm following you on that. The media API can work with
different ways of sending buffers to userspace:

	- read();

	- using the overlay mode. This interface is deprecated in
	  practice, being replaced by DMABUF. Only a few older hardware
	  supports it, and it depends on an special X11 helper driver
	  for it to work.

	- via DMABUF:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html

	- via mmap, using a mmap helper:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/mmap.html

	- via mmap, using userspace-provided pointers:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html

The existing open-source programs usually chose one or more of the above
modes. if the Kernel driver returns -EINVAL when an mmapped streaming I/O
mode is not supported, userspace has to select a different method.

Most userspace open source programs have fallback support: if one
mmap I/O method fails, it selects another one, although this is not
a mandatory requirement. I found (and fixed) a few ones that were
relying exclusively on userptr support, but I didn't make a 
comprehensive check.

Also there are a number of relevant closed-source apps that we have no 
idea about what methods they use, like Skype, and other similar
videoconferencing programs. Breaking support for those, specially at
a time where people are relying on it in order to participate on
conferences and doing internal meetings is a **very bad** idea.

So, whatever solution is taken, it should not be dumping warning
messages at the system and tainting the Kernel, but, instead, checking
if the userspace request is valid or not. If it is invalid, return the
proper error code via the right V4L2 ioctl, in order to let userspace
choose a different method. I the request is valid, refcount the pages 
for them to not be moved while they're still in usage.

-

Let me provide some background about how things work at the media
subsytem. If you want to know more, the userspace-provided memory 
mapped pointers work is described here:

	https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html#userp

Basically, userspace calls either one of those ioctls:

	VIDIOC_CREATE_BUFS:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-create-bufs.html

Which is translated into a videobuf2 call to: vb2_ioctl_create_bufs()

	VIDIOC_REQBUFS
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-reqbufs.html#vidioc-reqbufs

Which is translated into a videobuf2 call to: vb2_ioctl_reqbufs()

Both internally calls vb2_verify_memory_type(), which is responsible
for checking if the provided pointers are OK for the usage and/or do
all necessary page allocations, and taking care of any special 
requirements. This could easily have some additional checks to
verify if the requested VMA address has pages that are movable or
not, ensuring that ensure that the VMA is OK, and locking them in
order to prevent the mm code to move such pages while they are in
usage by the media subsystem.

Now, as I said before, I don't know the dirty details about how
to lock those pages at the mm subsystem in order to avoid it
to move the used pages. Yet, when vb2_create_framevec()
is called, the check if VMA is OK should already be happened
at vb2_verify_memory_type().

-

It should be noticed that the dirty hack added by patch 09/17
and 10/17 affects *all* types of memory allocations at V4L2, 
as this kAPI is called by the 3 different memory models
supported at the media subsystem:

	drivers/media/common/videobuf2/videobuf2-vmalloc.c
	drivers/media/common/videobuf2/videobuf2-dma-contig.c
	drivers/media/common/videobuf2/videobuf2-dma-sg.c

In other words, with this code:

	int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
		unsigned long *pfn)
	{
	#ifdef CONFIG_STRICT_FOLLOW_PFN
		pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
		return -EINVAL;
	#else
		WARN_ONCE(1, "unsafe follow_pfn usage\n");
		add_taint(TAINT_USER, LOCKDEP_STILL_OK);

		return follow_pfn(vma, address, pfn);
	#endif

you're unconditionally breaking the media userspace API support not
only for embedded systems that could be using userptr instead of
DMA_BUF, but also for *all* video devices, including USB cameras.

This is **NOT** an acceptable solution. 

So, I stand my NACK to this approach.

> > > Well, no drivers inside the media subsystem uses such flag, although
> > > they may rely on some infrastructure that could be using it behind
> > > the bars.  
> >
> > It doesn't matter, nothing prevents the user from calling media APIs
> > on mmaps it gets from other subsystems.  
> 
> I think a good first step would be to disable userptr of non struct
> page backed storage going forward for any new hw support. Even on
> existing drivers. dma-buf sharing has been around for long enough now
> that this shouldn't be a problem. Unfortunately right now this doesn't
> seem to exist, so the entire problem keeps getting perpetuated.

Well, the media uAPI does support DMABUF (both import and export):

	https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html
	https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-expbuf.html#vidioc-expbuf

And I fully agree that newer programs should use DMABUF when sharing
buffers with DRM subsystem, but that's not my main concern.

What I do want is to not break userspace support nor to taint the Kernel
due to a valid uAPI call.

A valid uAPI call should check if the parameters passed though it are
valid. If they are, it should handle. Otherwise, it should return
-EINVAL, without tainting the Kernel or printing warning messages.

The approach took by patches 09/17 and 10/17 doesn't do that.
Instead, they just unconditionally breaks the media uAPI.

What should be done, instead, is to drop patch 10/17, and work on
a way for the code inside vb2_create_framevec() to ensure that, if
USERPTR is used, the memory pages will be properly locked while the
driver is using, returning -EINVAL only if there's no way to proceed,
without tainting the Kernel.

> 
> > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > flag that it would be denying the core mm code to set __GFP_MOVABLE.  
> >
> > We can't tell from the VMA these kinds of details..
> >
> > It has to go the other direction, evey mmap that might be used as a
> > userptr here has to be found and the VMA specially created to allow
> > its use. At least that is a kernel only change, but will need people
> > with the HW to do this work.  
> 
> I think the only reasonable way to keep this working is:
> - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);

Not sure how an userspace buffer could be mapped to be using it,
specially since the buffer may not even be imported/exported
from the DRM subsystem, but it could simply be allocated
via glibc calloc()/malloc().

> - add dma-buf export support to fbdev and v4l

V4L has support for it already.

> - roll this out everywhere we still need it.

That's where things are hard. This is not like DRM, where the APIs
are called via some open source libraries that are also managed
by DRM upstream developers.

In the case of the media subsystem, while we added a libv4l sometime
ago, not all userspace apps use it, as a large part of them used
to exist before the addition of the libraries. Also, we're currently 
trying to deprecate libv4l, at least for embedded systems, in favor
of libcamera.

On media, there are lots of closed source apps that uses the media API
directly. Even talking about open source ones, there are lots of
different apps, including not only media-specific apps, but also
generic ones, like web browsers, which don't use the libraries we
wrote.

An userspace API breakage would take *huge* efforts and will take
lots of time to have it merged everywhere. It will cause lots of
troubles everywhere.

> Realistically this just isn't going to happen. 

Why not? Any app developer could already use DMA-BUF if required,
as the upstream support was added several years ago.

> And anything else just
> reimplements half of dma-buf, 

It is just the opposite: those uAPIs came years before dma-buf.
In other words, it was dma-buf that re-implemented it ;-)

Now, I agree with you that dma-buf is a way cooler than the past
alternatives.

-

It sounds to me that you're considering on only one use case of 
USERPTR: to pass a buffer created from DRM. As far as I'm aware,
only embedded userspace applications actually do that.

Yet, there are a number of other applications that do something like
the userptr_capture() function on this code:

	https://git.linuxtv.org/v4l-utils.git/tree/contrib/test/v4l2grab.c

E. g. using glibc alloc functions like calloc() to allocate memory, 
passing the user-allocated data to the Kernel via something like this:

	struct v4l2_requestbuffers req;
	struct v4l2_buffer buf;
	int n_buffers = 2;

	req.count  = 2;
	req.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	req.memory = V4L2_MEMORY_USERPTR;
	if (ioctl(fd, VIDIOC_REQBUFS, &req))
		return errno;

	for (i = 0; i < req.count; ++i) {
		buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
		buf.memory = V4L2_MEMORY_USERPTR;
		buf.index = i;
		buf.m.userptr = (unsigned long)buffers[i].start;
		buf.length = buffers[i].length;
		if (ioctl(fd, VIDIOC_QBUF, &buf))
			return errno;
	}
	if (ioctl(fd, VIDIOC_STREAMON, &req.type))
		return errno;

	/* some capture loop */

	ioctl(fd, VIDIOC_STREAMOFF, &req.type);

I can't possibly see *any* security issues with the above code.

As I said before, VIDIOC_REQBUFS should be checking if the userspace
buffers are OK and ensure that their refcounts will be incremented,
in order to avoid mm to move the pages used there, freeing the 
refconts when VIDIOC_STREAMOFF - or close(fd) - is called.

> which is kinda pointless (you need
> minimally refcounting and some way to get at a promise of a permanent
> sg list for dma. Plus probably the vmap for kernel cpu access.

Yeah, refcounting needs to happen. 

Thanks,
Mauro

  parent reply index

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-09  7:59 [PATCH v2 00/17] follow_pfn and other iomap races Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 01/17] drm/exynos: Stop using frame_vector helpers Daniel Vetter
2020-10-16  7:42   ` John Hubbard
2020-10-09  7:59 ` [PATCH v2 02/17] drm/exynos: Use FOLL_LONGTERM for g2d cmdlists Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers Daniel Vetter
2020-10-10 20:26   ` Oded Gabbay
2020-10-10 21:32     ` Daniel Vetter
2020-10-10 21:41       ` Daniel Vetter
2020-10-10 21:47         ` Oded Gabbay
2020-10-16  7:45   ` John Hubbard
2020-10-09  7:59 ` [PATCH v2 04/17] misc/habana: Use FOLL_LONGTERM for userptr Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 05/17] mm/frame-vector: Use FOLL_LONGTERM Daniel Vetter
2020-10-16  7:54   ` John Hubbard
2020-10-16  8:03     ` Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 06/17] media: videobuf2: Move frame_vector into media subsystem Daniel Vetter
2020-10-09 10:14   ` Mauro Carvalho Chehab
2020-10-09 16:57     ` Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 07/17] mm: Close race in generic_access_phys Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 08/17] s390/pci: Remove races against pte updates Daniel Vetter
2020-10-12 14:03   ` Niklas Schnelle
2020-10-12 14:19     ` Daniel Vetter
2020-10-12 14:39       ` Niklas Schnelle
2020-10-21  7:55       ` Niklas Schnelle
2020-10-22  7:39         ` Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 09/17] mm: Add unsafe_follow_pfn Daniel Vetter
2020-10-09 10:34   ` Mauro Carvalho Chehab
2020-10-09 12:21     ` Jason Gunthorpe
2020-10-09 12:37       ` Mauro Carvalho Chehab
2020-10-09 12:39         ` Mauro Carvalho Chehab
2020-10-09 12:48         ` Jason Gunthorpe
2020-10-09 17:52           ` Daniel Vetter
2020-10-09 18:01             ` Jason Gunthorpe
2020-10-09 19:31               ` Daniel Vetter
2020-10-10  9:21             ` Mauro Carvalho Chehab [this message]
2020-10-10 10:53               ` Daniel Vetter
2020-10-10 11:39                 ` Mauro Carvalho Chehab
2020-10-10 11:56                   ` Daniel Vetter
2020-10-10 17:22             ` Tomasz Figa
2020-10-10 21:35               ` Laurent Pinchart
2020-10-10 21:50                 ` Daniel Vetter
2020-10-11  6:27                   ` Mauro Carvalho Chehab
2020-10-11  6:36                     ` Mauro Carvalho Chehab
2020-10-10 21:11             ` Laurent Pinchart
2020-10-12 10:46           ` Marek Szyprowski
2020-10-12 13:49             ` Daniel Vetter
2020-10-10 17:30         ` Tomasz Figa
2020-10-09  7:59 ` [PATCH v2 10/17] media/videbuf1|2: Mark follow_pfn usage as unsafe Daniel Vetter
2020-10-10  9:24   ` Mauro Carvalho Chehab
2020-10-09  7:59 ` [PATCH v2 11/17] vfio/type1: Mark follow_pfn " Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 12/17] PCI: Obey iomem restrictions for procfs mmap Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 13/17] /dev/mem: Only set filp->f_mapping Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 14/17] resource: Move devmem revoke code to resource framework Daniel Vetter
2020-10-09 10:59   ` Greg Kroah-Hartman
2020-10-09 12:31   ` Jason Gunthorpe
2020-10-09 14:24     ` Daniel Vetter
2020-10-09 14:32       ` Jason Gunthorpe
2020-10-09 18:28         ` Dan Williams
2020-10-15  0:09           ` Jason Gunthorpe
2020-10-15  7:52             ` Daniel Vetter
2020-10-15  7:55               ` Daniel Vetter
2020-10-15 15:29             ` Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 15/17] sysfs: Support zapping of binary attr mmaps Daniel Vetter
2020-10-09 10:58   ` Greg Kroah-Hartman
2020-10-09  7:59 ` [PATCH v2 16/17] PCI: Revoke mappings like devmem Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 17/17] drm/i915: Properly request PCI BARs Daniel Vetter
2020-10-09  9:47   ` Ville Syrjälä
2020-10-09 10:01     ` Daniel Vetter
2020-10-09 10:41       ` Ville Syrjälä
2020-10-09 14:18         ` Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201010112122.587f9945@coco.lan \
    --to=mchehab+huawei@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=daniel.vetter@ffwll.ch \
    --cc=daniel.vetter@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=jack@suse.cz \
    --cc=jgg@ziepe.ca \
    --cc=jglisse@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=keescook@chromium.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux-samsung-soc@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git