From: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
To: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Nicolas Dufresne <nicolas@ndufresne.ca>,
	LMML <linux-media@vger.kernel.org>,
	Wim Taymans <wtaymans@redhat.com>,
	schaller@redhat.com
Subject: Re: [ANN] Meeting to discuss improvements to support MC-based cameras on generic apps
Date: Fri, 18 May 2018 15:38:06 +0300
Message-ID: <1568098.156aR60jyk@avalon>
In-Reply-To: <20180518082447.3068c34c@vento.lan>

Hi Mauro,

On Friday, 18 May 2018 14:24:47 EEST Mauro Carvalho Chehab wrote:
> On Fri, 18 May 2018 11:15:39 +0300, Laurent Pinchart wrote:
> > On Friday, 18 May 2018 00:38:53 EEST Nicolas Dufresne wrote:
> >> On Thursday, 17 May 2018 at 16:07 -0300, Mauro Carvalho Chehab wrote:
> >>> Hi all,
> >>> 
> >>> The goal of this e-mail is to schedule a meeting to discuss
> >>> improvements to the media subsystem so that complex camera hardware
> >>> can be supported by ordinary apps.
> >>> 
> >>> The main focus here is to support devices with MC-based hardware
> >>> connected to a camera.
> >>> 
> >>> In short, my proposal is to meet with the interested parties to
> >>> discuss this issue during the Open Source Summit in Japan, i.e.
> >>> between June 19-22, in Tokyo.
> >>> 
> >>> I'd like to know who is interested in joining us for such a meeting,
> >>> and to hear proposals of themes for the discussions.
> >>> 
> >>> I'm enclosing a detailed description of the problem, in order to
> >>> allow the interested parties to be on the same page.
> >> 
> >> It's unlikely I'll be able to attend this meeting, but I'd like to
> >> provide some initial input on this. Find inline some clarifications on
> >> why libv4l2 is disabled by default in GStreamer, as it's not just
> >> about performance.
> 
> Thanks for complementing it!
> 
> >> A major aspect that is totally absent from this mail is PipeWire. With
> >> the advent of sandboxed applications, there is a need to control access
> >> to cameras through a daemon. The same daemon is also used to control
> >> access to screen capture on Wayland (instead of letting any random
> >> application capture your screen, as on X11). The effort is led by the
> >> desktop team at Red Hat (folks CCed). PipeWire already has native V4L2
> >> support and is integrated into GStreamer in a way that can totally
> >> replace the V4L2 capture component there. PipeWire is plugin-based, so
> >> more types of camera support (including proprietary ones) can be added.
> >
> > One issue that has been worrying me for the past five years or so is how
> > to ensure that we will continue having open-source camera support in the
> > future. PipeWire is just a technology and as such can be used in good or
> > evil ways, but as a community we need to care about availability of open
> > solutions.
> > 
> > So far, by pushing the V4L2 API as the proper way to support cameras, we
> > have tried to resist the natural inclination of vendors to close
> > everything, as implementing a closed-source kernel driver isn't an option
> > that most would consider. Of course, the drawback is that some vendors
> > have simply decided not to care about upstream camera support.
> > 
> > If we move the camera API one level up to userspace (whether that API
> > is defined by PipeWire, by libv4l or by something else), we'll make it
> > easier for vendors not to play along. My big question is how to prevent
> > that. I think there's still value in mandating V4L2 as the only API for
> > cameras, and in ensuring that we support multiple userspace multimedia
> > stacks, not just PipeWire (this is already done in a way, as I don't
> > foresee Android moving away from their camera HAL in the near future).
> > That will likely not be enough, and I'd like to hear other people's
> > opinions on this topic.
> > 
> > I would like to emphasize that I don't expect vendors to open the
> > implementation of their 3A algorithms, and I'm not actually concerned
> > about that part. If that's the only part shipped as closed-source, and if
> > the hardware operation is documented (ideally in a public datasheet, but
> > at a minimum with proper documentation of the custom ioctls used to
> > configure the hardware), then the community will have the opportunity to
> > implement an open-source 3A library. My main concern is thus about all
> > components other than the 3A library.
> 
> Yeah, I share the same concern. Whatever solution we take, we should
> do our best to ensure that the camera driver will be open and that it
> will be possible to have open-source 3A libraries as alternatives.
> 
> One of the biggest reasons we decided to start the libv4l project,
> in the past, was to ensure an open source solution. The problem we
> faced at that time was to ensure that, when a new media driver was
> added with some proprietary output format, open source decoding
> software was also added to libv4l.
> 
> This approach ensured that all non-MC cameras are supported by all
> V4L2 applications.
> 
> Before libv4l, media support for a given device was limited to a few
> apps that knew how to decode the format. There were even cases where a
> proprietary app was required, as no open source decoders were available.
> 
> From my PoV, the biggest gain with libv4l is that the same group of
> maintainers can ensure that the entire solution (kernel driver and
> low-level userspace support) provides everything required for an
> open source app to work with it.
> 
> I'm not sure how we would keep enforcing this if the pipeline setup
> and control propagation logic for specific hardware is delegated to
> PipeWire. It seems easier to keep doing it in a libv4l (version 2) and
> let PipeWire use it.

I believe we need to first study PipeWire in more detail. I have no personal 
opinion yet as I haven't had time to investigate it. That being said, I don't 
think that libv4l with closed-source plugins would be much better than a 
closed-source PipeWire plugin. My main concern once we provide a userspace 
camera stack API is that vendors might implement that API in a closed-source 
component that talks to a kernel driver implementing a custom API, with all 
knowledge about the camera located in the closed-source component. I'm not 
sure how to prevent that; my best proposal would be to make V4L2 so useful 
that vendors wouldn't even think about a different solution (possibly coupled 
with the pressure put on them by platform vendors such as Google, who mandate 
upstream kernel drivers for Chrome OS, but that's still limited, as even for 
Google there's no such pressure on the Android side).

> >> A remote daemon can also provide streams, as is the case for
> >> compositors and screen casting. An extra benefit is that you can have
> >> multiple applications reading frames from the same camera. It also
> >> allows sandboxed applications (which do not have access to /dev) to use
> >> the cameras. PipeWire is much more than that, but let's focus on that.
> >> 
> >> This is the direction we are heading in on "generic" / desktop Linux.
> >> Porting Firefox and Chrome is obviously planned, as these beasts are
> >> clear candidates for being sandboxed and require the screen sharing
> >> feature for WebRTC.
> >> 
> >> In this context, proprietary or HW-specific algorithms could be
> >> implemented in userspace as PipeWire plugins, and applications will
> >> then automatically be able to enumerate and use them. I'm not saying
> >> the libv4l2 work is not needed short term, but it's just a short-term
> >> thing in my opinion.
> >> 
> >>> 1. Introduction
> >>> ===============
> >>> 
> >>> 1.1 V4L2 Kernel aspects
> >>> -----------------------
> >>> 
> >>> The media subsystem supports two types of devices:
> >>> 
> >>> - "traditional" media hardware, supported via the V4L2 API. On such
> >>>   hardware, opening a single device node (usually /dev/video0) is
> >>>   enough to control the entire device. We call these devnode-based
> >>>   devices.
> >>> 
> >>> - Media-controller based devices. On those devices, there are several
> >>>   /dev/video? nodes and several /dev/v4l2-subdev? nodes, plus a media
> >>>   controller device node (usually /dev/media0).
> >>>   We call these mc-based devices. Controlling the hardware requires
> >>>   opening the media device (/dev/media0), setting up the pipeline and
> >>>   adjusting the sub-devices via /dev/v4l2-subdev?. Only streaming is
> >>>   controlled through /dev/video?.
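> >>> 
> >>>   As a minimal sketch of the latter (assuming only the media
> >>>   controller uAPI from <linux/media.h>; error handling omitted),
> >>>   walking the entities of such a device looks like this:
> >>> 
> >>> 	#include <fcntl.h>
> >>> 	#include <stdio.h>
> >>> 	#include <string.h>
> >>> 	#include <unistd.h>
> >>> 	#include <sys/ioctl.h>
> >>> 	#include <linux/media.h>
> >>> 
> >>> 	int main(void)
> >>> 	{
> >>> 		struct media_entity_desc desc;
> >>> 		int fd = open("/dev/media0", O_RDWR);
> >>> 
> >>> 		memset(&desc, 0, sizeof(desc));
> >>> 		/* Ask for the first entity, then keep asking for the
> >>> 		 * next one until the ioctl fails. */
> >>> 		desc.id = MEDIA_ENT_ID_FLAG_NEXT;
> >>> 		while (ioctl(fd, MEDIA_IOC_ENUM_ENTITIES, &desc) == 0) {
> >>> 			printf("entity %u: %s\n", desc.id, desc.name);
> >>> 			desc.id |= MEDIA_ENT_ID_FLAG_NEXT;
> >>> 		}
> >>> 		close(fd);
> >>> 		return 0;
> >>> 	}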
> >>> 
> >>> All "standard" media applications, including open source ones
> >>> (Camorama, Cheese, Xawtv, Firefox, Chromium, ...) and closed source
> >>> ones (Skype, Chrome, ...), support devnode-based devices.
> >>> 
> >>> Support for mc-based devices currently requires a specialized
> >>> application in order to prepare the device for use (set up the
> >>> pipelines, adjust hardware controls, etc.). Once the pipeline is set,
> >>> streaming goes via /dev/video?, although usually some /dev/v4l2-subdev?
> >>> devnodes must also be opened, in order to implement the algorithms
> >>> designed to make video quality reasonable. On such devices, it is not
> >>> uncommon for the devnode used by the application to have a seemingly
> >>> random number (with the OMAP3 driver, it is typically either
> >>> /dev/video4 or /dev/video6).
> >>> 
> >>> One example of such hardware is the OMAP3-based hardware:
> >>> 	http://www.infradead.org/~mchehab/mc-next-gen/omap3-igepv2-with-tvp5150.png
> >>> 
> >>> In the picture, there's a graph with the hardware blocks in
> >>> blue/dark blue and the corresponding devnode interfaces in yellow.
> >>> 
> >>> The mc-based approach was taken when support for the Nokia N9/N900
> >>> cameras was added (those devices have an OMAP3 SoC). It is required
> >>> because the camera hardware on the SoC comes with a media processor
> >>> (ISP), which does a lot more than just capturing, allowing complex
> >>> algorithms to enhance image quality at runtime.
> >>> 
> >>> Those algorithms are known as 3A - an acronym for 3 other acronyms:
> >>> 	- AE (Auto Exposure);
> >>> 	- AF (Auto Focus);
> >>> 	- AWB (Auto White Balance).
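> >>> 
> >>>   On devnode-based hardware, the three map to standard V4L2 controls.
> >>>   A minimal sketch (error handling omitted; on mc-based hardware these
> >>>   controls typically sit on a /dev/v4l2-subdev? node instead):
> >>> 
> >>> 	#include <fcntl.h>
> >>> 	#include <sys/ioctl.h>
> >>> 	#include <linux/videodev2.h>
> >>> 
> >>> 	/* Hand AE, AF and AWB over to the driver/hardware. */
> >>> 	static void enable_3a(int fd)
> >>> 	{
> >>> 		struct v4l2_control ctrl;
> >>> 
> >>> 		ctrl.id = V4L2_CID_EXPOSURE_AUTO;	/* AE */
> >>> 		ctrl.value = V4L2_EXPOSURE_AUTO;
> >>> 		ioctl(fd, VIDIOC_S_CTRL, &ctrl);
> >>> 
> >>> 		ctrl.id = V4L2_CID_FOCUS_AUTO;		/* AF */
> >>> 		ctrl.value = 1;
> >>> 		ioctl(fd, VIDIOC_S_CTRL, &ctrl);
> >>> 
> >>> 		ctrl.id = V4L2_CID_AUTO_WHITE_BALANCE;	/* AWB */
> >>> 		ctrl.value = 1;
> >>> 		ioctl(fd, VIDIOC_S_CTRL, &ctrl);
> >>> 	}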
> >>> 
> >>> Setting up a camera with such ISPs is harder because the pipelines
> >>> to be set actually depend on the requirements of those 3A algorithms.
> >>> Also, the 3A algorithms usually use some chipset-specific userspace
> >>> API that exports image properties, calculated by the ISP, to speed up
> >>> their convergence.
> >>> 
> >>> Btw, the 3A algorithms are usually IP-protected and provided by
> >>> vendors as binary-only blobs, although there are a few OSS
> >>> implementations.
> >>> 
> >>> 1.2 V4L2 userspace aspects
> >>> --------------------------
> >>> 
> >>> Back when USB cameras were introduced, the hardware was really
> >>> simple: a CCD camera sensor and a chip that bridges the data
> >>> through USB. CCD camera sensors typically provide data using a Bayer
> >>> format, but they usually have their own proprietary ways of packing
> >>> the data, in order to reduce the USB bandwidth (the original cameras
> >>> were USB 1.1).
> >>> 
> >>> So, V4L2 has a myriad of different formats, in order to match each
> >>> CCD camera sensor format. At the end of the day, applications were
> >>> able to use only a subset of the available hardware, since they needed
> >>> to ship format converters for every format they wanted to support
> >>> (usually a very small subset of the available ones).
> >>> 
> >>> To end this mess, a userspace library called libv4l was written.
> >>> It supports all those proprietary formats, so applications can use
> >>> an RGB or YUV format without needing to worry about conversions.
> >>> 
> >>> The way it works is by adding wrappers around the system calls:
> >>> open, close, ioctl, mmap, munmap. So, converting an app to use it is
> >>> really simple: in the app's source code, all that is needed is to
> >>> prefix the existing calls with "v4l2_", e.g. v4l2_open, v4l2_close,
> >>> etc.
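> >>> 
> >>>   A minimal sketch of a converted app (error handling omitted; link
> >>>   with -lv4l2):
> >>> 
> >>> 	#include <fcntl.h>
> >>> 	#include <libv4l2.h>
> >>> 	#include <linux/videodev2.h>
> >>> 
> >>> 	int main(void)
> >>> 	{
> >>> 		struct v4l2_format fmt = {
> >>> 			.type = V4L2_BUF_TYPE_VIDEO_CAPTURE,
> >>> 		};
> >>> 		int fd = v4l2_open("/dev/video0", O_RDWR);
> >>> 
> >>> 		/* Ask for plain RGB24; libv4l converts from whatever
> >>> 		 * proprietary format the hardware actually produces. */
> >>> 		fmt.fmt.pix.width = 640;
> >>> 		fmt.fmt.pix.height = 480;
> >>> 		fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_RGB24;
> >>> 		v4l2_ioctl(fd, VIDIOC_S_FMT, &fmt);
> >>> 
> >>> 		v4l2_close(fd);
> >>> 		return 0;
> >>> 	}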
> >>> 
> >>> All the open source apps we know of now support libv4l. In a few
> >>> (like GStreamer), support for it is optional.
> >>> 
> >>> In order to support closed source applications, another wrapper was
> >>> added, allowing any closed source application to use libv4l via
> >>> LD_PRELOAD.
> >>> 
> >>> For example, using Skype with it is as simple as launching it with:
> >>> 	$ LD_PRELOAD=/usr/lib/libv4l/v4l1compat.so /usr/bin/skypeforlinux
> >>> 
> >>> 2. Current problems
> >>> ===================
> >>> 
> >>> 2.1 Libv4l can slow image handling
> >>> ----------------------------------
> >>> 
> >>> Nowadays, almost all new "simple" cameras are connected via USB using
> >>> the UVC class (USB Video Class). UVC standardized the allowed
> >>> formats, and most apps just implement them. The UVC hardware is more
> >>> complex, having format converters inside it. So, for most usages,
> >>> format conversion isn't needed anymore.
> >>> 
> >>> The need to do format conversion in software makes libv4l slow,
> >>> requiring lots of CPU in order to convert 4K or 8K formats, and it
> >>> is even worse with 3D cameras.
> >>> 
> >>> Also, due to the need to support LD_PRELOAD, zero-copy buffer
> >>> sharing via DMABUF currently doesn't work with libv4l.
> >>> 
> >>> Right now, GStreamer defaults to not enabling libv4l2, mainly due
> >>> to those performance issues.
> >> 
> >> I need to clarify a little bit why we disabled libv4l2 in GStreamer,
> >> as it's not only for performance reasons; there are a couple of major
> >> issues in the libv4l2 implementation that get in the way. Just a short
> >> list:
> >> 
> >>   - Crash when CREATE_BUFS is being used
> >>   - Crash in the JPEG decoder (when frames are corrupted)
> >>   - Apps exporting DMABufs need to be aware of the emulation,
> >>     otherwise the exported DMABufs are in the original format
> >>   - RW emulation only initializes the queue on the first read
> >>     (causing userspace poll() to fail)
> >>   - The signature of v4l2_mmap does not match mmap() (minor)
> >>   - The colorimetry does not seem to be emulated when converting
> >>   - Sub-optimal locking (at least the deadlocks were fixed)
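> >> 
> >>   (For the v4l2_mmap point, the mismatch is just the offset type, as
> >>   declared in mmap(2) and <libv4l2.h>:
> >> 
> >> 	void *mmap(void *addr, size_t length, int prot, int flags,
> >> 		   int fd, off_t offset);
> >> 	void *v4l2_mmap(void *start, size_t length, int prot, int flags,
> >> 			int fd, int64_t offset);
> >> 
> >>   i.e. int64_t instead of off_t.)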
> > 
> > Do you see any point in that list that couldn't be fixed in libv4l?
> > 
> >> Except for the colorimetry (which causes negotiation failures, as it
> >> produces invalid colorimetry / format matches), these issues are
> >> already worked around in GStreamer, but with a loss of features of
> >> course. There are other cases where something worked without libv4l2
> >> but didn't work with it, though we haven't tracked down the cause.
> >> 
> >> For people working in this area: since 1.14, you can enable libv4l2 at
> >> run-time using the environment variable GST_V4L2_USE_LIBV4L2=1.
> >> 
> >>> 2.2 Modern hardware is starting to come with "complex" camera ISP
> >>> -----------------------------------------------------------------
> >>> 
> >>> While mc-based devices were limited to SoCs, it was easy to
> >>> "delegate" the task of talking to the hardware to the embedded
> >>> hardware designers.
> >>> 
> >>> However, this is changing. The Dell Latitude 5285 laptop is a
> >>> standard PC with a Core i3, i5 or i7 CPU, which comes with the
> >>> Intel IPU3 ISP hardware [1].
> >>> 
> >>> [1] https://www.spinics.net/lists/linux-usb/msg167478.html
> >>> 
> >>> There, instead of a USB camera, the hardware is equipped with an
> >>> MC-based ISP connected to its camera. Currently, despite having
> >>> a kernel driver for it, the camera doesn't work with any
> >>> userspace application.
> >>> 
> >>> I'm also aware of other projects that are considering the use of
> >>> mc-based devices for non-dedicated hardware.
> >>> 
> >>> 3. How to solve it?
> >>> ===================
> >>> 
> >>> That's the main focus of the meeting :-)
> >>> 
> >>> From a previous discussion I had with the media sub-maintainers,
> >>> there are at least two actions that seem required. I'm listing them
> >>> below as a starting point for the discussions, but we can eventually
> >>> come up with a different approach after the meeting.
> >>> 
> >>> 3.1 libv4l2 support for mc-based hardware
> >>> -----------------------------------------
> >>> 
> >>> In order to support such hardware, we'll need to do some redesign,
> >>> mainly at libv4l2 [2].
> >>> 
> >>> The idea is to work on a new API for libv4l2 that will allow
> >>> splitting the format conversion into a separate part of it, add
> >>> support for DMABUF, and come up with a way for the library to work
> >>> transparently with both devnode-based and mc-based hardware.
> >>> 
> >>> That involves adding the capability at libv4l to set up hardware
> >>> pipelines and to propagate controls among their sub-devices.
> >>> Eventually, part of it will be done in the kernel.
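> >>> 
> >>>   To illustrate the direction only (a purely hypothetical sketch:
> >>>   none of these functions exist today, and the names are invented
> >>>   for this discussion), the new API could hide the media graph
> >>>   behind a single handle so apps keep a devnode-like view:
> >>> 
> >>> 	/* Hypothetical libv4l2 (version 2) calls. */
> >>> 	struct v4l2_camera *cam = v4l2_camera_open("/dev/media0");
> >>> 
> >>> 	/* Resolve the media graph and configure links/formats. */
> >>> 	v4l2_camera_setup_pipeline(cam, 640, 480, V4L2_PIX_FMT_YUYV);
> >>> 
> >>> 	/* Propagate a control to whichever sub-device implements it. */
> >>> 	v4l2_camera_set_control(cam, V4L2_CID_AUTO_WHITE_BALANCE, 1);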
> >>> 
> >>> That should improve the library's performance and would allow
> >>> GStreamer to use it by default without compromising performance.
> >>> 
> >>> [2] I don't rule out that some kernel changes could also be part of
> >>> the solution, for example doing control propagation along the
> >>> pipeline in simple use case scenarios.
> >>> 
> >>> 3.2 libv4l2 support for 3A algorithms
> >>> -------------------------------------
> >>> 
> >>> The 3A algorithm handling is highly dependent on the hardware. The
> >>> idea here is to allow libv4l to have a set of 3A algorithms that
> >>> will be specific to certain mc-based hardware. Ideally, this should
> >>> be added in a way that allows external closed-source algorithms to
> >>> run as well.

-- 
Regards,

Laurent Pinchart
