ksummit.lists.linux.dev archive mirror
* [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
@ 2021-09-10 21:00 Jonathan Corbet
  2021-09-10 21:32 ` Josh Triplett
                   ` (3 more replies)
  0 siblings, 4 replies; 77+ messages in thread
From: Jonathan Corbet @ 2021-09-10 21:00 UTC (permalink / raw)
  To: ksummit

There has been a regular disagreement in recent years about whether
drivers for accelerators (such as for the Habana Gaudi device) should be
subject to the same requirements as GPU drivers when it comes to the
availability of a free implementation of the user-space side.  It flared
up again recently:

   https://lwn.net/Articles/867168/

Happily, the Habana situation in particular seems to be resolving
itself:

   https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/

But even there it is clear that the fundamental question has not yet
been resolved.

This seems like the sort of question that the maintainer summit exists
to address.  Specifically, we could discuss:

 - Under which circumstances should the kernel community require the
   existence of freely licensed user-space code that exercises all
   functionalities of a proposed kernel driver or feature?

 - Are there internal kernel interfaces, such as DMA-BUF or P2PDMA, that
   are only available to drivers with a free user-space implementation?
   Do we need an EXPORT_SYMBOL_USERSPACE_GPL()?

 - What constitutes an acceptable user-space implementation in cases
   where these restrictions apply?

I suspect that more clarity (and fewer arguments) on these questions
would be welcome both within and beyond the development community.

Thanks,

jon

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-10 21:00 [MAINTAINER SUMMIT] User-space requirements for accelerator drivers Jonathan Corbet
@ 2021-09-10 21:32 ` Josh Triplett
  2021-09-13 13:50   ` Christian Brauner
  2021-09-14 14:40   ` Jani Nikula
  2021-09-10 21:51 ` James Bottomley
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 77+ messages in thread
From: Josh Triplett @ 2021-09-10 21:32 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: ksummit

On Fri, Sep 10, 2021 at 03:00:58PM -0600, Jonathan Corbet wrote:
> There has been a regular disagreement in recent years about whether
> drivers for accelerators (such as for the Habana Gaudi device) should be
> subject to the same requirements as GPU drivers when it comes to the
> availability of a free implementation of the user-space side.  It flared
> up again recently:
> 
>    https://lwn.net/Articles/867168/
> 
> Happily, the Habana situation in particular seems to be resolving
> itself:
> 
>    https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/
> 
> But even there it is clear that the fundamental question has not yet
> been resolved.
> 
> This seems like the sort of question that the maintainer summit exists
> to address.  Specifically, we could discuss:
> 
>  - Under which circumstances should the kernel community require the
>    existence of freely licensed user-space code that exercises all
>    functionalities of a proposed kernel driver or feature?

I think it'd be reasonable to ask, as well: if we required this for
*all* kernel functionality, such that we never add any userspace
interface to the kernel unless there's *some* Open Source userspace that
needs/wants it, what problems would that cause if any?

It appears that in this case the kernel pushing back has influenced the
release of Open Source userspace code. Having a kernel-wide policy here
seems like it'll *help* people within many companies to push for such
changes: "We're never going to be able to get our changes into the
upstream kernel if there's no userspace to drive them."

One tradeoff would be, in theory, that there are some vendors who won't
care enough about upstreaming their changes, and will just keep their
drivers out of tree in that circumstance. There was a time when that
would have been reason enough not to have such a policy. I think that
time has passed, though, and now I think we'd get more benefit from
requiring open userspace consumers of APIs than we'd lose by having some
APIs not submitted upstream. (Plus, those vendors are still obligated to
*ship* the source of those changes to their users, and if a third party
wants to use those changes they can always upstream them, at which point
the vendor still faces the choice of "do I want/need to participate in
this conversation or not".)

>  - What constitutes an acceptable user-space implementation in cases
>    where these restrictions apply?

This seems like it'll always be a fuzzy line. The main issue: it's OK if
there are both open and proprietary users, but it's not OK if the only
open implementation is an outdated or token project that nobody actually
uses, that exists and is maintained solely for the purposes of placating
the kernel requirement. There's no easy way to define that line, other
than "we'll know it when we see it".


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-10 21:00 [MAINTAINER SUMMIT] User-space requirements for accelerator drivers Jonathan Corbet
  2021-09-10 21:32 ` Josh Triplett
@ 2021-09-10 21:51 ` James Bottomley
  2021-09-10 21:59   ` Alexandre Belloni
  2021-09-11  0:08   ` Laurent Pinchart
  2021-09-10 22:52 ` Mauro Carvalho Chehab
  2021-09-12 19:13 ` Dave Airlie
  3 siblings, 2 replies; 77+ messages in thread
From: James Bottomley @ 2021-09-10 21:51 UTC (permalink / raw)
  To: Jonathan Corbet, ksummit

On Fri, 2021-09-10 at 15:00 -0600, Jonathan Corbet wrote:
>  - Are there internal kernel interfaces, such as DMA-BUF or P2PDMA,
> that are only available to drivers with a free user-space
> implementation? Do we need an EXPORT_SYMBOL_USERSPACE_GPL()?

I don't think reasonably we can do this.  The kernel GPLv2 licence
includes this system exception:

      NOTE! This copyright does *not* cover user programs that use
   kernel services by normal system calls - this is merely considered
   normal use of the kernel, and does *not* fall under the heading of
   "derived work". Also note that the GPL below is copyrighted by the
   Free Software Foundation, but the instance of code that it refers to
   (the Linux kernel) is copyrighted by me and others who actually
   wrote it.

    Also note that the only valid version of the GPL as far as the
   kernel is concerned is _this_ particular version of the license (ie
   v2, not v2.2 or v3.x or whatever), unless explicitly otherwise
   stated.

This means currently that once an API is exposed to user space, we've
given up control of the type of programme (proprietary or open source)
that may use it.

It might be possible legally to try and take back that control by
modifying the system exception (what is a "normal" system call), but I
personally think that would be unwise and create a raft of other
problems for other proprietary user space code running on Linux, which
I really think we don't want to do.

I think our only recourse for user space accelerators is not to export
the interface if we think it would only be used for evil purposes.

James




* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-10 21:51 ` James Bottomley
@ 2021-09-10 21:59   ` Alexandre Belloni
  2021-09-10 22:35     ` James Bottomley
  2021-09-11  0:08   ` Laurent Pinchart
  1 sibling, 1 reply; 77+ messages in thread
From: Alexandre Belloni @ 2021-09-10 21:59 UTC (permalink / raw)
  To: James Bottomley; +Cc: Jonathan Corbet, ksummit

On 10/09/2021 14:51:43-0700, James Bottomley wrote:
> On Fri, 2021-09-10 at 15:00 -0600, Jonathan Corbet wrote:
> >  - Are there internal kernel interfaces, such as DMA-BUF or P2PDMA,
> > that are only available to drivers with a free user-space
> > implementation? Do we need an EXPORT_SYMBOL_USERSPACE_GPL()?
> 
> I don't think reasonably we can do this.  The kernel GPLv2 licence
> includes this system exception:
> 
>       NOTE! This copyright does *not* cover user programs that use
>    kernel services by normal system calls - this is merely considered
>    normal use of the kernel, and does *not* fall under the heading of
>    "derived work". Also note that the GPL below is copyrighted by the
>    Free Software Foundation, but the instance of code that it refers to
>    (the Linux kernel) is copyrighted by me and others who actually
>    wrote it.
> 
>     Also note that the only valid version of the GPL as far as the
>    kernel is concerned is _this_ particular version of the license (ie
>    v2, not v2.2 or v3.x or whatever), unless explicitly otherwise
>    stated.
> 
> This means currently that once an API is exposed to user space, we've
> given up control of the type of programme (proprietary or open source)
> that may use it.
> 
> It might be possible legally to try and take back that control by
> modifying the system exception (what is a "normal" system call), but I
> personally think that would be unwise and create a raft of other
> problems for other proprietary user space code running on Linux, which
> I really think we don't want to do.
> 
> I think our only recourse for user space accelerators is not to export
> the interface if we think it would only be used for evil purposes.
> 

I think the question is not whether we want to forbid proprietary user
space using an API but whether we want to merge said API, so the license
on the kernel doesn't matter much.

-- 
Alexandre Belloni, co-owner and COO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-10 21:59   ` Alexandre Belloni
@ 2021-09-10 22:35     ` James Bottomley
  2021-09-11 14:51       ` Jonathan Corbet
  0 siblings, 1 reply; 77+ messages in thread
From: James Bottomley @ 2021-09-10 22:35 UTC (permalink / raw)
  To: Alexandre Belloni; +Cc: Jonathan Corbet, ksummit

On Fri, 2021-09-10 at 23:59 +0200, Alexandre Belloni wrote:
> On 10/09/2021 14:51:43-0700, James Bottomley wrote:
> > On Fri, 2021-09-10 at 15:00 -0600, Jonathan Corbet wrote:
> > >  - Are there internal kernel interfaces, such as DMA-BUF or
> > > P2PDMA, that are only available to drivers with a free user-space
> > > implementation? Do we need an EXPORT_SYMBOL_USERSPACE_GPL()?
> > 
> > I don't think reasonably we can do this.  The kernel GPLv2 licence
> > includes this system exception:
> > 
> > >       NOTE! This copyright does *not* cover user programs that use
> > >    kernel services by normal system calls - this is merely considered
> > >    normal use of the kernel, and does *not* fall under the heading of
> > >    "derived work". Also note that the GPL below is copyrighted by the
> > >    Free Software Foundation, but the instance of code that it refers to
> > >    (the Linux kernel) is copyrighted by me and others who actually
> > >    wrote it.
> > > 
> > >     Also note that the only valid version of the GPL as far as the
> > >    kernel is concerned is _this_ particular version of the license (ie
> > >    v2, not v2.2 or v3.x or whatever), unless explicitly otherwise
> > >    stated.
> > 
> > This means currently that once an API is exposed to user space,
> > we've given up control of the type of programme (proprietary or
> > open source) that may use it.
> > 
> > It might be possible legally to try and take back that control by
> > modifying the system exception (what is a "normal" system call),
> > but I personally think that would be unwise and create a raft of
> > other problems for other proprietary user space code running on
> > Linux, which I really think we don't want to do.
> > 
> > I think our only recourse for user space accelerators is not to
> > export the interface if we think it would only be used for evil
> > purposes.
> > 
> 
> I think the question is not whether we want to forbid proprietary
> user space using an API but whether we want to merge said API so the
> license on the kernel doesn't matter much.

I thought that *was* the statement I made in the last paragraph: we can
choose whether or not to merge the enabling API into the kernel. 
However, if we merge it we can't choose whether a proprietary user
space takes advantage of the API.  My original reply was to the notion
of EXPORT_SYMBOL_USERSPACE_GPL(), which I think we have no legal basis for
enforcing without modifying the system exception.

James




* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-10 21:00 [MAINTAINER SUMMIT] User-space requirements for accelerator drivers Jonathan Corbet
  2021-09-10 21:32 ` Josh Triplett
  2021-09-10 21:51 ` James Bottomley
@ 2021-09-10 22:52 ` Mauro Carvalho Chehab
  2021-09-10 23:45   ` Josh Triplett
  2021-09-10 23:46   ` Laurent Pinchart
  2021-09-12 19:13 ` Dave Airlie
  3 siblings, 2 replies; 77+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-10 22:52 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: ksummit

On Fri, 10 Sep 2021 15:00:58 -0600
Jonathan Corbet <corbet@lwn.net> wrote:

> There has been a regular disagreement in recent years about whether
> drivers for accelerators (such as for the Habana Gaudi device) should be
> subject to the same requirements as GPU drivers when it comes to the
> availability of a free implementation of the user-space side.  It flared
> up again recently:
> 
>    https://lwn.net/Articles/867168/
> 
> Happily, the Habana situation in particular seems to be resolving
> itself:
> 
>    https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/
> 
> But even there it is clear that the fundamental question has not yet
> been resolved.
> 
> This seems like the sort of question that the maintainer summit exists
> to address.  Specifically, we could discuss:
> 
>  - Under which circumstances should the kernel community require the
>    existence of freely licensed user-space code that exercises all
>    functionalities of a proposed kernel driver or feature?
> 
>  - Are there internal kernel interfaces, such as DMA-BUF or P2PDMA, that
>    are only available to drivers with a free user-space implementation?
>    Do we need an EXPORT_SYMBOL_USERSPACE_GPL()?
> 
>  - What constitutes an acceptable user-space implementation in cases
>    where these restrictions apply?
> 
> I suspect that more clarity (and fewer arguments) on these questions
> would be welcome both within and beyond the development community.

The media subsystem also has this sort of issue: there are several
drivers there to support hardware accelerators for video encoders and
decoders. In the case of media, devices with such hardware usually have
an Image Signal Processor, where the codec runs on some firmware.

On media, requiring userspace to always be open source would
have been very bad, as it would have prevented several videoconferencing
applications from existing on Linux.

Also, several such hardware codecs exist only on
embedded hardware that already depends on proprietary software
to run.

So, a policy like that would do more harm than good.

What we do, instead, is try to ensure that the userspace API is
fully documented in a way that allows open source software to exist.

This is easier said than done, but we have some compliance tools
that we use to help validate the uAPI implementations.

Thanks,
Mauro


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-10 22:52 ` Mauro Carvalho Chehab
@ 2021-09-10 23:45   ` Josh Triplett
  2021-09-10 23:48     ` Dave Hansen
  2021-09-10 23:55     ` Thomas Gleixner
  2021-09-10 23:46   ` Laurent Pinchart
  1 sibling, 2 replies; 77+ messages in thread
From: Josh Triplett @ 2021-09-10 23:45 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: Jonathan Corbet, ksummit

On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote:
> On media, enforcing userspace to always be open source would
> have been very bad, as it would prevent several videoconferencing 
> software to exist on Linux.

I don't think we should enforce that all userspace users of an interface
be Open Source. I do think we should enforce that *some* userspace user
of an interface be Open Source before we add the interface.


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-10 22:52 ` Mauro Carvalho Chehab
  2021-09-10 23:45   ` Josh Triplett
@ 2021-09-10 23:46   ` Laurent Pinchart
  2021-09-11  0:38     ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 77+ messages in thread
From: Laurent Pinchart @ 2021-09-10 23:46 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: Jonathan Corbet, ksummit

Hi Mauro,

On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote:
> On Fri, 10 Sep 2021 15:00:58 -0600 Jonathan Corbet wrote:
> 
> > There has been a regular disagreement in recent years about whether
> > drivers for accelerators (such as for the Habana Gaudi device) should be
> > subject to the same requirements as GPU drivers when it comes to the
> > availability of a free implementation of the user-space side.  It flared
> > up again recently:
> > 
> >    https://lwn.net/Articles/867168/
> > 
> > Happily, the Habana situation in particular seems to be resolving
> > itself:
> > 
> >    https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/
> > 
> > But even there it is clear that the fundamental question has not yet
> > been resolved.
> > 
> > This seems like the sort of question that the maintainer summit exists
> > to address.  Specifically, we could discuss:
> > 
> >  - Under which circumstances should the kernel community require the
> >    existence of freely licensed user-space code that exercises all
> >    functionalities of a proposed kernel driver or feature?
> > 
> >  - Are there internal kernel interfaces, such as DMA-BUF or P2PDMA, that
> >    are only available to drivers with a free user-space implementation?
> >    Do we need an EXPORT_SYMBOL_USERSPACE_GPL()?
> > 
> >  - What constitutes an acceptable user-space implementation in cases
> >    where these restrictions apply?
> > 
> > I suspect that more clarity (and fewer arguments) on these questions
> > would be welcome both within and beyond the development community.
> 
> The media subsystem also has this sort of issues: there are several
> drivers there to support hardware accelerators for video encoders and 
> decoders. In the case of media, usually devices with such hardware have
> an Image Signal Processor, where the codec runs on some firmware.
> 
> On media, enforcing userspace to always be open source would
> have been very bad, as it would prevent several videoconferencing 
> software to exist on Linux.

Could you elaborate on which software you're thinking of? And maybe
which driver(s) you're thinking about?

> Also, there are several such codec hardware that only exists on 
> embedded hardware that already depends on proprietary software 
> to run.
> 
> So, a policy like that would make more damage than good.

I wonder if there's some sort of misunderstanding. We're not talking
about requiring *all* userspace to be open, but about requiring the
existence of *one* open userspace as an acceptance criterion for merging
drivers.

> What we do, instead, is to try to enforce that the userspace API to
> be fully documented in a way that open source software can exist.
> 
> This is easier said than done, but we have some compliance tools
> that we use, in order to help to validate the uAPI implementations.

I won't comment on the codec side as there are people more knowledgeable
than me in that area, but on the camera side, my analysis of the
situation is different than yours. The vast majority of drivers only use
standard parts of the V4L2 and MC APIs. For those, we do have plenty of
existing open userspace, as well as compliance tools as you mentioned
(some drivers also expose custom controls, but that's a very small API
footprint and they are documented well enough to be usable by any
application).

The possibly problematic case is mostly about ISP drivers. For those,
the userspace API is more complex, with lots of device-specific
elements. The first ISP that received kernel support was the OMAP3 ISP,
and the driver has custom ioctls. Requiring an open userspace may indeed
have delayed the driver from being merged. However, for that particular
device, we had a public datasheet that documented the ISP, which we
could consider as an alternative to the open userspace implementation
(a topic worth discussing I believe). Even if we had considered the
public datasheet to not be enough, I think we would have eventually got
an open userspace anyway (based on my internal knowledge of the Nokia
team working on this project).

More recently, we have two ISP drivers that got merged, for the Rockchip
RK3399 ISP and the Intel IPU3. Those drivers differ from all previous
drivers in the sense that the device is configured through a blob of
parameters passed by userspace to the kernel, written to registers by
the driver in the Rockchip case, and passed to the ISP firmware in the
Intel case. We have for both drivers a header file that describes the
layout of those blobs as C structures, but I can tell from first-hand
experience working on an open userspace implementation that, at least in
the Intel case, that's not enough to use the ISP.

There's also an ISP driver for Raspberry Pi that is currently out of
tree and that we'll try to upstream, and for that one we have an open
userspace already (there's actually no closed userspace, kudos to
Raspberry Pi for doing the right thing, I'd like to see more vendors
following that great lead).

Finally, having spent the last two and a half years working on an open
userspace camera stack (libcamera) that exercises the V4L2 and MC APIs, I
was quite horrified to find out how badly designed some parts of those
APIs are. I'm not just blaming others here, this includes APIs
that I have designed myself. They were tested at the time with test
applications (either extending tools such as v4l2-ctl, or writing
dedicated test tools for the API), but failed me when exercised in
real use cases. In retrospect I shouldn't have been surprised:
developing a test application that exercises the API in the way it was
designed, as opposed to the way a real use case would need it, can only
lead to problems. I think that requiring an open implementation of a
real use case, not just a test tool, would be a very good practice for
new APIs or extensions of existing APIs.

-- 
Regards,

Laurent Pinchart


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-10 23:45   ` Josh Triplett
@ 2021-09-10 23:48     ` Dave Hansen
  2021-09-11  0:13       ` Laurent Pinchart
  2021-09-10 23:55     ` Thomas Gleixner
  1 sibling, 1 reply; 77+ messages in thread
From: Dave Hansen @ 2021-09-10 23:48 UTC (permalink / raw)
  To: Josh Triplett, Mauro Carvalho Chehab; +Cc: Jonathan Corbet, ksummit

On 9/10/21 4:45 PM, Josh Triplett wrote:
> On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote:
>> On media, enforcing userspace to always be open source would
>> have been very bad, as it would prevent several videoconferencing 
>> software to exist on Linux.
> I don't think we should enforce that all userspace users of an interface
> be Open Source. I do think we should enforce that *some* userspace user
> of an interface be Open Source before we add the interface.

Right, if there's *no* open userspace, we can't meaningfully test or
debug the thing.

Maybe we don't need a whole userspace stack for every last interface,
but if folks can't even offer up a selftest, it's not a good sign.


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-10 23:45   ` Josh Triplett
  2021-09-10 23:48     ` Dave Hansen
@ 2021-09-10 23:55     ` Thomas Gleixner
  2021-09-11  0:20       ` Laurent Pinchart
  2021-09-11 10:31       ` Leon Romanovsky
  1 sibling, 2 replies; 77+ messages in thread
From: Thomas Gleixner @ 2021-09-10 23:55 UTC (permalink / raw)
  To: Josh Triplett, Mauro Carvalho Chehab; +Cc: Jonathan Corbet, ksummit

On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote:

> On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote:
>> On media, enforcing userspace to always be open source would
>> have been very bad, as it would prevent several videoconferencing 
>> software to exist on Linux.
>
> I don't think we should enforce that all userspace users of an interface
> be Open Source. I do think we should enforce that *some* userspace user
> of an interface be Open Source before we add the interface.

The real question is whether the interface is documented in a way that
an Open Source implementation is possible. It does not matter whether it
exists at that point in time or not. Even if it exists there is no
guarantee that it is feature complete.

Freely accessible documentation is really the key.

Thanks,

        tglx




* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-10 21:51 ` James Bottomley
  2021-09-10 21:59   ` Alexandre Belloni
@ 2021-09-11  0:08   ` Laurent Pinchart
  1 sibling, 0 replies; 77+ messages in thread
From: Laurent Pinchart @ 2021-09-11  0:08 UTC (permalink / raw)
  To: James Bottomley; +Cc: Jonathan Corbet, ksummit

Hi James,

On Fri, Sep 10, 2021 at 02:51:43PM -0700, James Bottomley wrote:
> On Fri, 2021-09-10 at 15:00 -0600, Jonathan Corbet wrote:
> >  - Are there internal kernel interfaces, such as DMA-BUF or P2PDMA,
> > that are only available to drivers with a free user-space
> > implementation? Do we need an EXPORT_SYMBOL_USERSPACE_GPL()?
> 
> I don't think reasonably we can do this.  The kernel GPLv2 licence
> includes this system exception:
> 
>       NOTE! This copyright does *not* cover user programs that use
>    kernel services by normal system calls - this is merely considered
>    normal use of the kernel, and does *not* fall under the heading of
>    "derived work". Also note that the GPL below is copyrighted by the
>    Free Software Foundation, but the instance of code that it refers to
>    (the Linux kernel) is copyrighted by me and others who actually
>    wrote it.
> 
>     Also note that the only valid version of the GPL as far as the
>    kernel is concerned is _this_ particular version of the license (ie
>    v2, not v2.2 or v3.x or whatever), unless explicitly otherwise
>    stated.
> 
> This means currently that once an API is exposed to user space, we've
> given up control of the type of programme (proprietary or open source)
> that may use it.
> 
> It might be possible legally to try and take back that control by
> modifying the system exception (what is a "normal" system call), but I
> personally think that would be unwise and create a raft of other
> problems for other proprietary user space code running on Linux, which
> I really think we don't want to do.

I overall agree that forbidding APIs from being used by closed-source
userspace is likely a no-go from a license point of view, and that it
would create a dangerous precedent and convey a bad message.

> I think our only recourse for user space accelerators is not to export
> the interface if we think it would only be used for evil purposes.

In my opinion the issue at hand isn't so much that the interface can be
used for evil purposes, but that drivers can reap the benefits of being
included in mainline while ignoring (in good faith or not) the
counterpart of allowing all userspace, open or not, to use the device.
The problematic part is usually not the internal kernel interfaces that
those drivers use, but the fact that they expose vendor-specific API
elements to userspace without documenting them.

One obvious option, *if* we decide that this isn't an acceptable
behaviour, is to refuse merging such drivers. DMA-BUF or P2PDMA are not
in themselves problematic, but in the case that Jon mentioned, they
indicate that the device is expected to inter-operate with other devices
by sharing data either through system memory or with direct DMA between
the devices. This makes the absence of an open userspace more
problematic as it may also affect the ability to use other devices in
the system. It could thus be considered as a criterion to decide which
drivers would require at least one open userspace, should we decide that not
all drivers would.

-- 
Regards,

Laurent Pinchart


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-10 23:48     ` Dave Hansen
@ 2021-09-11  0:13       ` Laurent Pinchart
  0 siblings, 0 replies; 77+ messages in thread
From: Laurent Pinchart @ 2021-09-11  0:13 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit

Hi Dave,

On Fri, Sep 10, 2021 at 04:48:37PM -0700, Dave Hansen wrote:
> On 9/10/21 4:45 PM, Josh Triplett wrote:
> > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote:
> >> On media, enforcing userspace to always be open source would
> >> have been very bad, as it would prevent several videoconferencing 
> >> software to exist on Linux.
> >
> > I don't think we should enforce that all userspace users of an interface
> > be Open Source. I do think we should enforce that *some* userspace user
> > of an interface be Open Source before we add the interface.
> 
> Right, if there's *no* open userspace, we can't meaningfully test or
> debug the thing.
> 
> Maybe we don't need a whole userspace stack for every last interface,
> but if folks can't even offer up a selftest, it's not a good sign.

It really depends on the type of driver and device. For GPUs or camera
ISPs, for instance, a selftest is pointless: a full stack is required to
be able to meaningfully test the driver and use the device as those
expose a very large custom API to userspace (usually in the form of
command buffers that contain device-specific instructions or register
values).

-- 
Regards,

Laurent Pinchart


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-10 23:55     ` Thomas Gleixner
@ 2021-09-11  0:20       ` Laurent Pinchart
  2021-09-11 14:20         ` Steven Rostedt
  2021-09-11 10:31       ` Leon Romanovsky
  1 sibling, 1 reply; 77+ messages in thread
From: Laurent Pinchart @ 2021-09-11  0:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit

Hi Thomas,

On Sat, Sep 11, 2021 at 01:55:16AM +0200, Thomas Gleixner wrote:
> On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote:
> > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote:
> >> On media, enforcing userspace to always be open source would
> >> have been very bad, as it would prevent several videoconferencing 
> >> software to exist on Linux.
> >
> > I don't think we should enforce that all userspace users of an interface
> > be Open Source. I do think we should enforce that *some* userspace user
> > of an interface be Open Source before we add the interface.
> 
> The real question is whether the interface is documented in a way that
> an Open Source implementation is possible. It does not matter whether it
> exists at that point in time or not. Even if it exists there is no
> guarantee that it is feature complete.
> 
> Freely accessible documentation is really the key.

In principle I'd agree, but that assumes such documentation would exist
in the first place, with a sufficient level of quality. In many cases an
open implementation that exercises all device features is a better form
of documentation than what vendors have, even internally. Of course, the
opposite is true as well: having seen too much vendor code for my own
good, I know there is such a thing as a working but unreadable
implementation.

I fully agree with your point about feature completeness by the way,
vendors will always find ways to hide pieces of the API if they really
want to, but I think that would be true of documentation as well.

In the DRM/KMS subsystem, the requirement is to provide an
implementation in a mainstream graphics stack (depending on the device,
that could be Mesa, Xorg, Weston, Android AOSP, ...) *and* get it
approved by the maintainers of that stack. Requiring maintainer approval
is the best way that was found to ensure a sufficient level of quality
in those cases.

-- 
Regards,

Laurent Pinchart


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-10 23:46   ` Laurent Pinchart
@ 2021-09-11  0:38     ` Mauro Carvalho Chehab
  2021-09-11  9:27       ` Laurent Pinchart
  0 siblings, 1 reply; 77+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-11  0:38 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: Jonathan Corbet, ksummit

On Sat, 11 Sep 2021 02:46:42 +0300
Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote:

> Hi Mauro,
> 
> On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote:
> > On Fri, 10 Sep 2021 15:00:58 -0600 Jonathan Corbet wrote:
> >   
> > > There has been a regular disagreement in recent years about whether
> > > drivers for accelerators (such as for the Habana Gaudi device) should be
> > > subject to the same requirements as GPU drivers when it comes to the
> > > availability of a free implementation of the user-space side.  It flared
> > > up again recently:
> > > 
> > >    https://lwn.net/Articles/867168/
> > > 
> > > Happily, the Habana situation in particular seems to be resolving
> > > itself:
> > > 
> > >    https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/
> > > 
> > > But even there it is clear that the fundamental question has not yet
> > > been resolved.
> > > 
> > > This seems like the sort of question that the maintainer summit exists
> > > to address.  Specifically, we could discuss:
> > > 
> > >  - Under which circumstances should the kernel community require the
> > >    existence of freely licensed user-space code that exercises all
> > >    functionalities of a proposed kernel driver or feature?
> > > 
> > >  - Are there internal kernel interfaces, such as DMA-BUF or P2PDMA, that
> > >    are only available to drivers with a free user-space implementation?
> > >    Do we need an EXPORT_SYMBOL_USERSPACE_GPL()?
> > > 
> > >  - What constitutes an acceptable user-space implementation in cases
> > >    where these restrictions apply?
> > > 
> > > I suspect that more clarity (and fewer arguments) on these questions
> > > would be welcome both within and beyond the development community.  
> > 
> > The media subsystem also has this sort of issues: there are several
> > drivers there to support hardware accelerators for video encoders and 
> > decoders. In the case of media, usually devices with such hardware have
> > an Image Signal Processor, where the codec runs on some firmware.
> > 
> > On media, enforcing userspace to always be open source would
> > have been very bad, as it would prevent several videoconferencing 
> > software to exist on Linux.  
> 
> Could you elaborate on which software you're thinking of ? And maybe
> which driver(s) you're thinking about ?

I'm referring to tools like v4l2-compliance, qv4l2 and other tools
we maintain in the v4l-utils tree.

> > Also, there are several such codec hardware that only exists on 
> > embedded hardware that already depends on proprietary software 
> > to run.
> > 
> > So, a policy like that would make more damage than good.  
> 
> I wonder if there's some sort of misunderstanding. We're not talking
> about requiring *all* userspace to be open, but about requiring the
> existence of *one* open userspace as an acceptance criteria for merging
> drivers.

Something like EXPORT_SYMBOL_USERSPACE_GPL() implies that any
userspace app using such symbols would be GPL'd.

> 
> > What we do, instead, is to try to enforce that the userspace API to
> > be fully documented in a way that open source software can exist.
> > 
> > This is easier said than done, but we have some compliance tools
> > that we use, in order to help to validate the uAPI implementations.  
> 
> I won't comment on the codec side as there are people more knowledgeable
> than me in that area, but on the camera side, my analysis of the
> situation is different than yours. The vast majority of drivers only use
> standard parts of the V4L2 and MC APIs. For those, we do have plenty of
> existing open userspace, as well as compliance tools as you mentioned
> (some drivers also expose custom controls, but that's a very small API
> footprint and they are documented well enough to be usable by any
> application).

Yes, that's my view too. We used to have problems in the past with
some proprietary fourccs, but I guess the problematic ones were
either removed (because there was no upstream driver) or documented.

> The possibly problematic case is mostly about ISP drivers. For those,
> the userspace API is more complex, with lots of device-specific
> elements. The first ISP that received kernel support was the OMAP3 ISP,
> and the driver has custom ioctls. Requiring an open userspace may indeed
> have delayed the driver from being merged. However, for that particular
> device, we had a public datasheet that documented the ISP,

Yes, but afterwards, other ISP drivers got added. I don't think they
all have public datasheets. 

> which we
> could consider as an alternative to the open userspace implementation
> (a topic worth discussing I believe).

Yeah, a public datasheet sounds like an interesting requirement. It poses
a problem, though: some details could be missing from it, which
would prevent any real open source userspace development.

> Even if we had considered the
> public datasheet to not be enough, I think we would have eventually got
> an open userspace anyway (based on my internal knowledge of the Nokia
> team working on this project).

Yes.

> More recently, we have two ISP drivers that got merged, for the Rockchip
> RK3399 ISP and the Intel IPU3. Those drivers differ from all previous
> drivers in the sense that the device is configured through a blob of
> parameters passed by userspace to the kernel, written to registers by
> the driver in the Rockchip case, and passed to the ISP firmware in the
> Intel case. We have for both drivers a header file that describes the
> layout of those blobs as C structures, but I can tell with first hand
> experience working on an open userspace implementation that at least in
> the Intel case that's not enough to use the ISP.

Yeah, I was afraid that this would end up happening some day. Not too much
harm, though, as IPU3 is under staging. We should require it to be
supported in libcamera or to have some other open source application
before moving it out of staging.

> There's also an ISP driver for Raspberry Pi that is currently out of
> tree and that we'll try to upstream, and for that one we have an open
> userspace already (there's actually no closed userspace, kudos to
> Raspberry Pi for doing the right thing, I'd like to see more vendors
> following that great lead).
> 
> Finally, having spent the last 2 years and a half working on an open
> userspace camera stack (libcamera) that exercises the V4L2 and MC APIs, I
> was quite horrified to find out how some parts of those APIs are pretty
> badly designed. I'm not just blaming others here, this includes APIs
> that I have designed myself. They have been tested at the time with test
> applications (either extending tools such as v4l2-ctl, or writing
> dedicated test tools for the API), but failed me when exercised in
> real use cases. In retrospect I shouldn't have been surprised,
> developing a test application that exercises the API in the way it was
> designed as opposed to the way a real use case would need it can only
> lead to problems. I think that requiring an open implementation of a
> real use case, not just a test tool, would be a very good practice for
> new APIs or extensions of existing APIs.

I remember that, during OMAP3 development, I asked several times
for a userspace app/lib before merging it upstream (somewhat
similar to libcamera's goals).

At that time, we didn't have staging yet. So, when Nokia ended
MeeGo development, I opted to merge what we had so far to support
OMAP3 even without having an open source counterpart, as there
was already some public documentation that seemed sufficient to help
someone write userspace tools in the future.

Thanks,
Mauro


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-11  0:38     ` Mauro Carvalho Chehab
@ 2021-09-11  9:27       ` Laurent Pinchart
  2021-09-11 22:33         ` Mauro Carvalho Chehab
  2021-09-13 12:04         ` Mark Brown
  0 siblings, 2 replies; 77+ messages in thread
From: Laurent Pinchart @ 2021-09-11  9:27 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: Jonathan Corbet, ksummit

Hi Mauro,

On Sat, Sep 11, 2021 at 02:38:11AM +0200, Mauro Carvalho Chehab wrote:
> On Sat, 11 Sep 2021 02:46:42 +0300 Laurent Pinchart wrote:
> > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote:
> > > On Fri, 10 Sep 2021 15:00:58 -0600 Jonathan Corbet wrote:
> > >   
> > > > There has been a regular disagreement in recent years about whether
> > > > drivers for accelerators (such as for the Habana Gaudi device) should be
> > > > subject to the same requirements as GPU drivers when it comes to the
> > > > availability of a free implementation of the user-space side.  It flared
> > > > up again recently:
> > > > 
> > > >    https://lwn.net/Articles/867168/
> > > > 
> > > > Happily, the Habana situation in particular seems to be resolving
> > > > itself:
> > > > 
> > > >    https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/
> > > > 
> > > > But even there it is clear that the fundamental question has not yet
> > > > been resolved.
> > > > 
> > > > This seems like the sort of question that the maintainer summit exists
> > > > to address.  Specifically, we could discuss:
> > > > 
> > > >  - Under which circumstances should the kernel community require the
> > > >    existence of freely licensed user-space code that exercises all
> > > >    functionalities of a proposed kernel driver or feature?
> > > > 
> > > >  - Are there internal kernel interfaces, such as DMA-BUF or P2PDMA, that
> > > >    are only available to drivers with a free user-space implementation?
> > > >    Do we need an EXPORT_SYMBOL_USERSPACE_GPL()?
> > > > 
> > > >  - What constitutes an acceptable user-space implementation in cases
> > > >    where these restrictions apply?
> > > > 
> > > > I suspect that more clarity (and fewer arguments) on these questions
> > > > would be welcome both within and beyond the development community.  
> > > 
> > > The media subsystem also has this sort of issues: there are several
> > > drivers there to support hardware accelerators for video encoders and 
> > > decoders. In the case of media, usually devices with such hardware have
> > > an Image Signal Processor, where the codec runs on some firmware.
> > > 
> > > On media, enforcing userspace to always be open source would
> > > have been very bad, as it would prevent several videoconferencing 
> > > software to exist on Linux.  
> > 
> > Could you elaborate on which software you're thinking of ? And maybe
> > which driver(s) you're thinking about ?
> 
> I'm referring to tools like v4l2-compliance, qv4l2 and other tools
> we maintain at v4l-utils tree.

I meant the video conferencing software that would have been prevented
from existing. I'd like to understand if you think that requiring *one*
open userspace would be problematic.

> > > Also, there are several such codec hardware that only exists on 
> > > embedded hardware that already depends on proprietary software 
> > > to run.
> > > 
> > > So, a policy like that would make more damage than good.  
> > 
> > I wonder if there's some sort of misunderstanding. We're not talking
> > about requiring *all* userspace to be open, but about requiring the
> > existence of *one* open userspace as an acceptance criteria for merging
> > drivers.
> 
> Something like EXPORT_SYMBOL_USERSPACE_GPL() implies that any
> userspace app using such symbols would be GPL'd.

I think EXPORT_SYMBOL_USERSPACE_GPL() has already been deemed not to be
the right option based on the discussions in this e-mail thread. The
requirement of having *one* open userspace is still being discussed, and
is orthogonal to EXPORT_SYMBOL_USERSPACE_GPL() I believe.

> > > What we do, instead, is to try to enforce that the userspace API to
> > > be fully documented in a way that open source software can exist.
> > > 
> > > This is easier said than done, but we have some compliance tools
> > > that we use, in order to help to validate the uAPI implementations.  
> > 
> > I won't comment on the codec side as there are people more knowledgeable
> > than me in that area, but on the camera side, my analysis of the
> > situation is different than yours. The vast majority of drivers only use
> > standard parts of the V4L2 and MC APIs. For those, we do have plenty of
> > existing open userspace, as well as compliance tools as you mentioned
> > (some drivers also expose custom controls, but that's a very small API
> > footprint and they are documented well enough to be usable by any
> > application).
> 
> Yes, that's my view too. We used to have problems in the past with
> some proprietary fourccs, but I guess the problematic ones were
> either removed (because there was no upstream driver) or documented.
> 
> > The possibly problematic case is mostly about ISP drivers. For those,
> > the userspace API is more complex, with lots of device-specific
> > elements. The first ISP that received kernel support was the OMAP3 ISP,
> > and the driver has custom ioctls. Requiring an open userspace may indeed
> > have delayed the driver from being merged. However, for that particular
> > device, we had a public datasheet that documented the ISP,
> 
> Yes, but afterwards, other ISP drivers got added. I don't think they
> all have public datasheets. 

Sure. That's addressed below :-)

> > which we
> > could consider as an alternative to the open userspace implementation
> > (a topic worth discussing I believe).
> 
> Yeah, a public datasheet sounds an interesting requirement. It offers
> a problem, though: maybe some details could be missed on it, which
> would prevent any real open source userspace development.

Absolutely, and I don't think we can come up with any process or
technical measure that would prevent a vendor from cheating if they
really want to. It will always be possible to hide some features behind
reserved registers that wouldn't need to be programmed for basic
operation but that would be crucial to optimize quality or
performance. This is regardless of whether we want to enforce openness
of documentation in the form of datasheets or source code.

I'm not too concerned about this though. If we can address most of this
issue with a clear process and message I think it would be a very good
step forward already.

> > Even if we had considered the
> > public datasheet to not be enough, I think we would have eventually got
> > an open userspace anyway (based on my internal knowledge of the Nokia
> > team working on this project).
> 
> Yes.
> 
> > More recently, we have two ISP drivers that got merged, for the Rockchip
> > RK3399 ISP and the Intel IPU3. Those drivers differ from all previous
> > drivers in the sense that the device is configured through a blob of
> > parameters passed by userspace to the kernel, written to registers by
> > the driver in the Rockchip case, and passed to the ISP firmware in the
> > Intel case. We have for both drivers a header file that describes the
> > layout of those blobs as C structures, but I can tell with first hand
> > experience working on an open userspace implementation that at least in
> > the Intel case that's not enough to use the ISP.
> 
> Yeah, I was afraid that this would end happening some day. Not too big
> harm, though, as IPU3 is under staging. We should enforce it to be
> supported at libcamera or to have some other open source application
> before moving it out of staging.

The same may be true of the rkisp1 driver, we haven't moved forward
enough with its support in libcamera yet to tell for sure.

> > There's also an ISP driver for Raspberry Pi that is currently out of
> > tree and that we'll try to upstream, and for that one we have an open
> > userspace already (there's actually no closed userspace, kudos to
> > Raspberry Pi for doing the right thing, I'd like to see more vendors
> > following that great lead).
> > 
> > Finally, having spent the last 2 years and a half working on an open
> > userspace camera stack (libcamera) that exercises the V4L2 and MC APIs, I
> > was quite horrified to find out how some parts of those APIs are pretty
> > badly designed. I'm not just blaming others here, this includes APIs
> > that I have designed myself. They have been tested at the time with test
> > applications (either extending tools such as v4l2-ctl, or writing
> > dedicated test tools for the API), but failed me when exercised in
> > real use cases. In retrospect I shouldn't have been surprised,
> > developing a test application that exercises the API in the way it was
> > designed as opposed to the way a real use case would need it can only
> > lead to problems. I think that requiring an open implementation of a
> > real use case, not just a test tool, would be a very good practice for
> > new APIs or extensions of existing APIs.
> 
> I remember that, during OMAP3 development, I required several times
> to have an userspace app/lib before merging it upstream (somewhat
> similar to libcamera goals). 
> 
> On that time, we didn't have staging yet. So, when Nokia ended with
> MeeGo development, I opted to merge what we had so far to support
> OMAP3 even without having an open source counterpart, as there
> were already some public documentation that seemed to help someone
> to write userspace tools in the future.

And there was also https://git.ideasonboard.org/omap3-isp-live.git that
was published shortly after.

-- 
Regards,

Laurent Pinchart


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-10 23:55     ` Thomas Gleixner
  2021-09-11  0:20       ` Laurent Pinchart
@ 2021-09-11 10:31       ` Leon Romanovsky
  2021-09-11 11:41         ` Laurent Pinchart
  1 sibling, 1 reply; 77+ messages in thread
From: Leon Romanovsky @ 2021-09-11 10:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit

On Sat, Sep 11, 2021 at 01:55:16AM +0200, Thomas Gleixner wrote:
> On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote:
> 
> > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote:
> >> On media, enforcing userspace to always be open source would
> >> have been very bad, as it would prevent several videoconferencing 
> >> software to exist on Linux.
> >
> > I don't think we should enforce that all userspace users of an interface
> > be Open Source. I do think we should enforce that *some* userspace user
> > of an interface be Open Source before we add the interface.
> 
> The real question is whether the interface is documented in a way that
> an Open Source implementation is possible. It does not matter whether it
> exists at that point in time or not. Even if it exists there is no
> guarantee that it is feature complete.
> 
> Freely accessible documentation is really the key.

I have a more radical view than you and think that documentation is far
from being enough. I would like to see any userspace API used (or to be
used) in any package which exists in Debian/Fedora/SuSE.

Only this will give us some sort of confidence that the API and device are
usable to some level. As a side note, we will be able to estimate possible API
deprecation/fix/extension based on a simple search in package databases.
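
As a rough illustration of the "simple search" idea, the sketch below
counts, per package, how many source files reference a given API
identifier. It assumes a directory of already unpacked package sources,
one sub-directory per package; the function name count_api_users and the
directory layout are hypothetical, and a real survey would more likely
rely on a distribution's own code search service.

```python
import os
import re

# File extensions treated as C/C++ sources for this sketch (an assumption).
SOURCE_SUFFIXES = (".c", ".h", ".cc", ".cpp")


def count_api_users(source_root, identifier):
    """Map each package directory under source_root to the number of
    source files that mention identifier as a whole word."""
    pattern = re.compile(r"\b" + re.escape(identifier) + r"\b")
    users = {}
    for package in sorted(os.listdir(source_root)):
        package_dir = os.path.join(source_root, package)
        if not os.path.isdir(package_dir):
            continue
        hits = 0
        for dirpath, _dirnames, filenames in os.walk(package_dir):
            for name in filenames:
                if not name.endswith(SOURCE_SUFFIXES):
                    continue
                path = os.path.join(dirpath, name)
                try:
                    with open(path, errors="replace") as source:
                        if pattern.search(source.read()):
                            hits += 1
                except OSError:
                    # Unreadable file: ignore it rather than abort the scan.
                    continue
        if hits:
            users[package] = hits
    return users
```

Calling count_api_users("/path/to/unpacked/sources", "VIDIOC_QUERYCAP")
would return a dictionary of package names and hit counts; an empty
result suggests that no packaged user of the interface exists in that
tree.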

IMHO, github projects to show API usage are the worst possible way to
allow acceptance for new userspace API.

Thanks

> 
> Thanks,
> 
>         tglx
> 
> 
> 


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-11 10:31       ` Leon Romanovsky
@ 2021-09-11 11:41         ` Laurent Pinchart
  2021-09-11 12:04           ` Leon Romanovsky
  0 siblings, 1 reply; 77+ messages in thread
From: Laurent Pinchart @ 2021-09-11 11:41 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab,
	Jonathan Corbet, ksummit

On Sat, Sep 11, 2021 at 01:31:02PM +0300, Leon Romanovsky wrote:
> On Sat, Sep 11, 2021 at 01:55:16AM +0200, Thomas Gleixner wrote:
> > On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote:
> > 
> > > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote:
> > >> On media, enforcing userspace to always be open source would
> > >> have been very bad, as it would prevent several videoconferencing 
> > >> software to exist on Linux.
> > >
> > > I don't think we should enforce that all userspace users of an interface
> > > be Open Source. I do think we should enforce that *some* userspace user
> > > of an interface be Open Source before we add the interface.
> > 
> > The real question is whether the interface is documented in a way that
> > an Open Source implementation is possible. It does not matter whether it
> > exists at that point in time or not. Even if it exists there is no
> > guarantee that it is feature complete.
> > 
> > Freely accessible documentation is really the key.
> 
> I have more radical view than you and think that documentation is far
> from being enough. I would like to see any userspace API used (or to be
> used) in any package which exists in Debian/Fedora/SuSE.

We probably need to add Android AOSP to that list, as we have
Android-specific APIs (not that I believe we *should* have
Android-specific APIs, there's been lots of effort over the past years
to develop standard APIs for use cases that stem from Android, slowly
replacing Android-specific APIs in some areas, but I don't believe we can
realistically bridge that gap completely overnight, if ever).

> Only this will give us some sort of confidence that API and device are usable
> to some level. As a side note, we will be able to estimate possible API
> deprecation/fix/extension based on simple search in package databases.

Linux supports devices from very diverse markets, from very tiny
embedded devices to supercomputers. We have drivers for devices that
exist in data centres of a single company only, or for which only a
handful of units exist throughout the world. The set of rules that we'll
decide on, if any, should take this into account.

> IMHO, github projects to show API usage are the worst possible way to
> allow acceptance for new userspace API.

-- 
Regards,

Laurent Pinchart


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-11 11:41         ` Laurent Pinchart
@ 2021-09-11 12:04           ` Leon Romanovsky
  2021-09-11 22:04             ` Laurent Pinchart
  0 siblings, 1 reply; 77+ messages in thread
From: Leon Romanovsky @ 2021-09-11 12:04 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab,
	Jonathan Corbet, ksummit

On Sat, Sep 11, 2021 at 02:41:52PM +0300, Laurent Pinchart wrote:
> On Sat, Sep 11, 2021 at 01:31:02PM +0300, Leon Romanovsky wrote:
> > On Sat, Sep 11, 2021 at 01:55:16AM +0200, Thomas Gleixner wrote:
> > > On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote:
> > > 
> > > > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote:
> > > >> On media, enforcing userspace to always be open source would
> > > >> have been very bad, as it would prevent several videoconferencing 
> > > >> software to exist on Linux.
> > > >
> > > > I don't think we should enforce that all userspace users of an interface
> > > > be Open Source. I do think we should enforce that *some* userspace user
> > > > of an interface be Open Source before we add the interface.
> > > 
> > > The real question is whether the interface is documented in a way that
> > > an Open Source implementation is possible. It does not matter whether it
> > > exists at that point in time or not. Even if it exists there is no
> > > guarantee that it is feature complete.
> > > 
> > > Freely accessible documentation is really the key.
> > 
> > I have more radical view than you and think that documentation is far
> > from being enough. I would like to see any userspace API used (or to be
> > used) in any package which exists in Debian/Fedora/SuSE.
> 
> We probably need to add Android AOSP to that list, as we have
> Android-specific APIs (not that I believe we *should* have
> Android-specific APIs, there's been lots of efforts over the past years
> to develop standard APIs for use cases that stem from Android, slowly
> replacing Android-specific APIs in some area, but I don't believe we can
> realistically bridge that gap completely overnight, if ever).

Maybe.

> 
> > Only this will give us some sort of confidence that API and device are usable
> > to some level. As a side note, we will be able to estimate possible API
> > deprecation/fix/extension based on simple search in package databases.
> 
> Linux supports devices from very diverse markets, from very tiny
> embedded devices to supercomputers. We have drivers for devices that
> exist in data centres of a single company only, or for which only a
> handful of units exist throughout the world. The set of rules that we'll
> decide on, if any, should take this into account.

I'm part of the group (RDMA) that cares about enterprise, cloud and
supercomputers. :) So for us, working out of the box (distro packages and
not github code drops) is the key to scalability.

Regarding "embedded devices", let me remind you that we are talking about
userspace APIs, and most likely busybox will be used for them, which is
also part of the larger distros anyway, so it falls under the category
"exists in Debian/Fedora/SuSE".

> 
> > IMHO, github projects to show API usage are the worst possible way to
> > allow acceptance for new userspace API.
> 
> -- 
> Regards,
> 
> Laurent Pinchart


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-11  0:20       ` Laurent Pinchart
@ 2021-09-11 14:20         ` Steven Rostedt
  2021-09-11 22:08           ` Laurent Pinchart
                             ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: Steven Rostedt @ 2021-09-11 14:20 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab,
	Jonathan Corbet, ksummit

On Sat, 11 Sep 2021 03:20:50 +0300
Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote:

> > Freely accessible documentation is really the key.  
> 
> In principle I'd agree, but that assumes such documentation would exist
> in the first place, with a sufficient level of quality. In many cases an
> open implementation that exercises all device features is a better form
> of documentation than what vendors have, even internally. Of course, the
> opposite is true as well, having seen too much vendor code for my own
> good, there is such a thing as a working but unreadable implementation.
> 
> I fully agree with your point about feature completeness by the way,
> vendors will always find ways to hide pieces of the API if they really
> want to, but I think that would be true of documentation as well.

I would like not only documentation, but also an open source test suite
that simply tests the interface. Honestly, I believe that all new
interfaces to the kernel (open or not) should have full documentation
and a test suite before they get accepted. We have
tools/testing/selftests, which should be updated with all new interfaces
into the kernel.

Even if it's just a smoke test, that would be fine. Obviously if
there's a driver without hardware, it can't be tested. But if you have
that hardware, perhaps there could be a simple test suite of the
interface to let you know it is still functional.
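
As a minimal sketch of what such a smoke test could look like, the
example below opens a device node, issues one query ioctl and reports
SKIP when the hardware is absent. The choice of a V4L2 node
(/dev/video0) and of VIDIOC_QUERYCAP is only an assumption made for
illustration, as is the smoke_test() helper name; an in-tree selftest
would normally be written in C against the kselftest helpers.

```python
import fcntl
import os

# VIDIOC_QUERYCAP is _IOR('V', 0, struct v4l2_capability). The structure is
# 104 bytes, which gives 0x80685600 on common architectures such as x86 and
# arm64 (assumed here; some architectures encode ioctl numbers differently).
VIDIOC_QUERYCAP = 0x80685600
V4L2_CAPABILITY_SIZE = 104


def smoke_test(path="/dev/video0"):
    """Return "SKIP", "PASS" or "FAIL" for a basic QUERYCAP round trip."""
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError:
        # No hardware (or no permission): nothing to test on this machine.
        return "SKIP"
    try:
        buf = bytearray(V4L2_CAPABILITY_SIZE)
        # The kernel fills the mutable buffer in place on success.
        fcntl.ioctl(fd, VIDIOC_QUERYCAP, buf)
    except OSError:
        return "FAIL"
    finally:
        os.close(fd)
    # The first 16 bytes hold the NUL-padded driver name; it must not be empty.
    driver = bytes(buf[:16]).rstrip(b"\0")
    return "PASS" if driver else "FAIL"


if __name__ == "__main__":
    print(smoke_test())
```

Reporting SKIP rather than FAIL when the device cannot be opened keeps
the test usable on machines without the hardware, which matches the
point above that a driver without hardware can't be tested.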

-- Steve


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-10 22:35     ` James Bottomley
@ 2021-09-11 14:51       ` Jonathan Corbet
  2021-09-11 15:24         ` James Bottomley
  0 siblings, 1 reply; 77+ messages in thread
From: Jonathan Corbet @ 2021-09-11 14:51 UTC (permalink / raw)
  To: James Bottomley, Alexandre Belloni; +Cc: ksummit

James Bottomley <James.Bottomley@HansenPartnership.com> writes:

> On Fri, 2021-09-10 at 23:59 +0200, Alexandre Belloni wrote:
>> I think the question is not whether we want to forbid proprietary
>> user space using an API but whether we want to merge said API so the
>> license on the kernel doesn't matter much.
>
> I thought that *was* the statement I made in the last paragraph: we can
> choose whether or not to merge the enabling API into the kernel. 
> However, if we merge it we can't choose whether a proprietary user
> space takes advantage of the API.  My original reply was to the notion
> of EXPORT_USERSPACE_GPL, which I think we have no legal basis for
> enforcing without modifying the system exception.

That wasn't my thinking when I pulled the idea of EXPORT_USERSPACE_GPL out
of whatever dark place it was lurking in.  The idea was, instead, to
document that if your driver is using that interface, it won't be
considered for merging into the kernel in the absence of a working,
free, user-space implementation -- should we choose to adopt such a
policy, of course.

Nobody is trying to prohibit proprietary user space, that's not the
point.

jon


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-11 14:51       ` Jonathan Corbet
@ 2021-09-11 15:24         ` James Bottomley
  2021-09-11 21:52           ` Laurent Pinchart
  0 siblings, 1 reply; 77+ messages in thread
From: James Bottomley @ 2021-09-11 15:24 UTC (permalink / raw)
  To: Jonathan Corbet, Alexandre Belloni; +Cc: ksummit

On Sat, 2021-09-11 at 08:51 -0600, Jonathan Corbet wrote:
> James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> 
> > On Fri, 2021-09-10 at 23:59 +0200, Alexandre Belloni wrote:
> > > I think the question is not whether we want to forbid proprietary
> > > user space using an API but whether we want to merge said API so
> > > the license on the kernel doesn't matter much.
> > 
> > I thought that *was* the statement I made in the last paragraph: we
> > can choose whether or not to merge the enabling API into the
> > kernel. However, if we merge it we can't choose whether a
> > proprietary user space takes advantage of the API.  My original
> > reply was to the notion of EXPORT_USERSPACE_GPL, which I think we
> > have no legal basis for enforcing without modifying the system
> > exception.
> 
> That wasn't my thinking when I pulled the idea of EXPORT_USERSPACE_GPL
> out of whatever dark place it was lurking in.

OK, but you can see how that thought is arrived at since
EXPORT_SYMBOL_GPL is a technically enforced licensing permission tag. 
However, I was seriously pushing back against the *idea* of such a tag
because once it crosses the kernel to user boundary it would cause huge
confusion of our current licensing positions ... regardless of what it
actually means.

>   The idea was, instead, to document that if your driver is using
> that interface, it won't be considered for merging into the kernel in
> the absence of a working, free, user-space implementation -- should
> we choose to adopt such a policy, of course.

Right, and if you have a driver with an internal API that's used for
communication with a userspace blob, we can evaluate that, as we have
done before, on a case by case basis.  It's not a new thing, because
we're both old enough to remember "my wireless driver has to have a
proprietary component for regulatory reasons".

We've made decisions both for and against such drivers in the past, but
I think the issues are too nuanced to make a general rule.  If you do
have a general rule, what other things, like firmware, would get caught
by it and so on ...

> Nobody is trying to prohibit proprietary user space, that's not the
> point.

I didn't think you were in general, but requiring a free userspace
driver implementation is prohibiting a proprietary one and so then you
get into questions of how wide the reach is and what the knock on
effects are if you try to craft a general policy around this ...
especially if it has technical enforcement measures.

James



^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-11 15:24         ` James Bottomley
@ 2021-09-11 21:52           ` Laurent Pinchart
  2021-09-14 13:22             ` Johannes Berg
  0 siblings, 1 reply; 77+ messages in thread
From: Laurent Pinchart @ 2021-09-11 21:52 UTC (permalink / raw)
  To: James Bottomley; +Cc: Jonathan Corbet, Alexandre Belloni, ksummit

Hi James,

On Sat, Sep 11, 2021 at 08:24:38AM -0700, James Bottomley wrote:
> On Sat, 2021-09-11 at 08:51 -0600, Jonathan Corbet wrote:
> > James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> > 
> > > On Fri, 2021-09-10 at 23:59 +0200, Alexandre Belloni wrote:
> > > > I think the question is not whether we want to forbid proprietary
> > > > user space using an API but whether we want to merge said API so
> > > > the license on the kernel doesn't matter much.
> > > 
> > > I thought that *was* the statement I made in the last paragraph: we
> > > can choose whether or not to merge the enabling API into the
> > > kernel. However, if we merge it we can't choose whether a
> > > proprietary user space takes advantage of the API.  My original
> > > reply was to the notion of EXPORT_USERSPACE_GPL, which I think we
> > > have no legal basis for enforcing without modifying the system
> > > exception.
> > 
> > That wasn't my thinking when I pulled the idea of EXPORT_USERSPACE_GPL
> > out of whatever dark place it was lurking in.
> 
> OK, but you can see how that thought is arrived at since
> EXPORT_SYMBOL_GPL is a technically enforced licensing permission tag. 
> However, I was seriously pushing back against the *idea* of such a tag
> because once it crosses the kernel to user boundary it would cause huge
> confusion of our current licensing positions ... regardless of what it
> actually means.
> 
> >   The idea was, instead, to document that if your driver is using
> > that interface, it won't be considered for merging into the kernel in
> > the absence of a working, free, user-space implementation -- should
> > we choose to adopt such a policy, of course.
> 
> Right, and if you have a driver with an internal API that's used for
> communication with a userspace blob, we can evaluate that, as we have
> done before, on a case by case basis.  It's not a new thing, because
> we're both old enough to remember "my wireless driver has to have a
> proprietary component for regulatory reasons".
> 
> We've made decisions both for and against such drivers in the past, but
> I think the issues are too nuanced to make a general rule.  If you do
> have a general rule, what other things, like firmware, would get caught
> by it and so on ...
> 
> > Nobody is trying to prohibit proprietary user space, that's not the
> > point.
> 
> I didn't think you were in general, but requiring a free userspace
> driver implementation is prohibiting a proprietary one and so then you
> get into questions of how wide the reach is and what the knock on
> effects are if you try to craft a general policy around this ...
> especially if it has technical enforcement measures.

Requiring the existence of one open userspace implementation doesn't
preclude the ability for vendors to develop and ship closed
implementations in parallel, at least in the general case. For instance,
with GPUs or cameras, an open implementation could be developed (in Mesa
and libcamera respectively) to exercise the device features (such as the
GPU shader instruction set, or the camera ISP processing parameters),
but wouldn't be required to include all the tuning and optimizations
that a closed implementation would typically have. A vendor could thus
ship a closed-source shader compiler in its OpenGL/Vulkan userspace
driver, protecting the R&D investment to implement those optimizations
(this would most likely also include lots of hacks to please
benchmarks), and the community would be able to use the GPU and improve
the open implementation.

For GPUs, the situation has been quite clear for years: if a vendor
wants an upstream driver, with all the benefits this brings, they have
to also provide a (mostly?) feature-complete (in the sense of hardware
features) but not necessarily optimized open-source counterpart. We're
exploring here whether or not the same deal should cover camera ISPs and
ML/AI accelerators (and possibly other devices that I'm less familiar
with).

For a wireless driver the situation is possibly different. I suppose
that if the closed-source userspace blob is there only for regulatory
reasons, then there would be very little point in having an open
implementation in parallel with the closed-source one.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-11 12:04           ` Leon Romanovsky
@ 2021-09-11 22:04             ` Laurent Pinchart
  2021-09-12  4:27               ` Leon Romanovsky
  0 siblings, 1 reply; 77+ messages in thread
From: Laurent Pinchart @ 2021-09-11 22:04 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab,
	Jonathan Corbet, ksummit

Hi Leon,

On Sat, Sep 11, 2021 at 03:04:07PM +0300, Leon Romanovsky wrote:
> On Sat, Sep 11, 2021 at 02:41:52PM +0300, Laurent Pinchart wrote:
> > On Sat, Sep 11, 2021 at 01:31:02PM +0300, Leon Romanovsky wrote:
> > > On Sat, Sep 11, 2021 at 01:55:16AM +0200, Thomas Gleixner wrote:
> > > > On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote:
> > > > 
> > > > > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote:
> > > > >> On media, enforcing userspace to always be open source would
> > > > >> have been very bad, as it would prevent several videoconferencing 
> > > > >> software to exist on Linux.
> > > > >
> > > > > I don't think we should enforce that all userspace users of an interface
> > > > > be Open Source. I do think we should enforce that *some* userspace user
> > > > > of an interface be Open Source before we add the interface.
> > > > 
> > > > The real question is whether the interface is documented in a way that
> > > > an Open Source implementation is possible. It does not matter whether it
> > > > exists at that point in time or not. Even if it exists there is no
> > > > guarantee that it is feature complete.
> > > > 
> > > > Freely accessible documentation is really the key.
> > > 
> > > I have a more radical view than you and think that documentation is far
> > > from being enough. I would like to see any userspace API used (or to be
> > > used) in any package which exists in Debian/Fedora/SuSE.
> > 
> > We probably need to add Android AOSP to that list, as we have
> > Android-specific APIs (not that I believe we *should* have
> > Android-specific APIs, there's been lots of efforts over the past years
> > to develop standard APIs for use cases that stem from Android, slowly
> > replacing Android-specific APIs in some areas, but I don't believe we can
> > realistically bridge that gap completely overnight, if ever).
> 
> Maybe.
> 
> > > Only this will give us some sort of confidence that API and device are usable
> > > to some level. As a side note, we will be able to estimate possible API
> > > deprecation/fix/extension based on simple search in package databases.
> > 
> > Linux supports devices from very diverse markets, from very tiny
> > embedded devices to supercomputers. We have drivers for devices that
> > exist in data centres of a single company only, or for which only a
> > handful of units exist through the world. The set of rules that we'll
> > decide on, if any, should take this into account.
> 
> I'm part of that group (RDMA) who cares about enterprise, cloud and supercomputers. :)
> So for us, working out-of-the box (distro packages and not github code drops) is
> the key to the scalability.

What if we're dealing with a device that only exists in a handful of
machines though ? Would distributions accept the burden of packaging
corresponding userspace code, and maintaining the packages, when only a
handful of people in the world will use it ? It's a genuine question.

> Regarding "embedded devices", I'll remind you that we are talking about
> userspace APIs, and most likely busybox will be used for them, which is
> also part of the larger distros anyway, so it falls under the category
> "exists in Debian/Fedora/SuSE".

We're talking about APIs exposed by drivers, for devices such as GPUs,
cameras or AI/ML accelerators. I don't think busybox will exercise those
:-) We have Mesa for GPUs, libcamera for cameras, and other frameworks
I'm less familiar with for AI/ML accelerators, and I expect those to be
packaged by distributions. There are however other kinds of devices that
don't fall into existing well-defined categories.

I'm thinking, for instance, about dewarp engines that are used to create
3D surround view for cars. In a nutshell, those devices take a set of
textures and a list of triangles, and perform texture mapping. They're a
bit like GPUs but without 3D, so APIs such as OpenGL or Vulkan don't
apply. There's no standard API for such devices, and no existing
userspace framework similar to Mesa in which a vendor could upstream the
open userspace driver code. I believe that requiring an open userspace
to merge such drivers in the kernel would make sense, but I also don't
think it would be reasonable to ask the first vendor who wants to do so
to create a complete userspace framework with a standard API. The bar to
entry would be too high. An open implementation specific to that device,
with a custom application API, would be a good first step, and it could
serve as a basis to create a framework once a second vendor wants to do
the same. We have to set the end goal, but also consider how it can be
reached.

> > > IMHO, github projects to show API usage are the worst possible way to
> > > allow acceptance for new userspace API.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-11 14:20         ` Steven Rostedt
@ 2021-09-11 22:08           ` Laurent Pinchart
  2021-09-11 22:42             ` Steven Rostedt
  2021-09-11 22:51           ` Mauro Carvalho Chehab
  2021-09-11 23:22           ` Mauro Carvalho Chehab
  2 siblings, 1 reply; 77+ messages in thread
From: Laurent Pinchart @ 2021-09-11 22:08 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab,
	Jonathan Corbet, ksummit

Hi Steven,

On Sat, Sep 11, 2021 at 10:20:07AM -0400, Steven Rostedt wrote:
> On Sat, 11 Sep 2021 03:20:50 +0300 Laurent Pinchart wrote:
> 
> > > Freely accessible documentation is really the key.  
> > 
> > In principle I'd agree, but that assumes such documentation would exist
> > in the first place, with a sufficient level of quality. In many cases an
> > open implementation that exercises all device features is a better form
> > of documentation than what vendors have, even internally. Of course, the
> > opposite is true as well: having seen too much vendor code for my own
> > good, there is such a thing as a working but unreadable implementation.
> > 
> > I fully agree with your point about feature completeness by the way,
> > vendors will always find ways to hide pieces of the API if they really
> > want to, but I think that would be true of documentation as well.
> 
> I would like not only documentation, but also an open source test suite
> that simply tests the interface. Honestly, I believe that all new
> interfaces to the kernel (open or not) should have full documentation
> and a test suite interface before it gets accepted. We have
> tools/selftests that should be updated with all new interfaces into the
> kernel.
> 
> Even if it's just a smoke test, that would be fine. Obviously if
> there's a driver without hardware, it can't be tested. But if you have
> that hardware, perhaps there could be a simple test suite of the
> interface to let you know it is still functional.

It really depends on the device and the interface it requires. A GPU or
camera ISP driver can't be meaningfully tested just at the interface
level. The interface exposed to userspace usually takes the form of an
ioctl that allows passing a large command buffer in a device-specific
format, full of data that is then consumed by hardware or firmware. For
instance, look at the ipu3_uapi_params structure in
drivers/staging/media/ipu3/include/uapi/intel-ipu3.h. You need very
elaborate code to exercise such an API.

If you wanted GPU drivers to have tests in tools/selftests, you'd have
to move Mesa to the kernel :-)

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-11  9:27       ` Laurent Pinchart
@ 2021-09-11 22:33         ` Mauro Carvalho Chehab
  2021-09-13 12:04         ` Mark Brown
  1 sibling, 0 replies; 77+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-11 22:33 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: Jonathan Corbet, ksummit

Em Sat, 11 Sep 2021 12:27:37 +0300
Laurent Pinchart <laurent.pinchart@ideasonboard.com> escreveu:

> Hi Mauro,

> > > > On media, enforcing userspace to always be open source would
> > > > have been very bad, as it would prevent several videoconferencing 
> > > > software to exist on Linux.    
> > > 
> > > Could you elaborate on which software you're thinking of ? And maybe
> > > which driver(s) you're thinking about ?  
> > 
> > I'm referring to tools like v4l2-compliance, qv4l2 and other tools
> > we maintain at v4l-utils tree.  
> 
> I meant the video conferencing software that would have been prevented
> from existing. I'd like to understand if you think that requiring *one*
> open userspace would be problematic.

No, requiring *one* open userspace real application should be enough.

> > Yeah, a public datasheet sounds an interesting requirement. It offers
> > a problem, though: maybe some details could be missed on it, which
> > would prevent any real open source userspace development.  
> 
> Absolutely, and I don't think we can come up with any process and
> technical measure that would prevent a vendor from cheating if they
> really want to. It will always be possible to hide some features behind
> reserved registers that wouldn't need to be programmed for basic
> operation but that would be crucial to optimize the quality or
> performances. This is regardless of whether we want to enforce openness
> of documentation in the form of datasheets or source code.

Unfortunately true.

> I'm not too concerned about this though. If we can address most of this
> issue with a clear process and message I think it would be a very good
> step forward already.

Yeah, a policy could be implemented in order to address such cases,
asking the vendor for a fix or even removing drivers and banning
vendors that are, on purpose, sending broken drivers/APIs.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-11 22:08           ` Laurent Pinchart
@ 2021-09-11 22:42             ` Steven Rostedt
  2021-09-11 23:10               ` Laurent Pinchart
  2021-09-13 11:10               ` Mark Brown
  0 siblings, 2 replies; 77+ messages in thread
From: Steven Rostedt @ 2021-09-11 22:42 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab,
	Jonathan Corbet, ksummit

On Sun, 12 Sep 2021 01:08:55 +0300
Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote:

> If you wanted GPU drivers to have tests in tools/selftests, you'd have
> to move Mesa to the kernel :-)

Some selftests have dependencies. It could require that Mesa is
installed to run the tests, otherwise it just returns "unsupported".
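
As a rough illustration of that dependency idea (a sketch only, not an
existing selftest): kselftest treats exit code 4 (KSFT_SKIP) as
"skipped", so a test that needs a userspace stack can probe for it
first. The function name run_selftest and its parameters are made up
for this example.

```python
import shutil

KSFT_PASS = 0
KSFT_FAIL = 1
KSFT_SKIP = 4  # exit code that kselftest reports as "skipped"


def run_selftest(required_tool, test_body, which=shutil.which):
    """Return a kselftest-style exit code for a single test.

    required_tool: name of a userspace program the test depends on
                   (hypothetical; a real test would pick its own).
    test_body:     callable returning True on success.
    which:         lookup function, injectable so the probe can be faked.
    """
    if which(required_tool) is None:
        # Dependency missing: report "skipped" rather than "failed".
        print(f"SKIP: {required_tool} is not installed")
        return KSFT_SKIP
    return KSFT_PASS if test_body() else KSFT_FAIL
```

A wrapper script would pass the result to sys.exit(). Which tool to
probe for is left open here; any Mesa-provided program would do.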

-- Steve

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-11 14:20         ` Steven Rostedt
  2021-09-11 22:08           ` Laurent Pinchart
@ 2021-09-11 22:51           ` Mauro Carvalho Chehab
  2021-09-11 23:22           ` Mauro Carvalho Chehab
  2 siblings, 0 replies; 77+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-11 22:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Laurent Pinchart, Thomas Gleixner, Josh Triplett,
	Jonathan Corbet, ksummit

Em Sat, 11 Sep 2021 10:20:07 -0400
Steven Rostedt <rostedt@goodmis.org> escreveu:

> On Sat, 11 Sep 2021 03:20:50 +0300
> Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote:
> 
> > > Freely accessible documentation is really the key.    
> > 
> > In principle I'd agree, but that assumes such documentation would exist
> > in the first place, with a sufficient level of quality. In many cases an
> > open implementation that exercises all device features is a better form
> > of documentation than what vendors have, even internally. Of course, the
> > opposite is true as well: having seen too much vendor code for my own
> > good, there is such a thing as a working but unreadable implementation.
> > 
> > I fully agree with your point about feature completeness by the way,
> > vendors will always find ways to hide pieces of the API if they really
> > want to, but I think that would be true of documentation as well.  
> 
> I would like not only documentation, but also an open source test suite
> that simply tests the interface. Honestly, I believe that all new
> interfaces to the kernel (open or not) should have full documentation
> and a test suite interface before it gets accepted.

Fully agreed.

> We have
> tools/selftests that should be updated with all new interfaces into the
> kernel.
> 
> Even if it's just a smoke test, that would be fine. Obviously if
> there's a driver without hardware, it can't be tested. But if you have
> that hardware, perhaps there could be a simple test suite of the
> interface to let you know it is still functional.

These days, if a vendor is adding support for hardware that requires
a new API, it usually means that the hardware is new and still under
development. Only that vendor may have the hardware. A smoke test would
mean next to nothing to those reviewing the patches, unless the vendor
also ships the hardware to the reviewers.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-11 22:42             ` Steven Rostedt
@ 2021-09-11 23:10               ` Laurent Pinchart
  2021-09-13 11:10               ` Mark Brown
  1 sibling, 0 replies; 77+ messages in thread
From: Laurent Pinchart @ 2021-09-11 23:10 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab,
	Jonathan Corbet, ksummit

Hi Steven,

On Sat, Sep 11, 2021 at 06:42:05PM -0400, Steven Rostedt wrote:
> On Sun, 12 Sep 2021 01:08:55 +0300 Laurent Pinchart wrote:
> 
> > If you wanted GPU drivers to have tests in tools/selftests, you'd have
> > to move Mesa to the kernel :-)
> 
> Some selftests have dependencies. It could require that Mesa is
> installed to run the tests, otherwise it just returns "unsupported".

Obviously, I should have considered that.

Projects such as Mesa or libcamera have extensive test suites for the
supported devices. Is that something you'd like to integrate with
selftests ? I'm not really sure how that should be done.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-11 14:20         ` Steven Rostedt
  2021-09-11 22:08           ` Laurent Pinchart
  2021-09-11 22:51           ` Mauro Carvalho Chehab
@ 2021-09-11 23:22           ` Mauro Carvalho Chehab
  2 siblings, 0 replies; 77+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-11 23:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Laurent Pinchart, Thomas Gleixner, Josh Triplett,
	Jonathan Corbet, ksummit

Em Sat, 11 Sep 2021 10:20:07 -0400
Steven Rostedt <rostedt@goodmis.org> escreveu:

> I would like not only documentation, but also an open source test suite
> that simply tests the interface. Honestly, I believe that all new
> interfaces to the kernel (open or not) should have full documentation
> and a test suite interface before it gets accepted

Btw, I've been working on an improvement for scripts/get_abi.pl, in order
to allow it to check for missing API documentation:

	https://lore.kernel.org/lkml/cover.1631112725.git.mchehab+huawei@kernel.org/

It basically reads everything under /sys and in Documentation/ABI,
and reports anything that was found in sysfs but has no
documentation. It can optionally restrict the search to a specific
string (actually, it uses a regex):

	$ ./scripts/get_abi.pl undefined --search-string devices.*cpulistaffinity
	/sys/devices/pci0000:00/0000:00:01.0/pci_bus/0000:01/cpulistaffinity not found.
	/sys/devices/pci0000:00/0000:00:01.1/pci_bus/0000:02/cpulistaffinity not found.
	/sys/devices/pci0000:00/0000:00:01.2/pci_bus/0000:03/cpulistaffinity not found.
	/sys/devices/pci0000:00/0000:00:1c.0/pci_bus/0000:04/cpulistaffinity not found.
	/sys/devices/pci0000:00/0000:00:1c.1/pci_bus/0000:05/cpulistaffinity not found.
	/sys/devices/pci0000:00/0000:00:1c.2/pci_bus/0000:06/cpulistaffinity not found.
	/sys/devices/pci0000:00/0000:00:1c.4/pci_bus/0000:07/cpulistaffinity not found.
	/sys/devices/pci0000:00/0000:00:1d.0/pci_bus/0000:72/cpulistaffinity not found.
	/sys/devices/pci0000:00/0000:00:1d.4/pci_bus/0000:73/cpulistaffinity not found.
	/sys/devices/pci0000:00/pci_bus/0000:00/cpulistaffinity not found.

While it won't check the quality of the ABI, it would at least let
someone double-check that a driver is not exposing something
undocumented via sysfs.
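
For readers who want the gist of such a check, here is a much-simplified
sketch in Python. It is not how get_abi.pl is implemented. The names
what_to_regex and undocumented are hypothetical, and treating "<...>"
placeholders and "*" in a "What:" entry as single path components is an
assumption made for the example.

```python
import re


def what_to_regex(what):
    """Turn a Documentation/ABI "What:" entry into a compiled regex.

    Assumption: "<name>" placeholders and "*" each stand for exactly
    one path component. Everything else is matched literally.
    """
    pieces = re.split(r"(<[^>]*>|\*)", what)
    out = []
    for piece in pieces:
        if piece == "*" or (piece.startswith("<") and piece.endswith(">")):
            out.append(r"[^/]+")
        else:
            out.append(re.escape(piece))
    return re.compile("".join(out) + r"\Z")


def undocumented(sysfs_paths, what_entries):
    """Return the sysfs paths that match none of the documented entries."""
    patterns = [what_to_regex(w) for w in what_entries]
    return [p for p in sysfs_paths if not any(r.match(p) for r in patterns)]
```

In practice the list of paths would come from walking /sys and the
entries from parsing the "What:" lines in Documentation/ABI. Both steps
are left out here.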

If someone wants to test, the newest version is at:

	https://github.com/mchehab/linux/commits/get_undefined

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-11 22:04             ` Laurent Pinchart
@ 2021-09-12  4:27               ` Leon Romanovsky
  2021-09-12  7:26                 ` Greg KH
  2021-09-12  7:46                 ` Mauro Carvalho Chehab
  0 siblings, 2 replies; 77+ messages in thread
From: Leon Romanovsky @ 2021-09-12  4:27 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab,
	Jonathan Corbet, ksummit

On Sun, Sep 12, 2021 at 01:04:01AM +0300, Laurent Pinchart wrote:
> Hi Leon,
> 
> On Sat, Sep 11, 2021 at 03:04:07PM +0300, Leon Romanovsky wrote:
> > On Sat, Sep 11, 2021 at 02:41:52PM +0300, Laurent Pinchart wrote:
> > > On Sat, Sep 11, 2021 at 01:31:02PM +0300, Leon Romanovsky wrote:
> > > > On Sat, Sep 11, 2021 at 01:55:16AM +0200, Thomas Gleixner wrote:
> > > > > On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote:
> > > > > 
> > > > > > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote:
> > > > > >> On media, enforcing userspace to always be open source would
> > > > > >> have been very bad, as it would prevent several videoconferencing 
> > > > > >> software to exist on Linux.
> > > > > >
> > > > > > I don't think we should enforce that all userspace users of an interface
> > > > > > be Open Source. I do think we should enforce that *some* userspace user
> > > > > > of an interface be Open Source before we add the interface.
> > > > > 
> > > > > The real question is whether the interface is documented in a way that
> > > > > an Open Source implementation is possible. It does not matter whether it
> > > > > exists at that point in time or not. Even if it exists there is no
> > > > > guarantee that it is feature complete.
> > > > > 
> > > > > Freely accessible documentation is really the key.
> > > > 
> > > > > I have a more radical view than you and think that documentation is far
> > > > > from being enough. I would like to see any userspace API used (or to be
> > > > > used) in any package which exists in Debian/Fedora/SuSE.
> > > 
> > > We probably need to add Android AOSP to that list, as we have
> > > Android-specific APIs (not that I believe we *should* have
> > > Android-specific APIs, there's been lots of efforts over the past years
> > > to develop standard APIs for use cases that stem from Android, slowly
> > > > replacing Android-specific APIs in some areas, but I don't believe we can
> > > > realistically bridge that gap completely overnight, if ever).
> > 
> > Maybe.
> > 
> > > > Only this will give us some sort of confidence that API and device are usable
> > > > to some level. As a side note, we will be able to estimate possible API
> > > > deprecation/fix/extension based on simple search in package databases.
> > > 
> > > Linux supports devices from very diverse markets, from very tiny
> > > embedded devices to supercomputers. We have drivers for devices that
> > > exist in data centres of a single company only, or for which only a
> > > handful of units exist through the world. The set of rules that we'll
> > > decide on, if any, should take this into account.
> > 
> > I'm part of that group (RDMA) who cares about enterprise, cloud and supercomputers. :)
> > So for us, working out-of-the box (distro packages and not github code drops) is
> > the key to the scalability.
> 
> What if we're dealing with a device that only exists in a handful of
> machines though ? Would distributions accept the burden of packaging
> corresponding userspace code, and maintaining the packages, when only a
> handful of people in the world will use it ? It's a genuine question.

Fedora, Debian and OpenSuSE are volunteer-based distributions. They
accept new packages, which such vendors need to prepare (or ask
someone to prepare).

There is no "accept the burden of packaging corresponding userspace code,
and maintaining the packages"; that is on the package maintainer, who may
or may not be associated with the distribution.

> 
> > Regarding "embedded devices", I'll remind you that we are talking about
> > userspace APIs, and most likely busybox will be used for them, which is
> > also part of the larger distros anyway, so it falls under the category
> > "exists in Debian/Fedora/SuSE".
> 
> We're talking about APIs exposed by drivers, for devices such as GPUs,
> cameras or AI/ML accelerators. I don't think busybox will exercise those
> :-) We have Mesa for GPUs, libcamera for cameras, and other frameworks
> I'm less familiar with for AI/ML accelerators, and I expect those to be
> packaged by distributions. There are however other kinds of devices that
> don't fall into existing well-defined categories.

I'm a little bit confused here. IMHO, you are trying to find a universal
solution for a problem that doesn't exist.

Above you asked how to deal with niche devices. Here you talk about
mass-market devices for the enterprise, while before you mentioned
"embedded devices".

1. Niche devices - continue to do as they do now, by supplying
out-of-tree solutions for their customers. Such devices and companies
rarely need upstream Linux kernel support, because the burden of
upstreaming is very high. We don't want them in the tree either, because
once they upstream it, the maintenance burden will be on us.
2. Devices that hit a certain level of adoption - need to be
integrated into a certain userspace stack, which needs to be part of
a distro.

And AI/ML is no different here, someone just needs to start building such
a stack. Otherwise, we will continue to see more free riders like HabanaLabs
which don't bring any real benefit to the community.

> 
> I'm thinking, for instance, about dewarp engines that are used to create
> 3D surround view for cars. In a nutshell, those devices take a set of
> textures and a list of triangles, and perform texture mapping. They're a
> bit like GPUs but without 3D, so APIs such as OpenGL or Vulkan don't
> apply. There's no standard API for such devices, and no existing
> userspace framework similar to Mesa in which a vendor could upstream the
> open userspace driver code. I believe that requiring an open userspace
> to merge such drivers in the kernel would make sense, but I also don't
> think it would be reasonable to ask the first vendor who wants to do so
> to create a complete userspace framework with a standard API. The bar to
> entry would be too high. An open implementation specific to that device,
> with a custom application API, would be a good first step, and it could
> serve as a basis to create a framework once a second vendor wants to do
> the same. We have to set the end goal, but also consider how it can be
> reached.

The bar needs to be high from the beginning. We can lower it later if it
doesn't work.

Thanks

> 
> > > > IMHO, github projects to show API usage are the worst possible way to
> > > > allow acceptance for new userspace API.
> 
> -- 
> Regards,
> 
> Laurent Pinchart

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12  4:27               ` Leon Romanovsky
@ 2021-09-12  7:26                 ` Greg KH
  2021-09-12  8:29                   ` Leon Romanovsky
  2021-09-12 19:52                   ` Dave Airlie
  2021-09-12  7:46                 ` Mauro Carvalho Chehab
  1 sibling, 2 replies; 77+ messages in thread
From: Greg KH @ 2021-09-12  7:26 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Laurent Pinchart, Thomas Gleixner, Josh Triplett,
	Mauro Carvalho Chehab, Jonathan Corbet, ksummit

On Sun, Sep 12, 2021 at 07:27:55AM +0300, Leon Romanovsky wrote:
> On Sun, Sep 12, 2021 at 01:04:01AM +0300, Laurent Pinchart wrote:
> > Hi Leon,
> > 
> > On Sat, Sep 11, 2021 at 03:04:07PM +0300, Leon Romanovsky wrote:
> > > On Sat, Sep 11, 2021 at 02:41:52PM +0300, Laurent Pinchart wrote:
> > > > On Sat, Sep 11, 2021 at 01:31:02PM +0300, Leon Romanovsky wrote:
> > > > > On Sat, Sep 11, 2021 at 01:55:16AM +0200, Thomas Gleixner wrote:
> > > > > > On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote:
> > > > > > 
> > > > > > > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote:
> > > > > > >> On media, enforcing userspace to always be open source would
> > > > > > >> have been very bad, as it would prevent several videoconferencing 
> > > > > > >> software to exist on Linux.
> > > > > > >
> > > > > > > I don't think we should enforce that all userspace users of an interface
> > > > > > > be Open Source. I do think we should enforce that *some* userspace user
> > > > > > > of an interface be Open Source before we add the interface.
> > > > > > 
> > > > > > The real question is whether the interface is documented in a way that
> > > > > > an Open Source implementation is possible. It does not matter whether it
> > > > > > exists at that point in time or not. Even if it exists there is no
> > > > > > guarantee that it is feature complete.
> > > > > > 
> > > > > > Freely accessible documentation is really the key.
> > > > > 
> > > > > I have more radical view than you and think that documentation is far
> > > > > from being enough. I would like to see any userspace API used (or to be
> > > > > > used) in any package which exists in Debian/Fedora/SuSE.
> > > > 
> > > > We probably need to add Android AOSP to that list, as we have
> > > > Android-specific APIs (not that I believe we *should* have
> > > > Android-specific APIs, there's been lots of efforts over the past years
> > > > to develop standard APIs for use cases that stem from Android, slowly
> > > > replacing Android-specific APIs in some area, but I don't believe we can
> > > > realistically bridge that gap completely overnight, if ever).
> > > 
> > > Maybe.
> > > 
> > > > > Only this will give us some sort of confidence that API and device are usable
> > > > > to some level. As a side note, we will be able to estimate possible API
> > > > > deprecation/fix/extension based on simple search in package databases.
> > > > 
> > > > Linux supports devices from very diverse markets, from very tiny
> > > > embedded devices to supercomputers. We have drivers for devices that
> > > > exist in data centres of a single company only, or for which only a
> > > > handful of units exist through the world. The set of rules that we'll
> > > > decide on, if any, should take this into account.
> > > 
> > > I'm part of that group (RDMA) who cares about enterprise, cloud and supercomputers. :)
> > > So for us, working out-of-the box (distro packages and not github code drops) is
> > > the key to the scalability.
> > 
> > What if we're dealing with a device that only exists in a handful of
> > machines though ? Would distributions accept the burden of packaging
> > corresponding userspace code, and maintaining the packages, when only a
> > handful of people in the world will use it ? It's a genuine question.
> 
> Fedora, Debian and OpenSuSE are volunteer based distributions, they
> accept new packages, which need to be prepared (or asked to be
> prepared) by such vendors.
> 
> There is no "accept the burden of packaging corresponding userspace code,
> and maintaining the packages", it is on package maintainer who can or
> can't be associated with distribution.
> 
> > 
> > > Regarding "embedded devices", I remind that we are talking about
> > > userspace API and most likely busybox will be used for them, which is
> > > also part of larger distro anyway, so falls under category "exists in
> > > Debian/Fedora/SuSE".
> > 
> > We're talking about APIs exposed by drivers, for devices such as GPUs,
> > cameras or AI/ML accelerators. I don't think busybox will exercise those
> > :-) We have Mesa for GPUs, libcamera for cameras, and other frameworks
> > I'm less familiar with for AI/ML accelerators, and I expect those to be
> > packaged by distributions. There are however other kind of devices that
> > don't fall in existing well-defined categories.
> 
> I'm a little bit confused here. IMHO, you are trying to find an universal
> solution for a problem that doesn't exist.
> 
> Above you asked how to deal with niche devices? Here you talk about mass
> products devices for the enterprise while before you mentioned "embedded
> devices".
> 
> 1. Niche devices - continue to do as they do it now, by supplying
> out-of-tree solutions for their customers. Such devices and companies
> rarely need upstream linux kernel support, because the burden to
> upstream it is very high. We don't want them in the tree either, because
> once they upstream it, the maintenance burden will be on us.

{sigh}

No, that is NOT our rule at all.

These devices and companies need to be upstream more than anything else
as that way they become part of our community and are responsible for
maintaining their code in the tree.  To force them to remain outside is
to go against everything that many of us have been saying for _decades_
now.

And how are you going to judge what is, and is not, a "niche" device?

> 2. Devices that hits the certain level of adoption - need to be
> integrated into certain userspace stack, which needs to be part of
> distro.

Distros are a very odd rule to rely on given that they are by far the
minority of the usage in raw numbers for Linux in the world.

> And AI/ML is no different here, someone just need to start build such
> stack. Otherwise, we will continue to see more free riders like HabanaLabs
> which don't have any real benefit to the community.

Everyone contributes to Linux in a selfish manner, that's just how the
community works.  What companies like habanalabs are doing is NOT being a
"free rider" at all; they have worked with us and done the hard work of
actually getting their code merged into the tree and their userspace
code released under an open source license (unlike _ALL_ other AI/ML
companies, including Intel).  It would have been much cheaper and
quicker for them to just ignore upstream entirely, but that would have
meant that the community would not have any idea of what exactly these
use-case models were nor what the problems were that they were trying to
get Linux to do.

Linux benefits overall by having everyone participate, do NOT make
arbitrary rules to somehow prevent one company/group from being allowed
to upstream their code vs. another.  That is NOT how we have worked in
the past, and would only cause us to slowly die and become irrelevant.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12  4:27               ` Leon Romanovsky
  2021-09-12  7:26                 ` Greg KH
@ 2021-09-12  7:46                 ` Mauro Carvalho Chehab
  2021-09-12  8:00                   ` Leon Romanovsky
  1 sibling, 1 reply; 77+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-12  7:46 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Laurent Pinchart, Thomas Gleixner, Josh Triplett,
	Jonathan Corbet, ksummit

Em Sun, 12 Sep 2021 07:27:55 +0300
Leon Romanovsky <leon@kernel.org> escreveu:

> > What if we're dealing with a device that only exists in a handful of
> > machines though ? Would distributions accept the burden of packaging
> > corresponding userspace code, and maintaining the packages, when only a
> > handful of people in the world will use it ? It's a genuine question.  
> 
> Fedora, Debian and OpenSuSE are volunteer based distributions, they
> accept new packages, which need to be prepared (or asked to be
> prepared) by such vendors.
> 
> There is no "accept the burden of packaging corresponding userspace code,
> and maintaining the packages", it is on package maintainer who can or
> can't be associated with distribution.

There is a deadlock issue, though: if we're willing to have a policy
of only accepting a new Kernel API after Fedora/Debian/openSuse accepts
its userspace counterpart, it would mean, in practice, that no new
APIs will ever be added, as I'm pretty sure most Fedora/Debian/openSuse
maintainers will refuse an application that depends on a non-accepted
Kernel API.

As a maintainer of several Fedora packages myself, I would refuse
any attempts of adding support for a non-accepted kernel API on
the packages I maintain.

-

Also, it makes no sense to add support on such general-purpose
distros for some hardware that will never be supported by them.

See, there are, for instance, some types of hardware that are
specific to some industry, such as the CAN bus.
While CAN buses remain restricted to vehicles, it won't make any 
sense to crowd a general purpose distro with support for such
hardware. Such distros are not certified with ASIL. So, they
aren't allowed by law to be used inside vehicles.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12  7:46                 ` Mauro Carvalho Chehab
@ 2021-09-12  8:00                   ` Leon Romanovsky
  2021-09-12 14:53                     ` Laurent Pinchart
  0 siblings, 1 reply; 77+ messages in thread
From: Leon Romanovsky @ 2021-09-12  8:00 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Laurent Pinchart, Thomas Gleixner, Josh Triplett,
	Jonathan Corbet, ksummit

On Sun, Sep 12, 2021 at 09:46:48AM +0200, Mauro Carvalho Chehab wrote:
> Em Sun, 12 Sep 2021 07:27:55 +0300
> Leon Romanovsky <leon@kernel.org> escreveu:
> 
> > > What if we're dealing with a device that only exists in a handful of
> > > machines though ? Would distributions accept the burden of packaging
> > > corresponding userspace code, and maintaining the packages, when only a
> > > handful of people in the world will use it ? It's a genuine question.  
> > 
> > Fedora, Debian and OpenSuSE are volunteer based distributions, they
> > accept new packages, which need to be prepared (or asked to be
> > prepared) by such vendors.
> > 
> > There is no "accept the burden of packaging corresponding userspace code,
> > and maintaining the packages", it is on package maintainer who can or
> > can't be associated with distribution.
> 
> There is a dead lock issue, though: if we're willing to have a policy
> of only accepting a new Kernel API after Fedora/Debian/openSuse accepts
> its userspace counterpart, it would mean, in practice, that no new
> APIs will ever be added, as I'm pretty sure most Fedora/Debian/openSuse
> maintainers will refuse an application that depends on a non-accepted
> Kernel API.

I said something different - "I would like to see any userspace API used (or to be
used)". 
https://lore.kernel.org/ksummit/20210912003349.6d2cacb1@coco.lan/T/#m3b7fbbe0959f1b59288dec9afd39f7cda0eeefe9

"To be used" means some open PR to existing package or request for
inclusion for new packages.

> 
> As a maintainer of several Fedora packages myself, I would refuse
> any attempts of adding support for a non-accepted kernel API on
> the packages I maintain.
> 
> -
> 
> Also, it makes no sense to add support on such general-purpose
> distros for some hardware that will never be supported by it.
> 
> See, there are, for instance, some types of hardware that are
> specific for some industry, like for instance, the CAN bus.
> While CAN buses remain restricted to vehicles, it won't make any 
> sense to crowd a general purpose distro with support for such
> hardware. Such distros are not certified with ASIL. So, they
> aren't allowed by law to be used inside vehicles.

And github pile of ... is certified?

In an attempt to find a general solution for all types of APIs and
devices, we won't solve anything.

So I suggest we return to talking about AI/ML devices and APIs that are
targeted at enterprise/cloud and need to be supported by major
distros.

Thanks

> 
> Thanks,
> Mauro

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12  7:26                 ` Greg KH
@ 2021-09-12  8:29                   ` Leon Romanovsky
  2021-09-12 13:25                     ` Greg KH
  2021-09-12 19:52                   ` Dave Airlie
  1 sibling, 1 reply; 77+ messages in thread
From: Leon Romanovsky @ 2021-09-12  8:29 UTC (permalink / raw)
  To: Greg KH
  Cc: Laurent Pinchart, Thomas Gleixner, Josh Triplett,
	Mauro Carvalho Chehab, Jonathan Corbet, ksummit

On Sun, Sep 12, 2021 at 09:26:57AM +0200, Greg KH wrote:
> On Sun, Sep 12, 2021 at 07:27:55AM +0300, Leon Romanovsky wrote:
> > On Sun, Sep 12, 2021 at 01:04:01AM +0300, Laurent Pinchart wrote:
> > > Hi Leon,
> > > 
> > > On Sat, Sep 11, 2021 at 03:04:07PM +0300, Leon Romanovsky wrote:
> > > > On Sat, Sep 11, 2021 at 02:41:52PM +0300, Laurent Pinchart wrote:
> > > > > On Sat, Sep 11, 2021 at 01:31:02PM +0300, Leon Romanovsky wrote:
> > > > > > On Sat, Sep 11, 2021 at 01:55:16AM +0200, Thomas Gleixner wrote:
> > > > > > > On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote:
> > > > > > > 
> > > > > > > > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote:
> > > > > > > >> On media, enforcing userspace to always be open source would
> > > > > > > >> have been very bad, as it would prevent several videoconferencing 
> > > > > > > >> software to exist on Linux.
> > > > > > > >
> > > > > > > > I don't think we should enforce that all userspace users of an interface
> > > > > > > > be Open Source. I do think we should enforce that *some* userspace user
> > > > > > > > of an interface be Open Source before we add the interface.
> > > > > > > 
> > > > > > > The real question is whether the interface is documented in a way that
> > > > > > > an Open Source implementation is possible. It does not matter whether it
> > > > > > > exists at that point in time or not. Even if it exists there is no
> > > > > > > guarantee that it is feature complete.
> > > > > > > 
> > > > > > > Freely accessible documentation is really the key.
> > > > > > 
> > > > > > I have more radical view than you and think that documentation is far
> > > > > > from being enough. I would like to see any userspace API used (or to be
> > > > > > > used) in any package which exists in Debian/Fedora/SuSE.
> > > > > 
> > > > > We probably need to add Android AOSP to that list, as we have
> > > > > Android-specific APIs (not that I believe we *should* have
> > > > > Android-specific APIs, there's been lots of efforts over the past years
> > > > > to develop standard APIs for use cases that stem from Android, slowly
> > > > > replacing Android-specific APIs in some area, but I don't believe we can
> > > > > realistically bridge that gap completely overnight, if ever).
> > > > 
> > > > Maybe.
> > > > 
> > > > > > Only this will give us some sort of confidence that API and device are usable
> > > > > > to some level. As a side note, we will be able to estimate possible API
> > > > > > deprecation/fix/extension based on simple search in package databases.
> > > > > 
> > > > > Linux supports devices from very diverse markets, from very tiny
> > > > > embedded devices to supercomputers. We have drivers for devices that
> > > > > exist in data centres of a single company only, or for which only a
> > > > > handful of units exist through the world. The set of rules that we'll
> > > > > decide on, if any, should take this into account.
> > > > 
> > > > I'm part of that group (RDMA) who cares about enterprise, cloud and supercomputers. :)
> > > > So for us, working out-of-the box (distro packages and not github code drops) is
> > > > the key to the scalability.
> > > 
> > > What if we're dealing with a device that only exists in a handful of
> > > machines though ? Would distributions accept the burden of packaging
> > > corresponding userspace code, and maintaining the packages, when only a
> > > handful of people in the world will use it ? It's a genuine question.
> > 
> > Fedora, Debian and OpenSuSE are volunteer based distributions, they
> > accept new packages, which need to be prepared (or asked to be
> > prepared) by such vendors.
> > 
> > There is no "accept the burden of packaging corresponding userspace code,
> > and maintaining the packages", it is on package maintainer who can or
> > can't be associated with distribution.
> > 
> > > 
> > > > Regarding "embedded devices", I remind that we are talking about
> > > > userspace API and most likely busybox will be used for them, which is
> > > > also part of larger distro anyway, so falls under category "exists in
> > > > Debian/Fedora/SuSE".
> > > 
> > > We're talking about APIs exposed by drivers, for devices such as GPUs,
> > > cameras or AI/ML accelerators. I don't think busybox will exercise those
> > > :-) We have Mesa for GPUs, libcamera for cameras, and other frameworks
> > > I'm less familiar with for AI/ML accelerators, and I expect those to be
> > > packaged by distributions. There are however other kind of devices that
> > > don't fall in existing well-defined categories.
> > 
> > I'm a little bit confused here. IMHO, you are trying to find an universal
> > solution for a problem that doesn't exist.
> > 
> > Above you asked how to deal with niche devices? Here you talk about mass
> > products devices for the enterprise while before you mentioned "embedded
> > devices".
> > 
> > 1. Niche devices - continue to do as they do it now, by supplying
> > out-of-tree solutions for their customers. Such devices and companies
> > rarely need upstream linux kernel support, because the burden to
> > upstream it is very high. We don't want them in the tree either, because
> > once they upstream it, the maintenance burden will be on us.
> 
> {sigh}
> 
> No, that is NOT our rule at all.
> 
> These devices and companies need to be upstream more than anything else
> as that way they become part of our community and are responsible for
> maintaining their code in the tree.  To force them to remain outside is
> to go against everything that many of us have been saying for _decades_
> now.
> 
> And how are you going to judge what is, and is not, a "niche" device?

I will leave it to that company to decide. Again, this is exactly how they
operate now; there is nothing new here. Every company calculates ROI
for working with upstream, and small companies with niche devices are no
different here.

The main idea is that I want to see a working userspace stack, and being
in a distro sets a certain quality level. Am I asking too much?

> 
> > 2. Devices that hits the certain level of adoption - need to be
> > integrated into certain userspace stack, which needs to be part of
> > distro.
> 
> Distros are a very odd rule to rely on given that they are by far the
> minority of the usage in raw numbers for Linux in the world.

You can count Android as another distro, it is just semantics.

> 
> > And AI/ML is no different here, someone just need to start build such
> > stack. Otherwise, we will continue to see more free riders like HabanaLabs
> > which don't have any real benefit to the community.
> 
> Everyone contributes to Linux in a selfish manner, that's just how the
> community works.  The work that companies like habanalabs is NOT being a
> "free rider" at all, they have worked with us and done the hard work of
> actually getting their code merged into the tree.

I perfectly remember them trying to bypass the netdev and RDMA communities
by pretending to be a "misc" device.

https://lore.kernel.org/linux-rdma/20200915133556.21268811@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/
https://lore.kernel.org/linux-rdma/20200917171833.GJ8409@ziepe.ca/

Or DRM
https://lore.kernel.org/linux-rdma/CAKMK7uFOfoxbD2Z5mb-qHFnUe5rObGKQ6Ygh--HSH9M=9bziGg@mail.gmail.com/

So I can agree with the statement "worked hard", but not with the
relevant communities.

> code released under an open source license (unlike _ALL_ other AI/ML
> companies, including Intel). 

Yes, they provided a user-space library but didn't release the compiler,
so until recently it wasn't usable at all.

> It would have been much cheaper and
> quicker of them to just ignore upstream entirely, but that would have
> meant that the community would not have any idea of what exactly these
> use-case models were nor what the problems were that they were trying to
> get Linux to do.

The thing is that the community has been talking about an AI/ML stack for
a long time, but as long as a backdoor to merge code exists, we won't have
anything good for the end users.

> 
> Linux benefits overall by having everyone participate, do NOT make
> arbitrary rules to somehow prevent one company/group from being allowed
> to upstream their code vs. another.  That is NOT how we have worked in
> the past, and would only cause us to slowly die and become irrelevant.

Somehow, we have rules; for example, we require a user-space part for any
API merged. Should we cancel that too, so that all groups and companies
will be able to contribute?

Thanks

> 
> thanks,
> 
> greg k-h

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12  8:29                   ` Leon Romanovsky
@ 2021-09-12 13:25                     ` Greg KH
  2021-09-12 14:15                       ` Leon Romanovsky
  2021-09-12 15:55                       ` Laurent Pinchart
  0 siblings, 2 replies; 77+ messages in thread
From: Greg KH @ 2021-09-12 13:25 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Laurent Pinchart, Thomas Gleixner, Josh Triplett,
	Mauro Carvalho Chehab, Jonathan Corbet, ksummit

On Sun, Sep 12, 2021 at 11:29:45AM +0300, Leon Romanovsky wrote:
> On Sun, Sep 12, 2021 at 09:26:57AM +0200, Greg KH wrote:
> > On Sun, Sep 12, 2021 at 07:27:55AM +0300, Leon Romanovsky wrote:
> > > On Sun, Sep 12, 2021 at 01:04:01AM +0300, Laurent Pinchart wrote:
> > > > Hi Leon,
> > > > 
> > > > On Sat, Sep 11, 2021 at 03:04:07PM +0300, Leon Romanovsky wrote:
> > > > > On Sat, Sep 11, 2021 at 02:41:52PM +0300, Laurent Pinchart wrote:
> > > > > > On Sat, Sep 11, 2021 at 01:31:02PM +0300, Leon Romanovsky wrote:
> > > > > > > On Sat, Sep 11, 2021 at 01:55:16AM +0200, Thomas Gleixner wrote:
> > > > > > > > On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote:
> > > > > > > > 
> > > > > > > > > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote:
> > > > > > > > >> On media, enforcing userspace to always be open source would
> > > > > > > > >> have been very bad, as it would prevent several videoconferencing 
> > > > > > > > >> software to exist on Linux.
> > > > > > > > >
> > > > > > > > > I don't think we should enforce that all userspace users of an interface
> > > > > > > > > be Open Source. I do think we should enforce that *some* userspace user
> > > > > > > > > of an interface be Open Source before we add the interface.
> > > > > > > > 
> > > > > > > > The real question is whether the interface is documented in a way that
> > > > > > > > an Open Source implementation is possible. It does not matter whether it
> > > > > > > > exists at that point in time or not. Even if it exists there is no
> > > > > > > > guarantee that it is feature complete.
> > > > > > > > 
> > > > > > > > Freely accessible documentation is really the key.
> > > > > > > 
> > > > > > > I have more radical view than you and think that documentation is far
> > > > > > > from being enough. I would like to see any userspace API used (or to be
> > > > > > > > used) in any package which exists in Debian/Fedora/SuSE.
> > > > > > 
> > > > > > We probably need to add Android AOSP to that list, as we have
> > > > > > Android-specific APIs (not that I believe we *should* have
> > > > > > Android-specific APIs, there's been lots of efforts over the past years
> > > > > > to develop standard APIs for use cases that stem from Android, slowly
> > > > > > replacing Android-specific APIs in some area, but I don't believe we can
> > > > > > realistically bridge that gap completely overnight, if ever).
> > > > > 
> > > > > Maybe.
> > > > > 
> > > > > > > Only this will give us some sort of confidence that API and device are usable
> > > > > > > to some level. As a side note, we will be able to estimate possible API
> > > > > > > deprecation/fix/extension based on simple search in package databases.
> > > > > > 
> > > > > > Linux supports devices from very diverse markets, from very tiny
> > > > > > embedded devices to supercomputers. We have drivers for devices that
> > > > > > exist in data centres of a single company only, or for which only a
> > > > > > handful of units exist through the world. The set of rules that we'll
> > > > > > decide on, if any, should take this into account.
> > > > > 
> > > > > I'm part of that group (RDMA) who cares about enterprise, cloud and supercomputers. :)
> > > > > So for us, working out-of-the box (distro packages and not github code drops) is
> > > > > the key to the scalability.
> > > > 
> > > > What if we're dealing with a device that only exists in a handful of
> > > > machines though ? Would distributions accept the burden of packaging
> > > > corresponding userspace code, and maintaining the packages, when only a
> > > > handful of people in the world will use it ? It's a genuine question.
> > > 
> > > Fedora, Debian and OpenSuSE are volunteer based distributions, they
> > > accept new packages, which need to be prepared (or asked to be
> > > prepared) by such vendors.
> > > 
> > > There is no "accept the burden of packaging corresponding userspace code,
> > > and maintaining the packages", it is on package maintainer who can or
> > > can't be associated with distribution.
> > > 
> > > > 
> > > > > Regarding "embedded devices", I remind that we are talking about
> > > > > userspace API and most likely busybox will be used for them, which is
> > > > > also part of larger distro anyway, so falls under category "exists in
> > > > > Debian/Fedora/SuSE".
> > > > 
> > > > We're talking about APIs exposed by drivers, for devices such as GPUs,
> > > > cameras or AI/ML accelerators. I don't think busybox will exercise those
> > > > :-) We have Mesa for GPUs, libcamera for cameras, and other frameworks
> > > > I'm less familiar with for AI/ML accelerators, and I expect those to be
> > > > packaged by distributions. There are however other kind of devices that
> > > > don't fall in existing well-defined categories.
> > > 
> > > I'm a little bit confused here. IMHO, you are trying to find an universal
> > > solution for a problem that doesn't exist.
> > > 
> > > Above you asked how to deal with niche devices? Here you talk about mass
> > > products devices for the enterprise while before you mentioned "embedded
> > > devices".
> > > 
> > > 1. Niche devices - continue to do as they do it now, by supplying
> > > out-of-tree solutions for their customers. Such devices and companies
> > > rarely need upstream linux kernel support, because the burden to
> > > upstream it is very high. We don't want them in the tree either, because
> > > once they upstream it, the maintenance burden will be on us.
> > 
> > {sigh}
> > 
> > No, that is NOT our rule at all.
> > 
> > These devices and companies need to be upstream more than anything else
> > as that way they become part of our community and are responsible for
> > maintaining their code in the tree.  To force them to remain outside is
> > to go against everything that many of us have been saying for _decades_
> > now.
> > 
> > And how are you going to judge what is, and is not, a "niche" device?
> 
> I will leave to that company to decide. Again this is exactly how they
> operate now, there is nothing new here. Every company calculates ROI
> for working with upstream and small companies with niche devices are not
> different here.
> 
> The main idea that I want to see working userspace stack, and being in
> distro sets a certain quality level, am I asking too much?

Define "working userspace stack" and "distro" please.  Like others have
said, many distros will not take userspace code unless it's already in
the kernel tree first, as that ensures that the abi will not break.

> > > 2. Devices that hits the certain level of adoption - need to be
> > > integrated into certain userspace stack, which needs to be part of
> > > distro.
> > 
> > Distros are a very odd rule to rely on given that they are by far the
> > minority of the usage in raw numbers for Linux in the world.
> 
> You can count Android as another distro, it is just semantics.

But how do you define Android's userspace?  Just one vendor?  2 vendors?
10 vendors?  There is major userspace fragmentation in Android userspace
in many places, the user/kernel boundary being one of the big ones as
many of us have found out over the past years.  And many of us are
working to resolve this, but it's not so simple at times, and I have
many examples if you want specifics.

> > > And AI/ML is no different here, someone just need to start build such
> > > stack. Otherwise, we will continue to see more free riders like HabanaLabs
> > > which don't have any real benefit to the community.
> > 
> > Everyone contributes to Linux in a selfish manner, that's just how the
> > community works.  The work that companies like habanalabs is NOT being a
> > "free rider" at all, they have worked with us and done the hard work of
> > actually getting their code merged into the tree.
> 
> I perfectly remember them trying to bypass netdev and RDMA communities
> by pretending "misc" device.
> 
> https://lore.kernel.org/linux-rdma/20200915133556.21268811@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/
> https://lore.kernel.org/linux-rdma/20200917171833.GJ8409@ziepe.ca/
> 
> Or DRM
> https://lore.kernel.org/linux-rdma/CAKMK7uFOfoxbD2Z5mb-qHFnUe5rObGKQ6Ygh--HSH9M=9bziGg@mail.gmail.com/
> 
> So I can agree with the statement "worked hard", but not with the
> relevant communities.

I point at these as doing exactly what we want vendors to be doing!
Thank you for finding the good examples.  This is a vendor submitting
patches and saying, "here is what we want to do, with a first cut at
doing it."  It's up to us as a community to tell them if they are doing
it the right way or not.

If we just let them all go their own ways, they will come up with
horrible apis and interfaces, we have all seen that before.

So by working together, we can both learn from each other and work to
solve the issue.  And that is what these driver authors and this company
have been doing!  They are part of our community, why are you saying they
should now just go do their own thing away from us?

And as for "bypassing", that feels very mean.  We have had accelerator
code in the char/misc and other parts of the kernel tree since at least
2018 if not earlier (I didn't look all that hard.)  Just because someone
wanted to use the in-kernel apis that are there (why is dma-buf some
magic thing?) does not mean that they suddenly need to move to a
different subsystem.

We get at least 1-2 new subsystems and major drivers that get added to
the kernel tree that do things that have never been done before with
custom user/kernel apis every kernel release.  Not everything can be a
standard api no matter how much I, and others, wish it were.

As examples, what about the hyperv blob api that was submitted recently
going around the block layer?  What about the new Intel accelerator that
added yet-another-set-of-custom-ioctls?  What about the rpi drivers?
What about the virtualbox drivers?  Should all of those just live
outside of the kernel forever?

Of course not.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12 13:25                     ` Greg KH
@ 2021-09-12 14:15                       ` Leon Romanovsky
  2021-09-12 14:34                         ` Greg KH
  2021-09-12 15:55                       ` Laurent Pinchart
  1 sibling, 1 reply; 77+ messages in thread
From: Leon Romanovsky @ 2021-09-12 14:15 UTC (permalink / raw)
  To: Greg KH
  Cc: Laurent Pinchart, Thomas Gleixner, Josh Triplett,
	Mauro Carvalho Chehab, Jonathan Corbet, ksummit

On Sun, Sep 12, 2021 at 03:25:58PM +0200, Greg KH wrote:
> On Sun, Sep 12, 2021 at 11:29:45AM +0300, Leon Romanovsky wrote:
> > On Sun, Sep 12, 2021 at 09:26:57AM +0200, Greg KH wrote:
> > > On Sun, Sep 12, 2021 at 07:27:55AM +0300, Leon Romanovsky wrote:
> > > > On Sun, Sep 12, 2021 at 01:04:01AM +0300, Laurent Pinchart wrote:
> > > > > Hi Leon,
> > > > > 
> > > > > On Sat, Sep 11, 2021 at 03:04:07PM +0300, Leon Romanovsky wrote:
> > > > > > On Sat, Sep 11, 2021 at 02:41:52PM +0300, Laurent Pinchart wrote:
> > > > > > > On Sat, Sep 11, 2021 at 01:31:02PM +0300, Leon Romanovsky wrote:
> > > > > > > > On Sat, Sep 11, 2021 at 01:55:16AM +0200, Thomas Gleixner wrote:
> > > > > > > > > On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote:
> > > > > > > > > 
> > > > > > > > > > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote:
> > > > > > > > > >> On media, enforcing userspace to always be open source would
> > > > > > > > > >> have been very bad, as it would prevent several videoconferencing 
> > > > > > > > > >> software to exist on Linux.
> > > > > > > > > >
> > > > > > > > > > I don't think we should enforce that all userspace users of an interface
> > > > > > > > > > be Open Source. I do think we should enforce that *some* userspace user
> > > > > > > > > > of an interface be Open Source before we add the interface.
> > > > > > > > > 
> > > > > > > > > The real question is whether the interface is documented in a way that
> > > > > > > > > an Open Source implementation is possible. It does not matter whether it
> > > > > > > > > exists at that point in time or not. Even if it exists there is no
> > > > > > > > > guarantee that it is feature complete.
> > > > > > > > > 
> > > > > > > > > Freely accessible documentation is really the key.
> > > > > > > > 
> > > > > > > > I have more radical view than you and think that documentation is far
> > > > > > > > from being enough. I would like to see any userspace API used (or to be
> > > > > > > > used) in any package which exists in Debian/Fedora/SuSE.
> > > > > > > 
> > > > > > > We probably need to add Android AOSP to that list, as we have
> > > > > > > Android-specific APIs (not that I believe we *should* have
> > > > > > > Android-specific APIs, there's been lots of efforts over the past years
> > > > > > > to develop standard APIs for use cases that stem from Android, slowly
> > > > > > > replacing Android-specific APIs in some area, but I don't believe we can
> > > > > > > realistically bridge that gap completely overnight, if ever).
> > > > > > 
> > > > > > Maybe.
> > > > > > 
> > > > > > > > Only this will give us some sort of confidence that API and device are usable
> > > > > > > > to some level. As a side note, we will be able to estimate possible API
> > > > > > > > deprecation/fix/extension based on simple search in package databases.
> > > > > > > 
> > > > > > > Linux supports devices from very diverse markets, from very tiny
> > > > > > > embedded devices to supercomputers. We have drivers for devices that
> > > > > > > exist in data centres of a single company only, or for which only a
> > > > > > > handful of units exist through the world. The set of rules that we'll
> > > > > > > decide on, if any, should take this into account.
> > > > > > 
> > > > > > I'm part of that group (RDMA) who cares about enterprise, cloud and supercomputers. :)
> > > > > > So for us, working out-of-the box (distro packages and not github code drops) is
> > > > > > the key to the scalability.
> > > > > 
> > > > > What if we're dealing with a device that only exists in a handful of
> > > > > machines though ? Would distributions accept the burden of packaging
> > > > > corresponding userspace code, and maintaining the packages, when only a
> > > > > handful of people in the world will use it ? It's a genuine question.
> > > > 
> > > > Fedora, Debian and OpenSuSE are volunteer based distributions, they
> > > > accept new packages, which need to be prepared (or asked to be
> > > > prepared) by such vendors.
> > > > 
> > > > There is no "accept the burden of packaging corresponding userspace code,
> > > > and maintaining the packages", it is on package maintainer who can or
> > > > can't be associated with distribution.
> > > > 
> > > > > 
> > > > > > Regarding "embedded devices", I remind that we are talking about
> > > > > > userspace API and most likely busybox will be used for them, which is
> > > > > > also part of larger distro anyway, so falls under category "exists in
> > > > > > Debian/Fedora/SuSE".
> > > > > 
> > > > > We're talking about APIs exposed by drivers, for devices such as GPUs,
> > > > > cameras or AI/ML accelerators. I don't think busybox will exercise those
> > > > > :-) We have Mesa for GPUs, libcamera for cameras, and other frameworks
> > > > > I'm less familiar with for AI/ML accelerators, and I expect those to be
> > > > > packaged by distributions. There are however other kind of devices that
> > > > > don't fall in existing well-defined categories.
> > > > 
> > > > I'm a little bit confused here. IMHO, you are trying to find an universal
> > > > solution for a problem that doesn't exist.
> > > > 
> > > > Above you asked how to deal with niche devices? Here you talk about mass
> > > > products devices for the enterprise while before you mentioned "embedded
> > > > devices".
> > > > 
> > > > 1. Niche devices - continue to do as they do it now, by supplying
> > > > out-of-tree solutions for their customers. Such devices and companies
> > > > rarely need upstream linux kernel support, because the burden to
> > > > upstream it is very high. We don't want them in the tree either, because
> > > > once they upstream it, the maintenance burden will be on us.
> > > 
> > > {sigh}
> > > 
> > > No, that is NOT our rule at all.
> > > 
> > > These devices and companies need to be upstream more than anything else
> > > as that way they become part of our community and are responsible for
> > > maintaining their code in the tree.  To force them to remain outside is
> > > to go against everything that many of us have been saying for _decades_
> > > now.
> > > 
> > > And how are you going to judge what is, and is not, a "niche" device?
> > 
> > I will leave to that company to decide. Again this is exactly how they
> > operate now, there is nothing new here. Every company calculates ROI
> > for working with upstream and small companies with niche devices are not
> > different here.
> > 
> > The main idea that I want to see working userspace stack, and being in
> > distro sets a certain quality level, am I asking too much?
> 
> Define "working userspace stack" and "distro" please.  Like others have
> said, many distros will not take userspace code unless it's already in
> the kernel tree first, as that ensures that the abi will not break.

Like I already answered
https://lore.kernel.org/all/YT2zryAKHc%2F5R2IH@unreal/
"To be used" means some open PR to existing package or request for
inclusion for new packages.

> 
> > > > 2. Devices that hits the certain level of adoption - need to be
> > > > integrated into certain userspace stack, which needs to be part of
> > > > distro.
> > > 
> > > Distros are a very odd rule to rely on given that they are by far the
> > > minority of the usage in raw numbers for Linux in the world.
> > 
> > You can count Android as another distro, it is just semantics.
> 
> But how do you define Android's userspace?  Just one vendor?  2 vendors?
> 10 vendors?  There is major userspace fragmentation in Android userspace
> in many places, the user/kernel boundary being one of the big ones as
> many of us have found out over the past years.  And many of us are
> working to resolve this, but it's not so simple at times, and I have
> many examples if you want specifics.

Laurent suggested AOSP
https://lore.kernel.org/all/YTyWANV%2FmSkQbYhj@pendragon.ideasonboard.com/

> 
> > > > And AI/ML is no different here, someone just need to start build such
> > > > stack. Otherwise, we will continue to see more free riders like HabanaLabs
> > > > which don't have any real benefit to the community.
> > > 
> > > Everyone contributes to Linux in a selfish manner, that's just how the
> > > community works.  The work that companies like habanalabs is NOT being a
> > > "free rider" at all, they have worked with us and done the hard work of
> > > actually getting their code merged into the tree.
> > 
> > I perfectly remember them trying to bypass netdev and RDMA communities
> > by pretending "misc" device.
> > 
> > https://lore.kernel.org/linux-rdma/20200915133556.21268811@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/
> > https://lore.kernel.org/linux-rdma/20200917171833.GJ8409@ziepe.ca/
> > 
> > Or DRM
> > https://lore.kernel.org/linux-rdma/CAKMK7uFOfoxbD2Z5mb-qHFnUe5rObGKQ6Ygh--HSH9M=9bziGg@mail.gmail.com/
> > 
> > So I can agree with the statement "worked hard", but not with the
> > relevant communities.
> 
> I point at these as doing exactly what we want vendors to be doing!
> Thank you for finding the good examples.  This is a vendor submitting
> patches and saying, "here is what we want to do, with a first cut at
> doing it."  It's up to us as a community to tell them if they are doing
> it the right way or not.
> 
> If we just let them all go their own ways, they will come up with
> horrible apis and interfaces, we have all seen that before.
> 
> So by working together, we both can learn from, and work together to
> solve the issue.  And that is what these driver authors and company has
> been doing!  They are part of our community, why are you saying they
> should now just go do their own thing away from us?

This is not what I said. I don't see Intel (habanalabs) as a company
that can't create proper AI stack and think that this is our
responsibility to provide them enough incentive to do it.

> 
> And as for "bypassing", that feels very mean.  We have had accelerator
> code in the char/misc and other parts of the kernel tree since at least
> 2018 if not earlier (I didn't look all that hard.)  Just because someone
> wanted to use the in-kernel apis that are there (why is dma-buf some
> magic thing?) does not mean that they suddenly need to move to a
> different subsystem.

Because dma-buf API has specific semantics and was designed with very
specific usage model in mind.

> 
> We get at least 1-2 new subsystems and major drivers that get added to
> the kernel tree that do things that have never been done before with
> custom user/kernel apis every kernel release.  Not everything can be a
> standard api no matter how much I, and others, wish it were.

So when will you draw a line and ask to create proper subsystem
with standard APIs? After 2, 3 ... 100 similar (from our point of view)
and different (from vendor point of view) devices with custom API?

> 
> As examples, what about the hyperv blob api that was submitted recently
> going around the block layer?  What about the new Intel accelerator that
> added yet-another-set-of-custom-ioctls?  What about the rpi drivers?
> What about the virtualbox drivers?  Should all of those just live
> outside of the kernel for forever?
> 
> Of course not.

So what is your bar? Accept everything?

Thanks

> 
> thanks,
> 
> greg k-h


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12 14:15                       ` Leon Romanovsky
@ 2021-09-12 14:34                         ` Greg KH
  2021-09-12 16:41                           ` Laurent Pinchart
                                             ` (3 more replies)
  0 siblings, 4 replies; 77+ messages in thread
From: Greg KH @ 2021-09-12 14:34 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Laurent Pinchart, Thomas Gleixner, Josh Triplett,
	Mauro Carvalho Chehab, Jonathan Corbet, ksummit

On Sun, Sep 12, 2021 at 05:15:30PM +0300, Leon Romanovsky wrote:
> On Sun, Sep 12, 2021 at 03:25:58PM +0200, Greg KH wrote:
> > > The main idea that I want to see working userspace stack, and being in
> > > distro sets a certain quality level, am I asking too much?
> > 
> > Define "working userspace stack" and "distro" please.  Like others have
> > said, many distros will not take userspace code unless it's already in
> > the kernel tree first, as that ensures that the abi will not break.
> 
> Like I already answered
> https://lore.kernel.org/all/YT2zryAKHc%2F5R2IH@unreal/
> "To be used" means some open PR to existing package or request for
> inclusion for new packages.

But again, distros will not take things that are not already in the
kernel.

> > > > > 2. Devices that hits the certain level of adoption - need to be
> > > > > integrated into certain userspace stack, which needs to be part of
> > > > > distro.
> > > > 
> > > > Distros are a very odd rule to rely on given that they are by far the
> > > > minority of the usage in raw numbers for Linux in the world.
> > > 
> > > You can count Android as another distro, it is just semantics.
> > 
> > But how do you define Android's userspace?  Just one vendor?  2 vendors?
> > 10 vendors?  There is major userspace fragmentation in Android userspace
> > in many places, the user/kernel boundary being one of the big ones as
> > many of us have found out over the past years.  And many of us are
> > working to resolve this, but it's not so simple at times, and I have
> > many examples if you want specifics.
> 
> Laurent suggested AOSP
> https://lore.kernel.org/all/YTyWANV%2FmSkQbYhj@pendragon.ideasonboard.com/

Vendors can not get code into AOSP for various reasons that only Google
understands.  There are many millions, if not billions of Android
devices out there with user/kernel apis that are not upstream nor in
AOSP because Google doesn't want to take them, or because the vendor can
not go through those hoops (international law is tricky at times...)

So are we to just not be able to take drivers that add those new apis if
AOSP can not take the userspace side, yet the userspace side is
published somewhere else?

> > > > > And AI/ML is no different here, someone just need to start build such
> > > > > stack. Otherwise, we will continue to see more free riders like HabanaLabs
> > > > > which don't have any real benefit to the community.
> > > > 
> > > > Everyone contributes to Linux in a selfish manner, that's just how the
> > > > community works.  The work that companies like habanalabs is NOT being a
> > > > "free rider" at all, they have worked with us and done the hard work of
> > > > actually getting their code merged into the tree.
> > > 
> > > I perfectly remember them trying to bypass netdev and RDMA communities
> > > by pretending "misc" device.
> > > 
> > > https://lore.kernel.org/linux-rdma/20200915133556.21268811@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/
> > > https://lore.kernel.org/linux-rdma/20200917171833.GJ8409@ziepe.ca/
> > > 
> > > Or DRM
> > > https://lore.kernel.org/linux-rdma/CAKMK7uFOfoxbD2Z5mb-qHFnUe5rObGKQ6Ygh--HSH9M=9bziGg@mail.gmail.com/
> > > 
> > > So I can agree with the statement "worked hard", but not with the
> > > relevant communities.
> > 
> > I point at these as doing exactly what we want vendors to be doing!
> > Thank you for finding the good examples.  This is a vendor submitting
> > patches and saying, "here is what we want to do, with a first cut at
> > doing it."  It's up to us as a community to tell them if they are doing
> > it the right way or not.
> > 
> > If we just let them all go their own ways, they will come up with
> > horrible apis and interfaces, we have all seen that before.
> > 
> > So by working together, we both can learn from, and work together to
> > solve the issue.  And that is what these driver authors and company has
> > been doing!  They are part of our community, why are you saying they
> > should now just go do their own thing away from us?
> 
> This is not what I said. I don't see Intel (habanalabs) as a company
> that can't create proper AI stack and think that this is our
> responsibility to provide them enough incentive to do it.

So should we be forcing everyone to follow the IBM standard for
accelerator drivers because they were in the kernel first all those
years ago?  Or what other standard do we pick?

And why are we dictating new industry standards here?  Who are we to do
that?  Who is going to take that responsibility on?

> > And as for "bypassing", that feels very mean.  We have had accelerator
> > code in the char/misc and other parts of the kernel tree since at least
> > 2018 if not earlier (I didn't look all that hard.)  Just because someone
> > wanted to use the in-kernel apis that are there (why is dma-buf some
> > magic thing?) does not mean that they suddenly need to move to a
> > different subsystem.
> 
> Because dma-buf API has specific semantics and was designed with very
> specific usage model in mind.

So will the IB patches usage be re-reviewed?

Anyway, we have apis that are used throughout the kernel all the time
that don't end up on the various subsystem mailing list because people
forget, or just do not know.  That's normal and something we have dealt
with for forever.  As an example, I didn't realise that just using the
dma-buf api required such a review.

Can we put that in the MAINTAINERS file somehow for apis?

> > We get at least 1-2 new subsystems and major drivers that get added to
> > the kernel tree that do things that have never been done before with
> > custom user/kernel apis every kernel release.  Not everything can be a
> > standard api no matter how much I, and others, wish it were.
> 
> So when will you draw a line and ask to create proper subsystem
> with standard APIs? After 2, 3 ... 100 similar (from our point of view)
> and different (from vendor point of view) devices with custom API?

That is a great question and I do not have the answer to that.  Should
we have done that after the first one went into the kernel all those
years ago?  Maybe, but I seem to recall the answer being "our hardware
works much differently, so our user api will be much different", and
that's a valid answer.

If your standard can not handle new usage models and a way to handle
that, then it isn't a good standard that companies will follow for new
types of devices.

We have loads of char drivers with odd ioctl apis because we have loads
of odd hardware devices out in the world.  We have been treating these
accelerators like that for a long time now, except when they try to
duplicate existing in-kernel code (like crypto or networking).

> > As examples, what about the hyperv blob api that was submitted recently
> > going around the block layer?  What about the new Intel accelerator that
> > added yet-another-set-of-custom-ioctls?  What about the rpi drivers?
> > What about the virtualbox drivers?  Should all of those just live
> > outside of the kernel for forever?
> > 
> > Of course not.
> 
> So what is your bar? Accept everything?

It's a hard line to draw, and for some reason, I seem to be the one
having to review these types of drivers every kernel release.  If people
wish to help me out, please do so, all the patches are on the lists.

Right now I push back where I can and try to get semi-sane apis created
that are "obviously not wrong" where I notice.  After that, I just need
to trust that the maintainer of the driver knows what they are doing and
will maintain the code going forward.  So far, it's worked out.

Do you have a better idea of what to do instead?

thanks,

greg k-h


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12  8:00                   ` Leon Romanovsky
@ 2021-09-12 14:53                     ` Laurent Pinchart
  2021-09-12 15:41                       ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 77+ messages in thread
From: Laurent Pinchart @ 2021-09-12 14:53 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Mauro Carvalho Chehab, Thomas Gleixner, Josh Triplett,
	Jonathan Corbet, ksummit

Hello,

On Sun, Sep 12, 2021 at 11:00:47AM +0300, Leon Romanovsky wrote:
> On Sun, Sep 12, 2021 at 09:46:48AM +0200, Mauro Carvalho Chehab wrote:
> > Em Sun, 12 Sep 2021 07:27:55 +0300 Leon Romanovsky escreveu:
> > 
> > > > What if we're dealing with a device that only exists in a handful of
> > > > machines though ? Would distributions accept the burden of packaging
> > > > corresponding userspace code, and maintaining the packages, when only a
> > > > handful of people in the world will use it ? It's a genuine question.  
> > > 
> > > Fedora, Debian and OpenSuSE are volunteer based distributions, they
> > > accept new packages, which need to be prepared (or asked to be
> > > prepared) by such vendors.
> > > 
> > > There is no "accept the burden of packaging corresponding userspace code,
> > > and maintaining the packages", it is on package maintainer who can or
> > > can't be associated with distribution.
> > 
> > There is a dead lock issue, though: if we're willing to have a policy
> > of only accepting a new Kernel API after Fedora/Debian/openSuse accepts
> > its userspace counterpart, it would mean, in practice, that no new
> > APIs will ever be added, as I'm pretty sure most Fedora/Debian/openSuse
> > maintainers will refuse an application that depends on a non-accepted
> > Kernel API.
> 
> I said something different - "I would like to see any userspace API used (or to be
> used)". 
> https://lore.kernel.org/ksummit/20210912003349.6d2cacb1@coco.lan/T/#m3b7fbbe0959f1b59288dec9afd39f7cda0eeefe9
> 
> "To be used" means some open PR to existing package or request for
> inclusion for new packages.

Requiring userspace support to be merged in the appropriate framework or
accepted as a package by distributions can result in deadlocks, but
requiring only an upstream pre-approval is I think a good way to deal
with the issue. That's what DRM/KMS does, there's no hard requirement
(as far as I can tell) to have code merged in Mesa before the kernel,
only a requirement of getting the Mesa code reviewed and acked.

> > As a maintainer of several Fedora packages myself, I would refuse
> > any attempts of adding support for a non-accepted kernel API on
> > the packages I maintain.
> > 
> > -
> > 
> > Also, it makes no sense to add support on such general-purpose
> > distros for some hardware that will never be supported by it.
> > 
> > See, there are, for instance, some types of hardware that are
> > specific for some industry, like for instance, the CAN bus.
> > While CAN buses remain restricted to vehicles, it won't make any 
> > sense to crowd a general purpose distro with support for such
> > hardware. Such distros are not certified with ASIL. So, they
> > aren't allowed by law to be used inside vehicles.

I'm not sure that's the best example, CAN has uses in other types of
devices, some of which may run a general-purpose distribution.

> And github pile of ... is certified?
> 
> In attempt to find general solution for all types of APIs and devices,
> we won't solve anything.

That I agree with.

> So I suggest to return and talk about AI/ML devices and APIs that
> targeted for enterprise/cloud and needs to be supported by major
> distros.

And that I don't :-) I think the issue is the same for at least GPUs and
AI/ML accelerators, and quite possibly camera ISPs too. I'd like to try
and define clear sets of criteria to address the problem, and that can
include different alternatives (just as an example, not necessarily
something I'd advocate for, open userspace vs. documentation) that
subsystems can then select based on their specific situation.

-- 
Regards,

Laurent Pinchart


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12 14:53                     ` Laurent Pinchart
@ 2021-09-12 15:41                       ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 77+ messages in thread
From: Mauro Carvalho Chehab @ 2021-09-12 15:41 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Leon Romanovsky, Thomas Gleixner, Josh Triplett, Jonathan Corbet,
	ksummit

Em Sun, 12 Sep 2021 17:53:51 +0300
Laurent Pinchart <laurent.pinchart@ideasonboard.com> escreveu:

> Hello,
> 
> On Sun, Sep 12, 2021 at 11:00:47AM +0300, Leon Romanovsky wrote:
> > On Sun, Sep 12, 2021 at 09:46:48AM +0200, Mauro Carvalho Chehab wrote:  
> > > Em Sun, 12 Sep 2021 07:27:55 +0300 Leon Romanovsky escreveu:
> > >   
> > > > > What if we're dealing with a device that only exists in a handful of
> > > > > machines though ? Would distributions accept the burden of packaging
> > > > > corresponding userspace code, and maintaining the packages, when only a
> > > > > handful of people in the world will use it ? It's a genuine question.    
> > > > 
> > > > Fedora, Debian and OpenSuSE are volunteer based distributions, they
> > > > accept new packages, which need to be prepared (or asked to be
> > > > prepared) by such vendors.
> > > > 
> > > > There is no "accept the burden of packaging corresponding userspace code,
> > > > and maintaining the packages", it is on package maintainer who can or
> > > > can't be associated with distribution.  
> > > 
> > > There is a dead lock issue, though: if we're willing to have a policy
> > > of only accepting a new Kernel API after Fedora/Debian/openSuse accepts
> > > its userspace counterpart, it would mean, in practice, that no new
> > > APIs will ever be added, as I'm pretty sure most Fedora/Debian/openSuse
> > > maintainers will refuse an application that depends on a non-accepted
> > > Kernel API.  
> > 
> > I said something different - "I would like to see any userspace API used (or to be
> > used)". 
> > https://lore.kernel.org/ksummit/20210912003349.6d2cacb1@coco.lan/T/#m3b7fbbe0959f1b59288dec9afd39f7cda0eeefe9
> > 
> > "To be used" means some open PR to existing package or request for
> > inclusion for new packages.  
> 
> Requiring userspace support to be merged in the appropriate framework or
> accepted as a package by distributions can result in deadlocks, but
> requiring only an upstream pre-approval is I think a good way to deal
> with the issue.
> 
> > > As a maintainer of several Fedora packages myself, I would refuse
> > > any attempts of adding support for a non-accepted kernel API on
> > > the packages I maintain.
> > > 
> > > -
> > > 
> > > Also, it makes no sense to add support on such general-purpose
> > > distros for some hardware that will never be supported by it.
> > > 
> > > See, there are, for instance, some types of hardware that are
> > > specific for some industry, like for instance, the CAN bus.
> > > While CAN buses remain restricted to vehicles, it won't make any 
> > > sense to crowd a general purpose distro with support for such
> > > hardware. Such distros are not certified with ASIL. So, they
> > > aren't allowed by law to be used inside vehicles.  
> 
> I'm not sure that's the best example, CAN has uses in other types of
> devices, some of which may run a general-purpose distribution.

Surely. That's why I added the "While CAN buses remain restricted to 
vehicles" on the above phrase. This was created for a demand from
one specific industry, but it could be used in other places.

The same happened in the past with cameras that required an ISP
IP block: they started being used only on embedded, but migrated 
to laptops and other devices after some time.

> > And github pile of ... is certified?
> > 
> > In attempt to find general solution for all types of APIs and devices,
> > we won't solve anything.  

A maintainer's summit discussion is the forum for discussing issues
that cross multiple subsystems. AI/ML is not the first case where
new APIs are needed, nor will be the last one. 

So, while I agree that AI/ML should be discussed, it can't stop on
it, as similar issues happen on other subsystems.

> > So I suggest to return and talk about AI/ML devices and APIs that
> > targeted for enterprise/cloud and needs to be supported by major
> > distros.  
> 
> And that I don't :-) I think the issue is the same for at least GPUs and
> AI/ML accelerators, and quite possibly camera ISPs too. I'd like to try
> and define clear sets of criteria to address the problem, and that can
> include different alternatives (just as an example, not necessarily
> something I'd advocate for, open userspace vs. documentation) that
> subsystems can then select based on their specific situation.

Agreed.

Thanks,
Mauro


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12 13:25                     ` Greg KH
  2021-09-12 14:15                       ` Leon Romanovsky
@ 2021-09-12 15:55                       ` Laurent Pinchart
  2021-09-12 16:43                         ` James Bottomley
  1 sibling, 1 reply; 77+ messages in thread
From: Laurent Pinchart @ 2021-09-12 15:55 UTC (permalink / raw)
  To: Greg KH
  Cc: Leon Romanovsky, Thomas Gleixner, Josh Triplett,
	Mauro Carvalho Chehab, Jonathan Corbet, ksummit

Hi Greg and Leon,

(Sorry to reply in the middle of the thread, but there is context I
wanted to reply to that has been deleted in the last e-mails)

On Sun, Sep 12, 2021 at 03:25:58PM +0200, Greg KH wrote:
> On Sun, Sep 12, 2021 at 11:29:45AM +0300, Leon Romanovsky wrote:
> > On Sun, Sep 12, 2021 at 09:26:57AM +0200, Greg KH wrote:
> > > On Sun, Sep 12, 2021 at 07:27:55AM +0300, Leon Romanovsky wrote:
> > > > On Sun, Sep 12, 2021 at 01:04:01AM +0300, Laurent Pinchart wrote:
> > > > > On Sat, Sep 11, 2021 at 03:04:07PM +0300, Leon Romanovsky wrote:
> > > > > > On Sat, Sep 11, 2021 at 02:41:52PM +0300, Laurent Pinchart wrote:
> > > > > > > On Sat, Sep 11, 2021 at 01:31:02PM +0300, Leon Romanovsky wrote:
> > > > > > > > On Sat, Sep 11, 2021 at 01:55:16AM +0200, Thomas Gleixner wrote:
> > > > > > > > > On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote:
> > > > > > > > > 
> > > > > > > > > > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote:
> > > > > > > > > >> On media, enforcing userspace to always be open source would
> > > > > > > > > >> have been very bad, as it would prevent several videoconferencing 
> > > > > > > > > >> software to exist on Linux.
> > > > > > > > > >
> > > > > > > > > > I don't think we should enforce that all userspace users of an interface
> > > > > > > > > > be Open Source. I do think we should enforce that *some* userspace user
> > > > > > > > > > of an interface be Open Source before we add the interface.
> > > > > > > > > 
> > > > > > > > > The real question is whether the interface is documented in a way that
> > > > > > > > > an Open Source implementation is possible. It does not matter whether it
> > > > > > > > > exists at that point in time or not. Even if it exists there is no
> > > > > > > > > guarantee that it is feature complete.
> > > > > > > > > 
> > > > > > > > > Freely accessible documentation is really the key.
> > > > > > > > 
> > > > > > > > I have more radical view than you and think that documentation is far
> > > > > > > > from being enough. I would like to see any userspace API used (or to be
> > > > > > > > used) in any package which exists in Debian/Fedora/SuSE.
> > > > > > > 
> > > > > > > We probably need to add Android AOSP to that list, as we have
> > > > > > > Android-specific APIs (not that I believe we *should* have
> > > > > > > Android-specific APIs, there's been lots of efforts over the past years
> > > > > > > to develop standard APIs for use cases that stem from Android, slowly
> > > > > > > replacing Android-specific APIs in some area, but I don't believe we can
> > > > > > > realistically bridge that gap completely overnight, if ever).
> > > > > > 
> > > > > > Maybe.
> > > > > > 
> > > > > > > > Only this will give us some sort of confidence that API and device are usable
> > > > > > > > to some level. As a side note, we will be able to estimate possible API
> > > > > > > > deprecation/fix/extension based on simple search in package databases.
> > > > > > > 
> > > > > > > Linux supports devices from very diverse markets, from very tiny
> > > > > > > embedded devices to supercomputers. We have drivers for devices that
> > > > > > > exist in data centres of a single company only, or for which only a
> > > > > > > handful of units exist through the world. The set of rules that we'll
> > > > > > > decide on, if any, should take this into account.
> > > > > > 
> > > > > > I'm part of that group (RDMA) who cares about enterprise, cloud and supercomputers. :)
> > > > > > So for us, working out-of-the box (distro packages and not github code drops) is
> > > > > > the key to the scalability.
> > > > > 
> > > > > What if we're dealing with a device that only exists in a handful of
> > > > > machines though ? Would distributions accept the burden of packaging
> > > > > corresponding userspace code, and maintaining the packages, when only a
> > > > > handful of people in the world will use it ? It's a genuine question.
> > > > 
> > > > Fedora, Debian and OpenSuSE are volunteer based distributions, they
> > > > accept new packages, which need to be prepared (or asked to be
> > > > prepared) by such vendors.
> > > > 
> > > > There is no "accept the burden of packaging corresponding userspace code,
> > > > and maintaining the packages", it is on package maintainer who can or
> > > > can't be associated with distribution.
> > > > 
> > > > > > Regarding "embedded devices", I remind that we are talking about
> > > > > > userspace API and most likely busybox will be used for them, which is
> > > > > > also part of larger distro anyway, so fails under category "exists in
> > > > > > Debian/Fedora/SuSE".
> > > > > 
> > > > > We're talking about APIs exposed by drivers, for devices such as GPUs,
> > > > > cameras or AI/ML accelerators. I don't think busybox will exercise those
> > > > > > :-) We have Mesa for GPUs, libcamera for cameras, and other frameworks
> > > > > I'm less familiar with for AI/ML accelerators, and I expect those to be
> > > > > packaged by distributions. There are however other kind of devices that
> > > > > don't fall in existing well-defined categories.
> > > > 
> > > > I'm a little bit confused here. IMHO, you are trying to find an universal
> > > > solution for a problem that doesn't exist.
> > > > 
> > > > Above you asked how to deal with niche devices? Here you talk about mass
> > > > products devices for the enterprise while before you mentioned "embedded
> > > > devices".
> > > > 
> > > > 1. Niche devices - continue to do as they do it now, by supplying
> > > > out-of-tree solutions for their customers. Such devices and companies
> > > > rarely need upstream linux kernel support, because the burden to
> > > > upstream it is very high. We don't want them in the tree either, because
> > > > once they upstream it, the maintenance burden will be on us.
> > > 
> > > {sigh}
> > > 
> > > No, that is NOT our rule at all.
> > > 
> > > These devices and companies need to be upstream more than anything else
> > > as that way they become part of our community and are responsible for
> > > maintaining their code in the tree.  To force them to remain outside is
> > > to go against everything that many of us have been saying for _decades_
> > > now.
> > > 
> > > And how are you going to judge what is, and is not, a "niche" device?

I partly side with Greg here. I welcome drivers for "niche" devices,
regardless of how we define them, in the kernel, *if* they comply with
the rules. In some cases companies won't bother to upstream the code
because of the "niche" and ROI criteria, but that should only be their
decision, *not* something we force upon them.

There could be some exceptions, when the device architecture is so alien
that it would require an effort from the community that we just can't
afford at that particular point of time (rewriting the driver model for
instance), but I think that would be caught by the ROI criteria anyway.

> > I will leave to that company to decide. Again this is exactly how they
> > operate now, there is nothing new here. Every company calculates ROI
> > for working with upstream and small companies with niche devices are not
> > different here.
> > 
> > The main idea that I want to see working userspace stack, and being in
> > distro sets a certain quality level, am I asking too much?
> 
> Define "working userspace stack" and "distro" please.  Like others have
> said, many distros will not take userspace code unless it's already in
> the kernel tree first, as that ensures that the abi will not break.

As mentioned in another part of the mail thread, requiring code to be
merged in upstream userspace projects and/or packaged by distributions
will cause deadlocks, but requiring code to be submitted and
(pre-)approved is workable. That's what DRM/KMS does. To upstream a new
KMS property for instance, you need to show how it's going to be used in
Weston/Xorg/Android/... by submitting patches, and have the overall
architecture approved by the corresponding maintainers.

Does this raise the bar to entry ? Yes. But it also expands the
community: we've seen cases where vendors, being told that their random
unproven API wouldn't be accepted in the kernel, looked for options and
realized that other vendors were facing the exact same problem. This
leads to cross-vendor discussions and collaborative design of solutions.
That's the Linux kernel development model as far as I'm concerned.

I do however agree that defining "working userspace stack" precisely
will be difficult, but I don't see that as an unsolvable problem.
Whatever criteria we set, if someone wants to cheat, it will always be
possible. We need to assume a minimum level of good faith on all sides.
After all, if that wasn't the case, collaboration would be inherently
impossible. If a vendor is then caught cheating, that will damage their
reputation. We will be more cautious the next time they submit, and we
could even decide to drop drivers in that case (not that I'd push for
that in particular, it's just an example of options that we can
evaluate).

> > > > 2. Devices that hits the certain level of adoption - need to be
> > > > integrated into certain userspace stack, which needs to be part of
> > > > distro.
> > > 
> > > Distros are a very odd rule to rely on given that they are by far the
> > > minority of the usage in raw numbers for Linux in the world.
> > 
> > You can count Android as another distro, it is just semantics.
> 
> But how do you define Android's userspace?  Just one vendor?  2 vendors?
> 10 vendors?

Possibly AOSP ? We don't need to have device support merged into AOSP as
a criterion there, but if we have a multi-vendor framework that becomes
the de facto standard in Linux for a particular set of use cases (think
about Mesa for instance), having the framework included in generic
distributions and having the device support submitted in the framework
would be enough in my opinion. We don't need to wait until the support
for that particular device eventually reaches distributions when
packages are updated.

As I said previously, we need to consider the end goal, but also create
the path to achieve it. It's not fair to tell vendors what they have to
achieve if no way to do so exists.

> There is major userspace fragmentation in Android userspace
> in many places, the user/kernel boundary being one of the big ones as
> many of us have found out over the past years.  And many of us are
> working to resolve this, but it's not so simple at times, and I have
> many examples if you want specifics.
> 
> > > > And AI/ML is no different here, someone just need to start build such
> > > > stack. Otherwise, we will continue to see more free riders like HabanaLabs
> > > > which don't have any real benefit to the community.
> > > 
> > > Everyone contributes to Linux in a selfish manner, that's just how the
> > > community works.

That is not true, you're disregarding at least hobbyists here. They
contribute for a wide variety of reasons, and most often get something
in return (a working device, knowledge, a professional reputation, or
even just the sense of contributing to humanity). The same is true of
some companies too. Unless we're getting in the philosophical debate of
whether true altruism even exists (I'd be happy to discuss that, but not
here), I wouldn't qualify all this as selfish.

> > > The work that companies like habanalabs is NOT being a
> > > "free rider" at all, they have worked with us and done the hard work of
> > > actually getting their code merged into the tree.
> > 
> > I perfectly remember them trying to bypass netdev and RDMA communities
> > by pretending "misc" device.
> > 
> > https://lore.kernel.org/linux-rdma/20200915133556.21268811@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/
> > https://lore.kernel.org/linux-rdma/20200917171833.GJ8409@ziepe.ca/
> > 
> > Or DRM
> > https://lore.kernel.org/linux-rdma/CAKMK7uFOfoxbD2Z5mb-qHFnUe5rObGKQ6Ygh--HSH9M=9bziGg@mail.gmail.com/
> > 
> > So I can agree with the statement "worked hard", but not with the
> > relevant communities.
> 
> I point at these as doing exactly what we want vendors to be doing!
> Thank you for finding the good examples.  This is a vendor submitting
> patches and saying, "here is what we want to do, with a first cut at
> doing it."  It's up to us as a community to tell them if they are doing
> it the right way or not.

Isn't it exactly what we're discussing here ? Isn't telling them that we
can't accept the driver because the device can't be used at all without
their closed-source blob an acceptable way of saying they're doing it
wrong ?

> If we just let them all go their own ways, they will come up with
> horrible apis and interfaces, we have all seen that before.
> 
> So by working together, we both can learn from, and work together to
> solve the issue.  And that is what these driver authors and company has
> been doing!  They are part of our community, why are you saying they
> should now just go do their own thing away from us?

I feel some cognitive dissonance here :-) I don't really interpret
anyone's comment in this mail thread as telling vendors to go away.
Quite the contrary, as I mentioned above, requiring open userspace leads
to standardized userspace framework and better designs in the end, which
*is* community building.

> And as for "bypassing", that feels very mean.  We have had accelerator
> code in the char/misc and other parts of the kernel tree since at least
> 2018 if not earlier (I didn't look all that hard.)

That's when subsystems have been bypassed. I vividly recall a discussion
at plumbers on this topic a few years ago, about creating an accelerator
subsystem and what requirements it should have. Some people pushed for
an unregulated subsystem with vendor-specific per-driver userspace APIs,
and some called for standardization of frameworks in userspace.  There
was no agreement at the time, but instead of trying to continue the
effort, vendors got given a backdoor in drivers/misc/. I'll let the
corresponding community members speak up if they recognize themselves
and want to participate in the discussion, but I can tell you that this
has been felt as a betrayal of our core values and a major blow to many
attempts at fostering collaboration in userspace.

> Just because someone
> wanted to use the in-kernel apis that are there (why is dma-buf some
> magic thing?) does not mean that they suddenly need to move to a
> different subsystem.

It means they should have been in a different subsystem from the
beginning. After the bitter taste that the accelerators mess left in
2018, I know some people decided they had no other option than to ignore
those drivers, as long as they would stay there and do their own thing
by themselves. Adding support for dma-buf means interoperating with
other devices and drivers. That's a clear indication that those drivers
are spreading their reach within the kernel, and if we accept this, then
vendors will have free rein to bypass any subsystem for any type of
device by claiming it's a bit different. You mentioned you don't like
fragmentation, that's exactly what we would have.

> We get at least 1-2 new subsystems and major drivers that get added to
> the kernel tree that do things that have never been done before with
> custom user/kernel apis every kernel release.  Not everything can be a
> standard api no matter how much I, and others, wish it were.

I don't think anyone has called for everything being standard. The point
we're discussing is whether the non-standard APIs need to have a
corresponding open userspace.

On a side note, it's also not all black or white, in many cases devices
expose a standard API with device-specific pieces. A GPU driver uses the
DRM API, which standardizes a set of common operations, but also has a
set of custom ioctls to submit jobs to the GPU. Over time we may find
that some of those custom ioctls may be standardized, but certainly not
all of them. That's fine, nobody is complaining about that.

> As examples, what about the hyperv blob api that was submitted recently
> going around the block layer?  What about the new Intel accelerator that
> added yet-another-set-of-custom-ioctls?  What about the rpi drivers?
> What about the virtualbox drivers?  Should all of those just live
> outside of the kernel for forever?

I'll comment on the RPi drivers only as I'm not familiar with the rest.

It's interesting that you mention Raspberry Pi, as I've been working
with them over the past couple of years to upstream camera support.
They've had out-of-tree camera drivers since 2013, available in the RPi
downstream kernel only. The situation is changing, we're working on
upstreaming those drivers. This has required a very large amount of work
in two areas:

- In the kernel, the drivers use V4L2 with custom extensions that make
  them incompatible with camera sensor drivers in upstream. This means
  that merging, for instance, the RPi driver for the Sony IMX477 sensor
  would make it usable on a RPi, but not on any other device. To solve
  this we're working on standardizing V4L2 extensions to cover the
  corresponding use cases. It's a large amount of work, which we've only
  been able to do by finding multiple vendors who are facing the same
  issues and convincing them to sponsor the development. If camera
  drivers could be merged in drivers/misc/ this would never have been
  possible.

  Note that large companies that have the resources to solve these issues
  often lack either the will or the knowledge, if not both. If you look
  at out-of-tree camera drivers from NVidia, Intel or other vendors, you
  will see duplicated drivers for camera sensors from Sony, OmniVision,
  ON Semi, ... that are bundled with the SoC kernel camera drivers,
  implemented in incompatible ways.

- In userspace, RPi didn't have any framework in which to upstream any
  code, as there was simply no userspace camera framework (V4L2 is a
  kernel API designed to be used directly by applications, the V4L2
  equivalent to libalsa was a historical mistake and is considered
  legacy now). RPi has thus moved all the camera code that can't live in
  the kernel to a firmware (there's lots of complex algorithms that need
  to be implemented to make an embedded camera work, unlike USB webcams
  that are comparatively extremely simple to handle in all layers of the
  stack because the complex part is implemented inside the webcam). The
  firmware had support for 3 camera sensors, and that's more or less all
  that could be used on a RPi. Adding support for new sensors wasn't
  possible for users, creating a very closed stack.

  The situation has changed with the development of the libcamera
  project. We now have a framework where vendors can upstream the
  device-specific userspace code, and RPi has done so. They've been more
  open than any other camera ISP vendor on the market today (there was
  also the TI OMAP3 that had public ISP documentation, but that's legacy
  and since then vendors have shifted to keeping everything closed).

  Here too this has been made possible because we have identified a
  problem and tried to fix it. It's a complex area, the amount of work
  required is huge, and it's very difficult to get vendors to do the
  right thing and contribute. I see camera support as another example of
  a situation where most vendors do it wrong, and we have to push them
  to collaborate and do it right instead. If we allowed camera kernel
  drivers with custom undocumented APIs and no open userspace, none of
  this would be possible.

-- 
Regards,

Laurent Pinchart


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12 14:34                         ` Greg KH
@ 2021-09-12 16:41                           ` Laurent Pinchart
  2021-09-12 20:35                           ` Dave Airlie
                                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 77+ messages in thread
From: Laurent Pinchart @ 2021-09-12 16:41 UTC (permalink / raw)
  To: Greg KH
  Cc: Leon Romanovsky, Thomas Gleixner, Josh Triplett,
	Mauro Carvalho Chehab, Jonathan Corbet, ksummit

Hi Greg,

On Sun, Sep 12, 2021 at 04:34:48PM +0200, Greg KH wrote:
> On Sun, Sep 12, 2021 at 05:15:30PM +0300, Leon Romanovsky wrote:
> > On Sun, Sep 12, 2021 at 03:25:58PM +0200, Greg KH wrote:
> > > > The main idea that I want to see working userspace stack, and being in
> > > > distro sets a certain quality level, am I asking too much?
> > > 
> > > Define "working userspace stack" and "distro" please.  Like others have
> > > said, many distros will not take userspace code unless it's already in
> > > the kernel tree first, as that ensures that the abi will not break.
> > 
> > Like I already answered
> > https://lore.kernel.org/all/YT2zryAKHc%2F5R2IH@unreal/
> > "To be used" means some open PR to existing package or request for
> > inclusion for new packages.
> 
> But again, distros will not take things that are not already in the
> kernel.

It's becoming difficult to follow the discussion as it has branched.
I've replied on this topic separately.

> > > > > > 2. Devices that hits the certain level of adoption - need to be
> > > > > > integrated into certain userspace stack, which needs to be part of
> > > > > > distro.
> > > > > 
> > > > > Distros are a very odd rule to rely on given that they are by far the
> > > > > minority of the usage in raw numbers for Linux in the world.
> > > > 
> > > > You can count Android as another distro, it is just semantics.
> > > 
> > > But how do you define Android's userspace?  Just one vendor?  2 vendors?
> > > 10 vendors?  There is major userspace fragmentation in Android userspace
> > > > in many places, the user/kernel boundary being one of the big ones as
> > > many of us have found out over the past years.  And many of us are
> > > working to resolve this, but it's not so simple at times, and I have
> > > many examples if you want specifics.
> > 
> > > Laurent suggested AOSP
> > https://lore.kernel.org/all/YTyWANV%2FmSkQbYhj@pendragon.ideasonboard.com/
> 
> Vendors can not get code into AOSP for various reasons that only Google
> understands.  There are many millions, if not billions of Android
> devices out there with user/kernel apis that are not upstream nor in
> AOSP because Google doesn't want to take them, or because the vendor can
> not go through those hoops (international law is tricky at times...)
> 
> So are we to just not be able to take drivers that add those new apis if
> AOSP can not take the userspace side, yet the userspace side is
> published somewhere else?

"Open userspace" and "packaged in distros" are two criteria that have
been proposed. There are more, such as "open documentation" for
instance. It's up to us to decide what to do (if anything), and I don't
believe we'll be able to find one-size-fits-them-all criteria that can
apply globally. There is however in my opinion value in carefully
designing a set of criteria and documenting them, to then for instance
let subsystems pick the ones that work best for the type of devices they
handle.

The "packaged in distros" criterion is, as I understand it, an attempt to
avoid code dumps on github that would have been so badly designed that
they would be unmaintainable. It's a tricky area, what I think is
required is that vendors publish an open userspace implementation that
is serious enough, and not just a way to tick a box while circumventing
the spirit of the rule. Distro packaging may help achieve that, but
there are certainly other ways too. For me, at the end of the day it's
really about how to create a community starting from a single
implementation.

> > > > > > And AI/ML is no different here, someone just need to start build such
> > > > > > stack. Otherwise, we will continue to see more free riders like HabanaLabs
> > > > > > which don't have any real benefit to the community.
> > > > > 
> > > > > Everyone contributes to Linux in a selfish manner, that's just how the
> > > > > community works.  The work that companies like habanalabs is NOT being a
> > > > > "free rider" at all, they have worked with us and done the hard work of
> > > > > actually getting their code merged into the tree.
> > > > 
> > > > I perfectly remember them trying to bypass netdev and RDMA communities
> > > > by pretending "misc" device.
> > > > 
> > > > https://lore.kernel.org/linux-rdma/20200915133556.21268811@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/
> > > > https://lore.kernel.org/linux-rdma/20200917171833.GJ8409@ziepe.ca/
> > > > 
> > > > Or DRM
> > > > https://lore.kernel.org/linux-rdma/CAKMK7uFOfoxbD2Z5mb-qHFnUe5rObGKQ6Ygh--HSH9M=9bziGg@mail.gmail.com/
> > > > 
> > > > So I can agree with the statement "worked hard", but not with the
> > > > relevant communities.
> > > 
> > > I point at these as doing exactly what we want vendors to be doing!
> > > Thank you for finding the good examples.  This is a vendor submitting
> > > patches and saying, "here is what we want to do, with a first cut at
> > > doing it."  It's up to us as a community to tell them if they are doing
> > > it the right way or not.
> > > 
> > > If we just let them all go their own ways, they will come up with
> > > horrible apis and interfaces, we have all seen that before.
> > > 
> > > So by working together, we both can learn from, and work together to
> > > solve the issue.  And that is what these driver authors and company has
> > > been doing!  They are part of our community, why are you saying they
> > > should now just go do their own thing away from us?
> > 
> > This is not what I said. I don't see Intel (habanalabs) as a company
> > that can't create proper AI stack and think that this is our
> > responsibility to provide them enough incentive to do it.
> 
> So should we be forcing everyone to follow the IBM standard for
> accelerator drivers because they were in the kernel first all those
> years ago?  Or what other standard do we pick?
> 
> And why are we dictating new industry standards here?  Who are we to do
> that?  Who is going to take that responsibility on?
> 
> > > And as for "bypassing", that feels very mean.  We have had accelerator
> > > code in the char/misc and other parts of the kernel tree since at least
> > > 2018 if not earlier (I didn't look all that hard.)  Just because someone
> > > wanted to use the in-kernel apis that are there (why is dma-buf some
> > > magic thing?) does not mean that they suddenly need to move to a
> > > different subsystem.
> > 
> > Because dma-buf API has specific semantics and was designed with very
> > specific usage model in mind.
> 
> So will the IB patches usage be re-reviewed?
> 
> Anyway, we have apis that are used throughout the kernel all the time
> that don't end up on the various subsystem mailing list because people
> forget, or just do not know.  That's normal and something we have dealt
> with for forever.  As an example, I didn't realise that just using the
> dma-buf api required such a review.
> 
> Can we put that in the MAINTAINERS file somehow for apis?
> 
> > > We get at least 1-2 new subsystems and major drivers that get added to
> > > the kernel tree that do things that have never been done before with
> > > custom user/kernel apis every kernel release.  Not everything can be a
> > > standard api no matter how much I, and others, wish it were.
> > 
> > > So when will you draw a line and ask to create proper subsystem
> > with standard APIs? After 2, 3 ... 100 similar (from our point of view)
> > and different (from vendor point of view) devices with custom API?
> 
> That is a great question and I do not have the answer to that.  Should
> we have done that after the first one went into the kernel all those
> years ago?  Maybe, but I seem to recall the answer being "our hardware
> works much differently, so our user api will be much different", and
> that's a valid answer.

And it's also the answer that all vendors will give, because it's an
easy way to avoid doing extra work. It may sometimes be true, but that's
an exception rather than a rule.

It reminds me of something I've heard in a working group recently, when
someone mentioned a "key differentiating factor" that requires a free
ticket for vendors not to open the implementation, and a few seconds
later went on to say it was "available in all phones in the market
today". I won't call these lies, I believe that in most cases the
vendors actually believe it's true.

> If your standard can not handle new usage models and a way to handle
> that, then it isn't a good standard that companies will follow for new
> types of devices.
> 
> We have loads of char drivers with odd ioctl apis because we have loads
> of odd hardware devices out in the world.  We have been treating these
> accelerators like that for a long time now, except when they try to
> duplicate existing in-kernel code (like crypto or networking).

Going back to the "is an accelerator a GPU?" topic for a bit, DRM
doesn't prevent drivers from exposing custom features with custom API
elements. AI/ML accelerators aren't GPUs in the original sense of 3D
rendering accelerators (maybe that's a cause of misunderstanding, we're
not using the best terminology), but they fit pretty well within the
device model that DRM creates. The side effect of using DRM is that an
open userspace is required, and this is why some people in the community
believe Habanalabs tried to work around that rule by going for
drivers/misc/. I don't know enough about the history to know if they
were behaving in good faith or not, but maybe we could try to turn this
page by deciding on the right path forward together and forget about the
finger pointing and blaming.

> > > As examples, what about the hyperv blob api that was submitted recently
> > > going around the block layer?  What about the new Intel accelerator that
> > > added yet-another-set-of-custom-ioctls?  What about the rpi drivers?
> > > What about the virtualbox drivers?  Should all of those just live
> > > outside of the kernel for forever?
> > > 
> > > Of course not.
> > 
> > So what is your bar? Accept everything?
> 
> It's a hard line to draw, and for some reason, I seem to be the one
> having to review these types of drivers every kernel release.  If people
> wish to help me out, please do so, all the patches are on the lists.

This may be a controversial point, but could it be because vendors
perceive you as less likely to look closely and push back ? If
drivers/misc/ is seen as being free-for-all and other subsystems are
likely to ask for more work, natural laziness will push vendors to
drivers/misc/. 

> Right now I push back where I can and try to get semi-sane apis created
> that are "obviously not wrong" where I notice.  After that, I just need
> to trust that the maintainer of the driver knows what they are doing and
> will maintain the code going forward.  So far, it's worked out.
> 
> Do you have a better idea of what to do instead?

Is there a way we could push those drivers more strongly towards other
subsystems ? There's certainly no way you will be able to foster the
creation of a dozen userspace frameworks and related communities from
drivers/misc/ by yourself.

-- 
Regards,

Laurent Pinchart


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12 15:55                       ` Laurent Pinchart
@ 2021-09-12 16:43                         ` James Bottomley
  2021-09-12 16:58                           ` Laurent Pinchart
  0 siblings, 1 reply; 77+ messages in thread
From: James Bottomley @ 2021-09-12 16:43 UTC (permalink / raw)
  To: Laurent Pinchart, Greg KH
  Cc: Leon Romanovsky, Thomas Gleixner, Josh Triplett,
	Mauro Carvalho Chehab, Jonathan Corbet, ksummit

On Sun, 2021-09-12 at 18:55 +0300, Laurent Pinchart wrote:
> As mentioned in another part of the mail thread, requiring code being
> merged in upstream userspace projects and/or packaged by
> distributions will cause deadlocks, but requiring code to be
> submitted and (pre-)approved is workable. That's what DRM/KMS does.
> To upstream a new KMS property for instance, you need to show how
> it's going to be used in Weston/Xorg/Android/... by submitting
> patches, and have the overall architecture approved by the
> corresponding maintainers.

This is no different from interlocks required in pretty much every
other open source feature that crosses projects, so it seems like the right
approach to me.  We already use this for confidential computing, which
often requires interlocking changes to QEMU, edk2 and other tools. 
Usually, for confidential computing, the evaluation is on either the
QEMU or edk2 list which then accepts the patch and the rest of the
projects follow.  We do, occasionally, get a late objection to the API
from one of the other projects after part of the enabling code has gone
upstream in the others, but we handle this like a bug fix.

James





* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12 16:43                         ` James Bottomley
@ 2021-09-12 16:58                           ` Laurent Pinchart
  2021-09-12 17:08                             ` James Bottomley
  0 siblings, 1 reply; 77+ messages in thread
From: Laurent Pinchart @ 2021-09-12 16:58 UTC (permalink / raw)
  To: James Bottomley
  Cc: Greg KH, Leon Romanovsky, Thomas Gleixner, Josh Triplett,
	Mauro Carvalho Chehab, Jonathan Corbet, ksummit

On Sun, Sep 12, 2021 at 09:43:39AM -0700, James Bottomley wrote:
> On Sun, 2021-09-12 at 18:55 +0300, Laurent Pinchart wrote:
> > As mentioned in another part of the mail thread, requiring code being
> > merged in upstream userspace projects and/or packaged by
> > distributions will cause deadlocks, but requiring code to be
> > submitted and (pre-)approved is workable. That's what DRM/KMS does.
> > To upstream a new KMS property for instance, you need to show how
> > it's going to be used in Weston/Xorg/Android/... by submitting
> > patches, and have the overall architecture approved by the
> > corresponding maintainers.
> 
> This is no different from interlocks required in pretty much every
> other project crossing open source feature, so it seems like the right
> approach to me.  We already use this for confidential computing, which
> often requires interlocking changes to QEMU, edk2 and other tools. 
> Usually, for confidential computing, the evaluation is on either the
> QEMU or edk2 list which then accepts the patch and the rest of the
> projects follow.  We do, occasionally, get a late objection to the API
> from one of the other projects after part of the enabling code has gone
> upstream in the others, but we handle this like a bug fix.

On the DRM/KMS side that's also handled fine as far as I know, as
mentioned above. For cameras, libcamera is becoming the de facto
standard userspace stack, so we'll have a solution too. The harder
question is what to do when no standard userspace stack exists. The
answer obviously is to create one (or possibly multiple alternatives),
but we'll need more than wishful thinking to make that happen. I can
tell it took lots of work for libcamera to see the light of day,
including on the business side of it, not just the technical side.

-- 
Regards,

Laurent Pinchart


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12 16:58                           ` Laurent Pinchart
@ 2021-09-12 17:08                             ` James Bottomley
  0 siblings, 0 replies; 77+ messages in thread
From: James Bottomley @ 2021-09-12 17:08 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Greg KH, Leon Romanovsky, Thomas Gleixner, Josh Triplett,
	Mauro Carvalho Chehab, Jonathan Corbet, ksummit

On Sun, 2021-09-12 at 19:58 +0300, Laurent Pinchart wrote:
> On Sun, Sep 12, 2021 at 09:43:39AM -0700, James Bottomley wrote:
> > On Sun, 2021-09-12 at 18:55 +0300, Laurent Pinchart wrote:
> > > As mentioned in another part of the mail thread, requiring code
> > > being merged in upstream userspace projects and/or packaged by
> > > distributions will cause deadlocks, but requiring code to be
> > > submitted and (pre-)approved is workable. That's what DRM/KMS
> > > does. To upstream a new KMS property for instance, you need to
> > > show how it's going to be used in Weston/Xorg/Android/... by
> > > submitting patches, and have the overall architecture approved by
> > > the corresponding maintainers.
> > 
> > This is no different from interlocks required in pretty much every
> > other project crossing open source feature, so it seems like the
> > right approach to me.  We already use this for confidential
> > computing, which often requires interlocking changes to QEMU, edk2
> > and other tools.  Usually, for confidential computing, the
> > evaluation is on either the QEMU or edk2 list which then accepts
> > the patch and the rest of the projects follow.  We do,
> > occasionally, get a late objection to the API from one of the other
> > projects after part of the enabling code has gone upstream in the
> > others, but we handle this like a bug fix.
> 
> On the DRM/KMS side that's also handled fine as far as I know, as
> mentioned above. For cameras, libcamera is becoming the de facto
> standard userspace stack, so we'll have a solution too. The harder
> question is what to do when no standard userspace stack exists. The
> answer obviously is to create one (or possibly multiple
> > alternatives), but we'll need more than wishful thinking to make that
> > happen. I can tell it took lots of work for libcamera to see the
> > light of day, including on the business side of it, not just the
> technical side.

Well, you know, this is where Open Source as a Standard comes from.  We
see the same in Confidential Computing.  There are several
manufacturers and they always specify how they think their stuff should
work in their standards or code drops, but rarely get beyond a proof of
concept in their own labs.  Once we start moving it upstream, we find
points of similarity between the different chip vendors, or sometimes
specified implementations which plain don't work, and start modifying
the APIs to take this into account.  What we eventually end up with
often doesn't mirror what the manufacturer standard says but it ends up
being the actual standard for current and future confidential computing
chips.

James




* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-10 21:00 [MAINTAINER SUMMIT] User-space requirements for accelerator drivers Jonathan Corbet
                   ` (2 preceding siblings ...)
  2021-09-10 22:52 ` Mauro Carvalho Chehab
@ 2021-09-12 19:13 ` Dave Airlie
  2021-09-12 19:48   ` Laurent Pinchart
  3 siblings, 1 reply; 77+ messages in thread
From: Dave Airlie @ 2021-09-12 19:13 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: ksummit

On Sat, 11 Sept 2021 at 07:10, Jonathan Corbet <corbet@lwn.net> wrote:
>
> There has been a regular disagreement in recent years about whether
> drivers for accelerators (such as for the Habana Gaudi device) should be
> subject to the same requirements as GPU drivers when it comes to the
> availability of a free implementation of the user-space side.  It flared
> up again recently:
>
>    https://lwn.net/Articles/867168/
>
> Happily, the Habana situation in particular seems to be resolving
> itself:
>
>    https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/
>
> But even there it is clear that the fundamental question has not yet
> been resolved.
>
> This seems like the sort of question that the maintainer summit exists
> to address.  Specifically, we could discuss:
>
>  - Under which circumstances should the kernel community require the
>    existence of freely licensed user-space code that exercises all
>    functionalities of a proposed kernel driver or feature?
>
>  - Are there internal kernel interfaces, such as DMA-BUF or P2PDMA, that
>    are only available to drivers with a free user-space implementation?
>    Do we need an EXPORT_SYMBOL_USERSPACE_GPL()?
>
>  - What constitutes an acceptable user-space implementation in cases
>    where these restrictions apply?
>
> I suspect that more clarity (and fewer arguments) on these questions
> would be welcome both within and beyond the development community.
>
> Thanks,

Can everyone take a read of:

https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements

I think in order to clean the signal/noise ratio up in here, some
education effort might help people realise how non-trivial these
things are.

1. These drivers are not one or two ioctls that a few selftests and a
small test app can cover. It's like saying LTP is all we need to
define the uAPI for the kernel and if anyone does something LTP
doesn't cover the app is broken. These systems are generally complex,
multithreaded and multiuser uAPIs, involving command streams recorded
in userspace being submitted to the devices. They interact with memory
management and can cause unfindable deadlocks across the system if
designed incorrectly. Documentation or kselftests aren't going to cut
it here.

2. In my experience we don't build communities by merging everything,
we build them by saying No more and pushing back on companies with
education and cross-vendor cooperation. Responsible kernel maintenance
shouldn't end at the kernel boundaries. If you aren't the person to
help enforce a userspace for a driver you are being asked to merge,
then don't merge it, but try and engage the vendor with the
communities of interest in the kernel who already deal in those areas.

3. The pressure on these companies to merge things into Linux isn't
altruistic or even that they necessarily want to be in the Linux
kernel upstream. They are being told by Red Hat, Facebook, Google or
someone else that they need an upstream driver. They will generally
engage at a minimal level to get past that blockage and then
disengage. Having a clear set of rules (or a place to discuss those
rules, for new subsystems) and a gentle pushback helps develop
communities by unlocking funding within those larger areas. As Laurent
has said this isn't free, but just putting things into the kernel and
not caring about userspace hasn't built any Linux communities in the
accelerator areas.

That said I started writing a cleaned up version of the above document
which is more generic that other subsystems could sign on to. I was
going to engage with a coalition of like-minded maintainers rather
than trying to gain consensus among a herd of cats to see if we can
draw clearer lines in the sand that cross more subsystems so the
experience of drivers/gpu doesn't go to waste but also isn't just
bypassed by subsystem hunting.

https://cgit.freedesktop.org/~airlied/linux/log/?h=wip-open-source-userspace

Dave.


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12 19:13 ` Dave Airlie
@ 2021-09-12 19:48   ` Laurent Pinchart
  2021-09-13  2:26     ` Dave Airlie
  0 siblings, 1 reply; 77+ messages in thread
From: Laurent Pinchart @ 2021-09-12 19:48 UTC (permalink / raw)
  To: Dave Airlie; +Cc: Jonathan Corbet, ksummit

Hi Dave,

On Mon, Sep 13, 2021 at 05:13:05AM +1000, Dave Airlie wrote:
> On Sat, 11 Sept 2021 at 07:10, Jonathan Corbet <corbet@lwn.net> wrote:
> >
> > There has been a regular disagreement in recent years about whether
> > drivers for accelerators (such as for the Habana Gaudi device) should be
> > subject to the same requirements as GPU drivers when it comes to the
> > availability of a free implementation of the user-space side.  It flared
> > up again recently:
> >
> >    https://lwn.net/Articles/867168/
> >
> > Happily, the Habana situation in particular seems to be resolving
> > itself:
> >
> >    https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/
> >
> > But even there it is clear that the fundamental question has not yet
> > been resolved.
> >
> > This seems like the sort of question that the maintainer summit exists
> > to address.  Specifically, we could discuss:
> >
> >  - Under which circumstances should the kernel community require the
> >    existence of freely licensed user-space code that exercises all
> >    functionalities of a proposed kernel driver or feature?
> >
> >  - Are there internal kernel interfaces, such as DMA-BUF or P2PDMA, that
> >    are only available to drivers with a free user-space implementation?
> >    Do we need an EXPORT_SYMBOL_USERSPACE_GPL()?
> >
> >  - What constitutes an acceptable user-space implementation in cases
> >    where these restrictions apply?
> >
> > I suspect that more clarity (and fewer arguments) on these questions
> > would be welcome both within and beyond the development community.
> >
> > Thanks,
> 
> Can everyone take a read of:
> 
> https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements
> 
> I think in order to clean the signal/noise ratio up in here, some
> education effort might help people realise how non-trivial these
> things are.
> 
> 1. These drivers are not one or two ioctls that a few selftests and a
> small test app can cover. It's like saying LTP is all we need to
> define the uAPI for the kernel and if anyone does something LTP
> doesn't cover the app is broken. These systems are generally complex,
> multithreaded and multiuser uAPIs, involving command streams recorded
> in userspace being submitted to the devices. They interact with memory
> management and can cause unfindable deadlocks across the system if
> designed incorrectly. Documentation or kselftests aren't going to cut
> it here.
> 
> 2. In my experience we don't build communities by merging everything,
> we build them by saying No more and pushing back on companies with
> education and cross-vendor cooperation. Responsible kernel maintenance
> shouldn't end at the kernel boundaries. If you aren't the person to
> help enforce a userspace for a driver you are being asked to merge,
> then don't merge it, but try and engage the vendor with the
> communities of interest in the kernel who already deal in those areas.
> 
> 3. The pressure on these companies to merge things into Linux isn't
> altruistic or even that they necessarily want to be in the Linux
> kernel upstream. They are being told by Red Hat, Facebook, Google or
> someone else that they need an upstream driver. They will generally
> engage at a minimal level to get past that blockage and then
> disengage. Having a clear set of rules (or a place to discuss those
> rules, for new subsystems) and a gentle pushback helps develop
> communities by unlocking funding within those larger areas. As Laurent
> has said this isn't free, but just putting things into the kernel and
> not caring about userspace hasn't built any Linux communities in the
> accelerator areas.
> 
> That said I started writing a cleaned up version of the above document
> which is more generic that other subsystems could sign on to. I was
> going to engage with a coalition of like-minded maintainers rather
> than trying to gain consensus among a herd of cats to see if we can
> draw clearer lines in the sand that cross more subsystems so the
> experience of drivers/gpu doesn't go to waste but also isn't just
> bypassed by subsystem hunting.
> 
> https://cgit.freedesktop.org/~airlied/linux/log/?h=wip-open-source-userspace

Thank you for that effort. Could you add camera ISPs to the list with
FPGAs, DSPs and ML accelerators ?

You mention Level0 in that document. I assume you don't mean the
OpenStreetMap editor ?

-- 
Regards,

Laurent Pinchart


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12  7:26                 ` Greg KH
  2021-09-12  8:29                   ` Leon Romanovsky
@ 2021-09-12 19:52                   ` Dave Airlie
  1 sibling, 0 replies; 77+ messages in thread
From: Dave Airlie @ 2021-09-12 19:52 UTC (permalink / raw)
  To: Greg KH
  Cc: Leon Romanovsky, Laurent Pinchart, Thomas Gleixner,
	Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit

Outside of all this, I disagree on distros being the target at all.
However distros are probably a good thing to have involved.

It's much easier for a distro to package one project than one
per-vendor project, esp if that one project has a release cycle.

If every company forks an LLVM backend, for example, distros will never
be able to ship things; getting the LLVM backends upstream into LLVM
means the distros get it for "free" on the next LLVM release. Creating
kernel-like communities for userspace should be the goal, why do we
want to forget the benefits the kernel ecosystem has given us as soon
as we exit a syscall handler?

> >
> > 1. Niche devices - continue to do as they do it now, by supplying
> > out-of-tree solutions for their customers. Such devices and companies
> > rarely need upstream linux kernel support, because the burden to
> > upstream it is very high. We don't want them in the tree either, because
> > once they upstream it, the maintenance burden will be on us.
>
> {sigh}
>
> No, that is NOT our rule at all.
>
> These devices and companies need to be upstream more than anything else
> as that way they become part of our community and are responsible for
> maintaining their code in the tree.  To force them to remain outside is
> to go against everything that many of us have been saying for _decades_
> now.

Name one group that has actively become part of the community via this
advice (I'll wait).

From my view most of the communities have been created with more
push-back by kernel maintainers, gpus, rdma, media, alsa vs misc (X
accel drivers with no home or common cause).



> > And AI/ML is no different here, someone just need to start build such
> > stack. Otherwise, we will continue to see more free riders like HabanaLabs
> > which don't have any real benefit to the community.
>
> Everyone contributes to Linux in a selfish manner, that's just how the
> community works.  The work that companies like habanalabs is NOT being a
> "free rider" at all, they have worked with us and done the hard work of
> actually getting their code merged into the tree and their userspace
> code released under an open source license (unlike _ALL_ other AI/ML
> companies, including Intel).  It would have been much cheaper and
> quicker of them to just ignore upstream entirely, but that would have
> meant that the community would not have any idea of what exactly these
> use-case models were nor what the problems were that they were trying to
> get Linux to do.

These companies don't get to ignore upstream entirely. They aren't
here because they want to be, at least initially, they are here
because RHEL, Amazon, Facebook, Google whoever told them they would
buy their hw if it had upstream drivers in a contract and they have to
do the minimal amount of work to get past Greg to merge stuff and
satisfy that agreement. The community is very well aware of the needs
of these groups, it's not like we don't have lots of GPUs being used
for AI/ML. The habanalabs hardware is just a VLIW multithreaded
processor almost like taking an AMD evergreen and shaving off the
texture engines and other GPU specific bits. There is nothing new or
exciting here that hasn't been solved.

>
> Linux benefits overall by having everyone participate, do NOT make
> arbitrary rules to somehow prevent one company/group from being allowed
> to upstream their code vs. another.  That is NOT how we have worked in
> the past, and would only cause us to slowly die and become irrelevant.

The Linux Foundation might benefit, Linux doesn't. Linux benefits and
stays maintainable by having responsible maintainers guide the
direction of the kernel design, and creating upstream communities to
sustain that.

Dave.


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12 14:34                         ` Greg KH
  2021-09-12 16:41                           ` Laurent Pinchart
@ 2021-09-12 20:35                           ` Dave Airlie
  2021-09-12 20:41                           ` Dave Airlie
  2021-09-13 14:03                           ` Mark Brown
  3 siblings, 0 replies; 77+ messages in thread
From: Dave Airlie @ 2021-09-12 20:35 UTC (permalink / raw)
  To: Greg KH
  Cc: Leon Romanovsky, Laurent Pinchart, Thomas Gleixner,
	Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit

>
> So should we be forcing everyone to follow the IBM standard for
> accelerator drivers because they were in the kernel first all those
> years ago?  Or what other standard do we pick?
>
> And why are we dictating new industry standards here?  Who are we to do
> that?  Who is going to take that responsibility on?

There are no sane kernel API standards here, the standards that control
these devices live out in userspace, far away from the world you want
to inhabit. Responsible kernel maintainership should come with
knowledge of the entire ecosystem and where it's going. If people are
trying to merge kernel drivers and you don't have enough
info/knowledge about the ecosystem, then say No.

> >
> > Because dma-buf API has specific semantics and was designed with very
> > specific usage model in mind.
>
> So will the IB patches usage be re-reviewed?
>
> Anyway, we have apis that are used throughout the kernel all the time
> that don't end up on the various subsystem mailing list because people
> forget, or just do not know.  That's normal and something we have dealt
> with for forever.  As an example, I didn't realise that just using the
> dma-buf api required such a review.
>
> Can we put that in the MAINTAINERS file somehow for apis?

We have had MAINTAINERS rules matching on the dma-buf includes

78baee8d3b976a6a6a2c208e3a36d3f1e6297e6c
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Dec 4 22:51:05 2019 +0100

    MAINTAINERS: Match on dma_buf|fence|resv anywhere
and a later followup to clean it up a bit.
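
As a rough illustration of what such a rule does, here is a minimal
sketch of a content ("K:") match against a patch. The regex is an
assumption modelled on the commit subject above, not a copy of the
MAINTAINERS entry, and touches_dma_buf() is a hypothetical helper that
only approximates what scripts/get_maintainer.pl does with K: patterns.

```python
import re

# Assumed pattern, derived from the commit subject
# "Match on dma_buf|fence|resv anywhere"; the real K: entry may differ.
K_PATTERN = re.compile(r"dma_(?:buf|fence|resv)")


def touches_dma_buf(patch_text):
    """Return True if a changed line in a unified diff matches K_PATTERN."""
    for line in patch_text.splitlines():
        # Skip the "--- a/file" and "+++ b/file" header lines.
        if line.startswith(("+++", "---")):
            continue
        # Only added or removed lines count as patch content here.
        if line.startswith(("+", "-")) and K_PATTERN.search(line):
            return True
    return False


if __name__ == "__main__":
    example = (
        "--- a/drivers/misc/foo.c\n"
        "+++ b/drivers/misc/foo.c\n"
        "@@ -1,2 +1,3 @@\n"
        "+\tstruct dma_buf *buf = dma_buf_get(fd);\n"
    )
    print(touches_dma_buf(example))
```

With a rule like this, a driver under drivers/misc that starts calling
dma-buf functions would pull the dma-buf reviewers onto the Cc list even
though none of the files it touches belong to them.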

> That is a great question and I do not have the answer to that.  Should
> we have done that after the first one went into the kernel all those
> years ago?  Maybe, but I seem to recall the answer being "our hardware
> works much differently, so our user api will be much different", and
> that's a valid answer.

Every GPU driver has a different user API; across all of them there is no
standard. We still merge them but we require userspace. Maybe if you
could sign up to follow the same rules it might be less onerous on
you.

>
> It's a hard line to draw, and for some reason, I seem to be the one
> having to review these types of drivers every kernel release.  If people
> wish to help me out, please do so, all the patches are on the lists.

We do help out, we've said No. I've no idea why you go ahead and merge
things sometimes.

Creating a trash pile in your neighbourhood and then complaining when
more people continue to dump more trash on it seems a little
disingenuous to me.

We need to take more responsibility for the way these things are used,
and making sure there are frameworks for them.

We got things where they were by saying upstream first a lot without
thinking of the consequences of success. Now we have that success we
should start thinking of the responsibilities that come with it.
Distros like RHEL/CentOS are why these vendors are pushing stuff
upstream, they want to be included. However once they do that they no
longer gain the benefits of the Linux development model and just run
off and spawn 20 userspace projects that they can maintain control
over. Companies love control, they hate not having ultimate say over
their kernel drivers, and they won't willingly create userspace
projects where that happens either, not unless we work together for
the good of the ecosystem, not just the good of the kernel.

Dave.


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12 14:34                         ` Greg KH
  2021-09-12 16:41                           ` Laurent Pinchart
  2021-09-12 20:35                           ` Dave Airlie
@ 2021-09-12 20:41                           ` Dave Airlie
  2021-09-12 20:49                             ` Daniel Vetter
  2021-09-13 14:03                           ` Mark Brown
  3 siblings, 1 reply; 77+ messages in thread
From: Dave Airlie @ 2021-09-12 20:41 UTC (permalink / raw)
  To: Greg KH
  Cc: Leon Romanovsky, Laurent Pinchart, Thomas Gleixner,
	Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit

>
> So will the IB patches usage be re-reviewed?

https://lore.kernel.org/linux-rdma/MW3PR11MB4555CCCDD42F1ADEC61F7ACAE5AB0@MW3PR11MB4555.namprd11.prod.outlook.com/

FYI it's a thread where GPU devs are reviewing IB dma-buf patches, what's
next, cats and dogs living together?

Dave.


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12 20:41                           ` Dave Airlie
@ 2021-09-12 20:49                             ` Daniel Vetter
  2021-09-12 21:12                               ` Dave Airlie
  0 siblings, 1 reply; 77+ messages in thread
From: Daniel Vetter @ 2021-09-12 20:49 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Greg KH, Leon Romanovsky, Laurent Pinchart, Thomas Gleixner,
	Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit

On Sun, Sep 12, 2021 at 10:41 PM Dave Airlie <airlied@gmail.com> wrote:
> > So will the IB patches usage be re-reviewed?
>
> https://lore.kernel.org/linux-rdma/MW3PR11MB4555CCCDD42F1ADEC61F7ACAE5AB0@MW3PR11MB4555.namprd11.prod.outlook.com/
>
> FYI it's a thread where GPU devs are reviewing IB dma-buf patches, what's
> next, cats and dogs living together?

And as you can see, the review has been long and involved a ton of
different dri-devel (and rdma ofc too) folks.

It's like there's an entire community of experts at hand who could
help out in reviewing these things, if only we'd not have a maintainer
who happily bypasses all that and invites all the dumpster fires into
drivers/misc. And then complains that no one helps with reviewing ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12 20:49                             ` Daniel Vetter
@ 2021-09-12 21:12                               ` Dave Airlie
  2021-09-12 22:51                                 ` Linus Walleij
  0 siblings, 1 reply; 77+ messages in thread
From: Dave Airlie @ 2021-09-12 21:12 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Greg KH, Leon Romanovsky, Laurent Pinchart, Thomas Gleixner,
	Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit

On Mon, 13 Sept 2021 at 06:49, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Sun, Sep 12, 2021 at 10:41 PM Dave Airlie <airlied@gmail.com> wrote:
> > > So will the IB patches usage be re-reviewed?
> >
> > https://lore.kernel.org/linux-rdma/MW3PR11MB4555CCCDD42F1ADEC61F7ACAE5AB0@MW3PR11MB4555.namprd11.prod.outlook.com/
> >
> > FYI it's a thread where GPU devs are reviewing IB dma-buf patches, what's
> > next, cats and dogs living together?
>
> And as you can see, the review has been long and involved a ton of
> different dri-devel (and rdma ofc too) folks.
>
> It's like there's an entire community of experts at hand who could
> help out in reviewing these things, if only we'd not have a maintainer
> who happily bypasses all that and invites all the dumpster fires into
> drivers/misc. And then complains that no one helps with reviewing ...
> -Daniel

Daniel makes a good point here about "communities of experts".

We need to foster those cross-vendor expert communities to sustain
Linux going forward.

For userspace components as well these communities of experts need to
exist for each domain, and we need to encourage upstream first
processes across the board for these split kernel/userspace stacks.

The habanalabs compiler backend is an LLVM fork, I'd like to see the
effort to upstream that LLVM backend into LLVM proper. When this sort
of thing happens it gets on the radar of the LLVM compiler experts,
instead of it just being the habanalabs experts. I've met so many
internal company experts who remain unchallenged internally but buckle
when introduced to true communities of expertise. If we want to keep
this thing growing and maintainable we need to tap into those existing
expertise groups. This is why it's important we foster userspace
groups. If you hear the myth that only our company understands our hw
enough to write code for it, it's been proven bullshit numerous times.
It's an excuse for retaining control.

Also if there was a shared runtime library repo with cross vendor
review, I'm betting they'd all learn a lot about how a userspace
should work and be maintained rather than assuming they knew it all
themselves.

Be more like Spider-Man: great maintainership + great responsibility.

Dave.


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12 21:12                               ` Dave Airlie
@ 2021-09-12 22:51                                 ` Linus Walleij
  2021-09-12 23:15                                   ` Dave Airlie
  2021-09-13 13:20                                   ` Arnd Bergmann
  0 siblings, 2 replies; 77+ messages in thread
From: Linus Walleij @ 2021-09-12 22:51 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Daniel Vetter, Greg KH, Leon Romanovsky, Laurent Pinchart,
	Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab,
	Jonathan Corbet, ksummit, dev

On Sun, Sep 12, 2021 at 11:13 PM Dave Airlie <airlied@gmail.com> wrote:

> For userspace components as well these communities of experts need to
> exist for each domain, and we need to encourage upstream first
> processes across the board for these split kernel/userspace stacks.
>
> The habanalabs compiler backend is an LLVM fork, I'd like to see the
> effort to upstream that LLVM backend into LLVM proper.

I couldn't agree more.

A big part of the problem with inference engines / NPU:s is that of no
standardized userspace. Several of the machine learning initiatives
from some years back now have stale git repositories and are
visibly unmaintained, cf. Caffe https://github.com/BVLC/caffe
last commit 2 years ago.

In a discussion thread at LWN I raised Apache TVM as a currently
quite obviously alive and kicking community, and these people have
the ambition to provide "an open source machine learning compiler
framework for CPUs, GPUs, and machine learning accelerators".
https://tvm.apache.org/
At least they have all relevant companies logotypes on their homepage,
so there is some kind of commitment.
You can find for example from Arm an RFC for real HW accelerator code
support using (out of tree) Linux kernel drivers with Apache TVM:
https://discuss.tvm.apache.org/t/rfc-ethosn-arm-ethos-n-integration/6680

Then there is Google's TensorFlow. How open is that for a random
HW vendor who wants to integrate their accelerator and how open is
it to working with the kernel community? Then there is PyTorch.
All of these apparently active. Well CPU vendors often support
two different compilers so I guess they could very well support
three machine learning userspaces, why not.

What confuses me is what kind of time horizon and longevity these
projects have, and what level of commitment is involved and
what ambition. Especially to what extent they would care about
working with the Linux kernel community. (TVM have a mail
address so I added them on CC.)

Habanalabs propose an LLVM fork as compiler, yet the Intel
logo is on the Apache TVM website, and no sign of integrating with
that project. They claim to support also TensorFlow.

The way I perceive it is that there simply isn't any GCC/LLVM or
Gallium 3D of NPU:s, these people haven't yet decided that "here
is that userspace we are all going to use". Or have they?

LLVM? TVM? TensorFlow? PyTorch? Some other one?

What worries me is that I don't see one single developer being
able to say "this one definitely, and they will work with the kernel
community", and that is what we need to hear.

Yours,
Linus Walleij


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12 22:51                                 ` Linus Walleij
@ 2021-09-12 23:15                                   ` Dave Airlie
  2021-09-13 13:20                                   ` Arnd Bergmann
  1 sibling, 0 replies; 77+ messages in thread
From: Dave Airlie @ 2021-09-12 23:15 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Daniel Vetter, Greg KH, Leon Romanovsky, Laurent Pinchart,
	Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab,
	Jonathan Corbet, ksummit, dev

On Mon, 13 Sept 2021 at 08:52, Linus Walleij <linus.walleij@linaro.org> wrote:
>
> On Sun, Sep 12, 2021 at 11:13 PM Dave Airlie <airlied@gmail.com> wrote:
>
> > For userspace components as well these communities of experts need to
> > exist for each domain, and we need to encourage upstream first
> > processes across the board for these split kernel/userspace stacks.
> >
> > The habanalabs compiler backend is an LLVM fork, I'd like to see the
> > effort to upstream that LLVM backend into LLVM proper.
>
> I couldn't agree more.
>
> A big part of the problem with inference engines / NPU:s is that of no
> standardized userspace. Several of the machine learning initiatives
> from some years back now have stale git repositories and are
> visibly unmaintained, cf. Caffe https://github.com/BVLC/caffe
> last commit 2 years ago.
>
> In a discussion thread at LWN I raised Apache TVM as a currently
> quite obviously alive and kicking community, and these people have
> the ambition to provide "an open source machine learning compiler
> framework for CPUs, GPUs, and machine learning accelerators".
> https://tvm.apache.org/
> At least they have all relevant companies logotypes on their homepage,
> so there is some kind of commitment.
> You can find for example from Arm an RFC for real HW accelerator code
> support using (out of tree) Linux kernel drivers with Apache TVM:
> https://discuss.tvm.apache.org/t/rfc-ethosn-arm-ethos-n-integration/6680
>
> Then there is Google's TensorFlow. How open is that for a random
> HW vendor who wants to integrate their accelerator and how open is
> it to working with the kernel community? Then there is PyTorch.
> All of these apparently active. Well CPU vendors often support
> two different compilers so I guess they could very well support
> three machine learning userspaces, why not.
>
> What confuses me is what kind of time horizon and longevity these
> projects have, and what level of commitment is involved and
> what ambition. Especially to what extent they would care about
> working with the Linux kernel community. (TVM have a mail
> address so I added them on CC.)
>
> Habanalabs propose an LLVM fork as compiler, yet the Intel
> logo is on the Apache TVM website, and no sign of integrating with
> that project. They claim to support also TensorFlow.
>
> The way I perceive it is that there simply isn't any GCC/LLVM or
> Gallium 3D of NPU:s, these people haven't yet decided that "here
> is that userspace we are all going to use". Or have they?
>
> LLVM? TVM? TensorFlow? PyTorch? Some other one?

Yeah I've been doing the same research, and there is also the Glow
project I think to add to the list.

The thing is control, everyone wants to run it, when it comes to Linux
nearly all the vendors have realised they've lost their control and
learned to live with it, but the second they are into userspace, it's
like hey we need to be in charge of every single piece of this, thus
losing the Linux kernel advantage of pooling engineering expertise
cross-vendor.

I certainly don't want to be the distro packager having to package 30
forks of LLVM for 20 different vendor accelerators with 20 runtime
APIs and 20 forks of TVM/Tensorflow/pytorch.

Enabling that behaviour by just merging kernel drivers and washing our
hands to me seems like a large misstep for the future of
maintainability of the kernel, esp as these devices start interacting
with GPUs or RDMA and we get locked into unmovable interfaces that we
can't even analyse for deadlocks etc.

Dave.

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12 19:48   ` Laurent Pinchart
@ 2021-09-13  2:26     ` Dave Airlie
  0 siblings, 0 replies; 77+ messages in thread
From: Dave Airlie @ 2021-09-13  2:26 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: Jonathan Corbet, ksummit

On Mon, 13 Sept 2021 at 05:48, Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Dave,
>
> On Mon, Sep 13, 2021 at 05:13:05AM +1000, Dave Airlie wrote:
> > On Sat, 11 Sept 2021 at 07:10, Jonathan Corbet <corbet@lwn.net> wrote:
> > >
> > > There has been a regular disagreement in recent years about whether
> > > drivers for accelerators (such as for the Habana Gaudi device) should be
> > > subject to the same requirements as GPU drivers when it comes to the
> > > availability of a free implementation of the user-space side.  It flared
> > > up again recently:
> > >
> > >    https://lwn.net/Articles/867168/
> > >
> > > Happily, the Habana situation in particular seems to be resolving
> > > itself:
> > >
> > >    https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/
> > >
> > > But even there it is clear that the fundamental question has not yet
> > > been resolved.
> > >
> > > This seems like the sort of question that the maintainer summit exists
> > > to address.  Specifically, we could discuss:
> > >
> > >  - Under which circumstances should the kernel community require the
> > >    existence of freely licensed user-space code that exercises all
> > >    functionalities of a proposed kernel driver or feature?
> > >
> > >  - Are there internal kernel interfaces, such as DMA-BUF or P2PDMA, that
> > >    are only available to drivers with a free user-space implementation?
> > >    Do we need an EXPORT_SYMBOL_USERSPACE_GPL()?
> > >
> > >  - What constitutes an acceptable user-space implementation in cases
> > >    where these restrictions apply?
> > >
> > > I suspect that more clarity (and fewer arguments) on these questions
> > > would be welcome both within and beyond the development community.
> > >
> > > Thanks,
> >
> > Can everyone take a read of:
> >
> > https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements
> >
> > I think in order to clean the signal/noise ratio up in here, some
> > education effort might help people realise how non-trivial these
> > things are.
> >
> > 1. These drivers are not one or two ioctls that a few selftests and a
> > small test app can cover. It's like saying LTP is all we need to
> > define the uAPI for the kernel and if anyone does something LTP
> > doesn't cover the app is broken. These systems are generally complex,
> > multithreaded and multiuser uAPIs, involving command streams recorded
> > in userspace being submitted to the devices. They interact with memory
> > management and can cause unfindable deadlocks across the system if
> > designed incorrectly. Documentation or kselftests aren't going to cut
> > it here.
> >
> > 2. In my experience we don't build communities by merging everything,
> > we build them by saying No more and pushing back on companies with
> > education and cross-vendor cooperation. Responsible kernel maintenance
> > shouldn't end at the kernel boundaries. If you aren't the person to
> > help enforce a userspace for a driver you are being asked to merge,
> > then don't merge it, but try and engage the vendor with the
> > communities of interest in the kernel who already deal in those areas.
> >
> > 3. The pressure on these companies to merge things into Linux isn't
> > altruistic or even that they necessarily want to be in the Linux
> > kernel upstream. They are being told by Red Hat, Facebook, Google or
> > someone else that they need an upstream driver. They will generally
> > engage at a minimal level to get past that blockage and then
> > disengage. Having a clear set of rules (or a place to discuss those
> > rules, for new subsystems) and a gentle pushback helps develop
> > communities by unlocking funding within those larger areas. As Laurent
> > has said this isn't free, but just putting things into the kernel and
> > not caring about userspace hasn't built any Linux communities in the
> > accelerator areas.
> >
> > That said I started writing a cleaned up version of the above document
> > which is more generic that other subsystems could sign on to. I was
> > going to engage with a coalition of like-minded maintainers rather
> > than trying to gain consensus among a herd of cats to see if we can
> > draw clearer lines in the sand that cross more subsystems so the
> > experience of drivers/gpu doesn't go to waste but also isn't just
> > bypassed by subsystem hunting.
> >
> > https://cgit.freedesktop.org/~airlied/linux/log/?h=wip-open-source-userspace
>
> Thank you for that effort. Could you add camera ISPs to the list with
> FPGAs, DSPs and ML accelerators ?

I'll add that to the next iteration, thanks.

>
> You mention Level0 in that document. I assume you don't mean the
> OpenStreetMap editor ?

https://spec.oneapi.io/level-zero/latest/core/INTRO.html

It's like a vulkan for OpenCL effort, they've already managed to put
things in the API that are close to impossible to make work on the
Linux kernel properly, again because Intel internally thought they had
better experts than the kernel, but we are trying to get that all
fixed up.

Dave.

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-11 22:42             ` Steven Rostedt
  2021-09-11 23:10               ` Laurent Pinchart
@ 2021-09-13 11:10               ` Mark Brown
  1 sibling, 0 replies; 77+ messages in thread
From: Mark Brown @ 2021-09-13 11:10 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Laurent Pinchart, Thomas Gleixner, Josh Triplett,
	Mauro Carvalho Chehab, Jonathan Corbet, ksummit

On Sat, Sep 11, 2021 at 06:42:05PM -0400, Steven Rostedt wrote:
> Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote:

> > If you wanted GPU drivers to have tests in tools/selftests, you'd have
> > to move Mesa to the kernel :-)

> Some selftests have dependencies. It could require that Mesa is
> installed to run the tests, otherwise it just returns "unsupported".

There are some constraints on selftests for usability reasons, adding
too many dependencies and too exotic a set of dependencies works against
that - we already disable the BPF tests by default because it is not
reasonable for people who are not actively working on BPF to be able to
get the dependencies needed for the testsuite up and running and it was
causing disruption to people trying to actually use kselftest for actual
testing.  Of course there's a balancing act here with having the tests
picked up and used by people but the kernel is such a big piece of software
it seems reasonable to expect that we're not going to end up with
everything in one place, and if it's not solving a practical problem for
the people actively working with the tests it really doesn't seem like a
good use of the limited time people have to work on quality stuff.
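
The "return unsupported when a dependency is missing" idea discussed above
can be sketched roughly as follows. This is only an illustration, not code
from any existing selftest: the run_test() helper and its arguments are
hypothetical, and the only detail taken from kselftest itself is that exit
code 4 is the conventional "skip" result.

```python
import shutil

# Exit codes in the style used by kselftest; 4 is the "skipped" result.
KSFT_PASS = 0
KSFT_FAIL = 1
KSFT_SKIP = 4


def run_test(required_tools, test_body):
    """Return a kselftest-style exit code for a single test.

    required_tools is a list of executables the test needs on PATH
    (hypothetical example of a userspace dependency); test_body is a
    callable returning True on success.
    """
    missing = [tool for tool in required_tools if shutil.which(tool) is None]
    if missing:
        # Report the test as skipped rather than failed.
        print("SKIP: missing dependencies: " + ", ".join(missing))
        return KSFT_SKIP
    return KSFT_PASS if test_body() else KSFT_FAIL
```

A skip keeps the run usable for people without the dependency, but as noted
above it does nothing for coverage if almost nobody has it installed.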


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-11  9:27       ` Laurent Pinchart
  2021-09-11 22:33         ` Mauro Carvalho Chehab
@ 2021-09-13 12:04         ` Mark Brown
  1 sibling, 0 replies; 77+ messages in thread
From: Mark Brown @ 2021-09-13 12:04 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: Mauro Carvalho Chehab, Jonathan Corbet, ksummit

On Sat, Sep 11, 2021 at 12:27:37PM +0300, Laurent Pinchart wrote:
> On Sat, Sep 11, 2021 at 02:38:11AM +0200, Mauro Carvalho Chehab wrote:

> > > which we
> > > could consider as an alternative to the open userspace implementation
> > > (a topic worth discussing I believe).

> > Yeah, a public datasheet sounds an interesting requirement. It offers
> > a problem, though: maybe some details could be missed on it, which
> > would prevent any real open source userspace development.

> Absolutely, and I don't think we can come up with any process and
> technical measure that would prevent a vendor from cheating if they
> really want to. It will always be possible to hide some features behind
> reserved registers that wouldn't need to be programmed for basic
> operation but that would be crucial to optimize the quality or
> performances. This is regardless of whether we want to enforce openness
> of documentation in the form of datasheets or source code.

This is already very standard in some parts of the industry, even
between vendors and customers.  Sometimes it's done intentionally but
it's also often just that the actual practical configuration process
relies on some non-trivial test system and perhaps has as much art as
science involved.  It can also be a decision about managing support
costs which works for everyone involved on the business side - sometimes
the product being delivered includes the vendor doing a good deal of the
tuning for some combination of cost and complexity reasons.

> I'm not too concerned about this though. If we can address most of this
> issue with a clear process and message I think it would be a very good
> step forward already.

Yeah, I'm personally not so concerned about the calibration and tuning
side - ideally that would be fully open but like you say even without
that we've achieved something and there may not actually be anything
extant to open.


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12 22:51                                 ` Linus Walleij
  2021-09-12 23:15                                   ` Dave Airlie
@ 2021-09-13 13:20                                   ` Arnd Bergmann
  2021-09-13 13:54                                     ` Daniel Vetter
                                                       ` (2 more replies)
  1 sibling, 3 replies; 77+ messages in thread
From: Arnd Bergmann @ 2021-09-13 13:20 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Dave Airlie, Daniel Vetter, Greg KH, Leon Romanovsky,
	Laurent Pinchart, Thomas Gleixner, Josh Triplett,
	Mauro Carvalho Chehab, Jonathan Corbet, ksummit, dev

On Mon, Sep 13, 2021 at 12:51 AM Linus Walleij <linus.walleij@linaro.org> wrote:
> On Sun, Sep 12, 2021 at 11:13 PM Dave Airlie <airlied@gmail.com> wrote:
>
> > For userspace components as well these communities of experts need to
> > exist for each domain, and we need to encourage upstream first
> > processes across the board for these split kernel/userspace stacks.
> >
> > The habanalabs compiler backend is an LLVM fork, I'd like to see the
> > effort to upstream that LLVM backend into LLVM proper.
>
> I couldn't agree more.
>
> A big part of the problem with inference engines / NPU:s is that of no
> standardized userspace. Several of the machine learning initiatives
> from some years back now have stale git repositories and are
> visibly unmaintained, c.f. Caffe https://github.com/BVLC/caffe
> last commit 2 years ago.

Caffe as a standalone project was abandoned and merged into
PyTorch, see https://caffe2.ai/. I think this is the kind of consolidation
of those projects that you are looking for.

> Habanalabs propose an LLVM fork as compiler, yet the Intel
> logo is on the Apache TVM website, and no sign of integrating with
> that project. They claim to support also TensorFlow.
>
> The way I perceive it is that there simply isn't any GCC/LLVM or
> Gallium 3D of NPU:s, these people haven't yet decided that "here
> is that userspace we are all going to use". Or have they?
>
> LLVM? TVM? TensorFlow? PyTorch? Some other one?
>
> What worries me is that I don't see one single developer being
> able to say "this one definitely, and they will work with the kernel
> community", and that is what we need to hear.

I don't actually think this is a decision we can possibly wait for.
The ones you listed all work on different levels, some build on top
of others, and some may get replaced by new ones over time.

For a generic kernel interface, we need something that can be
supported as a back-end for multiple such libraries, and that
works on more than just one hardware. Most likely we will need
both higher-level and lower-level interfaces, so that a
framework (or an application directly) may target one interface,
but some hardware may not be able to implement this.

One straightforward hardware independent low-level API would
be the traditional BLAS GEMM call[1] for matrix multiplication
and its variants (integer, float, bfloat16, ...).  Most of the frameworks
are able to use SGEMM to do the actual calculation since that
has optimized versions for most CPUs and GPUs, and most
hardware accelerators should be able to provide an
implementation of this that doesn't completely suck. This
can be used for both inferencing and training.
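
For readers unfamiliar with the call, the operation GEMM computes is
C := alpha * A * B + beta * C. A minimal reference sketch of those
semantics, assuming plain row-major lists and leaving out the transpose
flags and leading-dimension arguments of the real BLAS routine, might look
like this (the gemm() name and signature here are illustrative, not the
BLAS interface):

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for row-major lists of lists.

    a is m x k, b is k x n and c is m x n. The transposition flags and
    leading dimensions of the real BLAS routine are omitted.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("a and b have incompatible shapes")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c has the wrong shape")
    return [
        [
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]
```

An accelerator-backed implementation would keep the same contract and only
change where the inner multiply-accumulate loop runs.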

On the kernel side, this could probably be done inside the
existing crypto (async), media (mem2mem), or gpu/drm
interfaces that all provide ways to offload computational
functions on blocks of memory potentially backed by a dmabuf,
but having a new top-level chardev interface may be a better
fit.

A completely different interface would be something that lets you
compile a model into a hardware specific blob in user space
and then submit that blob into the kernel, using further commands
to send and receive model specific data. As I understand it,
this method is roughly what habanalabs and some of the
other ones do for inferencing. The performance is almost
certainly better here, but it requires a high degree of integration
between model, framework, user space driver, compiler and
kernel driver.
We already do similar things in the gpu, fpga and remoteproc
frameworks, all of which could be used here, or we add a more
specialized interface.
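
The shape of that second model (compile in user space, submit an opaque
blob, then exchange model-specific data) can be sketched as below. Every
name here is hypothetical: CompiledModel, compile_model() and
FakeAccelerator stand in for a vendor compiler and a kernel driver and do
not correspond to any existing uAPI, and the "inference" is a toy dot
product.

```python
from dataclasses import dataclass, field


@dataclass
class CompiledModel:
    """Opaque, hardware-specific blob produced in user space."""
    target: str
    blob: bytes


def compile_model(weights, target):
    # Stand-in for a vendor compiler: just serialise small integer weights.
    return CompiledModel(target, bytes(weights))


@dataclass
class FakeAccelerator:
    """Stand-in for the kernel side: accepts blobs, hands back handles."""
    target: str
    loaded: dict = field(default_factory=dict)
    next_handle: int = 1

    def submit_blob(self, model):
        # The "kernel" can only check coarse properties of an opaque blob.
        if model.target != self.target:
            raise ValueError("blob was compiled for a different device")
        handle = self.next_handle
        self.next_handle += 1
        self.loaded[handle] = model.blob
        return handle

    def run(self, handle, inputs):
        # Toy "inference": dot product of the stored weights and the inputs.
        blob = self.loaded[handle]
        return sum(w * x for w, x in zip(blob, inputs))
```

Note how little the kernel side can say about the blob without the
compiler that produced it, which is why the openness of that compiler
keeps coming up in this thread.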

What the actual interfaces should be I have no clue, those
two are just examples of what it could be, being completely
ignorant of what drivers do today. As Dave said, this really
needs a maintainer that understands both the kernel side
and what kind of hardware and frameworks exist and
what interfaces both sides actually require.

       Arnd

[1] http://www.netlib.org/lapack/explore-html/db/dc9/group__single__blas__level3_gafe51bacb54592ff5de056acabd83c260.html

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-10 21:32 ` Josh Triplett
@ 2021-09-13 13:50   ` Christian Brauner
  2021-09-13 13:57     ` Daniel Vetter
  2021-09-14 14:40   ` Jani Nikula
  1 sibling, 1 reply; 77+ messages in thread
From: Christian Brauner @ 2021-09-13 13:50 UTC (permalink / raw)
  To: Josh Triplett; +Cc: Jonathan Corbet, ksummit

On Fri, Sep 10, 2021 at 02:32:48PM -0700, Josh Triplett wrote:
> On Fri, Sep 10, 2021 at 03:00:58PM -0600, Jonathan Corbet wrote:
> > There has been a regular disagreement in recent years about whether
> > drivers for accelerators (such as for the Habana Gaudi device) should be
> > subject to the same requirements as GPU drivers when it comes to the
> > availability of a free implementation of the user-space side.  It flared
> > up again recently:
> > 
> >    https://lwn.net/Articles/867168/
> > 
> > Happily, the Habana situation in particular seems to be resolving
> > itself:
> > 
> >    https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/
> > 
> > But even there it is clear that the fundamental question has not yet
> > been resolved.
> > 
> > This seems like the sort of question that the maintainer summit exists
> > to address.  Specifically, we could discuss:
> > 
> >  - Under which circumstances should the kernel community require the
> >    existence of freely licensed user-space code that exercises all
> >    functionalities of a proposed kernel driver or feature?
> 
> I think it'd be reasonable to ask, as well: if we required this for
> *all* kernel functionality, such that we never add any userspace
> interface to the kernel unless there's *some* Open Source userspace that
> needs/wants it, what problems would that cause if any?
> 
> It appears that in this case the kernel pushing back has influenced the
> release of Open Source userspace code. Having a kernel-wide policy here
> seems like it'll *help* people within many companies to push for such
> changes: "We're never going to be able to get our changes into the
> upstream kernel if there's no userspace to drive them."

I can certainly see why that discussion is needed for features that deal
with hardware which requires an elaborate userspace component in order
to work.
But I'm not convinced this policy makes sense for all kernel features.
For example, when we introduce a new general api in kernel core it will
often be driven by requirements of other well-known open source
projects. If such projects state that they will add support for it once
a kernel supporting this feature is released that expression of their
intent is often sufficient. We usually don't make such projects jump
through the hoops of implementing the userspace side upfront to prove
that they would use it. Although to the credit of a few open source
projects that does also happen. But I'm hesitant to make this a general
rule.

Christian

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-13 13:20                                   ` Arnd Bergmann
@ 2021-09-13 13:54                                     ` Daniel Vetter
  2021-09-13 22:04                                       ` Arnd Bergmann
  2021-09-13 14:52                                     ` James Bottomley
  2021-09-14 13:07                                     ` Linus Walleij
  2 siblings, 1 reply; 77+ messages in thread
From: Daniel Vetter @ 2021-09-13 13:54 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Linus Walleij, Dave Airlie, Greg KH, Leon Romanovsky,
	Laurent Pinchart, Thomas Gleixner, Josh Triplett,
	Mauro Carvalho Chehab, Jonathan Corbet, ksummit, dev

On Mon, Sep 13, 2021 at 3:20 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Mon, Sep 13, 2021 at 12:51 AM Linus Walleij <linus.walleij@linaro.org> wrote:
> > On Sun, Sep 12, 2021 at 11:13 PM Dave Airlie <airlied@gmail.com> wrote:
> >
> > > For userspace components as well these communities of experts need to
> > > exist for each domain, and we need to encourage upstream first
> > > processes across the board for these split kernel/userspace stacks.
> > >
> > > The habanalabs compiler backend is an LLVM fork, I'd like to see the
> > > effort to upstream that LLVM backend into LLVM proper.
> >
> > I couldn't agree more.
> >
> > A big part of the problem with inference engines / NPU:s is that of no
> > standardized userspace. Several of the machine learning initiatives
> > from some years back now have stale git repositories and are
> > visibly unmaintained, c.f. Caffe https://github.com/BVLC/caffe
> > last commit 2 years ago.
>
> Caffe as a standalone project was abandoned and merged into
> PyTorch, see https://caffe2.ai/. I think this is the kind of consolidation
> of those projects that you are looking for.
>
> > Habanalabs propose an LLVM fork as compiler, yet the Intel
> > logo is on the Apache TVM website, and no sign of integrating with
> > that project. They claim to support also TensorFlow.
> >
> > The way I perceive it is that there simply isn't any GCC/LLVM or
> > Gallium 3D of NPU:s, these people haven't yet decided that "here
> > is that userspace we are all going to use". Or have they?
> >
> > LLVM? TVM? TensorFlow? PyTorch? Some other one?
> >
> > What worries me is that I don't see one single developer being
> > able to say "this one definitely, and they will work with the kernel
> > community", and that is what we need to hear.
>
> I don't actually think this is a decision we can possibly wait for.
> The ones you listed all work on different levels, some build on top
> of others, and some may get replaced by new ones over time.
>
> For a generic kernel interface, we need something that can be
> supported as a back-end for multiple such libraries, and that
> works on more than just one hardware. Most likely we will need
> both higher-level and lower-level interfaces, so that a
> framework (or an application directly) may target one interface,
> but some hardware may not be able to implement this.
>
> One straightforward hardware independent low-level API would
> be the traditional BLAS GEMM call[1] for matrix multiplication
> and its variants (integer, float, bfloat16, ...).  Most of the frameworks
> are able to use SGEMM to do the actual calculation since that
> has optimized versions for most CPUs and GPUs, and most
> hardware accelerators should be able to provide an
> implementation of this that doesn't completely suck. This
> can be used for both inferencing and training.

I think BLAS is too high-level for these. Sure, for perfect speed the
vendor probably wants to have their own BLAS thing, their own NN
optimizer and a heap of other things, but for the low-level userspace
we're talking about here that pretty much doesn't matter. I think a
really good example of this is the compute stack Intel is building:
- level0 is the absolute bare-bones low level driver. For this
discussion here that's enough of a userspace to make at least Dave&me
happy. In 3d this would be vulkan. In AI/NN space, there's nothing
here, at least nothing cross-vendor.
- Then there's the entire OneApi ecosystem on top. Lots of this is
open, some of it is closed, but from the pov of an accel stack it's
all looking like applications, not like driver code. BLAS is sitting
here. For AI/NN this is pytorch, tensorflow and all these higher-level
frameworks (which often have quite sophisticated optimizers of their
own)
- then there's funny intermediate apis like opencl, where the state of
the art is still to implement them directly as userspace drivers on
top of the kernel. Although on the 3d side at least we're getting to a
point where opengl on top of vulkan is impressively close to an
optimized driver. But for now it's still mostly custom. This is what
AI/NN drivers generally look like, with the high-level library fused
together with the backend. Or the backend being an out-of-tree fork
(which is pretty much always an llvm fork for the compiler side).

Especially BLAS isn't the most impressive, since largely it's a fused
multiply-add benchmark and not much else. Ok, enormous amounts of
tuning to perfectly exploit the execution bw and interconnect/cache
hierarchy of your chip, whatever it is. That's often something vendors
don't like sharing (intel's math kernels are still closed afaik)
because it leaks a bit much about actual implementation details of the
chip as opposed to how it's programmed. Also not something I really
care about with my maintainer hat on.

> On the kernel side, this could probably be done inside the
> existing crypto (async), media (mem2mem), or gpu/drm
> interfaces that all provide ways to offload computational
> functions on blocks of memory potentially backed by a dmabuf,
> but having a new top-level chardev interface may be a better
> fit.
>
> A completely different interface would be something that lets you
> compile a model into a hardware specific blob in user space
> and then submit that blob into the kernel, using further commands
> to send and receive model specific data. As I understand it,
> this method is roughly what habanalabs and some of the
> other ones do for inferencing. The performance is almost
> certainly better here, but it requires a high degree of integration
> between model, framework, user space driver, compiler and
> kernel driver.
> We already do similar things in the gpu, fpga and remoteproc
> frameworks, all of which could be used here, or we add a more
> specialized interface.

Not even the interface matters that much, there's very little the
3d/compute gpu drivers share there. It's the community of experts that
matters, and the cross-vendor userspace project.

> What the actual interfaces should be I have no clue, those
> two are just examples of what it could be, being completely
> ignorant of what drivers do today. As Dave said, this really
> needs a maintainer that understands both the kernel side
> and what kind of hardware and frameworks exist and
> what interfaces both sides actually require.

So yeah, agreeing here.
-Daniel



>        Arnd
>
> [1] http://www.netlib.org/lapack/explore-html/db/dc9/group__single__blas__level3_gafe51bacb54592ff5de056acabd83c260.html



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-13 13:50   ` Christian Brauner
@ 2021-09-13 13:57     ` Daniel Vetter
  2021-09-14  2:07       ` Laurent Pinchart
  0 siblings, 1 reply; 77+ messages in thread
From: Daniel Vetter @ 2021-09-13 13:57 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Josh Triplett, Jonathan Corbet, ksummit

On Mon, Sep 13, 2021 at 3:50 PM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
> On Fri, Sep 10, 2021 at 02:32:48PM -0700, Josh Triplett wrote:
> > On Fri, Sep 10, 2021 at 03:00:58PM -0600, Jonathan Corbet wrote:
> > > There has been a regular disagreement in recent years about whether
> > > drivers for accelerators (such as for the Habana Gaudi device) should be
> > > subject to the same requirements as GPU drivers when it comes to the
> > > availability of a free implementation of the user-space side.  It flared
> > > up again recently:
> > >
> > >    https://lwn.net/Articles/867168/
> > >
> > > Happily, the Habana situation in particular seems to be resolving
> > > itself:
> > >
> > >    https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/
> > >
> > > But even there it is clear that the fundamental question has not yet
> > > been resolved.
> > >
> > > This seems like the sort of question that the maintainer summit exists
> > > to address.  Specifically, we could discuss:
> > >
> > >  - Under which circumstances should the kernel community require the
> > >    existence of freely licensed user-space code that exercises all
> > >    functionalities of a proposed kernel driver or feature?
> >
> > I think it'd be reasonable to ask, as well: if we required this for
> > *all* kernel functionality, such that we never add any userspace
> > interface to the kernel unless there's *some* Open Source userspace that
> > needs/wants it, what problems would that cause if any?
> >
> > It appears that in this case the kernel pushing back has influenced the
> > release of Open Source userspace code. Having a kernel-wide policy here
> > seems like it'll *help* people within many companies to push for such
> > changes: "We're never going to be able to get our changes into the
> > upstream kernel if there's no userspace to drive them."
>
> I can certainly see why that discussion is needed for features that deal
> with hardware which requires an elaborate userspace component in order
> to work.
> But I'm not convinced this policy makes sense for all kernel features.
> For example, when we introduce a new general api in kernel core it will
> often be driven by requirements of other well-known open source
> projects. If such projects state that they will add support for it once
> a kernel supporting this feature is released that expression of their
> intent is often sufficient. We usually don't make such projects jump
> through the hoops of implementing the userspace side upfront to prove
> that they would use it. Although to the credit of a few open source
> projects that does also happen. But I'm hesitant to make this a general
> rule.

I agree it's an orthogonal discussion, but I think we've also had our
fair share of fully generic interfaces that turned out to miss the mark
in real-world usage. This is why the generic kernel
modesetting/display interface for drivers in drivers/gpu also needs
fully open implementation. Not because we really need that for
long-term maintainability - the interfaces are generally well-defined
enough that testcases + docs are sufficient for that, but because in
practice it just catches so many small gotchas that are otherwise
overlooked in good generic uapi design.

But I do think we should keep this apart from the discussions for hw
drivers, where 80+% of the driver that's absolutely needed to drive
the hardware is in userspace.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-12 14:34                         ` Greg KH
                                             ` (2 preceding siblings ...)
  2021-09-12 20:41                           ` Dave Airlie
@ 2021-09-13 14:03                           ` Mark Brown
  3 siblings, 0 replies; 77+ messages in thread
From: Mark Brown @ 2021-09-13 14:03 UTC (permalink / raw)
  To: Greg KH
  Cc: Leon Romanovsky, Laurent Pinchart, Thomas Gleixner,
	Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit

On Sun, Sep 12, 2021 at 04:34:48PM +0200, Greg KH wrote:
> On Sun, Sep 12, 2021 at 05:15:30PM +0300, Leon Romanovsky wrote:

> > https://lore.kernel.org/all/YT2zryAKHc%2F5R2IH@unreal/
> > "To be used" means some open PR to existing package or request for
> > inclusion for new packages.

> But again, distros will not take things that are not already in the
> kernel.

Or, mainly for the community distros which are open to people
volunteering to package things, can't be relied on to do validation
beyond checking that the package is distributable and that the installed
files integrate into the distro in roughly the right form.  That's not
really a meaningful form of back pressure from our point of view.

> > > But how do you define Android's userspace?  Just one vendor?  2 vendors?
> > > 10 vendors?  There is major userspace fragmentation in Android userspace
> > > in many places, the user/kernel boundary being one of the big ones as
> > > many of us have found out over the past years.  And many of us are
> > > working to resolve this, but it's not so simple at times, and I have
> > > many examples if you want specifics.

> > Laurent suggested AOSP
> > https://lore.kernel.org/all/YTyWANV%2FmSkQbYhj@pendragon.ideasonboard.com/

> Vendors can not get code into AOSP for various reasons that only Google
> understands.  There are many millions, if not billions of Android
> devices out there with user/kernel apis that are not upstream nor in
> AOSP because Google doesn't want to take them, or because the vendor can
> not go through those hoops (international law is tricky at times...)

Right, if you're not one of the main SoC vendors working on something
that's a main application of Android it can be very hard to get anyone
to give you the time of day.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-13 13:20                                   ` Arnd Bergmann
  2021-09-13 13:54                                     ` Daniel Vetter
@ 2021-09-13 14:52                                     ` James Bottomley
  2021-09-14 13:07                                     ` Linus Walleij
  2 siblings, 0 replies; 77+ messages in thread
From: James Bottomley @ 2021-09-13 14:52 UTC (permalink / raw)
  To: Arnd Bergmann, Linus Walleij
  Cc: Dave Airlie, Daniel Vetter, Greg KH, Leon Romanovsky,
	Laurent Pinchart, Thomas Gleixner, Josh Triplett,
	Mauro Carvalho Chehab, Jonathan Corbet, ksummit, dev

On Mon, 2021-09-13 at 15:20 +0200, Arnd Bergmann wrote:
> On Mon, Sep 13, 2021 at 12:51 AM Linus Walleij <
> linus.walleij@linaro.org> wrote:
> > On Sun, Sep 12, 2021 at 11:13 PM Dave Airlie <airlied@gmail.com>
> > wrote:
> > 
> > > For userspace components as well these communities of experts
> > > need to exist for each domain, and we need to encourage upstream
> > > first processes across the board for these split kernel/userspace
> > > stacks.
> > > 
> > > The habanalabs compiler backend is an LLVM fork, I'd like to see
> > > the effort to upstream that LLVM backend into LLVM proper.
> > 
> > I couldn't agree more.
> > 
> > A big part of the problem with inference engines / NPU:s is that of
> > no standardized userspace. Several of the machine learning
> > initiatives from some years back now have stale git repositories
> > and are visibly unmaintained, c.f. Caffe 
> > https://github.com/BVLC/caffe last commit 2 years ago.
> 
> Caffe as a standalone project was abandoned and merged into
> PyTorch, see https://caffe2.ai/. I think this is the kind of
> consolidation of those projects that you are looking for.
> 
> > Habanalabs propose an LLVM fork as compiler, yet the Intel
> > logo is on the Apache TVM website, and no sign of integrating with
> > that project. They claim to support also TensorFlow.
> > 
> > The way I perceive it is that there simply isn't any GCC/LLVM or
> > Gallium 3D of NPU:s, these people haven't yet decided that "here
> > is that userspace we are all going to use". Or have they?
> > 
> > LLVM? TVM? TensorFlow? PyTorch? Some other one?
> > 
> > What worries me is that I don't see one single developer being
> > able to say "this one definitely, and they will work with the
> > kernel community", and that is what we need to hear.
> 
> I don't actually think this is a decision we can possibly wait for.
> The ones you listed all work on different levels, some build on top
> of others, and some may get replaced by new ones over time.

I cut all the interesting design stuff because there's a meta problem
here: we seem to be charting a course based on the idea we have to get
the userspace API right first time.  We really don't, we have to make a
reasonable effort to get it right, but we can go around for a v2 if we
fail ... that's the whole point about open source: fail fast and redo. 
No-one can really design an API without seeing how the users actually
use it.  When we do get it right first time, it's more by luck than
judgment, so we should expect failure more often than not.  The trick
to a successful API is usually finding what the minimal set of
operations is and implementing that.  If you think about bells and
whistles first (as 95% of API design documents do tend to) you usually
fail.

Completely new APIs with producer consumer interlock always have this
failure problem, because in a blue sky environment, neither the
producer nor consumer knows exactly what they want the first time
around ... they usually have to try a couple of times to figure out
what works and what doesn't.  What we have to enable is this fast
iteration while they work it out.  API versioning is usually a good
beginning to this ...
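
To illustrate the versioning point, here is a minimal sketch, in Python
purely for brevity.  The request layout, the field names and
handle_request() are all hypothetical and do not correspond to any
existing kernel interface; the only idea shown is that a leading
version field lets a v2 add a field without breaking v1 callers.

```python
import struct

# Hypothetical request layouts: little-endian u32 fields.
V1 = struct.Struct("<II")    # version, length
V2 = struct.Struct("<III")   # version, length, flags (added in v2)


def handle_request(buf: bytes) -> dict:
    """Dispatch on the leading version field; reject unknown versions."""
    if len(buf) < 4:
        raise ValueError("short request")
    (version,) = struct.unpack_from("<I", buf)
    if version == 1:
        if len(buf) < V1.size:
            raise ValueError("short v1 request")
        _, length = V1.unpack_from(buf)
        # v1 callers never pass flags, so default them to 0.
        return {"version": 1, "length": length, "flags": 0}
    if version == 2:
        if len(buf) < V2.size:
            raise ValueError("short v2 request")
        _, length, flags = V2.unpack_from(buf)
        return {"version": 2, "length": length, "flags": flags}
    raise ValueError(f"unsupported API version {version}")
```

An old caller keeps sending V1.pack(1, 64) and still works after v2 is
introduced, while an unknown version fails loudly instead of being
misparsed.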

There's also nothing wrong with recommending existing interfaces and
seeing how that works because existing patterns are there for a reason.

James





^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-13 13:54                                     ` Daniel Vetter
@ 2021-09-13 22:04                                       ` Arnd Bergmann
  2021-09-13 23:33                                         ` Dave Airlie
  0 siblings, 1 reply; 77+ messages in thread
From: Arnd Bergmann @ 2021-09-13 22:04 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Arnd Bergmann, Linus Walleij, Dave Airlie, Greg KH,
	Leon Romanovsky, Laurent Pinchart, Thomas Gleixner,
	Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit,
	dev

On Mon, Sep 13, 2021 at 3:54 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:

> > One straightforward hardware independent low-level API would
> > be the traditional BLAS GEMM call[1] for matrix multiplication
> > and its variants (integer, float, bfloat16, ...).  Most of the frameworks
> > are able to use SGEMM to do the actual calculation since that
> > has optimized versions for most CPUs and GPUs, and most
> > hardware accelerators should be able to provide an
> > implementation of this that doesn't completely suck. This
> > can be used for both inferencing and training.
>
> I think BLAS are too high-level for these. Sure for perfect speed the
> vendor probably wants to have their own BLAS thing, their own NN
> optimizer and a heap of other things, but for the low-level userspace
> we're talking about here that pretty much doesn't matter.

I suppose high-level vs low-level is not the correct distinction here,
it's more like fixed-function vs programmable.

As a fixed-function interface, something like GEMM is probably as
low-level as you would want to get, as it's big enough to make sense
as a single atomic command, but small enough to be able to build on
top of it.
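
For readers less familiar with it, the GEMM operation computes
C = alpha * A * B + beta * C.  The following is a deliberately naive
reference sketch in Python (the gemm() name and the list-of-lists
matrix layout are assumptions made for illustration, not the BLAS
calling convention); it only shows how small the contract of such a
single command is.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for row-major lists of lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(c) != rows or any(len(row) != cols for row in c):
        raise ValueError("c has the wrong shape")
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            # Dot product of row i of a with column j of b.
            acc = sum(a[i][k] * b[k][j] for k in range(inner))
            out_row.append(alpha * acc + beta * c[i][j])
        out.append(out_row)
    return out
```

An accelerator-backed implementation would keep the same inputs and
outputs and replace only the loops.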

> I think a really good example of this is the compute stack Intel is building:
> - level0 is the absolute bare-bones low level driver. For this
> discussion here that's enough of a userspace to make at least Dave&me
> happy. In 3d this would be vulkan. In AI/NN space, there's nothing
> here, at least nothing cross-vendor.
> - Then there's the entire OneApi ecosystem on top. Lots of this is
> open, some of it is closed, but from the pov of an accel stack it's
> all looking like applications, not like driver code. BLAS is sitting
> here. For AI/NN this is pytorch, tensorflow and all these higher-level
> frameworks (which often have quite sophisticated optimizers of their
> own)

Looking at OneAPI, I see a BLAS implementation (oneMKL) next to
somewhat higher-level abstraction (oneDNN). Which of the two are
the generic frameworks (pytorch/tensorflow/...) built on top of?

The oneDNN interface looks like it could be implemented not only on
top of level0 but also layered above some BLAS library or as a thin
wrapper above a fixed-function kernel interface that provides similar
high-level abstractions. Is that a correct understanding? It also seems
like this is similar in purpose to Apple's BNNS library.

> Especially BLAS isn't the most impressive, since largely it's fused
> multiply-add benchmark and not much else. Ok, enormous amounts of
> tuning to perfectly exploit the execution bw and interconnect/cache
> hierarchy of your chip, whatever it is. That's often something vendors
> don't like sharing (intel's math kernels are still closed afaik)
> because it leaks a bit much about actual implementation details of the
> chip as opposed to how it's programmed. Also not something I really
> care about with my maintainer hat on.

It's not /just/ benchmarks, it's actually being used directly underneath
the high-level frameworks precisely because it is simple, portable and
well optimized. If there is a higher-level interface like oneDNN that
is usable by the common frameworks, using a subset of that as a
fixed-function interface for the kernel may be a good alternative
(or at least complementary) to a fully programmable interface.

I realize that fixed-function is not fashionable on GPUs, but they
are widely used in other areas (video codecs, crypto, ...) even when
you are running precompiled code on the accelerator hardware.
This would of course replace the question of open source user space
with the question of open-source firmware, as the user side would
become mostly while the accelerator goes from dynamically created
to a firmware blob.

       Arnd

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-13 22:04                                       ` Arnd Bergmann
@ 2021-09-13 23:33                                         ` Dave Airlie
  2021-09-14  9:08                                           ` Arnd Bergmann
  0 siblings, 1 reply; 77+ messages in thread
From: Dave Airlie @ 2021-09-13 23:33 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Daniel Vetter, Linus Walleij, Greg KH, Leon Romanovsky,
	Laurent Pinchart, Thomas Gleixner, Josh Triplett,
	Mauro Carvalho Chehab, Jonathan Corbet, ksummit, dev

On Tue, 14 Sept 2021 at 08:05, Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Mon, Sep 13, 2021 at 3:54 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> > > One straightforward hardware independent low-level API would
> > > be the traditional BLAS GEMM call[1] for matrix multiplication
> > > and its variants (integer, float, bfloat16, ...).  Most of the frameworks
> > > are able to use SGEMM to do the actual calculation since that
> > > has optimized versions for most CPUs and GPUs, and most
> > > hardware accelerators should be able to provide an
> > > implementation of this that doesn't completely suck. This
> > > can be used for both inferencing and training.
> >
> > I think BLAS are too high-level for these. Sure for perfect speed the
> > vendor probably wants to have their own BLAS thing, their own NN
> > optimizer and a heap of other things, but for the low-level userspace
> > we're talking about here that pretty much doesn't matter.
>
> I suppose high-level vs low-level is not the correct distinction here,
> it's more like fixed-function vs programmable.
>
> As a fixed-function interface, something like GEMM is probably as
> low-level as you would want to get, as it's big enough to make sense
> as a single atomic command, but small enough to be able to build on
> top of it.

The distinction is more about the programming model than fixed vs
programmable. In rough order of complexity:

a) device is MMIO programmed and can process one thing, kernel needs
to mediate between exclusive users (big lock, initial drm subsystem)
b) device has a queue that can process untrusted userspace command
with no memory safety (old drm drivers, in-kernel command stream
parsing)
c) device has queues, contexts, memory safety, virtual address space
(newer drm drivers)
d) device has full preempt on all hw blocks, is fully coherent, can
trigger paging sanely, userspace can submit directly (pipe dream).

What the device processes is of little consequence to the kernel
driver model. The uAPI of course needs to reflect the above along with
what the device can program, since there could be a queue for a DMA
device that isn't specified but can be programmed to DMA random
system memory.

Devices in category (a) are the sort of things that can need kernel
interfaces at a GEMM or BLAS level; however, there is no point having
an interface at that level for any of the b/c/d devices. That interface
needs to be in userspace somewhere; level0 or something like it is
probably where things will end up, and the type (a) devices will die
out.
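
The classification above can be written down as a small lookup.  This
is only a sketch in Python: the enum member names, the wording of the
kernel roles and the api_level() helper are made up for illustration
and merely restate the four categories and the conclusion drawn from
them.

```python
from enum import Enum


class ProgrammingModel(Enum):
    MMIO_EXCLUSIVE = "a"   # MMIO programmed, processes one thing at a time
    QUEUE_UNSAFE = "b"     # queue of untrusted commands, no memory safety
    QUEUE_CONTEXTS = "c"   # queues, contexts, memory safety, virtual address space
    DIRECT_SUBMIT = "d"    # full preemption, coherent, userspace submits directly


# What the kernel driver mainly has to do for each category.
KERNEL_ROLE = {
    ProgrammingModel.MMIO_EXCLUSIVE: "mediate between exclusive users",
    ProgrammingModel.QUEUE_UNSAFE: "parse the command stream in the kernel",
    ProgrammingModel.QUEUE_CONTEXTS: "manage queues, contexts and address spaces",
    ProgrammingModel.DIRECT_SUBMIT: "set up direct submission from userspace",
}


def api_level(model: ProgrammingModel) -> str:
    """Where a GEMM/BLAS-level interface would live under this argument."""
    if model is ProgrammingModel.MMIO_EXCLUSIVE:
        return "kernel"
    return "userspace"
```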

> I realize that fixed-function is not fashionable on GPUs, but they
> are widely used in other areas (video codecs, crypto, ...) even when
> you are running precompiled code on the accelerator hardware.
> This would of course replace the question of open source user space
> with the question of open-source firmware, as the user side would
> become mostly while the accelerator goes from dynamically created
> to a firmware blob.

We have lots of fixed function on GPUs, video codecs are on most x86
GPUs. It's how you program them that matters, most of them are behind
queues similar to the 3D engine, so you program them the same way.

What isn't fashionable on GPUs is programmable blocks that are single
user, where the kernel can only program them for one user at a time,
since hw has long since stopped treating that model as desirable. There
are some AI accelerators going down the same path, but eventually
they'll have to be shareable and catch up with GPU programming models to
remain competitive.

Dave.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-13 13:57     ` Daniel Vetter
@ 2021-09-14  2:07       ` Laurent Pinchart
  0 siblings, 0 replies; 77+ messages in thread
From: Laurent Pinchart @ 2021-09-14  2:07 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Christian Brauner, Josh Triplett, Jonathan Corbet, ksummit

Hi Daniel,

On Mon, Sep 13, 2021 at 03:57:25PM +0200, Daniel Vetter wrote:
> On Mon, Sep 13, 2021 at 3:50 PM Christian Brauner wrote:
> > On Fri, Sep 10, 2021 at 02:32:48PM -0700, Josh Triplett wrote:
> > > On Fri, Sep 10, 2021 at 03:00:58PM -0600, Jonathan Corbet wrote:
> > > > There has been a regular disagreement in recent years about whether
> > > > drivers for accelerators (such as for the Habana Gaudi device) should be
> > > > subject to the same requirements as GPU drivers when it comes to the
> > > > availability of a free implementation of the user-space side.  It flared
> > > > up again recently:
> > > >
> > > >    https://lwn.net/Articles/867168/
> > > >
> > > > Happily, the Habana situation in particular seems to be resolving
> > > > itself:
> > > >
> > > >    https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/
> > > >
> > > > But even there it is clear that the fundamental question has not yet
> > > > been resolved.
> > > >
> > > > This seems like the sort of question that the maintainer summit exists
> > > > to address.  Specifically, we could discuss:
> > > >
> > > >  - Under which circumstances should the kernel community require the
> > > >    existence of freely licensed user-space code that exercises all
> > > >    functionalities of a proposed kernel driver or feature?
> > >
> > > I think it'd be reasonable to ask, as well: if we required this for
> > > *all* kernel functionality, such that we never add any userspace
> > > interface to the kernel unless there's *some* Open Source userspace that
> > > needs/wants it, what problems would that cause if any?
> > >
> > > It appears that in this case the kernel pushing back has influenced the
> > > release of Open Source userspace code. Having a kernel-wide policy here
> > > seems like it'll *help* people within many companies to push for such
> > > changes: "We're never going to be able to get our changes into the
> > > upstream kernel if there's no userspace to drive them."
> >
> > I can certainly see why that discussion is needed for features that deal
> > with hardware which requires an elaborate userspace component in order
> > to work.
> > But I'm not convinced this policy makes sense for all kernel features.
> > For example, when we introduce a new general api in kernel core it will
> > often be driven by requirements of other well-known open source
> > projects. If such projects state that they will add support for it once
> > a kernel supporting this feature is released that expression of their
> > intent is often sufficient. We usually don't make such projects jump
> > through the hoops of implementing the userspace side upfront to proof
> > that they would use it. Although to the credit of a few open source
> > projects that does also happen. But I'm hesitant to make this a general
> > rule.
> 
> I agree it's an orthogonal discussion, but I think we've also had our
> fair share of fully generic interfaces that turned out to miss the mark
> in real-world usage. This is why the generic kernel
> modesetting/display interface for drivers in drivers/gpu also needs a
> fully open implementation. Not because we really need that for
> long-term maintainability - the interfaces are generally well-defined
> enough that testcases + docs are sufficient for that - but because in
> practice it just catches so many small gotchas that are otherwise
> overlooked in good generic uapi design.

I concur here. I've spent the past 3 years working on libcamera in
userspace after a decade of experience in the kernel side of V4L2. It
was an enlightening (and embarrassing) moment to realize that some kernel
APIs that I had designed myself didn't stand the test of being used for
real. A test application written to test an API in the way the API was
designed will generally not be good at finding design flaws.

> But I do think we should keep this apart from the discussions for hw
> drivers, where 80+% of the driver that's absolutely needed to drive
> the hardware is in userspace.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-13 23:33                                         ` Dave Airlie
@ 2021-09-14  9:08                                           ` Arnd Bergmann
  2021-09-14  9:23                                             ` Daniel Vetter
  2021-09-14 15:43                                             ` Luck, Tony
  0 siblings, 2 replies; 77+ messages in thread
From: Arnd Bergmann @ 2021-09-14  9:08 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Arnd Bergmann, Daniel Vetter, Linus Walleij, Greg KH,
	Leon Romanovsky, Laurent Pinchart, Thomas Gleixner,
	Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit,
	dev

On Tue, Sep 14, 2021 at 1:33 AM Dave Airlie <airlied@gmail.com> wrote:
> On Tue, 14 Sept 2021 at 08:05, Arnd Bergmann <arnd@arndb.de> wrote:
> > > On Mon, Sep 13, 2021 at 3:54 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > > I think BLAS are too high-level for these. Sure for perfect speed the
> > > vendor probably wants to have their own BLAS thing, their own NN
> > > optimizer and a heap of other things, but for the low-level userspace
> > > we're talking about here that pretty much doesn't matter.
> >
> > I suppose high-level vs low-level is not the correct distinction here,
> > it's more like fixed-function vs programmable.
> >
> > As a fixed-function interface, something like GEMM is probably as
> > low-level as you would want to get, as it's big enough to make sense
> > as a single atomic command, but small enough to be able to build on
> > top of it.
>
> The distinction is more about the programming model than fixed vs
> programmable. In rough order of complexity:
>
> a) device is MMIO programmed and can process one thing, kernel needs
> to mediate between exclusive users (big lock, initial drm subsystem)
> b) device has a queue that can process untrusted userspace command
> with no memory safety (old drm drivers, in-kernel command stream
> parsing)
> c) device has queues, contexts, memory safety, virtual address space
> (newer drm drivers)
> d) device has full preempt on all hw blocks, is fully coherent, can
> trigger paging sanely, userspace can submit directly (pipe dream).
>
> What the device processes is of little consequence to the kernel
> driver model. The uAPI of course needs to reflect the above along with
> what the device can program, since there could be a queue for a DMA
> device that isn't specified but can be programmed to DMA random
> system memory.

Thank you for the useful overview!

> Devices in category (a) are the sort of things that can need kernel
> interfaces at a GEMM or BLAS level; however, there is no point having
> an interface at that level for any of the b/c/d devices. That interface
> needs to be in userspace somewhere; level0 or something like it is
> probably where things will end up, and the type (a) devices will die
> out.

I can see two reasons why one would want to support a type (a)
interface even with the more versatile devices:

- It can be done in a generic way so that simply adding a kernel
  driver and loading some firmware into it makes existing user space
  software work out of the box.

- It gives the manufacturer a way to get an upstream kernel driver
  without open sourcing their firmware (a.k.a. compiler and user
  space driver). Whether you consider this a good or bad thing is
  of course a matter of perspective.

> > I realize that fixed-function is not fashionable on GPUs, but they
> > are widely used in other areas (video codecs, crypto, ...) even when
> > you are running precompiled code on the accelerator hardware.
> > This would of course replace the question of open source user space
> > with the question of open-source firmware, as the user side would
> > become mostly while the accelerator goes from dynamically created
> > to a firmware blob.
>
> We have lots of fixed function on GPUs, video codecs are on most x86
> GPUs. It's how you program them that matters, most of them are behind
> queues similar to the 3D engine, so you program them the same way.

So these would go through /dev/dri instead of /dev/media0? I can definitely
see a lot of codec drivers in the kernel that use /dev/media interfaces,
and the tradeoffs between those two seem very similar to the tradeoffs
you get for machine learning accelerators.

> What isn't fashionable on GPUs is programmable blocks that are single
> user, where the kernel can only program them for one user at a time,
> since hw has long since stopped treating that model as desirable. There
> are some AI accelerators going down the same path, but eventually
> they'll have to be shareable and catch up with GPU programming models to
> remain competitive.

I'm not convinced by this at all. While I totally understand this argument
for GPUs and general-purpose users (phone, PC, server, ...), I also see
a lot of cheap SoC hardware with much simpler requirements. If the chip
is built for an embedded application (face detection, smart speaker, ...)
you would never need to have two processes access the same
accelerator hardware, or even just load a new model into it after
boot. Adding any complexity to the hardware increases the cost, so
you would only do it if absolutely necessary, or if the cheapest
off-the-shelf solution already includes it.

           Arnd

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-14  9:08                                           ` Arnd Bergmann
@ 2021-09-14  9:23                                             ` Daniel Vetter
  2021-09-14 10:47                                               ` Laurent Pinchart
  2021-09-14 12:58                                               ` Arnd Bergmann
  2021-09-14 15:43                                             ` Luck, Tony
  1 sibling, 2 replies; 77+ messages in thread
From: Daniel Vetter @ 2021-09-14  9:23 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Dave Airlie, Linus Walleij, Greg KH, Leon Romanovsky,
	Laurent Pinchart, Thomas Gleixner, Josh Triplett,
	Mauro Carvalho Chehab, Jonathan Corbet, ksummit, dev

On Tue, Sep 14, 2021 at 11:09 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Tue, Sep 14, 2021 at 1:33 AM Dave Airlie <airlied@gmail.com> wrote:
> > On Tue, 14 Sept 2021 at 08:05, Arnd Bergmann <arnd@arndb.de> wrote:
> > > > On Mon, Sep 13, 2021 at 3:54 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > > > I think BLAS are too high-level for these. Sure for perfect speed the
> > > > vendor probably wants to have their own BLAS thing, their own NN
> > > > optimizer and a heap of other things, but for the low-level userspace
> > > > we're talking about here that pretty much doesn't matter.
> > >
> > > I suppose high-level vs low-level is not the correct distinction here,
> > > it's more like fixed-function vs programmable.
> > >
> > > As a fixed-function interface, something like GEMM is probably as
> > > low-level as you would want to get, as it's big enough to make sense
> > > as a single atomic command, but small enough to be able to build on
> > > top of it.
> >
> > The distinction is more about the programming model than fixed vs
> > programmable. In rough order of complexity:
> >
> > a) device is MMIO programmed and can process one thing, kernel needs
> > to mediate between exclusive users (big lock, initial drm subsystem)

I think even for these you might want a drm style uapi, where
drm/sched takes different jobs and hammers them into hw in a kernel
thread. Ofc it all depends what the programming model is, and
something more fixed like media might make sense.
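
As a rough sketch of that pattern (Python for brevity; ExclusiveDevice,
scheduler() and submit_all() are hypothetical names, and the real
drm/sched also handles things like dependencies and fairness between
submitters, which this does not): a single worker owns the hardware and
feeds it jobs from a queue, so submitters never touch the device
directly.

```python
import queue
import threading


class ExclusiveDevice:
    """Stand-in for type (a) hardware that can process one job at a time."""

    def __init__(self):
        self.completed = []

    def run(self, job):
        self.completed.append(job)


def scheduler(dev, jobs):
    """Single worker that owns the device and runs jobs in FIFO order."""
    while True:
        job = jobs.get()
        if job is None:   # sentinel: no more work
            break
        dev.run(job)


def submit_all(dev, submissions):
    """Queue every submission, wait for the worker, return completion order."""
    jobs = queue.Queue()
    worker = threading.Thread(target=scheduler, args=(dev, jobs))
    worker.start()
    for job in submissions:
        jobs.put(job)
    jobs.put(None)
    worker.join()
    return dev.completed
```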

> > b) device has a queue that can process untrusted userspace command
> > with no memory safety (old drm drivers, in-kernel command stream
> > parsing)
> > c) device has queues, contexts, memory safety, virtual address space
> > (newer drm drivers)
> > d) device has full preempt on all hw blocks, is fully coherent, can
> > trigger paging sanely, userspace can submit directly (pipe dream).
> >
> > What the device processes is of little consequence to the kernel
> > driver model. The uAPI of course needs to reflect the above along with
> > what the device can program, since there could be a queue for a DMA
> > device that isn't specified but can be programmed to DMA random
> > system memory.
>
> Thank you for the useful overview!
>
> > Devices in category (a) are the sort of things that can need kernel
> > interfaces at a GEMM or BLAS level; however, there is no point having
> > an interface at that level for any of the b/c/d devices. That interface
> > needs to be in userspace somewhere; level0 or something like it is
> > probably where things will end up, and the type (a) devices will die
> > out.
>
> I can see two reasons why one would want to support a type (a)
> interface even with the more versatile devices:
>
> - It can be done in a generic way so that simply adding a kernel
>   driver and loading some firmware into it makes existing user space
>   software work out of the box.
>
> - It gives the manufacturer a way to get an upstream kernel driver
>   without open sourcing their firmware (a.k.a. compiler and user
>   space driver). Whether you consider this a good or bad thing is
>   of course a matter of perspective.

I think for some embedded use-case this makes sense, especially around
media stuff.

I don't think it's BLAS, because on the compute side you really want a
compiler that sees through the entire thing and can optimize it. Afaik
BLAS is for some quick prototype of matrix algorithms and most
importantly, for the top500 list :-)

> > > I realize that fixed-function is not fashionable on GPUs, but they
> > > are widely used in other areas (video codecs, crypto, ...) even when
> > > you are running precompiled code on the accelerator hardware.
> > > This would of course replace the question of open source user space
> > > with the question of open-source firmware, as the user side would
> > > become mostly while the accelerator goes from dynamically created
> > > to a firmware blob.
> >
> > We have lots of fixed function on GPUs, video codecs are on most x86
> > GPUs. It's how you program them that matters, most of them are behind
> > queues similar to the 3D engine, so you program them the same way.
>
> So these would go through /dev/dri instead of /dev/media0? I can definitely
> see a lot of codec drivers in the kernel that use /dev/media interfaces,
> and the tradeoffs between those two seem very similar to the tradeoffs
> you get for machine learning accelerators.

Yeah we have plenty of codecs running on top of /dev/dri0, with all the
magic in userspace.

They are all very far away from anything that is a machine learning accelerator.

> > What isn't fashionable on GPUs is programmable blocks that are single
> > user, where the kernel can only program them for one user at a time,
> > since hw has long since stopped treating that model as desirable. There
> > are some AI accelerators going down the same path, but eventually
> > they'll have to be shareable and catch up with GPU programming models to
> > remain competitive.
>
> I'm not convinced by this at all. While I totally understand this argument
> for GPUs and general-purpose users (phone, PC, server, ...), I also see
> a lot of cheap SoC hardware with much simpler requirements. If the chip
> is built for an embedded application (face detection, smart speaker, ...)
> you would never need to have two processes access the same
> accelerator hardware, or even just load a new model into it after
> boot. Adding any complexity to the hardware increases the cost, so
> you would only do it if absolutely necessary, or if the cheapest
> off-the-shelf solution already includes it.

Yeah for those I think a more fixed uapi like drivers/media makes a lot
of sense. What I don't like is when vendors then use that excuse
of "oh you only upload a fixed model at boot" to shovel in an accel
driver with a full generic interface, but not all the userspace
bits&pieces. There's unfortunately another accel driver in
drivers/misc for a Qualcomm SoC, which really should be either a media
driver (for the fixed-function use-case) or a drm driver (for the
fully programmable use-case).

I think for the fixed-function interface case you can also make a
reasonable argument that just documenting that fixed interface and all
the parameters is good enough. But as soon as the interface becomes a
generic "submit workload" style thing because you want to make it work
for an entire set of "firmware" compiled by your closed stack, that's
out of the window.

So yeah there's another driver in misc which managed to bypass review
of two subsystems, not just one :-/
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-14  9:23                                             ` Daniel Vetter
@ 2021-09-14 10:47                                               ` Laurent Pinchart
  2021-09-14 12:58                                               ` Arnd Bergmann
  1 sibling, 0 replies; 77+ messages in thread
From: Laurent Pinchart @ 2021-09-14 10:47 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Arnd Bergmann, Dave Airlie, Linus Walleij, Greg KH,
	Leon Romanovsky, Thomas Gleixner, Josh Triplett,
	Mauro Carvalho Chehab, Jonathan Corbet, ksummit, dev

On Tue, Sep 14, 2021 at 11:23:56AM +0200, Daniel Vetter wrote:
> On Tue, Sep 14, 2021 at 11:09 AM Arnd Bergmann wrote:
> > On Tue, Sep 14, 2021 at 1:33 AM Dave Airlie wrote:
> > > On Tue, 14 Sept 2021 at 08:05, Arnd Bergmann wrote:
> > > > > On Mon, Sep 13, 2021 at 3:54 PM Daniel Vetter wrote:
> > > > > I think BLAS are too high-level for these. Sure for perfect speed the
> > > > > vendor probably wants to have their own BLAS thing, their own NN
> > > > > optimizer and a heap of other things, but for the low-level userspace
> > > > > we're talking about here that pretty much doesn't matter.
> > > >
> > > > I suppose high-level vs low-level is not the correct distinction here,
> > > > it's more like fixed-function vs programmable.
> > > >
> > > > As a fixed-function interface, something like GEMM is probably as
> > > > low-level as you would want to get, as it's big enough to make sense
> > > > as a single atomic command, but small enough to be able to build on
> > > > top of it.
> > >
> > > The distinction is more programming model than fixed vs programmable
> > > in rough order of complexity
> > >
> > > a) device is MMIO programmed and can process one thing, kernel needs
> > > to mediate between exclusive users (big lock, initial drm subsystem)
> 
> I think even for these you might want a drm style uapi, where
> drm/sched takes different jobs and hammers them into hw in a kernel
> thread. Ofc it all depends what the programming model is, and
> something more fixed like media might make sense.

For completeness, there's a similar component in the V4L2 M2M framework,
but simpler. Jobs are executed sequentially in the order they are
received. The simplicity is mostly due to the fact that the type of
hardware V4L2 M2M supports doesn't have the ability to run multiple jobs
in parallel.
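
To illustrate the strictly sequential model described above, here is a
minimal sketch of a first-in, first-out job queue in plain C. It is not
the V4L2 M2M API: struct job, struct job_queue, jobq_submit(),
jobq_run_next() and job_mark_done() are hypothetical names, the queue
depth is arbitrary, and job ids are assumed to be non-negative.

```c
#include <assert.h>
#include <stddef.h>

#define JOBQ_DEPTH 8	/* arbitrary capacity chosen for this sketch */

struct job {
	int id;				/* hypothetical id, assumed >= 0 */
	int done;			/* set once the job has run */
	void (*run)(struct job *);	/* runs the job to completion */
};

struct job_queue {
	struct job *slots[JOBQ_DEPTH];
	size_t head;	/* index of the oldest queued job */
	size_t count;	/* number of queued jobs */
};

/* Stand-in for real hardware work: just records completion. */
void job_mark_done(struct job *j)
{
	j->done = 1;
}

/* Queue a job behind everything already submitted; -1 if full. */
int jobq_submit(struct job_queue *q, struct job *j)
{
	assert(q && j && j->run);
	if (q->count == JOBQ_DEPTH)
		return -1;
	q->slots[(q->head + q->count) % JOBQ_DEPTH] = j;
	q->count++;
	return 0;
}

/* Run the oldest job, if any. Returns its id, or -1 when idle. */
int jobq_run_next(struct job_queue *q)
{
	struct job *j;

	assert(q);
	if (q->count == 0)
		return -1;
	j = q->slots[q->head];
	q->head = (q->head + 1) % JOBQ_DEPTH;
	q->count--;
	j->run(j);	/* nothing else starts until this returns */
	return j->id;
}
```

Because only one job is ever in flight, there is no priority, preemption
or dependency tracking to get wrong, which is the simplicity referred to
above.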

We also have ISPs that fall in this category, and use the V4L2 API in
memory-to-memory mode but without any scheduling, because context
switching doesn't exist at the hardware level and is too expensive to
implement in software. For those we restrict operation to a single
process at a time.

> > > b) device has a queue that can process untrusted userspace command
> > > with no memory safety (old drm drivers, in-kernel command stream
> > > parsing)
> > > c) device has queues, contexts, memory safety, virtual address space
> > > (newer drm drivers)
> > > d) device has full preempt on all hw blocks, is fully coherent, can
> > > trigger paging sanely, userspace can submit directly (pipe dream).
> > >
> > > What the device processes is of little consequence to the kernel
> > > driver model. the uAPI of course needs to reflect the above along with
> > > what the device can program. Since there could be a queue for a DMA
> > > device that isn't specified but can be programmed to DMA random
> > > system memory.
> >
> > Thank you for the useful overview!
> >
> > > Devices in category (a) are the sort of things that can need kernel
> > > interfaces like a GEMM or BLAS level, however there is no point having
> > > an interface at that level for any of the b/c/d device. That interface
> > > needs to be in userspace somewhere, level0 or something like it is
> > > probably where things will end up, and the type (a) devices will die
> > > out.
> >
> > I can see two reasons why one would want to support a type (a)
> > interface even with the more versatile devices:
> >
> > - It can be done in a generic way so that simply adding a kernel
> >   driver and loading some firmware into it makes existing user space
> >   software work out of the box.
> >
> > - It gives the manufacturer a way to get an upstream kernel driver
> >   without open sourcing their firmware (a.k.a. compiler and user
> >   space driver). Whether you consider this a good or bad thing is
> >   of course a matter of perspective.
> 
> I think for some embedded use-case this makes sense, especially around
> media stuff.
> 
> I don't think it's BLAS, because on the compute side you really want a
> compiler that sees through the entire thing and can optimize it. Afaik
> BLAS is for some quick prototype of matrix algorithms and most
> importantly, for the top500 list :-)
> 
> > > > I realize that fixed-function is not fashionable on GPUs, but they
> > > > are widely used in other areas (video codecs, crypto, ...) even when
> > > > you are running precompiled code on the accelerator hardware.
> > > > This would of course replace the question of open source user space
> > > > with the question of open-source firmware, as the user side would
> > > > become mostly while the accelerator goes from dynamically created
> > > > to a firmware blob.
> > >
> > > We have lots of fixed function on GPUs, video codecs are on most x86
> > > GPUs. It's how you program them that matters, most of them are behind
> > > queues similar to the 3D engine, so you program them the same way.
> >
> > So these would go through /dev/dri instead of /dev/media0? I can definitely
> > see a lot of codec drivers in the kernel that use /dev/media interfaces,
> > and the tradeoffs between those two seem very similar to the tradeoffs
> > you get for machine learning accelerators.
> 
> Yeah we have plenty of codecs running on top of /dev/dri0, with all the
> magic in userspace.
> 
> They are all very far away from anything that is a machine learning accelerator.
> 
> > > What isn't fashionable on GPUs is programmable blocks that are single
> > > user that only the kernel can program one user on at a time, since hw
> > > has long since left that model as desirable. There are some AI
> > > accelerators going down the same path, but eventually they'll have to
> > > be shareable and catch up with GPU programming models to remain
> > > competitive.
> >
> > I'm not convinced by this at all. While I totally understand this argument
> > for GPUs and general-purpose users (phone, PC, server, ...), I also see
> > a lot of cheap SoC hardware with much simpler requirements. If the chip
> > is built for an embedded application (face detection, smart speaker, ...)
> > you would never need to have two processes access the same
> > accelerator hardware, or even just load a new model into it after
> > boot. Adding any complexity to the hardware increases the cost, so
> > you would only do it if absolutely necessary, or if the cheapest
> > off-the-shelf solution already includes it.
> 
> Yeah for those I think a more fixed uapi like drivers/media makes a lot
> of sense. What I don't like is when vendors then use that excuse
> of "oh you only upload a fixed model at boot" to shovel in an accel
> driver with full generic interface, but not all the userspace
> bits&pieces. There's unfortunately another accel driver in
> drivers/misc for Qualcomm SoC, which really should be either a media
> driver (for the fixed-function use-case) or a drm driver (for the
> fully programmable use-case).
> 
> I think for the fixed-function interface case you can also make a
> reasonable argument that just documenting that fixed interface and all
> the parameters is good enough. But as soon as the interface becomes a
> generic "submit workload" style thing because you want to make it work
> for an entire set of "firmware" compiled by your closed stack, that's
> out of the window.
> 
> So yeah there's another driver in misc which managed to bypass review
> of two subsystems, not just one :-/

-- 
Regards,

Laurent Pinchart

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-14  9:23                                             ` Daniel Vetter
  2021-09-14 10:47                                               ` Laurent Pinchart
@ 2021-09-14 12:58                                               ` Arnd Bergmann
  2021-09-14 19:45                                                 ` Daniel Vetter
  1 sibling, 1 reply; 77+ messages in thread
From: Arnd Bergmann @ 2021-09-14 12:58 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Arnd Bergmann, Dave Airlie, Linus Walleij, Greg KH,
	Leon Romanovsky, Laurent Pinchart, Thomas Gleixner,
	Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit,
	dev

On Tue, Sep 14, 2021 at 11:23 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> On Tue, Sep 14, 2021 at 11:09 AM Arnd Bergmann <arnd@arndb.de> wrote:

> > I can see two reasons why one would want to support a type (a)
> > interface even with the more versatile devices:
> >
> > - It can be done in a generic way so that simply adding a kernel
> >   driver and loading some firmware into it makes existing user space
> >   software work out of the box.
> >
> > - It gives the manufacturer a way to get an upstream kernel driver
> >   without open sourcing their firmware (a.k.a. compiler and user
> >   space driver). Whether you consider this a good or bad thing is
> >   of course a matter of perspective.
>
> I think for some embedded use-case this makes sense, especially around
> media stuff.
>
> I don't think it's BLAS, because on the compute side you really want a
> compiler that sees through the entire thing and can optimize it. Afaik
> BLAS is for some quick prototype of matrix algorithms and most
> importantly, for the top500 list :-)

It's probably not the only thing you need, but I would assume something
like sgemm and its variants are one of the building blocks you'd need
in this kind of interface. Note that oneDNN also comes with a
simplified interface similar to gemm[1] as well as a straight wrapper
around gemm itself.

There are definitely frameworks that are successfully built just on top
of NumPy and blas (with NumPy itself being built on top of blas).
I used to make fun of linpack as the supercomputer benchmark that
has no practical use, but in the end it does spend most of its time in
the SGEMM function that is the most optimized algorithm in the world
and that is also where you end up spending your cycles in many AI
applications. I found a link to this blog post[2] explaining why this is still
used everywhere, and this matches what I've seen elsewhere, but
unlike me, the author seems to know what they are talking about ;-)

To get back to my own question from earlier about which part of oneAPI
is actually being used, I see that pytorch (to pick a common framework)
can use either mkl (oneMKL, BLAS) or mkldnn (dnnl, oneDNN) as a backend,
next to cuda, cudnn, openmp and certainly a number of third-party
backends.

The mkl backend seems to mostly be a wrapper around cblas_*gemm(),
though I may be reading that wrong.
The oneDNN backend operates on a higher level, calling into a
subset of the oneDNN interfaces. The other frameworks I looked at
(mxnet, tensorflow) look similar, probably each using other subsets of
oneDNN.

> > > We have lots of fixed function on GPUs, video codecs are on most x86
> > > GPUs. It's how you program them that matters, most of them are behind
> > > queues similar to the 3D engine, so you program them the same way.
> >
> > So these would go through /dev/dri instead of /dev/media0? I can definitely
> > see a lot of codec drivers in the kernel that use /dev/media interfaces,
> > and the tradeoffs between those two seem very similar to the tradeoffs
> > you get for machine learning accelerators.
>
> Yeah we have plenty of codecs running on top of /dev/dri0, with all the
> magic in userspace.
>
> They are all very far away from anything that is a machine learning accelerator.

Sure, I only meant the relation between dri codecs and media codecs
is similar to the relation between the ways one can implement the AI
accelerator APIs.

> Yeah for those I think a more fixed uapi like drivers/media makes a lot
> of sense. What I don't like is when vendors then use that excuse
> of "oh you only upload a fixed model at boot" to shovel in an accel
> driver with full generic interface, but not all the userspace
> bits&pieces. There's unfortunately another accel driver in
> drivers/misc for Qualcomm SoC, which really should be either a media
> driver (for the fixed-function use-case) or a drm driver (for the
> fully programmable use-case).

I would argue that for the fixed-function use case, the media subsystem
isn't a great fit either. It would probably work just as well (as would the
crypto subsystem), but having a distinct interface that does just
one thing makes more sense conceptually, if only to make it clear
where to look for such drivers and to have a consistent interface
documentation.

> I think for the fixed-function interface case you can also make a
> reasonable argument that just documenting that fixed interface and all
> the parameters is good enough. But as soon as the interface becomes a
> generic "submit workload" style thing because you want to make it work
> for an entire set of "firmware" compiled by your closed stack, that's
> out of the window.

Right, agreed. If we add a fixed-function interface, that should ideally
not allow any vendor specific extensions at all, just a set of well-defined
operations, and certainly not a bypass mode that gets used to
send compiled binaries.
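
As a rough illustration of "a set of well-defined operations" with no
bypass mode, the sketch below validates a request against a closed list
of opcodes and rejects everything else. It does not describe any
existing kernel uAPI: enum ff_op, struct ff_request and ff_validate()
are hypothetical names, and the three operations are arbitrary examples.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical closed set of operations; not an existing kernel uAPI. */
enum ff_op {
	FF_OP_GEMM_F32 = 1,
	FF_OP_GEMM_INT8 = 2,
	FF_OP_CONV2D_F32 = 3,
};

struct ff_request {
	uint32_t op;		/* one of enum ff_op */
	uint32_t flags;		/* must be zero: no vendor-private bits */
	uint64_t reserved;	/* must be zero: no room for opaque payloads */
};

/* Returns 0 for a known operation, -EINVAL for anything else. */
int ff_validate(const struct ff_request *req)
{
	assert(req);

	if (req->flags != 0 || req->reserved != 0)
		return -EINVAL;

	switch (req->op) {
	case FF_OP_GEMM_F32:
	case FF_OP_GEMM_INT8:
	case FF_OP_CONV2D_F32:
		return 0;
	default:
		return -EINVAL;	/* unknown opcode: no pass-through path */
	}
}
```

Rejecting non-zero flags and reserved fields up front is what keeps a
later vendor extension from sneaking in as an undocumented bit.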

       Arnd

[1] https://oneapi-src.github.io/oneDNN/dev_guide_matmul.html
[2] https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-13 13:20                                   ` Arnd Bergmann
  2021-09-13 13:54                                     ` Daniel Vetter
  2021-09-13 14:52                                     ` James Bottomley
@ 2021-09-14 13:07                                     ` Linus Walleij
  2 siblings, 0 replies; 77+ messages in thread
From: Linus Walleij @ 2021-09-14 13:07 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Dave Airlie, Daniel Vetter, Greg KH, Leon Romanovsky,
	Laurent Pinchart, Thomas Gleixner, Josh Triplett,
	Mauro Carvalho Chehab, Jonathan Corbet, ksummit, dev

On Mon, Sep 13, 2021 at 3:20 PM Arnd Bergmann <arnd@arndb.de> wrote:

> One straightforward hardware independent low-level API would
> be the traditional BLAS GEMM call[1] for matrix multiplication
> and its variants (integer, float, bfloat16, ...).

What this (and subsequent posts from Dave and Daniel) shows is
that the general pattern is that what we are accelerating is no longer
the specialized use cases of linear algebra such as 3D "shaders"
or whatever inference linear algebra NPUs are doing, which
appear to include regression, bayesian stuff, gaussian quadrature...
you name it.

What we are talking about here is acceleration, using an efficient
data path, of numerical analysis, using tailored hardware.
I'm not even sure we are limited to linear algebra anymore.

Is this what is happening, and should we be thinking numerical
analysis accelerators and their different shapes and sizes
rather than usecase-foo-accelerators, so we don't end up with
this situation again the next time applied math comes knocking
on the door with their next usecase?

Yours,
Linus Walleij

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-11 21:52           ` Laurent Pinchart
@ 2021-09-14 13:22             ` Johannes Berg
  0 siblings, 0 replies; 77+ messages in thread
From: Johannes Berg @ 2021-09-14 13:22 UTC (permalink / raw)
  To: Laurent Pinchart, James Bottomley
  Cc: Jonathan Corbet, Alexandre Belloni, ksummit

On Sun, 2021-09-12 at 00:52 +0300, Laurent Pinchart wrote:
> 
> For a wireless driver the situation is possibly different, I suppose
> that if the closed-source userspace blob is there only for regulatory
> reasons, then there would be very little point in having a closed-source
> implementation with a parallel one.
> 
For the record, I know of no such thing, certainly not with an upstream
driver.

Regulatory enforcement is either done through regulatory.db{,.p7s}
loaded into the kernel (the accepted keys are determined at build time),
or, in many newer devices, by firmware.

johannes


* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-10 21:32 ` Josh Triplett
  2021-09-13 13:50   ` Christian Brauner
@ 2021-09-14 14:40   ` Jani Nikula
  2021-09-14 14:45     ` Geert Uytterhoeven
  1 sibling, 1 reply; 77+ messages in thread
From: Jani Nikula @ 2021-09-14 14:40 UTC (permalink / raw)
  To: Josh Triplett, Jonathan Corbet; +Cc: ksummit

On Fri, 10 Sep 2021, Josh Triplett <josh@joshtriplett.org> wrote:
> On Fri, Sep 10, 2021 at 03:00:58PM -0600, Jonathan Corbet wrote:
>>  - What constitutes an acceptable user-space implementation in cases
>>    where these restrictions apply?
>
> This seems like it'll always be a fuzzy line. The main issue: it's OK if
> there are both open and proprietary users, but it's not OK if the only
> open implementation is an outdated or token project that nobody actually
> uses, that exists and is maintained solely for the purposes of placating
> the kernel requirement. There's no easy way to define that line, other
> than "we'll know it when we see it".

One aspect of it should be easy enough: If you have an issue with your
proprietary stack, but you can't reproduce it with the open stack, you
won't get your fix in the kernel.


BR,
Jani.


-- 
Jani Nikula, Intel Open Source Graphics Center

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-14 14:40   ` Jani Nikula
@ 2021-09-14 14:45     ` Geert Uytterhoeven
  2021-09-14 14:59       ` Jani Nikula
  0 siblings, 1 reply; 77+ messages in thread
From: Geert Uytterhoeven @ 2021-09-14 14:45 UTC (permalink / raw)
  To: Jani Nikula; +Cc: Josh Triplett, Jonathan Corbet, ksummit

Hi Jani,

On Tue, Sep 14, 2021 at 4:40 PM Jani Nikula <jani.nikula@intel.com> wrote:
> On Fri, 10 Sep 2021, Josh Triplett <josh@joshtriplett.org> wrote:
> > On Fri, Sep 10, 2021 at 03:00:58PM -0600, Jonathan Corbet wrote:
> >>  - What constitutes an acceptable user-space implementation in cases
> >>    where these restrictions apply?
> >
> > This seems like it'll always be a fuzzy line. The main issue: it's OK if
> > there are both open and proprietary users, but it's not OK if the only
> > open implementation is an outdated or token project that nobody actually
> > uses, that exists and is maintained solely for the purposes of placating
> > the kernel requirement. There's no easy way to define that line, other
> > than "we'll know it when we see it".
>
> One aspect of it should be easy enough: If you have an issue with your
> proprietary stack, but you can't reproduce it with the open stack, you
> won't get your fix in the kernel.

Which basically boils down to the old mantra: before fixing a bug,
first add a new test case to trigger the bug.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-14 14:45     ` Geert Uytterhoeven
@ 2021-09-14 14:59       ` Jani Nikula
  2021-09-14 15:10         ` Geert Uytterhoeven
  0 siblings, 1 reply; 77+ messages in thread
From: Jani Nikula @ 2021-09-14 14:59 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: Josh Triplett, Jonathan Corbet, ksummit

On Tue, 14 Sep 2021, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> Hi Jani,
>
> On Tue, Sep 14, 2021 at 4:40 PM Jani Nikula <jani.nikula@intel.com> wrote:
>> On Fri, 10 Sep 2021, Josh Triplett <josh@joshtriplett.org> wrote:
>> > On Fri, Sep 10, 2021 at 03:00:58PM -0600, Jonathan Corbet wrote:
>> >>  - What constitutes an acceptable user-space implementation in cases
>> >>    where these restrictions apply?
>> >
>> > This seems like it'll always be a fuzzy line. The main issue: it's OK if
>> > there are both open and proprietary users, but it's not OK if the only
>> > open implementation is an outdated or token project that nobody actually
>> > uses, that exists and is maintained solely for the purposes of placating
>> > the kernel requirement. There's no easy way to define that line, other
>> > than "we'll know it when we see it".
>>
>> One aspect of it should be easy enough: If you have an issue with your
>> proprietary stack, but you can't reproduce it with the open stack, you
>> won't get your fix in the kernel.
>
> Which basically boils down to the old mantra: before fixing a bug,
> first add a new test case to trigger the bug.

Oh, but then the question becomes, is it enough to add a reproducer,
simplified from your proprietary stack, in your test asset, and then fix
the kernel issue? Even if it's not a problem in your open stack at all?


BR,
Jani.


-- 
Jani Nikula, Intel Open Source Graphics Center

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-14 14:59       ` Jani Nikula
@ 2021-09-14 15:10         ` Geert Uytterhoeven
  0 siblings, 0 replies; 77+ messages in thread
From: Geert Uytterhoeven @ 2021-09-14 15:10 UTC (permalink / raw)
  To: Jani Nikula; +Cc: Josh Triplett, Jonathan Corbet, ksummit

Hi Jani,

On Tue, Sep 14, 2021 at 5:00 PM Jani Nikula <jani.nikula@intel.com> wrote:
> On Tue, 14 Sep 2021, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Tue, Sep 14, 2021 at 4:40 PM Jani Nikula <jani.nikula@intel.com> wrote:
> >> On Fri, 10 Sep 2021, Josh Triplett <josh@joshtriplett.org> wrote:
> >> > On Fri, Sep 10, 2021 at 03:00:58PM -0600, Jonathan Corbet wrote:
> >> >>  - What constitutes an acceptable user-space implementation in cases
> >> >>    where these restrictions apply?
> >> >
> >> > This seems like it'll always be a fuzzy line. The main issue: it's OK if
> >> > there are both open and proprietary users, but it's not OK if the only
> >> > open implementation is an outdated or token project that nobody actually
> >> > uses, that exists and is maintained solely for the purposes of placating
> >> > the kernel requirement. There's no easy way to define that line, other
> >> > than "we'll know it when we see it".
> >>
> >> One aspect of it should be easy enough: If you have an issue with your
> >> proprietary stack, but you can't reproduce it with the open stack, you
> >> won't get your fix in the kernel.
> >
> > Which basically boils down to the old mantra: before fixing a bug,
> > first add a new test case to trigger the bug.
>
> Oh, but then the question becomes, is it enough to add a reproducer,
> simplified from your proprietary stack, in your test asset, and then fix
> the kernel issue? Even if it's not a problem in your open stack at all?

I was thinking test ~ open stack.
I.e. enhance the open stack to reproduce the issue.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

* RE: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-14  9:08                                           ` Arnd Bergmann
  2021-09-14  9:23                                             ` Daniel Vetter
@ 2021-09-14 15:43                                             ` Luck, Tony
  1 sibling, 0 replies; 77+ messages in thread
From: Luck, Tony @ 2021-09-14 15:43 UTC (permalink / raw)
  To: Arnd Bergmann, Dave Airlie
  Cc: Daniel Vetter, Linus Walleij, Greg KH, Leon Romanovsky,
	Laurent Pinchart, Thomas Gleixner, Josh Triplett,
	Mauro Carvalho Chehab, Jonathan Corbet, ksummit, dev

> d) device has full preempt on all hw blocks, is fully coherent, can
> trigger paging sanely, userspace can submit directly (pipe dream).

Not a pipe dream. Coming soon to a server near you. The Intel "ENQCMD"
instruction can be used from userspace to submit a descriptor to an
accelerator device. ENQCMD picks up the PASID from an MSR during
submission, so the device can ask the iommu to translate virtual addresses
based on the address space of the process that submitted the descriptor.

-Tony

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
  2021-09-14 12:58                                               ` Arnd Bergmann
@ 2021-09-14 19:45                                                 ` Daniel Vetter
  0 siblings, 0 replies; 77+ messages in thread
From: Daniel Vetter @ 2021-09-14 19:45 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Dave Airlie, Linus Walleij, Greg KH, Leon Romanovsky,
	Laurent Pinchart, Thomas Gleixner, Josh Triplett,
	Mauro Carvalho Chehab, Jonathan Corbet, ksummit, dev

On Tue, Sep 14, 2021 at 2:58 PM Arnd Bergmann <arnd@arndb.de> wrote:
> On Tue, Sep 14, 2021 at 11:23 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > On Tue, Sep 14, 2021 at 11:09 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> > > I can see two reasons why one would want to support a type (a)
> > > interface even with the more versatile devices:
> > >
> > > - It can be done in a generic way so that simply adding a kernel
> > >   driver and loading some firmware into it makes existing user space
> > >   software work out of the box.
> > >
> > > - It gives the manufacturer a way to get an upstream kernel driver
> > >   without open sourcing their firmware (a.k.a. compiler and user
> > >   space driver). Whether you consider this a good or bad thing is
> > >   of course a matter of perspective.
> >
> > I think for some embedded use-case this makes sense, especially around
> > media stuff.
> >
> > I don't think it's BLAS, because on the compute side you really want a
> > compiler that sees through the entire thing and can optimize it. Afaik
> > BLAS is for some quick prototype of matrix algorithms and most
> > importantly, for the top500 list :-)
>
> It's probably not the only thing you need, but I would assume something
> like sgemm and its variants are one of the building blocks you'd need
> in this kind of interface. Note that oneDNN also comes with a
> simplified interface similar to gemm[1] as well as a straight wrapper
> around gemm itself.
>
> There are definitely frameworks that are successfully built just on top
> of NumPy and blas (with NumPy itself being built on top of blas).
> I used to make fun of linpack as the supercomputer benchmark that
> has no practical use, but in the end it does spend most of its time in
> the SGEMM function that is the most optimized algorithm in the world
> and that is also where you end up spending your cycles in many AI
> applications. I found a link to this blog post[2] explaining why this is still
> used everywhere, and this matches what I've seen elsewhere, but
> unlike me, the author seems to know what they are talking about ;-)
>
> To get back to my own question from earlier about which part of oneAPI
> is actually being used, I see that pytorch (to pick a common framework)
> can use either mkl (oneMKL, BLAS) or mkldnn (dnnl, oneDNN) as a backend,
> next to cuda, cudnn, openmp and certainly a number of third-party
> backends.
>
> The mkl backend seems to mostly be a wrapper around cblas_*gemm(),
> though I may be reading that wrong.
> The oneDNN backend operates on a higher level, calling into a
> subset of the oneDNN interfaces. The other frameworks I looked at
> (mxnet, tensorflow) look similar, probably each using other subsets of
> oneDNN.

Hm I didn't know that in practice it's all just matrix multiplies in
AI land too. I thought there's more fun going on here, but I guess as
long as you have dense (enough) networks it's fully limited by the
matrix multiply step and nothing else matters. Thanks for the
references.

I still don't think BLAS is what you want, except for a very specific
NPU thing in a soc maybe that can't do anything else than actually
matrix multiplies in hw. The reason is that vendors are most likely
not going to give you the optimized kernels, and the dumb kernels are
very boring (just multiply-add in a loop). So for anything somewhat
programmable you want one level below that, or it's just not very
interesting as userspace demonstration vehicle for your kernel
interface. Also there's generally quite some features in the command
streamer (inter-engine sync as just one example), so a gemm ioctl call
(or whatever you pick from blas) is definitely not what you want for
anything that has a command streamer in hw.
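
For reference, the "multiply-add in a loop" kernel looks roughly like
the sketch below: an unoptimized single-precision GEMM computing
C = alpha * A * B + beta * C. naive_sgemm() is a hypothetical name, not
the BLAS sgemm entry point, and the matrices are assumed to be row-major
and densely packed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Naive single-precision GEMM: C = alpha * A * B + beta * C.
 * A is m x k, B is k x n, C is m x n, all row-major and densely packed.
 * No blocking, no vectorization, no threading.
 */
void naive_sgemm(size_t m, size_t n, size_t k,
		 float alpha, const float *a, const float *b,
		 float beta, float *c)
{
	assert(a && b && c);

	for (size_t i = 0; i < m; i++) {
		for (size_t j = 0; j < n; j++) {
			float acc = 0.0f;

			/* The whole kernel: one multiply-add per iteration. */
			for (size_t p = 0; p < k; p++)
				acc += a[i * k + p] * b[p * n + j];

			c[i * n + j] = alpha * acc + beta * c[i * n + j];
		}
	}
}
```

An optimized implementation differs mainly in cache blocking and
hardware-specific instructions, which is exactly the part a vendor is
least likely to publish.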

But I guess for the various NPUs that pop up in socs all over a
limited blas interface with documentation might be good enough.

> > > > We have lots of fixed function on GPUs, video codecs are on most x86
> > > > GPUs. It's how you program them that matters, most of them are behind
> > > > queues similar to the 3D engine, so you program them the same way.
> > >
> > > So these would go through /dev/dri instead of /dev/media0? I can definitely
> > > see a lot of codec drivers in the kernel that use /dev/media interfaces,
> > > and the tradeoffs between those two seem very similar to the tradeoffs
> > > you get for machine learning accelerators.
> >
> > Yeah we have plenty of codecs running on top of /dev/dri0, with all the
> > magic in userspace.
> >
> > They are all very far away from anything that is a machine learning accelerator.
>
> Sure, I only meant the relation between dri codecs and media codecs
> is similar to the relation between the ways one can implement the AI
> accelerator APIs.
>
> > Yeah for those I think a more fixed uapi like drivers/media makes a lot
> > of sense. What I don't like is when vendors then use that excuse
> > of "oh you only upload a fixed model at boot" to shovel in an accel
> > driver with full generic interface, but not all the userspace
> > bits&pieces. There's unfortunately another accel driver in
> > drivers/misc for Qualcomm SoC, which really should be either a media
> > driver (for the fixed-function use-case) or a drm driver (for the
> > fully programmable use-case).
>
> I would argue that for the fixed-function use case, the media subsystem
> isn't a great fit either. It would probably work just as well (as would the
> crypto subsystem), but having a distinct interface that does just
> one thing makes more sense conceptually, if only to make it clear
> where to look for such drivers and to have a consistent interface
> documentation.

Yeah for tiny soc NPU a fixed interface might work out. Would need
some benchmarking to check the ioctl overhead isn't too bad, I guess
worst case the new uring ioctl stuff could be used for real fast
dispatch. I've seen an NVIDIA NPU (but not sure that shipped anywhere)
and the arm npu that Linus mentioned somewhere else with open enough
drivers to make this possible.
-Daniel

> > I think for the fixed-function interface case you can also make a
> > reasonable argument that just documenting that fixed interface and all
> > the parameters is good enough. But as soon as the interface becomes a
> > generic "submit workload" style thing because you want to make it work
> > for an entire set of "firmware" compiled by your closed stack, that's
> > out of the window.
>
> Right, agreed. If we add a fixed-function interface, that should ideally
> not allow any vendor specific extensions at all, just a set of well-defined
> operations, and certainly not a bypass mode that gets used to
> send compiled binaries.
>
>        Arnd
>
> [1] https://oneapi-src.github.io/oneDNN/dev_guide_matmul.html
> [1] https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/
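
The distinction in the quoted text between a fixed-function interface and
a generic "submit workload" interface can be sketched roughly as below.
Every name here (npu_matmul_req, npu_submit_req, npu_matmul_validate) is
hypothetical and not taken from any existing driver; matmul is used only
because the links above are about it. The point is that every field of
the fixed-function request can be documented and checked, while the
generic request carries an opaque blob.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical fixed-function request: one well-defined matrix multiply. */
struct npu_matmul_req {
	uint32_t m, n, k;	/* C[m x n] = A[m x k] * B[k x n] */
	uint32_t reserved;	/* must be zero: no vendor-specific extensions */
	uint64_t a_handle;	/* buffer handles for A, B and C */
	uint64_t b_handle;
	uint64_t c_handle;
};

/*
 * Hypothetical generic request: an opaque compiled blob. Nothing in it
 * can be validated or documented at the interface level.
 */
struct npu_submit_req {
	uint64_t blob_ptr;
	uint32_t blob_size;
	uint32_t flags;
};

/* Returns 0 if the request is well-formed, -EINVAL otherwise. */
int npu_matmul_validate(const struct npu_matmul_req *req)
{
	if (!req)
		return -EINVAL;
	if (req->reserved != 0)		/* rejects any bypass via spare fields */
		return -EINVAL;
	if (req->m == 0 || req->n == 0 || req->k == 0)
		return -EINVAL;
	return 0;
}
```

Rejecting nonzero reserved fields is one way to keep such an interface
from quietly growing a bypass mode; whether that is enough in practice is
a separate question.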



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Thread overview: 77+ messages
2021-09-10 21:00 [MAINTAINER SUMMIT] User-space requirements for accelerator drivers Jonathan Corbet
2021-09-10 21:32 ` Josh Triplett
2021-09-13 13:50   ` Christian Brauner
2021-09-13 13:57     ` Daniel Vetter
2021-09-14  2:07       ` Laurent Pinchart
2021-09-14 14:40   ` Jani Nikula
2021-09-14 14:45     ` Geert Uytterhoeven
2021-09-14 14:59       ` Jani Nikula
2021-09-14 15:10         ` Geert Uytterhoeven
2021-09-10 21:51 ` James Bottomley
2021-09-10 21:59   ` Alexandre Belloni
2021-09-10 22:35     ` James Bottomley
2021-09-11 14:51       ` Jonathan Corbet
2021-09-11 15:24         ` James Bottomley
2021-09-11 21:52           ` Laurent Pinchart
2021-09-14 13:22             ` Johannes Berg
2021-09-11  0:08   ` Laurent Pinchart
2021-09-10 22:52 ` Mauro Carvalho Chehab
2021-09-10 23:45   ` Josh Triplett
2021-09-10 23:48     ` Dave Hansen
2021-09-11  0:13       ` Laurent Pinchart
2021-09-10 23:55     ` Thomas Gleixner
2021-09-11  0:20       ` Laurent Pinchart
2021-09-11 14:20         ` Steven Rostedt
2021-09-11 22:08           ` Laurent Pinchart
2021-09-11 22:42             ` Steven Rostedt
2021-09-11 23:10               ` Laurent Pinchart
2021-09-13 11:10               ` Mark Brown
2021-09-11 22:51           ` Mauro Carvalho Chehab
2021-09-11 23:22           ` Mauro Carvalho Chehab
2021-09-11 10:31       ` Leon Romanovsky
2021-09-11 11:41         ` Laurent Pinchart
2021-09-11 12:04           ` Leon Romanovsky
2021-09-11 22:04             ` Laurent Pinchart
2021-09-12  4:27               ` Leon Romanovsky
2021-09-12  7:26                 ` Greg KH
2021-09-12  8:29                   ` Leon Romanovsky
2021-09-12 13:25                     ` Greg KH
2021-09-12 14:15                       ` Leon Romanovsky
2021-09-12 14:34                         ` Greg KH
2021-09-12 16:41                           ` Laurent Pinchart
2021-09-12 20:35                           ` Dave Airlie
2021-09-12 20:41                           ` Dave Airlie
2021-09-12 20:49                             ` Daniel Vetter
2021-09-12 21:12                               ` Dave Airlie
2021-09-12 22:51                                 ` Linus Walleij
2021-09-12 23:15                                   ` Dave Airlie
2021-09-13 13:20                                   ` Arnd Bergmann
2021-09-13 13:54                                     ` Daniel Vetter
2021-09-13 22:04                                       ` Arnd Bergmann
2021-09-13 23:33                                         ` Dave Airlie
2021-09-14  9:08                                           ` Arnd Bergmann
2021-09-14  9:23                                             ` Daniel Vetter
2021-09-14 10:47                                               ` Laurent Pinchart
2021-09-14 12:58                                               ` Arnd Bergmann
2021-09-14 19:45                                                 ` Daniel Vetter
2021-09-14 15:43                                             ` Luck, Tony
2021-09-13 14:52                                     ` James Bottomley
2021-09-14 13:07                                     ` Linus Walleij
2021-09-13 14:03                           ` Mark Brown
2021-09-12 15:55                       ` Laurent Pinchart
2021-09-12 16:43                         ` James Bottomley
2021-09-12 16:58                           ` Laurent Pinchart
2021-09-12 17:08                             ` James Bottomley
2021-09-12 19:52                   ` Dave Airlie
2021-09-12  7:46                 ` Mauro Carvalho Chehab
2021-09-12  8:00                   ` Leon Romanovsky
2021-09-12 14:53                     ` Laurent Pinchart
2021-09-12 15:41                       ` Mauro Carvalho Chehab
2021-09-10 23:46   ` Laurent Pinchart
2021-09-11  0:38     ` Mauro Carvalho Chehab
2021-09-11  9:27       ` Laurent Pinchart
2021-09-11 22:33         ` Mauro Carvalho Chehab
2021-09-13 12:04         ` Mark Brown
2021-09-12 19:13 ` Dave Airlie
2021-09-12 19:48   ` Laurent Pinchart
2021-09-13  2:26     ` Dave Airlie
