* [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
From: Jonathan Corbet @ 2021-09-10 21:00 UTC
To: ksummit

There has been a regular disagreement in recent years about whether
drivers for accelerators (such as for the Habana Gaudi device) should be
subject to the same requirements as GPU drivers when it comes to the
availability of a free implementation of the user-space side.  It flared
up again recently:

	https://lwn.net/Articles/867168/

Happily, the Habana situation in particular seems to be resolving
itself:

	https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/

But even there it is clear that the fundamental question has not yet
been resolved.

This seems like the sort of question that the maintainer summit exists
to address.  Specifically, we could discuss:

 - Under which circumstances should the kernel community require the
   existence of freely licensed user-space code that exercises all
   functionalities of a proposed kernel driver or feature?

 - Are there internal kernel interfaces, such as DMA-BUF or P2PDMA,
   that are only available to drivers with a free user-space
   implementation?  Do we need an EXPORT_SYMBOL_USERSPACE_GPL()?

 - What constitutes an acceptable user-space implementation in cases
   where these restrictions apply?

I suspect that more clarity (and fewer arguments) on these questions
would be welcome both within and beyond the development community.

Thanks,

jon
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-10 21:00 [MAINTAINER SUMMIT] User-space requirements for accelerator drivers Jonathan Corbet @ 2021-09-10 21:32 ` Josh Triplett 2021-09-13 13:50 ` Christian Brauner 2021-09-14 14:40 ` Jani Nikula 2021-09-10 21:51 ` James Bottomley ` (2 subsequent siblings) 3 siblings, 2 replies; 77+ messages in thread From: Josh Triplett @ 2021-09-10 21:32 UTC (permalink / raw) To: Jonathan Corbet; +Cc: ksummit On Fri, Sep 10, 2021 at 03:00:58PM -0600, Jonathan Corbet wrote: > There has been a regular disagreement in recent years about whether > drivers for accelerators (such as for the Habana Gaudi device) should be > subject to the same requirements as GPU drivers when it comes to the > availability of a free implementation of the user-space side. It flared > up again recently: > > https://lwn.net/Articles/867168/ > > Happily, the Habana situation in particular seems to be resolving > itself: > > https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/ > > But even there it is clear that the fundamental question has not yet > been resolved. > > This seems like the sort of question that the maintainer summit exists > to address. Specifically, we could discuss: > > - Under which circumstances should the kernel community require the > existence of freely licensed user-space code that exercises all > functionalities of a proposed kernel driver or feature? I think it'd be reasonable to ask, as well: if we required this for *all* kernel functionality, such that we never add any userspace interface to the kernel unless there's *some* Open Source userspace that needs/wants it, what problems would that cause if any? It appears that in this case the kernel pushing back has influenced the release of Open Source userspace code. 
Having a kernel-wide policy here seems like it'll *help* people within
many companies to push for such changes: "We're never going to be able
to get our changes into the upstream kernel if there's no userspace to
drive them."

One tradeoff would be, in theory, that there are some vendors who won't
care enough about upstreaming their changes, and will just keep their
drivers out of tree in that circumstance. There was a time where that
would have been reason enough not to have such a policy. I think that
time has passed, though, and now I think we'd get more benefit from
requiring open userspace consumers of APIs than we'd lose by having some
APIs not submitted upstream. (Plus, those vendors are still obligated to
*ship* the source of those changes to their users, and if a third party
wants to use those changes they can always upstream them, at which point
the vendor still faces the choice of "do I want/need to participate in
this conversation or not".)

> - What constitutes an acceptable user-space implementation in cases
>   where these restrictions apply?

This seems like it'll always be a fuzzy line. The main issue: it's OK if
there are both open and proprietary users, but it's not OK if the only
open implementation is an outdated or token project that nobody actually
uses, that exists and is maintained solely for the purposes of placating
the kernel requirement. There's no easy way to define that line, other
than "we'll know it when we see it".
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-10 21:32 ` Josh Triplett @ 2021-09-13 13:50 ` Christian Brauner 2021-09-13 13:57 ` Daniel Vetter 2021-09-14 14:40 ` Jani Nikula 1 sibling, 1 reply; 77+ messages in thread From: Christian Brauner @ 2021-09-13 13:50 UTC (permalink / raw) To: Josh Triplett; +Cc: Jonathan Corbet, ksummit On Fri, Sep 10, 2021 at 02:32:48PM -0700, Josh Triplett wrote: > On Fri, Sep 10, 2021 at 03:00:58PM -0600, Jonathan Corbet wrote: > > There has been a regular disagreement in recent years about whether > > drivers for accelerators (such as for the Habana Gaudi device) should be > > subject to the same requirements as GPU drivers when it comes to the > > availability of a free implementation of the user-space side. It flared > > up again recently: > > > > https://lwn.net/Articles/867168/ > > > > Happily, the Habana situation in particular seems to be resolving > > itself: > > > > https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/ > > > > But even there it is clear that the fundamental question has not yet > > been resolved. > > > > This seems like the sort of question that the maintainer summit exists > > to address. Specifically, we could discuss: > > > > - Under which circumstances should the kernel community require the > > existence of freely licensed user-space code that exercises all > > functionalities of a proposed kernel driver or feature? > > I think it'd be reasonable to ask, as well: if we required this for > *all* kernel functionality, such that we never add any userspace > interface to the kernel unless there's *some* Open Source userspace that > needs/wants it, what problems would that cause if any? > > It appears that in this case the kernel pushing back has influenced the > release of Open Source userspace code. 
> Having a kernel-wide policy here
> seems like it'll *help* people within many companies to push for such
> changes: "We're never going to be able to get our changes into the
> upstream kernel if there's no userspace to drive them."

I can certainly see why that discussion is needed for features that deal
with hardware which requires an elaborate userspace component in order
to work.
But I'm not convinced this policy makes sense for all kernel features.
For example, when we introduce a new general API in kernel core it will
often be driven by requirements of other well-known open source
projects. If such projects state that they will add support for it once
a kernel supporting this feature is released, that expression of their
intent is often sufficient. We usually don't make such projects jump
through the hoops of implementing the userspace side upfront to prove
that they would use it. Although to the credit of a few open source
projects that does also happen. But I'm hesitant to make this a general
rule.

Christian
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-13 13:50 ` Christian Brauner @ 2021-09-13 13:57 ` Daniel Vetter 2021-09-14 2:07 ` Laurent Pinchart 0 siblings, 1 reply; 77+ messages in thread From: Daniel Vetter @ 2021-09-13 13:57 UTC (permalink / raw) To: Christian Brauner; +Cc: Josh Triplett, Jonathan Corbet, ksummit On Mon, Sep 13, 2021 at 3:50 PM Christian Brauner <christian.brauner@ubuntu.com> wrote: > On Fri, Sep 10, 2021 at 02:32:48PM -0700, Josh Triplett wrote: > > On Fri, Sep 10, 2021 at 03:00:58PM -0600, Jonathan Corbet wrote: > > > There has been a regular disagreement in recent years about whether > > > drivers for accelerators (such as for the Habana Gaudi device) should be > > > subject to the same requirements as GPU drivers when it comes to the > > > availability of a free implementation of the user-space side. It flared > > > up again recently: > > > > > > https://lwn.net/Articles/867168/ > > > > > > Happily, the Habana situation in particular seems to be resolving > > > itself: > > > > > > https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/ > > > > > > But even there it is clear that the fundamental question has not yet > > > been resolved. > > > > > > This seems like the sort of question that the maintainer summit exists > > > to address. Specifically, we could discuss: > > > > > > - Under which circumstances should the kernel community require the > > > existence of freely licensed user-space code that exercises all > > > functionalities of a proposed kernel driver or feature? > > > > I think it'd be reasonable to ask, as well: if we required this for > > *all* kernel functionality, such that we never add any userspace > > interface to the kernel unless there's *some* Open Source userspace that > > needs/wants it, what problems would that cause if any? 
> > > > It appears that in this case the kernel pushing back has influenced the > > release of Open Source userspace code. Having a kernel-wide policy here > > seems like it'll *help* people within many companies to push for such > > changes: "We're never going to be able to get our changes into the > > upstream kernel if there's no userspace to drive them." > > I can certainly see why that discussion is needed for features that deal > with hardware which requires an elaborate userspace component in order > to work. > But I'm not convinced this policy makes sense for all kernel features. > For example, when we introduce a new general api in kernel core it will > often be driven by requirements of other well-known open source > projects. If such projects state that they will add support for it once > a kernel supporting this feature is released that expression of their > intent is often sufficient. We usually don't make such projects jump > through the hoops of implementing the userspace side upfront to proof > that they would use it. Although to the credit of a few open source > projects that does also happen. But I'm hesitant to make this a general > rule. I agree it's an orthogonal discussion, but I think we've also had our fair share of fully generic interface that turned out to miss the mark in real-world usage. This is why the generic kernel modesetting/display interface for drivers in drivers/gpu also needs fully open implementation. Not because we really need that for long-term maintainability - the interfaces are generally well-defined enough that testcases + docs are sufficient for that, but because in practices it just catches so many small gotchas that are otherwise overlooked in good generic uapi design. But I do think we should keep this apart from the discussions for hw drivers, where 80+% of the driver that's absolutely needed to drive the hardware is in userspace. 
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-13 13:57 ` Daniel Vetter @ 2021-09-14 2:07 ` Laurent Pinchart 0 siblings, 0 replies; 77+ messages in thread From: Laurent Pinchart @ 2021-09-14 2:07 UTC (permalink / raw) To: Daniel Vetter; +Cc: Christian Brauner, Josh Triplett, Jonathan Corbet, ksummit Hi Daniel, On Mon, Sep 13, 2021 at 03:57:25PM +0200, Daniel Vetter wrote: > On Mon, Sep 13, 2021 at 3:50 PM Christian Brauner wrote: > > On Fri, Sep 10, 2021 at 02:32:48PM -0700, Josh Triplett wrote: > > > On Fri, Sep 10, 2021 at 03:00:58PM -0600, Jonathan Corbet wrote: > > > > There has been a regular disagreement in recent years about whether > > > > drivers for accelerators (such as for the Habana Gaudi device) should be > > > > subject to the same requirements as GPU drivers when it comes to the > > > > availability of a free implementation of the user-space side. It flared > > > > up again recently: > > > > > > > > https://lwn.net/Articles/867168/ > > > > > > > > Happily, the Habana situation in particular seems to be resolving > > > > itself: > > > > > > > > https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/ > > > > > > > > But even there it is clear that the fundamental question has not yet > > > > been resolved. > > > > > > > > This seems like the sort of question that the maintainer summit exists > > > > to address. Specifically, we could discuss: > > > > > > > > - Under which circumstances should the kernel community require the > > > > existence of freely licensed user-space code that exercises all > > > > functionalities of a proposed kernel driver or feature? > > > > > > I think it'd be reasonable to ask, as well: if we required this for > > > *all* kernel functionality, such that we never add any userspace > > > interface to the kernel unless there's *some* Open Source userspace that > > > needs/wants it, what problems would that cause if any? 
> > > > > > It appears that in this case the kernel pushing back has influenced the > > > release of Open Source userspace code. Having a kernel-wide policy here > > > seems like it'll *help* people within many companies to push for such > > > changes: "We're never going to be able to get our changes into the > > > upstream kernel if there's no userspace to drive them." > > > > I can certainly see why that discussion is needed for features that deal > > with hardware which requires an elaborate userspace component in order > > to work. > > But I'm not convinced this policy makes sense for all kernel features. > > For example, when we introduce a new general api in kernel core it will > > often be driven by requirements of other well-known open source > > projects. If such projects state that they will add support for it once > > a kernel supporting this feature is released that expression of their > > intent is often sufficient. We usually don't make such projects jump > > through the hoops of implementing the userspace side upfront to proof > > that they would use it. Although to the credit of a few open source > > projects that does also happen. But I'm hesitant to make this a general > > rule. > > I agree it's an orthogonal discussion, but I think we've also had our > fair share of fully generic interface that turned out to miss the mark > in real-world usage. This is why the generic kernel > modesetting/display interface for drivers in drivers/gpu also needs > fully open implementation. Not because we really need that for > long-term maintainability - the interfaces are generally well-defined > enough that testcases + docs are sufficient for that, but because in > practices it just catches so many small gotchas that are otherwise > overlooked in good generic uapi design. I concur here. I've spent the past 3 years working on libcamera in userspace after a decade of experience in the kernel side of V4L2. 
It was an enlightening (and embarrassing) moment to realize that some
kernel APIs that I had designed myself didn't stand the test of being
used for real. A test application written to test an API in the way the
API was designed will generally not be good at finding design flaws.

> But I do think we should keep this apart from the discussions for hw
> drivers, where 80+% of the driver that's absolutely needed to drive
> the hardware is in userspace.

--
Regards,

Laurent Pinchart
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-10 21:32 ` Josh Triplett 2021-09-13 13:50 ` Christian Brauner @ 2021-09-14 14:40 ` Jani Nikula 2021-09-14 14:45 ` Geert Uytterhoeven 1 sibling, 1 reply; 77+ messages in thread From: Jani Nikula @ 2021-09-14 14:40 UTC (permalink / raw) To: Josh Triplett, Jonathan Corbet; +Cc: ksummit On Fri, 10 Sep 2021, Josh Triplett <josh@joshtriplett.org> wrote: > On Fri, Sep 10, 2021 at 03:00:58PM -0600, Jonathan Corbet wrote: >> - What constitutes an acceptable user-space implementation in cases >> where these restrictions apply? > > This seems like it'll always be a fuzzy line. The main issue: it's OK if > there are both open and proprietary users, but it's not OK if the only > open implementation is an outdated or token project that nobody actually > uses, that exists and is maintained solely for the purposes of placating > the kernel requirement. There's no easy way to define that line, other > than "we'll know it when we see it". One aspect of it should be easy enough: If you have an issue with your proprietary stack, but you can't reproduce it with the open stack, you won't get your fix in the kernel. BR, Jani. -- Jani Nikula, Intel Open Source Graphics Center ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-14 14:40 ` Jani Nikula @ 2021-09-14 14:45 ` Geert Uytterhoeven 2021-09-14 14:59 ` Jani Nikula 0 siblings, 1 reply; 77+ messages in thread From: Geert Uytterhoeven @ 2021-09-14 14:45 UTC (permalink / raw) To: Jani Nikula; +Cc: Josh Triplett, Jonathan Corbet, ksummit Hi Jani, On Tue, Sep 14, 2021 at 4:40 PM Jani Nikula <jani.nikula@intel.com> wrote: > On Fri, 10 Sep 2021, Josh Triplett <josh@joshtriplett.org> wrote: > > On Fri, Sep 10, 2021 at 03:00:58PM -0600, Jonathan Corbet wrote: > >> - What constitutes an acceptable user-space implementation in cases > >> where these restrictions apply? > > > > This seems like it'll always be a fuzzy line. The main issue: it's OK if > > there are both open and proprietary users, but it's not OK if the only > > open implementation is an outdated or token project that nobody actually > > uses, that exists and is maintained solely for the purposes of placating > > the kernel requirement. There's no easy way to define that line, other > > than "we'll know it when we see it". > > One aspect of it should be easy enough: If you have an issue with your > proprietary stack, but you can't reproduce it with the open stack, you > won't get your fix in the kernel. Which basically boils down to the old mantra: before fixing a bug, first add a new test case to trigger the bug. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-14 14:45 ` Geert Uytterhoeven @ 2021-09-14 14:59 ` Jani Nikula 2021-09-14 15:10 ` Geert Uytterhoeven 0 siblings, 1 reply; 77+ messages in thread From: Jani Nikula @ 2021-09-14 14:59 UTC (permalink / raw) To: Geert Uytterhoeven; +Cc: Josh Triplett, Jonathan Corbet, ksummit On Tue, 14 Sep 2021, Geert Uytterhoeven <geert@linux-m68k.org> wrote: > Hi Jani, > > On Tue, Sep 14, 2021 at 4:40 PM Jani Nikula <jani.nikula@intel.com> wrote: >> On Fri, 10 Sep 2021, Josh Triplett <josh@joshtriplett.org> wrote: >> > On Fri, Sep 10, 2021 at 03:00:58PM -0600, Jonathan Corbet wrote: >> >> - What constitutes an acceptable user-space implementation in cases >> >> where these restrictions apply? >> > >> > This seems like it'll always be a fuzzy line. The main issue: it's OK if >> > there are both open and proprietary users, but it's not OK if the only >> > open implementation is an outdated or token project that nobody actually >> > uses, that exists and is maintained solely for the purposes of placating >> > the kernel requirement. There's no easy way to define that line, other >> > than "we'll know it when we see it". >> >> One aspect of it should be easy enough: If you have an issue with your >> proprietary stack, but you can't reproduce it with the open stack, you >> won't get your fix in the kernel. > > Which basically boils down to the old mantra: before fixing a bug, > first add a new test case to trigger the bug. Oh, but then the question becomes, is it enough to add a reproducer, simplified from your proprietary stack, in your test asset, and then fix the kernel issue? Even if it's not a problem in your open stack at all? BR, Jani. -- Jani Nikula, Intel Open Source Graphics Center ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-14 14:59 ` Jani Nikula @ 2021-09-14 15:10 ` Geert Uytterhoeven 0 siblings, 0 replies; 77+ messages in thread From: Geert Uytterhoeven @ 2021-09-14 15:10 UTC (permalink / raw) To: Jani Nikula; +Cc: Josh Triplett, Jonathan Corbet, ksummit Hi Jani, On Tue, Sep 14, 2021 at 5:00 PM Jani Nikula <jani.nikula@intel.com> wrote: > On Tue, 14 Sep 2021, Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > On Tue, Sep 14, 2021 at 4:40 PM Jani Nikula <jani.nikula@intel.com> wrote: > >> On Fri, 10 Sep 2021, Josh Triplett <josh@joshtriplett.org> wrote: > >> > On Fri, Sep 10, 2021 at 03:00:58PM -0600, Jonathan Corbet wrote: > >> >> - What constitutes an acceptable user-space implementation in cases > >> >> where these restrictions apply? > >> > > >> > This seems like it'll always be a fuzzy line. The main issue: it's OK if > >> > there are both open and proprietary users, but it's not OK if the only > >> > open implementation is an outdated or token project that nobody actually > >> > uses, that exists and is maintained solely for the purposes of placating > >> > the kernel requirement. There's no easy way to define that line, other > >> > than "we'll know it when we see it". > >> > >> One aspect of it should be easy enough: If you have an issue with your > >> proprietary stack, but you can't reproduce it with the open stack, you > >> won't get your fix in the kernel. > > > > Which basically boils down to the old mantra: before fixing a bug, > > first add a new test case to trigger the bug. > > Oh, but then the question becomes, is it enough to add a reproducer, > simplified from your proprietary stack, in your test asset, and then fix > the kernel issue? Even if it's not a problem in your open stack at all? I was thinking test ~ open stack. I.e. enhance the open stack to reproduce the issue. 
Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-10 21:00 [MAINTAINER SUMMIT] User-space requirements for accelerator drivers Jonathan Corbet 2021-09-10 21:32 ` Josh Triplett @ 2021-09-10 21:51 ` James Bottomley 2021-09-10 21:59 ` Alexandre Belloni 2021-09-11 0:08 ` Laurent Pinchart 2021-09-10 22:52 ` Mauro Carvalho Chehab 2021-09-12 19:13 ` Dave Airlie 3 siblings, 2 replies; 77+ messages in thread From: James Bottomley @ 2021-09-10 21:51 UTC (permalink / raw) To: Jonathan Corbet, ksummit On Fri, 2021-09-10 at 15:00 -0600, Jonathan Corbet wrote: > - Are there internal kernel interfaces, such as DMA-BUF or P2PDMA, > that are only available to drivers with a free user-space > implementation? Do we need an EXPORT_SYMBOL_USERSPACE_GPL()? I don't think reasonably we can do this. The kernel GPLv2 licence includes this system exception: NOTE! This copyright does *not* cover user programs that use kernel services by normal system calls - this is merely considered normal use of the kernel, and does *not* fall under the heading of "derived work". Also note that the GPL below is copyrighted by the Free Software Foundation, but the instance of code that it refers to (the Linux kernel) is copyrighted by me and others who actually wrote it. Also note that the only valid version of the GPL as far as the kernel is concerned is _this_ particular version of the license (ie v2, not v2.2 or v3.x or whatever), unless explicitly otherwise stated. This means currently that once an API is exposed to user space, we've given up control of the type of programme (proprietary or open source) that may use it. It might be possible legally to try and take back that control by modifying the system exception (what is a "normal" system call), but I personally think that would be unwise and create a raft of other problems for other proprietary user space code running on Linux, which I really think we don't want to do. 
I think our only recourse for user space accelerators is not to export
the interface if we think it would only be used for evil purposes.

James
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-10 21:51 ` James Bottomley @ 2021-09-10 21:59 ` Alexandre Belloni 2021-09-10 22:35 ` James Bottomley 2021-09-11 0:08 ` Laurent Pinchart 1 sibling, 1 reply; 77+ messages in thread From: Alexandre Belloni @ 2021-09-10 21:59 UTC (permalink / raw) To: James Bottomley; +Cc: Jonathan Corbet, ksummit On 10/09/2021 14:51:43-0700, James Bottomley wrote: > On Fri, 2021-09-10 at 15:00 -0600, Jonathan Corbet wrote: > > - Are there internal kernel interfaces, such as DMA-BUF or P2PDMA, > > that are only available to drivers with a free user-space > > implementation? Do we need an EXPORT_SYMBOL_USERSPACE_GPL()? > > I don't think reasonably we can do this. The kernel GPLv2 licence > includes this system exception: > > NOTE! This copyright does *not* cover user programs that use > kernel services by normal system calls - this is merely considered > normal use of the kernel, and does *not* fall under the heading of > "derived work". Also note that the GPL below is copyrighted by the > Free Software Foundation, but the instance of code that it refers to > (the Linux kernel) is copyrighted by me and others who actually > wrote it. > > Also note that the only valid version of the GPL as far as the > kernel is concerned is _this_ particular version of the license (ie > v2, not v2.2 or v3.x or whatever), unless explicitly otherwise > stated. > > This means currently that once an API is exposed to user space, we've > given up control of the type of programme (proprietary or open source) > that may use it. > > It might be possible legally to try and take back that control by > modifying the system exception (what is a "normal" system call), but I > personally think that would be unwise and create a raft of other > problems for other proprietary user space code running on Linux, which > I really think we don't want to do. 
>
> I think our only recourse for user space accelerators is not to export
> the interface if we think it would only be used for evil purposes.
>

I think the question is not whether we want to forbid proprietary user
space using an API but whether we want to merge said API so the license
on the kernel doesn't matter much.

--
Alexandre Belloni, co-owner and COO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-10 21:59 ` Alexandre Belloni @ 2021-09-10 22:35 ` James Bottomley 2021-09-11 14:51 ` Jonathan Corbet 0 siblings, 1 reply; 77+ messages in thread From: James Bottomley @ 2021-09-10 22:35 UTC (permalink / raw) To: Alexandre Belloni; +Cc: Jonathan Corbet, ksummit On Fri, 2021-09-10 at 23:59 +0200, Alexandre Belloni wrote: > On 10/09/2021 14:51:43-0700, James Bottomley wrote: > > On Fri, 2021-09-10 at 15:00 -0600, Jonathan Corbet wrote: > > > - Are there internal kernel interfaces, such as DMA-BUF or > > > P2PDMA, that are only available to drivers with a free user-space > > > implementation? Do we need an EXPORT_SYMBOL_USERSPACE_GPL()? > > > > I don't think reasonably we can do this. The kernel GPLv2 licence > > includes this system exception: > > > > NOTE! This copyright does *not* cover user programs that use > > kernel services by normal system calls - this is merely > > considered > > normal use of the kernel, and does *not* fall under the heading > > of > > "derived work". Also note that the GPL below is copyrighted by > > the > > Free Software Foundation, but the instance of code that it > > refers to > > (the Linux kernel) is copyrighted by me and others who actually > > wrote it. > > > > Also note that the only valid version of the GPL as far as the > > kernel is concerned is _this_ particular version of the license > > (ie > > v2, not v2.2 or v3.x or whatever), unless explicitly otherwise > > stated. > > > > This means currently that once an API is exposed to user space, > > we've given up control of the type of programme (proprietary or > > open source) that may use it. 
> > > > It might be possible legally to try and take back that control by > > modifying the system exception (what is a "normal" system call), > > but I personally think that would be unwise and create a raft of > > other problems for other proprietary user space code running on > > Linux, which I really think we don't want to do. > > > > I think our only recourse for user space accelerators is not to > > export the interface if we think it would only be used for evil > > purposes. > > > > I think the question is not whether we want to forbid proprietary > user space using an API but whether we want to merge said API so the > license on the kernel doesn't matter much. I thought that *was* the statement I made in the last paragraph: we can choose whether or not to merge the enabling API into the kernel. However, if we merge it we can't choose whether a proprietary user space takes advantage of the API. My original reply was to the notion of EXPORT_USERSPACE_GPL, which I think we have no legal basis for enforcing without modifying the system exception. James ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
From: Jonathan Corbet @ 2021-09-11 14:51 UTC
To: James Bottomley, Alexandre Belloni; +Cc: ksummit

James Bottomley <James.Bottomley@HansenPartnership.com> writes:

> On Fri, 2021-09-10 at 23:59 +0200, Alexandre Belloni wrote:
>> I think the question is not whether we want to forbid proprietary
>> user space using an API but whether we want to merge said API so the
>> license on the kernel doesn't matter much.
>
> I thought that *was* the statement I made in the last paragraph: we can
> choose whether or not to merge the enabling API into the kernel.
> However, if we merge it we can't choose whether a proprietary user
> space takes advantage of the API. My original reply was to the notion
> of EXPORT_USERSPACE_GPL, which I think we have no legal basis for
> enforcing without modifying the system exception.

That wasn't my thinking when I pulled the idea of EXPORT_USERSPACE_GPL
out of whatever dark place it was lurking in. The idea was, instead, to
document that if your driver is using that interface, it won't be
considered for merging into the kernel in the absence of a working,
free, user-space implementation -- should we choose to adopt such a
policy, of course.

Nobody is trying to prohibit proprietary user space, that's not the
point.

jon
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-11 14:51 ` Jonathan Corbet @ 2021-09-11 15:24 ` James Bottomley 2021-09-11 21:52 ` Laurent Pinchart 0 siblings, 1 reply; 77+ messages in thread From: James Bottomley @ 2021-09-11 15:24 UTC (permalink / raw) To: Jonathan Corbet, Alexandre Belloni; +Cc: ksummit On Sat, 2021-09-11 at 08:51 -0600, Jonathan Corbet wrote: > James Bottomley <James.Bottomley@HansenPartnership.com> writes: > > > On Fri, 2021-09-10 at 23:59 +0200, Alexandre Belloni wrote: > > > I think the question is not whether we want to forbid proprietary > > > user space using an API but whether we want to merge said API so > > > the license on the kernel doesn't matter much. > > > > I thought that *was* the statement I made in the last paragraph: we > > can choose whether or not to merge the enabling API into the > > kernel. However, if we merge it we can't choose whether a > > proprietary user space takes advantage of the API. My original > > reply was to the notion of EXPORT_USERSPACE_GPL, which I think we > > have no legal basis for enforcing without modifying the system > > exception. > > That wasn't thinking when I pulled the idea of EXPORT_USERSPACE_GPL > out of whatever dark place it was lurking in. OK, but you can see how that thought is arrived at since EXPORT_SYMBOL_GPL is a technically enforced licensing permission tag. However, I was seriously pushing back against the *idea* of such a tag because once it crosses the kernel to user boundary it would cause huge confusion of our current licensing positions ... regardless of what it actually means. > The idea was, instead, to document that if your driver is using > that interface, it won't be considered for merging into the kernel in > the absence of a working, free, user-space implementation -- should > we choose to adopt such a policy, of course. 
Right, and if you have a driver with an internal API that's used for communication with a userspace blob, we can evaluate that, as we have done before, on a case by case basis. It's not a new thing, because we're both old enough to remember "my wireless driver has to have a proprietary component for regulatory reasons". We've made decisions both for and against such drivers in the past, but I think the issues are too nuanced to make a general rule. If you do have a general rule, what other things, like firmware, would get caught by it and so on ... > Nobody is trying to prohibit proprietary user space, that's not the > point. I didn't think you were in general, but requiring a free userspace driver implementation is prohibiting a proprietary one and so then you get into questions of how wide the reach is and what the knock on effects are if you try to craft a general policy around this ... especially if it has technical enforcement measures. James ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-11 15:24 ` James Bottomley @ 2021-09-11 21:52 ` Laurent Pinchart 2021-09-14 13:22 ` Johannes Berg 0 siblings, 1 reply; 77+ messages in thread From: Laurent Pinchart @ 2021-09-11 21:52 UTC (permalink / raw) To: James Bottomley; +Cc: Jonathan Corbet, Alexandre Belloni, ksummit Hi James, On Sat, Sep 11, 2021 at 08:24:38AM -0700, James Bottomley wrote: > On Sat, 2021-09-11 at 08:51 -0600, Jonathan Corbet wrote: > > James Bottomley <James.Bottomley@HansenPartnership.com> writes: > > > > > On Fri, 2021-09-10 at 23:59 +0200, Alexandre Belloni wrote: > > > > I think the question is not whether we want to forbid proprietary > > > > user space using an API but whether we want to merge said API so > > > > the license on the kernel doesn't matter much. > > > > > > I thought that *was* the statement I made in the last paragraph: we > > > can choose whether or not to merge the enabling API into the > > > kernel. However, if we merge it we can't choose whether a > > > proprietary user space takes advantage of the API. My original > > > reply was to the notion of EXPORT_USERSPACE_GPL, which I think we > > > have no legal basis for enforcing without modifying the system > > > exception. > > > > That wasn't my thinking when I pulled the idea of EXPORT_USERSPACE_GPL > > out of whatever dark place it was lurking in. > > OK, but you can see how that thought is arrived at since > EXPORT_SYMBOL_GPL is a technically enforced licensing permission tag. > However, I was seriously pushing back against the *idea* of such a tag > because once it crosses the kernel to user boundary it would cause huge > confusion of our current licensing positions ... regardless of what it > actually means. 
> > > The idea was, instead, to document that if your driver is using > > that interface, it won't be considered for merging into the kernel in > > the absence of a working, free, user-space implementation -- should > > we choose to adopt such a policy, of course. > > Right, and if you have a driver with an internal API that's used for > communication with a userspace blob, we can evaluate that, as we have > done before, on a case by case basis. It's not a new thing, because > we're both old enough to remember "my wireless driver has to have a > proprietary component for regulatory reasons". > > We've made decisions both for and against such drivers in the past, but > I think the issues are too nuanced to make a general rule. If you do > have a general rule, what other things, like firmware, would get caught > by it and so on ... > > > Nobody is trying to prohibit proprietary user space, that's not the > > point. > > I didn't think you were in general, but requiring a free userspace > driver implementation is prohibiting a proprietary one and so then you > get into questions of how wide the reach is and what the knock on > effects are if you try to craft a general policy around this ... > especially if it has technical enforcement measures. Requiring the existence of one open userspace implementation doesn't preclude the ability for vendors to develop and ship closed implementations in parallel, at least in the general case. For instance, with GPUs or cameras, an open implementation could be developed (in Mesa and libcamera respectively) to exercise the device features (such as the GPU shader instruction set, or the camera ISP processing parameters), but wouldn't be required to include all the tuning and optimizations that a closed implementation would typically have. 
A vendor could thus ship a closed-source shader compiler in its OpenGL/Vulkan userspace driver, protecting the R&D investment to implement those optimizations (this would most likely also include lots of hacks to please benchmarks), and the community would be able to use the GPU and improve the open implementation. For GPUs, the situation has been quite clear for years: if a vendor wants an upstream driver, with all the benefits this brings, they have to also provide a (mostly?) feature-complete (in the sense of hardware features) but not necessarily optimized open-source counterpart. We're exploring here whether or not the same deal should cover camera ISPs and ML/AI accelerators (and possibly other devices that I'm less familiar with). For a wireless driver the situation is possibly different, I suppose that if the closed-source userspace blob is there only for regulatory reasons, then there would be very little point in having a closed-source implementation with a parallel one. -- Regards, Laurent Pinchart ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-11 21:52 ` Laurent Pinchart @ 2021-09-14 13:22 ` Johannes Berg 0 siblings, 0 replies; 77+ messages in thread From: Johannes Berg @ 2021-09-14 13:22 UTC (permalink / raw) To: Laurent Pinchart, James Bottomley Cc: Jonathan Corbet, Alexandre Belloni, ksummit On Sun, 2021-09-12 at 00:52 +0300, Laurent Pinchart wrote: > > For a wireless driver the situation is possibly different, I suppose > that if the closed-source userspace blob is there only for regulatory > reasons, then there would be very little point in having a closed-source > implementation with a parallel one. > For the record, I know of no such thing, certainly not with an upstream driver. Regulatory enforcement is either done through regulatory.db{,.p7s} loaded into the kernel (the accepted keys are determined at build time), or, in many newer devices, by firmware. johannes ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-10 21:51 ` James Bottomley 2021-09-10 21:59 ` Alexandre Belloni @ 2021-09-11 0:08 ` Laurent Pinchart 1 sibling, 0 replies; 77+ messages in thread From: Laurent Pinchart @ 2021-09-11 0:08 UTC (permalink / raw) To: James Bottomley; +Cc: Jonathan Corbet, ksummit Hi James, On Fri, Sep 10, 2021 at 02:51:43PM -0700, James Bottomley wrote: > On Fri, 2021-09-10 at 15:00 -0600, Jonathan Corbet wrote: > > - Are there internal kernel interfaces, such as DMA-BUF or P2PDMA, > > that are only available to drivers with a free user-space > > implementation? Do we need an EXPORT_SYMBOL_USERSPACE_GPL()? > > I don't think reasonably we can do this. The kernel GPLv2 licence > includes this system exception: > > NOTE! This copyright does *not* cover user programs that use > kernel services by normal system calls - this is merely considered > normal use of the kernel, and does *not* fall under the heading of > "derived work". Also note that the GPL below is copyrighted by the > Free Software Foundation, but the instance of code that it refers to > (the Linux kernel) is copyrighted by me and others who actually > wrote it. > > Also note that the only valid version of the GPL as far as the > kernel is concerned is _this_ particular version of the license (ie > v2, not v2.2 or v3.x or whatever), unless explicitly otherwise > stated. > > This means currently that once an API is exposed to user space, we've > given up control of the type of programme (proprietary or open source) > that may use it. > > It might be possible legally to try and take back that control by > modifying the system exception (what is a "normal" system call), but I > personally think that would be unwise and create a raft of other > problems for other proprietary user space code running on Linux, which > I really think we don't want to do. 
I overall agree that forbidding APIs from being used by closed-source userspace is likely a no-go from a license point of view, and that it would create a dangerous precedent and convey a bad message. > I think our only recourse for user space accelerators is not to export > the interface if we think it would only be used for evil purposes. In my opinion the issue at hand isn't so much that the interface can be used for evil purposes, but that drivers can reap the benefits of being included in mainline while ignoring (in good faith or not) the counterpart of allowing all userspace, open or not, to use the device. The problematic part is usually not the internal kernel interfaces that those drivers use, but the fact that they expose vendor-specific API elements to userspace without documenting them. One obvious option, *if* we decide that this isn't an acceptable behaviour, is to refuse merging such drivers. DMA-BUF or P2PDMA are not in themselves problematic, but in the case that Jon mentioned, they indicate that the device is expected to inter-operate with other devices by sharing data either through system memory or with direct DMA between the devices. This makes the absence of an open userspace more problematic as it may also affect the ability to use other devices in the system. It could thus be considered as a criterion to decide which drivers would require at least one open userspace, should we decide that not all drivers would. -- Regards, Laurent Pinchart ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-10 21:00 [MAINTAINER SUMMIT] User-space requirements for accelerator drivers Jonathan Corbet 2021-09-10 21:32 ` Josh Triplett 2021-09-10 21:51 ` James Bottomley @ 2021-09-10 22:52 ` Mauro Carvalho Chehab 2021-09-10 23:45 ` Josh Triplett 2021-09-10 23:46 ` Laurent Pinchart 2021-09-12 19:13 ` Dave Airlie 3 siblings, 2 replies; 77+ messages in thread From: Mauro Carvalho Chehab @ 2021-09-10 22:52 UTC (permalink / raw) To: Jonathan Corbet; +Cc: ksummit On Fri, 10 Sep 2021 15:00:58 -0600 Jonathan Corbet <corbet@lwn.net> wrote: > There has been a regular disagreement in recent years about whether > drivers for accelerators (such as for the Habana Gaudi device) should be > subject to the same requirements as GPU drivers when it comes to the > availability of a free implementation of the user-space side. It flared > up again recently: > > https://lwn.net/Articles/867168/ > > Happily, the Habana situation in particular seems to be resolving > itself: > > https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/ > > But even there it is clear that the fundamental question has not yet > been resolved. > > This seems like the sort of question that the maintainer summit exists > to address. Specifically, we could discuss: > > - Under which circumstances should the kernel community require the > existence of freely licensed user-space code that exercises all > functionalities of a proposed kernel driver or feature? > > - Are there internal kernel interfaces, such as DMA-BUF or P2PDMA, that > are only available to drivers with a free user-space implementation? > Do we need an EXPORT_SYMBOL_USERSPACE_GPL()? > > - What constitutes an acceptable user-space implementation in cases > where these restrictions apply? > > I suspect that more clarity (and fewer arguments) on these questions > would be welcome both within and beyond the development community. 
The media subsystem also has this sort of issue: several drivers there support hardware accelerators for video encoders and decoders. In the case of media, devices with such hardware usually have an Image Signal Processor, where the codec runs on some firmware. On media, requiring userspace to always be open source would have been very bad, as it would prevent several videoconferencing applications from existing on Linux. Also, several such codecs exist only on embedded hardware that already depends on proprietary software to run. So, a policy like that would do more harm than good. What we do, instead, is try to enforce that the userspace API is fully documented in a way that allows open source software to exist. This is easier said than done, but we have some compliance tools that we use to help validate the uAPI implementations. Thanks, Mauro ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-10 22:52 ` Mauro Carvalho Chehab @ 2021-09-10 23:45 ` Josh Triplett 2021-09-10 23:48 ` Dave Hansen 2021-09-10 23:55 ` Thomas Gleixner 2021-09-10 23:46 ` Laurent Pinchart 1 sibling, 2 replies; 77+ messages in thread From: Josh Triplett @ 2021-09-10 23:45 UTC (permalink / raw) To: Mauro Carvalho Chehab; +Cc: Jonathan Corbet, ksummit On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote: > On media, requiring userspace to always be open source would > have been very bad, as it would prevent several videoconferencing > applications from existing on Linux. I don't think we should enforce that all userspace users of an interface be Open Source. I do think we should enforce that *some* userspace user of an interface be Open Source before we add the interface. ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-10 23:45 ` Josh Triplett @ 2021-09-10 23:48 ` Dave Hansen 2021-09-11 0:13 ` Laurent Pinchart 2021-09-10 23:55 ` Thomas Gleixner 1 sibling, 1 reply; 77+ messages in thread From: Dave Hansen @ 2021-09-10 23:48 UTC (permalink / raw) To: Josh Triplett, Mauro Carvalho Chehab; +Cc: Jonathan Corbet, ksummit On 9/10/21 4:45 PM, Josh Triplett wrote: > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote: >> On media, requiring userspace to always be open source would >> have been very bad, as it would prevent several videoconferencing >> applications from existing on Linux. > I don't think we should enforce that all userspace users of an interface > be Open Source. I do think we should enforce that *some* userspace user > of an interface be Open Source before we add the interface. Right, if there's *no* open userspace, we can't meaningfully test or debug the thing. Maybe we don't need a whole userspace stack for every last interface, but if folks can't even offer up a selftest, it's not a good sign. ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-10 23:48 ` Dave Hansen @ 2021-09-11 0:13 ` Laurent Pinchart 0 siblings, 0 replies; 77+ messages in thread From: Laurent Pinchart @ 2021-09-11 0:13 UTC (permalink / raw) To: Dave Hansen Cc: Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit Hi Dave, On Fri, Sep 10, 2021 at 04:48:37PM -0700, Dave Hansen wrote: > On 9/10/21 4:45 PM, Josh Triplett wrote: > > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote: > >> On media, requiring userspace to always be open source would > >> have been very bad, as it would prevent several videoconferencing > >> applications from existing on Linux. > > > > I don't think we should enforce that all userspace users of an interface > > be Open Source. I do think we should enforce that *some* userspace user > > of an interface be Open Source before we add the interface. > > Right, if there's *no* open userspace, we can't meaningfully test or > debug the thing. > > Maybe we don't need a whole userspace stack for every last interface, > but if folks can't even offer up a selftest, it's not a good sign. It really depends on the type of driver and device. For GPUs or camera ISPs, for instance, a selftest is pointless, a full stack is required to be able to meaningfully test the driver and use the device as those expose a very large custom API to userspace (usually in the form of command buffers that contain device-specific instructions or register values). -- Regards, Laurent Pinchart ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-10 23:45 ` Josh Triplett 2021-09-10 23:48 ` Dave Hansen @ 2021-09-10 23:55 ` Thomas Gleixner 2021-09-11 0:20 ` Laurent Pinchart 2021-09-11 10:31 ` Leon Romanovsky 1 sibling, 2 replies; 77+ messages in thread From: Thomas Gleixner @ 2021-09-10 23:55 UTC (permalink / raw) To: Josh Triplett, Mauro Carvalho Chehab; +Cc: Jonathan Corbet, ksummit On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote: > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote: >> On media, requiring userspace to always be open source would >> have been very bad, as it would prevent several videoconferencing >> applications from existing on Linux. > > I don't think we should enforce that all userspace users of an interface > be Open Source. I do think we should enforce that *some* userspace user > of an interface be Open Source before we add the interface. The real question is whether the interface is documented in a way that an Open Source implementation is possible. It does not matter whether it exists at that point in time or not. Even if it exists there is no guarantee that it is feature complete. Freely accessible documentation is really the key. Thanks, tglx ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-10 23:55 ` Thomas Gleixner @ 2021-09-11 0:20 ` Laurent Pinchart 2021-09-11 14:20 ` Steven Rostedt 2021-09-11 10:31 ` Leon Romanovsky 1 sibling, 1 reply; 77+ messages in thread From: Laurent Pinchart @ 2021-09-11 0:20 UTC (permalink / raw) To: Thomas Gleixner Cc: Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit Hi Thomas, On Sat, Sep 11, 2021 at 01:55:16AM +0200, Thomas Gleixner wrote: > On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote: > > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote: > >> On media, requiring userspace to always be open source would > >> have been very bad, as it would prevent several videoconferencing > >> applications from existing on Linux. > > > > I don't think we should enforce that all userspace users of an interface > > be Open Source. I do think we should enforce that *some* userspace user > > of an interface be Open Source before we add the interface. > > The real question is whether the interface is documented in a way that > an Open Source implementation is possible. It does not matter whether it > exists at that point in time or not. Even if it exists there is no > guarantee that it is feature complete. > > Freely accessible documentation is really the key. In principle I'd agree, but that assumes such documentation would exist in the first place, with a sufficient level of quality. In many cases an open implementation that exercises all device features is a better form of documentation than what vendors have, even internally. Of course, the opposite is true as well, having seen too much vendor code for my own good, there is such a thing as a working but unreadable implementation. I fully agree with your point about feature completeness by the way, vendors will always find ways to hide pieces of the API if they really want to, but I think that would be true of documentation as well. 
In the DRM/KMS subsystem, the requirement is to provide an implementation in a mainstream graphics stack (depending on the device, that could be Mesa, Xorg, Weston, Android AOSP, ...) *and* get it approved by the maintainers of that stack. Requiring maintainer approval is the best way that was found to ensure a sufficient level of quality in those cases. -- Regards, Laurent Pinchart ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-11 0:20 ` Laurent Pinchart @ 2021-09-11 14:20 ` Steven Rostedt 2021-09-11 22:08 ` Laurent Pinchart ` (2 more replies) 0 siblings, 3 replies; 77+ messages in thread From: Steven Rostedt @ 2021-09-11 14:20 UTC (permalink / raw) To: Laurent Pinchart Cc: Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit On Sat, 11 Sep 2021 03:20:50 +0300 Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > > Freely accessible documentation is really the key. > > In principle I'd agree, but that assumes such documentation would exist > in the first place, with a sufficient level of quality. In many cases an > open implementation that exercises all device features is a better form > of documentation than what vendors have, even internally. Of course, the > opposite is true as well, having seen too much vendor code for my own > good, there is such a thing as a working but unreadable implementation. > > I fully agree with your point about feature completeness by the way, > vendors will always find ways to hide pieces of the API if they really > want to, but I think that would be true of documentation as well. I would like not only documentation, but also an open source test suite that simply tests the interface. Honestly, I believe that all new interfaces to the kernel (open or not) should have full documentation and a test suite interface before it gets accepted. We have tools/selftests that should be updated with all new interfaces into the kernel. Even if it's just a smoke test, that would be fine. Obviously if there's a driver without hardware, it can't be tested. But if you have that hardware, perhaps there could be a simple test suite of the interface to let you know it is still functional. -- Steve ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-11 14:20 ` Steven Rostedt @ 2021-09-11 22:08 ` Laurent Pinchart 2021-09-11 22:42 ` Steven Rostedt 2021-09-11 22:51 ` Mauro Carvalho Chehab 2021-09-11 23:22 ` Mauro Carvalho Chehab 2 siblings, 1 reply; 77+ messages in thread From: Laurent Pinchart @ 2021-09-11 22:08 UTC (permalink / raw) To: Steven Rostedt Cc: Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit Hi Steven, On Sat, Sep 11, 2021 at 10:20:07AM -0400, Steven Rostedt wrote: > On Sat, 11 Sep 2021 03:20:50 +0300 Laurent Pinchart wrote: > > > > Freely accessible documentation is really the key. > > > > In principle I'd agree, but that assumes such documentation would exist > > in the first place, with a sufficient level of quality. In many cases an > > open implementation that exercises all device features is a better form > > of documentation than what vendors have, even internally. Of course, the > > opposite is true as well, having seen too much vendor code for my own > > good, there is such a thing as a working but unreadable implementation. > > > > I fully agree with your point about feature completeness by the way, > > vendors will always find ways to hide pieces of the API if they really > > want to, but I think that would be true of documentation as well. > > I would like not only documentation, but also an open source test suite > that simply tests the interface. Honestly, I believe that all new > interfaces to the kernel (open or not) should have full documentation > and a test suite interface before it gets accepted. We have > tools/selftests that should be updated with all new interfaces into the > kernel. > > Even if it's just a smoke test, that would be fine. Obviously if > there's a driver without hardware, it can't be tested. But if you have > that hardware, perhaps there could be a simple test suite of the > interface to let you know it is still functional. 
It really depends on the device and the interface it requires. A GPU or camera ISP driver can't be meaningfully tested just at the interface level. The interface exposed to userspace is usually of the form of an ioctl that allows passing a large command buffer in a device-specific format, full of data that is then consumed by hardware or firmware. For instance, look at the ipu3_uapi_params structure in drivers/staging/media/ipu3/include/uapi/intel-ipu3.h. You need very elaborate code to exercise such an API. If you wanted GPU drivers to have tests in tools/selftests, you'd have to move Mesa to the kernel :-) -- Regards, Laurent Pinchart ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-11 22:08 ` Laurent Pinchart @ 2021-09-11 22:42 ` Steven Rostedt 2021-09-11 23:10 ` Laurent Pinchart 2021-09-13 11:10 ` Mark Brown 0 siblings, 2 replies; 77+ messages in thread From: Steven Rostedt @ 2021-09-11 22:42 UTC (permalink / raw) To: Laurent Pinchart Cc: Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit On Sun, 12 Sep 2021 01:08:55 +0300 Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > If you wanted GPU drivers to have tests in tools/selftests, you'd have > to move Mesa to the kernel :-) Some selftests have dependencies. It could require that Mesa is installed to run the tests, otherwise it just returns "unsupported". -- Steve ^ permalink raw reply [flat|nested] 77+ messages in thread
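As a rough illustration of the dependency check Steven describes, here is a minimal sketch of a test wrapper that reports "skipped" when a required userspace component is missing. It assumes the kselftest exit-code convention from tools/testing/selftests/kselftest.h (0 for pass, 1 for fail, 4 for skip). The function names, the tool name, and the smoke test body are hypothetical placeholders, not taken from any existing selftest; a real GPU test would more likely probe for a library than for a program on PATH.

```python
import shutil

# Exit codes following the kselftest convention
# (tools/testing/selftests/kselftest.h).
KSFT_PASS = 0
KSFT_FAIL = 1
KSFT_SKIP = 4


def run_with_dependency(tool, test_fn):
    """Run test_fn only if `tool` is found on PATH; otherwise report a skip."""
    if shutil.which(tool) is None:
        # Missing dependency: report "skipped" rather than a failure.
        print(f"SKIP: required tool '{tool}' not found")
        return KSFT_SKIP
    return KSFT_PASS if test_fn() else KSFT_FAIL


def smoke_test():
    # Hypothetical placeholder: a real test would exercise the driver's uAPI.
    return True


# "hypothetical-gpu-test-runner" is an invented name used for illustration only.
result = run_with_dependency("hypothetical-gpu-test-runner", smoke_test)
print(result)
```

A script built this way would pass the returned value to sys.exit(), so that the test runner can tell a missing dependency apart from a genuine failure.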
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-11 22:42 ` Steven Rostedt @ 2021-09-11 23:10 ` Laurent Pinchart 2021-09-13 11:10 ` Mark Brown 1 sibling, 0 replies; 77+ messages in thread From: Laurent Pinchart @ 2021-09-11 23:10 UTC (permalink / raw) To: Steven Rostedt Cc: Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit Hi Steven, On Sat, Sep 11, 2021 at 06:42:05PM -0400, Steven Rostedt wrote: > On Sun, 12 Sep 2021 01:08:55 +0300 Laurent Pinchart wrote: > > > If you wanted GPU drivers to have tests in tools/selftests, you'd have > > to move Mesa to the kernel :-) > > Some selftests have dependencies. It could require that Mesa is > installed to run the tests, otherwise it just returns "unsupported". Obviously, I should have considered that. Projects such as Mesa or libcamera have extensive test suites for the supported devices. Is that something you'd like to integrate with selftests ? I'm not really sure how that should be done. -- Regards, Laurent Pinchart ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-11 22:42 ` Steven Rostedt 2021-09-11 23:10 ` Laurent Pinchart @ 2021-09-13 11:10 ` Mark Brown 1 sibling, 0 replies; 77+ messages in thread From: Mark Brown @ 2021-09-13 11:10 UTC (permalink / raw) To: Steven Rostedt Cc: Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit On Sat, Sep 11, 2021 at 06:42:05PM -0400, Steven Rostedt wrote: > Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > > If you wanted GPU drivers to have tests in tools/selftests, you'd have > > to move Mesa to the kernel :-) > Some selftests have dependencies. It could require that Mesa is > installed to run the tests, otherwise it just returns "unsupported". There are some constraints on selftests for usability reasons; adding too many dependencies, or too exotic a set of dependencies, works against that. We already disable the BPF tests by default because it is not reasonable to expect people who are not actively working on BPF to get the dependencies needed for the testsuite up and running, and it was causing disruption to people trying to actually use kselftest for actual testing. Of course there's a balancing act here with having the tests picked up and used by people, but the kernel is such a big piece of software that it seems reasonable to expect that we're not going to end up with everything in one place, and if it's not solving a practical problem for the people actively working with the tests it really doesn't seem like a good use of the limited time people have to work on quality stuff. ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-11 14:20 ` Steven Rostedt 2021-09-11 22:08 ` Laurent Pinchart @ 2021-09-11 22:51 ` Mauro Carvalho Chehab 2021-09-11 23:22 ` Mauro Carvalho Chehab 2 siblings, 0 replies; 77+ messages in thread From: Mauro Carvalho Chehab @ 2021-09-11 22:51 UTC (permalink / raw) To: Steven Rostedt Cc: Laurent Pinchart, Thomas Gleixner, Josh Triplett, Jonathan Corbet, ksummit On Sat, 11 Sep 2021 10:20:07 -0400 Steven Rostedt <rostedt@goodmis.org> wrote: > On Sat, 11 Sep 2021 03:20:50 +0300 > Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > > > > Freely accessible documentation is really the key. > > > > In principle I'd agree, but that assumes such documentation would exist > > in the first place, with a sufficient level of quality. In many cases an > > open implementation that exercises all device features is a better form > > of documentation than what vendors have, even internally. Of course, the > > opposite is true as well, having seen too much vendor code for my own > > good, there is such a thing as a working but unreadable implementation. > > > > I fully agree with your point about feature completeness by the way, > > vendors will always find ways to hide pieces of the API if they really > > want to, but I think that would be true of documentation as well. > > I would like not only documentation, but also an open source test suite > that simply tests the interface. Honestly, I believe that all new > interfaces to the kernel (open or not) should have full documentation > and a test suite interface before it gets accepted. Fully agreed. > We have > tools/selftests that should be updated with all new interfaces into the > kernel. > > Even if it's just a smoke test, that would be fine. Obviously if > there's a driver without hardware, it can't be tested. 
But if you have > that hardware, perhaps there could be a simple test suite of the > interface to let you know it is still functional. These days, if a vendor is adding support for hardware that requires a new API, it usually means that the hardware is new and still under development. Only that vendor may have the hardware. A smoke test would mean next to nothing to those reviewing the patches, unless the vendor also ships the hardware to the reviewers. Thanks, Mauro ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-11 14:20 ` Steven Rostedt 2021-09-11 22:08 ` Laurent Pinchart 2021-09-11 22:51 ` Mauro Carvalho Chehab @ 2021-09-11 23:22 ` Mauro Carvalho Chehab 2 siblings, 0 replies; 77+ messages in thread From: Mauro Carvalho Chehab @ 2021-09-11 23:22 UTC (permalink / raw) To: Steven Rostedt Cc: Laurent Pinchart, Thomas Gleixner, Josh Triplett, Jonathan Corbet, ksummit On Sat, 11 Sep 2021 10:20:07 -0400 Steven Rostedt <rostedt@goodmis.org> wrote: > I would like not only documentation, but also an open source test suite > that simply tests the interface. Honestly, I believe that all new > interfaces to the kernel (open or not) should have full documentation > and a test suite interface before it gets accepted Btw, I've been working on an improvement for scripts/get_abi.pl, in order to allow it to check for missing API documentation: https://lore.kernel.org/lkml/cover.1631112725.git.mchehab+huawei@kernel.org/ It basically reads everything under /sys and under Documentation/ABI, and reports anything found in sysfs that has no documentation. It can optionally search for a specific string (actually, a regex):
	$ ./scripts/get_abi.pl undefined --search-string devices.*cpulistaffinity
	/sys/devices/pci0000:00/0000:00:01.0/pci_bus/0000:01/cpulistaffinity not found.
	/sys/devices/pci0000:00/0000:00:01.1/pci_bus/0000:02/cpulistaffinity not found.
	/sys/devices/pci0000:00/0000:00:01.2/pci_bus/0000:03/cpulistaffinity not found.
	/sys/devices/pci0000:00/0000:00:1c.0/pci_bus/0000:04/cpulistaffinity not found.
	/sys/devices/pci0000:00/0000:00:1c.1/pci_bus/0000:05/cpulistaffinity not found.
	/sys/devices/pci0000:00/0000:00:1c.2/pci_bus/0000:06/cpulistaffinity not found.
	/sys/devices/pci0000:00/0000:00:1c.4/pci_bus/0000:07/cpulistaffinity not found.
	/sys/devices/pci0000:00/0000:00:1d.0/pci_bus/0000:72/cpulistaffinity not found.
	/sys/devices/pci0000:00/0000:00:1d.4/pci_bus/0000:73/cpulistaffinity not found.
	/sys/devices/pci0000:00/pci_bus/0000:00/cpulistaffinity not found.
While it won't check the quality of the ABI, it would at least let someone double-check whether a driver is exposing something undocumented via sysfs. If someone wants to test, the newest version is at: https://github.com/mchehab/linux/commits/get_undefined Thanks, Mauro ^ permalink raw reply [flat|nested] 77+ messages in thread
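The idea behind the check Mauro describes (compare what sysfs exposes against the `What:` entries under Documentation/ABI) can be sketched roughly as below. This is not how scripts/get_abi.pl is implemented; it is a deliberately simplified illustration. The helper names are hypothetical, and the rule that a `<placeholder>` in a `What:` path matches exactly one path component is an assumption made for the example.

```python
import re


def what_to_regex(what):
    """Turn a 'What:' path into a regex (assumption: each <placeholder>
    stands for exactly one path component)."""
    parts = re.split(r"<[^>]+>", what)
    return re.compile("[^/]+".join(re.escape(p) for p in parts))


def parse_what_entries(abi_text):
    """Collect the paths named on 'What:' lines of an ABI description."""
    return [m.group(1).strip()
            for m in re.finditer(r"^What:[ \t]*(\S.*)$", abi_text, re.MULTILINE)]


def undocumented(sysfs_paths, abi_text, search=None):
    """Return the sysfs paths that match no documented 'What:' entry.

    `search` is an optional regex used to narrow the paths that are
    checked, loosely mirroring the --search-string option shown above.
    """
    documented = [what_to_regex(w) for w in parse_what_entries(abi_text)]
    wanted = re.compile(search) if search else None
    missing = []
    for path in sysfs_paths:
        if wanted and not wanted.search(path):
            continue
        if not any(rx.fullmatch(path) for rx in documented):
            missing.append(path)
    return missing
```

In real use the path list could come from walking /sys (for instance with `os.walk`) and the ABI text from the files under Documentation/ABI; both are left to the caller here so the matching logic stays easy to test.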
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-10 23:55 ` Thomas Gleixner 2021-09-11 0:20 ` Laurent Pinchart @ 2021-09-11 10:31 ` Leon Romanovsky 2021-09-11 11:41 ` Laurent Pinchart 1 sibling, 1 reply; 77+ messages in thread From: Leon Romanovsky @ 2021-09-11 10:31 UTC (permalink / raw) To: Thomas Gleixner Cc: Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit On Sat, Sep 11, 2021 at 01:55:16AM +0200, Thomas Gleixner wrote: > On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote: > > > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote: > >> On media, enforcing userspace to always be open source would > >> have been very bad, as it would prevent several videoconferencing > >> software to exist on Linux. > > > > I don't think we should enforce that all userspace users of an interface > > be Open Source. I do think we should enforce that *some* userspace user > > of an interface be Open Source before we add the interface. > > The real question is whether the interface is documented in a way that > an Open Source implementation is possible. It does not matter whether it > exists at that point in time or not. Even if it exists there is no > guarantee that it is feature complete. > > Freely accessible documentation is really the key. I have a more radical view than you and think that documentation is far from enough. I would like to see any userspace API used (or about to be used) in a package that exists in Debian/Fedora/SuSE. Only this will give us some confidence that the API and the device are usable to some level. As a side note, we will be able to estimate the impact of a possible API deprecation/fix/extension with a simple search in package databases. IMHO, github projects that merely demonstrate API usage are the worst possible basis for accepting a new userspace API. Thanks > > Thanks, > > tglx > > > ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-11 10:31 ` Leon Romanovsky @ 2021-09-11 11:41 ` Laurent Pinchart 2021-09-11 12:04 ` Leon Romanovsky 0 siblings, 1 reply; 77+ messages in thread From: Laurent Pinchart @ 2021-09-11 11:41 UTC (permalink / raw) To: Leon Romanovsky Cc: Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit On Sat, Sep 11, 2021 at 01:31:02PM +0300, Leon Romanovsky wrote: > On Sat, Sep 11, 2021 at 01:55:16AM +0200, Thomas Gleixner wrote: > > On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote: > > > > > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote: > > >> On media, enforcing userspace to always be open source would > > >> have been very bad, as it would prevent several videoconferencing > > >> software to exist on Linux. > > > > > > I don't think we should enforce that all userspace users of an interface > > > be Open Source. I do think we should enforce that *some* userspace user > > > of an interface be Open Source before we add the interface. > > > > The real question is whether the interface is documented in a way that > > an Open Source implementation is possible. It does not matter whether it > > exists at that point in time or not. Even if it exists there is no > > guarantee that it is feature complete. > > > > Freely accessible documentation is really the key. > > I have more radical view than you and think that documentation is far > from being enough. I would like to see any userspace API used (or to be > used) in any package which exists in Debian/Fedora/SuSE. We probably need to add Android AOSP to that list, as we have Android-specific APIs (not that I believe we *should* have Android-specific APIs; there have been lots of efforts over the past years to develop standard APIs for use cases that stem from Android, slowly replacing Android-specific APIs in some areas, but I don't believe we can realistically bridge that gap completely overnight, if ever). 
> Only this will give us some sort of confidence that API and device are usable > to some level. As a side note, we will be able to estimate possible API > deprecation/fix/extension based on simple search in package databases. Linux supports devices from very diverse markets, from very tiny embedded devices to supercomputers. We have drivers for devices that exist only in the data centres of a single company, or for which only a handful of units exist throughout the world. The set of rules that we'll decide on, if any, should take this into account. > IMHO, github projects to show API usage are the worst possible way to > allow acceptance for new userspace API. -- Regards, Laurent Pinchart ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-11 11:41 ` Laurent Pinchart @ 2021-09-11 12:04 ` Leon Romanovsky 2021-09-11 22:04 ` Laurent Pinchart 0 siblings, 1 reply; 77+ messages in thread From: Leon Romanovsky @ 2021-09-11 12:04 UTC (permalink / raw) To: Laurent Pinchart Cc: Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit On Sat, Sep 11, 2021 at 02:41:52PM +0300, Laurent Pinchart wrote: > On Sat, Sep 11, 2021 at 01:31:02PM +0300, Leon Romanovsky wrote: > > On Sat, Sep 11, 2021 at 01:55:16AM +0200, Thomas Gleixner wrote: > > > On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote: > > > > > > > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote: > > > >> On media, enforcing userspace to always be open source would > > > >> have been very bad, as it would prevent several videoconferencing > > > >> software to exist on Linux. > > > > > > > > I don't think we should enforce that all userspace users of an interface > > > > be Open Source. I do think we should enforce that *some* userspace user > > > > of an interface be Open Source before we add the interface. > > > > > > The real question is whether the interface is documented in a way that > > > an Open Source implementation is possible. It does not matter whether it > > > exists at that point in time or not. Even if it exists there is no > > > guarantee that it is feature complete. > > > > > > Freely accessible documentation is really the key. > > > > I have more radical view than you and think that documentation is far > > from being enough. I would like to see any userspace API used (or to be > > used) in any package which exists in Debiam/Fedora/SuSE. 
> > We probably need to add Android AOSP to that list, as we have > Android-specific APIs (not that I believe we *should* have > Android-specific APIs, there's been lots of efforts over the past years > to develop standard APIs for use cases that stem from Android, slowly > replacing Android-specific APIs in some area, but I don't believe we can > realistically bridge that gap completely overnight, if ever). Maybe. > > > Only this will give us some sort of confidence that API and device are usable > > to some level. As a side note, we will be able to estimate possible API > > deprecation/fix/extension based on simple search in package databases. > > Linux supports devices from very diverse markets, from very tiny > embedded devices to supercomputers. We have drivers for devices that > exist in data centres of a single company only, or for which only a > handful of units exist through the world. The set of rules that we'll > decide on, if any, should take this into account. I'm part of the group (RDMA) that cares about enterprise, cloud and supercomputers. :) So for us, working out of the box (distro packages and not github code drops) is the key to scalability. Regarding "embedded devices", let me remind you that we are talking about userspace APIs; most likely busybox will be used on them, and it is part of the larger distros anyway, so it falls under the category "exists in Debian/Fedora/SuSE". > > > IMHO, github projects to show API usage are the worst possible way to > > allow acceptance for new userspace API. > > -- > Regards, > > Laurent Pinchart ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-11 12:04 ` Leon Romanovsky @ 2021-09-11 22:04 ` Laurent Pinchart 2021-09-12 4:27 ` Leon Romanovsky 0 siblings, 1 reply; 77+ messages in thread From: Laurent Pinchart @ 2021-09-11 22:04 UTC (permalink / raw) To: Leon Romanovsky Cc: Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit Hi Leon, On Sat, Sep 11, 2021 at 03:04:07PM +0300, Leon Romanovsky wrote: > On Sat, Sep 11, 2021 at 02:41:52PM +0300, Laurent Pinchart wrote: > > On Sat, Sep 11, 2021 at 01:31:02PM +0300, Leon Romanovsky wrote: > > > On Sat, Sep 11, 2021 at 01:55:16AM +0200, Thomas Gleixner wrote: > > > > On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote: > > > > > > > > > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote: > > > > >> On media, enforcing userspace to always be open source would > > > > >> have been very bad, as it would prevent several videoconferencing > > > > >> software to exist on Linux. > > > > > > > > > > I don't think we should enforce that all userspace users of an interface > > > > > be Open Source. I do think we should enforce that *some* userspace user > > > > > of an interface be Open Source before we add the interface. > > > > > > > > The real question is whether the interface is documented in a way that > > > > an Open Source implementation is possible. It does not matter whether it > > > > exists at that point in time or not. Even if it exists there is no > > > > guarantee that it is feature complete. > > > > > > > > Freely accessible documentation is really the key. > > > > > > I have more radical view than you and think that documentation is far > > > from being enough. I would like to see any userspace API used (or to be > > > used) in any package which exists in Debiam/Fedora/SuSE. 
> > > > We probably need to add Android AOSP to that list, as we have > > Android-specific APIs (not that I believe we *should* have > > Android-specific APIs, there's been lots of efforts over the past years > > to develop standard APIs for use cases that stem from Android, slowly > > replacing Android-specific APIs in some area, but I don't believe we can > > realistically bridge that gap completely overnight, if ever). > > Maybe. > > > > Only this will give us some sort of confidence that API and device are usable > > > to some level. As a side note, we will be able to estimate possible API > > > deprecation/fix/extension based on simple search in package databases. > > > > Linux supports devices from very diverse markets, from very tiny > > embedded devices to supercomputers. We have drivers for devices that > > exist in data centres of a single company only, or for which only a > > handful of units exist through the world. The set of rules that we'll > > decide on, if any, should take this into account. > > I'm part of that group (RDMA) who cares about enterprise, cloud and supercomputers. :) > So for us, working out-of-the box (distro packages and not github code drops) is > the key to the scalability. What if we're dealing with a device that only exists in a handful of machines though? Would distributions accept the burden of packaging the corresponding userspace code, and maintaining the packages, when only a handful of people in the world will use it? It's a genuine question. > Regarding "embedded devices", I remind that we are talking about > userspace API and most likely busybox will be used for them, which is > also part of larger distro anyway, so fails under category "exists in > Debian/Fedora/SuSE". We're talking about APIs exposed by drivers, for devices such as GPUs, cameras or AI/ML accelerators. 
I don't think busybox will exercise those :-) We have Mesa for GPUs, libcamera for cameras, and other frameworks I'm less familiar with for AI/ML accelerators, and I expect those to be packaged by distributions. There are however other kinds of devices that don't fall into existing well-defined categories. I'm thinking, for instance, about dewarp engines that are used to create 3D surround views for cars. In a nutshell, those devices take a set of textures and a list of triangles, and perform texture mapping. They're a bit like GPUs but without 3D, so APIs such as OpenGL or Vulkan don't apply. There's no standard API for such devices, and no existing userspace framework similar to Mesa in which a vendor could upstream the open userspace driver code. I believe that requiring an open userspace to merge such drivers in the kernel would make sense, but I also don't think it would be reasonable to ask the first vendor who wants to do so to create a complete userspace framework with a standard API. The bar to entry would be too high. An open implementation specific to that device, with a custom application API, would be a good first step, and it could serve as a basis to create a framework once a second vendor wants to do the same. We have to set the end goal, but also consider how it can be reached. > > > IMHO, github projects to show API usage are the worst possible way to > > > allow acceptance for new userspace API. -- Regards, Laurent Pinchart ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-11 22:04 ` Laurent Pinchart @ 2021-09-12 4:27 ` Leon Romanovsky 2021-09-12 7:26 ` Greg KH 2021-09-12 7:46 ` Mauro Carvalho Chehab 0 siblings, 2 replies; 77+ messages in thread From: Leon Romanovsky @ 2021-09-12 4:27 UTC (permalink / raw) To: Laurent Pinchart Cc: Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit On Sun, Sep 12, 2021 at 01:04:01AM +0300, Laurent Pinchart wrote: > Hi Leon, > > On Sat, Sep 11, 2021 at 03:04:07PM +0300, Leon Romanovsky wrote: > > On Sat, Sep 11, 2021 at 02:41:52PM +0300, Laurent Pinchart wrote: > > > On Sat, Sep 11, 2021 at 01:31:02PM +0300, Leon Romanovsky wrote: > > > > On Sat, Sep 11, 2021 at 01:55:16AM +0200, Thomas Gleixner wrote: > > > > > On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote: > > > > > > > > > > > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote: > > > > > >> On media, enforcing userspace to always be open source would > > > > > >> have been very bad, as it would prevent several videoconferencing > > > > > >> software to exist on Linux. > > > > > > > > > > > > I don't think we should enforce that all userspace users of an interface > > > > > > be Open Source. I do think we should enforce that *some* userspace user > > > > > > of an interface be Open Source before we add the interface. > > > > > > > > > > The real question is whether the interface is documented in a way that > > > > > an Open Source implementation is possible. It does not matter whether it > > > > > exists at that point in time or not. Even if it exists there is no > > > > > guarantee that it is feature complete. > > > > > > > > > > Freely accessible documentation is really the key. > > > > > > > > I have more radical view than you and think that documentation is far > > > > from being enough. I would like to see any userspace API used (or to be > > > > used) in any package which exists in Debiam/Fedora/SuSE. 
> > > > > > We probably need to add Android AOSP to that list, as we have > > > Android-specific APIs (not that I believe we *should* have > > > Android-specific APIs, there's been lots of efforts over the past years > > > to develop standard APIs for use cases that stem from Android, slowly > > > replacing Android-specific APIs in some area, but I don't believe we can > > > realistically bridge that gap completely overnight, if ever). > > > > Maybe. > > > > > > Only this will give us some sort of confidence that API and device are usable > > > > to some level. As a side note, we will be able to estimate possible API > > > > deprecation/fix/extension based on simple search in package databases. > > > > > > Linux supports devices from very diverse markets, from very tiny > > > embedded devices to supercomputers. We have drivers for devices that > > > exist in data centres of a single company only, or for which only a > > > handful of units exist through the world. The set of rules that we'll > > > decide on, if any, should take this into account. > > > > I'm part of that group (RDMA) who cares about enterprise, cloud and supercomputers. :) > > So for us, working out-of-the box (distro packages and not github code drops) is > > the key to the scalability. > > What if we're dealing with a device that only exists in a handful of > machines though ? Would distributions accept the burden of packaging > corresponding userspace code, and maintaining the packages, when only a > handful of people in the world will use it ? It's a genuine question. Fedora, Debian and OpenSuSE are volunteer-based distributions; they accept new packages, which such vendors need to prepare (or ask someone to prepare). There is no "accept the burden of packaging corresponding userspace code, and maintaining the packages": that burden is on the package maintainer, who may or may not be associated with the distribution. 
> > > Regarding "embedded devices", I remind that we are talking about > > userspace API and most likely busybox will be used for them, which is > > also part of larger distro anyway, so fails under category "exists in > > Debian/Fedora/SuSE". > > We're talking about APIs exposed by drivers, for devices such as GPUs, > cameras or AI/ML accelerators. I don't think busybox will exercise those > :-) We have Mesa for GPUs, libcamera for cameras, and other frameworks > I'm less familiar with for AI/ML accelerators, and I expect those to be > packaged by distributions. There are however other kind of devices that > don't fall in existing well-defined categories. I'm a little bit confused here. IMHO, you are trying to find a universal solution to a problem that doesn't exist. Above you asked how to deal with niche devices. Here you talk about mass-market devices for the enterprise, while before you mentioned "embedded devices".
1. Niche devices - continue as they do now, by supplying out-of-tree solutions for their customers. Such devices and companies rarely need upstream linux kernel support, because the burden of upstreaming is very high. We don't want them in the tree either, because once they upstream it, the maintenance burden will be on us.
2. Devices that reach a certain level of adoption - need to be integrated into a certain userspace stack, which needs to be part of a distro. And AI/ML is no different here; someone just needs to start building such a stack. Otherwise, we will continue to see more free riders like HabanaLabs, which don't bring any real benefit to the community.
> > I'm thinking, for instance, about dewarp engines that are used to create > 3D surround view for cars. In a nutshell, those devices take a set of > texture and a list of triangles, and perform texture mapping. They're a > bit like GPUs but without 3D, so APIs such as OpenGL or Vulkan don't > apply. 
There's no standard API for such devices, and no existing > userspace framework similar to Mesa in which a vendor could upstream the > open userspace driver code. I believe that requiring an open userspace > to merge such drivers in the kernel would make sense, but I also don't > think it would be reasonable to ask the first vendor who wants to do so > to create a complete userspace framework with a standard API. The bar to > entry would be too high. An open implementation specific to that device, > with a custom application API, would be a good first step, and it could > serve as a basis to create a framework once a second vendor wants to do > the same. We have to set the end goal, but also consider how it can be > reached. The bar needs to be high from the beginning. We can lower it later if it doesn't work. Thanks > > > > > IMHO, github projects to show API usage are the worst possible way to > > > > allow acceptance for new userspace API. > > -- > Regards, > > Laurent Pinchart ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-12 4:27 ` Leon Romanovsky @ 2021-09-12 7:26 ` Greg KH 2021-09-12 8:29 ` Leon Romanovsky 2021-09-12 19:52 ` Dave Airlie 2021-09-12 7:46 ` Mauro Carvalho Chehab 1 sibling, 2 replies; 77+ messages in thread From: Greg KH @ 2021-09-12 7:26 UTC (permalink / raw) To: Leon Romanovsky Cc: Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit On Sun, Sep 12, 2021 at 07:27:55AM +0300, Leon Romanovsky wrote: > On Sun, Sep 12, 2021 at 01:04:01AM +0300, Laurent Pinchart wrote: > > Hi Leon, > > > > On Sat, Sep 11, 2021 at 03:04:07PM +0300, Leon Romanovsky wrote: > > > On Sat, Sep 11, 2021 at 02:41:52PM +0300, Laurent Pinchart wrote: > > > > On Sat, Sep 11, 2021 at 01:31:02PM +0300, Leon Romanovsky wrote: > > > > > On Sat, Sep 11, 2021 at 01:55:16AM +0200, Thomas Gleixner wrote: > > > > > > On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote: > > > > > > > > > > > > > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote: > > > > > > >> On media, enforcing userspace to always be open source would > > > > > > >> have been very bad, as it would prevent several videoconferencing > > > > > > >> software to exist on Linux. > > > > > > > > > > > > > > I don't think we should enforce that all userspace users of an interface > > > > > > > be Open Source. I do think we should enforce that *some* userspace user > > > > > > > of an interface be Open Source before we add the interface. > > > > > > > > > > > > The real question is whether the interface is documented in a way that > > > > > > an Open Source implementation is possible. It does not matter whether it > > > > > > exists at that point in time or not. Even if it exists there is no > > > > > > guarantee that it is feature complete. > > > > > > > > > > > > Freely accessible documentation is really the key. 
> > > > > > > > > > I have more radical view than you and think that documentation is far > > > > > from being enough. I would like to see any userspace API used (or to be > > > > > used) in any package which exists in Debiam/Fedora/SuSE. > > > > > > > > We probably need to add Android AOSP to that list, as we have > > > > Android-specific APIs (not that I believe we *should* have > > > > Android-specific APIs, there's been lots of efforts over the past years > > > > to develop standard APIs for use cases that stem from Android, slowly > > > > replacing Android-specific APIs in some area, but I don't believe we can > > > > realisticly bridge that gap completely overnight, if ever). > > > > > > Maybe. > > > > > > > > Only this will give us some sort of confidence that API and device are usable > > > > > to some level. As a side note, we will be able to estimate possible API > > > > > deprecation/fix/extension based on simple search in package databases. > > > > > > > > Linux supports devices from very diverse markets, from very tiny > > > > embedded devices to supercomputers. We have drivers for devices that > > > > exist in data centres of a single company only, or for which only a > > > > handful of units exist through the world. The set of rules that we'll > > > > decide on, if any, should take this into account. > > > > > > I'm part of that group (RDMA) who cares about enterprise, cloud and supercomputers. :) > > > So for us, working out-of-the box (distro packages and not github code drops) is > > > the key to the scalability. > > > > What if we're dealing with a device that only exists in a handful of > > machines though ? Would distributions accept the burden of packaging > > corresponding userspace code, and maintaining the packages, when only a > > handful of people in the world will use it ? It's a genuine question. 
> > Fedora, Debian and OpenSuSE are volunteer based distributions, they > accept new packages, which need to be prepared (or asked to be > prepared) by such vendors. > > There is no "accept the burden of packaging corresponding userspace code, > and maintaining the packages", it is on package maintainer who can or > can't be associated with distribution. > > > > > > Regarding "embedded devices", I remind that we are talking about > > > userspace API and most likely busybox will be used for them, which is > > > also part of larger distro anyway, so fails under category "exists in > > > Debian/Fedora/SuSE". > > > > We're talking about APIs exposed by drivers, for devices such as GPUs, > > cameras or AI/ML accelerators. I don't think busybox will exercise those > > :-) We have Masa for GPUs, libcamera for cameras, and other frameworks > > I'm less familiar with for AI/ML accelerators, and I expect those to be > > packaged by distributions. There are however other kind of devices that > > don't fall in existing well-defined categories. > > I'm a little bit confused here. IMHO, you are trying to find an universal > solution for a problem that doesn't exist. > > Above you asked how to deal with niche devices? Here you talk about mass > products devices for the enterprise while before you mentioned "embedded > devices". > > 1. Niche devices - continue to do as they do it now, by supplying > out-of-tree solutions for their customers. Such devices and companies > rarely need upstream linux kernel support, because the burden to > upstream it is very high. We don't want them in the tree either, because > once they upstream it, the maintenance burden will be on us. {sigh} No, that is NOT our rule at all. These devices and companies need to be upstream more than anything else as that way they become part of our community and are responsible for maintaining their code in the tree. 
To force them to remain outside is to go against everything that many of us have been saying for _decades_ now. And how are you going to judge what is, and is not, a "niche" device? > 2. Devices that hits the certain level of adoption - need to be > integrated into certain userspace stack, which needs to be part of > distro. Distros are a very odd thing to base a rule on, given that in raw numbers they account for by far the minority of Linux usage in the world. > And AI/ML is no different here, someone just need to start build such > stack. Otherwise, we will continue to see more free riders like HabanaLabs > which don't have any real benefit to the community. Everyone contributes to Linux in a selfish manner; that's just how the community works. The work that companies like habanalabs do is NOT "free riding" at all: they have worked with us and done the hard work of actually getting their code merged into the tree and their userspace code released under an open source license (unlike _ALL_ other AI/ML companies, including Intel). It would have been much cheaper and quicker for them to just ignore upstream entirely, but that would have meant that the community would not have any idea of what exactly these use-case models were, nor what problems they were trying to get Linux to solve. Linux benefits overall by having everyone participate; do NOT make arbitrary rules to somehow prevent one company/group from being allowed to upstream their code vs. another. That is NOT how we have worked in the past, and would only cause us to slowly die and become irrelevant. thanks, greg k-h ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-12 7:26 ` Greg KH @ 2021-09-12 8:29 ` Leon Romanovsky 2021-09-12 13:25 ` Greg KH 2021-09-12 19:52 ` Dave Airlie 1 sibling, 1 reply; 77+ messages in thread From: Leon Romanovsky @ 2021-09-12 8:29 UTC (permalink / raw) To: Greg KH Cc: Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit On Sun, Sep 12, 2021 at 09:26:57AM +0200, Greg KH wrote: > On Sun, Sep 12, 2021 at 07:27:55AM +0300, Leon Romanovsky wrote: > > On Sun, Sep 12, 2021 at 01:04:01AM +0300, Laurent Pinchart wrote: > > > Hi Leon, > > > > > > On Sat, Sep 11, 2021 at 03:04:07PM +0300, Leon Romanovsky wrote: > > > > On Sat, Sep 11, 2021 at 02:41:52PM +0300, Laurent Pinchart wrote: > > > > > On Sat, Sep 11, 2021 at 01:31:02PM +0300, Leon Romanovsky wrote: > > > > > > On Sat, Sep 11, 2021 at 01:55:16AM +0200, Thomas Gleixner wrote: > > > > > > > On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote: > > > > > > > > > > > > > > > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote: > > > > > > > >> On media, enforcing userspace to always be open source would > > > > > > > >> have been very bad, as it would prevent several videoconferencing > > > > > > > >> software to exist on Linux. > > > > > > > > > > > > > > > > I don't think we should enforce that all userspace users of an interface > > > > > > > > be Open Source. I do think we should enforce that *some* userspace user > > > > > > > > of an interface be Open Source before we add the interface. > > > > > > > > > > > > > > The real question is whether the interface is documented in a way that > > > > > > > an Open Source implementation is possible. It does not matter whether it > > > > > > > exists at that point in time or not. Even if it exists there is no > > > > > > > guarantee that it is feature complete. > > > > > > > > > > > > > > Freely accessible documentation is really the key. 
> > > > > > > > > > > > I have more radical view than you and think that documentation is far > > > > > > from being enough. I would like to see any userspace API used (or to be > > > > > > used) in any package which exists in Debiam/Fedora/SuSE. > > > > > > > > > > We probably need to add Android AOSP to that list, as we have > > > > > Android-specific APIs (not that I believe we *should* have > > > > > Android-specific APIs, there's been lots of efforts over the past years > > > > > to develop standard APIs for use cases that stem from Android, slowly > > > > > replacing Android-specific APIs in some area, but I don't believe we can > > > > > realisticly bridge that gap completely overnight, if ever). > > > > > > > > Maybe. > > > > > > > > > > Only this will give us some sort of confidence that API and device are usable > > > > > > to some level. As a side note, we will be able to estimate possible API > > > > > > deprecation/fix/extension based on simple search in package databases. > > > > > > > > > > Linux supports devices from very diverse markets, from very tiny > > > > > embedded devices to supercomputers. We have drivers for devices that > > > > > exist in data centres of a single company only, or for which only a > > > > > handful of units exist through the world. The set of rules that we'll > > > > > decide on, if any, should take this into account. > > > > > > > > I'm part of that group (RDMA) who cares about enterprise, cloud and supercomputers. :) > > > > So for us, working out-of-the box (distro packages and not github code drops) is > > > > the key to the scalability. > > > > > > What if we're dealing with a device that only exists in a handful of > > > machines though ? Would distributions accept the burden of packaging > > > corresponding userspace code, and maintaining the packages, when only a > > > handful of people in the world will use it ? It's a genuine question. 
> > > > Fedora, Debian and OpenSuSE are volunteer based distributions, they > > accept new packages, which need to be prepared (or asked to be > > prepared) by such vendors. > > > > There is no "accept the burden of packaging corresponding userspace code, > > and maintaining the packages", it is on package maintainer who can or > > can't be associated with distribution. > > > > > > > > > Regarding "embedded devices", I remind that we are talking about > > > > userspace API and most likely busybox will be used for them, which is > > > > also part of larger distro anyway, so fails under category "exists in > > > > Debian/Fedora/SuSE". > > > > > > We're talking about APIs exposed by drivers, for devices such as GPUs, > > > cameras or AI/ML accelerators. I don't think busybox will exercise those > > > :-) We have Masa for GPUs, libcamera for cameras, and other frameworks > > > I'm less familiar with for AI/ML accelerators, and I expect those to be > > > packaged by distributions. There are however other kind of devices that > > > don't fall in existing well-defined categories. > > > > I'm a little bit confused here. IMHO, you are trying to find an universal > > solution for a problem that doesn't exist. > > > > Above you asked how to deal with niche devices? Here you talk about mass > > products devices for the enterprise while before you mentioned "embedded > > devices". > > > > 1. Niche devices - continue to do as they do it now, by supplying > > out-of-tree solutions for their customers. Such devices and companies > > rarely need upstream linux kernel support, because the burden to > > upstream it is very high. We don't want them in the tree either, because > > once they upstream it, the maintenance burden will be on us. > > {sigh} > > No, that is NOT our rule at all. > > These devices and companies need to be upstream more than anything else > as that way they become part of our community and are responsible for > maintaining their code in the tree. 
To force them to remain outside is > to go against everything that many of us have been saying for _decades_ > now. > > And how are you going to judge what is, and is not, a "niche" device? I will leave that to the company to decide. Again, this is exactly how they operate now; there is nothing new here. Every company calculates the ROI of working with upstream, and small companies with niche devices are no different here. The main idea is that I want to see a working userspace stack, and being in a distro sets a certain quality level. Am I asking too much? > > > 2. Devices that hits the certain level of adoption - need to be > > integrated into certain userspace stack, which needs to be part of > > distro. > > Distros are a very odd rule to rely on given that they are by far the > minority of the usage in raw numbers for Linux in the world. You can count Android as another distro, it is just semantics. > > > And AI/ML is no different here, someone just need to start build such > > stack. Otherwise, we will continue to see more free riders like HabanaLabs > > which don't have any real benefit to the community. > > Everyone contributes to Linux in a selfish manner, that's just how the > community works. The work that companies like habanalabs is NOT being a > "free rider" at all, they have worked with us and done the hard work of > actually getting their code merged into the tree. I perfectly remember them trying to bypass the netdev and RDMA communities by pretending to be a "misc" device. https://lore.kernel.org/linux-rdma/20200915133556.21268811@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/ https://lore.kernel.org/linux-rdma/20200917171833.GJ8409@ziepe.ca/ Or DRM https://lore.kernel.org/linux-rdma/CAKMK7uFOfoxbD2Z5mb-qHFnUe5rObGKQ6Ygh--HSH9M=9bziGg@mail.gmail.com/ So I can agree with the statement "worked hard", but not with the relevant communities. > code released under an open source license (unlike _ALL_ other AI/ML > companies, including Intel).
Yes, they provided a user-space library but didn't release the compiler, so until recently it wasn't usable at all. > It would have been much cheaper and > quicker of them to just ignore upstream entirely, but that would have > meant that the community would not have any idea of what exactly these > use-case models were nor what the problems were that they were trying to > get Linux to do. The thing is that the community has been talking about an AI/ML stack for a long time, but as long as a backdoor to merge code exists, we won't have anything good for the end users. > > Linux benefits overall by having everyone participate, do NOT make > arbitrary rules to somehow prevent one company/group from being allowed > to upstream their code vs. another. That is NOT how we have worked in > the past, and would only cause us to slowly die and become irrelevant. We do have rules; for example, we require a user-space part for any API merged. Should we cancel that too, so that all groups and companies will be able to contribute? Thanks > > thanks, > > greg k-h
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-12 8:29 ` Leon Romanovsky @ 2021-09-12 13:25 ` Greg KH 2021-09-12 14:15 ` Leon Romanovsky 2021-09-12 15:55 ` Laurent Pinchart 0 siblings, 2 replies; 77+ messages in thread From: Greg KH @ 2021-09-12 13:25 UTC (permalink / raw) To: Leon Romanovsky Cc: Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit On Sun, Sep 12, 2021 at 11:29:45AM +0300, Leon Romanovsky wrote: > On Sun, Sep 12, 2021 at 09:26:57AM +0200, Greg KH wrote: > > On Sun, Sep 12, 2021 at 07:27:55AM +0300, Leon Romanovsky wrote: > > > On Sun, Sep 12, 2021 at 01:04:01AM +0300, Laurent Pinchart wrote: > > > > Hi Leon, > > > > > > > > On Sat, Sep 11, 2021 at 03:04:07PM +0300, Leon Romanovsky wrote: > > > > > On Sat, Sep 11, 2021 at 02:41:52PM +0300, Laurent Pinchart wrote: > > > > > > On Sat, Sep 11, 2021 at 01:31:02PM +0300, Leon Romanovsky wrote: > > > > > > > On Sat, Sep 11, 2021 at 01:55:16AM +0200, Thomas Gleixner wrote: > > > > > > > > On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote: > > > > > > > > > > > > > > > > > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote: > > > > > > > > >> On media, enforcing userspace to always be open source would > > > > > > > > >> have been very bad, as it would prevent several videoconferencing > > > > > > > > >> software to exist on Linux. > > > > > > > > > > > > > > > > > > I don't think we should enforce that all userspace users of an interface > > > > > > > > > be Open Source. I do think we should enforce that *some* userspace user > > > > > > > > > of an interface be Open Source before we add the interface. > > > > > > > > > > > > > > > > The real question is whether the interface is documented in a way that > > > > > > > > an Open Source implementation is possible. It does not matter whether it > > > > > > > > exists at that point in time or not. 
Even if it exists there is no > > > > > > > > guarantee that it is feature complete. > > > > > > > > > > > > > > > > Freely accessible documentation is really the key. > > > > > > > > > > > > > > I have more radical view than you and think that documentation is far > > > > > > > from being enough. I would like to see any userspace API used (or to be > > > > > > > used) in any package which exists in Debiam/Fedora/SuSE. > > > > > > > > > > > > We probably need to add Android AOSP to that list, as we have > > > > > > Android-specific APIs (not that I believe we *should* have > > > > > > Android-specific APIs, there's been lots of efforts over the past years > > > > > > to develop standard APIs for use cases that stem from Android, slowly > > > > > > replacing Android-specific APIs in some area, but I don't believe we can > > > > > > realisticly bridge that gap completely overnight, if ever). > > > > > > > > > > Maybe. > > > > > > > > > > > > Only this will give us some sort of confidence that API and device are usable > > > > > > > to some level. As a side note, we will be able to estimate possible API > > > > > > > deprecation/fix/extension based on simple search in package databases. > > > > > > > > > > > > Linux supports devices from very diverse markets, from very tiny > > > > > > embedded devices to supercomputers. We have drivers for devices that > > > > > > exist in data centres of a single company only, or for which only a > > > > > > handful of units exist through the world. The set of rules that we'll > > > > > > decide on, if any, should take this into account. > > > > > > > > > > I'm part of that group (RDMA) who cares about enterprise, cloud and supercomputers. :) > > > > > So for us, working out-of-the box (distro packages and not github code drops) is > > > > > the key to the scalability. > > > > > > > > What if we're dealing with a device that only exists in a handful of > > > > machines though ? 
Would distributions accept the burden of packaging > > > > corresponding userspace code, and maintaining the packages, when only a > > > > handful of people in the world will use it ? It's a genuine question. > > > > > > Fedora, Debian and OpenSuSE are volunteer based distributions, they > > > accept new packages, which need to be prepared (or asked to be > > > prepared) by such vendors. > > > > > > There is no "accept the burden of packaging corresponding userspace code, > > > and maintaining the packages", it is on package maintainer who can or > > > can't be associated with distribution. > > > > > > > > > > > > Regarding "embedded devices", I remind that we are talking about > > > > > userspace API and most likely busybox will be used for them, which is > > > > > also part of larger distro anyway, so fails under category "exists in > > > > > Debian/Fedora/SuSE". > > > > > > > > We're talking about APIs exposed by drivers, for devices such as GPUs, > > > > cameras or AI/ML accelerators. I don't think busybox will exercise those > > > > :-) We have Masa for GPUs, libcamera for cameras, and other frameworks > > > > I'm less familiar with for AI/ML accelerators, and I expect those to be > > > > packaged by distributions. There are however other kind of devices that > > > > don't fall in existing well-defined categories. > > > > > > I'm a little bit confused here. IMHO, you are trying to find an universal > > > solution for a problem that doesn't exist. > > > > > > Above you asked how to deal with niche devices? Here you talk about mass > > > products devices for the enterprise while before you mentioned "embedded > > > devices". > > > > > > 1. Niche devices - continue to do as they do it now, by supplying > > > out-of-tree solutions for their customers. Such devices and companies > > > rarely need upstream linux kernel support, because the burden to > > > upstream it is very high. 
We don't want them in the tree either, because > > > once they upstream it, the maintenance burden will be on us. > > > > {sigh} > > > > No, that is NOT our rule at all. > > > > These devices and companies need to be upstream more than anything else > > as that way they become part of our community and are responsible for > > maintaining their code in the tree. To force them to remain outside is > > to go against everything that many of us have been saying for _decades_ > > now. > > > > And how are you going to judge what is, and is not, a "niche" device? > > I will leave to that company to decide. Again this is exactly how they > operate now, there is nothing new here. Every company calculates ROI > for working with upstream and small companies with niche devices are not > different here. > > The main idea that I want to see working userspace stack, and being in > distro sets a certain quality level, am I asking too much? Define "working userspace stack" and "distro" please. Like others have said, many distros will not take userspace code unless it's already in the kernel tree first, as that ensures that the abi will not break. > > > 2. Devices that hits the certain level of adoption - need to be > > > integrated into certain userspace stack, which needs to be part of > > > distro. > > > > Distros are a very odd rule to rely on given that they are by far the > > minority of the usage in raw numbers for Linux in the world. > > You can count Android as another distro, it is just semantics. But how do you define Android's userspace? Just one vendor? 2 vendors? 10 vendors? There is major fragmentation in Android userspace in many places, the user/kernel boundary being one of the big ones as many of us have found out over the past years. And many of us are working to resolve this, but it's not so simple at times, and I have many examples if you want specifics. > > > And AI/ML is no different here, someone just need to start build such > > > stack.
Otherwise, we will continue to see more free riders like HabanaLabs > > > which don't have any real benefit to the community. > > > > Everyone contributes to Linux in a selfish manner, that's just how the > > community works. The work that companies like habanalabs is NOT being a > > "free rider" at all, they have worked with us and done the hard work of > > actually getting their code merged into the tree. > > I perfectly remember them trying to bypass netdev and RDMA communities > by pretending "misc" device. > > https://lore.kernel.org/linux-rdma/20200915133556.21268811@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/ > https://lore.kernel.org/linux-rdma/20200917171833.GJ8409@ziepe.ca/ > > Or DRM > https://lore.kernel.org/linux-rdma/CAKMK7uFOfoxbD2Z5mb-qHFnUe5rObGKQ6Ygh--HSH9M=9bziGg@mail.gmail.com/ > > So I can agree with the statement "worked hard", but not with the > relevant communities. I point at these as doing exactly what we want vendors to be doing! Thank you for finding the good examples. This is a vendor submitting patches and saying, "here is what we want to do, with a first cut at doing it." It's up to us as a community to tell them if they are doing it the right way or not. If we just let them all go their own ways, they will come up with horrible apis and interfaces, we have all seen that before. So by working together, we both can learn from, and work together to solve the issue. And that is what these driver authors and company has been doing! They are part of our community, why are you saying they should now just go do their own thing away from us? And as for "bypassing", that feels very mean. We have had accelerator code in the char/misc and other parts of the kernel tree since at least 2018 if not earlier (I didn't look all that hard.) Just because someone wanted to use the in-kernel apis that are there (why is dma-buf some magic thing?) does not mean that they suddenly need to move to a different subsystem. 
We get at least 1-2 new subsystems and major drivers that get added to the kernel tree that do things that have never been done before with custom user/kernel apis every kernel release. Not everything can be a standard api no matter how much I, and others, wish it were. As examples, what about the hyperv blob api that was submitted recently going around the block layer? What about the new Intel accelerator that added yet-another-set-of-custom-ioctls? What about the rpi drivers? What about the virtualbox drivers? Should all of those just live outside of the kernel for forever? Of course not. thanks, greg k-h
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-12 13:25 ` Greg KH @ 2021-09-12 14:15 ` Leon Romanovsky 2021-09-12 14:34 ` Greg KH 2021-09-12 15:55 ` Laurent Pinchart 1 sibling, 1 reply; 77+ messages in thread From: Leon Romanovsky @ 2021-09-12 14:15 UTC (permalink / raw) To: Greg KH Cc: Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit On Sun, Sep 12, 2021 at 03:25:58PM +0200, Greg KH wrote: > On Sun, Sep 12, 2021 at 11:29:45AM +0300, Leon Romanovsky wrote: > > On Sun, Sep 12, 2021 at 09:26:57AM +0200, Greg KH wrote: > > > On Sun, Sep 12, 2021 at 07:27:55AM +0300, Leon Romanovsky wrote: > > > > On Sun, Sep 12, 2021 at 01:04:01AM +0300, Laurent Pinchart wrote: > > > > > Hi Leon, > > > > > > > > > > On Sat, Sep 11, 2021 at 03:04:07PM +0300, Leon Romanovsky wrote: > > > > > > On Sat, Sep 11, 2021 at 02:41:52PM +0300, Laurent Pinchart wrote: > > > > > > > On Sat, Sep 11, 2021 at 01:31:02PM +0300, Leon Romanovsky wrote: > > > > > > > > On Sat, Sep 11, 2021 at 01:55:16AM +0200, Thomas Gleixner wrote: > > > > > > > > > On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote: > > > > > > > > > > > > > > > > > > > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote: > > > > > > > > > >> On media, enforcing userspace to always be open source would > > > > > > > > > >> have been very bad, as it would prevent several videoconferencing > > > > > > > > > >> software to exist on Linux. > > > > > > > > > > > > > > > > > > > > I don't think we should enforce that all userspace users of an interface > > > > > > > > > > be Open Source. I do think we should enforce that *some* userspace user > > > > > > > > > > of an interface be Open Source before we add the interface. > > > > > > > > > > > > > > > > > > The real question is whether the interface is documented in a way that > > > > > > > > > an Open Source implementation is possible. 
It does not matter whether it > > > > > > > > > exists at that point in time or not. Even if it exists there is no > > > > > > > > > guarantee that it is feature complete. > > > > > > > > > > > > > > > > > > Freely accessible documentation is really the key. > > > > > > > > > > > > > > > > I have more radical view than you and think that documentation is far > > > > > > > > from being enough. I would like to see any userspace API used (or to be > > > > > > > > used) in any package which exists in Debiam/Fedora/SuSE. > > > > > > > > > > > > > > We probably need to add Android AOSP to that list, as we have > > > > > > > Android-specific APIs (not that I believe we *should* have > > > > > > > Android-specific APIs, there's been lots of efforts over the past years > > > > > > > to develop standard APIs for use cases that stem from Android, slowly > > > > > > > replacing Android-specific APIs in some area, but I don't believe we can > > > > > > > realisticly bridge that gap completely overnight, if ever). > > > > > > > > > > > > Maybe. > > > > > > > > > > > > > > Only this will give us some sort of confidence that API and device are usable > > > > > > > > to some level. As a side note, we will be able to estimate possible API > > > > > > > > deprecation/fix/extension based on simple search in package databases. > > > > > > > > > > > > > > Linux supports devices from very diverse markets, from very tiny > > > > > > > embedded devices to supercomputers. We have drivers for devices that > > > > > > > exist in data centres of a single company only, or for which only a > > > > > > > handful of units exist through the world. The set of rules that we'll > > > > > > > decide on, if any, should take this into account. > > > > > > > > > > > > I'm part of that group (RDMA) who cares about enterprise, cloud and supercomputers. :) > > > > > > So for us, working out-of-the box (distro packages and not github code drops) is > > > > > > the key to the scalability. 
> > > > > > > > > > What if we're dealing with a device that only exists in a handful of > > > > > machines though ? Would distributions accept the burden of packaging > > > > > corresponding userspace code, and maintaining the packages, when only a > > > > > handful of people in the world will use it ? It's a genuine question. > > > > > > > > Fedora, Debian and OpenSuSE are volunteer based distributions, they > > > > accept new packages, which need to be prepared (or asked to be > > > > prepared) by such vendors. > > > > > > > > There is no "accept the burden of packaging corresponding userspace code, > > > > and maintaining the packages", it is on package maintainer who can or > > > > can't be associated with distribution. > > > > > > > > > > > > > > > Regarding "embedded devices", I remind that we are talking about > > > > > > userspace API and most likely busybox will be used for them, which is > > > > > > also part of larger distro anyway, so fails under category "exists in > > > > > > Debian/Fedora/SuSE". > > > > > > > > > > We're talking about APIs exposed by drivers, for devices such as GPUs, > > > > > cameras or AI/ML accelerators. I don't think busybox will exercise those > > > > > :-) We have Masa for GPUs, libcamera for cameras, and other frameworks > > > > > I'm less familiar with for AI/ML accelerators, and I expect those to be > > > > > packaged by distributions. There are however other kind of devices that > > > > > don't fall in existing well-defined categories. > > > > > > > > I'm a little bit confused here. IMHO, you are trying to find an universal > > > > solution for a problem that doesn't exist. > > > > > > > > Above you asked how to deal with niche devices? Here you talk about mass > > > > products devices for the enterprise while before you mentioned "embedded > > > > devices". > > > > > > > > 1. Niche devices - continue to do as they do it now, by supplying > > > > out-of-tree solutions for their customers. 
Such devices and companies > > > > rarely need upstream linux kernel support, because the burden to > > > > upstream it is very high. We don't want them in the tree either, because > > > > once they upstream it, the maintenance burden will be on us. > > > > > > {sigh} > > > > > > No, that is NOT our rule at all. > > > > > > These devices and companies need to be upstream more than anything else > > > as that way they become part of our community and are responsible for > > > maintaining their code in the tree. To force them to remain outside is > > > to go against everything that many of us have been saying for _decades_ > > > now. > > > > > > And how are you going to judge what is, and is not, a "niche" device? > > > > I will leave to that company to decide. Again this is exactly how they > > operate now, there is nothing new here. Every company calculates ROI > > for working with upstream and small companies with niche devices are not > > different here. > > > > The main idea that I want to see working userspace stack, and being in > > distro sets a certain quality level, am I asking too much? > > Define "working userspace stack" and "distro" please. Like others have > said, many distros will not take userspace code unless it's already in > the kernel tree first, as that ensures that the abi will not break. Like I already answered https://lore.kernel.org/all/YT2zryAKHc%2F5R2IH@unreal/ "To be used" means some open PR to existing package or request for inclusion for new packages. > > > > > 2. Devices that hits the certain level of adoption - need to be > > > > integrated into certain userspace stack, which needs to be part of > > > > distro. > > > > > > Distros are a very odd rule to rely on given that they are by far the > > > minority of the usage in raw numbers for Linux in the world. > > > > You can count Android as another distro, it is just semantics. > > But how do you define Android's userspace? Just one vendor? 2 vendors? > 10 vendors? 
There is major userspace fragmentation in Android userspace > in many places, the user/kernel boundry being one of the big ones as > many of us have found out over the past years. And many of us are > working to resolve this, but it's not so simple at times, and I have > many examples if you want specifics. Lauerent suggested AOSP https://lore.kernel.org/all/YTyWANV%2FmSkQbYhj@pendragon.ideasonboard.com/ > > > > > And AI/ML is no different here, someone just need to start build such > > > > stack. Otherwise, we will continue to see more free riders like HabanaLabs > > > > which don't have any real benefit to the community. > > > > > > Everyone contributes to Linux in a selfish manner, that's just how the > > > community works. The work that companies like habanalabs is NOT being a > > > "free rider" at all, they have worked with us and done the hard work of > > > actually getting their code merged into the tree. > > > > I perfectly remember them trying to bypass netdev and RDMA communities > > by pretending "misc" device. > > > > https://lore.kernel.org/linux-rdma/20200915133556.21268811@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/ > > https://lore.kernel.org/linux-rdma/20200917171833.GJ8409@ziepe.ca/ > > > > Or DRM > > https://lore.kernel.org/linux-rdma/CAKMK7uFOfoxbD2Z5mb-qHFnUe5rObGKQ6Ygh--HSH9M=9bziGg@mail.gmail.com/ > > > > So I can agree with the statement "worked hard", but not with the > > relevant communities. > > I point at these as doing exactly what we want vendors to be doing! > Thank you for finding the good examples. This is a vendor submitting > patches and saying, "here is what we want to do, with a first cut at > doing it." It's up to us as a community to tell them if they are doing > it the right way or not. > > If we just let them all go their own ways, they will come up with > horrible apis and interfaces, we have all seen that before. > > So by working together, we both can learn from, and work together to > solve the issue. 
And that is what these driver authors and company has > been doing! They are part of our community, why are you saying they > should now just go do their own thing away from us? This is not what I said. I don't see Intel (habanalabs) as a company that can't create a proper AI stack, and I think that it is our responsibility to give them enough incentive to do it. > > And as for "bypassing", that feels very mean. We have had accelerator > code in the char/misc and other parts of the kernel tree since at least > 2018 if not earlier (I didn't look all that hard.) Just because someone > wanted to use the in-kernel apis that are there (why is dma-buf some > magic thing?) does not mean that they suddenly need to move to a > different subsystem. Because the dma-buf API has specific semantics and was designed with a very specific usage model in mind. > > We get at least 1-2 new subsystems and major drivers that get added to > the kernel tree that do things that have never been done before with > custom user/kernel apis every kernel release. Not everything can be a > standard api no matter how much I, and others, wish it were. So when will you draw a line and ask for a proper subsystem with standard APIs to be created? After 2, 3 ... 100 similar (from our point of view) and different (from the vendor's point of view) devices with custom APIs? > > As examples, what about the hyperv blob api that was submitted recently > going around the block layer? What about the new Intel accelerator that > added yet-another-set-of-custom-ioctls? What about the rpi drivers? > What about the virtualbox drivers? Should all of those just live > outside of the kernel for forever? > > Of course not. So what is your bar? Accept everything? Thanks > > thanks, > > greg k-h
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-12 14:15 ` Leon Romanovsky @ 2021-09-12 14:34 ` Greg KH 2021-09-12 16:41 ` Laurent Pinchart ` (3 more replies) 0 siblings, 4 replies; 77+ messages in thread From: Greg KH @ 2021-09-12 14:34 UTC (permalink / raw) To: Leon Romanovsky Cc: Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit On Sun, Sep 12, 2021 at 05:15:30PM +0300, Leon Romanovsky wrote: > On Sun, Sep 12, 2021 at 03:25:58PM +0200, Greg KH wrote: > > > The main idea that I want to see working userspace stack, and being in > > > distro sets a certain quality level, am I asking too much? > > > > Define "working userspace stack" and "distro" please. Like others have > > said, many distros will not take userspace code unless it's already in > > the kernel tree first, as that ensures that the abi will not break. > > Like I already answered > https://lore.kernel.org/all/YT2zryAKHc%2F5R2IH@unreal/ > "To be used" means some open PR to existing package or request for > inclusion for new packages. But again, distros will not take things that are not already in the kernel. > > > > > 2. Devices that hits the certain level of adoption - need to be > > > > > integrated into certain userspace stack, which needs to be part of > > > > > distro. > > > > > > > > Distros are a very odd rule to rely on given that they are by far the > > > > minority of the usage in raw numbers for Linux in the world. > > > > > > You can count Android as another distro, it is just semantics. > > > > But how do you define Android's userspace? Just one vendor? 2 vendors? > > 10 vendors? There is major userspace fragmentation in Android userspace > > in many places, the user/kernel boundry being one of the big ones as > > many of us have found out over the past years. And many of us are > > working to resolve this, but it's not so simple at times, and I have > > many examples if you want specifics. 
> > Lauerent suggested AOSP > https://lore.kernel.org/all/YTyWANV%2FmSkQbYhj@pendragon.ideasonboard.com/ Vendors can not get code into AOSP for various reasons that only Google understands. There are many millions, if not billions of Android devices out there with user/kernel apis that are not upstream nor in AOSP because Google doesn't want to take them, or because the vendor can not go through those hoops (international law is tricky at times...) So are we to just not be able to take drivers that add those new apis if AOSP can not take the userspace side, yet the userspace side is published somewhere else? > > > > > And AI/ML is no different here, someone just need to start build such > > > > > stack. Otherwise, we will continue to see more free riders like HabanaLabs > > > > > which don't have any real benefit to the community. > > > > > > > > Everyone contributes to Linux in a selfish manner, that's just how the > > > > community works. The work that companies like habanalabs is NOT being a > > > > "free rider" at all, they have worked with us and done the hard work of > > > > actually getting their code merged into the tree. > > > > > > I perfectly remember them trying to bypass netdev and RDMA communities > > > by pretending "misc" device. > > > > > > https://lore.kernel.org/linux-rdma/20200915133556.21268811@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/ > > > https://lore.kernel.org/linux-rdma/20200917171833.GJ8409@ziepe.ca/ > > > > > > Or DRM > > > https://lore.kernel.org/linux-rdma/CAKMK7uFOfoxbD2Z5mb-qHFnUe5rObGKQ6Ygh--HSH9M=9bziGg@mail.gmail.com/ > > > > > > So I can agree with the statement "worked hard", but not with the > > > relevant communities. > > > > I point at these as doing exactly what we want vendors to be doing! > > Thank you for finding the good examples. This is a vendor submitting > > patches and saying, "here is what we want to do, with a first cut at > > doing it." 
It's up to us as a community to tell them if they are doing > > it the right way or not. > > > > If we just let them all go their own ways, they will come up with > > horrible apis and interfaces, we have all seen that before. > > > > So by working together, we both can learn from, and work together to > > solve the issue. And that is what these driver authors and company has > > been doing! They are part of our community, why are you saying they > > should now just go do their own thing away from us? > > This is not what I said. I don't see Intel (habanalabs) as a company > that can't create proper AI stack and think that this is our > responsibility to provide them enough incentive to do it. So should we be forcing everyone to follow the IBM standard for accelerator drivers because they were in the kernel first all those years ago? Or what other standard do we pick? And why are we dictating new industry standards here? Who are we to do that? Who is going to take that responsibility on? > > And as for "bypassing", that feels very mean. We have had accelerator > > code in the char/misc and other parts of the kernel tree since at least > > 2018 if not earlier (I didn't look all that hard.) Just because someone > > wanted to use the in-kernel apis that are there (why is dma-buf some > > magic thing?) does not mean that they suddenly need to move to a > > different subsystem. > > Because dma-buf API has specific semantics and was designed with very > specific usage model in mind. So will the IB patches usage be re-reviewed? Anyway, we have apis that are used throughout the kernel all the time that don't end up on the various subsystem mailing list because people forget, or just do not know. That's normal and something we have dealt with for forever. As an example, I didn't realise that just using the dma-buf api required such a review. Can we put that in the MAINTAINERS file somehow for apis? 
> > We get at least 1-2 new subsystems and major drivers that get added to > > the kernel tree that do things that have never been done before with > > custom user/kernel apis every kernel release. Not everything can be a > > standard api no matter how much I, and others, wish it were. > > So when will you draw a line and ask to create proper susbsystem > with standard APIs? After 2, 3 ... 100 similar (from our point of view) > and different (from vendor point of view) devices with custom API? That is a great question and I do not have the answer to that. Should we have done that after the first one went into the kernel all those years ago? Maybe, but I seem to recal the answer being "our hardware works much differently, so our user api will be much different", and that's a valid answer. If your standard can not handle new usage models and a way to handle that, then it isn't a good standard that companies will follow for new types of devices. We have loads of char drivers with odd ioctl apis because we have loads of odd hardware devices out in the world. We have been treating these accelerators like that for a long time now, except when they try to duplicate existing in-kernel code (like crypto or networking). > > As examples, what about the hyperv blob api that was submitted recently > > going around the block layer? What about the new Intel accelerator that > > added yet-another-set-of-custom-ioctls? What about the rpi drivers? > > What about the virtualbox drivers? Should all of those just live > > outside of the kernel for forever? > > > > Of course not. > > So what is your bar? Accept everything? It's a hard line to draw, and for some reason, I seem to be the one having to review these types of drivers every kernel release. If people wish to help me out, please do so, all the patches are on the lists. Right now I push back where I can and try to get semi-sane apis created that are "obviously not wrong" where I notice. 
After that, I just need to trust that the maintainer of the driver knows what they are doing and will maintain the code going forward. So far, it's worked out. Do you have a better idea of what to do instead? thanks, greg k-h ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-12 14:34 ` Greg KH @ 2021-09-12 16:41 ` Laurent Pinchart 2021-09-12 20:35 ` Dave Airlie ` (2 subsequent siblings) 3 siblings, 0 replies; 77+ messages in thread From: Laurent Pinchart @ 2021-09-12 16:41 UTC (permalink / raw) To: Greg KH Cc: Leon Romanovsky, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit Hi Greg, On Sun, Sep 12, 2021 at 04:34:48PM +0200, Greg KH wrote: > On Sun, Sep 12, 2021 at 05:15:30PM +0300, Leon Romanovsky wrote: > > On Sun, Sep 12, 2021 at 03:25:58PM +0200, Greg KH wrote: > > > > The main idea that I want to see working userspace stack, and being in > > > > distro sets a certain quality level, am I asking too much? > > > > > > Define "working userspace stack" and "distro" please. Like others have > > > said, many distros will not take userspace code unless it's already in > > > the kernel tree first, as that ensures that the abi will not break. > > > > Like I already answered > > https://lore.kernel.org/all/YT2zryAKHc%2F5R2IH@unreal/ > > "To be used" means some open PR to existing package or request for > > inclusion for new packages. > > But again, distros will not take things that are not already in the > kernel. It's becoming difficult to follow the discussion as it has branched. I've replied on this topic separately. > > > > > > 2. Devices that hits the certain level of adoption - need to be > > > > > > integrated into certain userspace stack, which needs to be part of > > > > > > distro. > > > > > > > > > > Distros are a very odd rule to rely on given that they are by far the > > > > > minority of the usage in raw numbers for Linux in the world. > > > > > > > > You can count Android as another distro, it is just semantics. > > > > > > But how do you define Android's userspace? Just one vendor? 2 vendors? > > > 10 vendors? 
There is major userspace fragmentation in Android userspace > > > in many places, the user/kernel boundry being one of the big ones as > > > many of us have found out over the past years. And many of us are > > > working to resolve this, but it's not so simple at times, and I have > > > many examples if you want specifics. > > > > Lauerent suggested AOSP > > https://lore.kernel.org/all/YTyWANV%2FmSkQbYhj@pendragon.ideasonboard.com/ > > Vendors can not get code into AOSP for various reasons that only Google > understands. There are many millions, if not billions of Android > devices out there with user/kernel apis that are not upstream nor in > AOSP because Google doesn't want to take them, or because the vendor can > not go through those hoops (international law is tricky at times...) > > So are we to just not be able to take drivers that add those new apis if > AOSP can not take the userspace side, yet the userspace side is > published somewhere else? "Open userspace" and "packaged in distros" are two criteria that have been proposed. There are more, such as "open documentation" for instance. It's up to us to decide what to do (if anything), and I don't believe we'll be able to find one-size-fits-them-all criteria that can apply globally. There is however in my opinion value in carefully designing a set of criteria and document them, to then for instance let subsystems pick the ones that work best for the type of devices they handle. The "packaged in distros" criteria is, as I understand it, an attempt to avoid code dumps on git..b that would have been so badly designed that they would be unmaintainable. It's a tricky area, what I think is required is that vendors publish an open userspace implementation that is serious enough, and not just a way to tick a box while circumventing the spirit of the rule. Distro packaging may help achieving that, but there are certainly other ways too. 
For me, at the end of the day it's really about how to create a community starting from a single implementation. > > > > > > And AI/ML is no different here, someone just need to start build such > > > > > > stack. Otherwise, we will continue to see more free riders like HabanaLabs > > > > > > which don't have any real benefit to the community. > > > > > > > > > > Everyone contributes to Linux in a selfish manner, that's just how the > > > > > community works. The work that companies like habanalabs is NOT being a > > > > > "free rider" at all, they have worked with us and done the hard work of > > > > > actually getting their code merged into the tree. > > > > > > > > I perfectly remember them trying to bypass netdev and RDMA communities > > > > by pretending "misc" device. > > > > > > > > https://lore.kernel.org/linux-rdma/20200915133556.21268811@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/ > > > > https://lore.kernel.org/linux-rdma/20200917171833.GJ8409@ziepe.ca/ > > > > > > > > Or DRM > > > > https://lore.kernel.org/linux-rdma/CAKMK7uFOfoxbD2Z5mb-qHFnUe5rObGKQ6Ygh--HSH9M=9bziGg@mail.gmail.com/ > > > > > > > > So I can agree with the statement "worked hard", but not with the > > > > relevant communities. > > > > > > I point at these as doing exactly what we want vendors to be doing! > > > Thank you for finding the good examples. This is a vendor submitting > > > patches and saying, "here is what we want to do, with a first cut at > > > doing it." It's up to us as a community to tell them if they are doing > > > it the right way or not. > > > > > > If we just let them all go their own ways, they will come up with > > > horrible apis and interfaces, we have all seen that before. > > > > > > So by working together, we both can learn from, and work together to > > > solve the issue. And that is what these driver authors and company has > > > been doing! They are part of our community, why are you saying they > > > should now just go do their own thing away from us? 
> > > > This is not what I said. I don't see Intel (habanalabs) as a company > > that can't create proper AI stack and think that this is our > > responsibility to provide them enough incentive to do it. > > So should we be forcing everyone to follow the IBM standard for > accelerator drivers because they were in the kernel first all those > years ago? Or what other standard do we pick? > > And why are we dictating new industry standards here? Who are we to do > that? Who is going to take that responsibility on? > > > > And as for "bypassing", that feels very mean. We have had accelerator > > > code in the char/misc and other parts of the kernel tree since at least > > > 2018 if not earlier (I didn't look all that hard.) Just because someone > > > wanted to use the in-kernel apis that are there (why is dma-buf some > > > magic thing?) does not mean that they suddenly need to move to a > > > different subsystem. > > > > Because dma-buf API has specific semantics and was designed with very > > specific usage model in mind. > > So will the IB patches usage be re-reviewed? > > Anyway, we have apis that are used throughout the kernel all the time > that don't end up on the various subsystem mailing list because people > forget, or just do not know. That's normal and something we have dealt > with for forever. As an example, I didn't realise that just using the > dma-buf api required such a review. > > Can we put that in the MAINTAINERS file somehow for apis? > > > > We get at least 1-2 new subsystems and major drivers that get added to > > > the kernel tree that do things that have never been done before with > > > custom user/kernel apis every kernel release. Not everything can be a > > > standard api no matter how much I, and others, wish it were. > > > > So when will you draw a line and ask to create proper susbsystem > > with standard APIs? After 2, 3 ... 100 similar (from our point of view) > > and different (from vendor point of view) devices with custom API? 
> > That is a great question and I do not have the answer to that. Should > we have done that after the first one went into the kernel all those > years ago? Maybe, but I seem to recal the answer being "our hardware > works much differently, so our user api will be much different", and > that's a valid answer. And it's also the answer that all vendors will give, because it's an easy way to avoid doing extra work. It may sometimes be true, but that's an exception rather than a rule. It reminds me of something I've heard in a working group recently, when someone mentioned a "key differentiating factor" that requires a free ticket for vendors not to open the implementation, and a few seconds later went on to say it was "available in all phones in the market today". I won't call these lies, I believe that in most cases the vendors actually believe it's true. > If your standard can not handle new usage models and a way to handle > that, then it isn't a good standard that companies will follow for new > types of devices. > > We have loads of char drivers with odd ioctl apis because we have loads > of odd hardware devices out in the world. We have been treating these > accelerators like that for a long time now, except when they try to > duplicate existing in-kernel code (like crypto or networking). Going back to the "is an accelerator a GPU?" topic for a bit, DRM doesn't prevent drivers from exposing custom features with custom API elements. AI/ML accelerators aren't GPUs in the original sense of 3D rendering accelerators (maybe that's a cause of misunderstanding, we're not using the best terminology), but they fit pretty well within the device model that DRM creates. The side effect of using DRM is that an open userspace is required, and this is why some people in the community believe Habanalabs tried to work around that rule by going for drivers/misc/. 
I don't know enough about the history to know if they were behaving in good faith or not, but maybe we could try to turn this page by deciding on the right path forward together and forget about the finger pointing and blaming. > > > As examples, what about the hyperv blob api that was submitted recently > > > going around the block layer? What about the new Intel accelerator that > > > added yet-another-set-of-custom-ioctls? What about the rpi drivers? > > > What about the virtualbox drivers? Should all of those just live > > > outside of the kernel for forever? > > > > > > Of course not. > > > > So what is your bar? Accept everything? > > It's a hard line to draw, and for some reason, I seem to be the one > having to review these types of drivers every kernel release. If people > wish to help me out, please do so, all the patches are on the lists. This may be a controversial point, but could it be because vendors perceive you as less likely to look closely and push back ? If drivers/misc/ is seen as being free-for-all and other subsystems are likely to ask for more work, natural laziness will push vendors to drivers/misc/. > Right now I push back where I can and try to get semi-sane apis created > that are "obviously not wrong" where I notice. After that, I just need > to trust that the maintainer of the driver knows what they are doing and > will maintain the code going forward. So far, it's worked out. > > Do you have a better idea of what to do instead? Is there a way we could push those drivers more strongly towards other subsystems ? There's certainly no way you will be able to foster the creating of a dozen userspace frameworks and related communities from drivers/misc/ by yourself. -- Regards, Laurent Pinchart ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-12 14:34 ` Greg KH 2021-09-12 16:41 ` Laurent Pinchart @ 2021-09-12 20:35 ` Dave Airlie 2021-09-12 20:41 ` Dave Airlie 2021-09-13 14:03 ` Mark Brown 3 siblings, 0 replies; 77+ messages in thread From: Dave Airlie @ 2021-09-12 20:35 UTC (permalink / raw) To: Greg KH Cc: Leon Romanovsky, Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit > > So should we be forcing everyone to follow the IBM standard for > accelerator drivers because they were in the kernel first all those > years ago? Or what other standard do we pick? > > And why are we dictating new industry standards here? Who are we to do > that? Who is going to take that responsibility on? There are no sane kernel API standards here; the standards that control these devices live out in userspace, far away from the world you want to inhabit. Responsible kernel maintainership should come with knowledge of the entire ecosystem and where it's going. If people are trying to merge kernel drivers and you don't have enough info/knowledge about the ecosystem, then say No. > > > > Because dma-buf API has specific semantics and was designed with very > > specific usage model in mind. > > So will the IB patches usage be re-reviewed? > > Anyway, we have apis that are used throughout the kernel all the time > that don't end up on the various subsystem mailing list because people > forget, or just do not know. That's normal and something we have dealt > with for forever. As an example, I didn't realise that just using the > dma-buf api required such a review. > > Can we put that in the MAINTAINERS file somehow for apis? We have had MAINTAINERS rules matching on the dma-buf includes; see commit 78baee8d3b976a6a6a2c208e3a36d3f1e6297e6c Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Wed Dec 4 22:51:05 2019 +0100 MAINTAINERS: Match on dma_buf|fence|resv anywhere and a later followup to clean it up a bit.
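For readers unfamiliar with the mechanism referred to here: besides file-path patterns, MAINTAINERS entries can carry content-keyword (K:) regexes, so that a patch mentioning a given API anywhere is routed to that API's maintainers. A minimal sketch of the idea follows. The pattern is an assumption derived only from the commit subject above and the in-tree regex may differ; `needs_dma_buf_review` and the example file path are hypothetical and are not part of get_maintainer.pl.

```python
import re

# Assumed pattern, taken from the commit subject "Match on
# dma_buf|fence|resv anywhere"; the real MAINTAINERS entry may differ.
DMA_BUF_KEYWORDS = re.compile(r"dma_(buf|fence|resv)")


def needs_dma_buf_review(patch_text: str) -> bool:
    """Return True if any added line of a unified diff mentions the dma-buf APIs.

    This is a simplification for illustration: it only inspects added
    lines and ignores the "+++ b/file" header line.
    """
    for line in patch_text.splitlines():
        if line.startswith("+") and not line.startswith("+++"):
            if DMA_BUF_KEYWORDS.search(line):
                return True
    return False


# Hypothetical patch against a hypothetical driver file.
example_patch = """\
--- a/drivers/misc/example/accel.c
+++ b/drivers/misc/example/accel.c
@@ -10,6 +10,7 @@
 #include <linux/module.h>
+static struct dma_buf *exported;
"""

print(needs_dma_buf_review(example_patch))
```

With a rule of this shape, a driver under drivers/misc/ that starts using the dma-buf API would show up for the dma-buf reviewers even though none of its files live in their directories.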
> That is a great question and I do not have the answer to that. Should > we have done that after the first one went into the kernel all those > years ago? Maybe, but I seem to recall the answer being "our hardware > works much differently, so our user api will be much different", and > that's a valid answer. Every GPU driver has a different user API; across all of them there is no standard. We still merge them, but we require userspace. Maybe if you could sign up to follow the same rules it might be less onerous on you. > > It's a hard line to draw, and for some reason, I seem to be the one > having to review these types of drivers every kernel release. If people > wish to help me out, please do so, all the patches are on the lists. We do help out, we've said No. I've no idea why you go ahead and merge things sometimes. Creating a trash pile in your neighbourhood and then complaining when more people continue to dump more trash on it seems a little disingenuous to me. We need to take more responsibility for the way these things are used, and make sure there are frameworks for them. We got things where they were by saying upstream first a lot without thinking of the consequences of success. Now that we have that success, we should start thinking of the responsibilities that come with it. Distros like RHEL/CentOS are why these vendors are pushing stuff upstream; they want to be included. However, once they do that they no longer gain the benefits of the Linux development model and just run off and spawn 20 userspace projects that they can maintain control over. Companies love control, they hate not having ultimate say over their kernel drivers, and they won't willingly create userspace projects where that happens either, not unless we work together for the good of the ecosystem, not just the good of the kernel. Dave. ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-12 14:34 ` Greg KH 2021-09-12 16:41 ` Laurent Pinchart 2021-09-12 20:35 ` Dave Airlie @ 2021-09-12 20:41 ` Dave Airlie 2021-09-12 20:49 ` Daniel Vetter 2021-09-13 14:03 ` Mark Brown 3 siblings, 1 reply; 77+ messages in thread From: Dave Airlie @ 2021-09-12 20:41 UTC (permalink / raw) To: Greg KH Cc: Leon Romanovsky, Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit > > So will the IB patches usage be re-reviewed? https://lore.kernel.org/linux-rdma/MW3PR11MB4555CCCDD42F1ADEC61F7ACAE5AB0@MW3PR11MB4555.namprd11.prod.outlook.com/ FYI, it's a thread where GPU devs are reviewing IB dma-buf patches. What's next, cats and dogs living together? Dave. ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-12 20:41 ` Dave Airlie @ 2021-09-12 20:49 ` Daniel Vetter 2021-09-12 21:12 ` Dave Airlie 0 siblings, 1 reply; 77+ messages in thread From: Daniel Vetter @ 2021-09-12 20:49 UTC (permalink / raw) To: Dave Airlie Cc: Greg KH, Leon Romanovsky, Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit On Sun, Sep 12, 2021 at 10:41 PM Dave Airlie <airlied@gmail.com> wrote: > > So will the IB patches usage be re-reviewed? > > https://lore.kernel.org/linux-rdma/MW3PR11MB4555CCCDD42F1ADEC61F7ACAE5AB0@MW3PR11MB4555.namprd11.prod.outlook.com/ > > FYI it's a thread where GPU devs reviewing IB dma-buf patches, what's > next cat and dogs living together? And as you can see, the review has been long and involved a ton of different dri-devel (and rdma ofc too) folks. It's like there's an entire community of experts at hands who could help out in reviewing these things, if only we'd not have a maintainer who happily bypasses all that and invites all the dumpster fires into drivers/misc. And then complains that no one helps with reviewing ... -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-12 20:49 ` Daniel Vetter @ 2021-09-12 21:12 ` Dave Airlie 2021-09-12 22:51 ` Linus Walleij 0 siblings, 1 reply; 77+ messages in thread From: Dave Airlie @ 2021-09-12 21:12 UTC (permalink / raw) To: Daniel Vetter Cc: Greg KH, Leon Romanovsky, Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit On Mon, 13 Sept 2021 at 06:49, Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > On Sun, Sep 12, 2021 at 10:41 PM Dave Airlie <airlied@gmail.com> wrote: > > > So will the IB patches usage be re-reviewed? > > > > https://lore.kernel.org/linux-rdma/MW3PR11MB4555CCCDD42F1ADEC61F7ACAE5AB0@MW3PR11MB4555.namprd11.prod.outlook.com/ > > > > FYI it's a thread where GPU devs reviewing IB dma-buf patches, what's > > next cat and dogs living together? > > And as you can see, the review has been long and involved a ton of > different dri-devel (and rdma ofc too) folks. > > It's like there's an entire community of experts at hands who could > help out in reviewing these things, if only we'd not have a maintainer > who happily bypasses all that and invites all the dumpster fires into > drivers/misc. And then complains that no one helps with reviewing ... > -Daniel Daniel makes a good point here about "communities of experts". We need to foster those cross-vendor expert communities to sustain Linux going forward. For userspace components as well these communities of experts need to exist for each domain, and we need to encourage upstream first processes across the board for these split kernel/userspace stacks. The habanalabs compiler backend is an LLVM fork, I'd like to see the effort to upstream that LLVM backend into LLVM proper. When this sort of thing happens it gets on the radar of the LLVM compiler experts, instead of it just being the habanalabs experts. 
I've met so many internal company experts who remain unchallenged internally but buckle when introduced to true communities of expertise. If we want to keep this thing growing and maintainable we need to tap into those existing expertise groups. This is why it's important we foster userspace groups. If you hear the myth that "only our company understands our hw enough to write code for it", it's been proven bullshit numerous times. It's an excuse for retaining control. Also, if there was a shared runtime library repo with cross-vendor review, I'm betting they'd all learn a lot about how a userspace should work and be maintained rather than assuming they knew it all themselves. Be more like Spider-Man: great maintainership + great responsibility. Dave. ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-12 21:12 ` Dave Airlie @ 2021-09-12 22:51 ` Linus Walleij 2021-09-12 23:15 ` Dave Airlie 2021-09-13 13:20 ` Arnd Bergmann 0 siblings, 2 replies; 77+ messages in thread From: Linus Walleij @ 2021-09-12 22:51 UTC (permalink / raw) To: Dave Airlie Cc: Daniel Vetter, Greg KH, Leon Romanovsky, Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit, dev On Sun, Sep 12, 2021 at 11:13 PM Dave Airlie <airlied@gmail.com> wrote: > For userspace components as well these communities of experts need to > exist for each domain, and we need to encourage upstream first > processes across the board for these split kernel/userspace stacks. > > The habanalabs compiler backend is an LLVM fork, I'd like to see the > effort to upstream that LLVM backend into LLVM proper. I couldn't agree more. A big part of the problem with inference engines / NPU:s is that of no standardized userspace. Several of the machine learning initiatives from some years back now have stale git repositories and are visibly unmaintained, c.f. Caffe https://github.com/BVLC/caffe last commit 2 years ago. In a discussion thread at LWN I raised Apache TVM as a currently quite obviously alive and kicking community, and these people have the ambition to provide "an open source machine learning compiler framework for CPUs, GPUs, and machine learning accelerators". https://tvm.apache.org/ At least they have all relevant companies logotypes on their homepage, so there is some kind of commitment. You can find for example from Arm an RFC for real HW accelerator code support using (out of tree) Linux kernel drivers with Apache TVM: https://discuss.tvm.apache.org/t/rfc-ethosn-arm-ethos-n-integration/6680 Then there is Google's TensorFlow. How open is that for a random HW vendor who want to integrate their accelerator and how open is it to working with the kernel community? Then there is PyTorch. 
All of these apparently active. Well CPU vendors often support two different compilers so I guess they could very well support three machine learning userspaces, why not. What confuses me is what kind of time horizon and longevity these projects have, and what level of commitment is involved and what ambition. Especially to what extent they would care about working with the Linux kernel community. (TVM have a mail address so I added them on CC.) Habanalabs propose an LLVM fork as compiler, yet the Intel logo is on the Apache TVM website, and no sign of integrating with that project. They claim to support also TensorFlow. The way I perceive it is that there simply isn't any GCC/LLVM or Gallium 3D of NPU:s, these people haven't yet decided that "here is that userspace we are all going to use". Or have they? LLVM? TVM? TensorFlow? PyTorch? Some other one? What worries me is that I don't see one single developer being able to say "this one definitely, and they will work with the kernel community", and that is what we need to hear. Yours, Linus Walleij ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-12 22:51 ` Linus Walleij @ 2021-09-12 23:15 ` Dave Airlie 2021-09-13 13:20 ` Arnd Bergmann 1 sibling, 0 replies; 77+ messages in thread From: Dave Airlie @ 2021-09-12 23:15 UTC (permalink / raw) To: Linus Walleij Cc: Daniel Vetter, Greg KH, Leon Romanovsky, Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit, dev On Mon, 13 Sept 2021 at 08:52, Linus Walleij <linus.walleij@linaro.org> wrote: > > On Sun, Sep 12, 2021 at 11:13 PM Dave Airlie <airlied@gmail.com> wrote: > > > For userspace components as well these communities of experts need to > > exist for each domain, and we need to encourage upstream first > > processes across the board for these split kernel/userspace stacks. > > > > The habanalabs compiler backend is an LLVM fork, I'd like to see the > > effort to upstream that LLVM backend into LLVM proper. > > I couldn't agree more. > > A big part of the problem with inference engines / NPU:s is that of no > standardized userspace. Several of the machine learning initiatives > from some years back now have stale git repositories and are > visibly unmaintained, c.f. Caffe https://github.com/BVLC/caffe > last commit 2 years ago. > > In a discussion thread at LWN I raised Apache TVM as a currently > quite obviously alive and kicking community, and these people have > the ambition to provide "an open source machine learning compiler > framework for CPUs, GPUs, and machine learning accelerators". > https://tvm.apache.org/ > At least they have all relevant companies logotypes on their homepage, > so there is some kind of commitment. > You can find for example from Arm an RFC for real HW accelerator code > support using (out of tree) Linux kernel drivers with Apache TVM: > https://discuss.tvm.apache.org/t/rfc-ethosn-arm-ethos-n-integration/6680 > > Then there is Google's TensorFlow. 
How open is that for a random > HW vendor who want to integrate their accelerator and how open is > it to working with the kernel community? Then there is PyTorch. > All of these apparently active. Well CPU vendors often support > two different compilers so I guess they could very well support > three machine learning userspaces, why not. > > What confuses me is what kind of time horizon and longevity these > projects have, and what level of commitment is involved and > what ambition. Especially to what extent they would care about > working with the Linux kernel community. (TVM have a mail > address so I added them on CC.) > > Habanalabs propose an LLVM fork as compiler, yet the Intel > logo is on the Apache TVM website, and no sign of integrating with > that project. They claim to support also TensorFlow. > > The way I perceive it is that there simply isn't any GCC/LLVM or > Gallium 3D of NPU:s, these people haven't yet decided that "here > is that userspace we are all going to use". Or have they? > > LLVM? TVM? TensorFlow? PyTorch? Some other one? Yeah, I've been doing the same research, and there is also the Glow project, I think, to add to the list. The thing is control: everyone wants to run it. When it comes to Linux, nearly all the vendors have realised they've lost their control and learned to live with it, but the second they are into userspace, it's like "hey, we need to be in charge of every single piece of this", thus losing the Linux kernel advantage of pooling engineering expertise cross-vendor. I certainly don't want to be the distro packager having to package 30 forks of LLVM for 20 different vendor accelerators with 20 runtime APIs and 20 forks of TVM/TensorFlow/PyTorch.
Enabling that behaviour by just merging kernel drivers and washing our hands of the rest seems to me like a large misstep for the future maintainability of the kernel, especially as these devices start interacting with GPUs or RDMA and we get locked into unmovable interfaces that we can't even analyse for deadlocks etc. Dave. ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-12 22:51 ` Linus Walleij 2021-09-12 23:15 ` Dave Airlie @ 2021-09-13 13:20 ` Arnd Bergmann 2021-09-13 13:54 ` Daniel Vetter ` (2 more replies) 1 sibling, 3 replies; 77+ messages in thread From: Arnd Bergmann @ 2021-09-13 13:20 UTC (permalink / raw) To: Linus Walleij Cc: Dave Airlie, Daniel Vetter, Greg KH, Leon Romanovsky, Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit, dev On Mon, Sep 13, 2021 at 12:51 AM Linus Walleij <linus.walleij@linaro.org> wrote: > On Sun, Sep 12, 2021 at 11:13 PM Dave Airlie <airlied@gmail.com> wrote: > > > For userspace components as well these communities of experts need to > > exist for each domain, and we need to encourage upstream first > > processes across the board for these split kernel/userspace stacks. > > > > The habanalabs compiler backend is an LLVM fork, I'd like to see the > > effort to upstream that LLVM backend into LLVM proper. > > I couldn't agree more. > > A big part of the problem with inference engines / NPU:s is that of no > standardized userspace. Several of the machine learning initiatives > from some years back now have stale git repositories and are > visibly unmaintained, c.f. Caffe https://github.com/BVLC/caffe > last commit 2 years ago. Caffe as a standalone project was abandoned and merged into PyTorch, see https://caffe2.ai/. I think this is the kind of consolidation of those projects that you are looking for. > Habanalabs propose an LLVM fork as compiler, yet the Intel > logo is on the Apache TVM website, and no sign of integrating with > that project. They claim to support also TensorFlow. > > The way I perceive it is that there simply isn't any GCC/LLVM or > Gallium 3D of NPU:s, these people haven't yet decided that "here > is that userspace we are all going to use". Or have they? > > LLVM? TVM? TensorFlow? PyTorch? Some other one? 
> > What worries me is that I don't see one single developer being > able to say "this one definitely, and they will work with the kernel > community", and that is what we need to hear. I don't actually think this is a decision we can possibly wait for. The ones you listed all work on different levels, some build on top of others, and some may get replaced by new ones over time. For a generic kernel interface, we need something that can be supported as a back-end for multiple such libraries, and that works on more than just one hardware. Most likely we will need both higher-level and lower-level interfaces, so that a framework (or an application directly) may target one interface, but some hardware may not be able to implement this. One straightforward hardware independent low-level API would be the traditional BLAS GEMM call[1] for matrix multiplication and its variants (integer, float, bfloat16, ...). Most of the frameworks are able to use SGEMM to do the actual calculation since that has optimized versions for most CPUs and GPUs, and most hardware accelerators should be able to provide an implementation of this that doesn't completely suck. This can be used for both inferencing and training. On the kernel side, this could probably be done inside the existing crypto (async), media (mem2mem), or gpu/drm interfaces that all provide ways to offload computational functions on blocks of memory potentially backed by a dmabuf, but having a new top-level chardev interface may be a better fit. A completely different interface would be something that lets you compile a model into a hardware specific blob in user space and then submit that blob into the kernel, using further commands to send and receive model specific data. As I understand it, this method is roughly what habanalabs and some of the other ones do for inferencing. 
The performance is almost certainly better here, but it requires a high degree of integration between model, framework, user space driver, compiler and kernel driver. We already do similar things in the gpu, fpga and remoteproc frameworks, all of which could be used here, or we add a more specialized interface. What the actual interfaces should be I have no clue, those two are just examples of what it could be, being completely ignorant of what drivers do today. As Dave said, this really needs a maintainer that understands both the kernel side and what kind of hardware and frameworks exist and what interfaces both sides actually require. Arnd [1] http://www.netlib.org/lapack/explore-html/db/dc9/group__single__blas__level3_gafe51bacb54592ff5de056acabd83c260.html
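For readers who have not met the GEMM call referenced as [1] in the message above: it computes C := alpha * A * B + beta * C. The following is only a minimal pure-Python sketch of that arithmetic. The name `gemm_reference` is made up for illustration and is not a kernel or BLAS API, and the real SGEMM additionally takes transpose flags and leading-dimension arguments that are left out here.

```python
def gemm_reference(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for row-major lists of lists.

    a is m x k, b is k x n, c is m x n. Hypothetical helper, illustration only.
    """
    m, k = len(a), len(a[0])
    n = len(b[0])
    if len(b) != k or len(c) != m or any(len(row) != n for row in c):
        raise ValueError("shape mismatch")
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            # Dot product of row i of a with column j of b.
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            row.append(alpha * acc + beta * c[i][j])
        out.append(row)
    return out
```

A fixed-function interface in the sense discussed in this thread would expose roughly this one operation as a single command, with the buffers living in memory the device can reach.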
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-13 13:20 ` Arnd Bergmann @ 2021-09-13 13:54 ` Daniel Vetter 2021-09-13 22:04 ` Arnd Bergmann 2021-09-13 14:52 ` James Bottomley 2021-09-14 13:07 ` Linus Walleij 2 siblings, 1 reply; 77+ messages in thread From: Daniel Vetter @ 2021-09-13 13:54 UTC (permalink / raw) To: Arnd Bergmann Cc: Linus Walleij, Dave Airlie, Greg KH, Leon Romanovsky, Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit, dev On Mon, Sep 13, 2021 at 3:20 PM Arnd Bergmann <arnd@arndb.de> wrote: > > On Mon, Sep 13, 2021 at 12:51 AM Linus Walleij <linus.walleij@linaro.org> wrote: > > On Sun, Sep 12, 2021 at 11:13 PM Dave Airlie <airlied@gmail.com> wrote: > > > > > For userspace components as well these communities of experts need to > > > exist for each domain, and we need to encourage upstream first > > > processes across the board for these split kernel/userspace stacks. > > > > > > The habanalabs compiler backend is an LLVM fork, I'd like to see the > > > effort to upstream that LLVM backend into LLVM proper. > > > > I couldn't agree more. > > > > A big part of the problem with inference engines / NPU:s is that of no > > standardized userspace. Several of the machine learning initiatives > > from some years back now have stale git repositories and are > > visibly unmaintained, c.f. Caffe https://github.com/BVLC/caffe > > last commit 2 years ago. > > Caffe as a standalone project was abandoned and merged into > PyTorch, see https://caffe2.ai/. I think this is the kind of consolidation > of those projects that you are looking for. > > > Habanalabs propose an LLVM fork as compiler, yet the Intel > > logo is on the Apache TVM website, and no sign of integrating with > > that project. They claim to support also TensorFlow. 
> > > > The way I perceive it is that there simply isn't any GCC/LLVM or > > Gallium 3D of NPU:s, these people haven't yet decided that "here > > is that userspace we are all going to use". Or have they? > > > > LLVM? TVM? TensorFlow? PyTorch? Some other one? > > > > What worries me is that I don't see one single developer being > > able to say "this one definitely, and they will work with the kernel > > community", and that is what we need to hear. > > I don't actually think this is a decision we can possibly wait for. > The ones you listed all work on different levels, some build on top > of others, and some may get replaced by new ones over time. > > For a generic kernel interface, we need something that can be > supported as a back-end for multiple such libraries, and that > works on more than just one hardware. Most likely we will need > both higher-level and lower-level interfaces, so that a > framework (or an application directly) may target one interface, > but some hardware may not be able to implement this. > > One straightforward hardware independent low-level API would > be the traditional BLAS GEMM call[1] for matrix multiplication > and its variants (integer, float, bfloat16, ...). Most of the frameworks > are able to use SGEMM to do the actual calculation since that > has optimized versions for most CPUs and GPUs, and most > hardware accelerators should be able to provide an > implementation of this that doesn't completely suck. This > can be used for both inferencing and training. I think BLAS is too high-level for these. Sure, for perfect speed the vendor probably wants to have their own BLAS thing, their own NN optimizer and a heap of other things, but for the low-level userspace we're talking about here that pretty much doesn't matter. I think a really good example of this is the compute stack Intel is building: - level0 is the absolute bare-bones low level driver. 
For this discussion here that's enough of a userspace to make at least Dave and me happy. In 3d this would be vulkan. In AI/NN space, there's nothing here, at least nothing cross-vendor. - Then there's the entire OneApi ecosystem on top. Lots of this is open, some of it is closed, but from the pov of an accel stack it's all looking like applications, not like driver code. BLAS is sitting here. For AI/NN this is pytorch, tensorflow and all these higher-level frameworks (which often have quite sophisticated optimizers of their own). - Then there's funny intermediate apis like opencl, where the state of the art is still to implement them directly as userspace drivers on top of the kernel. Although on the 3d side at least we're getting to a point where opengl on top of vulkan is impressively close to an optimized driver. But for now it's still mostly custom. This is what AI/NN drivers generally look like, with the high-level library fused together with the backend. Or the backend being an out-of-tree fork (which is pretty much always an llvm fork for the compiler side). Especially BLAS isn't the most impressive, since largely it's a fused multiply-add benchmark and not much else. Ok, enormous amounts of tuning to perfectly exploit the execution bw and interconnect/cache hierarchy of your chip, whatever it is. That's often something vendors don't like sharing (intel's math kernels are still closed afaik) because it leaks a bit much about actual implementation details of the chip as opposed to how it's programmed. Also not something I really care about with my maintainer hat on. > On the kernel side, this could probably be done inside the > existing crypto (async), media (mem2mem), or gpu/drm > interfaces that all provide ways to offload computational > functions on blocks of memory potentially backed by a dmabuf, > but having a new top-level chardev interface may be a better > fit. 
> > A completely different interface would something that lets you > compile a model into a hardware specific blob in user space > and then submit that blob into the kernel, using further commands > to send and receive model specific data. As I understand it, > this method is roughly what habanalabs and some of the > other ones do for inferencing. The performance is almost > certainly better here, but it requires a high degree of integration > between model, framework, user space driver, compiler and > kernel driver. > We already do similar things in the gpu, fpga and remoteproc > frameworks, all of which could be used here, or we add a more > specialized interface. Not even the interface matters that much, there's very little the 3d/compute gpu drivers share there. It's the community of experts that matters, and the cross-vendor userspace project. > What the actual interfaces should be I have no clue, those > two are just examples of what it could be, being completely > ignorant of what drivers do today. As Dave said, this really > needs a maintainer that understands both the kernel side > and what kind of hardware and frameworks exist and > what interfaces both sides actually require. So yeah, agreeing here. -Daniel > Arnd > > [1] http://www.netlib.org/lapack/explore-html/db/dc9/group__single__blas__level3_gafe51bacb54592ff5de056acabd83c260.html -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-13 13:54 ` Daniel Vetter @ 2021-09-13 22:04 ` Arnd Bergmann 2021-09-13 23:33 ` Dave Airlie 0 siblings, 1 reply; 77+ messages in thread From: Arnd Bergmann @ 2021-09-13 22:04 UTC (permalink / raw) To: Daniel Vetter Cc: Arnd Bergmann, Linus Walleij, Dave Airlie, Greg KH, Leon Romanovsky, Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit, dev On Mon, Sep 13, 2021 at 3:54 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > One straightforward hardware independent low-level API would > > be the traditional BLAS GEMM call[1] for matrix multiplication > > and its variants (integer, float, bfloat16, ...). Most of the frameworks > > are able to use SGEMM to do the actual calculation since that > > has optimized versions for most CPUs and GPUs, and most > > hardware accelerators should be able to provide an > > implementation of this that doesn't completely suck. This > > can be used for both inferencing and training. > > I think BLAS are too high-level for these. Sure fore perfect speed the > vendor probably wants to have their own BLAS thing, their own NN > optmizer and a heap of other things, but for the low-level userspace > we're talking about here that pretty much doesn't matter. I suppose high-level vs low-level is not the correct distinction here, it's more like fixed-function vs programmable. As a fixed-function interface, something like GEMM is probably as low-level as you would want to get, as it's big enough to make sense as a single atomic command, but small enough to be able to build on top of it. > I think a really good example of this is the compute stack Intel is building: > - level0 is the absolute bare-bones low level driver. For this > discussion here that's enough of a userspace to make at least Dave&me > happy. In 3d this would be vulkan. In AI/NN space, there's nothing > here, at least nothing cross-vendor. 
> - Then there's the entire OneApi ecosystem on top. Lots of this is > open, some of it is closed, but from the pov of an accel stack it's > all looking like applications, not like driver code. BLAS is sitting > here. For AI/NN this is pytorch, tensorflow and all these higher-level > frameworks (which often have quite sophisticated optimizers of their > won) Looking at OneAPI, I see a BLAS implementation (oneMKL) next to somewhat higher-level abstraction (oneDNN). Which of the two are the generic frameworks (pytorch/tensorflow/...) built on top of? The oneDNN interface looks like it could be implemented not only on top of level0 but also layered above some BLAS library or as a thin wrapper above a fixed-function kernel interface that provides similar high-level abstractions. Is that a correct understanding? It also seems like this is similar in purpose to Apple's BNNS library. > Especially BLAS isn't the most impressive, since largely it's fused > multiple-add benchmark and not much else. Ok, enormous amounts of > tuning to perfectly exploit the execution bw and interconnect/cache > hierarchy of your chip, whatever it is. That's often something vendors > don't like sharing (intel's math kernels are still closed afaik) > because it leaks a bit much about actual implementation details of the > chip as opposed to how it's programmed. Also not something I really > care about with my maintainer hat on. It's not /just/ benchmarks, it's actually being used directly underneath the high-level frameworks precisely because it is simple, portable and well optimized. If there is a higher-level interface like oneDNN that is usable by the common frameworks, using a subset of that as a fixed-function interface for the kernel may be a good alternative (or at least complementary) to a fully programmable interface. I realize that fixed-function is not fashionable on GPUs, but they are widely used in other areas (video codecs, crypto, ...) 
even when you are running precompiled code on the accelerator hardware. This would of course replace the question of open source user space with the question of open-source firmware, as the user side would become mostly while the accelerator goes from dynamically created to a firmware blob. Arnd
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-13 22:04 ` Arnd Bergmann @ 2021-09-13 23:33 ` Dave Airlie 2021-09-14 9:08 ` Arnd Bergmann 0 siblings, 1 reply; 77+ messages in thread From: Dave Airlie @ 2021-09-13 23:33 UTC (permalink / raw) To: Arnd Bergmann Cc: Daniel Vetter, Linus Walleij, Greg KH, Leon Romanovsky, Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit, dev On Tue, 14 Sept 2021 at 08:05, Arnd Bergmann <arnd@arndb.de> wrote: > > >n Mon, Sep 13, 2021 at 3:54 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > > > One straightforward hardware independent low-level API would > > > be the traditional BLAS GEMM call[1] for matrix multiplication > > > and its variants (integer, float, bfloat16, ...). Most of the frameworks > > > are able to use SGEMM to do the actual calculation since that > > > has optimized versions for most CPUs and GPUs, and most > > > hardware accelerators should be able to provide an > > > implementation of this that doesn't completely suck. This > > > can be used for both inferencing and training. > > > > I think BLAS are too high-level for these. Sure fore perfect speed the > > vendor probably wants to have their own BLAS thing, their own NN > > optmizer and a heap of other things, but for the low-level userspace > > we're talking about here that pretty much doesn't matter. > > I suppose high-level vs low-level is not the correct distinction here, > it's more like fixed-function vs programmable. > > As a fixed-function interface, something like GEMM is probably as > low-level as you would want to get, as it's big enough to make sense > as a single atomic command, but small enough to be able to build on > top of it. 
The distinction is more about the programming model than fixed vs programmable. In rough order of complexity:

a) device is MMIO programmed and can process one thing, kernel needs to mediate between exclusive users (big lock, initial drm subsystem)
b) device has a queue that can process untrusted userspace commands with no memory safety (old drm drivers, in-kernel command stream parsing)
c) device has queues, contexts, memory safety, virtual address space (newer drm drivers)
d) device has full preempt on all hw blocks, is fully coherent, can trigger paging sanely, userspace can submit directly (pipe dream).

What the device processes is of little consequence to the kernel driver model. The uAPI of course needs to reflect the above along with what the device can program, since there could be a queue for a DMA device that isn't specified but can be programmed to DMA random system memory.

Devices in category (a) are the sort of things that can need kernel interfaces at a GEMM or BLAS level; however, there is no point having an interface at that level for any of the b/c/d devices. That interface needs to be in userspace somewhere; level0 or something like it is probably where things will end up, and the type (a) devices will die out.

> I realize that fixed-function is not fashionable on GPUs, but they
> are widely used in other areas (video codecs, crypto, ...) even when
> you are running precompiled code on the accelerator hardware.
> This would of course replace the question of open source user space
> with the question of open-source firmware, as the user side would
> become mostly while the accelerator goes from dynamically created
> to a firmware blob.

We have lots of fixed function on GPUs, video codecs are on most x86 GPUs. It's how you program them that matters, most of them are behind queues similar to the 3D engine, so you program them the same way.

What isn't fashionable on GPUs is programmable blocks that are single-user, where the kernel can only program one user on at a time, since hw has long since moved away from that model. There are some AI accelerators going down the same path, but eventually they'll have to be shareable and catch up with GPU programming models to remain competitive.

Dave.
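The four device classes in the message above can be written down as a small enumeration, which makes the argument easier to follow in the replies. This is only an illustrative sketch: the names `ProgrammingModel` and `kernel_level_gemm_plausible` are invented here and do not correspond to anything in the kernel, and the predicate simply encodes the position taken in that message (only class (a) may need a GEMM/BLAS-level kernel interface), which later replies dispute.

```python
from enum import Enum


class ProgrammingModel(Enum):
    """Device classes (a)-(d) as listed in the message above (names are hypothetical)."""
    MMIO_EXCLUSIVE = "a"   # MMIO programmed, one job at a time, kernel arbitrates users
    UNSAFE_QUEUE = "b"     # has a queue but no memory safety, kernel parses commands
    SAFE_CONTEXTS = "c"    # queues, contexts, memory safety, virtual address space
    DIRECT_SUBMIT = "d"    # full preemption, coherent, userspace submits directly


def kernel_level_gemm_plausible(model: ProgrammingModel) -> bool:
    """Encode the claim that only class (a) may need a GEMM/BLAS-level uAPI."""
    return model is ProgrammingModel.MMIO_EXCLUSIVE


for model in ProgrammingModel:
    print(model.value, model.name, kernel_level_gemm_plausible(model))
```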
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-13 23:33 ` Dave Airlie @ 2021-09-14 9:08 ` Arnd Bergmann 2021-09-14 9:23 ` Daniel Vetter 2021-09-14 15:43 ` Luck, Tony 0 siblings, 2 replies; 77+ messages in thread From: Arnd Bergmann @ 2021-09-14 9:08 UTC (permalink / raw) To: Dave Airlie Cc: Arnd Bergmann, Daniel Vetter, Linus Walleij, Greg KH, Leon Romanovsky, Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit, dev On Tue, Sep 14, 2021 at 1:33 AM Dave Airlie <airlied@gmail.com> wrote: > On Tue, 14 Sept 2021 at 08:05, Arnd Bergmann <arnd@arndb.de> wrote: > > >On Mon, Sep 13, 2021 at 3:54 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > > I think BLAS are too high-level for these. Sure fore perfect speed the > > > vendor probably wants to have their own BLAS thing, their own NN > > > optmizer and a heap of other things, but for the low-level userspace > > > we're talking about here that pretty much doesn't matter. > > > > I suppose high-level vs low-level is not the correct distinction here, > > it's more like fixed-function vs programmable. > > > > As a fixed-function interface, something like GEMM is probably as > > low-level as you would want to get, as it's big enough to make sense > > as a single atomic command, but small enough to be able to build on > > top of it. 
> > The distinctions is more programming model than fixed vs programmable > in rough order of complexity > > a) device is MMIO programmed and can process one thing, kernel needs > to mediate between exclusive users (big lock, initial drm subsystem) > b) device has a queue that can process untrusted userspace command > with no memory safety (old drm drivers, in-kernel command stream > parsing) > c) device has queues, contexts, memory safety, virtual address space > (newer drm drivers) > d) device has full preempt on all hw blocks, is fully coherent, can > trigger paging sanely, userspace can submit directly (pipe dream). > > What the device processes is of little consequence to the kernel > driver model. the uAPI of course needs to reflect the above along with > what the device can program. Since there could be a queue for a DMA > device that isn't specificed but can be programmed to DMA random > system memory. Thank you for the useful overview! > Devices in category (a) are the sort of things that can need kernel > interfaces like a GEMM or BLAS level, however there is no point having > an interface at that level for any of the b/c/d device. That interface > needs to be in userspace somewhere, level0 or something like is > probably where things will end up, and the type (a) devices will die > out. I can see two reasons why one would want to support a type (a) interface even with the more versatile devices: - It can be done in a generic way so that simply adding a kernel driver and loading some firmware into it makes existing user space software work out of the box. - It gives the manufacturer a way to get an upstream kernel driver without open sourcing their firmware (a.k.a. compiler and user space driver). Whether you consider this a good or bad thing is of course a matter of perspective. > > I realize that fixed-function is not fashionable on GPUs, but they > > are widely used in other areas (video codecs, crypto, ...) 
even when > > you are running precompiled code on the accelerator hardware. > > This would of course replace the question of open source user space > > with the question of open-source firmware, as the user side would > > become mostly while the accelerator goes from dynamically created > > to a firmware blob. > > We have lots of fixed function on GPUs, video codecs are on most x86 > GPUs. It's how you program them that matters, most of them are behind > queues similar to the 3D engine, so you program them the same way. So these would go through /dev/dri instead of /dev/media0? I can definitely see a lot of codec drivers in the kernel that use a /dev/media interfaces, and the tradeoffs between those two seem very similar to the tradeoffs you get for machine learning accelerators. > What isn't fashionable on GPUs is programmable blocks that are single > user that only the kernel can program one user on at a time, since hw > has long since left that model as desirable. There are some AI > accelerators going doing the same path, but eventually they'll have to > be shareable and catch up with GPU programming models to remain > competitive. I'm not convinced by this at all. While I totally understand this argument for GPUs and general-purpose users (phone, PC, server, ...), I also see a lot of cheap SoC hardware with much simpler requirements. If the chip is built for an embedded application (face detection, smart speaker, ...) you would never need to have two processes access the same accelerator hardware, or even just load a new model into it after boot. Adding any complexity to the hardware increases the cost, so you would only do it if absolutely necessary, or if the cheapest off-the-shelf solution already includes it. Arnd ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-14 9:08 ` Arnd Bergmann @ 2021-09-14 9:23 ` Daniel Vetter 2021-09-14 10:47 ` Laurent Pinchart 2021-09-14 12:58 ` Arnd Bergmann 2021-09-14 15:43 ` Luck, Tony 1 sibling, 2 replies; 77+ messages in thread From: Daniel Vetter @ 2021-09-14 9:23 UTC (permalink / raw) To: Arnd Bergmann Cc: Dave Airlie, Linus Walleij, Greg KH, Leon Romanovsky, Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit, dev On Tue, Sep 14, 2021 at 11:09 AM Arnd Bergmann <arnd@arndb.de> wrote: > > On Tue, Sep 14, 2021 at 1:33 AM Dave Airlie <airlied@gmail.com> wrote: > > On Tue, 14 Sept 2021 at 08:05, Arnd Bergmann <arnd@arndb.de> wrote: > > > >On Mon, Sep 13, 2021 at 3:54 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > > > I think BLAS are too high-level for these. Sure fore perfect speed the > > > > vendor probably wants to have their own BLAS thing, their own NN > > > > optmizer and a heap of other things, but for the low-level userspace > > > > we're talking about here that pretty much doesn't matter. > > > > > > I suppose high-level vs low-level is not the correct distinction here, > > > it's more like fixed-function vs programmable. > > > > > > As a fixed-function interface, something like GEMM is probably as > > > low-level as you would want to get, as it's big enough to make sense > > > as a single atomic command, but small enough to be able to build on > > > top of it. > > > > The distinctions is more programming model than fixed vs programmable > > in rough order of complexity > > > > a) device is MMIO programmed and can process one thing, kernel needs > > to mediate between exclusive users (big lock, initial drm subsystem) I think even for these you might want a drm style uapi, where drm/sched takes different jobs and hammers them into hw in a kernel thread. 
Ofc it all depends what the programming model is, and something more fixed like media might make sense. > > b) device has a queue that can process untrusted userspace command > > with no memory safety (old drm drivers, in-kernel command stream > > parsing) > > c) device has queues, contexts, memory safety, virtual address space > > (newer drm drivers) > > d) device has full preempt on all hw blocks, is fully coherent, can > > trigger paging sanely, userspace can submit directly (pipe dream). > > > > What the device processes is of little consequence to the kernel > > driver model. the uAPI of course needs to reflect the above along with > > what the device can program. Since there could be a queue for a DMA > > device that isn't specificed but can be programmed to DMA random > > system memory. > > Thank you for the useful overview! > > > Devices in category (a) are the sort of things that can need kernel > > interfaces like a GEMM or BLAS level, however there is no point having > > an interface at that level for any of the b/c/d device. That interface > > needs to be in userspace somewhere, level0 or something like is > > probably where things will end up, and the type (a) devices will die > > out. > > I can see two reasons why one would want to support a type (a) > interface even with the more versatile devices: > > - It can be done in a generic way so that simply adding a kernel > driver and loading some firmware into it makes existing user space > software work out of the box. > > - It gives the manufacturer a way to get an upstream kernel driver > without open sourcing their firmware (a.k.a. compiler and user > space driver). Whether you consider this a good or bad thing is > of course a matter of perspective. I think for some embedded use-case this makes sense, especially around media stuff. I don't think it's BLAS, because on the compute side you really want a compiler that sees through the entire thing and can optimize it. 
Afaik BLAS is for some quick prototype of matrix algorithms and most importantly, for the top500 list :-) > > > I realize that fixed-function is not fashionable on GPUs, but they > > > are widely used in other areas (video codecs, crypto, ...) even when > > > you are running precompiled code on the accelerator hardware. > > > This would of course replace the question of open source user space > > > with the question of open-source firmware, as the user side would > > > become mostly while the accelerator goes from dynamically created > > > to a firmware blob. > > > > We have lots of fixed function on GPUs, video codecs are on most x86 > > GPUs. It's how you program them that matters, most of them are behind > > queues similar to the 3D engine, so you program them the same way. > > So these would go through /dev/dri instead of /dev/media0? I can definitely > see a lot of codec drivers in the kernel that use a /dev/media interfaces, > and the tradeoffs between those two seem very similar to the tradeoffs > you get for machine learning accelerators. Yeah we have plenty of codecs running on top of /dev/dri0, with all the magic in userspace. They are all very far away from anything that is a machine learning accelerator. > > What isn't fashionable on GPUs is programmable blocks that are single > > user that only the kernel can program one user on at a time, since hw > > has long since left that model as desirable. There are some AI > > accelerators going doing the same path, but eventually they'll have to > > be shareable and catch up with GPU programming models to remain > > competitive. > > I'm not convinced by this at all. While I totally understand this argument > for GPUs and general-purpose users (phone, PC, server, ...), I also see > a lot of cheap SoC hardware with much simpler requirements. If the chip > is built for an embedded application (face detection, smart speaker, ...) 
> you would never need to have two processes access the same > accelerator hardware, or even just load a new model into it after > boot. Adding any complexity to the hardware increases the cost, so > you would only do it if absolutely necessary, or if the cheapest > off-the-shelf solution already includes it. Yeah for those I think a more fixed uapi like drivers/media makes a lot of sense. What I don't like is when vendors then use that excuse of "oh you only upload a fixed model at boot" to shovel in an accel driver with full generic interface, but not all the userspace bits&pieces. There's unfortunately another accel driver in drivers/misc for a Qualcomm SoC, which really should be either a media driver (for the fixed function use-case) or a drm driver (for the fully programmable use-case). I think for the fixed-function interface case you can also make a reasonable argument that just documenting that fixed interface and all the parameters is good enough. But as soon as the interface becomes a generic "submit workload" style thing because you want to make it work for an entire set of "firmware" compiled by your closed stack, that's out of the window. So yeah there's another driver in misc which managed to bypass review by two subsystems, not just one :-/ -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-14 9:23 ` Daniel Vetter @ 2021-09-14 10:47 ` Laurent Pinchart 2021-09-14 12:58 ` Arnd Bergmann 1 sibling, 0 replies; 77+ messages in thread From: Laurent Pinchart @ 2021-09-14 10:47 UTC (permalink / raw) To: Daniel Vetter Cc: Arnd Bergmann, Dave Airlie, Linus Walleij, Greg KH, Leon Romanovsky, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit, dev On Tue, Sep 14, 2021 at 11:23:56AM +0200, Daniel Vetter wrote: > On Tue, Sep 14, 2021 at 11:09 AM Arnd Bergmann wrote: > > On Tue, Sep 14, 2021 at 1:33 AM Dave Airlie wrote: > > > On Tue, 14 Sept 2021 at 08:05, Arnd Bergmann wrote: > > > > >On Mon, Sep 13, 2021 at 3:54 PM Daniel Vetter wrote: > > > > > I think BLAS are too high-level for these. Sure fore perfect speed the > > > > > vendor probably wants to have their own BLAS thing, their own NN > > > > > optmizer and a heap of other things, but for the low-level userspace > > > > > we're talking about here that pretty much doesn't matter. > > > > > > > > I suppose high-level vs low-level is not the correct distinction here, > > > > it's more like fixed-function vs programmable. > > > > > > > > As a fixed-function interface, something like GEMM is probably as > > > > low-level as you would want to get, as it's big enough to make sense > > > > as a single atomic command, but small enough to be able to build on > > > > top of it. > > > > > > The distinctions is more programming model than fixed vs programmable > > > in rough order of complexity > > > > > > a) device is MMIO programmed and can process one thing, kernel needs > > > to mediate between exclusive users (big lock, initial drm subsystem) > > I think even for these you might want a drm style uapi, where > drm/sched takes different jobs and hammers them into hw in a kernel > thread. Ofc it all depends what the programming model is, and > something more fixed like media might make sense. 
For completeness, there's a similar component in the V4L2 M2M framework, but simpler. Jobs are executed sequentially in the order they are received. The simplicity is mostly due to the fact that the type of hardware V4L2 M2M supports doesn't have the ability to run multiple jobs in parallel. We also have ISPs that fall in this category, and use the V4L2 API in memory-to-memory mode but without any scheduling, because context switching doesn't exist at the hardware level and is too expensive to implement in software. For those we restrict operation to a single process at a time. > > > b) device has a queue that can process untrusted userspace command > > > with no memory safety (old drm drivers, in-kernel command stream > > > parsing) > > > c) device has queues, contexts, memory safety, virtual address space > > > (newer drm drivers) > > > d) device has full preempt on all hw blocks, is fully coherent, can > > > trigger paging sanely, userspace can submit directly (pipe dream). > > > > > > What the device processes is of little consequence to the kernel > > > driver model. the uAPI of course needs to reflect the above along with > > > what the device can program. Since there could be a queue for a DMA > > > device that isn't specificed but can be programmed to DMA random > > > system memory. > > > > Thank you for the useful overview! > > > > > Devices in category (a) are the sort of things that can need kernel > > > interfaces like a GEMM or BLAS level, however there is no point having > > > an interface at that level for any of the b/c/d device. That interface > > > needs to be in userspace somewhere, level0 or something like is > > > probably where things will end up, and the type (a) devices will die > > > out. 
> > > > I can see two reasons why one would want to support a type (a) > > interface even with the more versatile devices: > > > > - It can be done in a generic way so that simply adding a kernel > > driver and loading some firmware into it makes existing user space > > software work out of the box. > > > > - It gives the manufacturer a way to get an upstream kernel driver > > without open sourcing their firmware (a.k.a. compiler and user > > space driver). Whether you consider this a good or bad thing is > > of course a matter of perspective. > > I think for some embedded use-case this makes sense, especially around > media stuff. > > I don't think it's BLAS, because on the compute side you really want a > compiler that sees through the entire thing and can optimize it. Afaik > BLAS is for some quick prototype of matrix algorithms and most > importantly, for the top500 list :-) > > > > > I realize that fixed-function is not fashionable on GPUs, but they > > > > are widely used in other areas (video codecs, crypto, ...) even when > > > > you are running precompiled code on the accelerator hardware. > > > > This would of course replace the question of open source user space > > > > with the question of open-source firmware, as the user side would > > > > become mostly while the accelerator goes from dynamically created > > > > to a firmware blob. > > > > > > We have lots of fixed function on GPUs, video codecs are on most x86 > > > GPUs. It's how you program them that matters, most of them are behind > > > queues similar to the 3D engine, so you program them the same way. > > > > So these would go through /dev/dri instead of /dev/media0? I can definitely > > see a lot of codec drivers in the kernel that use a /dev/media interfaces, > > and the tradeoffs between those two seem very similar to the tradeoffs > > you get for machine learning accelerators. > > Yeah we have plenty of codes running on top of /dev/dri0, with all the > magic in userspace. 
> > They are all very far away from anything that is a machine learning accelerator. > > > > What isn't fashionable on GPUs is programmable blocks that are single > > > user that only the kernel can program one user on at a time, since hw > > > has long since left that model as desirable. There are some AI > > > accelerators going doing the same path, but eventually they'll have to > > > be shareable and catch up with GPU programming models to remain > > > competitive. > > > > I'm not convinced by this at all. While I totally understand this argument > > for GPUs and general-purpose users (phone, PC, server, ...), I also see > > a lot of cheap SoC hardware with much simpler requirements. If the chip > > is built for an embedded application (face detection, smart speaker, ...) > > you would never need to have two processes access the same > > accelerator hardware, or even just load a new model into it after > > boot. Adding any complexity to the hardware increases the cost, so > > you would only do it if absolutely necessary, or if the cheapest > > off-the-shelf solution already includes it. > > Yeah for those I think a more fixed uapi like drivers/media has a lot > of makes sense. What I don't like is when vendors then use that excuse > of "oh you only upload a fixed model at boot" to shovel in an acccel > driver with full generic interface, but not all the userspace > bits&pieces. There's unfortunately another accel driver in > drivers/misc for qualcom soc, which really should be either a media > driver (for the fixed function use-case) or a drm driver (for the > fully programmable) use-case. > > I think for the fixed-function interface case you can also make a > reasonable argument that just documenting that fixed interface and all > the parameters is good enough. 
But as soon as the interface becomes a > generic "submit workload" style thing because you want to make it work > for an entire set of "firmware" compiled by your closed stack, that's > out of the window. > > So yeah there's another driver in misc which managed to bypass review > of two subsystem, not just one :-/ -- Regards, Laurent Pinchart
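As a rough illustration of the two models Laurent describes above (jobs executed strictly in arrival order, and devices restricted to a single process at a time), here is a toy sketch. It is not the V4L2 M2M API; the class and method names are hypothetical and only model the behaviour described in the message.

```python
import errno
from collections import deque


class SequentialJobQueue:
    """Toy model of in-order execution: one job at a time, FIFO.

    Hypothetical names; this does not mirror any kernel structure.
    """

    def __init__(self):
        self._pending = deque()
        self.completed = []

    def submit(self, context_id, job):
        # Jobs from all contexts share one queue; there is no reordering.
        self._pending.append((context_id, job))

    def run_all(self):
        # The hardware cannot run jobs in parallel, so drain sequentially.
        while self._pending:
            context_id, job = self._pending.popleft()
            self.completed.append((context_id, job()))


class SingleOwnerDevice:
    """Toy model of a device without context switching: one opener only."""

    def __init__(self):
        self._owner = None

    def open(self, pid):
        # A second process is refused, modelled here as EBUSY.
        if self._owner is not None and self._owner != pid:
            raise OSError(errno.EBUSY, "device busy")
        self._owner = pid

    def close(self, pid):
        if self._owner == pid:
            self._owner = None
```

The point of the sketch is only that neither model needs a scheduler: the first relies on arrival order, the second on exclusive ownership.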
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-14 9:23 ` Daniel Vetter 2021-09-14 10:47 ` Laurent Pinchart @ 2021-09-14 12:58 ` Arnd Bergmann 2021-09-14 19:45 ` Daniel Vetter 1 sibling, 1 reply; 77+ messages in thread From: Arnd Bergmann @ 2021-09-14 12:58 UTC (permalink / raw) To: Daniel Vetter Cc: Arnd Bergmann, Dave Airlie, Linus Walleij, Greg KH, Leon Romanovsky, Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit, dev On Tue, Sep 14, 2021 at 11:23 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > On Tue, Sep 14, 2021 at 11:09 AM Arnd Bergmann <arnd@arndb.de> wrote: > > I can see two reasons why one would want to support a type (a) > > interface even with the more versatile devices: > > > > - It can be done in a generic way so that simply adding a kernel > > driver and loading some firmware into it makes existing user space > > software work out of the box. > > > > - It gives the manufacturer a way to get an upstream kernel driver > > without open sourcing their firmware (a.k.a. compiler and user > > space driver). Whether you consider this a good or bad thing is > > of course a matter of perspective. > > I think for some embedded use-case this makes sense, especially around > media stuff. > > I don't think it's BLAS, because on the compute side you really want a > compiler that sees through the entire thing and can optimize it. Afaik > BLAS is for some quick prototype of matrix algorithms and most > importantly, for the top500 list :-) It's probably not the only thing you need, but I would assume something like sgemm and its variants are one of the building blocks you'd need in this kind of interface. Note that oneDNN also comes with a simplified interface similar to gemm[1] as well as straight wrapper around gemm itself. There are definitely frameworks that are successfully built just on top of NumPy and blas (with NumPy itself being built on top of blas). 
I used to make fun of linpack as the supercomputer benchmark that has no practical use, but in the end it does spend most of its time in the SGEMM function that is the most optimized algorithm in the world and that is also where you end up spending your cycles in many AI applications. I found a link to this blog post[2] explaining why this is still used everywhere, and this matches what I've seen elsewhere, but unlike me, the author seems to know what they are talking about ;-) To get back to my own question from earlier about which part of oneAPI is actually being used, I see that pytorch (to pick a common framework) can use either mkl (oneMKL, BLAS) or mkldnn (dnnl, oneDNN) as a backend, next to cuda, cudnn, openmp and certainly a number of third-party backends. The mkl backend seems to mostly be a wrapper around cblas_*gemm(), though I may be reading that wrong. The oneDNN backend operates on a higher level, calling into a subset of the oneDNN interfaces. The other frameworks I looked at (mxnet, tensorflow) look similar, probably each using other subsets of oneDNN. > > > We have lots of fixed function on GPUs, video codecs are on most x86 > > > GPUs. It's how you program them that matters, most of them are behind > > > queues similar to the 3D engine, so you program them the same way. > > > > So these would go through /dev/dri instead of /dev/media0? I can definitely > > see a lot of codec drivers in the kernel that use a /dev/media interfaces, > > and the tradeoffs between those two seem very similar to the tradeoffs > > you get for machine learning accelerators. > > Yeah we have plenty of codes running on top of /dev/dri0, with all the > magic in userspace. > > They are all very far away from anything that is a machine learning accelerator. Sure, I only meant the relation between dri codecs and media codecs is similar to the relation between the ways one can implement the AI accelerator APIs. 
> Yeah for those I think a more fixed uapi like drivers/media makes a lot > of sense. What I don't like is when vendors then use that excuse > of "oh you only upload a fixed model at boot" to shovel in an accel > driver with full generic interface, but not all the userspace > bits&pieces. There's unfortunately another accel driver in > drivers/misc for Qualcomm SoC, which really should be either a media > driver (for the fixed function use-case) or a drm driver (for the > fully programmable) use-case. I would argue that for the fixed-function use case, the media subsystem isn't a great fit either. It would probably work just as well (as would the crypto subsystem), but having a distinct interface that does just one thing makes more sense conceptually, if only to make it clear where to look for such drivers and to have consistent interface documentation. > I think for the fixed-function interface case you can also make a > reasonable argument that just documenting that fixed interface and all > the parameters is good enough. But as soon as the interface becomes a > generic "submit workload" style thing because you want to make it work > for an entire set of "firmware" compiled by your closed stack, that's > out of the window. Right, agreed. If we add a fixed-function interface, that should ideally not allow any vendor-specific extensions at all, just a set of well-defined operations, and certainly not a bypass mode that gets used to send compiled binaries. Arnd [1] https://oneapi-src.github.io/oneDNN/dev_guide_matmul.html [2] https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-14 12:58 ` Arnd Bergmann @ 2021-09-14 19:45 ` Daniel Vetter 0 siblings, 0 replies; 77+ messages in thread From: Daniel Vetter @ 2021-09-14 19:45 UTC (permalink / raw) To: Arnd Bergmann Cc: Dave Airlie, Linus Walleij, Greg KH, Leon Romanovsky, Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit, dev On Tue, Sep 14, 2021 at 2:58 PM Arnd Bergmann <arnd@arndb.de> wrote: > On Tue, Sep 14, 2021 at 11:23 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > On Tue, Sep 14, 2021 at 11:09 AM Arnd Bergmann <arnd@arndb.de> wrote: > > > > I can see two reasons why one would want to support a type (a) > > > interface even with the more versatile devices: > > > > > > - It can be done in a generic way so that simply adding a kernel > > > driver and loading some firmware into it makes existing user space > > > software work out of the box. > > > > > > - It gives the manufacturer a way to get an upstream kernel driver > > > without open sourcing their firmware (a.k.a. compiler and user > > > space driver). Whether you consider this a good or bad thing is > > > of course a matter of perspective. > > > > I think for some embedded use-case this makes sense, especially around > > media stuff. > > > > I don't think it's BLAS, because on the compute side you really want a > > compiler that sees through the entire thing and can optimize it. Afaik > > BLAS is for some quick prototype of matrix algorithms and most > > importantly, for the top500 list :-) > > It's probably not the only thing you need, but I would assume something > like sgemm and its variants are one of the building blocks you'd need > in this kind of interface. Note that oneDNN also comes with a > simplified interface similar to gemm[1] as well as straight wrapper around > gemm itself. 
> > There are definitely frameworks that are successfully built just on top > of NumPy and blas (with NumPy itself being built on top of blas). > I used to make fun of linpack as the supercomputer benchmark that > has no practical use, but in the end it does spend most of its time in > the SGEMM function that is the most optimized algorithm in the world > and that is also where you end up spending your cycles in many AI > applications. I found a link to this blog post[2] explaining why this is still > used everywhere, and this matches what I've seen elsewhere, but > unlike me, the author seems to know what they are talking about ;-) > > To get back to my own question from earlier about which part of oneAPI > is actually being used, I see that pytorch (to pick a common framework) > can use either mkl (oneMKL, BLAS) or mkldnn (dnnl, oneDNN) as a backend, > next to cuda, cudnn, openmp and certainly a number of third-party > backends. > > The mkl backend seems to mostly be a wrapper around cblas_*gemm(), > though I may be reading that wrong. > The oneDNN backend operates on a higher level, calling into a > subset of the oneDNN interfaces. The other frameworks I looked at > (mxnet, tensorflow) look similar, probably each using other subsets of > oneDNN. Hm, I didn't know that in practice it's all just matrix multiplies in AI land too. I thought there's more fun going on here, but I guess as long as you have dense (enough) networks it's fully limited by the matrix multiply step and nothing else matters. Thanks for the references. I still don't think BLAS is what you want, except maybe for a very specific NPU thing in a SoC that can't do anything other than matrix multiplies in hw. The reason is that vendors are most likely not going to give you the optimized kernels, and the dumb kernels are very boring (just multiply-add in a loop). 
So for anything somewhat programmable you want a level below that, or it's just not very interesting as a userspace demonstration vehicle for your kernel interface. Also there's generally quite a few features in the command streamer (inter-engine sync as just one example), so a gemm ioctl call (or whatever you pick from blas) is definitely not what you want for anything that has a command streamer in hw. But I guess for the various NPUs that pop up in SoCs all over, a limited blas interface with documentation might be good enough. > > > > We have lots of fixed function on GPUs, video codecs are on most x86 > > > > GPUs. It's how you program them that matters, most of them are behind > > > > queues similar to the 3D engine, so you program them the same way. > > > > > > So these would go through /dev/dri instead of /dev/media0? I can definitely > > > see a lot of codec drivers in the kernel that use a /dev/media interfaces, > > > and the tradeoffs between those two seem very similar to the tradeoffs > > > you get for machine learning accelerators. > > > > Yeah we have plenty of codecs running on top of /dev/dri0, with all the > > magic in userspace. > > > > They are all very far away from anything that is a machine learning accelerator. > > Sure, I only meant the relation between dri codecs and media codecs > is similar to the relation between the ways one can implement the AI > accelerator APIs. > > > Yeah for those I think a more fixed uapi like drivers/media makes a lot > > of sense. What I don't like is when vendors then use that excuse > > of "oh you only upload a fixed model at boot" to shovel in an accel > > driver with full generic interface, but not all the userspace > > bits&pieces. There's unfortunately another accel driver in > > drivers/misc for Qualcomm SoC, which really should be either a media > > driver (for the fixed function use-case) or a drm driver (for the > > fully programmable) use-case. 
> > I would argue that for the fixed-function use case, the media subsystem > isn't a great fit either. It would probably work just as well (as would the > crypto subsystem), but having a distinct interface that does just > one thing makes more sense conceptually, if only to make it clear > where to look for such drivers and to have a consistent interface > documentation. Yeah, for a tiny SoC NPU a fixed interface might work out. Would need some benchmarking to check the ioctl overhead isn't too bad; I guess worst case the new io_uring ioctl stuff could be used for real fast dispatch. I've seen an NVIDIA NPU (but not sure that shipped anywhere) and the Arm NPU that Linus mentioned somewhere else with open enough drivers to make this possible. -Daniel > > I think for the fixed-function interface case you can also make a > > reasonable argument that just documenting that fixed interface and all > > the parameters is good enough. But as soon as the interface becomes a > > generic "submit workload" style thing because you want to make it work > > for an entire set of "firmware" compiled by your closed stack, that's > > out of the window. > > Right, agreed. If we add a fixed-function interface, that should ideally > not allow any vendor specific extensions at all, just a set of well-defined > operations, and certainly not a bypass mode that gets used to > send compiled binaries. > > Arnd > > [1] https://oneapi-src.github.io/oneDNN/dev_guide_matmul.html > [2] https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/ -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
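For readers unfamiliar with the operation being discussed, the "dumb kernel" Daniel refers to (just multiply-add in a loop) can be sketched as below. This is an unoptimized reference of the GEMM operation C = alpha * A * B + beta * C on plain Python lists, not any vendor's kernel and not a proposed kernel interface; the function name `naive_gemm` is made up for this illustration.

```python
def naive_gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for row-major lists of lists.

    a is m x k, b is k x n, c is m x n. Purely illustrative: no
    blocking, vectorization or transposition options as in real BLAS.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must be m x n")

    # Start from the scaled accumulator matrix.
    out = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                # The entire kernel is this multiply-add.
                acc += a[i][p] * b[p][j]
            out[i][j] += alpha * acc
    return out
```

Everything that makes a production GEMM interesting (tiling for caches, use of matrix units, mixed precision) lives outside this loop, which is why such a reference says little about the hardware behind it.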
* RE: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-14 9:08 ` Arnd Bergmann 2021-09-14 9:23 ` Daniel Vetter @ 2021-09-14 15:43 ` Luck, Tony 1 sibling, 0 replies; 77+ messages in thread From: Luck, Tony @ 2021-09-14 15:43 UTC (permalink / raw) To: Arnd Bergmann, Dave Airlie Cc: Daniel Vetter, Linus Walleij, Greg KH, Leon Romanovsky, Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit, dev > d) device has full preempt on all hw blocks, is fully coherent, can > trigger paging sanely, userspace can submit directly (pipe dream). Not a pipe dream. Coming soon to a server near you. The Intel "ENQCMD" instruction can be used from userspace to submit a descriptor to an accelerator device. ENQCMD picks up the PASID from an MSR during submission, so the device can ask the iommu to translate virtual addresses based on the address space of the process that submitted the descriptor. -Tony
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-13 13:20 ` Arnd Bergmann 2021-09-13 13:54 ` Daniel Vetter @ 2021-09-13 14:52 ` James Bottomley 2021-09-14 13:07 ` Linus Walleij 2 siblings, 0 replies; 77+ messages in thread From: James Bottomley @ 2021-09-13 14:52 UTC (permalink / raw) To: Arnd Bergmann, Linus Walleij Cc: Dave Airlie, Daniel Vetter, Greg KH, Leon Romanovsky, Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit, dev On Mon, 2021-09-13 at 15:20 +0200, Arnd Bergmann wrote: > On Mon, Sep 13, 2021 at 12:51 AM Linus Walleij < > linus.walleij@linaro.org> wrote: > > On Sun, Sep 12, 2021 at 11:13 PM Dave Airlie <airlied@gmail.com> > > wrote: > > > > > For userspace components as well these communities of experts > > > need to exist for each domain, and we need to encourage upstream > > > first processes across the board for these split kernel/userspace > > > stacks. > > > > > > The habanalabs compiler backend is an LLVM fork, I'd like to see > > > the effort to upstream that LLVM backend into LLVM proper. > > > > I couldn't agree more. > > > > A big part of the problem with inference engines / NPU:s is that of > > no standardized userspace. Several of the machine learning > > initiatives from some years back now have stale git repositories > > and are visibly unmaintained, c.f. Caffe > > https://github.com/BVLC/caffe last commit 2 years ago. > > Caffe as a standalone project was abandoned and merged into > PyTorch, see https://caffe2.ai/. I think this is the kind of > consolidation of those projects that you are looking for. > > > Habanalabs propose an LLVM fork as compiler, yet the Intel > > logo is on the Apache TVM website, and no sign of integrating with > > that project. They claim to support also TensorFlow. 
> > > > The way I perceive it is that there simply isn't any GCC/LLVM or > > Gallium 3D of NPU:s, these people haven't yet decided that "here > > is that userspace we are all going to use". Or have they? > > > > LLVM? TVM? TensorFlow? PyTorch? Some other one? > > > > What worries me is that I don't see one single developer being > > able to say "this one definitely, and they will work with the > > kernel community", and that is what we need to hear. > > I don't actually think this is a decision we can possibly wait for. > The ones you listed all work on different levels, some build on top > of others, and some may get replaced by new ones over time. I cut all the interesting design stuff because there's a meta problem here: we seem to be charting a course based on the idea we have to get the userspace API right first time. We really don't, we have to make a reasonable effort to get it right, but we can go around for a v2 if we fail ... that's the whole point about open source: fail fast and redo. No-one can really design an API without seeing how the users actually use it. When we do get it right first time, it's more by luck than judgment, so we should expect failure more often than not. The trick to a successful API is usually finding what the minimal set of operations is and implementing that. If you think about bells and whistles first (as 95% of API design documents do tend to) you usually fail. Completely new APIs with producer consumer interlock always have this failure problem, because in a blue sky environment, neither the producer nor consumer knows exactly what they want the first time around ... they usually have to try a couple of times to figure out what works and what doesn't. What we have to enable is this fast iteration while they work it out. API versioning is usually a good beginning to this ... There's also nothing wrong with recommending existing interfaces and seeing how that works because existing patterns are there for a reason. 
James
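One way to picture the "minimal set of operations plus API versioning" approach James suggests is a simple version negotiation, sketched below. The operation names, version numbers and helper functions are all hypothetical; no existing kernel interface is being described.

```python
# Hypothetical operation sets: v1 is the minimal API, v2 is the
# second iteration after seeing how users actually use v1.
OPS_BY_VERSION = {
    1: {"submit", "wait"},
    2: {"submit", "wait", "cancel"},
}


def negotiate_version(driver_versions, client_versions):
    """Pick the highest API version both sides understand."""
    common = set(driver_versions) & set(client_versions)
    if not common:
        raise ValueError("no common API version")
    return max(common)


def supported_ops(version):
    """Return the operations available at a negotiated version."""
    return OPS_BY_VERSION[version]
```

An old client that only knows v1 keeps working against a driver that has moved on to v2, which is what makes going around for a second attempt affordable.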
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-13 13:20 ` Arnd Bergmann 2021-09-13 13:54 ` Daniel Vetter 2021-09-13 14:52 ` James Bottomley @ 2021-09-14 13:07 ` Linus Walleij 2 siblings, 0 replies; 77+ messages in thread From: Linus Walleij @ 2021-09-14 13:07 UTC (permalink / raw) To: Arnd Bergmann Cc: Dave Airlie, Daniel Vetter, Greg KH, Leon Romanovsky, Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit, dev On Mon, Sep 13, 2021 at 3:20 PM Arnd Bergmann <arnd@arndb.de> wrote: > One straightforward hardware independent low-level API would > be the traditional BLAS GEMM call[1] for matrix multiplication > and its variants (integer, float, bfloat16, ...). What this (and subsequent posts from Dave and Daniel) shows is that what we are accelerating is no longer the specialized use cases of linear algebra such as 3D "shaders" or whatever inference linear algebra NPUs are doing, which appear to include regression, Bayesian stuff, Gaussian quadrature... you name it. What we are talking about here is acceleration, using an efficient data path, of numerical analysis, using tailored hardware. I'm not even sure we are limited to linear algebra anymore. Is this what is happening, and should we be thinking numerical analysis accelerators and their different shapes and sizes rather than usecase-foo-accelerators, so we don't end up with this situation again the next time applied math comes knocking on the door with their next usecase? Yours, Linus Walleij
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-12 14:34 ` Greg KH ` (2 preceding siblings ...) 2021-09-12 20:41 ` Dave Airlie @ 2021-09-13 14:03 ` Mark Brown 3 siblings, 0 replies; 77+ messages in thread From: Mark Brown @ 2021-09-13 14:03 UTC (permalink / raw) To: Greg KH Cc: Leon Romanovsky, Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit On Sun, Sep 12, 2021 at 04:34:48PM +0200, Greg KH wrote: > On Sun, Sep 12, 2021 at 05:15:30PM +0300, Leon Romanovsky wrote: > > https://lore.kernel.org/all/YT2zryAKHc%2F5R2IH@unreal/ > > "To be used" means some open PR to existing package or request for > > inclusion for new packages. > But again, distros will not take things that are not already in the > kernel. Or, mainly for the community distros which are open to people volunteering to package things, can't be relied on to do validation beyond checking that the package is distributable and that the installed files integrate into the distro in roughly the right form. That's not really a meaningful form of back pressure from our point of view. > > > But how do you define Android's userspace? Just one vendor? 2 vendors? > > > 10 vendors? There is major userspace fragmentation in Android userspace > > > in many places, the user/kernel boundary being one of the big ones as > > > many of us have found out over the past years. And many of us are > > > working to resolve this, but it's not so simple at times, and I have > > > many examples if you want specifics. > > Laurent suggested AOSP > > https://lore.kernel.org/all/YTyWANV%2FmSkQbYhj@pendragon.ideasonboard.com/ > Vendors can not get code into AOSP for various reasons that only Google > understands. 
There are many millions, if not billions of Android > devices out there with user/kernel apis that are not upstream nor in > AOSP because Google doesn't want to take them, or because the vendor can > not go through those hoops (international law is tricky at times...) Right, if you're not one of the main SoC vendors working on something that's a main application of Android it can be very hard to get anyone to give you the time of day.
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-12 13:25 ` Greg KH 2021-09-12 14:15 ` Leon Romanovsky @ 2021-09-12 15:55 ` Laurent Pinchart 2021-09-12 16:43 ` James Bottomley 1 sibling, 1 reply; 77+ messages in thread From: Laurent Pinchart @ 2021-09-12 15:55 UTC (permalink / raw) To: Greg KH Cc: Leon Romanovsky, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit Hi Greg and Leon, (Sorry to reply in the middle of the thread, but there is context I wanted to reply to that has been deleted in the last e-mails) On Sun, Sep 12, 2021 at 03:25:58PM +0200, Greg KH wrote: > On Sun, Sep 12, 2021 at 11:29:45AM +0300, Leon Romanovsky wrote: > > On Sun, Sep 12, 2021 at 09:26:57AM +0200, Greg KH wrote: > > > On Sun, Sep 12, 2021 at 07:27:55AM +0300, Leon Romanovsky wrote: > > > > On Sun, Sep 12, 2021 at 01:04:01AM +0300, Laurent Pinchart wrote: > > > > > On Sat, Sep 11, 2021 at 03:04:07PM +0300, Leon Romanovsky wrote: > > > > > > On Sat, Sep 11, 2021 at 02:41:52PM +0300, Laurent Pinchart wrote: > > > > > > > On Sat, Sep 11, 2021 at 01:31:02PM +0300, Leon Romanovsky wrote: > > > > > > > > On Sat, Sep 11, 2021 at 01:55:16AM +0200, Thomas Gleixner wrote: > > > > > > > > > On Fri, Sep 10 2021 at 16:45, Josh Triplett wrote: > > > > > > > > > > > > > > > > > > > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote: > > > > > > > > > >> On media, enforcing userspace to always be open source would > > > > > > > > > >> have been very bad, as it would prevent several videoconferencing > > > > > > > > > >> software to exist on Linux. > > > > > > > > > > > > > > > > > > > > I don't think we should enforce that all userspace users of an interface > > > > > > > > > > be Open Source. I do think we should enforce that *some* userspace user > > > > > > > > > > of an interface be Open Source before we add the interface. 
> > > > > > > > > > > > > > > > > > The real question is whether the interface is documented in a way that > > > > > > > > > an Open Source implementation is possible. It does not matter whether it > > > > > > > > > exists at that point in time or not. Even if it exists there is no > > > > > > > > > guarantee that it is feature complete. > > > > > > > > > > > > > > > > > > Freely accessible documentation is really the key. > > > > > > > > > > > > > > > > I have more radical view than you and think that documentation is far > > > > > > > > from being enough. I would like to see any userspace API used (or to be > > > > > > > > used) in any package which exists in Debiam/Fedora/SuSE. > > > > > > > > > > > > > > We probably need to add Android AOSP to that list, as we have > > > > > > > Android-specific APIs (not that I believe we *should* have > > > > > > > Android-specific APIs, there's been lots of efforts over the past years > > > > > > > to develop standard APIs for use cases that stem from Android, slowly > > > > > > > replacing Android-specific APIs in some area, but I don't believe we can > > > > > > > realisticly bridge that gap completely overnight, if ever). > > > > > > > > > > > > Maybe. > > > > > > > > > > > > > > Only this will give us some sort of confidence that API and device are usable > > > > > > > > to some level. As a side note, we will be able to estimate possible API > > > > > > > > deprecation/fix/extension based on simple search in package databases. > > > > > > > > > > > > > > Linux supports devices from very diverse markets, from very tiny > > > > > > > embedded devices to supercomputers. We have drivers for devices that > > > > > > > exist in data centres of a single company only, or for which only a > > > > > > > handful of units exist through the world. The set of rules that we'll > > > > > > > decide on, if any, should take this into account. 
> > > > > > > I'm part of that group (RDMA) who cares about enterprise, cloud and supercomputers. :) > > > > > > So for us, working out-of-the box (distro packages and not github code drops) is > > > > > > the key to the scalability. > > > > > > > > > > What if we're dealing with a device that only exists in a handful of > > > > > machines though ? Would distributions accept the burden of packaging > > > > > corresponding userspace code, and maintaining the packages, when only a > > > > > handful of people in the world will use it ? It's a genuine question. > > > > > > > > Fedora, Debian and OpenSuSE are volunteer based distributions, they > > > > accept new packages, which need to be prepared (or asked to be > > > > prepared) by such vendors. > > > > > > > > There is no "accept the burden of packaging corresponding userspace code, > > > > and maintaining the packages", it is on package maintainer who can or > > > > can't be associated with distribution. > > > > > > > > > > Regarding "embedded devices", I remind that we are talking about > > > > > > userspace API and most likely busybox will be used for them, which is > > > > > > also part of larger distro anyway, so falls under category "exists in > > > > > > Debian/Fedora/SuSE". > > > > > > > > > > We're talking about APIs exposed by drivers, for devices such as GPUs, > > > > > cameras or AI/ML accelerators. I don't think busybox will exercise those > > > > > :-) We have Mesa for GPUs, libcamera for cameras, and other frameworks > > > > > I'm less familiar with for AI/ML accelerators, and I expect those to be > > > > > packaged by distributions. There are however other kind of devices that > > > > > don't fall in existing well-defined categories. > > > > > > > > I'm a little bit confused here. IMHO, you are trying to find an universal > > > > solution for a problem that doesn't exist. > > > > > > > > Above you asked how to deal with niche devices? 
Here you talk about mass > > > > products devices for the enterprise while before you mentioned "embedded > > > > devices". > > > > > > > > 1. Niche devices - continue to do as they do it now, by supplying > > > > out-of-tree solutions for their customers. Such devices and companies > > > > rarely need upstream linux kernel support, because the burden to > > > > upstream it is very high. We don't want them in the tree either, because > > > > once they upstream it, the maintenance burden will be on us. > > > > > > {sigh} > > > > > > No, that is NOT our rule at all. > > > > > > These devices and companies need to be upstream more than anything else > > > as that way they become part of our community and are responsible for > > > maintaining their code in the tree. To force them to remain outside is > > > to go against everything that many of us have been saying for _decades_ > > > now. > > > > > > And how are you going to judge what is, and is not, a "niche" device? I partly side with Greg here. I welcome drivers for "niche" devices, regardless of how we define them, in the kernel, *if* they comply with the rules. In some cases companies won't bother to upstream the code because of the "niche" and ROI criteria, but that should only be their decision, *not* something we force upon them. There could be some exceptions, when the device architecture is so alien that it would require an effort from the community that we just can't afford at that particular point of time (rewriting the driver model for instance), but I think that would be caught by the ROI criteria anyway. > > I will leave to that company to decide. Again this is exactly how they > > operate now, there is nothing new here. Every company calculates ROI > > for working with upstream and small companies with niche devices are not > > different here. > > > > The main idea that I want to see working userspace stack, and being in > > distro sets a certain quality level, am I asking too much? 
> > Define "working userspace stack" and "distro" please. Like others have > said, many distros will not take userspace code unless it's already in > the kernel tree first, as that ensures that the abi will not break. As mentioned in another part of the mail thread, requiring code being merged in upstream userspace projects and/or packaged by distributions will cause deadlocks, but requiring code to be submitted and (pre-)approved is workable. That's what DRM/KMS does. To upstream a new KMS property for instance, you need to show how it's going to be used in Weston/Xorg/Android/... by submitting patches, and have the overall architecture approved by the corresponding maintainers. Does this raise the bar to entry ? Yes. But it also expands the community, we've seen cases where vendors, being told that their random unproven API wouldn't be accepted in the kernel, looked for options and realized that other vendors were facing the exact same problem. This leads to cross-vendor discussions and collaborative design of solutions. That's the Linux kernel development model as far as I'm concerned. I do however agree that defining "working userspace stack" precisely will be difficult, but I don't see that as an unsolvable problem. Whatever criteria we set, if someone wants to cheat, it will always be possible. We need to assume a minimum level of good faith on all sides. After all, if that wasn't the case, collaboration would be inherently impossible. If a vendor is then caught cheating, that will damage their reputation. We will be more cautious the next time they submit, and we could even decide to drop drivers in that case (not that I'd push for that in particular, it's just an example of options that we can evaluate). > > > > 2. Devices that hits the certain level of adoption - need to be > > > > integrated into certain userspace stack, which needs to be part of > > > > distro. 
> > > > > > Distros are a very odd rule to rely on given that they are by far the > > > minority of the usage in raw numbers for Linux in the world. > > > > You can count Android as another distro, it is just semantics. > > But how do you define Android's userspace? Just one vendor? 2 vendors? > 10 vendors? Possibly AOSP ? We don't need to have device support merged into AOSP as a criteria there, but if we have a multi-vendor framework that becomes the de facto standard in Linux for a particular set of use cases (think about Mesa for instance), having the framework included in generic distributions and having the device support submitted in the framework would be enough in my opinion. We don't need to wait until the support for that particular device hits distributions eventually when packages will be updated. As I said previously, we need to consider the end goal, but also create the path to achieve it. It's not fair telling vendors what they have to achieve if no way to do so exists. > There is major userspace fragmentation in Android userspace > in many places, the user/kernel boundry being one of the big ones as > many of us have found out over the past years. And many of us are > working to resolve this, but it's not so simple at times, and I have > many examples if you want specifics. > > > > > And AI/ML is no different here, someone just need to start build such > > > > stack. Otherwise, we will continue to see more free riders like HabanaLabs > > > > which don't have any real benefit to the community. > > > > > > Everyone contributes to Linux in a selfish manner, that's just how the > > > community works. That is not true, you're disregarding at least hobbyists here. They contribute for a wide variety of reasons, and most often get something in return (a working device, knowledge, a professional reputation, or even just the sense of contributing to humanity). The same is true of some companies too. 
Unless we're getting into the philosophical debate of whether true altruism even exists (I'd be happy to discuss that, but not here), I wouldn't qualify all this as selfish. > > > The work that companies like habanalabs is NOT being a > > > "free rider" at all, they have worked with us and done the hard work of > > > actually getting their code merged into the tree. > > > > I perfectly remember them trying to bypass netdev and RDMA communities > > by pretending "misc" device. > > > > https://lore.kernel.org/linux-rdma/20200915133556.21268811@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/ > > https://lore.kernel.org/linux-rdma/20200917171833.GJ8409@ziepe.ca/ > > > > Or DRM > > https://lore.kernel.org/linux-rdma/CAKMK7uFOfoxbD2Z5mb-qHFnUe5rObGKQ6Ygh--HSH9M=9bziGg@mail.gmail.com/ > > > > So I can agree with the statement "worked hard", but not with the > > relevant communities. > > I point at these as doing exactly what we want vendors to be doing! > Thank you for finding the good examples. This is a vendor submitting > patches and saying, "here is what we want to do, with a first cut at > doing it." It's up to us as a community to tell them if they are doing > it the right way or not. Isn't it exactly what we're discussing here ? Isn't telling them that we can't accept the driver because the device can't be used at all without their closed-source blob an acceptable way of saying they're doing it wrong ? > If we just let them all go their own ways, they will come up with > horrible apis and interfaces, we have all seen that before. > > So by working together, we both can learn from, and work together to > solve the issue. And that is what these driver authors and company has > been doing! They are part of our community, why are you saying they > should now just go do their own thing away from us? I feel some cognitive dissonance here :-) I don't really interpret anyone's comment in this mail thread as telling vendors to go away. 
Quite the contrary, as I mentioned above, requiring open userspace leads to standardized userspace framework and better designs in the end, which *is* community building. > And as for "bypassing", that feels very mean. We have had accelerator > code in the char/misc and other parts of the kernel tree since at least > 2018 if not earlier (I didn't look all that hard.) That's when subsystems have been bypassed. I vividly recall a discussion at plumbers on this topic a few years ago, about creating an accelerator subsystem and what requirements it should have. Some people pushed for an unregulated subsystem with vendor-specific per-driver userspace APIs, and some called for standardization of frameworks in userspace. There was no agreement at the time, but instead of trying to continue the effort, vendors got given a backdoor in drivers/misc/. I'll let the corresponding community members speak up if they recognize themselves and want to participate in the discussion, but I can tell you that this has been felt as a betrayal of our core values and a major blow to many attempts at fostering collaboration in userspace. > Just because someone > wanted to use the in-kernel apis that are there (why is dma-buf some > magic thing?) does not mean that they suddenly need to move to a > different subsystem. It means they should have been in a different subsystem from the beginning. After the bitter taste that the accelerators mess left in 2018, I know some people decided they had no other option than ignore those drivers, as long as they would stay there and do their own thing by themselves. Adding support for dma-buf means interoperating with other devices and drivers. That's a clear indication that those drivers are spreading their reach within the kernel, and if we accept this, then vendors will have free rein to bypass any subsystem for any type of device by claiming it's a bit different. You mentioned you don't like fragmentation, that's exactly what we would have. 
> We get at least 1-2 new subsystems and major drivers that get added to > the kernel tree that do things that have never been done before with > custom user/kernel apis every kernel release. Not everything can be a > standard api no matter how much I, and others, wish it were. I don't think anyone has called for everything being standard. The point we're discussing is whether the non-standard APIs need to have a corresponding open userspace. On a side note, it's also not all black or white, in many cases device expose a standard API with device-specific pieces. A GPU driver uses the DRM API, which standardizes a set of common operations, but has also a set of custom ioctls to submit jobs to the GPU. Over time we may find that some of those custom ioctls may be standardized, but certainly not all of them. That's fine, nobody is complaining about that. > As examples, what about the hyperv blob api that was submitted recently > going around the block layer? What about the new Intel accelerator that > added yet-another-set-of-custom-ioctls? What about the rpi drivers? > What about the virtualbox drivers? Should all of those just live > outside of the kernel for forever? I'll comment on the RPi drivers only as I'm not familiar with the rest. It's interesting that you mention Raspberry Pi, as I've been working with them over the past couple of years to upstream camera support. They've had out-of-tree camera drivers since 2013, available in the RPi downstream kernel only. The situation is changing, we're working on upstreaming those drivers. This has required a very large amount of work in two areas: - In the kernel, the drivers use V4L2 with custom extensions that make them incompatible with camera sensor drivers in upstream. This means that merging, for instance, the RPi driver for the Sony IMX477 driver would make it usable on a RPi, but not on any other device. To solve this we're working on standardizing V4L2 extensions to cover the corresponding use cases. 
It's a large amount of work, which we've only been able to do by finding multiple vendors who are facing the same issues and convincing them to sponsor the development. If camera drivers could be merged in drivers/misc/ this would never have been possible. Note that large companies that have the resources to solve these issues often lack either the will or the knowledge, if not both. If you look at out-of-tree camera drivers from NVidia, Intel or other vendors, you will see duplicated drivers for camera sensors from Sony, OmniVision, ON Semi, ... that are bundled with the SoC kernel camera drivers, implemented in incompatible ways. - In userspace, RPi didn't have any framework in which to upstream any code, as there was simply no userspace camera framework (V4L2 is a kernel API designed to be used directly by application, the V4L2 equivalent to libalsa was a historical mistake and is considered legacy now). RPi has thus moved all the camera code that can't live in the kernel to a firmware (there's lots of complex algorithms that need to be implemented to make an embedded camera work, unlike USB webcams that are comparatively extremely simple to handle in all layers of the stack because the complex part is implemented inside the webcam). The firmware had support for 3 camera sensors, and that's more or less all that could be used on a RPi. Adding support for new sensors wasn't possible for users, creating a very closed stack. The situation has changed with the development of the libcamera project. We now have a framework where vendors can upstream the device-specific userspace code, and RPi has done so. They've been more open than any other camera ISP vendor on the market today (there was also the TI OMAP3 that had public ISP documentation, but that's legacy and since then vendors have shifted to keeping everything closed). Here too this has been made possible because we have identified a problem and tried to fix it. 
It's a complex area, the amount of work required is huge, and it's very difficult to get vendors to do the right thing and contribute. I see camera support as another example of a situation where most vendors do it wrong, and we have to push them to collaborate and do it right instead. If we allowed camera kernel drivers with custom undocumented APIs and no open userspace, none of this would be possible. -- Regards, Laurent Pinchart ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-12 15:55 ` Laurent Pinchart @ 2021-09-12 16:43 ` James Bottomley 2021-09-12 16:58 ` Laurent Pinchart 0 siblings, 1 reply; 77+ messages in thread From: James Bottomley @ 2021-09-12 16:43 UTC (permalink / raw) To: Laurent Pinchart, Greg KH Cc: Leon Romanovsky, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit On Sun, 2021-09-12 at 18:55 +0300, Laurent Pinchart wrote: > As mentioned in another part of the mail thread, requiring code being > merged in upstream userspace projects and/or packaged by > distributions will cause deadlocks, but requiring code to be > submitted and (pre-)approved is workable. That's what DRM/KMS does. > To upstream a new KMS property for instance, you need to show how > it's going to be used in Weston/Xorg/Android/... by submitting > patches, and have the overall architecture approved by the > corresponding maintainers. This is no different from interlocks required in pretty much every other project crossing open source feature, so it seems like the right approach to me. We already use this for confidential computing, which often requires interlocking changes to QEMU, edk2 and other tools. Usually, for confidential computing, the evaluation is on either the QEMU or edk2 list which then accepts the patch and the rest of the projects follow. We do, occasionally, get a late objection to the API from one of the other projects after part of the enabling code has gone upstream in the others, but we handle this like a bug fix. James ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-12 16:43 ` James Bottomley @ 2021-09-12 16:58 ` Laurent Pinchart 2021-09-12 17:08 ` James Bottomley 0 siblings, 1 reply; 77+ messages in thread From: Laurent Pinchart @ 2021-09-12 16:58 UTC (permalink / raw) To: James Bottomley Cc: Greg KH, Leon Romanovsky, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit On Sun, Sep 12, 2021 at 09:43:39AM -0700, James Bottomley wrote: > On Sun, 2021-09-12 at 18:55 +0300, Laurent Pinchart wrote: > > As mentioned in another part of the mail thread, requiring code being > > merged in upstream userspace projects and/or packaged by > > distributions will cause deadlocks, but requiring code to be > > submitted and (pre-)approved is workable. That's what DRM/KMS does. > > To upstream a new KMS property for instance, you need to show how > > it's going to be used in Weston/Xorg/Android/... by submitting > > patches, and have the overall architecture approved by the > > corresponding maintainers. > > This is no different from interlocks required in pretty much every > other project crossing open source feature, so it seems like the right > approach to me. We already use this for confidential computing, which > often requires interlocking changes to QEMU, edk2 and other tools. > Usually, for confidential computing, the evaluation is on either the > QEMU or edk2 list which then accepts the patch and the rest of the > projects follow. We do, occasionally, get a late objection to the API > from one of the other projects after part of the enabling code has gone > upstream in the others, but we handle this like a bug fix. On the DRM/KMS side that's also handled fine as far as I know, as mentioned above. For cameras, libcamera is becoming the de facto standard userspace stack, so we'll have a solution too. The harder question is what to do when no standard userspace stack exists. 
The answer obviously is to create one (or possibly multiple alternatives), but we'll need more than wishful thinking to make that happen. I can tell it took lots of work for libcamera to see the light of the day, including on the business side of it, not just the technical side. -- Regards, Laurent Pinchart ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-12 16:58 ` Laurent Pinchart @ 2021-09-12 17:08 ` James Bottomley 0 siblings, 0 replies; 77+ messages in thread From: James Bottomley @ 2021-09-12 17:08 UTC (permalink / raw) To: Laurent Pinchart Cc: Greg KH, Leon Romanovsky, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit On Sun, 2021-09-12 at 19:58 +0300, Laurent Pinchart wrote: > On Sun, Sep 12, 2021 at 09:43:39AM -0700, James Bottomley wrote: > > On Sun, 2021-09-12 at 18:55 +0300, Laurent Pinchart wrote: > > > As mentioned in another part of the mail thread, requiring code > > > being merged in upstream userspace projects and/or packaged by > > > distributions will cause deadlocks, but requiring code to be > > > submitted and (pre-)approved is workable. That's what DRM/KMS > > > does. To upstream a new KMS property for instance, you need to > > > show how it's going to be used in Weston/Xorg/Android/... by > > > submitting patches, and have the overall architecture approved by > > > the corresponding maintainers. > > > > This is no different from interlocks required in pretty much every > > other project crossing open source feature, so it seems like the > > right approach to me. We already use this for confidential > > computing, which often requires interlocking changes to QEMU, edk2 > > and other tools. Usually, for confidential computing, the > > evaluation is on either the QEMU or edk2 list which then accepts > > the patch and the rest of the projects follow. We do, > > occasionally, get a late objection to the API from one of the other > > projects after part of the enabling code has gone upstream in the > > others, but we handle this like a bug fix. > > On the DRM/KMS side that's also handled fine as far as I know, as > mentioned above. For cameras, libcamera is becoming the de facto > standard userspace stack, so we'll have a solution too. 
The harder > question is what to do when no standard userspace stack exists. The > answer obviously is to create one (or possibly multiple > alternatives), but we'll need more than wishful thinking to make that > happen. I can tell it took lots of work for libcamera to see the > light of the day, including on the business side of it, not just the > technical side. Well, you know, this is where Open Source as a Standard comes from. We see the same in Confidential Computing. There are several manufacturers and they always specify how they think their stuff should work in their standards or code drops, but rarely get beyond a proof of concept in their own labs. Once we start moving it upstream, we find points of similarity between the different chip vendors, or sometimes specified implementations which plain don't work, and start modifying the APIs to take this into account. What we eventually end up with often doesn't mirror what the manufacturer standard says but it ends up being the actual standard for current and future confidential computing chips. James ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-12 7:26 ` Greg KH 2021-09-12 8:29 ` Leon Romanovsky @ 2021-09-12 19:52 ` Dave Airlie 1 sibling, 0 replies; 77+ messages in thread From: Dave Airlie @ 2021-09-12 19:52 UTC (permalink / raw) To: Greg KH Cc: Leon Romanovsky, Laurent Pinchart, Thomas Gleixner, Josh Triplett, Mauro Carvalho Chehab, Jonathan Corbet, ksummit Outside of all this, I disagree on distros being the target at all. However distros are probably a good thing to have involved. It's much easier for a distro to package one project than one per-vendor project, esp if that one project has a release cycle. If every company forks an LLVM backend for example distros will never be able to ship things, getting the LLVM backends upstream into LLVM means the distro get it for "free" on the next LLVM release. Creating kernel-like communities for userspace should be the goal, why do we want to forget the benefits the kernel ecosystem has given us as soon as we exit a syscall handler? > > > > 1. Niche devices - continue to do as they do it now, by supplying > > out-of-tree solutions for their customers. Such devices and companies > > rarely need upstream linux kernel support, because the burden to > > upstream it is very high. We don't want them in the tree either, because > > once they upstream it, the maintenance burden will be on us. > > {sigh} > > No, that is NOT our rule at all. > > These devices and companies need to be upstream more than anything else > as that way they become part of our community and are responsible for > maintaining their code in the tree. To force them to remain outside is > to go against everything that many of us have been saying for _decades_ > now. Name one group that has actively become part of the community via this advice, (I'll wait). 
From my view most of the communities have been created with more push-back by kernel maintainers, gpus, rdma, media, alsa vs misc (X accel drivers with no home or common cause). > > And AI/ML is no different here, someone just need to start build such > > stack. Otherwise, we will continue to see more free riders like HabanaLabs > > which don't have any real benefit to the community. > > Everyone contributes to Linux in a selfish manner, that's just how the > community works. The work that companies like habanalabs is NOT being a > "free rider" at all, they have worked with us and done the hard work of > actually getting their code merged into the tree and their userspace > code released under an open source license (unlike _ALL_ other AI/ML > companies, including Intel). It would have been much cheaper and > quicker of them to just ignore upstream entirely, but that would have > meant that the community would not have any idea of what exactly these > use-case models were nor what the problems were that they were trying to > get Linux to do. These companies don't get to ignore upstream entirely. They aren't here because they want to be, at least initially, they are here because RHEL, Amazon, Facebook, Google whoever told them they would buy their hw if it had upstream drivers in a contract and they have to do the minimal amount of work to get past Greg to merge stuff and satisfy that agreement. The community is very well aware of the needs of these groups, it's not like we don't have lots of GPUs being used for AI/ML. The habanalabs hardware is just a VLIW multithreaded processor almost like taking an AMD evergreen and shaving off the texture engines and other GPU specific bits. There is nothing new or exciting here that hasn't been solved. > > Linux benefits overall by having everyone participate, do NOT make > arbitrary rules to somehow prevent one company/group from being allowed > to upstream their code vs. another. 
That is NOT how we have worked in > the past, and would only cause us to slowly die and become irrelevant. The Linux Foundation might benefit, Linux doesn't. Linux benefits and stays maintainable by having responsible maintainers guide the direction of the kernel design, and creating upstream communities to sustain that. Dave. ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-12 4:27 ` Leon Romanovsky 2021-09-12 7:26 ` Greg KH @ 2021-09-12 7:46 ` Mauro Carvalho Chehab 2021-09-12 8:00 ` Leon Romanovsky 1 sibling, 1 reply; 77+ messages in thread From: Mauro Carvalho Chehab @ 2021-09-12 7:46 UTC (permalink / raw) To: Leon Romanovsky Cc: Laurent Pinchart, Thomas Gleixner, Josh Triplett, Jonathan Corbet, ksummit Em Sun, 12 Sep 2021 07:27:55 +0300 Leon Romanovsky <leon@kernel.org> escreveu: > > What if we're dealing with a device that only exists in a handful of > > machines though ? Would distributions accept the burden of packaging > > corresponding userspace code, and maintaining the packages, when only a > > handful of people in the world will use it ? It's a genuine question. > > Fedora, Debian and OpenSuSE are volunteer based distributions, they > accept new packages, which need to be prepared (or asked to be > prepared) by such vendors. > > There is no "accept the burden of packaging corresponding userspace code, > and maintaining the packages", it is on package maintainer who can or > can't be associated with distribution. There is a dead lock issue, though: if we're willing to have a policy of only accepting a new Kernel API after Fedora/Debian/openSuse accepts its userspace counterpart, it would mean, in practice, that no new APIs will ever be added, as I'm pretty sure most Fedora/Debian/openSuse maintainers will refuse an application that depends on a non-accepted Kernel API. As a maintainer of several Fedora packages myself, I would refuse any attempts of adding support for a non-accepted kernel API on the packages I maintain. - Also, it makes no sense to add support on such general-purpose distros for some hardware that will never be supported by it. See, there are, for instance, some types of hardware that are specific for some industry, like for instance, the CAN bus. 
While CAN buses remain restricted to vehicles, it won't make any sense to crowd a general purpose distro with support for such hardware. Such distros are not certified with ASIL. So, they aren't allowed by law to be used inside vehicles. Thanks, Mauro ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-12 7:46 ` Mauro Carvalho Chehab @ 2021-09-12 8:00 ` Leon Romanovsky 2021-09-12 14:53 ` Laurent Pinchart 0 siblings, 1 reply; 77+ messages in thread From: Leon Romanovsky @ 2021-09-12 8:00 UTC (permalink / raw) To: Mauro Carvalho Chehab Cc: Laurent Pinchart, Thomas Gleixner, Josh Triplett, Jonathan Corbet, ksummit On Sun, Sep 12, 2021 at 09:46:48AM +0200, Mauro Carvalho Chehab wrote: > Em Sun, 12 Sep 2021 07:27:55 +0300 > Leon Romanovsky <leon@kernel.org> escreveu: > > > > What if we're dealing with a device that only exists in a handful of > > > machines though ? Would distributions accept the burden of packaging > > > corresponding userspace code, and maintaining the packages, when only a > > > handful of people in the world will use it ? It's a genuine question. > > > > Fedora, Debian and OpenSuSE are volunteer based distributions, they > > accept new packages, which need to be prepared (or asked to be > > prepared) by such vendors. > > > > There is no "accept the burden of packaging corresponding userspace code, > > and maintaining the packages", it is on package maintainer who can or > > can't be associated with distribution. > > There is a dead lock issue, though: if we're willing to have a policy > of only accepting a new Kernel API after Fedora/Debian/openSuse accepts > its userspace counterpart, it would mean, in practice, that no new > APIs will ever be added, as I'm pretty sure most Fedora/Debian/openSuse > maintainers will refuse an application that depends on a non-accepted > Kernel API. I said something different - "I would like to see any userspace API used (or to be used)". https://lore.kernel.org/ksummit/20210912003349.6d2cacb1@coco.lan/T/#m3b7fbbe0959f1b59288dec9afd39f7cda0eeefe9 "To be used" means some open PR to existing package or request for inclusion for new packages. 
> > As a maintainer of several Fedora packages myself, I would refuse > any attempts of adding support for a non-accepted kernel API on > the packages I maintain. > > - > > Also, it makes no sense to add support on such general-purpose > distros for some hardware that will never be supported by it. > > See, there are, for instance, some types of hardware that are > specific for some industry, like for instance, the CAN bus. > While CAN buses remain restricted to vehicles, it won't make any > sense to crowd a general purpose distro with support for such > hardware. Such distros are not certified with ASIL. So, they > aren't allowed by law to be used inside vehicles. And github pile of ... is certified? In attempt to find general solution for all types of APIs and devices, we won't solve anything. So I suggest to return and talk about AI/ML devices and APIs that targeted for enterprise/cloud and needs to be supported by major distros. Thanks > > Thanks, > Mauro ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-12 8:00 ` Leon Romanovsky @ 2021-09-12 14:53 ` Laurent Pinchart 2021-09-12 15:41 ` Mauro Carvalho Chehab 0 siblings, 1 reply; 77+ messages in thread From: Laurent Pinchart @ 2021-09-12 14:53 UTC (permalink / raw) To: Leon Romanovsky Cc: Mauro Carvalho Chehab, Thomas Gleixner, Josh Triplett, Jonathan Corbet, ksummit Hello, On Sun, Sep 12, 2021 at 11:00:47AM +0300, Leon Romanovsky wrote: > On Sun, Sep 12, 2021 at 09:46:48AM +0200, Mauro Carvalho Chehab wrote: > > Em Sun, 12 Sep 2021 07:27:55 +0300 Leon Romanovsky escreveu: > > > > > > What if we're dealing with a device that only exists in a handful of > > > > machines though ? Would distributions accept the burden of packaging > > > > corresponding userspace code, and maintaining the packages, when only a > > > > handful of people in the world will use it ? It's a genuine question. > > > > > > Fedora, Debian and OpenSuSE are volunteer based distributions, they > > > accept new packages, which need to be prepared (or asked to be > > > prepared) by such vendors. > > > > > > There is no "accept the burden of packaging corresponding userspace code, > > > and maintaining the packages", it is on package maintainer who can or > > > can't be associated with distribution. > > > > There is a dead lock issue, though: if we're willing to have a policy > > of only accepting a new Kernel API after Fedora/Debian/openSuse accepts > > its userspace counterpart, it would mean, in practice, that no new > > APIs will ever be added, as I'm pretty sure most Fedora/Debian/openSuse > > maintainers will refuse an application that depends on a non-accepted > > Kernel API. > > I said something different - "I would like to see any userspace API used (or to be > used)". 
> https://lore.kernel.org/ksummit/20210912003349.6d2cacb1@coco.lan/T/#m3b7fbbe0959f1b59288dec9afd39f7cda0eeefe9 > > "To be used" means some open PR to existing package or request for > inclusion for new packages. Requiring userspace support to be merged in the appropriate framework or accepted as a package by distributions can result in deadlocks, but requiring only an upstream pre-approval is, I think, a good way to deal with the issue. That's what DRM/KMS does: there's no hard requirement (as far as I can tell) to have code merged in Mesa before the kernel, only a requirement of getting the Mesa code reviewed and acked. > > As a maintainer of several Fedora packages myself, I would refuse > > any attempts of adding support for a non-accepted kernel API on > > the packages I maintain. > > > > - > > > > Also, it makes no sense to add support on such general-purpose > > distros for some hardware that will never be supported by it. > > > > See, there are, for instance, some types of hardware that are > > specific for some industry, like for instance, the CAN bus. > > While CAN buses remain restricted to vehicles, it won't make any > > sense to crowd a general purpose distro with support for such > > hardware. Such distros are not certified with ASIL. So, they > > aren't allowed by law to be used inside vehicles. I'm not sure that's the best example: CAN has uses in other types of devices, some of which may run a general-purpose distribution. > And github pile of ... is certified? > > In attempt to find general solution for all types of APIs and devices, > we won't solve anything. That I agree with. > So I suggest to return and talk about AI/ML devices and APIs that > targeted for enterprise/cloud and needs to be supported by major > distros. And that I don't :-) I think the issue is the same for at least GPUs and AI/ML accelerators, and quite possibly camera ISPs too. 
I'd like to try and define clear sets of criteria to address the problem, and that can include different alternatives (just as an example, not necessarily something I'd advocate for, open userspace vs. documentation) that subsystems can then select based on their specific situation. -- Regards, Laurent Pinchart
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-12 14:53 ` Laurent Pinchart @ 2021-09-12 15:41 ` Mauro Carvalho Chehab 0 siblings, 0 replies; 77+ messages in thread From: Mauro Carvalho Chehab @ 2021-09-12 15:41 UTC (permalink / raw) To: Laurent Pinchart Cc: Leon Romanovsky, Thomas Gleixner, Josh Triplett, Jonathan Corbet, ksummit Em Sun, 12 Sep 2021 17:53:51 +0300 Laurent Pinchart <laurent.pinchart@ideasonboard.com> escreveu: > Hello, > > On Sun, Sep 12, 2021 at 11:00:47AM +0300, Leon Romanovsky wrote: > > On Sun, Sep 12, 2021 at 09:46:48AM +0200, Mauro Carvalho Chehab wrote: > > > Em Sun, 12 Sep 2021 07:27:55 +0300 Leon Romanovsky escreveu: > > > > > > > > What if we're dealing with a device that only exists in a handful of > > > > > machines though ? Would distributions accept the burden of packaging > > > > > corresponding userspace code, and maintaining the packages, when only a > > > > > handful of people in the world will use it ? It's a genuine question. > > > > > > > > Fedora, Debian and OpenSuSE are volunteer based distributions, they > > > > accept new packages, which need to be prepared (or asked to be > > > > prepared) by such vendors. > > > > > > > > There is no "accept the burden of packaging corresponding userspace code, > > > > and maintaining the packages", it is on package maintainer who can or > > > > can't be associated with distribution. > > > > > > There is a dead lock issue, though: if we're willing to have a policy > > > of only accepting a new Kernel API after Fedora/Debian/openSuse accepts > > > its userspace counterpart, it would mean, in practice, that no new > > > APIs will ever be added, as I'm pretty sure most Fedora/Debian/openSuse > > > maintainers will refuse an application that depends on a non-accepted > > > Kernel API. > > > > I said something different - "I would like to see any userspace API used (or to be > > used)". 
> > https://lore.kernel.org/ksummit/20210912003349.6d2cacb1@coco.lan/T/#m3b7fbbe0959f1b59288dec9afd39f7cda0eeefe9 > > > > "To be used" means some open PR to existing package or request for > > inclusion for new packages. > > Requiring userspace support to be merged in the appropriate framework or > accepted as a package by distributions can result in deadlocks, but > requiring only an upstream pre-approval is I think a good way to deal > with the issue. > > > > As a maintainer of several Fedora packages myself, I would refuse > > > any attempts of adding support for a non-accepted kernel API on > > > the packages I maintain. > > > > > > - > > > > > > Also, it makes no sense to add support on such general-purpose > > > distros for some hardware that will never be supported by it. > > > > > > See, there are, for instance, some types of hardware that are > > > specific for some industry, like for instance, the CAN bus. > > > While CAN buses remain restricted to vehicles, it won't make any > > > sense to crowd a general purpose distro with support for such > > > hardware. Such distros are not certified with ASIL. So, they > > > aren't allowed by law to be used inside vehicles. > > I'm not sure that's the best example, CAN has uses in other types of > devices, some of which may run a general-purpose distribution. Sure. That's why I included "While CAN buses remain restricted to vehicles" in the phrase above. It was created to meet a demand from one specific industry, but it could be used in other places. The same happened in the past with cameras that required an ISP IP block: they were initially used only on embedded devices, but migrated to laptops and other devices after some time. > > And github pile of ... is certified? > > > > In attempt to find general solution for all types of APIs and devices, > > we won't solve anything. A maintainer's summit discussion is the forum for discussing issues that cross multiple subsystems. 
AI/ML is not the first case where new APIs are needed, nor will it be the last. So, while I agree that AI/ML should be discussed, the discussion can't stop there, as similar issues arise in other subsystems. > > So I suggest to return and talk about AI/ML devices and APIs that > > targeted for enterprise/cloud and needs to be supported by major > > distros. > > And that I don't :-) I think the issue is the same for at least GPUs and > AI/ML accelerators, and quite possibly camera ISPs too. I'd like to try > and define clear sets of criteria to address the problem, and that can > include different alternatives (just as an example, not necessarily > something I'd advocate for, open userspace vs. documentation) that > subsystems can then select based on their specific situation. Agreed. Thanks, Mauro
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-10 22:52 ` Mauro Carvalho Chehab 2021-09-10 23:45 ` Josh Triplett @ 2021-09-10 23:46 ` Laurent Pinchart 2021-09-11 0:38 ` Mauro Carvalho Chehab 1 sibling, 1 reply; 77+ messages in thread From: Laurent Pinchart @ 2021-09-10 23:46 UTC (permalink / raw) To: Mauro Carvalho Chehab; +Cc: Jonathan Corbet, ksummit Hi Mauro, On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote: > Em Fri, 10 Sep 2021 15:00:58 -0600 Jonathan Corbet escreveu: > > > There has been a regular disagreement in recent years about whether > > drivers for accelerators (such as for the Habana Gaudi device) should be > > subject to the same requirements as GPU drivers when it comes to the > > availability of a free implementation of the user-space side. It flared > > up again recently: > > > > https://lwn.net/Articles/867168/ > > > > Happily, the Habana situation in particular seems to be resolving > > itself: > > > > https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/ > > > > But even there it is clear that the fundamental question has not yet > > been resolved. > > > > This seems like the sort of question that the maintainer summit exists > > to address. Specifically, we could discuss: > > > > - Under which circumstances should the kernel community require the > > existence of freely licensed user-space code that exercises all > > functionalities of a proposed kernel driver or feature? > > > > - Are there internal kernel interfaces, such as DMA-BUF or P2PDMA, that > > are only available to drivers with a free user-space implementation? > > Do we need an EXPORT_SYMBOL_USERSPACE_GPL()? > > > > - What constitutes an acceptable user-space implementation in cases > > where these restrictions apply? > > > > I suspect that more clarity (and fewer arguments) on these questions > > would be welcome both within and beyond the development community. 
> > The media subsystem also has this sort of issues: there are several > drivers there to support hardware accelerators for video encoders and > decoders. In the case of media, usually devices with such hardware have > an Image Signal Processor, where the codec runs on some firmware. > > On media, enforcing userspace to always be open source would > have been very bad, as it would prevent several videoconferencing > software to exist on Linux. Could you elaborate on which software you're thinking of? And maybe which driver(s) you're thinking about? > Also, there are several such codec hardware that only exists on > embedded hardware that already depends on proprietary software > to run. > > So, a policy like that would make more damage than good. I wonder if there's some sort of misunderstanding. We're not talking about requiring *all* userspace to be open, but about requiring the existence of *one* open userspace as an acceptance criterion for merging drivers. > What we do, instead, is to try to enforce that the userspace API to > be fully documented in a way that open source software can exist. > > This is easier said than done, but we have some compliance tools > that we use, in order to help to validate the uAPI implementations. I won't comment on the codec side as there are people more knowledgeable than me in that area, but on the camera side, my analysis of the situation is different than yours. The vast majority of drivers only use standard parts of the V4L2 and MC APIs. For those, we do have plenty of existing open userspace, as well as compliance tools as you mentioned (some drivers also expose custom controls, but that's a very small API footprint and they are documented well enough to be usable by any application). The possibly problematic case is mostly about ISP drivers. For those, the userspace API is more complex, with lots of device-specific elements. The first ISP that received kernel support was the OMAP3 ISP, and the driver has custom ioctls. 
Requiring an open userspace may indeed have delayed the merging of the driver. However, for that particular device, we had a public datasheet that documented the ISP, which we could consider as an alternative to the open userspace implementation (a topic worth discussing, I believe). Even if we had considered the public datasheet to not be enough, I think we would have eventually got an open userspace anyway (based on my internal knowledge of the Nokia team working on this project). More recently, we have two ISP drivers that got merged, for the Rockchip RK3399 ISP and the Intel IPU3. Those drivers differ from all previous drivers in the sense that the device is configured through a blob of parameters passed by userspace to the kernel, written to registers by the driver in the Rockchip case, and passed to the ISP firmware in the Intel case. We have for both drivers a header file that describes the layout of those blobs as C structures, but I can tell from first-hand experience working on an open userspace implementation that at least in the Intel case that's not enough to use the ISP. There's also an ISP driver for Raspberry Pi that is currently out of tree and that we'll try to upstream, and for that one we have an open userspace already (there's actually no closed userspace, kudos to Raspberry Pi for doing the right thing, I'd like to see more vendors following that great lead). Finally, having spent the last two and a half years working on an open userspace camera stack (libcamera) that exercises the V4L2 and MC APIs, I was quite horrified to find out how badly designed some parts of those APIs are. I'm not just blaming others here, this includes APIs that I have designed myself. They were tested at the time with test applications (either extending tools such as v4l2-ctl, or writing dedicated test tools for the API), but failed me when exercised in real use cases. 
In retrospect I shouldn't have been surprised: developing a test application that exercises the API in the way it was designed, as opposed to the way a real use case would need it, can only lead to problems. I think that requiring an open implementation of a real use case, not just a test tool, would be a very good practice for new APIs or extensions of existing APIs. -- Regards, Laurent Pinchart
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-10 23:46 ` Laurent Pinchart @ 2021-09-11 0:38 ` Mauro Carvalho Chehab 2021-09-11 9:27 ` Laurent Pinchart 0 siblings, 1 reply; 77+ messages in thread From: Mauro Carvalho Chehab @ 2021-09-11 0:38 UTC (permalink / raw) To: Laurent Pinchart; +Cc: Jonathan Corbet, ksummit Em Sat, 11 Sep 2021 02:46:42 +0300 Laurent Pinchart <laurent.pinchart@ideasonboard.com> escreveu: > Hi Mauro, > > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote: > > Em Fri, 10 Sep 2021 15:00:58 -0600 Jonathan Corbet escreveu: > > > > > There has been a regular disagreement in recent years about whether > > > drivers for accelerators (such as for the Habana Gaudi device) should be > > > subject to the same requirements as GPU drivers when it comes to the > > > availability of a free implementation of the user-space side. It flared > > > up again recently: > > > > > > https://lwn.net/Articles/867168/ > > > > > > Happily, the Habana situation in particular seems to be resolving > > > itself: > > > > > > https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/ > > > > > > But even there it is clear that the fundamental question has not yet > > > been resolved. > > > > > > This seems like the sort of question that the maintainer summit exists > > > to address. Specifically, we could discuss: > > > > > > - Under which circumstances should the kernel community require the > > > existence of freely licensed user-space code that exercises all > > > functionalities of a proposed kernel driver or feature? > > > > > > - Are there internal kernel interfaces, such as DMA-BUF or P2PDMA, that > > > are only available to drivers with a free user-space implementation? > > > Do we need an EXPORT_SYMBOL_USERSPACE_GPL()? > > > > > > - What constitutes an acceptable user-space implementation in cases > > > where these restrictions apply? 
> > > > > > I suspect that more clarity (and fewer arguments) on these questions > > > would be welcome both within and beyond the development community. > > > > The media subsystem also has this sort of issues: there are several > > drivers there to support hardware accelerators for video encoders and > > decoders. In the case of media, usually devices with such hardware have > > an Image Signal Processor, where the codec runs on some firmware. > > > > On media, enforcing userspace to always be open source would > > have been very bad, as it would prevent several videoconferencing > > software to exist on Linux. > > Could you elaborate on which software you're thinking of ? And maybe > which driver(s) you're thinking about ? I'm referring to tools like v4l2-compliance, qv4l2 and other tools we maintain at v4l-utils tree. > > Also, there are several such codec hardware that only exists on > > embedded hardware that already depends on proprietary software > > to run. > > > > So, a policy like that would make more damage than good. > > I wonder if there's some sort of misunderstanding. We're not talking > about requiring *all* userspace to be open, but about requiring the > existence of *one* open userspace as an acceptance criteria for merging > drivers. Something like EXPORT_SYMBOL_USERSPACE_GPL() implies that any userspace app using such symbols would be GPL'd. > > > What we do, instead, is to try to enforce that the userspace API to > > be fully documented in a way that open source software can exist. > > > > This is easier said than done, but we have some compliance tools > > that we use, in order to help to validate the uAPI implementations. > > I won't comment on the codec side as there are people more knowledgeable > than me in that area, but on the camera side, my analysis of the > situation is different than yours. The vast majority of drivers only use > standard parts of the V4L2 and MC APIs. 
For those, we do have plenty of > existing open userspace, as well as compliance tools as you mentioned > (some drivers also expose custom controls, but that's a very small API > footprint and they are documented well enough to be usable by any > application). Yes, that's my view too. We used to have problems in the past with some proprietary fourccs, but I guess the problematic ones were either removed (because there was no upstream driver) or documented. > The possibly problematic case is mostly about ISP drivers. For those, > the userspace API is more complex, with lots of device-specific > elements. The first ISP that received kernel support was the OMAP3 ISP, > and the driver has custom ioctls. Requiring an open userspace may indeed > have delayed the driver from being merged. However, for that particular > device, we had a public datasheet that documented the ISP, Yes, but afterwards, other ISP drivers got added. I don't think they all have public datasheets. > which we > could consider as an alternative to the open userspace implementation > (a topic worth discussing I believe). Yeah, a public datasheet sounds an interesting requirement. It offers a problem, though: maybe some details could be missed on it, which would prevent any real open source userspace development. > Even if we had considered the > public datasheet to not be enough, I think we would have eventually got > an open userspace anyway (based on my internal knowledge of the Nokia > team working on this project). Yes. > More recently, we have two ISP drivers that got merged, for the Rockchip > RK3399 ISP and the Intel IPU3. Those drivers differ from all previous > drivers in the sense that the device is configured through a blob of > parameters passed by userspace to the kernel, written to registers by > the driver in the Rockchip case, and passed to the ISP firmware in the > Intel case. 
We have for both drivers a header file that describes the > layout of those blobs as C structures, but I can tell with first hand > experience working on an open userspace implementation that at least in > the Intel case that's not enough to use the ISP. Yeah, I was afraid that this would end up happening some day. It's not too much harm, though, as IPU3 is under staging. We should require it to be supported by libcamera, or to have some other open source application, before moving it out of staging. > There's also an ISP driver for Raspberry Pi that is currently out of > tree and that we'll try to upstream, and for that one we have an open > userspace already (there's actually no closed userspace, kudos to > Raspberry Pi for doing the right thing, I'd like to see more vendors > following that great lead). > > Finally, having spent the last 2 years and a half working on an open > userspace camera stack (libcamera) that exercise the V4L2 and MC APIs, I > was quite horrified to find out how some parts of those APIs are pretty > badly designed. I'm not just blaming others here, this includes APIs > that I have designed myself. They have been tested at the time with test > applications (either extending tools such as v4l2-ctl, or writing > dedicated tested tools for the API), but failed me when exercised in > real use cases. In retrospect I shouldn't have been surprised, > developing a test application that exercises the API in the way it was > designed as opposed to the way a real use case would need it can only > lead to problems. I think that requiring an open implementation of a > real use case, not just a test tool, would be a very good practice for > new APIs or extensions or existing APIs. I remember that, during OMAP3 development, I asked several times for a userspace app/lib before merging it upstream (somewhat similar to libcamera's goals). At that time, we didn't have staging yet. 
So, when Nokia ended MeeGo development, I opted to merge what we had so far to support OMAP3, even without an open source counterpart, as there was already some public documentation that seemed likely to help someone write userspace tools in the future. Thanks, Mauro
* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers 2021-09-11 0:38 ` Mauro Carvalho Chehab @ 2021-09-11 9:27 ` Laurent Pinchart 2021-09-11 22:33 ` Mauro Carvalho Chehab 2021-09-13 12:04 ` Mark Brown 0 siblings, 2 replies; 77+ messages in thread From: Laurent Pinchart @ 2021-09-11 9:27 UTC (permalink / raw) To: Mauro Carvalho Chehab; +Cc: Jonathan Corbet, ksummit Hi Mauro, On Sat, Sep 11, 2021 at 02:38:11AM +0200, Mauro Carvalho Chehab wrote: > Em Sat, 11 Sep 2021 02:46:42 +0300 Laurent Pinchart escreveu: > > On Sat, Sep 11, 2021 at 12:52:14AM +0200, Mauro Carvalho Chehab wrote: > > > Em Fri, 10 Sep 2021 15:00:58 -0600 Jonathan Corbet escreveu: > > > > > > > There has been a regular disagreement in recent years about whether > > > > drivers for accelerators (such as for the Habana Gaudi device) should be > > > > subject to the same requirements as GPU drivers when it comes to the > > > > availability of a free implementation of the user-space side. It flared > > > > up again recently: > > > > > > > > https://lwn.net/Articles/867168/ > > > > > > > > Happily, the Habana situation in particular seems to be resolving > > > > itself: > > > > > > > > https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/ > > > > > > > > But even there it is clear that the fundamental question has not yet > > > > been resolved. > > > > > > > > This seems like the sort of question that the maintainer summit exists > > > > to address. Specifically, we could discuss: > > > > > > > > - Under which circumstances should the kernel community require the > > > > existence of freely licensed user-space code that exercises all > > > > functionalities of a proposed kernel driver or feature? > > > > > > > > - Are there internal kernel interfaces, such as DMA-BUF or P2PDMA, that > > > > are only available to drivers with a free user-space implementation? > > > > Do we need an EXPORT_SYMBOL_USERSPACE_GPL()? 
> > > > > > > > - What constitutes an acceptable user-space implementation in cases > > > > where these restrictions apply? > > > > > > > > I suspect that more clarity (and fewer arguments) on these questions > > > > would be welcome both within and beyond the development community. > > > > > > The media subsystem also has this sort of issues: there are several > > > drivers there to support hardware accelerators for video encoders and > > > decoders. In the case of media, usually devices with such hardware have > > > an Image Signal Processor, where the codec runs on some firmware. > > > > > > On media, enforcing userspace to always be open source would > > > have been very bad, as it would prevent several videoconferencing > > > software to exist on Linux. > > > > Could you elaborate on which software you're thinking of ? And maybe > > which driver(s) you're thinking about ? > > I'm referring to tools like v4l2-compliance, qv4l2 and other tools > we maintain at v4l-utils tree. I meant the video conferencing software that would have been prevented from existing. I'd like to understand if you think that requiring *one* open userspace would be problematic. > > > Also, there are several such codec hardware that only exists on > > > embedded hardware that already depends on proprietary software > > > to run. > > > > > > So, a policy like that would make more damage than good. > > > > I wonder if there's some sort of misunderstanding. We're not talking > > about requiring *all* userspace to be open, but about requiring the > > existence of *one* open userspace as an acceptance criteria for merging > > drivers. > > Something like EXPORT_SYMBOL_USERSPACE_GPL() implies that any > userspace app using such symbols would be GPL'd. I think EXPORT_SYMBOL_USERSPACE_GPL() has already been deemed not to be the right option based on the discussions in this e-mail thread. 
The requirement of having *one* open userspace is still being discussed, and is orthogonal to EXPORT_SYMBOL_USERSPACE_GPL() I believe. > > > What we do, instead, is to try to enforce that the userspace API to > > > be fully documented in a way that open source software can exist. > > > > > > This is easier said than done, but we have some compliance tools > > > that we use, in order to help to validate the uAPI implementations. > > > > I won't comment on the codec side as there are people more knowledgeable > > than me in that area, but on the camera side, my analysis of the > > situation is different than yours. The vast majority of drivers only use > > standard parts of the V4L2 and MC APIs. For those, we do have plenty of > > existing open userspace, as well as compliance tools as you mentioned > > (some drivers also expose custom controls, but that's a very small API > > footprint and they are documented well enough to be usable by any > > application). > > Yes, that's my view too. We used to have problems in the past with > some proprietary fourccs, but I guess the problematic ones were > either removed (because there was no upstream driver) or documented. > > > The possibly problematic case is mostly about ISP drivers. For those, > > the userspace API is more complex, with lots of device-specific > > elements. The first ISP that received kernel support was the OMAP3 ISP, > > and the driver has custom ioctls. Requiring an open userspace may indeed > > have delayed the driver from being merged. However, for that particular > > device, we had a public datasheet that documented the ISP, > > Yes, but afterwards, other ISP drivers got added. I don't think they > all have public datasheets. Sure. That's addressed below :-) > > which we > > could consider as an alternative to the open userspace implementation > > (a topic worth discussing I believe). > > Yeah, a public datasheet sounds an interesting requirement. 
It offers > a problem, though: maybe some details could be missed on it, which > would prevent any real open source userspace development. Absolutely, and I don't think we can come up with any process or technical measure that would prevent a vendor from cheating if they really want to. It will always be possible to hide some features behind reserved registers that wouldn't need to be programmed for basic operation but that would be crucial to optimize the quality or performance. This is regardless of whether we want to enforce openness of documentation in the form of datasheets or source code. I'm not too concerned about this though. If we can address most of this issue with a clear process and message, I think it would be a very good step forward already. > > Even if we had considered the > > public datasheet to not be enough, I think we would have eventually got > > an open userspace anyway (based on my internal knowledge of the Nokia > > team working on this project). > > Yes. > > > More recently, we have two ISP drivers that got merged, for the Rockchip > > RK3399 ISP and the Intel IPU3. Those drivers differ from all previous > > drivers in the sense that the device is configured through a blob of > > parameters passed by userspace to the kernel, written to registers by > > the driver in the Rockchip case, and passed to the ISP firmware in the > > Intel case. We have for both drivers a header file that describes the > > layout of those blobs as C structures, but I can tell with first hand > > experience working on an open userspace implementation that at least in > > the Intel case that's not enough to use the ISP. > > Yeah, I was afraid that this would end happening some day. Not too big > harm, though, as IPU3 is under staging. We should enforce it to be > supported at libcamera or to have some other open source application > before moving it out of staging. 
The same may be true of the rkisp1 driver, we haven't moved forward enough with its support in libcamera yet to tell for sure. > > There's also an ISP driver for Raspberry Pi that is currently out of > > tree and that we'll try to upstream, and for that one we have an open > > userspace already (there's actually no closed userspace, kudos to > > Raspberry Pi for doing the right thing, I'd like to see more vendors > > following that great lead). > > > > Finally, having spent the last 2 years and a half working on an open > > userspace camera stack (libcamera) that exercise the V4L2 and MC APIs, I > > was quite horrified to find out how some parts of those APIs are pretty > > badly designed. I'm not just blaming others here, this includes APIs > > that I have designed myself. They have been tested at the time with test > > applications (either extending tools such as v4l2-ctl, or writing > > dedicated tested tools for the API), but failed me when exercised in > > real use cases. In retrospect I shouldn't have been surprised, > > developing a test application that exercises the API in the way it was > > designed as opposed to the way a real use case would need it can only > > lead to problems. I think that requiring an open implementation of a > > real use case, not just a test tool, would be a very good practice for > > new APIs or extensions or existing APIs. > > I remember that, during OMAP3 development, I required several times > to have an userspace app/lib before merging it upstream (somewhat > similar to libcamera goals). > > On that time, we didn't have staging yet. So, when Nokia ended with > MeeGo development, I opted to merge what we had so far to support > OMAP3 even without having an open source counterpart, as there > were already some public documentation that seemed to help someone > to write userspace tools in the future. And there was also https://git.ideasonboard.org/omap3-isp-live.git that was published shortly after. 
-- Regards, Laurent Pinchart

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
From: Mauro Carvalho Chehab @ 2021-09-11 22:33 UTC
To: Laurent Pinchart; +Cc: Jonathan Corbet, ksummit

On Sat, 11 Sep 2021 12:27:37 +0300
Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote:

> Hi Mauro,
>
> > > > On media, enforcing userspace to always be open source would
> > > > have been very bad, as it would have prevented several
> > > > videoconferencing applications from existing on Linux.
> > >
> > > Could you elaborate on which software you're thinking of? And maybe
> > > which driver(s) you're thinking about?
> >
> > I'm referring to tools like v4l2-compliance, qv4l2 and other tools
> > we maintain in the v4l-utils tree.
>
> I meant the video conferencing software that would have been prevented
> from existing. I'd like to understand if you think that requiring *one*
> open userspace would be problematic.

No, requiring *one* real open userspace application should be enough.

> > Yeah, a public datasheet sounds like an interesting requirement. It
> > poses a problem, though: maybe some details could be missing from it,
> > which would prevent any real open source userspace development.
>
> Absolutely, and I don't think we can come up with any process and
> technical measure that would prevent a vendor from cheating if they
> really want to. It will always be possible to hide some features behind
> reserved registers that wouldn't need to be programmed for basic
> operation but that would be crucial to optimize the quality or
> performance. This is regardless of whether we want to enforce openness
> of documentation in the form of datasheets or source code.

Unfortunately true.

> I'm not too concerned about this though. If we can address most of this
> issue with a clear process and message I think it would be a very good
> step forward already.

Yeah, a policy could be implemented in order to address such cases,
asking the vendor for a fix or even removing drivers and banning
vendors that are, on purpose, sending broken drivers/APIs.

Thanks,
Mauro

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
From: Mark Brown @ 2021-09-13 12:04 UTC
To: Laurent Pinchart; +Cc: Mauro Carvalho Chehab, Jonathan Corbet, ksummit

On Sat, Sep 11, 2021 at 12:27:37PM +0300, Laurent Pinchart wrote:
> On Sat, Sep 11, 2021 at 02:38:11AM +0200, Mauro Carvalho Chehab wrote:

> > > which we
> > > could consider as an alternative to the open userspace implementation
> > > (a topic worth discussing I believe).

> > Yeah, a public datasheet sounds like an interesting requirement. It
> > poses a problem, though: maybe some details could be missing from it,
> > which would prevent any real open source userspace development.

> Absolutely, and I don't think we can come up with any process and
> technical measure that would prevent a vendor from cheating if they
> really want to. It will always be possible to hide some features behind
> reserved registers that wouldn't need to be programmed for basic
> operation but that would be crucial to optimize the quality or
> performance. This is regardless of whether we want to enforce openness
> of documentation in the form of datasheets or source code.

This is already very standard in some parts of the industry, even
between vendors and customers. Sometimes it's done intentionally, but
it's also often just that the actual practical configuration process
relies on some non-trivial test system and perhaps has as much art as
science involved. It can also be a decision about managing support
costs which works for everyone involved on the business side: sometimes
the product being delivered includes the vendor doing a good deal of
the tuning for some combination of cost and complexity reasons.

> I'm not too concerned about this though. If we can address most of this
> issue with a clear process and message I think it would be a very good
> step forward already.

Yeah, I'm personally not so concerned about the calibration and tuning
side. Ideally that would be fully open, but like you say, even without
that we've achieved something, and there may not actually be anything
extant to open.

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
From: Dave Airlie @ 2021-09-12 19:13 UTC
To: Jonathan Corbet; +Cc: ksummit

On Sat, 11 Sept 2021 at 07:10, Jonathan Corbet <corbet@lwn.net> wrote:
>
> There has been a regular disagreement in recent years about whether
> drivers for accelerators (such as for the Habana Gaudi device) should be
> subject to the same requirements as GPU drivers when it comes to the
> availability of a free implementation of the user-space side. It flared
> up again recently:
>
> https://lwn.net/Articles/867168/
>
> Happily, the Habana situation in particular seems to be resolving
> itself:
>
> https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/
>
> But even there it is clear that the fundamental question has not yet
> been resolved.
>
> This seems like the sort of question that the maintainer summit exists
> to address. Specifically, we could discuss:
>
> - Under which circumstances should the kernel community require the
>   existence of freely licensed user-space code that exercises all
>   functionalities of a proposed kernel driver or feature?
>
> - Are there internal kernel interfaces, such as DMA-BUF or P2PDMA, that
>   are only available to drivers with a free user-space implementation?
>   Do we need an EXPORT_SYMBOL_USERSPACE_GPL()?
>
> - What constitutes an acceptable user-space implementation in cases
>   where these restrictions apply?
>
> I suspect that more clarity (and fewer arguments) on these questions
> would be welcome both within and beyond the development community.
>
> Thanks,

Can everyone take a read of:

https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements

I think in order to clean up the signal/noise ratio in here, some
education effort might help people realise how non-trivial these
things are.

1. These drivers are not one or two ioctls that a few selftests and a
small test app can cover. It's like saying LTP is all we need to
define the uAPI for the kernel and if anyone does something LTP
doesn't cover the app is broken. These systems are generally complex,
multithreaded and multiuser uAPIs, involving command streams recorded
in userspace being submitted to the devices. They interact with memory
management and can cause unfindable deadlocks across the system if
designed incorrectly. Documentation or kselftests aren't going to cut
it here.

2. In my experience we don't build communities by merging everything,
we build them by saying "no" more often and pushing back on companies
with education and cross-vendor cooperation. Responsible kernel
maintenance shouldn't end at the kernel boundaries. If you aren't the
person to help enforce a userspace for a driver you are being asked to
merge, then don't merge it, but try and engage the vendor with the
communities of interest in the kernel who already deal in those areas.

3. The pressure on these companies to merge things into Linux isn't
altruistic, nor is it even that they necessarily want to be in the
Linux kernel upstream. They are being told by Red Hat, Facebook,
Google or someone else that they need an upstream driver. They will
generally engage at a minimal level to get past that blockage and then
disengage. Having a clear set of rules (or a place to discuss those
rules, for new subsystems) and a gentle pushback helps develop
communities by unlocking funding within those larger areas. As Laurent
has said this isn't free, but just putting things into the kernel and
not caring about userspace hasn't built any Linux communities in the
accelerator areas.

That said, I started writing a cleaned-up, more generic version of the
above document that other subsystems could sign on to. I was going to
engage with a coalition of like-minded maintainers rather than trying
to gain consensus among a herd of cats, to see if we can draw clearer
lines in the sand that cross more subsystems, so the experience of
drivers/gpu doesn't go to waste but also isn't just bypassed by
subsystem hunting.

https://cgit.freedesktop.org/~airlied/linux/log/?h=wip-open-source-userspace

Dave.

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
From: Laurent Pinchart @ 2021-09-12 19:48 UTC
To: Dave Airlie; +Cc: Jonathan Corbet, ksummit

Hi Dave,

On Mon, Sep 13, 2021 at 05:13:05AM +1000, Dave Airlie wrote:
> On Sat, 11 Sept 2021 at 07:10, Jonathan Corbet <corbet@lwn.net> wrote:
> >
> > There has been a regular disagreement in recent years about whether
> > drivers for accelerators (such as for the Habana Gaudi device) should be
> > subject to the same requirements as GPU drivers when it comes to the
> > availability of a free implementation of the user-space side. It flared
> > up again recently:
> >
> > https://lwn.net/Articles/867168/
> >
> > Happily, the Habana situation in particular seems to be resolving
> > itself:
> >
> > https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/
> >
> > But even there it is clear that the fundamental question has not yet
> > been resolved.
> >
> > This seems like the sort of question that the maintainer summit exists
> > to address. Specifically, we could discuss:
> >
> > - Under which circumstances should the kernel community require the
> >   existence of freely licensed user-space code that exercises all
> >   functionalities of a proposed kernel driver or feature?
> >
> > - Are there internal kernel interfaces, such as DMA-BUF or P2PDMA, that
> >   are only available to drivers with a free user-space implementation?
> >   Do we need an EXPORT_SYMBOL_USERSPACE_GPL()?
> >
> > - What constitutes an acceptable user-space implementation in cases
> >   where these restrictions apply?
> >
> > I suspect that more clarity (and fewer arguments) on these questions
> > would be welcome both within and beyond the development community.
> >
> > Thanks,
>
> Can everyone take a read of:
>
> https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements
>
> I think in order to clean up the signal/noise ratio in here, some
> education effort might help people realise how non-trivial these
> things are.
>
> 1. These drivers are not one or two ioctls that a few selftests and a
> small test app can cover. It's like saying LTP is all we need to
> define the uAPI for the kernel and if anyone does something LTP
> doesn't cover the app is broken. These systems are generally complex,
> multithreaded and multiuser uAPIs, involving command streams recorded
> in userspace being submitted to the devices. They interact with memory
> management and can cause unfindable deadlocks across the system if
> designed incorrectly. Documentation or kselftests aren't going to cut
> it here.
>
> 2. In my experience we don't build communities by merging everything,
> we build them by saying "no" more often and pushing back on companies
> with education and cross-vendor cooperation. Responsible kernel
> maintenance shouldn't end at the kernel boundaries. If you aren't the
> person to help enforce a userspace for a driver you are being asked to
> merge, then don't merge it, but try and engage the vendor with the
> communities of interest in the kernel who already deal in those areas.
>
> 3. The pressure on these companies to merge things into Linux isn't
> altruistic, nor is it even that they necessarily want to be in the
> Linux kernel upstream. They are being told by Red Hat, Facebook,
> Google or someone else that they need an upstream driver. They will
> generally engage at a minimal level to get past that blockage and then
> disengage. Having a clear set of rules (or a place to discuss those
> rules, for new subsystems) and a gentle pushback helps develop
> communities by unlocking funding within those larger areas. As Laurent
> has said this isn't free, but just putting things into the kernel and
> not caring about userspace hasn't built any Linux communities in the
> accelerator areas.
>
> That said, I started writing a cleaned-up, more generic version of the
> above document that other subsystems could sign on to. I was going to
> engage with a coalition of like-minded maintainers rather than trying
> to gain consensus among a herd of cats, to see if we can draw clearer
> lines in the sand that cross more subsystems, so the experience of
> drivers/gpu doesn't go to waste but also isn't just bypassed by
> subsystem hunting.
>
> https://cgit.freedesktop.org/~airlied/linux/log/?h=wip-open-source-userspace

Thank you for that effort. Could you add camera ISPs to the list with
FPGAs, DSPs and ML accelerators?

You mention Level0 in that document. I assume you don't mean the
OpenStreetMap editor?

-- 
Regards,

Laurent Pinchart

* Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers
From: Dave Airlie @ 2021-09-13 2:26 UTC
To: Laurent Pinchart; +Cc: Jonathan Corbet, ksummit

On Mon, 13 Sept 2021 at 05:48, Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Dave,
>
> On Mon, Sep 13, 2021 at 05:13:05AM +1000, Dave Airlie wrote:
> > On Sat, 11 Sept 2021 at 07:10, Jonathan Corbet <corbet@lwn.net> wrote:
> > >
> > > There has been a regular disagreement in recent years about whether
> > > drivers for accelerators (such as for the Habana Gaudi device) should be
> > > subject to the same requirements as GPU drivers when it comes to the
> > > availability of a free implementation of the user-space side. It flared
> > > up again recently:
> > >
> > > https://lwn.net/Articles/867168/
> > >
> > > Happily, the Habana situation in particular seems to be resolving
> > > itself:
> > >
> > > https://lwn.net/ml/linux-kernel/CAFCwf119s7iXk+qpwoVPnRtOGcxeuZb3rnihf6NWWoVT-4ODHA@mail.gmail.com/
> > >
> > > But even there it is clear that the fundamental question has not yet
> > > been resolved.
> > >
> > > This seems like the sort of question that the maintainer summit exists
> > > to address. Specifically, we could discuss:
> > >
> > > - Under which circumstances should the kernel community require the
> > >   existence of freely licensed user-space code that exercises all
> > >   functionalities of a proposed kernel driver or feature?
> > >
> > > - Are there internal kernel interfaces, such as DMA-BUF or P2PDMA, that
> > >   are only available to drivers with a free user-space implementation?
> > >   Do we need an EXPORT_SYMBOL_USERSPACE_GPL()?
> > >
> > > - What constitutes an acceptable user-space implementation in cases
> > >   where these restrictions apply?
> > >
> > > I suspect that more clarity (and fewer arguments) on these questions
> > > would be welcome both within and beyond the development community.
> > >
> > > Thanks,
> >
> > Can everyone take a read of:
> >
> > https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements
> >
> > I think in order to clean up the signal/noise ratio in here, some
> > education effort might help people realise how non-trivial these
> > things are.
> >
> > 1. These drivers are not one or two ioctls that a few selftests and a
> > small test app can cover. It's like saying LTP is all we need to
> > define the uAPI for the kernel and if anyone does something LTP
> > doesn't cover the app is broken. These systems are generally complex,
> > multithreaded and multiuser uAPIs, involving command streams recorded
> > in userspace being submitted to the devices. They interact with memory
> > management and can cause unfindable deadlocks across the system if
> > designed incorrectly. Documentation or kselftests aren't going to cut
> > it here.
> >
> > 2. In my experience we don't build communities by merging everything,
> > we build them by saying "no" more often and pushing back on companies
> > with education and cross-vendor cooperation. Responsible kernel
> > maintenance shouldn't end at the kernel boundaries. If you aren't the
> > person to help enforce a userspace for a driver you are being asked to
> > merge, then don't merge it, but try and engage the vendor with the
> > communities of interest in the kernel who already deal in those areas.
> >
> > 3. The pressure on these companies to merge things into Linux isn't
> > altruistic, nor is it even that they necessarily want to be in the
> > Linux kernel upstream. They are being told by Red Hat, Facebook,
> > Google or someone else that they need an upstream driver. They will
> > generally engage at a minimal level to get past that blockage and then
> > disengage. Having a clear set of rules (or a place to discuss those
> > rules, for new subsystems) and a gentle pushback helps develop
> > communities by unlocking funding within those larger areas. As Laurent
> > has said this isn't free, but just putting things into the kernel and
> > not caring about userspace hasn't built any Linux communities in the
> > accelerator areas.
> >
> > That said, I started writing a cleaned-up, more generic version of the
> > above document that other subsystems could sign on to. I was going to
> > engage with a coalition of like-minded maintainers rather than trying
> > to gain consensus among a herd of cats, to see if we can draw clearer
> > lines in the sand that cross more subsystems, so the experience of
> > drivers/gpu doesn't go to waste but also isn't just bypassed by
> > subsystem hunting.
> >
> > https://cgit.freedesktop.org/~airlied/linux/log/?h=wip-open-source-userspace
>
> Thank you for that effort. Could you add camera ISPs to the list with
> FPGAs, DSPs and ML accelerators?

I'll add that to the next iteration, thanks.

>
> You mention Level0 in that document. I assume you don't mean the
> OpenStreetMap editor?

https://spec.oneapi.io/level-zero/latest/core/INTRO.html

It's like a Vulkan-for-OpenCL effort. They've already managed to put
things in the API that are close to impossible to make work on the
Linux kernel properly, again because Intel internally thought they had
better experts than the kernel, but we are trying to get that all
fixed up.

Dave.