* Habanalabs Open-Source TPC LLVM compiler and SynapseAI Core library
@ 2021-09-10  7:26 Oded Gabbay
  2021-09-10  7:58 ` Greg Kroah-Hartman
  2021-09-12  7:38 ` Michael Zuckerman
  0 siblings, 2 replies; 15+ messages in thread
From: Oded Gabbay @ 2021-09-10  7:26 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: mzuckerman, dsinger, Linus Torvalds, Dave Airlie, Daniel Vetter,
      Jason Gunthorpe, linux-kernel@vger.kernel.org

Hi Greg,

Following our conversations a couple of months ago, I'm happy to tell you
that Habanalabs has open-sourced its TPC (Tensor Processing Core) LLVM
compiler, which is a fork of the LLVM open-source project.

The project can be found on the Habanalabs GitHub website at:
https://github.com/HabanaAI/tpc_llvm

There is a companion guide on how to write TPC kernels at:
https://docs.habana.ai/en/latest/TPC_User_Guide/TPC_User_Guide.html

The guide details the TPC compute engine's architecture, how to write TPC
kernels using the TPC-C language, etc.

In addition, we have written a reference implementation of the SynapseAI
API, called SynapseAI Core, and released its code under the MIT license to
the open-source community at:
https://github.com/HabanaAI/SynapseAI_Core

SynapseAI Core contains all the necessary building blocks to run Deep
Learning training on Gaudi, although it is not as optimized as the
closed-source library. The project repository contains a couple of TPC
kernels that implement basic DL operators. These kernels can serve as an
example of how to implement more complex operators.

To work with the Gaudi device, the library calls the Habanalabs kernel
driver uAPI through the already open-source hl-thunk library at:
https://github.com/HabanaAI/hl-thunk

Moreover, the library contains a few tests (and more will follow soon)
that demonstrate how to use the SynapseAI API to run workloads which
utilize the TPC engines on Gaudi devices. We provide a short readme that
explains how to build and run the included tests.

It is important to note that we have provided all the APIs necessary to
connect this library to any Deep Learning framework, by writing an
appropriate backend in the framework and by writing more TPC kernels to
implement the different operators.

Once the driver(s) for the Gaudi NIC ports are upstreamed, this library
may be used together with IBverbs to perform training on multiple Gaudi
devices.

Thanks,
Oded

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: Habanalabs Open-Source TPC LLVM compiler and SynapseAI Core library
  2021-09-10  7:26 Habanalabs Open-Source TPC LLVM compiler and SynapseAI Core library Oded Gabbay
@ 2021-09-10  7:58 ` Greg Kroah-Hartman
  2021-09-10 16:09   ` Daniel Vetter
  2021-10-27  6:53   ` Habanalabs Open-Source TPC LLVM compiler and SynapseAI Core library Oded Gabbay
  2021-09-12  7:38 ` Michael Zuckerman
  1 sibling, 2 replies; 15+ messages in thread
From: Greg Kroah-Hartman @ 2021-09-10  7:58 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: mzuckerman, dsinger, Linus Torvalds, Dave Airlie, Daniel Vetter,
      Jason Gunthorpe, linux-kernel@vger.kernel.org

On Fri, Sep 10, 2021 at 10:26:56AM +0300, Oded Gabbay wrote:
> Hi Greg,
>
> Following our conversations a couple of months ago, I'm happy to tell you that
> Habanalabs has open-sourced its TPC (Tensor Processing Core) LLVM compiler,
> which is a fork of the LLVM open-source project.
>
> The project can be found on Habanalabs GitHub website at:
> https://github.com/HabanaAI/tpc_llvm
>
> There is a companion guide on how to write TPC kernels at:
> https://docs.habana.ai/en/latest/TPC_User_Guide/TPC_User_Guide.html

That's great news, thanks for pushing for this and releasing it all!

greg k-h
* Re: Habanalabs Open-Source TPC LLVM compiler and SynapseAI Core library
  2021-09-10  7:58 ` Greg Kroah-Hartman
@ 2021-09-10 16:09   ` Daniel Vetter
  2021-09-10 16:10     ` Daniel Vetter
  2021-10-27  6:53   ` Habanalabs Open-Source TPC LLVM compiler and SynapseAI Core library Oded Gabbay
  1 sibling, 1 reply; 15+ messages in thread
From: Daniel Vetter @ 2021-09-10 16:09 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Oded Gabbay, mzuckerman, dsinger, Linus Torvalds, Dave Airlie,
      Jason Gunthorpe, linux-kernel@vger.kernel.org

On Fri, Sep 10, 2021 at 9:58 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Fri, Sep 10, 2021 at 10:26:56AM +0300, Oded Gabbay wrote:
> > Hi Greg,
> >
> > Following our conversations a couple of months ago, I'm happy to tell you that
> > Habanalabs has open-sourced its TPC (Tensor Processing Core) LLVM compiler,
> > which is a fork of the LLVM open-source project.
> >
> > The project can be found on Habanalabs GitHub website at:
> > https://github.com/HabanaAI/tpc_llvm
> >
> > There is a companion guide on how to write TPC kernels at:
> > https://docs.habana.ai/en/latest/TPC_User_Guide/TPC_User_Guide.html
>
> That's great news, thanks for pushing for this and releasing it all!

Yeah this is neat.

There's still the problem that we spent the past 2.5 years pissing off
a lot of people for an imo questionable political project, bypassing
all the technical review and expertise. Now that the political
nonsense is resolved I think we need to look at at least the technical
cleanup. The angered people are much harder to fix, so let's maybe
ignore that (or perhaps a ks topic, no idea, I'm honestly not super
motivated to rehash this entire story again). Here's what I think we
should do:

- move drivers/misc/habanalabs under drivers/gpu/habanalabs and
  review/discussions on dri-devel
- grandfather the entire current situation in as-is, it's not the only
  driver we have with a funny uapi of its own (but the other driver did
  manage to get their compiler into upstream llvm even, and not like 2
  years late)
- review the dma-buf stuff on dri-devel and then land it through
  standard flows, not the gregk-misc bypass
- close drivers/misc backdoor for further accel driver submissions,
  I'd like to focus on technical stuff in this area going forward and
  not pointless exercises in bypassing due process and all that

I expect we'll have a proper discussion what the stack should look
like with the next submission (from a different vendor maybe), that
ship kinda sailed with habanalabs.

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: Habanalabs Open-Source TPC LLVM compiler and SynapseAI Core library
  2021-09-10 16:09 ` Daniel Vetter
@ 2021-09-10 16:10   ` Daniel Vetter
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Vetter @ 2021-09-10 16:10 UTC (permalink / raw)
  To: Greg Kroah-Hartman, dri-devel
  Cc: Oded Gabbay, mzuckerman, dsinger, Linus Torvalds, Dave Airlie,
      Jason Gunthorpe, linux-kernel@vger.kernel.org

Forgot to add dri-devel.

On Fri, Sep 10, 2021 at 6:09 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Fri, Sep 10, 2021 at 9:58 AM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Fri, Sep 10, 2021 at 10:26:56AM +0300, Oded Gabbay wrote:
> > > Hi Greg,
> > >
> > > Following our conversations a couple of months ago, I'm happy to tell you that
> > > Habanalabs has open-sourced its TPC (Tensor Processing Core) LLVM compiler,
> > > which is a fork of the LLVM open-source project.
> > >
> > > The project can be found on Habanalabs GitHub website at:
> > > https://github.com/HabanaAI/tpc_llvm
> > >
> > > There is a companion guide on how to write TPC kernels at:
> > > https://docs.habana.ai/en/latest/TPC_User_Guide/TPC_User_Guide.html
> >
> > That's great news, thanks for pushing for this and releasing it all!
>
> Yeah this is neat.
>
> There's still the problem that we spent the past 2.5 years pissing off
> a lot of people for an imo questionable political project, bypassing
> all the technical review and expertise. Now that the political
> nonsense is resolved I think we need to look at at least the technical
> cleanup. The angered people are much harder to fix, so let's maybe
> ignore that (or perhaps a ks topic, no idea, I'm honestly not super
> motivated to rehash this entire story again). Here's what I think we
> should do:
>
> - move drivers/misc/habanalabs under drivers/gpu/habanalabs and
>   review/discussions on dri-devel
> - grandfather the entire current situation in as-is, it's not the only
>   driver we have with a funny uapi of its own (but the other driver did
>   manage to get their compiler into upstream llvm even, and not like 2
>   years late)
> - review the dma-buf stuff on dri-devel and then land it through
>   standard flows, not the gregk-misc bypass
> - close drivers/misc backdoor for further accel driver submissions,
>   I'd like to focus on technical stuff in this area going forward and
>   not pointless exercises in bypassing due process and all that
>
> I expect we'll have a proper discussion what the stack should look
> like with the next submission (from a different vendor maybe), that
> ship kinda sailed with habanalabs.
>
> Cheers, Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Accelerator drivers going forward (was Re: Habanalabs Open-Source TPC LLVM compiler and SynapseAI Core library)
  2021-09-10 16:10 ` Daniel Vetter
@ 2021-09-12 13:55   ` Greg Kroah-Hartman
  2021-09-12 16:37     ` Simon Ser
  2021-09-12 19:32     ` Dave Airlie
  2 replies; 15+ messages in thread
From: Greg Kroah-Hartman @ 2021-09-12 13:55 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: dri-devel, Oded Gabbay, mzuckerman, dsinger, Linus Torvalds,
      Dave Airlie, Jason Gunthorpe, linux-kernel@vger.kernel.org

On Fri, Sep 10, 2021 at 06:10:27PM +0200, Daniel Vetter wrote:
> Forgot to add dri-devel.
>
> On Fri, Sep 10, 2021 at 6:09 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >
> > On Fri, Sep 10, 2021 at 9:58 AM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > > On Fri, Sep 10, 2021 at 10:26:56AM +0300, Oded Gabbay wrote:
> > > > Hi Greg,
> > > >
> > > > Following our conversations a couple of months ago, I'm happy to tell you that
> > > > Habanalabs has open-sourced its TPC (Tensor Processing Core) LLVM compiler,
> > > > which is a fork of the LLVM open-source project.
> > > >
> > > > The project can be found on Habanalabs GitHub website at:
> > > > https://github.com/HabanaAI/tpc_llvm
> > > >
> > > > There is a companion guide on how to write TPC kernels at:
> > > > https://docs.habana.ai/en/latest/TPC_User_Guide/TPC_User_Guide.html
> > >
> > > That's great news, thanks for pushing for this and releasing it all!
> >
> > Yeah this is neat.
> >
> > There's still the problem that we spent the past 2.5 years pissing off
> > a lot of people for an imo questionable political project, bypassing
> > all the technical review and expertise. Now that the political
> > nonsense is resolved I think we need to look at at least the technical
> > cleanup. The angered people are much harder to fix, so let's maybe
> > ignore that (or perhaps a ks topic, no idea, I'm honestly not super
> > motivated to rehash this entire story again). Here's what I think we
> > should do:
> >
> > - move drivers/misc/habanalabs under drivers/gpu/habanalabs and
> >   review/discussions on dri-devel

Wait, why move into gpu?  Are we going to do that for all hardware
accelerators that we currently have in the kernel tree?

These things are not GPUs in the sense of them being "do some work and
write out to a screen", which is what I would associate with a GPU (G
does stand for "Graphical", right?)

Yes, GPUs can do things that some accelerators can do, but they can do
things that accelerators can not do, and the other way around as well.
I doubt you want all of the existing gpu drivers to be only treated as
an "accelerator driver" now, as where would the logic that has to happen
to get the bits out to a screen live?

And since we have a long history of accepting accelerator drivers (I see
some in our tree since 2018 at the least), and there is no common
userspace collation trying to make a common userspace api, why do they
have to live in the same place?  What makes them common except for the
fact that they use the kernel as a semi-dumb pipe to send work to and
from a different processor?

Look at drivers/misc/cxl/ and drivers/misc/ocxl and drivers/misc/uacce/
and drivers/misc/sgi-gru and drivers/misc/bcm-vk/ even drivers/misc/mei/
as that is an off-load engine we talk to, right?

What about the drivers/fpga/ api we have, it handles accelerators as
well.  I'm sure we have many other examples in the kernel tree as well,
I just did a quick look and found these.

All the above accelerators do things in different ways because their
hardware is different, so they need different user/kernel apis, right?
How are we going to unify them?  Who is going to unify them?

So drivers/accel/ perhaps?  I would be able to get rid of loads of
drivers/misc/ code that way :)

Who is going to be the new maintainer of this subsystem?

So far they have all been going into drivers/misc/ because no one else
stepped up to do the review of them except me.  I would _LOVE_ the help
here as I end up reviewing a new one every kernel release at the least,
but companies do not seem to be willing to fund developers to be
maintainers these days :(

And yes, I have been reviewing the fpga code as well, even though they
do have a good maintainer, as those patches flow through my tree due to
historical reasons.  I know the fpga developers would have loved some
help with review of those patches.

> > - grandfather the entire current situation in as-is, it's not the only
> >   driver we have with a funny uapi of its own (but the other driver did
> >   manage to get their compiler into upstream llvm even, and not like 2
> >   years late)

We have many many accelerator drivers with odd uapis as they all work
differently.  Are we going to have to have any new company that comes
along use one of the existing apis we have (and if so, which one?) or do
we allow them to create their own as everyone does do things
differently, which really is fine as far as a kernel is concerned
(again, semi-dumb pipe.)

> > - review the dma-buf stuff on dri-devel and then land it through
> >   standard flows, not the gregk-misc bypass

Are dma-bufs somehow required to be reviewed on dri-devel?  As others
have asked in the past, they are already being used in other subsystems
(like IB) today, did those authors also get review there?

If so, great, if not, that feels odd to me, as I am seeing lots of
out-of-tree drivers start to use these structures, which is why the api
was created (to stop the roll-your-own-implementations.)  Does dri-devel
want me to have those vendors cc: you all when those get submitted?

> > - close drivers/misc backdoor for further accel driver submissions,
> >   I'd like to focus on technical stuff in this area going forward and
> >   not pointless exercises in bypassing due process and all that

I will be glad to not accept any more, but as I say above, what are the
new requirements going to be so that those companies that do want to
submit their code know what to do?

And what exactly are we using as a definition of an accelerator?  We
have networking cards that are "accelerators" as well as crypto
"accelerators" :)

> > I expect we'll have a proper discussion what the stack should look
> > like with the next submission (from a different vendor maybe), that
> > ship kinda sailed with habanalabs.

Who is going to define this stack?  As there is no industry standard,
why would we define this?

thanks,

greg k-h
* Re: Accelerator drivers going forward (was Re: Habanalabs Open-Source TPC LLVM compiler and SynapseAI Core library)
  2021-09-12 13:55 ` Accelerator drivers going forward (was Re: Habanalabs Open-Source TPC LLVM compiler and SynapseAI Core library) Greg Kroah-Hartman
@ 2021-09-12 16:37   ` Simon Ser
  2021-09-12 19:32   ` Dave Airlie
  1 sibling, 0 replies; 15+ messages in thread
From: Simon Ser @ 2021-09-12 16:37 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Daniel Vetter, dri-devel, Oded Gabbay, mzuckerman, dsinger,
      Linus Torvalds, Dave Airlie, Jason Gunthorpe,
      linux-kernel@vger.kernel.org

> > > - move drivers/misc/habanalabs under drivers/gpu/habanalabs and
> > >   review/discussions on dri-devel
>
> Wait, why move into gpu?  Are we going to do that for all hardware
> accelerators that we currently have in the kernel tree?
>
> These things are not GPUs in the sense of them being "do some work and
> write out to a screen", which is what I would associate with a GPU (G
> does stand for "Graphical", right?)
>
> Yes, GPUs can do things that some accelerators can do, but they can do
> things that accelerators can not do, and the other way around as well.
> I doubt you want all of the existing gpu drivers to be only treated as
> an "accelerator driver" now, as where would the logic that has to happen
> to get the bits out to a screen live?

This seems like a description of the "display" part of the drivers,
driven by KMS. There are many chips which can't do the "display" part,
only the "render" part. Their drivers are living in drivers/gpu/ as
well.
* Re: Accelerator drivers going forward (was Re: Habanalabs Open-Source TPC LLVM compiler and SynapseAI Core library)
  2021-09-12 13:55 ` Accelerator drivers going forward (was Re: Habanalabs Open-Source TPC LLVM compiler and SynapseAI Core library) Greg Kroah-Hartman
  2021-09-12 16:37   ` Simon Ser
@ 2021-09-12 19:32   ` Dave Airlie
  1 sibling, 0 replies; 15+ messages in thread
From: Dave Airlie @ 2021-09-12 19:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Daniel Vetter, dri-devel, Oded Gabbay, mzuckerman, dsinger,
      Linus Torvalds, Jason Gunthorpe, linux-kernel@vger.kernel.org

On Sun, 12 Sept 2021 at 23:55, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Fri, Sep 10, 2021 at 06:10:27PM +0200, Daniel Vetter wrote:
> > Forgot to add dri-devel.
> >
> > On Fri, Sep 10, 2021 at 6:09 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > >
> > > On Fri, Sep 10, 2021 at 9:58 AM Greg Kroah-Hartman
> > > <gregkh@linuxfoundation.org> wrote:
> > > > On Fri, Sep 10, 2021 at 10:26:56AM +0300, Oded Gabbay wrote:
> > > > > Hi Greg,
> > > > >
> > > > > Following our conversations a couple of months ago, I'm happy to tell you that
> > > > > Habanalabs has open-sourced its TPC (Tensor Processing Core) LLVM compiler,
> > > > > which is a fork of the LLVM open-source project.
> > > > >
> > > > > The project can be found on Habanalabs GitHub website at:
> > > > > https://github.com/HabanaAI/tpc_llvm
> > > > >
> > > > > There is a companion guide on how to write TPC kernels at:
> > > > > https://docs.habana.ai/en/latest/TPC_User_Guide/TPC_User_Guide.html
> > > >
> > > > That's great news, thanks for pushing for this and releasing it all!
> > >
> > > Yeah this is neat.
> > >
> > > There's still the problem that we spent the past 2.5 years pissing off
> > > a lot of people for an imo questionable political project, bypassing
> > > all the technical review and expertise. Now that the political
> > > nonsense is resolved I think we need to look at at least the technical
> > > cleanup. The angered people are much harder to fix, so let's maybe
> > > ignore that (or perhaps a ks topic, no idea, I'm honestly not super
> > > motivated to rehash this entire story again). Here's what I think we
> > > should do:
> > >
> > > - move drivers/misc/habanalabs under drivers/gpu/habanalabs and
> > >   review/discussions on dri-devel
>
> Wait, why move into gpu?  Are we going to do that for all hardware
> accelerators that we currently have in the kernel tree?

We could just mv drivers/gpu drivers/accel if that helps your mental
model here.

> These things are not GPUs in the sense of them being "do some work and
> write out to a screen", which is what I would associate with a GPU (G
> does stand for "Graphical", right?)

Neither are a lot of the gpu drivers, it's almost like we evolved the
subsystem in 20 years, and the name got away from us.

As an example: etnaviv, panfrost, lima and vgem drivers have no display
interfaces at all. Nada, they do nothing except accelerate and use
dma-buf to talk to other drivers.

> Yes, GPUs can do things that some accelerators can do, but they can do
> things that accelerators can not do, and the other way around as well.
> I doubt you want all of the existing gpu drivers to be only treated as
> an "accelerator driver" now, as where would the logic that has to happen
> to get the bits out to a screen live?

Don't care, totally doesn't matter if a driver is accelerator + display,
you could write in-driver buses if you wanted to abstract this more,
since internally most GPUs are just SoCs, the display and accelerator
pieces talk to power management, irqs and dma-buf like functionality
internally in the driver, the thing is for most GPUs there is a single
PCI device to bind to, so historically nobody has seen the value in
splitting them more or adding an in-driver bus for one set of devices.

> And since we have a long history of accepting accelerator drivers (I see
> some in our tree since 2018 at the least), and there is no common
> userspace collation trying to make a common userspace api, why do they
> have to live in the same place?  What makes them common except for the
> fact that they use the kernel as a semi-dumb pipe to send work to and
> from a different processor?
>
> Look at drivers/misc/cxl/ and drivers/misc/ocxl and drivers/misc/uacce/
> and drivers/misc/sgi-gru and drivers/misc/bcm-vk/ even drivers/misc/mei/
> as that is an off-load engine we talk to, right?
>
> What about the drivers/fpga/ api we have, it handles accelerators as
> well.  I'm sure we have many other examples in the kernel tree as well,
> I just did a quick look and found these.
>
> All the above accelerators do things in different ways because their
> hardware is different, so they need different user/kernel apis, right?
> How are we going to unify them?  Who is going to unify them?
>
> So drivers/accel/ perhaps?  I would be able to get rid of loads of
> drivers/misc/ code that way :)
>
> Who is going to be the new maintainer of this subsystem?

We already said if we could get agreement on having things follow the
rules, then they can be merged under drm trees or we'd start a new
accel tree.

The problem is the free-for-all merge with no barriers approach that
you and, I believe, Olof are campaigning for, doesn't seem to create
communities, it may create consulting or training opportunities for the
Linux Foundation, but thus far I don't see any communities.

Graphics accelerator community exists because of and has itself refined
the rules over time. I don't think our rules will necessarily work for
other groups immediately but I think other groups need to construct
acceptable merge criteria beyond the kernel, and kernel maintainers
have to take more responsibility for saying no if they don't have time
for community building.

> So far they have all been going into drivers/misc/ because no one else
> stepped up to do the review of them except me.  I would _LOVE_ the help
> here as I end up reviewing a new one every kernel release at the least,
> but companies do not seem to be willing to fund developers to be
> maintainers these days :(
>
> And yes, I have been reviewing the fpga code as well, even though they
> do have a good maintainer, as those patches flow through my tree due to
> historical reasons.  I know the fpga developers would have loved some
> help with review of those patches.

Lack of reviewing isn't the problem here, lack of responsibility for
creating a long term mess is. You are creating long term dumping
grounds for badly thought out stuff. Saying people keep adding more
trash to my dump and it's overloading me is just the effect of having
created the dump with no rules to follow in the first place.

> > > - review the dma-buf stuff on dri-devel and then land it through
> > >   standard flows, not the gregk-misc bypass
>
> Are dma-bufs somehow required to be reviewed on dri-devel?  As others
> have asked in the past, they are already being used in other subsystems
> (like IB) today, did those authors also get review there?

Yes, any use of dma-buf has to be cc'ed to dri-devel and linux-media
per MAINTAINERS.

> If so, great, if not, that feels odd to me, as I am seeing lots of
> out-of-tree drivers start to use these structures, which is why the api
> was created (to stop the roll-your-own-implementations.)  Does dri-devel
> want me to have those vendors cc: you all when those get submitted?

Yes. MAINTAINERS has matching for this, are you not advising people to
use the proper submission techniques and thus bypassing that file?

The reason is dma-buf and later by extension dma-fence can create
really bad problems for the kernel around memory management.

https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html#indefinite-dma-fences

When a driver is self contained and doesn't interact with other kernel
drivers nobody really has to care. However once a driver starts
interacting with other drivers in the kernel, a responsible maintainer
has to check that these new drivers aren't going to crap all over the
existing drivers and destabilise the kernel.

Someone has to review the hardware design to see if page faulting works
or if preemption works or a bunch of other gotchas. Someone has to
review the userspace to make sure it isn't doing knowingly bad things
or making assumptions based on the kernel driver doing bad things.

The thing is we've had code merged into our in-tree i915 driver that
broke a bunch of these assumptions, and have had to spend a year
cleaning it out, now this happened post-merge and diligence had
lessened, having the expertise to spot this in new dma-buf/fence users
is why we insist on having access to way more than just the 1000 line
kernel driver submission.

> I will be glad to not accept any more, but as I say above, what are the
> new requirements going to be so that those companies that do want to
> submit their code know what to do?

I'm proposing a patch for documentation that maintainers can sign up
for (it's mentioned in the ksummit thread).

> And what exactly are we using as a definition of an accelerator?  We
> have networking cards that are "accelerators" as well as crypto
> "accelerators" :)
>
> > > I expect we'll have a proper discussion what the stack should look
> > > like with the next submission (from a different vendor maybe), that
> > > ship kinda sailed with habanalabs.
>
> Who is going to define this stack?  As there is no industry standard,
> why would we define this?

Because someone has to help, saying yes isn't helping, it's enabling
bad behaviour. Parenting and maintaining both involve saying No for the
future prosperity of the ecosystem.

Dave.
https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html#indefinite-dma-fences When a driver is self contained and doesn't interact with other kernel drivers nobody really has to care. However once a driver starts interacting with other drivers in the kernel, a responsible maintainer has to check that these new drivers aren't going to crap all over the existing drivers and destabilise the kernel. Someone has to review the hardware design to see if page faulting works or if preemption works or a bunch of other gotchas. Someone has to review the userspace to make sure it isn't doing knowingly bad things or making assumptions based on the kernel driver doing bad things. The thing is we've had code merged into our in-tree i915 driver that broke a bunch of these assumptions, and have had to spend a year cleaning it out, now this happened post-merge and diligence had lessened, having the expertise to spot this in new dma-buf/fence users is why we insist on having access to way more than just the 1000 line kernel driver submission. > I will be glad to not accept any more, but as I say above, what are the > new requirements going to be so that those companies that do want to > submit their code know what to do? I'm proposing a patch for documentation that maintainers can sign up for (it's mentioned in the ksummit thread). > And what exactly are we using as a definition of an accelerator? We > have networking cards that are "accelerators" as well as crypto > "accelerators" :) > > > > I expect we'll have a proper discussion what the stack should look > > > like with the next submission (from a different vendor maybe), that > > > ship kinda sailed with habanalabs. > > Who is going to define this stack? As there is no industry standard, > > why would we define this? Because someone has to help, saying yes isn't helping, it's enabling bad behaviour. Parenting and maintaining both involve saying No for the future prosperity of the ecosystem. Dave. 
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Accelerator drivers going forward (was Re: Habanalabs Open-Source TPC LLVM compiler and SynapseAI Core library) 2021-09-12 19:32 ` Dave Airlie @ 2021-09-14 8:42 ` Oded Gabbay -1 siblings, 0 replies; 15+ messages in thread From: Oded Gabbay @ 2021-09-14 8:42 UTC (permalink / raw) To: Dave Airlie Cc: Greg Kroah-Hartman, Daniel Vetter, dri-devel, mzuckerman, dsinger, Linus Torvalds, Jason Gunthorpe, Linux-Kernel@Vger. Kernel. Org On Sun, Sep 12, 2021 at 10:32 PM Dave Airlie <airlied@gmail.com> wrote: > > On Sun, 12 Sept 2021 at 23:55, Greg Kroah-Hartman > <gregkh@linuxfoundation.org> wrote: > > > > On Fri, Sep 10, 2021 at 06:10:27PM +0200, Daniel Vetter wrote: > > > Forgot to add dri-devel. > > > > > > On Fri, Sep 10, 2021 at 6:09 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > > > > > > > On Fri, Sep 10, 2021 at 9:58 AM Greg Kroah-Hartman > > > > <gregkh@linuxfoundation.org> wrote: > > > > > On Fri, Sep 10, 2021 at 10:26:56AM +0300, Oded Gabbay wrote: > > > > > > Hi Greg, > > > > > > > > > > > > Following our conversations a couple of months ago, I'm happy to tell you that > > > > > > Habanalabs has open-sourced its TPC (Tensor Processing Core) LLVM compiler, > > > > > > which is a fork of the LLVM open-source project. > > > > > > > > > > > > The project can be found on Habanalabs GitHub website at: > > > > > > https://github.com/HabanaAI/tpc_llvm > > > > > > > > > > > > There is a companion guide on how to write TPC kernels at: > > > > > > https://docs.habana.ai/en/latest/TPC_User_Guide/TPC_User_Guide.html > > > > > > > > > > That's great news, thanks for pushing for this and releasing it all! > > > > > > > > Yeah this is neat. > > > > > > > > There's still the problem that we spent the past 2.5 years pissing off > > > > a lot of people for an imo questionable political project, bypassing > > > > all the technical review and expertise. Now that the political > > > > nonsense is resolved I think we need to look at at least the technical > > > > cleanup. 
The angered people are much harder to fix, so let's maybe > > > > ignore that (or perhaps a ks topic, no idea, I'm honestly not super > > > > motivated to rehash this entire story again). Here's what I think we > > > > should do: > > > > > > > > - move drivers/misc/habanalabs under drivers/gpu/habanalabs and > > > > review/discussions on dri-devel > > > > Wait, why move into gpu? Are we going to do that for all hardware > > accelerators that we currently have in the kernel tree? > > > > We could just mv drivers/gpu drivers/accel if that helps your mental model here. > > > These things are not GPUs in the sense of them being "do some work and > > write out to a screen", which is what I would associate with a GPU (G > > does stand for "Graphical", right?) > > Neither are a lot of the gpu drivers, it's almost like we evolved the > subsystem in 20 years, > and the name got away from us. > > As an example: > etnaviv, panfrost, lima and vgem drivers have no display interfaces at > all. Nada, they do nothing except accelerate and use dma-buf to talk > to other drivers. > > > > Yes, GPUs can do things that some accelerators can do, but they can do > > things that accelerators can not do, and the other way around as well. > > I doubt you want all of the existing gpu drivers to be only treated as > > an "accelerator driver" now, as where would the logic that has to happen > > to get the bits out to a screen live? > > Don't care, totally doesn't matter if a driver is accelerator + > display, you could write in-driver buses if you wanted to abstract > this more, since internally most GPUs are just SoCs, the display and > accelerator pieces talk to power management, irqs and dma-buf like > functionality internally in the driver, the thing is for most GPUs > there is a single PCI device to bind to, so historically nobody has > seen the value in splitting them more or adding an in-driver bus for > one set of devices. 
> > > And since we have a long history of accepting accelerator drivers (I see > > some in our tree since 2018 at the least), and there is no common > > userspace collation trying to make a common userspace api, why do they > > have to live in the same place? What makes them common except for the > > fact that they use the kernel as a semi-dumb pipe to send work to and > > from a different processor? > > > > Look at drivers/misc/cxl/ and drivers/misc/ocxl and drivers/misc/uacce/ > > and drivers/misc/sgi-gru and drivers/misc/bcm-vk/ even drivers/misc/mei/ > > as that is an off-load engine we talk to, right? > > > > What about the drivers/fpga/ api we have, it handles accelerators as > > well. I'm sure we have many other examples in the kernel tree as well, > > I just did a quick look and found these. > > > > All the above accelerators do things in different ways because their > > hardware is different, so they need different user/kernel apis, right? > > How are we going to unify them? Who is going to unify them? > > > > So drivers/accel/ perhaps? I would be able to get rid of loads of > > drivers/misc/ code that way :) > > > > Who is going to be the new maintainer of this subsystem? > > We already said if we could get agreement on having things follow the > rules, then they can be merged under drm trees or we'd start a new > accel tree. > > The problem is the free-for-all merge with no barriers approach that > you and I believe Olof are campaigning for, doesn't seem to create > communities, it may create consulting or training opportunities for > the Linux Foundation, but thus far I don't see any communities. > > Graphics accelerator community exists because of and has itself > refined the rules over time. 
I don't think our rules will necessarily > work for other groups immediately but I think other groups need to > construct acceptable merge criteria beyond the kernel, and kernel > maintainers have to take more responsibility for saying no if they > don't have time for community building. > > > > So far they have all been going into drivers/misc/ because no one else > > stepped up to do the review of them except me. I would _LOVE_ the help > > here as I end up reviewing a new one every kernel release at the least, > > but companies do not seem to be willing to fund developers to be > > maintainers these days :( > > > > And yes, I have been reviewing the fpga code as well, even though they > > do have a good maintainer, as those patches flow through my tree due to > > historical reasons. I know the fpga developers would have loved some > > help with review of those patches. > > Lack of reviewing isn't the problem here, lack of responsibility for > creating a long term mess is. You are creating long term dumping > grounds for badly thought out stuff. Saying people keeping adding more > trash to my dump and it's overloading me is just the effect of having > created the dump with no rules to follow in the first place. > > > > > > > - review the dma-buf stuff on dri-devel and then land it through > > > > standard flows, not the gregk-misc bypass > > > > Are dma-bufs somehow required to be reviewed on dri-devel? As others > > have asked in the past, they are already being used in other subsystems > > (like IB) today, did those authors also get review there? > > Yes any use of dma-buf has to be cc'ed to dri-devel and linux-media > per MAINTAINERS > Hi Dave/Daniel, Now that we opened up the user-space compiler and provided a library with which you can load compiled kernels and run them, I've re-sent the two dma-buf patches to dri-devel and linux-media (and to specific people) on Sunday evening. Can you please help review them ? 
They already got reviewed by Christian and Jason on previous iterations and I fixed them according to their reviews so I believe they are fundamentally correct. Thanks, Oded > > > > If so, great, if not, that feels odd to me, as I am seeing lots of > > out-of-tree drivers start to use these structures, which is why the api > > was created (to stop the roll-your-own-implementations.) Does dri-devel > > want me to have those vendors cc: you all when those get submitted? > > Yes. MAINTAINERS has matching for this, are you not advising people to > use the proper submission techniques and thus bypassing that file? > > The reason is dma-buf and later by extension dma-fence can create > really bad problems for the kernel around memory management. > > https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html#indefinite-dma-fences > > When a driver is self contained and doesn't interact with other kernel > drivers nobody really has to care. However once a driver starts > interacting with other drivers in the kernel, a responsible maintainer > has to check that these new drivers aren't going to crap all over the > existing drivers and destabilise the kernel. Someone has to review the > hardware design to see if page faulting works or if preemption works > or a bunch of other gotchas. Someone has to review the userspace to > make sure it isn't doing knowingly bad things or making assumptions > based on the kernel driver doing bad things. > > The thing is we've had code merged into our in-tree i915 driver that > broke a bunch of these assumptions, and have had to spend a year > cleaning it out, now this happened post-merge and diligence had > lessened, having the expertise to spot this in new dma-buf/fence users > is why we insist on having access to way more than just the 1000 line > kernel driver submission. 
> > > > I will be glad to not accept any more, but as I say above, what are the > > new requirements going to be so that those companies that do want to > > submit their code know what to do? > > I'm proposing a patch for documentation that maintainers can sign up > for (it's mentioned in the ksummit thread). > > > And what exactly are we using as a definition of an accelerator? We > > have networking cards that are "accelerators" as well as crypto > > "accelerators" :) > > > > > > I expect we'll have a proper discussion what the stack should look > > > > like with the next submission (from a different vendor maybe), that > > > > ship kinda sailed with habanalabs. > > > > Who is going to define this stack? As there is no industry standard, > > why would we define this? > > Because someone has to help, saying yes isn't helping, it's enabling > bad behaviour. Parenting and maintaining both involve saying No for > the future prosperity of the ecosystem. > > Dave. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Habanalabs Open-Source TPC LLVM compiler and SynapseAI Core library 2021-09-10 7:58 ` Greg Kroah-Hartman 2021-09-10 16:09 ` Daniel Vetter @ 2021-10-27 6:53 ` Oded Gabbay 2021-10-28 7:38 ` Daniel Vetter 1 sibling, 1 reply; 15+ messages in thread From: Oded Gabbay @ 2021-10-27 6:53 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Linus Torvalds, Dave Airlie, Daniel Vetter, Jason Gunthorpe, Linux-Kernel@Vger. Kernel. Org On Fri, Sep 10, 2021 at 10:58 AM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > On Fri, Sep 10, 2021 at 10:26:56AM +0300, Oded Gabbay wrote: > > Hi Greg, > > > > Following our conversations a couple of months ago, I'm happy to tell you that > > Habanalabs has open-sourced its TPC (Tensor Processing Core) LLVM compiler, > > which is a fork of the LLVM open-source project. > > > > The project can be found on Habanalabs GitHub website at: > > https://github.com/HabanaAI/tpc_llvm > > > > There is a companion guide on how to write TPC kernels at: > > https://docs.habana.ai/en/latest/TPC_User_Guide/TPC_User_Guide.html > > That's great news, thanks for pushing for this and releasing it all! > > greg k-h Hi Greg, I would like to update that yesterday AWS launched new EC2 instances powered by the Gaudi accelerators. It is now in general availability, and anyone can launch an instance with those devices. Therefore, one can now take the upstream driver, hl-thunk, tpc llvm compiler and SynapseAI core and execute compute kernels on the Gaudi devices. I have verified this to be working with the driver in kernel 5.15-rc6. We are still missing the networking parts, but I hope to start upstreaming them in the next coming months. Thanks, Oded ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Habanalabs Open-Source TPC LLVM compiler and SynapseAI Core library 2021-10-27 6:53 ` Habanalabs Open-Source TPC LLVM compiler and SynapseAI Core library Oded Gabbay @ 2021-10-28 7:38 ` Daniel Vetter 2021-10-28 12:00 ` Oded Gabbay 0 siblings, 1 reply; 15+ messages in thread From: Daniel Vetter @ 2021-10-28 7:38 UTC (permalink / raw) To: Oded Gabbay Cc: Greg Kroah-Hartman, Linus Torvalds, Dave Airlie, Jason Gunthorpe, Linux-Kernel@Vger. Kernel. Org On Wed, Oct 27, 2021 at 8:53 AM Oded Gabbay <ogabbay@kernel.org> wrote: > > On Fri, Sep 10, 2021 at 10:58 AM Greg Kroah-Hartman > <gregkh@linuxfoundation.org> wrote: > > > > On Fri, Sep 10, 2021 at 10:26:56AM +0300, Oded Gabbay wrote: > > > Hi Greg, > > > > > > Following our conversations a couple of months ago, I'm happy to tell you that > > > Habanalabs has open-sourced its TPC (Tensor Processing Core) LLVM compiler, > > > which is a fork of the LLVM open-source project. > > > > > > The project can be found on Habanalabs GitHub website at: > > > https://github.com/HabanaAI/tpc_llvm > > > > > > There is a companion guide on how to write TPC kernels at: > > > https://docs.habana.ai/en/latest/TPC_User_Guide/TPC_User_Guide.html > > > > That's great news, thanks for pushing for this and releasing it all! > > > > greg k-h > > Hi Greg, > I would like to update that yesterday AWS launched new EC2 instances > powered by the Gaudi accelerators. It is now in general availability, > and anyone can launch an instance with those devices. > Therefore, one can now take the upstream driver, hl-thunk, tpc llvm > compiler and SynapseAI core and execute compute kernels on the Gaudi > devices. I have verified this to be working with the driver in kernel > 5.15-rc6. Nice! Now that the llvm part is open, any plans to upstream that? 
Years ago when amd upstreamed their backend there was the hope that llvm would grow some competent support for gpu style accelerator isa, but since for years now amd's the only backend that ever was merged it's stuck in a chicken-egg situation of upstream llvm complaining why amd backend has all these special requirements. And other accel backends (at least the gpu-style simd ones) not having a good path to upstream llvm since a lot of the infrastructure and understanding isn't there. Getting a 2nd accel backend into upstream llvm would be a huge step towards fixing this mess. As far as I know the only other open accel backend based on llvm is intel's igc (for intel gpus), and that one is such a massive fork that's been out of upstream llvm for so long that it's not going to land anytime soon, if ever (in it's current form at least). Once we do have an accel backend in upstream llvm we can finally start building a real stack here I think, so whomever is first will win quite some advantage I think. Cheers, Daniel > We are still missing the networking parts, but I hope to start > upstreaming them in the next coming months. > > Thanks, > Oded -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Habanalabs Open-Source TPC LLVM compiler and SynapseAI Core library 2021-10-28 7:38 ` Daniel Vetter @ 2021-10-28 12:00 ` Oded Gabbay 0 siblings, 0 replies; 15+ messages in thread From: Oded Gabbay @ 2021-10-28 12:00 UTC (permalink / raw) To: Daniel Vetter Cc: Greg Kroah-Hartman, Linus Torvalds, Dave Airlie, Jason Gunthorpe, Linux-Kernel@Vger. Kernel. Org On Thu, Oct 28, 2021 at 10:38 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > On Wed, Oct 27, 2021 at 8:53 AM Oded Gabbay <ogabbay@kernel.org> wrote: > > > > On Fri, Sep 10, 2021 at 10:58 AM Greg Kroah-Hartman > > <gregkh@linuxfoundation.org> wrote: > > > > > > On Fri, Sep 10, 2021 at 10:26:56AM +0300, Oded Gabbay wrote: > > > > Hi Greg, > > > > > > > > Following our conversations a couple of months ago, I'm happy to tell you that > > > > Habanalabs has open-sourced its TPC (Tensor Processing Core) LLVM compiler, > > > > which is a fork of the LLVM open-source project. > > > > > > > > The project can be found on Habanalabs GitHub website at: > > > > https://github.com/HabanaAI/tpc_llvm > > > > > > > > There is a companion guide on how to write TPC kernels at: > > > > https://docs.habana.ai/en/latest/TPC_User_Guide/TPC_User_Guide.html > > > > > > That's great news, thanks for pushing for this and releasing it all! > > > > > > greg k-h > > > > Hi Greg, > > I would like to update that yesterday AWS launched new EC2 instances > > powered by the Gaudi accelerators. It is now in general availability, > > and anyone can launch an instance with those devices. > > Therefore, one can now take the upstream driver, hl-thunk, tpc llvm > > compiler and SynapseAI core and execute compute kernels on the Gaudi > > devices. I have verified this to be working with the driver in kernel > > 5.15-rc6. > > Nice! > > Now that the llvm part is open, any plans to upstream that? Years ago AFAIK, there were internal discussions about doing that and the decision was to pursue that goal somewhere in the future. 
Not sure how far in the future they were talking about... Having said that, I'm not at all involved on the compiler front, so I might have outdated information. If you want, I can connect you with the compiler group leader to discuss that with him. Oded > Years ago when amd upstreamed their backend there was the hope that llvm would > grow some competent support for gpu style accelerator isa, but since > for years now amd's the only backend that ever was merged it's stuck > in a chicken-egg situation of upstream llvm complaining why amd > backend has all these special requirements. And other accel backends > (at least the gpu-style simd ones) not having a good path to upstream > llvm since a lot of the infrastructure and understanding isn't there. > > Getting a 2nd accel backend into upstream llvm would be a huge step > towards fixing this mess. As far as I know the only other open accel > backend based on llvm is intel's igc (for intel gpus), and that one is > such a massive fork that's been out of upstream llvm for so long that > it's not going to land anytime soon, if ever (in its current form at > least). > > Once we do have an accel backend in upstream llvm we can finally start > building a real stack here I think, so whoever is first will win > quite some advantage I think. > > Cheers, Daniel > > > We are still missing the networking parts, but I hope to start > > upstreaming them in the next coming months. > > > > Thanks, > > Oded > > > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: Habanalabs Open-Source TPC LLVM compiler and SynapseAI Core library 2021-09-10 7:26 Habanalabs Open-Source TPC LLVM compiler and SynapseAI Core library Oded Gabbay 2021-09-10 7:58 ` Greg Kroah-Hartman @ 2021-09-12 7:38 ` Michael Zuckerman 1 sibling, 0 replies; 15+ messages in thread From: Michael Zuckerman @ 2021-09-12 7:38 UTC (permalink / raw) To: Oded Gabbay, Greg Kroah-Hartman, Tzachi Cohen Cc: Doron Singer, Linus Torvalds, Dave Airlie, Daniel Vetter, Jason Gunthorpe, linux-kernel@vger.kernel.org Add @Tzachi Cohen -----Original Message----- From: Oded Gabbay <ogabbay@kernel.org> Sent: Friday, 10 September 2021 10:27 To: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michael Zuckerman <mzuckerman@habana.ai>; Doron Singer <dsinger@habana.ai>; Linus Torvalds <torvalds@linux-foundation.org>; Dave Airlie <airlied@gmail.com>; Daniel Vetter <daniel.vetter@ffwll.ch>; Jason Gunthorpe <jgg@ziepe.ca>; linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org> Subject: Habanalabs Open-Source TPC LLVM compiler and SynapseAI Core library Hi Greg, Following our conversations a couple of months ago, I'm happy to tell you that Habanalabs has open-sourced its TPC (Tensor Processing Core) LLVM compiler, which is a fork of the LLVM open-source project.
The project can be found on Habanalabs GitHub website at: https://github.com/HabanaAI/tpc_llvm There is a companion guide on how to write TPC kernels at: https://docs.habana.ai/en/latest/TPC_User_Guide/TPC_User_Guide.html The guide details the TPC compute engine's architecture, how to write TPC kernels using the TPC-C language, etc. In addition, we have written a reference implementation of the SynapseAI API, called SynapseAI Core, and released its code under the MIT license to the open-source community at: https://github.com/HabanaAI/SynapseAI_Core SynapseAI Core contains all the necessary building blocks to run Deep Learning training on Gaudi, although not as optimized as the closed-source library. The project repository contains a couple of TPC kernels that implement basic DL operators. These kernels can serve as an example of how to implement more complex operators.
To work with the Gaudi device, the library calls the Habanalabs kernel driver uAPI through the already open-source hl-thunk library at: https://github.com/HabanaAI/hl-thunk Moreover, the library contains a few tests (and more will follow soon) that demonstrate how to use the SynapseAI API to run workloads which utilize the TPC engines on Gaudi devices. We provided a short readme that explains how to build and run the included tests. It is important to note we provided all the necessary APIs to connect this library to any Deep Learning frameworks by writing appropriate backends in the frameworks and by writing more TPC kernels to implement the different operators. Once the driver(s) for the Gaudi NIC ports will be upstreamed, this library may be used together with IBverbs to perform training on multiple Gaudi devices. Thanks, Oded ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2021-10-28 12:01 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-09-10 7:26 Habanalabs Open-Source TPC LLVM compiler and SynapseAI Core library Oded Gabbay 2021-09-10 7:58 ` Greg Kroah-Hartman 2021-09-10 16:09 ` Daniel Vetter 2021-09-10 16:10 ` Daniel Vetter 2021-09-10 16:10 ` Daniel Vetter 2021-09-12 13:55 ` Accelerator drivers going forward (was Re: Habanalabs Open-Source TPC LLVM compiler and SynapseAI Core library) Greg Kroah-Hartman 2021-09-12 16:37 ` Simon Ser 2021-09-12 19:32 ` Dave Airlie 2021-09-12 19:32 ` Dave Airlie 2021-09-14 8:42 ` Oded Gabbay 2021-09-14 8:42 ` Oded Gabbay 2021-10-27 6:53 ` Habanalabs Open-Source TPC LLVM compiler and SynapseAI Core library Oded Gabbay 2021-10-28 7:38 ` Daniel Vetter 2021-10-28 12:00 ` Oded Gabbay 2021-09-12 7:38 ` Michael Zuckerman