* New subsystem for acceleration devices @ 2022-07-31 11:45 ` Oded Gabbay 2022-07-31 15:37 ` Greg Kroah-Hartman ` (2 more replies) 0 siblings, 3 replies; 78+ messages in thread From: Oded Gabbay @ 2022-07-31 11:45 UTC (permalink / raw) To: Yuji Ishikawa, Jiho Chu Cc: Greg Kroah-Hartman, Arnd Bergmann, linux-kernel@vger.kernel.org Hi, Greg and I talked a couple of months ago about preparing a new accel subsystem for compute/acceleration devices that are not GPUs, and I think the drivers you are now trying to upstream fit it as well. Would you be open/interested in migrating your drivers to this new subsystem? Because there were no outstanding candidates, so far I have done only a basic and partial implementation of the infrastructure for this subsystem, but if you are willing to join I believe I can finish it rather quickly. At the start, the new subsystem will provide only a common character device (e.g. /dev/acX), so everyone will do open/close/ioctl on the same character device. Also, sysfs/debugfs entries will be under that device, and maybe an IOCTL to retrieve information. In the future I plan to move some of the habanalabs driver's code into the subsystem itself, for common tasks such as memory management, dma memory allocation, etc. Of course, you will be able to add your own IOCTLs as you see fit. There will be a range of IOCTLs which are device-specific (similar to drm). wdyt? Thanks, Oded ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-07-31 11:45 ` New subsystem for acceleration devices Oded Gabbay @ 2022-07-31 15:37 ` Greg Kroah-Hartman 2022-08-01 2:29 ` yuji2.ishikawa 2022-08-02 17:25 ` Jiho Chu 2022-08-03 19:04 ` Dave Airlie 2 siblings, 1 reply; 78+ messages in thread From: Greg Kroah-Hartman @ 2022-07-31 15:37 UTC (permalink / raw) To: Oded Gabbay Cc: Yuji Ishikawa, Jiho Chu, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org On Sun, Jul 31, 2022 at 02:45:34PM +0300, Oded Gabbay wrote: > Hi, > Greg and I talked a couple of months ago about preparing a new accel > subsystem for compute/acceleration devices that are not GPUs and I > think your drivers that you are now trying to upstream fit it as well. > > Would you be open/interested in migrating your drivers to this new subsystem ? > > Because there were no outstanding candidates, I have done so far only > a basic and partial implementation of the infrastructure for this > subsystem, but if you are willing to join I believe I can finish it > rather quickly. > > At start, the new subsystem will provide only a common device > character (e.g. /dev/acX) so everyone will do open/close/ioctl on the > same device character. Also sysfs/debugfs entries will be under that > device and maybe an IOCTL to retrieve information. > > In the future I plan to move some of habanalabs driver's code into the > subsystem itself, for common tasks such as memory management, dma > memory allocation, etc. > > Of course, you will be able to add your own IOCTLs as you see fit. > There will be a range of IOCTLs which are device-specific (similar to > drm). > > wdyt ? That sounds reasonable to me as a sane plan forward, thanks for working on this. greg k-h ^ permalink raw reply [flat|nested] 78+ messages in thread
* RE: New subsystem for acceleration devices 2022-07-31 15:37 ` Greg Kroah-Hartman @ 2022-08-01 2:29 ` yuji2.ishikawa 2022-08-01 8:21 ` Oded Gabbay 0 siblings, 1 reply; 78+ messages in thread From: yuji2.ishikawa @ 2022-08-01 2:29 UTC (permalink / raw) To: gregkh, oded.gabbay; +Cc: jiho.chu, arnd, linux-kernel > -----Original Message----- > From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Sent: Monday, August 1, 2022 12:38 AM > To: Oded Gabbay <oded.gabbay@gmail.com> > Cc: ishikawa yuji(石川 悠司 ○RDC□AITC○EA開) > <yuji2.ishikawa@toshiba.co.jp>; Jiho Chu <jiho.chu@samsung.com>; Arnd > Bergmann <arnd@arndb.de>; Linux-Kernel@Vger. Kernel. Org > <linux-kernel@vger.kernel.org> > Subject: Re: New subsystem for acceleration devices > > On Sun, Jul 31, 2022 at 02:45:34PM +0300, Oded Gabbay wrote: > > Hi, > > Greg and I talked a couple of months ago about preparing a new accel > > subsystem for compute/acceleration devices that are not GPUs and I > > think your drivers that you are now trying to upstream fit it as well. > > > > Would you be open/interested in migrating your drivers to this new > subsystem ? > > > > Because there were no outstanding candidates, I have done so far only > > a basic and partial implementation of the infrastructure for this > > subsystem, but if you are willing to join I believe I can finish it > > rather quickly. > > > > At start, the new subsystem will provide only a common device > > character (e.g. /dev/acX) so everyone will do open/close/ioctl on the > > same device character. Also sysfs/debugfs entries will be under that > > device and maybe an IOCTL to retrieve information. > > > > In the future I plan to move some of habanalabs driver's code into the > > subsystem itself, for common tasks such as memory management, dma > > memory allocation, etc. > > > > Of course, you will be able to add your own IOCTLs as you see fit. > > There will be a range of IOCTLs which are device-specific (similar to > > drm). > > > > wdyt ? 
> > That sounds reasonable to me as a sane plan forward, thanks for working on > this. > > greg k-h Hi, Thank you for your suggestion. I'm really interested in the idea of having a dedicated subsystem for accelerators. Let me see whether the Visconti DNN driver can be rewritten against the framework. As the Visconti SoC has several accelerators besides DNN, having a general/common interface will be a big benefit for us in maintaining the drivers over the long term. I've heard that the framework has some basic implementation. Do you have some sample code or a summary of the ideas describing the framework? Would you share them with me, if that does not bother you? If you need further information on the current DNN driver, please let me know. Also, I can provide the same for other accelerator drivers, which are currently being cleaned up for contribution. Regards, Yuji ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-01 2:29 ` yuji2.ishikawa @ 2022-08-01 8:21 ` Oded Gabbay 2022-08-03 4:39 ` yuji2.ishikawa 0 siblings, 1 reply; 78+ messages in thread From: Oded Gabbay @ 2022-08-01 8:21 UTC (permalink / raw) To: Yuji Ishikawa Cc: Greg Kroah-Hartman, Jiho Chu, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org On Mon, Aug 1, 2022 at 5:35 AM <yuji2.ishikawa@toshiba.co.jp> wrote: > > > -----Original Message----- > > From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > > Sent: Monday, August 1, 2022 12:38 AM > > To: Oded Gabbay <oded.gabbay@gmail.com> > > Cc: ishikawa yuji(石川 悠司 ○RDC□AITC○EA開) > > <yuji2.ishikawa@toshiba.co.jp>; Jiho Chu <jiho.chu@samsung.com>; Arnd > > Bergmann <arnd@arndb.de>; Linux-Kernel@Vger. Kernel. Org > > <linux-kernel@vger.kernel.org> > > Subject: Re: New subsystem for acceleration devices > > > > On Sun, Jul 31, 2022 at 02:45:34PM +0300, Oded Gabbay wrote: > > > Hi, > > > Greg and I talked a couple of months ago about preparing a new accel > > > subsystem for compute/acceleration devices that are not GPUs and I > > > think your drivers that you are now trying to upstream fit it as well. > > > > > > Would you be open/interested in migrating your drivers to this new > > subsystem ? > > > > > > Because there were no outstanding candidates, I have done so far only > > > a basic and partial implementation of the infrastructure for this > > > subsystem, but if you are willing to join I believe I can finish it > > > rather quickly. > > > > > > At start, the new subsystem will provide only a common device > > > character (e.g. /dev/acX) so everyone will do open/close/ioctl on the > > > same device character. Also sysfs/debugfs entries will be under that > > > device and maybe an IOCTL to retrieve information. > > > > > > In the future I plan to move some of habanalabs driver's code into the > > > subsystem itself, for common tasks such as memory management, dma > > > memory allocation, etc. 
> > > > > > Of course, you will be able to add your own IOCTLs as you see fit. > > > There will be a range of IOCTLs which are device-specific (similar to > > > drm). > > > > > > wdyt ? > > > > That sounds reasonable to me as a sane plan forward, thanks for working on > > this. > > > > greg k-h > > Hi, > Thank you for your suggestion. > I'm really interested in the idea to have a dedicated subsystem for accelerators. > Let me challenge if the Visconti DNN driver can be re-written to the framework. > As Visconti SoC has several accelerators as well as DNN, > having a general/common interface will be a big benefit for us to maintain drivers for a long time. > > I've heard that the framework has some basic implementation. > Do you have some sample code or enumeration of idea to describe the framework? > Would you share them with me? if that does not bother you. > Great to hear that! I will need a couple of days to complete the code and clean it up because I stopped working on it a couple of months ago as there were no other candidates at the time. Once I have it ready I will put it in my linux repo and send you a branch name. In the meantime, I'd like to describe at a high level how this framework is going to work. I'll start with the main theme of this framework which is allowing maximum flexibility to the different device drivers. This is because in my opinion there is and always will be a large variance between different compute accelerators, which stems from the fact their h/w is designed for different purposes. I believe that it would be nearly impossible to create a standard acceleration API that will be applicable to all compute accelerators. Having said that, there are many things that can be common. 
I'll just name here a few things: - Exposing devices in a uniform way (same device character files) and managing the major/minor/open/close (with callbacks to the open/close of the device driver) - Defining a standard user API for monitoring applications that usually run in a data-center. There is a big difference between the acceleration uAPI and a monitoring uAPI and while the former is highly specialized, the latter can be standardized. - Common implementation of a memory manager that will give services of allocating kernel memory that can be mmap'ed by the user. - For devices with on-chip MMU, common memory manager for allocating device memory and managing it. - Integration with dma-buf for interoperability with other drivers. The initial implementation of the framework will expose two device character files per device. One will be used for the main/compute operations and the other one will be used for monitoring applications. This is done in case there are some limitations on the main device file (e.g. allowing only a single compute application to be connected to the device). Each driver will call a register function during its probe function (very similar to what is done in the drm subsystem). This will register the device in the accel framework and expose the device character files. Every call on those files (e.g. open, ioctl) will then go through the accel subsystem first and then will go to the callbacks of the specific device drivers. The framework will take care of allocating major and minor numbers and handle these issues in a uniform way. For now, I plan to define a single ioctl in the common part, which is an information ioctl, allowing the user to retrieve various information on the device (fixed or dynamic). I don't want to jump ahead of myself and define other common ioctls before gaining some traction. As I wrote above, each driver will be allowed to define its own custom ioctls. 
There will be an infrastructure to add custom sysfs and debugfs entries. Once we start working, I'm sure we will find some properties that we can move to the infrastructure itself (e.g. asking to reset the device, operational status). I don't know if I want to add the memory manager we prepared in the habanalabs driver on day one, because I would like to minimize the effort of converting our drivers to this framework. From the above, I believe you can see the current effort is not large. That's it from a high-level PoV. As I wrote at the beginning, I'll get back to you in a few days with a link to a git repo containing the initial implementation. > If you need further information on the current DNN driver, please let me know. > Also, I can provide some for other accelerator drivers which are currently under cleaning up for contribution. I looked at the patches you sent and I believe they will be a good match for this framework. If you have additional drivers that you think can be a good match, by all means, please send links to me. Thanks, Oded ^ permalink raw reply [flat|nested] 78+ messages in thread
* RE: New subsystem for acceleration devices 2022-08-01 8:21 ` Oded Gabbay @ 2022-08-03 4:39 ` yuji2.ishikawa 2022-08-03 5:34 ` Greg KH 2022-08-03 20:28 ` Oded Gabbay 0 siblings, 2 replies; 78+ messages in thread From: yuji2.ishikawa @ 2022-08-03 4:39 UTC (permalink / raw) To: ogabbay; +Cc: gregkh, jiho.chu, arnd, linux-kernel > -----Original Message----- > From: Oded Gabbay <ogabbay@kernel.org> > Sent: Monday, August 1, 2022 5:21 PM > To: ishikawa yuji(石川 悠司 ○RDC□AITC○EA開) > <yuji2.ishikawa@toshiba.co.jp> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>; Jiho Chu > <jiho.chu@samsung.com>; Arnd Bergmann <arnd@arndb.de>; > Linux-Kernel@Vger. Kernel. Org <linux-kernel@vger.kernel.org> > Subject: Re: New subsystem for acceleration devices > > On Mon, Aug 1, 2022 at 5:35 AM <yuji2.ishikawa@toshiba.co.jp> wrote: > > > > > -----Original Message----- > > > From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > > > Sent: Monday, August 1, 2022 12:38 AM > > > To: Oded Gabbay <oded.gabbay@gmail.com> > > > Cc: ishikawa yuji(石川 悠司 ○RDC□AITC○EA開) > > > <yuji2.ishikawa@toshiba.co.jp>; Jiho Chu <jiho.chu@samsung.com>; > > > Arnd Bergmann <arnd@arndb.de>; Linux-Kernel@Vger. Kernel. Org > > > <linux-kernel@vger.kernel.org> > > > Subject: Re: New subsystem for acceleration devices > > > > > > On Sun, Jul 31, 2022 at 02:45:34PM +0300, Oded Gabbay wrote: > > > > Hi, > > > > Greg and I talked a couple of months ago about preparing a new > > > > accel subsystem for compute/acceleration devices that are not GPUs > > > > and I think your drivers that you are now trying to upstream fit it as well. > > > > > > > > Would you be open/interested in migrating your drivers to this new > > > subsystem ? > > > > > > > > Because there were no outstanding candidates, I have done so far > > > > only a basic and partial implementation of the infrastructure for > > > > this subsystem, but if you are willing to join I believe I can > > > > finish it rather quickly. 
> > > > > > > > At start, the new subsystem will provide only a common device > > > > character (e.g. /dev/acX) so everyone will do open/close/ioctl on > > > > the same device character. Also sysfs/debugfs entries will be > > > > under that device and maybe an IOCTL to retrieve information. > > > > > > > > In the future I plan to move some of habanalabs driver's code into > > > > the subsystem itself, for common tasks such as memory management, > > > > dma memory allocation, etc. > > > > > > > > Of course, you will be able to add your own IOCTLs as you see fit. > > > > There will be a range of IOCTLs which are device-specific (similar > > > > to drm). > > > > > > > > wdyt ? > > > > > > That sounds reasonable to me as a sane plan forward, thanks for > > > working on this. > > > > > > greg k-h > > > > Hi, > > Thank you for your suggestion. > > I'm really interested in the idea to have a dedicated subsystem for > accelerators. > > Let me challenge if the Visconti DNN driver can be re-written to the > framework. > > As Visconti SoC has several accelerators as well as DNN, having a > > general/common interface will be a big benefit for us to maintain drivers for a > long time. > > > > I've heard that the framework has some basic implementation. > > Do you have some sample code or enumeration of idea to describe the > framework? > > Would you share them with me? if that does not bother you. > > > Great to hear that! > > I will need a couple of days to complete the code and clean it up because I > stopped working on it a couple of months ago as there were no other candidates > at the time. > Once I have it ready I will put it in my linux repo and send you a branch name. > > In the meantime, I'd like to describe at a high level how this framework is going > to work. > > I'll start with the main theme of this framework which is allowing maximum > flexibility to the different device drivers. 
This is because in my opinion there is > and always will be a large variance between different compute accelerators, > which stems from the fact their h/w is designed for different purposes. I believe > that it would be nearly impossible to create a standard acceleration API that will > be applicable to all compute accelerators. > > Having said that, there are many things that can be common. I'll just name here > a few things: > > - Exposing devices in a uniform way (same device character files) and > managing the major/minor/open/close (with callbacks to the open/close of the > device driver) > > - Defining a standard user API for monitoring applications that usually run in a > data-center. There is a big difference between the acceleration uAPI and a > monitoring uAPI and while the former is highly specialized, the latter can be > standardized. > > - Common implementation of a memory manager that will give services of > allocating kernel memory that can be > mmap'ed by the user. > > - For devices with on-chip MMU, common memory manager for allocating > device memory and managing it. > > - Integration with dma-buf for interoperability with other drivers. > > The initial implementation of the framework will expose two device character > files per device. One will be used for the main/compute operations and the > other one will be used for monitoring applications. > This is done in case there are some limitations on the main device file (e.g. > allowing only a single compute application to be connected to the device). > > Each driver will call a register function during its probe function (very similar to > what is done in the drm subsystem). This will register the device in the accel > framework and expose the device character files. Every call on those files (e.g. > open, ioctl) will then go through the accel subsystem first and then will go to > the callbacks of the specific device drivers. 
The framework will take care of > allocating major and minor numbers and handle these issues in a uniform way. > > For now, I plan to define a single ioctl in the common part, which is an > information ioctl, allowing the user to retrieve various information on the device > (fixed or dynamic). I don't want to jump ahead of myself and define other > common ioctls before gaining some traction. As I wrote above, each driver will > be allowed to define its own custom ioctls. > > There will be an infrastructure to add custom sysfs and debugfs entries. Once > we start working, I'm sure we will find some properties that we can move to the > infrastructure itself (e.g. asking to reset the device, operational status) > > I don't know if I want to add the memory manager we prepared in habanalabs > driver at day one, because I would like to minimize the effort we have to convert > our drivers to using this framework. From the above, I believe you can see the > current effort is not large. > > That's it from high-level pov. As I wrote at the beginning, I'll get back to you in a > few days with a link to a git repo containing the initial implementation. > Hi Oded, Thank you for sharing the high-level idea with me. I'm glad to hear that the DNN driver can be ported to the accelerator framework without too much effort. I think we can start with a minimum implementation and find a better way from there. Currently the driver does not have an API for monitoring. I'll prepare some logic to match the framework. Some of my interests are handling IOMMU and DMA-BUF. Currently the DNN driver has its own minimum implementation of those two. Will the DMA-BUF feature of the accelerator framework be provided soon? > > If you need further information on the current DNN driver, please let me know. > > Also, I can provide some for other accelerator drivers which are currently under cleaning up for contribution. > I looked at the patches you sent and I believe it will be a good match for this > framework.
> If you have additional drivers that you think can be a good match, by all means, > please send links to me. > > Thanks, > Oded The Visconti DSP driver might be interesting for you, as it provides multiple functions and requires external firmware. The problem is that I don't have an official public repository. I wonder if it's possible to post some pieces of code to this ML. Regards, Yuji ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-03 4:39 ` yuji2.ishikawa @ 2022-08-03 5:34 ` Greg KH 2022-08-03 20:28 ` Oded Gabbay 1 sibling, 0 replies; 78+ messages in thread From: Greg KH @ 2022-08-03 5:34 UTC (permalink / raw) To: yuji2.ishikawa; +Cc: ogabbay, jiho.chu, arnd, linux-kernel On Wed, Aug 03, 2022 at 04:39:12AM +0000, yuji2.ishikawa@toshiba.co.jp wrote: > Visconti DSP driver might be interesting for you as it provides multiple functions and requires external firmware. > The problem is that I don't have official public repository. > I wonder if it's possible to post some pieces of code to this ML. You can always post code to this mailing list, that's what it is here for! :) ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-03 4:39 ` yuji2.ishikawa 2022-08-03 5:34 ` Greg KH @ 2022-08-03 20:28 ` Oded Gabbay 1 sibling, 0 replies; 78+ messages in thread From: Oded Gabbay @ 2022-08-03 20:28 UTC (permalink / raw) To: Yuji Ishikawa Cc: Greg Kroah-Hartman, Jiho Chu, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org On Wed, Aug 3, 2022 at 7:44 AM <yuji2.ishikawa@toshiba.co.jp> wrote: > > > -----Original Message----- > > From: Oded Gabbay <ogabbay@kernel.org> > > Sent: Monday, August 1, 2022 5:21 PM > > To: ishikawa yuji(石川 悠司 ○RDC□AITC○EA開) > > <yuji2.ishikawa@toshiba.co.jp> > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>; Jiho Chu > > <jiho.chu@samsung.com>; Arnd Bergmann <arnd@arndb.de>; > > Linux-Kernel@Vger. Kernel. Org <linux-kernel@vger.kernel.org> > > Subject: Re: New subsystem for acceleration devices > > > > On Mon, Aug 1, 2022 at 5:35 AM <yuji2.ishikawa@toshiba.co.jp> wrote: > > > > > > > -----Original Message----- > > > > From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > > > > Sent: Monday, August 1, 2022 12:38 AM > > > > To: Oded Gabbay <oded.gabbay@gmail.com> > > > > Cc: ishikawa yuji(石川 悠司 ○RDC□AITC○EA開) > > > > <yuji2.ishikawa@toshiba.co.jp>; Jiho Chu <jiho.chu@samsung.com>; > > > > Arnd Bergmann <arnd@arndb.de>; Linux-Kernel@Vger. Kernel. Org > > > > <linux-kernel@vger.kernel.org> > > > > Subject: Re: New subsystem for acceleration devices > > > > > > > > On Sun, Jul 31, 2022 at 02:45:34PM +0300, Oded Gabbay wrote: > > > > > Hi, > > > > > Greg and I talked a couple of months ago about preparing a new > > > > > accel subsystem for compute/acceleration devices that are not GPUs > > > > > and I think your drivers that you are now trying to upstream fit it as well. > > > > > > > > > > Would you be open/interested in migrating your drivers to this new > > > > subsystem ? 
> > > > > > > > > > Because there were no outstanding candidates, I have done so far > > > > > only a basic and partial implementation of the infrastructure for > > > > > this subsystem, but if you are willing to join I believe I can > > > > > finish it rather quickly. > > > > > > > > > > At start, the new subsystem will provide only a common device > > > > > character (e.g. /dev/acX) so everyone will do open/close/ioctl on > > > > > the same device character. Also sysfs/debugfs entries will be > > > > > under that device and maybe an IOCTL to retrieve information. > > > > > > > > > > In the future I plan to move some of habanalabs driver's code into > > > > > the subsystem itself, for common tasks such as memory management, > > > > > dma memory allocation, etc. > > > > > > > > > > Of course, you will be able to add your own IOCTLs as you see fit. > > > > > There will be a range of IOCTLs which are device-specific (similar > > > > > to drm). > > > > > > > > > > wdyt ? > > > > > > > > That sounds reasonable to me as a sane plan forward, thanks for > > > > working on this. > > > > > > > > greg k-h > > > > > > Hi, > > > Thank you for your suggestion. > > > I'm really interested in the idea to have a dedicated subsystem for > > accelerators. > > > Let me challenge if the Visconti DNN driver can be re-written to the > > framework. > > > As Visconti SoC has several accelerators as well as DNN, having a > > > general/common interface will be a big benefit for us to maintain drivers for a > > long time. > > > > > > I've heard that the framework has some basic implementation. > > > Do you have some sample code or enumeration of idea to describe the > > framework? > > > Would you share them with me? if that does not bother you. > > > > > Great to hear that! > > > > I will need a couple of days to complete the code and clean it up because I > > stopped working on it a couple of months ago as there were no other candidates > > at the time. 
> > Once I have it ready I will put it in my linux repo and send you a branch name. > > > > In the meantime, I'd like to describe at a high level how this framework is going > > to work. > > > > I'll start with the main theme of this framework which is allowing maximum > > flexibility to the different device drivers. This is because in my opinion there is > > and always will be a large variance between different compute accelerators, > > which stems from the fact their h/w is designed for different purposes. I believe > > that it would be nearly impossible to create a standard acceleration API that will > > be applicable to all compute accelerators. > > > > Having said that, there are many things that can be common. I'll just name here > > a few things: > > > > - Exposing devices in a uniform way (same device character files) and > > managing the major/minor/open/close (with callbacks to the open/close of the > > device driver) > > > > - Defining a standard user API for monitoring applications that usually run in a > > data-center. There is a big difference between the acceleration uAPI and a > > monitoring uAPI and while the former is highly specialized, the latter can be > > standardized. > > > > - Common implementation of a memory manager that will give services of > > allocating kernel memory that can be > > mmap'ed by the user. > > > > - For devices with on-chip MMU, common memory manager for allocating > > device memory and managing it. > > > > - Integration with dma-buf for interoperability with other drivers. > > > > The initial implementation of the framework will expose two device character > > files per device. One will be used for the main/compute operations and the > > other one will be used for monitoring applications. > > This is done in case there are some limitations on the main device file (e.g. > > allowing only a single compute application to be connected to the device). 
> > > > Each driver will call a register function during its probe function (very similar to > > what is done in the drm subsystem). This will register the device in the accel > > framework and expose the device character files. Every call on those files (e.g. > > open, ioctl) will then go through the accel subsystem first and then will go to > > the callbacks of the specific device drivers. The framework will take care of > > allocating major and minor numbers and handle these issues in a uniform way. > > > > For now, I plan to define a single ioctl in the common part, which is an > > information ioctl, allowing the user to retrieve various information on the device > > (fixed or dynamic). I don't want to jump ahead of myself and define other > > common ioctls before gaining some traction. As I wrote above, each driver will > > be allowed to define its own custom ioctls. > > > > There will be an infrastructure to add custom sysfs and debugfs entries. Once > > we start working, I'm sure we will find some properties that we can move to the > > infrastructure itself (e.g. asking to reset the device, operational status) > > > > I don't know if I want to add the memory manager we prepared in habanalabs > > driver at day one, because I would like to minimize the effort we have to convert > > our drivers to using this framework. From the above, I believe you can see the > > current effort is not large. > > > > That's it from high-level pov. As I wrote at the beginning, I'll get back to you in a > > few days with a link to a git repo containing the initial implementation. > > > > Hi Oded, > > Thank you for sharing high level idea with me. > I'm grad to hear that the DNN driver can be ported to the accelerator framework without too-much effort. I think we can start with minimum implementation and find better way. > Currently the driver does not have API for monitoring. I'll prepare some logic to match the framework. > > Some of my interests are handling IOMMU and DMA-BUF. 
> Currently the DNN driver has its own minimum implementation of those two. > Will DMA-BUF feature of accelerator framework be provided soon? I didn't plan for it to be in the initial release, but I'm sure we can make it a priority to be added soon after we do the initial integration and everything is accepted into the kernel. Oded > > > > If you need further information on the current DNN driver, please let me know. > > > Also, I can provide some for other accelerator drivers which are currently > > under cleaning up for contribution. > > I looked at the patches you sent and I believe it will be a good match for this > > framework. > > If you have additional drivers that you think can be a good match, by all means, > > please send links to me. > > > > Thanks, > > Oded > > Visconti DSP driver might be interesting for you as it provides multiple functions and requires external firmware. > The problem is that I don't have official public repository. > I wonder if it's possible to post some pieces of code to this ML. > > Regards, > Yuji ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-07-31 11:45 ` New subsystem for acceleration devices Oded Gabbay 2022-07-31 15:37 ` Greg Kroah-Hartman @ 2022-08-02 17:25 ` Jiho Chu 2022-08-02 19:07 ` Oded Gabbay 2022-08-03 19:04 ` Dave Airlie 2 siblings, 1 reply; 78+ messages in thread From: Jiho Chu @ 2022-08-02 17:25 UTC (permalink / raw) To: Oded Gabbay Cc: Yuji Ishikawa, Greg Kroah-Hartman, Arnd Bergmann, linux-kernel@vger.kernel.org On Sun, 31 Jul 2022 14:45:34 +0300 Oded Gabbay <oded.gabbay@gmail.com> wrote: > Hi, > Greg and I talked a couple of months ago about preparing a new accel > subsystem for compute/acceleration devices that are not GPUs and I > think your drivers that you are now trying to upstream fit it as well. > > Would you be open/interested in migrating your drivers to this new subsystem ? > > Because there were no outstanding candidates, I have done so far only > a basic and partial implementation of the infrastructure for this > subsystem, but if you are willing to join I believe I can finish it > rather quickly. > > At start, the new subsystem will provide only a common device > character (e.g. /dev/acX) so everyone will do open/close/ioctl on the > same device character. Also sysfs/debugfs entries will be under that > device and maybe an IOCTL to retrieve information. > > In the future I plan to move some of habanalabs driver's code into the > subsystem itself, for common tasks such as memory management, dma > memory allocation, etc. > > Of course, you will be able to add your own IOCTLs as you see fit. > There will be a range of IOCTLs which are device-specific (similar to > drm). > > wdyt ? > > Thanks, > Oded > Hi, Oded. Thanks for sharing your idea. I'm really positive about a subsystem for AI acceleration devices. The Samsung NPU driver is being upstreamed now, so I wonder whether the new subsystem can cover all of its operations. I'd appreciate it if you could share the code when you are ready, so I can figure out whether our driver can be migrated.
Sincerely, Jiho Chu ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-02 17:25 ` Jiho Chu @ 2022-08-02 19:07 ` Oded Gabbay 0 siblings, 0 replies; 78+ messages in thread From: Oded Gabbay @ 2022-08-02 19:07 UTC (permalink / raw) To: Jiho Chu Cc: Yuji Ishikawa, Greg Kroah-Hartman, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org On Tue, Aug 2, 2022 at 8:25 PM Jiho Chu <jiho.chu@samsung.com> wrote: > > On Sun, 31 Jul 2022 14:45:34 +0300 > Oded Gabbay <oded.gabbay@gmail.com> wrote: > > > Hi, > > Greg and I talked a couple of months ago about preparing a new accel > > subsystem for compute/acceleration devices that are not GPUs and I > > think your drivers that you are now trying to upstream fit it as well. > > > > Would you be open/interested in migrating your drivers to this new subsystem ? > > > > Because there were no outstanding candidates, I have done so far only > > a basic and partial implementation of the infrastructure for this > > subsystem, but if you are willing to join I believe I can finish it > > rather quickly. > > > > At start, the new subsystem will provide only a common device > > character (e.g. /dev/acX) so everyone will do open/close/ioctl on the > > same device character. Also sysfs/debugfs entries will be under that > > device and maybe an IOCTL to retrieve information. > > > > In the future I plan to move some of habanalabs driver's code into the > > subsystem itself, for common tasks such as memory management, dma > > memory allocation, etc. > > > > Of course, you will be able to add your own IOCTLs as you see fit. > > There will be a range of IOCTLs which are device-specific (similar to > > drm). > > > > wdyt ? > > > > Thanks, > > Oded > > > > Hi, Oded. > Thanks for sharing your idea. I'm really positive about a subsystem for AI acceleration devices. > > The Samsung NPU driver is trying to upstream now, so I wonder whether the new subsystem can cover all of its operations. > I'd appreciate it if you could share the code when it's ready, so I can figure out whether our driver can be migrated. 
I'm working on it. I'm also preparing a small demo driver so you will be able to better understand how to integrate a driver with this subsystem. I believe I will be able to post the link during the weekend. Oded > > Sincerely, > Jiho Chu > > > > > ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-07-31 11:45 ` New subsystem for acceleration devices Oded Gabbay 2022-07-31 15:37 ` Greg Kroah-Hartman 2022-08-02 17:25 ` Jiho Chu @ 2022-08-03 19:04 ` Dave Airlie 2022-08-03 20:20 ` Oded Gabbay 2 siblings, 1 reply; 78+ messages in thread From: Dave Airlie @ 2022-08-03 19:04 UTC (permalink / raw) To: Oded Gabbay Cc: Yuji Ishikawa, Jiho Chu, Greg Kroah-Hartman, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org On Sun, 31 Jul 2022 at 22:04, Oded Gabbay <oded.gabbay@gmail.com> wrote: > > Hi, > Greg and I talked a couple of months ago about preparing a new accel > subsystem for compute/acceleration devices that are not GPUs and I > think your drivers that you are now trying to upstream fit it as well. We've had some submissions for not-GPUs to the drm subsystem recently. Intel GNA, Intel VPU, NVDLA, rpmsg AI processor unit. why is creating a new subsystem at this time necessary? Are we just creating a subsystem to avoid the open source userspace consumer rules? Or do we have some concrete reasoning behind it? Dave. ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-03 19:04 ` Dave Airlie @ 2022-08-03 20:20 ` Oded Gabbay 2022-08-03 23:31 ` Daniel Stone 2022-08-03 23:54 ` Dave Airlie 0 siblings, 2 replies; 78+ messages in thread From: Oded Gabbay @ 2022-08-03 20:20 UTC (permalink / raw) To: Dave Airlie, Greg Kroah-Hartman Cc: Yuji Ishikawa, Jiho Chu, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org On Wed, Aug 3, 2022 at 10:04 PM Dave Airlie <airlied@gmail.com> wrote: > > On Sun, 31 Jul 2022 at 22:04, Oded Gabbay <oded.gabbay@gmail.com> wrote: > > > > Hi, > > Greg and I talked a couple of months ago about preparing a new accel > > subsystem for compute/acceleration devices that are not GPUs and I > > think your drivers that you are now trying to upstream fit it as well. > > We've had some submissions for not-GPUs to the drm subsystem recently. > > Intel GNA, Intel VPU, NVDLA, rpmsg AI processor unit. > > why is creating a new subsystem at this time necessary? > > Are we just creating a subsystem to avoid the open source userspace > consumer rules? Or do we have some concrete reasoning behind it? > > Dave. Hi Dave. The reason it happened now is because I saw two drivers, which are doing h/w acceleration for AI, trying to be accepted to the misc subsystem. Add to that the fact that I talked with Greg a couple of months ago about doing a subsystem for any compute accelerators, which he was positive about, and I thought it was a good opportunity to finally do it. I also honestly think that I can contribute much to these drivers from my experience with the habana driver (which is now deployed at scale at AWS) and contribute code from the habana driver to a common framework for AI drivers. Regarding the open source userspace rules in drm - yes, I think your rules are too limiting for the relatively young AI scene, and I saw at the 2021 kernel summit that other people from the kernel community think that as well. But that's not the main reason, or even a reason at all for doing this. 
After all, at least for habana, we open-sourced our compiler and a runtime library. And Greg also asked those two drivers if they have matching open-sourced user-space code. And a final reason is that I thought this can also help in somewhat reducing the workload on Greg. I saw in the last kernel summit there was a concern about bringing more people to be kernel maintainers so I thought this is a step in the right direction. Oded ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-03 20:20 ` Oded Gabbay @ 2022-08-03 23:31 ` Daniel Stone 2022-08-04 6:46 ` Oded Gabbay 2022-08-03 23:54 ` Dave Airlie 1 sibling, 1 reply; 78+ messages in thread From: Daniel Stone @ 2022-08-03 23:31 UTC (permalink / raw) To: Oded Gabbay Cc: Dave Airlie, Greg Kroah-Hartman, Yuji Ishikawa, Jiho Chu, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org Hi Oded, On Wed, 3 Aug 2022 at 21:21, Oded Gabbay <oded.gabbay@gmail.com> wrote: > The reason it happened now is because I saw two drivers, which are > doing h/w acceleration for AI, trying to be accepted to the misc > subsystem. Why misc? > Regarding the open source userspace rules in drm - yes, I think your > rules are too limiting for the relatively young AI scene, and I saw at > the 2021 kernel summit that other people from the kernel community > think that as well. > But that's not the main reason, or even a reason at all for doing > this. After all, at least for habana, we open-sourced our compiler and > a runtime library. And Greg also asked those two drivers if they have > matching open-sourced user-space code. > > And a final reason is that I thought this can also help in somewhat > reducing the workload on Greg. I saw in the last kernel summit there > was a concern about bringing more people to be kernel maintainers so I > thought this is a step in the right direction. Can you please explain what the reason is here? Everything you have described - uniform device enumeration, common job description, memory management helpers, unique job submission format, etc - applies exactly to DRM. If open userspace is not a requirement, and bypassing Greg's manual merging is a requirement, then I don't see what the difference is between DRM and this new bespoke subsystem. It would be great to have these differences enumerated in email as well as in kerneldoc. Cheers, Daniel ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-03 23:31 ` Daniel Stone @ 2022-08-04 6:46 ` Oded Gabbay 2022-08-04 9:27 ` Jiho Chu 0 siblings, 1 reply; 78+ messages in thread From: Oded Gabbay @ 2022-08-04 6:46 UTC (permalink / raw) To: Daniel Stone Cc: Dave Airlie, Greg Kroah-Hartman, Yuji Ishikawa, Jiho Chu, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org On Thu, Aug 4, 2022 at 2:32 AM Daniel Stone <daniel@fooishbar.org> wrote: > > Hi Oded, > > On Wed, 3 Aug 2022 at 21:21, Oded Gabbay <oded.gabbay@gmail.com> wrote: > > The reason it happened now is because I saw two drivers, which are > > doing h/w acceleration for AI, trying to be accepted to the misc > > subsystem. > > Why misc? You will need to ask them ;) Seriously, I guess they thought they were not gpu drivers and didn't find anything else to go to. And at least for one of them, I remember Greg and Arnd pointing them to misc. > > > Regarding the open source userspace rules in drm - yes, I think your > > rules are too limiting for the relatively young AI scene, and I saw at > > the 2021 kernel summit that other people from the kernel community > > think that as well. > > But that's not the main reason, or even a reason at all for doing > > this. After all, at least for habana, we open-sourced our compiler and > > a runtime library. And Greg also asked those two drivers if they have > > matching open-sourced user-space code. > > > > And a final reason is that I thought this can also help in somewhat > > reducing the workload on Greg. I saw in the last kernel summit there > > was a concern about bringing more people to be kernel maintainers so I > > thought this is a step in the right direction. > > Can you please explain what the reason is here? > > Everything you have described - uniform device enumeration, common job > description, memory management helpers, unique job submission format, > etc - applies exactly to DRM. 
If open userspace is not a requirement, > and bypassing Greg's manual merging is a requirement, then I don't see > what the difference is between DRM and this new bespoke subsystem. It > would be great to have these differences enumerated in email as well > as in kerneldoc. I don't think preparing such a list at this point is relevant, because I don't have a full-featured subsystem ready, which I can take and list all its features and compare it with drm. I have a beginning of a subsystem, with very minimal common code, and I planned for it to grow with time and with the relevant participants. And regarding the userspace issue, I believe it will be less stringent than in drm. For example, afaik in drm you must upstream your LLVM fork to the mainline LLVM tree. This is something that is really a heavy-lifting task for most, if not all, companies. So this is a requirement I think we can forgo. Thanks, Oded > > Cheers, > Daniel ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-04 6:46 ` Oded Gabbay @ 2022-08-04 9:27 ` Jiho Chu 0 siblings, 0 replies; 78+ messages in thread From: Jiho Chu @ 2022-08-04 9:27 UTC (permalink / raw) To: Oded Gabbay Cc: Daniel Stone, Dave Airlie, Greg Kroah-Hartman, Yuji Ishikawa, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org On Thu, 4 Aug 2022 09:46:49 +0300 Oded Gabbay <oded.gabbay@gmail.com> wrote: > On Thu, Aug 4, 2022 at 2:32 AM Daniel Stone <daniel@fooishbar.org> wrote: > > > > Hi Oded, > > > > On Wed, 3 Aug 2022 at 21:21, Oded Gabbay <oded.gabbay@gmail.com> wrote: > > > The reason it happened now is because I saw two drivers, which are > > > doing h/w acceleration for AI, trying to be accepted to the misc > > > subsystem. > > > > Why misc? > You will need to ask them ;) > Seriously, I guess they thought they were not gpu drivers and didn't > find anything else to go to. > And at least for one of them, I remember Greg and Arnd pointing them to misc. > Hi, Daniel. The Samsung NPU driver is one of those trying to be a misc device. There are several reasons it chose misc, but simply put, GPU was not a perfect fit for an NPU. AI workloads are not limited to graphics jobs; they can be NLP, data analysis, or training jobs. GPU/DRM can work for them, but its abstractions are not designed for them. E.g., AI workloads need to manage AI model data as well as input data. I guess this could work with a GEM object expanded for model information, but I question whether DRM would accept such a specialized GEM, since it's not related to graphics. Other subsystems were similar, so I could only choose a misc device. IMHO, for the same reason, I'm positive about Oded's work, expecting that the new subsystem could be more specialized for AI workloads. 
thanks, Jiho > > > > > Regarding the open source userspace rules in drm - yes, I think your > > > rules are too limiting for the relatively young AI scene, and I saw at > > > the 2021 kernel summit that other people from the kernel community > > > think that as well. > > > But that's not the main reason, or even a reason at all for doing > > > this. After all, at least for habana, we open-sourced our compiler and > > > a runtime library. And Greg also asked those two drivers if they have > > > matching open-sourced user-space code. > > > > > > And a final reason is that I thought this can also help in somewhat > > > reducing the workload on Greg. I saw in the last kernel summit there > > > was a concern about bringing more people to be kernel maintainers so I > > > thought this is a step in the right direction. > > > > Can you please explain what the reason is here? > > > > Everything you have described - uniform device enumeration, common job > > description, memory management helpers, unique job submission format, > > etc - applies exactly to DRM. If open userspace is not a requirement, > > and bypassing Greg's manual merging is a requirement, then I don't see > > what the difference is between DRM and this new bespoke subsystem. It > > would be great to have these differences enumerated in email as well > > as in kerneldoc. > I don't think preparing such a list at this point is relevant, because > I don't have a full-featured subsystem ready, which I can take and > list all its features and compare it with drm. > I have a beginning of a subsystem, with very minimal common code, and > I planned for it to grow with time and with the relevant participants. > > And regarding the userspace issue, I believe it will be less stringent > than in drm. > For example, afaik in drm you must upstream your LLVM fork to the > mainline LLVM tree. This is something that is really a heavy-lifting > task for most, if not all, companies. 
> So this is a requirement I think we can forgo. > > Thanks, > Oded > > > > > Cheers, > > Daniel > ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-03 20:20 ` Oded Gabbay @ 2022-08-03 23:54 ` Dave Airlie 2022-08-03 23:54 ` Dave Airlie 1 sibling, 0 replies; 78+ messages in thread From: Dave Airlie @ 2022-08-03 23:54 UTC (permalink / raw) To: Oded Gabbay, dri-devel Cc: Greg Kroah-Hartman, Yuji Ishikawa, Jiho Chu, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org On Thu, 4 Aug 2022 at 06:21, Oded Gabbay <oded.gabbay@gmail.com> wrote: > > On Wed, Aug 3, 2022 at 10:04 PM Dave Airlie <airlied@gmail.com> wrote: > > > > On Sun, 31 Jul 2022 at 22:04, Oded Gabbay <oded.gabbay@gmail.com> wrote: > > > > > > Hi, > > > Greg and I talked a couple of months ago about preparing a new accel > > > subsystem for compute/acceleration devices that are not GPUs and I > > > think your drivers that you are now trying to upstream fit it as well. > > > > We've had some submissions for not-GPUs to the drm subsystem recently. > > > > Intel GNA, Intel VPU, NVDLA, rpmsg AI processor unit. > > > > why is creating a new subsystem at this time necessary? > > > > Are we just creating a subsystem to avoid the open source userspace > > consumer rules? Or do we have some concrete reasoning behind it? > > > > Dave. > > Hi Dave. > The reason it happened now is because I saw two drivers, which are > doing h/w acceleration for AI, trying to be accepted to the misc > subsystem. > Add to that the fact I talked with Greg a couple of months ago about > doing a subsystem for any compute accelerators, which he was positive > about, I thought it is a good opportunity to finally do it. > > I also honestly think that I can contribute much to these drivers from > my experience with the habana driver (which is now deployed in mass at > AWS) and contribute code from the habana driver to a common framework > for AI drivers. Why not port the habana driver to drm now instead? I don't get why it wouldn't make sense? 
Stepping up to create a new subsystem is great, but we need rules around what belongs where, we can't just spawn new subsystems when we have no clear guidelines on where drivers should land. What are the rules for a new accel subsystem? Do we have to now retarget the 3 drivers that are queued up to use drm for accelerators, because 2 drivers don't? There's a lot more to figure out here than merging some structures and hoping it will be fine. I think the one area where I can see a divide is a new subsystem for accelerators that are single-user, one-shot type things like media drivers (though maybe they could be just media drivers). I think anything that does command offloading to firmware or queues belongs in drm, because that is pretty much what the framework does. I think it might make sense to enhance some parts of drm to fit things in better, but that shouldn't block things getting started. I'm considering whether we should add an accelerator staging area to drm and land the 2-3 submissions we have and try and steer things towards commonality that way instead of holding them out of tree. I'd like to offload things from Greg by just not having people submit misc drivers at all for things that should go elsewhere. Dave. > > Regarding the open source userspace rules in drm - yes, I think your > rules are too limiting for the relatively young AI scene, and I saw at > the 2021 kernel summit that other people from the kernel community > think that as well. > But that's not the main reason, or even a reason at all for doing > this. After all, at least for habana, we open-sourced our compiler and > a runtime library. And Greg also asked those two drivers if they have > matching open-sourced user-space code. > > And a final reason is that I thought this can also help in somewhat > reducing the workload on Greg. I saw in the last kernel summit there > was a concern about bringing more people to be kernel maintainers so I > thought this is a step in the right direction. 
> > Oded ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-03 23:54 ` Dave Airlie @ 2022-08-04 7:43 ` Oded Gabbay -1 siblings, 0 replies; 78+ messages in thread From: Oded Gabbay @ 2022-08-04 7:43 UTC (permalink / raw) To: Dave Airlie Cc: dri-devel, Greg Kroah-Hartman, Yuji Ishikawa, Jiho Chu, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org On Thu, Aug 4, 2022 at 2:54 AM Dave Airlie <airlied@gmail.com> wrote: > > On Thu, 4 Aug 2022 at 06:21, Oded Gabbay <oded.gabbay@gmail.com> wrote: > > > > On Wed, Aug 3, 2022 at 10:04 PM Dave Airlie <airlied@gmail.com> wrote: > > > > > > On Sun, 31 Jul 2022 at 22:04, Oded Gabbay <oded.gabbay@gmail.com> wrote: > > > > > > > > Hi, > > > > Greg and I talked a couple of months ago about preparing a new accel > > > > subsystem for compute/acceleration devices that are not GPUs and I > > > > think your drivers that you are now trying to upstream fit it as well. > > > > > > We've had some submissions for not-GPUs to the drm subsystem recently. > > > > > > Intel GNA, Intel VPU, NVDLA, rpmsg AI processor unit. > > > > > > why is creating a new subsystem at this time necessary? > > > > > > Are we just creating a subsystem to avoid the open source userspace > > > consumer rules? Or do we have some concrete reasoning behind it? > > > > > > Dave. > > > > Hi Dave. > > The reason it happened now is because I saw two drivers, which are > > doing h/w acceleration for AI, trying to be accepted to the misc > > subsystem. > > Add to that the fact I talked with Greg a couple of months ago about > > doing a subsystem for any compute accelerators, which he was positive > > about, I thought it is a good opportunity to finally do it. > > > > I also honestly think that I can contribute much to these drivers from > > my experience with the habana driver (which is now deployed in mass at > > AWS) and contribute code from the habana driver to a common framework > > for AI drivers. > > Why not port the habana driver to drm now instead? 
I don't get why it > wouldn't make sense? imho, no, I don't see the upside. This is not a trivial change, and will require a large effort. What will it give me that I need and I don't have now ? > > Stepping up to create a new subsystem is great, but we need rules > around what belongs where, we can't just spawn new subsystems when we > have no clear guidelines on where drivers should land. > > What are the rules for a new accel subsystem? Do we have to now > retarget the 3 drivers that are queued up to use drm for accelerators, > because 2 drivers don't? > > There's a lot more to figure out here than merge some structures and > it will be fine. I totally agree. We need to set some rules and make sure everyone in the kernel community is familiar with them, because now you get different answers based on who you consult with. My rules of thumb that I thought of were that if you don't have any display (you don't need to support X/wayland) and you don't need to support opengl/vulkan/opencl/directx or any other gpu-related software stack, then you don't have to go through drm. In other words, if you don't have gpu-specific h/w and/or you don't need gpu uAPI, you don't belong in drm. After all, memory management services, or common char device handling I can get from other subsystems (e.g. rdma) as well. I'm sure I could model my uAPI to be rdma uAPI compliant (I can define proprietary uAPI there as well), but this doesn't mean I belong there, right ? > > I think the one area I can see a divide where a new subsystem is for > accelerators that are single-user, one shot type things like media > drivers (though maybe they could be just media drivers). > > I think anything that does command offloading to firmware or queues > belongs in drm, because that is pretty much what the framework does. I I think this is a very broad statement which doesn't reflect reality in the kernel. 
> think it might make sense to enhance some parts of drm to fit things > in better, but that shouldn't block things getting started. > > I'm considering if, we should add an accelerator staging area to drm > and land the 2-3 submissions we have and try and steer things towards > commonality that way instead of holding them out of tree. Sounds like a good idea regardless of this discussion. > > I'd like to offload things from Greg by just not having people submit > misc drivers at all for things that should go elsewhere. Great, at least we can agree on that. Thanks, Oded > > Dave. > > > > > > Regarding the open source userspace rules in drm - yes, I think your > > rules are too limiting for the relatively young AI scene, and I saw at > > the 2021 kernel summit that other people from the kernel community > > think that as well. > > But that's not the main reason, or even a reason at all for doing > > this. After all, at least for habana, we open-sourced our compiler and > > a runtime library. And Greg also asked those two drivers if they have > > matching open-sourced user-space code. > > > > And a final reason is that I thought this can also help in somewhat > > reducing the workload on Greg. I saw in the last kernel summit there > > was a concern about bringing more people to be kernel maintainers so I > > thought this is a step in the right direction. > > > > Oded ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-04 7:43 ` Oded Gabbay @ 2022-08-04 14:50 ` Jason Gunthorpe -1 siblings, 0 replies; 78+ messages in thread From: Jason Gunthorpe @ 2022-08-04 14:50 UTC (permalink / raw) To: Oded Gabbay Cc: Dave Airlie, dri-devel, Greg Kroah-Hartman, Yuji Ishikawa, Jiho Chu, Arnd Bergmann, linux-kernel@vger.kernel.org On Thu, Aug 04, 2022 at 10:43:42AM +0300, Oded Gabbay wrote: > After all, memory management services, or common device chars handling > I can get from other subsystems (e.g. rdma) as well. I'm sure I could > model my uAPI to be rdma uAPI compliant (I can define proprietary uAPI > there as well), but this doesn't mean I belong there, right ? You sure can, but there is still an expectation, e.g. in RDMA, that your device has a similarity to the established standards (like RoCE in habana's case) that RDMA is geared to support. I think the most important thing to establish a new subsystem is to actually identify what commonalities it is supposed to be providing. Usually this is driven by some standards body, but the AI/ML space hasn't gone in that direction at all yet. We don't need a "subsystem" to have a bunch of drivers expose chardevs with completely unique ioctls. The flip side is true of DRM - DRM is pretty general. I bet I could implement an RDMA device under DRM - but that doesn't mean it should be done. My biggest concern is that this subsystem not turn into a back door for a bunch of abuse of kernel APIs going forward. Though things are better now, we still see this in DRM where expediency or performance justifies hacky shortcuts instead of good in-kernel architecture. At least DRM has reliable and experienced review these days. Jason ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-04 14:50 ` Jason Gunthorpe @ 2022-08-04 17:48 ` Oded Gabbay -1 siblings, 0 replies; 78+ messages in thread From: Oded Gabbay @ 2022-08-04 17:48 UTC (permalink / raw) To: Jason Gunthorpe Cc: Dave Airlie, dri-devel, Greg Kroah-Hartman, Yuji Ishikawa, Jiho Chu, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org On Thu, Aug 4, 2022 at 5:50 PM Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Thu, Aug 04, 2022 at 10:43:42AM +0300, Oded Gabbay wrote: > > > After all, memory management services, or common device chars handling > > I can get from other subsystems (e.g. rdma) as well. I'm sure I could > > model my uAPI to be rdma uAPI compliant (I can define proprietary uAPI > > there as well), but this doesn't mean I belong there, right ? > > You sure can, but there is still an expectation, eg in RDMA, that your > device has a similarity to the established standards (like roce in > habana's case) that RDMA is geared to support. > > I think the the most important thing to establish a new subsystem is > to actually identify what commonalities it is supposed to be > providing. Usually this is driven by some standards body, but the > AI/ML space hasn't gone in that direction at all yet. I agree. In the AI-world the standard doesn't exist and I don't see anything on the horizon. There are the AI frameworks/compilers which are 30,000 feet above us, and there is CUDA which is closed-source and I have no idea what it does inside. > > We don't need a "subsystem" to have a bunch of drivers expose chardevs > with completely unique ioctls. I totally agree with this sentence and this is *exactly* why personally I don't want to use DRM because when I look at the long list of common IOCTLs in drm.h, I don't find anything that I can use. It's simply either not relevant at all to my h/w or it is something that our h/w implemented differently. This is in contrast to the rdma, where as you said, we have ibverbs API. 
So, when you asked that we write an IBverbs driver I understood the reasoning. There is a common user-space library which talks to the rdma drivers and all the rdma applications use that library and once I will write a (somewhat) standard driver, then hopefully I can enjoy all that. > > The flip is true of DRM - DRM is pretty general. I bet I could > implement an RDMA device under DRM - but that doesn't mean it should > be done. > > My biggest concern is that this subsystem not turn into a back door > for a bunch of abuse of kernel APIs going forward. Though things are How do you suggest to make sure it won't happen ? > better now, we still see this in DRM where expediency or performance > justifies hacky shortcuts instead of good in-kernel architecture. At > least DRM has reliable and experienced review these days. Definitely. DRM has some parts that are really well written. For example, the whole char device handling with sysfs/debugfs and managed resources code. This is something I would gladly either use or copy-paste into the hw accel subsystem. And of course more pairs of eyes looking at the code will usually produce better code. I think that it is clear from my previous email what I intended to provide. A clean, simple framework for devices to register with, get services for the most basic stuff such as device char handling, sysfs/debugfs. Later on, add more simple stuff such as memory manager and uapi for memory handling. I guess someone can say all that exists in drm, but like I said it exists in other subsystems as well. I want to be perfectly honest and say there is nothing special here for AI. It's actually the opposite, it is a generic framework for compute only. Think of it as an easier path to upstream if you just want to do compute acceleration. Maybe in the future it will be more, but I can't predict the future. If that's not enough for a new subsystem, fair enough, I'll withdraw my offer. 
Thanks, Oded > > Jason ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-04 17:48 ` Oded Gabbay @ 2022-08-05 0:22 ` Jason Gunthorpe -1 siblings, 0 replies; 78+ messages in thread From: Jason Gunthorpe @ 2022-08-05 0:22 UTC (permalink / raw) To: Oded Gabbay Cc: Dave Airlie, dri-devel, Greg Kroah-Hartman, Yuji Ishikawa, Jiho Chu, Arnd Bergmann, linux-kernel@vger.kernel.org On Thu, Aug 04, 2022 at 08:48:28PM +0300, Oded Gabbay wrote: > > The flip side is true of DRM - DRM is pretty general. I bet I could > > implement an RDMA device under DRM - but that doesn't mean it should > > be done. > > > > My biggest concern is that this subsystem not turn into a back door > > for a bunch of abuse of kernel APIs going forward. Though things > > are > > How do you suggest to make sure it won't happen ? Well, if you launch the subsystem then it is your job to make sure it doesn't happen - or endure the complaints :) Accelerators have this nasty tendency to become co-designed with their SoCs in nasty intricate ways and then there is a desire to punch through all the inconvenient layers to make it go faster. > > better now, we still see this in DRM where expediency or performance > > justifies hacky shortcuts instead of good in-kernel architecture. At > > least DRM has reliable and experienced review these days. > Definitely. DRM has some parts that are really well written. For > example, the whole char device handling with sysfs/debugfs and managed > resources code. Arguably this should all be common code in the driver core/etc - what value is a subsystem adding beyond that besides using it properly? It would be nice to at least identify something that could obviously be common, like some kind of enumeration and metadata kind of stuff (think like ethtool, devlink, rdma tool, nvme-cli etc) > I think that it is clear from my previous email what I intended to > provide. A clean, simple framework for devices to register with, get > services for the most basic stuff such as device char handling, > sysfs/debugfs. 
A clean, simple framework for devices to register with, get > services for the most basic stuff such as device char handling, > sysfs/debugfs. This should all be trivially done by any driver using the core codes, if you see gaps I'd ask why not improve the core code? > Later on, add more simple stuff such as memory manager > and uapi for memory handling. I guess someone can say all that exists > in drm, but like I said it exists in other subsystems as well. This makes sense to me, all accelerators need a way to set a memory map, but on the other hand we are doing some very nifty stuff in this area with iommufd and it might be very interesting to just have the accelerator driver link to that API instead of building yet another copy of pin_user_pages() code.. Especially with PASID likely becoming part of any accelerator toolkit. > I want to be perfectly honest and say there is nothing special here > for AI. It's actually the opposite, it is a generic framework for > compute only. Think of it as an easier path to upstream if you just > want to do compute acceleration. Maybe in the future it will be more, > but I can't predict the future. I can't either, and to be clear I'm only questioning the merit of a "subsystem" eg with a struct class, rigerous uAPI, etc. The idea that these kinds of accel drivers deserve specialized review makes sense to me, even if they remain as unorganized char devices. Just understand that is what you are signing up for :) Jason ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-05 0:22 ` Jason Gunthorpe @ 2022-08-07 6:43 ` Oded Gabbay -1 siblings, 0 replies; 78+ messages in thread From: Oded Gabbay @ 2022-08-07 6:43 UTC (permalink / raw) To: Jason Gunthorpe Cc: Dave Airlie, dri-devel, Greg Kroah-Hartman, Yuji Ishikawa, Jiho Chu, Arnd Bergmann, linux-kernel@vger.kernel.org On Fri, Aug 5, 2022 at 3:22 AM Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Thu, Aug 04, 2022 at 08:48:28PM +0300, Oded Gabbay wrote: > > > > The flip side is true of DRM - DRM is pretty general. I bet I could > > > implement an RDMA device under DRM - but that doesn't mean it should > > > be done. > > > > > > My biggest concern is that this subsystem not turn into a back door > > > for a bunch of abuse of kernel APIs going forward. Though things > > > are > > > > How do you suggest to make sure it won't happen ? > > Well, if you launch the subsystem then it is your job to make sure it > doesn't happen - or endure the complaints :) Understood, I'll make sure there is no room for complaints. > > Accelerators have this nasty tendency to become co-designed with their > SoCs in nasty intricate ways and then there is a desire to punch > through all the inconvenient layers to make it go faster. > > > > better now, we still see this in DRM where expediency or performance > > > justifies hacky shortcuts instead of good in-kernel architecture. At > > > least DRM has reliable and experienced review these days. > > Definitely. DRM has some parts that are really well written. For > > example, the whole char device handling with sysfs/debugfs and managed > > resources code. > > Arguably this should all be common code in the driver core/etc - what > value is a subsystem adding beyond that besides using it properly? I mainly see two things here: 1. If there is a subsystem which is responsible for creating and exposing the device character files, then there should be some code that connects each device driver to that subsystem. 
i.e. There should be functions that each driver should call from its probe and release callback functions. Those functions should take care of the following: - Create metadata for the device, the device's minor(s) and the driver's ioctl table and driver's callback for file operations (both are common for all the driver's devices). Save all that metadata with proper locking. - Create the device char files themselves and supply file operations that will be called for each open/close/mmap/etc. - Keep track of all these objects' lifetime in regard to the device driver's lifetime, with proper handling for release. - Add common handling and entries of sysfs/debugfs for these devices with the ability for each device driver to add their own unique entries. 2. I think that you underestimate (due to your experience) the "using it properly" part... It is not so easy to do this properly for inexperienced kernel people. If we provide all the code I mentioned above, the device driver writer doesn't need to be aware of all these kernel APIs. > > It would be nice to at least identify something that could obviously > be common, like some kind of enumeration and metadata kind of stuff > (think like ethtool, devlink, rdma tool, nvme-cli etc) Definitely. I think we can have at least one ioctl that will be common to all drivers from the start. A kind of information retrieval ioctl. There are many information points that I'm sure are common to most accelerators. We have an extensive information ioctl in the habanalabs driver and most of the information there is not habana-specific imo. > > > I think that it is clear from my previous email what I intended to > > provide. A clean, simple framework for devices to register with, get > > services for the most basic stuff such as device char handling, > > sysfs/debugfs. > > This should all be trivially done by any driver using the core code, > if you see gaps I'd ask why not improve the core code? 
> > > Later on, add more simple stuff such as memory manager > > and uapi for memory handling. I guess someone can say all that exists > > in drm, but like I said it exists in other subsystems as well. > > This makes sense to me, all accelerators need a way to set a memory > map, but on the other hand we are doing some very nifty stuff in this > area with iommufd and it might be very interesting to just have the > accelerator driver link to that API instead of building yet another > copy of pin_user_pages() code.. Especially with PASID likely becoming > part of any accelerator toolkit. Here I disagree with you. First of all, there are many relatively simple accelerators, especially in edge, where PASID is really not relevant. Second, even for the more sophisticated PCIe/CXL-based ones, PASID is not mandatory and I suspect that it won't be in 100% of those devices. But definitely that should be an alternative to the "classic" way of handling dma'able memory (pin_user_pages()). > > > I want to be perfectly honest and say there is nothing special here > > for AI. It's actually the opposite, it is a generic framework for > > compute only. Think of it as an easier path to upstream if you just > > want to do compute acceleration. Maybe in the future it will be more, > > but I can't predict the future. > > I can't either, and to be clear I'm only questioning the merit of a > "subsystem" eg with a struct class, rigerous uAPI, etc. > > The idea that these kinds of accel drivers deserve specialized review > makes sense to me, even if they remain as unorganized char > devices. Just understand that is what you are signing up for :) I understand. That's why I'm taking all your points very seriously. This is not a decision that should be taken lightly and I want to be sure most agree this is the correct way forward. My next step is to talk to Dave about it in-depth. In his other email he wrote some interesting ideas which I want to discuss with him. 
Maybe this is something that should be discussed in the kernel summit ? > > Jason ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-07 6:43 ` Oded Gabbay @ 2022-08-07 11:25 ` Oded Gabbay -1 siblings, 0 replies; 78+ messages in thread From: Oded Gabbay @ 2022-08-07 11:25 UTC (permalink / raw) To: Jason Gunthorpe Cc: Dave Airlie, dri-devel, Greg Kroah-Hartman, Yuji Ishikawa, Jiho Chu, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org On Sun, Aug 7, 2022 at 9:43 AM Oded Gabbay <oded.gabbay@gmail.com> wrote: > > On Fri, Aug 5, 2022 at 3:22 AM Jason Gunthorpe <jgg@nvidia.com> wrote: > > > > On Thu, Aug 04, 2022 at 08:48:28PM +0300, Oded Gabbay wrote: > > > > > > The flip is true of DRM - DRM is pretty general. I bet I could > > > > implement an RDMA device under DRM - but that doesn't mean it should > > > > be done. > > > > > > > > My biggest concern is that this subsystem not turn into a back door > > > > for a bunch of abuse of kernel APIs going forward. Though things > > > > are > > > > > > How do you suggest to make sure it won't happen ? > > > > Well, if you launch the subsystem then it is your job to make sure it > > doesn't happen - or endure the complaints :) > Understood, I'll make sure there is no room for complaints. > > > > Accelerators have this nasty tendancy to become co-designed with their > > SOCs in nasty intricate ways and then there is a desire to punch > > through all the inconvenient layers to make it go faster. > > > > > > better now, we still see this in DRM where expediency or performance > > > > justifies hacky shortcuts instead of good in-kernel architecture. At > > > > least DRM has reliable and experienced review these days. > > > Definitely. DRM has some parts that are really well written. For > > > example, the whole char device handling with sysfs/debugfs and managed > > > resources code. > > > > Arguably this should all be common code in the driver core/etc - what > > value is a subsystem adding beyond that besides using it properly? > > I mainly see two things here: > > 1. 
If there is a subsystem which is responsible for creating and > exposing the device character files, then there should be some code > that connects each device driver to that subsystem. > i.e. There should be functions that each driver should call from its > probe and release callback functions. > > Those functions should take care of the following: > - Create metadata for the device, the device's minor(s) and the > driver's ioctls table and driver's callback for file operations (both > are common for all the driver's devices). Save all that metadata with > proper locking. > - Create the device char files themselves and supply file operations > that will be called per each open/close/mmap/etc. > - Keep track of all these objects' lifetime in regard to the device > driver's lifetime, with proper handling for release. > - Add common handling and entries of sysfs/debugfs for these devices > with the ability for each device driver to add their own unique > entries. > > 2. I think that you underestimate (due to your experience) the "using > it properly" part... It is not so easy to do this properly for > inexperienced kernel people. If we provide all the code I mentioned > above, the device driver writer doesn't need to be aware of all these > kernel APIs. > Two more points I thought of as examples for added value to be done in common code: 1. Common code for handling dma-buf. Very similar to what was added a year ago to rdma. This can be accompanied by a common ioctl to export and import a dma-buf. 2. Common code to handle drivers that want to allow a single user at a time to open the device char file. Oded > > > > It would be nice to at least identify something that could obviously > be common, like some kind of enumeration and metadata kind of stuff > (think like ethtool, devlink, rdma tool, nvme-cli etc) > Definitely. I think we can have at least one ioctl that will be common > to all drivers from the start. > A kind of information retrieval ioctl.
There are many information > points that I'm sure are common to most accelerators. We have an > extensive information ioctl in the habanalabs driver and most of the > information there is not habana specific imo. > > > > > I think that it is clear from my previous email what I intended to > > > provide. A clean, simple framework for devices to register with, get > > > services for the most basic stuff such as device char handling, > > > sysfs/debugfs. > > > > This should all be trivially done by any driver using the core codes, > > if you see gaps I'd ask why not improve the core code? > > > > > Later on, add more simple stuff such as memory manager > > > and uapi for memory handling. I guess someone can say all that exists > > > in drm, but like I said it exists in other subsystems as well. > > > > This makes sense to me, all accelerators need a way to set a memory > > map, but on the other hand we are doing some very nifty stuff in this > > area with iommufd and it might be very interesting to just have the > > accelerator driver link to that API instead of building yet another > > copy of pin_user_pages() code.. Especially with PASID likely becoming > > part of any accelerator toolkit. > Here I disagree with you. First of all, there are many relatively > simple accelerators, especially in edge, where PASID is really not > relevant. > Second, even for the more sophisticated PCIe/CXL-based ones, PASID is > not mandatory and I suspect that it won't be in 100% of those devices. > But definitely that should be an alternative to the "classic" way of > handling dma'able memory (pin_user_pages()). > > > > > I want to be perfectly honest and say there is nothing special here > > > for AI. It's actually the opposite, it is a generic framework for > > > compute only. Think of it as an easier path to upstream if you just > > > want to do compute acceleration. Maybe in the future it will be more, > > > but I can't predict the future. 
> > > > I can't either, and to be clear I'm only questioning the merit of a > > "subsystem" eg with a struct class, rigorous uAPI, etc. > > > > The idea that these kinds of accel drivers deserve specialized review > > makes sense to me, even if they remain as unorganized char > > devices. Just understand that is what you are signing up for :) > I understand. That's why I'm taking all your points very seriously. > This is not a decision that should be taken lightly and I want to be > sure most agree this is the correct way forward. > My next step is to talk to Dave about it in-depth. In his other email > he wrote some interesting ideas which I want to discuss with him. > > Maybe this is something that should be discussed in the kernel summit ? > > > > > Jason ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-07 11:25 ` Oded Gabbay @ 2022-08-08 6:10 ` Greg Kroah-Hartman -1 siblings, 0 replies; 78+ messages in thread From: Greg Kroah-Hartman @ 2022-08-08 6:10 UTC (permalink / raw) To: Oded Gabbay Cc: Jason Gunthorpe, Dave Airlie, dri-devel, Yuji Ishikawa, Jiho Chu, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org On Sun, Aug 07, 2022 at 02:25:33PM +0300, Oded Gabbay wrote: > 2. Common code to handle drivers that want to allow a single user at a > time to open the device char file. Note, that's an impossible request, and one that the kernel should never worry about, so don't even try it. Think about userspace doing a call to dup() on an open char file descriptor and then passing that off somewhere else. thanks, greg k-h ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-08 6:10 ` Greg Kroah-Hartman @ 2022-08-08 17:55 ` Jason Gunthorpe -1 siblings, 0 replies; 78+ messages in thread From: Jason Gunthorpe @ 2022-08-08 17:55 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Oded Gabbay, Dave Airlie, dri-devel, Yuji Ishikawa, Jiho Chu, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org On Mon, Aug 08, 2022 at 08:10:22AM +0200, Greg Kroah-Hartman wrote: > On Sun, Aug 07, 2022 at 02:25:33PM +0300, Oded Gabbay wrote: > > 2. Common code to handle drivers that want to allow a single user at a > > time to open the device char file. > > Note, that's an impossible request, and one that the kernel should never > worry about, so don't even try it. Think about userspace doing a call > to dup() on an open char file descriptor and then passing that off > somewhere else. Oded is talking about a model like VFIO has where the HW has a limited number of concurrent state registers - let's say in this case the ASID translation mapping the accelerator into DMA. Each 'struct file' that is created owns one of those HW state registers, and each struct file is completely isolated from all others. eg someone controlling the accelerator through struct file A cannot access memory mapped into the accelerator through struct file B. So, the number of struct files that can be created is capped at the number of HW state registers the device can support (eg one for Habana). This is different from the number of FDs pointing at the struct file. Userspace can open a HW state and point a lot of FDs at it, that is userspace's problem. From a kernel view they all share one struct file and thus one HW state. Jason ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-08 17:55 ` Jason Gunthorpe @ 2022-08-09 6:23 ` Greg Kroah-Hartman -1 siblings, 0 replies; 78+ messages in thread From: Greg Kroah-Hartman @ 2022-08-09 6:23 UTC (permalink / raw) To: Jason Gunthorpe Cc: Oded Gabbay, Dave Airlie, dri-devel, Yuji Ishikawa, Jiho Chu, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org On Mon, Aug 08, 2022 at 02:55:28PM -0300, Jason Gunthorpe wrote: > On Mon, Aug 08, 2022 at 08:10:22AM +0200, Greg Kroah-Hartman wrote: > > On Sun, Aug 07, 2022 at 02:25:33PM +0300, Oded Gabbay wrote: > > > 2. Common code to handle drivers that want to allow a single user at a > > > time to open the device char file. > > > > Note, that's an impossible request, and one that the kernel should never > > worry about, so don't even try it. Think about userspace doing a call > > to dup() on an open char file descriptor and then passing that off > > somewhere else. > > Oded is talking about a model like VFIO has where the HW has a limited > number of concurrent state registers - let's say in this case the ASID > translation mapping the accelerator into DMA. Based on the number of drivers that I see submitted weekly that try to restrict their open call to just one user by using atomic variables or other tricks, I think my interpretation of this stands :) > Each 'struct file' that is created owns one of those HW state > registers, and each struct file is completely isolated from all > others. eg someone controlling the accelerator through struct file A > cannot access memory mapped into the accelerator through struct file > B. > > So, the number of struct files that can be created is capped at the > number of HW state registers the device can support (eg one for > Habana). > > This is different from the number of FDs pointing at the struct file. > Userspace can open a HW state and point a lot of FDs at it, that is > userspace's problem. From a kernel view they all share one struct file > and thus one HW state.
Yes, that's fine, if that is what is happening here, I have no objection. greg k-h ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-09 6:23 ` Greg Kroah-Hartman (?) @ 2022-08-09 8:04 ` Christoph Hellwig 2022-08-09 8:32 ` Arnd Bergmann 2022-08-09 8:45 ` Greg Kroah-Hartman -1 siblings, 2 replies; 78+ messages in thread From: Christoph Hellwig @ 2022-08-09 8:04 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Jason Gunthorpe, Oded Gabbay, Dave Airlie, dri-devel, Yuji Ishikawa, Jiho Chu, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org On Tue, Aug 09, 2022 at 08:23:27AM +0200, Greg Kroah-Hartman wrote: > Based on the number of drivers that I see submitted weekly that try to > restrict their open call to just one user by using atomic variables or > other tricks, I think my interpretation of this stands :) I think they really want what Jason described most of the time. They just don't know about the pitfalls of dup yet. > > This is different from the number of FDs pointing at the struct file. > > Userspace can open a HW state and point a lot of FDs at it, that is > > userspace's problem. From a kernel view they all share one struct file > > and thus one HW state. > > Yes, that's fine, if that is what is happening here, I have no > objection. It would be great if we could actually lift that into a common layer (chardev or vfs) given just how common this is, and how often drivers get it wrong or do it suboptimally. ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-09 8:04 ` Christoph Hellwig @ 2022-08-09 8:32 ` Arnd Bergmann 2022-08-09 8:45 ` Greg Kroah-Hartman 1 sibling, 0 replies; 78+ messages in thread From: Arnd Bergmann @ 2022-08-09 8:32 UTC (permalink / raw) To: Christoph Hellwig Cc: Greg Kroah-Hartman, Jason Gunthorpe, Oded Gabbay, Dave Airlie, dri-devel, Yuji Ishikawa, Jiho Chu, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org On Tue, Aug 9, 2022 at 10:04 AM Christoph Hellwig <hch@infradead.org> wrote: > On Tue, Aug 09, 2022 at 08:23:27AM +0200, Greg Kroah-Hartman wrote: > > > This is different from the number of FDs pointing at the struct file. > > > Userspace can open a HW state and point a lot of FDs at it, that is > > > userspace's problem. From a kernel view they all share one struct file > > > and thus one HW state. > > > > Yes, that's fine, if that is what is happening here, I have no > > objection. > > It would be great if we could actually lift that into a common > layer (chardev or vfs) given just how common this is, and how often > drivers get it wrong or do it suboptimally. Totally agreed. I think for devices with hardware MMU contexts you actually want to bind the context to a 'mm_struct', and then ensure that the context is only ever used from a process that shares this mm_struct, regardless of who else has access to the same file or inode. We did this in a somewhat hacky way in spufs a long time ago, and I would expect this to be solved more cleanly in drivers/gpu/drm, but I don't know where to look for that code. Arnd ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-09 8:32 ` Arnd Bergmann @ 2022-08-09 12:18 ` Jason Gunthorpe -1 siblings, 0 replies; 78+ messages in thread From: Jason Gunthorpe @ 2022-08-09 12:18 UTC (permalink / raw) To: Arnd Bergmann Cc: Christoph Hellwig, Greg Kroah-Hartman, Oded Gabbay, Dave Airlie, dri-devel, Yuji Ishikawa, Jiho Chu, Linux-Kernel@Vger. Kernel. Org On Tue, Aug 09, 2022 at 10:32:27AM +0200, Arnd Bergmann wrote: > On Tue, Aug 9, 2022 at 10:04 AM Christoph Hellwig <hch@infradead.org> wrote: > > On Tue, Aug 09, 2022 at 08:23:27AM +0200, Greg Kroah-Hartman wrote: > > > > This is different from the number of FDs pointing at the struct file. > > > > Userspace can open a HW state and point a lot of FDs at it, that is > > > > userspace's problem. From a kernel view they all share one struct file > > > > and thus one HW state. > > > > > > Yes, that's fine, if that is what is happening here, I have no > > > objection. > > > > It would be great if we could actually lift that into a common > > layer (chardev or vfs) given just how common this is, and how often > > drivers get it wrong or do it suboptimally. > > Totally agreed. > > I think for devices with hardware MMU contexts you actually want to > bind the context to a 'mm_struct', and then ensure that the context > is only ever used from a process that shares this mm_struct, > regardless of who else has access to the same file or inode. I can't think of a security justification for this. If process A stuffs part of its address space into the device and passes the FD to process B which can then access that address space, how is it any different from process A making a tmpfs, mmaping it, and passing it to process B which can then access that address space? IMHO the 'struct file' is the security domain and a process must be careful to only allow FDs to be created that meet its security needs.
The kernel should not be involved in security here any further than using the standard FD security mechanisms. Who is to say there isn't a meaningful dual process use case for the accelerator? We see dual-process designs regularly in networking accelerators. Jason ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-09 12:18 ` Jason Gunthorpe @ 2022-08-09 12:46 ` Arnd Bergmann -1 siblings, 0 replies; 78+ messages in thread From: Arnd Bergmann @ 2022-08-09 12:46 UTC (permalink / raw) To: Jason Gunthorpe Cc: Arnd Bergmann, Christoph Hellwig, Greg Kroah-Hartman, Oded Gabbay, Dave Airlie, dri-devel, Yuji Ishikawa, Jiho Chu, Linux-Kernel@Vger. Kernel. Org On Tue, Aug 9, 2022 at 2:18 PM Jason Gunthorpe <jgg@nvidia.com> wrote: > On Tue, Aug 09, 2022 at 10:32:27AM +0200, Arnd Bergmann wrote: > > On Tue, Aug 9, 2022 at 10:04 AM Christoph Hellwig <hch@infradead.org> wrote: > > > > I think for devices with hardware MMU contexts you actually want to > > bind the context to a 'mm_struct', and then ensure that the context > > is only ever used from a process that shares this mm_struct, > > regardless of who else has access to the same file or inode. > > I can't think of a security justification for this. > > If process A stuffs part of its address space into the device and > passes the FD to process B which can then access that address space, > how is it any different from process A making a tmpfs, mmaping it, and > passing it to process B which can then access that address space? > > IMHO the 'struct file' is the security domain and a process must be > careful to only allow FDs to be created that meet its security needs. > > The kernel should not be involved in security here any further than > using the standard FD security mechanisms. > > Who is to say there isn't a meaningful dual process use case for the > accelerator? We see dual-process designs regularly in networking > accelerators. I think there is a lot of value in keeping things simple here, to make sure users and developers don't make a mess of it.
If the accelerator has access to the memory of one process but I run it from another process, I lose the benefits of the shared page tables, but gain a lot of the complexity that I think the 'single-opener' logic was trying (without success) to avoid. E.g. attaching a debugger to single-step through the accelerator code would then involve at least four address spaces: - the addresses of the debugger - the addresses local to the accelerator - addresses in the shared address space of the process that owns the memory - addresses in the process that owns the accelerator context which is at least one more than I'd like to deal with. This is somewhat different for accelerators that have coherent access to a process address space and only access explicitly shared buffers. On these you could more easily allow sharing the file descriptor between any number of processes. Arnd ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-09 12:46 ` Arnd Bergmann @ 2022-08-09 14:22 ` Jason Gunthorpe -1 siblings, 0 replies; 78+ messages in thread From: Jason Gunthorpe @ 2022-08-09 14:22 UTC (permalink / raw) To: Arnd Bergmann Cc: Christoph Hellwig, Greg Kroah-Hartman, Oded Gabbay, Dave Airlie, dri-devel, Yuji Ishikawa, Jiho Chu, Linux-Kernel@Vger. Kernel. Org On Tue, Aug 09, 2022 at 02:46:36PM +0200, Arnd Bergmann wrote: > On Tue, Aug 9, 2022 at 2:18 PM Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Tue, Aug 09, 2022 at 10:32:27AM +0200, Arnd Bergmann wrote: > > > On Tue, Aug 9, 2022 at 10:04 AM Christoph Hellwig <hch@infradead.org> wrote: > > > > > > I think for devices with hardware MMU contexts you actually want to > > > bind the context to a 'mm_struct', and then ensure that the context > > > is only ever used from a process that shares this mm_struct, > > > regardless of who else has access to the same file or inode. > > > > I can't think of a security justification for this. > > > > If process A stuffs part of its address space into the device and > > passes the FD to process B which can then access that address space, > > how is it any different from process A making a tmpfs, mmapping it, and > > passing it to process B which can then access that address space? > > > > IMHO the 'struct file' is the security domain and a process must be > > careful to only allow FDs to be created that meet its security needs. > > > > The kernel should not be involved in security here any further than > > using the standard FD security mechanisms. > > > > Who is to say there isn't a meaningful dual process use case for the > > accelerator? We see dual-process designs regularly in networking > > accelerators. > > I think there is a lot of value in keeping things simple here, to > make sure users and developers don't make a mess of it. I don't think the kernel should enforce policy on userspace. 
As long as the kernel side is simple and reasonable then it should let userspace make whatever mess it likes. We have a lot of experience here now, and userspaces do take advantage of this flexibility in more well-established accelerator subsystems, like DRM and RDMA. > If the accelerator has access to the memory of one process but I run > it from another process, I lose the benefits of the shared page > tables, There are several accelerator "ASID" models I've seen - devices can have one or many ASIDs and the ASIDs can be map/unmap style or forced 1:1 with a mm_struct (usually called SVA/SVM). Userspace is responsible for figuring out how to use this stuff. With map/unmap there should be no kernel restriction on what mappings can be created, but often sane userspaces will probably want to stick to 1:1 map/unmap with a single process. > E.g. attaching a debugger to single-step through the accelerator code > would then involve at least four address spaces: > > - the addresses of the debugger > - the addresses local to the accelerator > - addresses in the shared address space of the process that owns > the memory > - addresses in the process that owns the accelerator context > > which is at least one more than I'd like to deal with. It is an FD. There is no "owns the accelerator context" - that concept is an abuse of the FD model, IMHO. If you are debugging you have the mm_struct of each of the threads you are debugging and each of the ASIDs loaded in the accelerator to deal with - it is inherent in the hardware design. > This is somewhat different for accelerators that have coherent > access to a process address space and only access explicitly > shared buffers. On these you could more easily allow sharing the > file descriptor between any number of processes. That is just a multi-ASID accelerator and userspace has linked a unique SVA ASID to each unique process using the FD. 
The thing to understand is that the FD represents the security context, so it is very reasonable that on a multi-ASID device I could share the same security context with two co-operating processes, create two ASIDs and do accelerator operations that work jointly across both memory spaces. For instance I might consume a source from process B, process it and deliver the result into process A where process A will then send it over the network or something. We have these kinds of use models already. Jason ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-09 8:04 ` Christoph Hellwig @ 2022-08-09 8:45 ` Greg Kroah-Hartman 2022-08-09 8:45 ` Greg Kroah-Hartman 1 sibling, 0 replies; 78+ messages in thread From: Greg Kroah-Hartman @ 2022-08-09 8:45 UTC (permalink / raw) To: Christoph Hellwig Cc: Jason Gunthorpe, Oded Gabbay, Dave Airlie, dri-devel, Yuji Ishikawa, Jiho Chu, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org On Tue, Aug 09, 2022 at 01:04:15AM -0700, Christoph Hellwig wrote: > On Tue, Aug 09, 2022 at 08:23:27AM +0200, Greg Kroah-Hartman wrote: > > Based on the number of drivers that I see submitted weekly that try to > > restrict their open call to just one user by using atomic variables or > > other tricks, I think my interpretation of this stands :) > > I think they really want what Jason described most of the time. They > just don't know about the pitfalls of dup yet. > > > > This is different from the number of FDs pointing at the struct file. > > > Userspace can open a HW state and point a lot of FDs at it, that is > > > userspace's problem. From a kernel view they all share one struct file > > > and thus one HW state. > > > > Yes, that's fine, if that is what is happening here, I have no > > objection. > > It would be great if we could actually lift that into a common > layer (chardev or vfs) given just how common this is, and drivers tend > to get it wrong or do it suboptimally so often. No objection from me, I'll gladly take patches to chardev or miscdev to support this. greg k-h ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-07 6:43 ` Oded Gabbay @ 2022-08-08 17:46 ` Jason Gunthorpe -1 siblings, 0 replies; 78+ messages in thread From: Jason Gunthorpe @ 2022-08-08 17:46 UTC (permalink / raw) To: Oded Gabbay Cc: Dave Airlie, dri-devel, Greg Kroah-Hartman, Yuji Ishikawa, Jiho Chu, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org On Sun, Aug 07, 2022 at 09:43:40AM +0300, Oded Gabbay wrote: > 1. If there is a subsystem which is responsible for creating and > exposing the device character files, then there should be some code > that connects between each device driver to that subsystem. > i.e. There should be functions that each driver should call from its > probe and release callback functions. > > Those functions should take care of the following: > - Create metadata for the device, the device's minor(s) and the > driver's ioctls table and driver's callback for file operations (both > are common for all the driver's devices). Save all that metadata with > proper locking. > - Create the device char files themselves and supply file operations > that will be called per each open/close/mmap/etc. > - Keep track of all these objects' lifetime in regard to the device > driver's lifetime, with proper handling for release. > - Add common handling and entries of sysfs/debugfs for these devices > with the ability for each device driver to add their own unique > entries. > > 2. I think that you underestimate (due to your experience) the "using > it properly" part... It is not so easy to do this properly for > inexperienced kernel people. If we provide all the code I mentioned > above, the device driver writer doesn't need to be aware of all these > kernel APIs. This may be, but it still seems weird to me to justify a subsystem as "making existing APIs simpler so drivers don't mess them up". It suggests perhaps we need additional core API helpers? 
> > It would be nice to at least identify something that could obviously > > be common, like some kind of enumeration and metadata kind of stuff > > (think like ethtool, devlink, rdma tool, nvemctl etc) > Definitely. I think we can have at least one ioctl that will be common > to all drivers from the start. Generally you don't want that as an ioctl because you have to open the device to execute it, there may be permissions issues for instance - or if you have a single-open-at-a-time model like VFIO, then it doesn't work well together. Usually this would be sysfs or netlink. > > This makes sense to me, all accelerators need a way to set a memory > > map, but on the other hand we are doing some very nifty stuff in this > > area with iommufd and it might be very interesting to just have the > > accelerator driver link to that API instead of building yet another > > copy of pin_user_pages() code.. Especially with PASID likely becoming > > part of any accelerator toolkit. > > Here I disagree with you. First of all, there are many relatively > simple accelerators, especially in edge, where PASID is really not > relevant. > Second, even for the more sophisticated PCIe/CXL-based ones, PASID is > not mandatory and I suspect that it won't be in 100% of those devices. > But definitely that should be an alternative to the "classic" way of > handling dma'able memory (pin_user_pages()). My point was that iommufd can do the pinning for you and dump that result into an iommu-based PASID, or it can do the pinning for you and allow the driver to translate it into its own page table format, e.g. the ASID in the habana device. We don't need to have map/unmap APIs to manage address spaces in every subsystem. > Maybe this is something that should be discussed in the kernel summit ? Maybe, I expect to be at LPC at least Jason ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-08 17:46 ` Jason Gunthorpe @ 2022-08-08 20:26 ` Oded Gabbay -1 siblings, 0 replies; 78+ messages in thread From: Oded Gabbay @ 2022-08-08 20:26 UTC (permalink / raw) To: Jason Gunthorpe Cc: Dave Airlie, dri-devel, Greg Kroah-Hartman, Yuji Ishikawa, Jiho Chu, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org On Mon, Aug 8, 2022 at 8:46 PM Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Sun, Aug 07, 2022 at 09:43:40AM +0300, Oded Gabbay wrote: > > > 1. If there is a subsystem which is responsible for creating and > > exposing the device character files, then there should be some code > > that connects between each device driver to that subsystem. > > i.e. There should be functions that each driver should call from its > > probe and release callback functions. > > > > Those functions should take care of the following: > > - Create metadata for the device, the device's minor(s) and the > > driver's ioctls table and driver's callback for file operations (both > > are common for all the driver's devices). Save all that metadata with > > proper locking. > > - Create the device char files themselves and supply file operations > > that will be called per each open/close/mmap/etc. > > - Keep track of all these objects' lifetime in regard to the device > > driver's lifetime, with proper handling for release. > > - Add common handling and entries of sysfs/debugfs for these devices > > with the ability for each device driver to add their own unique > > entries. > > > > 2. I think that you underestimate (due to your experience) the "using > > it properly" part... It is not so easy to do this properly for > > inexperienced kernel people. If we provide all the code I mentioned > > above, the device driver writer doesn't need to be aware of all these > > kernel APIs. > > This may be, but it still seems weird to me to justify a subsystem as > "making existing APIs simpler so drivers don't mess them up". 
It > suggests perhaps we need additional core API helpers? > I'm sorry, but my original argument was poorly phrased. I'll try to phrase it better. What I'm trying to say is that imo one of the end goals of doing a common subsystem is to provide a common uAPI for all the drivers that belong to that subsystem. I wrote this argument in a previous email as a criterion for whether a driver should join a subsystem. So if you want a common uAPI and a common userspace library to use it, you need to expose the same device character files for every device, regardless of the driver. e.g. you need all devices to be called /dev/accelX and not /dev/habanaX or /dev/nvidiaX. This means that the whole device character creation/removal/open/close is done in the common subsystem, not in each driver individually. So even if it is simple code as you said, it still must reside in the subsystem common code. Once you put that code there, you need to add meta-data as different drivers attach to the subsystem and ask to create devices and minors when their probe function is called. In addition, you need to remove all this code from each individual driver. That's what I mean by abstracting all this kernel API from the drivers. Not because it is an API that is hard to use, but because the drivers should *not* use it at all. I think drm did that pretty well. Their code defines objects for driver, device and minors, with a resource manager that will take care of releasing the objects automatically (it is based on devres.c). > > > It would be nice to at least identify something that could obviously > > > be common, like some kind of enumeration and metadata kind of stuff > > > (think like ethtool, devlink, rdma tool, nvemctl etc) > > Definitely. I think we can have at least one ioctl that will be common > > to all drivers from the start. 
> > Generally you don't want that as an ioctl because you have to open the > device to execute it, there may be permissions issues for instance - > or if you have a single-open-at-a-time model like VFIO, then it > doesn't work well together. > > Usually this would be sysfs or netlink. So actually I do want an ioctl but as you said, not for the main device char, but for an accompanying control device char. In habana we define two device chars per device. One is the compute which behaves like VFIO, and one is a control device which has no limitation on the number of applications that can access it. However, an application only has access to the information ioctl through this device char (so it can't submit anything, allocate memory, etc.) and can only retrieve metrics which do not leak information about the compute application. The reason I want an ioctl is because it is much more flexible than sysfs and allows writing proper software to monitor the device in the data-center. At least, that's my experience from the deployment we had so far. 
> > My point was that iommufd can do the pinning for you and dump that > result into a iommu based PASID, or it can do the pinning for you and > allow the driver to translate it into its own page table format eg the > ASID in the habana device. > > We don't need to have map/unmap APIs to manage address spaces in every > subsystem. I see, I will need to learn this more in-depth if/when it will become relevant. But of course I agree that reusing existing code is much better. > > > Maybe this is something that should be discussed in the kernel summit ? > > Maybe, I expect to be at LPC at least Great! Oded > > Jason ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices @ 2022-08-08 20:26 ` Oded Gabbay 0 siblings, 0 replies; 78+ messages in thread From: Oded Gabbay @ 2022-08-08 20:26 UTC (permalink / raw) To: Jason Gunthorpe Cc: Arnd Bergmann, Greg Kroah-Hartman, Linux-Kernel@Vger. Kernel. Org, dri-devel, Jiho Chu, Yuji Ishikawa On Mon, Aug 8, 2022 at 8:46 PM Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Sun, Aug 07, 2022 at 09:43:40AM +0300, Oded Gabbay wrote: > > > 1. If there is a subsystem which is responsible for creating and > > exposing the device character files, then there should be some code > > that connects between each device driver to that subsystem. > > i.e. There should be functions that each driver should call from its > > probe and release callback functions. > > > > Those functions should take care of the following: > > - Create metadata for the device, the device's minor(s) and the > > driver's ioctls table and driver's callback for file operations (both > > are common for all the driver's devices). Save all that metadata with > > proper locking. > > - Create the device char files themselves and supply file operations > > that will be called per each open/close/mmap/etc. > > - Keep track of all these objects' lifetime in regard to the device > > driver's lifetime, with proper handling for release. > > - Add common handling and entries of sysfs/debugfs for these devices > > with the ability for each device driver to add their own unique > > entries. > > > > 2. I think that you underestimate (due to your experience) the "using > > it properly" part... It is not so easy to do this properly for > > inexperienced kernel people. If we provide all the code I mentioned > > above, the device driver writer doesn't need to be aware of all these > > kernel APIs. > > This may be, but it still seems weird to me to justify a subsystem as > "making existing APIs simpler so drivers don't mess them up". It > suggests perhaps we need additional core API helpers? 
> I'm sorry, but my original argument was poorly phrased. I'll try to phrase it better. What I'm trying to say is that imo one of the end goals of doing a common subsystem is to provide a common uAPI for all the drivers that belong to that subsystem. I wrote this argument in a previous email as a criteria whether a driver should join a subsystem. So if you want a common uAPI and a common userspace library to use it, you need to expose the same device character files for every device, regardless of the driver. e.g. you need all devices to be called /dev/accelX and not /dev/habanaX or /dev/nvidiaX This means that the whole device character creation/removal/open/close is done in the common subsystem, not in each driver individually. So even if it is a simple code as you said, it still must reside in the subsystem common code. Once you put that code there, you need to add meta-data as different drivers attach to the subsystem and ask to create devices and minors when their probe function is called. In addition, you need to remove all this code from each individual driver. That's what I mean by abstracting all this kernel API from the drivers. Not because it is an API that is hard to use, but because the drivers should *not* use it at all. I think drm did that pretty well. Their code defines objects for driver, device and minors, with resource manager that will take care of releasing the objects automatically (it is based on devres.c). > > > It would be nice to at least identify something that could obviously > > > be common, like some kind of enumeration and metadata kind of stuff > > > (think like ethtool, devlink, rdma tool, nvemctl etc) > > Definitely. I think we can have at least one ioctl that will be common > > to all drivers from the start. 
> > Generally you don't want that as an ioctl because you have to open the > device to execute it, there may be permissions issues for instance - > or if you have a single-open-at-a-time model like VFIO, then it > doesn't work well together. > > Usually this would be sysfs or netlink. So actually I do want an ioctl but, as you said, not for the main device char, but for an accompanying control device char. In habana we define two device chars per device. One is the compute char, which behaves like VFIO, and one is a control device which has no limitation on the number of applications that can access it. However, an application only has access to the information ioctl through this device char (so it can't submit anything, allocate memory, etc.) and can only retrieve metrics which do not leak information about the compute application. The reason I want an ioctl is because it is much more flexible than sysfs and allows writing proper software to monitor the device in the data-center. At least, that's my experience from the deployment we had so far. > > > > This makes sense to me, all accelerators need a way to set a memory > > > map, but on the other hand we are doing some very nifty stuff in this > > > area with iommufd and it might be very interesting to just have the > > > accelerator driver link to that API instead of building yet another > > > copy of pin_user_pages() code.. Especially with PASID likely becoming > > > part of any accelerator toolkit. > > > > Here I disagree with you. First of all, there are many relatively > > simple accelerators, especially in edge, where PASID is really not > > relevant. > > Second, even for the more sophisticated PCIe/CXL-based ones, PASID is > > not mandatory and I suspect that it won't be in 100% of those devices. > > But definitely that should be an alternative to the "classic" way of > > handling dma'able memory (pin_user_pages()). 
> > My point was that iommufd can do the pinning for you and dump that > result into an iommu-based PASID, or it can do the pinning for you and > allow the driver to translate it into its own page table format eg the > ASID in the habana device. > > We don't need to have map/unmap APIs to manage address spaces in every > subsystem. I see, I will need to learn this more in-depth if/when it becomes relevant. But of course I agree that reusing existing code is much better. > > > Maybe this is something that should be discussed in the kernel summit ? > > Maybe, I expect to be at LPC at least Great! Oded > > Jason ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-08 20:26 ` Oded Gabbay @ 2022-08-09 12:43 ` Jason Gunthorpe -1 siblings, 0 replies; 78+ messages in thread From: Jason Gunthorpe @ 2022-08-09 12:43 UTC (permalink / raw) To: Oded Gabbay Cc: Dave Airlie, dri-devel, Greg Kroah-Hartman, Yuji Ishikawa, Jiho Chu, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org On Mon, Aug 08, 2022 at 11:26:11PM +0300, Oded Gabbay wrote: > So if you want a common uAPI and a common userspace library to use it, > you need to expose the same device character files for every device, > regardless of the driver. e.g. you need all devices to be called > /dev/accelX and not /dev/habanaX or /dev/nvidiaX So, this is an interesting idea. One of the things we did in RDMA that turned out very well is to have the user side of the kernel/user API in a single git repo for all the drivers, including the lowest layer of the driver-specific APIs. It gives a reasonable target for a DRM-like test of "you must have a userspace". Ie send your userspace and userspace documentation/tests before your kernel side can be merged. Even if it is just a git repo collecting and curating driver-specific libraries under the "accel" banner it could be quite a useful activity. But, probably this boils down to things that look like: device = habana_open_device() habana_mooo(device) device = nvidia_open_device() nvidia_baaa(device) > That's what I mean by abstracting all this kernel API from the > drivers. Not because it is an API that is hard to use, but because the > drivers should *not* use it at all. > > I think drm did that pretty well. Their code defines objects for > driver, device and minors, with resource manager that will take care > of releasing the objects automatically (it is based on devres.c). We have lots of examples of subsystems doing this - the main thing unique about accel is that there is really no shared uAPI between the drivers, and no 'abstraction' provided by the kernel. 
Maybe that is the point.. > So actually I do want an ioctl but as you said, not for the main > device char, but to an accompanied control device char. There is a general problem across all these "thick" devices in the kernel to support their RAS & configuration requirements and IMHO we don't have a good answer at all. We've been talking on and off here about having some kind of subsystem/methodology specifically for this area - how to monitor, configure, service, etc a very complicated off-CPU device. I think there would be a lot of interest in this and maybe it shouldn't be coupled to this accel idea. Eg we already have some established mechanisms - I would expect any accel device to be able to introspect and upgrade its flash FW using the 'devlink flash' common API. > an application only has access to the information ioctl through this > device char (so it can't submit anything, allocate memory, etc.) and > can only retrieve metrics which do not leak information about the > compute application. This is often being done over a netlink socket as the "second char" Jason ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-04 7:43 ` Oded Gabbay @ 2022-08-05 3:02 ` Dave Airlie -1 siblings, 0 replies; 78+ messages in thread From: Dave Airlie @ 2022-08-05 3:02 UTC (permalink / raw) To: Oded Gabbay Cc: dri-devel, Greg Kroah-Hartman, Yuji Ishikawa, Jiho Chu, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org On Thu, 4 Aug 2022 at 17:44, Oded Gabbay <oded.gabbay@gmail.com> wrote: > > On Thu, Aug 4, 2022 at 2:54 AM Dave Airlie <airlied@gmail.com> wrote: > > > > On Thu, 4 Aug 2022 at 06:21, Oded Gabbay <oded.gabbay@gmail.com> wrote: > > > > > > On Wed, Aug 3, 2022 at 10:04 PM Dave Airlie <airlied@gmail.com> wrote: > > > > > > > > On Sun, 31 Jul 2022 at 22:04, Oded Gabbay <oded.gabbay@gmail.com> wrote: > > > > > > > > > > Hi, > > > > > Greg and I talked a couple of months ago about preparing a new accel > > > > > subsystem for compute/acceleration devices that are not GPUs and I > > > > > think your drivers that you are now trying to upstream fit it as well. > > > > > > > > We've had some submissions for not-GPUs to the drm subsystem recently. > > > > > > > > Intel GNA, Intel VPU, NVDLA, rpmsg AI processor unit. > > > > > > > > why is creating a new subsystem at this time necessary? > > > > > > > > Are we just creating a subsystem to avoid the open source userspace > > > > consumer rules? Or do we have some concrete reasoning behind it? > > > > > > > > Dave. > > > > > > Hi Dave. > > > The reason it happened now is because I saw two drivers, which are > > > doing h/w acceleration for AI, trying to be accepted to the misc > > > subsystem. > > > Add to that the fact I talked with Greg a couple of months ago about > > > doing a subsystem for any compute accelerators, which he was positive > > > about, I thought it is a good opportunity to finally do it. 
> > > > > > I also honestly think that I can contribute much to these drivers from > > > my experience with the habana driver (which is now deployed in mass at > > > AWS) and contribute code from the habana driver to a common framework > > > for AI drivers. > > > > Why not port the habana driver to drm now instead? I don't get why it > > wouldn't make sense? > > imho, no, I don't see the upside. This is not a trivial change, and > will require a large effort. What will it give me that I need and I > don't have now ? The opportunity for review, code sharing, experience of locking hierarchy, mm integration? IMHO the biggest thing that drm has is the community of people who understand accelerators, memory management, userspace command submissions, fencing, dma-buf etc. It's hard to have input to those discussions from the outside, and they are always ongoing. I think one of the Intel teams reported dropping a lot of code on their drm port due to stuff already being there, I'd expect the same for you. The opposite question is also valid: how does moving to a new subsystem help you or others, when there is one with all the infrastructure and, more importantly, reviewers? I'd be happy to have accelerator submaintainers, I'd be happy to even create an ACCELERATOR property for non-gpu drivers, so they can opt out of some of the GPUier stuff, like multiple device node users etc, or even create a new class of device nodes under /dev/dri. > I totally agree. We need to set some rules and make sure everyone in > the kernel community is familiar with them, because now you get > different answers based on who you consult with. > > My rules of thumb that I thought of was that if you don't have any > display (you don't need to support X/wayland) and you don't need to > support opengl/vulkan/opencl/directx or any other gpu-related software > stack, then you don't have to go through drm. 
> In other words, if you don't have gpu-specific h/w and/or you don't > need gpu uAPI, you don't belong in drm. What happens when NVIDIA or Intel submits a driver for just compute, for what is actually a GPU? This has been suggested as a workaround for our userspace rules a few times. If my GPU can do compute tasks, do I have to add an accelerator subsystem driver alongside my GPU one? > After all, memory management services, or common device chars handling > I can get from other subsystems (e.g. rdma) as well. I'm sure I could > model my uAPI to be rdma uAPI compliant (I can define proprietary uAPI > there as well), but this doesn't mean I belong there, right ? Good point, but I think accelerators do mostly belong in drm or media, because there is enough framework around them to allow them to work, without reinventing everything. > > > > I think the one area I can see a divide where a new subsystem is for > > accelerators that are single-user, one shot type things like media > > drivers (though maybe they could be just media drivers). > > > > I think anything that does command offloading to firmware or queues > > belongs in drm, because that is pretty much what the framework does. I > I think this is a very broad statement which doesn't reflect reality > in the kernel. I think the habanalabs driver is one of the only ones that is outside this really, and in major use. There might be one or two other minor drivers with no real users. Dave. ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-05 3:02 ` Dave Airlie @ 2022-08-07 6:50 ` Oded Gabbay -1 siblings, 0 replies; 78+ messages in thread From: Oded Gabbay @ 2022-08-07 6:50 UTC (permalink / raw) To: Dave Airlie Cc: dri-devel, Greg Kroah-Hartman, Yuji Ishikawa, Jiho Chu, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org On Fri, Aug 5, 2022 at 6:03 AM Dave Airlie <airlied@gmail.com> wrote: > > On Thu, 4 Aug 2022 at 17:44, Oded Gabbay <oded.gabbay@gmail.com> wrote: > > > > On Thu, Aug 4, 2022 at 2:54 AM Dave Airlie <airlied@gmail.com> wrote: > > > > > > On Thu, 4 Aug 2022 at 06:21, Oded Gabbay <oded.gabbay@gmail.com> wrote: > > > > > > > > On Wed, Aug 3, 2022 at 10:04 PM Dave Airlie <airlied@gmail.com> wrote: > > > > > > > > > > On Sun, 31 Jul 2022 at 22:04, Oded Gabbay <oded.gabbay@gmail.com> wrote: > > > > > > > > > > > > Hi, > > > > > > Greg and I talked a couple of months ago about preparing a new accel > > > > > > subsystem for compute/acceleration devices that are not GPUs and I > > > > > > think your drivers that you are now trying to upstream fit it as well. > > > > > > > > > > We've had some submissions for not-GPUs to the drm subsystem recently. > > > > > > > > > > Intel GNA, Intel VPU, NVDLA, rpmsg AI processor unit. > > > > > > > > > > why is creating a new subsystem at this time necessary? > > > > > > > > > > Are we just creating a subsystem to avoid the open source userspace > > > > > consumer rules? Or do we have some concrete reasoning behind it? > > > > > > > > > > Dave. > > > > > > > > Hi Dave. > > > > The reason it happened now is because I saw two drivers, which are > > > > doing h/w acceleration for AI, trying to be accepted to the misc > > > > subsystem. > > > > Add to that the fact I talked with Greg a couple of months ago about > > > > doing a subsystem for any compute accelerators, which he was positive > > > > about, I thought it is a good opportunity to finally do it. 
> > > > > > > > I also honestly think that I can contribute much to these drivers from > > > > my experience with the habana driver (which is now deployed in mass at > > > > AWS) and contribute code from the habana driver to a common framework > > > > for AI drivers. > > > > > > Why not port the habana driver to drm now instead? I don't get why it > > > wouldn't make sense? > > > > imho, no, I don't see the upside. This is not a trivial change, and > > will require a large effort. What will it give me that I need and I > > don't have now ? > > The opportunity for review, code sharing, experience of locking > hierarchy, mm integration? > > IMHO The biggest thing that drm has is the community of people who > understand accelerators, memory management, userspace command > submissions, fencing, dma-buf etc. > > It's hard to have input to those discussions from the outside, and > they are always ongoing. > > I think one of the Intel teams reported dropping a lot of code on > their drm port due to stuff already being there, I'd expect the same > for you. > > The opposite question is also valid, what does moving to a new > subsystem help you or others, when there is one with all the > infrastructure and more importantly reviewers. > > I'd be happy to have accelerator submaintainers, I'd be happy to even > create an ACCELERATOR property for non-gpu drivers, so they can opt > out of some of the GPUier stuff, like multiple device node users etc, > or even create a new class of device nodes under /dev/dri. > I'm taking all that you wrote seriously, these are all good points. As I wrote to Jason, I don't want to jump the gun here. I think we should discuss this and explore the possibilities that you suggested because I would like to reach consensus if possible. Maybe this is something we can discuss in LPC or in the kernel summit? Oded > > > I totally agree. 
We need to set some rules and make sure everyone in > > the kernel community is familiar with them, because now you get > > different answers based on who you consult with. > > > > My rules of thumb that I thought of was that if you don't have any > > display (you don't need to support X/wayland) and you don't need to > > support opengl/vulkan/opencl/directx or any other gpu-related software > > stack, then you don't have to go through drm. > > In other words, if you don't have gpu-specific h/w and/or you don't > > need gpu uAPI, you don't belong in drm. > > What happens when NVIDIA submit a driver for just compute or intel? > for what is actually a GPU? > This has been suggested as workaround for our userspace rules a few times. > > If my GPU can do compute tasks, do I have to add an accelerator > subsystem driver alongside my GPU one? > > > After all, memory management services, or common device chars handling > > I can get from other subsystems (e.g. rdma) as well. I'm sure I could > > model my uAPI to be rdma uAPI compliant (I can define proprietary uAPI > > there as well), but this doesn't mean I belong there, right ? > > Good point, but I think accelerators do mostly belong in drm or media, > because there is enough framework around them to allow them to work, > without reinventing everything. > > > > > > > I think the one area I can see a divide where a new subsystem is for > > > accelerators that are single-user, one shot type things like media > > > drivers (though maybe they could be just media drivers). > > > > > > I think anything that does command offloading to firmware or queues > > > belongs in drm, because that is pretty much what the framework does. I > > I think this is a very broad statement which doesn't reflect reality > > in the kernel. > > I think the habanalabs driver is one of the only ones that is outside > this really, and in major use. There might be one or two other minor > drivers with no real users. > > Dave. 
^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-07 6:50 ` Oded Gabbay @ 2022-08-09 21:42 ` Oded Gabbay -1 siblings, 0 replies; 78+ messages in thread From: Oded Gabbay @ 2022-08-09 21:42 UTC (permalink / raw) To: Dave Airlie, Greg Kroah-Hartman, Yuji Ishikawa, Jiho Chu Cc: dri-devel, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org, Jason Gunthorpe On Sun, Aug 7, 2022 at 9:50 AM Oded Gabbay <oded.gabbay@gmail.com> wrote: > > On Fri, Aug 5, 2022 at 6:03 AM Dave Airlie <airlied@gmail.com> wrote: > > > > On Thu, 4 Aug 2022 at 17:44, Oded Gabbay <oded.gabbay@gmail.com> wrote: > > > > > > On Thu, Aug 4, 2022 at 2:54 AM Dave Airlie <airlied@gmail.com> wrote: > > > > > > > > On Thu, 4 Aug 2022 at 06:21, Oded Gabbay <oded.gabbay@gmail.com> wrote: > > > > > > > > > > On Wed, Aug 3, 2022 at 10:04 PM Dave Airlie <airlied@gmail.com> wrote: > > > > > > > > > > > > On Sun, 31 Jul 2022 at 22:04, Oded Gabbay <oded.gabbay@gmail.com> wrote: > > > > > > > > > > > > > > Hi, > > > > > > > Greg and I talked a couple of months ago about preparing a new accel > > > > > > > subsystem for compute/acceleration devices that are not GPUs and I > > > > > > > think your drivers that you are now trying to upstream fit it as well. > > > > > > > > > > > > We've had some submissions for not-GPUs to the drm subsystem recently. > > > > > > > > > > > > Intel GNA, Intel VPU, NVDLA, rpmsg AI processor unit. > > > > > > > > > > > > why is creating a new subsystem at this time necessary? > > > > > > > > > > > > Are we just creating a subsystem to avoid the open source userspace > > > > > > consumer rules? Or do we have some concrete reasoning behind it? > > > > > > > > > > > > Dave. > > > > > > > > > > Hi Dave. > > > > > The reason it happened now is because I saw two drivers, which are > > > > > doing h/w acceleration for AI, trying to be accepted to the misc > > > > > subsystem. 
> > > > > Add to that the fact I talked with Greg a couple of months ago about > > > > > doing a subsystem for any compute accelerators, which he was positive > > > > > about, I thought it is a good opportunity to finally do it. > > > > > > > > > > I also honestly think that I can contribute much to these drivers from > > > > > my experience with the habana driver (which is now deployed in mass at > > > > > AWS) and contribute code from the habana driver to a common framework > > > > > for AI drivers. > > > > > > > > Why not port the habana driver to drm now instead? I don't get why it > > > > wouldn't make sense? > > > > > > imho, no, I don't see the upside. This is not a trivial change, and > > > will require a large effort. What will it give me that I need and I > > > don't have now ? > > > > The opportunity for review, code sharing, experience of locking > > hierarchy, mm integration? > > > > IMHO The biggest thing that drm has is the community of people who > > understand accelerators, memory management, userspace command > > submissions, fencing, dma-buf etc. > > > > It's hard to have input to those discussions from the outside, and > > they are always ongoing. > > > > I think one of the Intel teams reported dropping a lot of code on > > their drm port due to stuff already being there, I'd expect the same > > for you. > > > > The opposite question is also valid, what does moving to a new > > subsystem help you or others, when there is one with all the > > infrastructure and more importantly reviewers. > > > > I'd be happy to have accelerator submaintainers, I'd be happy to even > > create an ACCELERATOR property for non-gpu drivers, so they can opt > > out of some of the GPUier stuff, like multiple device node users etc, > > or even create a new class of device nodes under /dev/dri. > > > I'm taking all what you wrote seriously, these are all good points. > As I wrote to Jason, I don't want to jump the gun here. 
I think we > should discuss this and explore the possibilities that you suggested > because I would like to reach consensus if possible. > Maybe this is something we can discuss in LPC or in the kernel summit ? > > Oded Hi Jiho, Yuji. I want to update that I'm currently in discussions with Dave to figure out what's the best way to move forward. We are writing it down to do a proper comparison between the two paths (new accel subsystem or using drm). I guess it will take a week or so. In the meantime, I'm putting the accel code on hold. I have only managed to do the very basic infra and add a demo driver that shows how to register and unregister from it. You can check the code at: https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux.git/log/?h=accel It has two commits. The first adds the subsystem code and the second adds the demo driver. The subsystem code is basically drm code copied and renamed and slightly modified, but I really only worked on it for a couple of hours so take that into consideration. The important thing is that the demo driver shows the basic steps are really simple. You need to add two function calls in your probe and one function call in your release. Of course you will need to supply some function callbacks, but I haven't got to fill that in the demo driver. Once you register, you get /dev/accel/ac0 and /dev/accel/ac_controlD64 (if you want a control device). If I were to continue this, the next step is to do the open and close part. I will update once we know where things are heading. As I said, I imagine it can take a few weeks. Thanks, Oded Oded > > > > > > I totally agree. We need to set some rules and make sure everyone in > > > the kernel community is familiar with them, because now you get > > > different answers based on who you consult with. 
> > > > > > My rules of thumb that I thought of was that if you don't have any > > > display (you don't need to support X/wayland) and you don't need to > > > support opengl/vulkan/opencl/directx or any other gpu-related software > > > stack, then you don't have to go through drm. > > > In other words, if you don't have gpu-specific h/w and/or you don't > > > need gpu uAPI, you don't belong in drm. > > > > What happens when NVIDIA submit a driver for just compute or intel? > > for what is actually a GPU? > > This has been suggested as workaround for our userspace rules a few times. > > > > If my GPU can do compute tasks, do I have to add an accelerator > > subsystem driver alongside my GPU one? > > > > > After all, memory management services, or common device chars handling > > > I can get from other subsystems (e.g. rdma) as well. I'm sure I could > > > model my uAPI to be rdma uAPI compliant (I can define proprietary uAPI > > > there as well), but this doesn't mean I belong there, right ? > > > > Good point, but I think accelerators do mostly belong in drm or media, > > because there is enough framework around them to allow them to work, > > without reinventing everything. > > > > > > > > > > I think the one area I can see a divide where a new subsystem is for > > > > accelerators that are single-user, one shot type things like media > > > > drivers (though maybe they could be just media drivers). > > > > > > > > I think anything that does command offloading to firmware or queues > > > > belongs in drm, because that is pretty much what the framework does. I > > > I think this is a very broad statement which doesn't reflect reality > > > in the kernel. > > > > I think the habanalabs driver is one of the only ones that is outside > > this really, and in major use. There might be one or two other minor > > drivers with no real users. > > > > Dave. ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-09 21:42 ` Oded Gabbay @ 2022-08-10 9:00 ` Jiho Chu -1 siblings, 0 replies; 78+ messages in thread From: Jiho Chu @ 2022-08-10 9:00 UTC (permalink / raw) To: Oded Gabbay Cc: Dave Airlie, Greg Kroah-Hartman, Yuji Ishikawa, dri-devel, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org, Jason Gunthorpe On Wed, 10 Aug 2022 00:42:24 +0300 Oded Gabbay <oded.gabbay@gmail.com> wrote: > > Hi Jiho, Yuji. > > I want to update that I'm currently in discussions with Dave to figure > out what's the best way to move forward. We are writing it down to do > a proper comparison between the two paths (new accel subsystem or > using drm). I guess it will take a week or so. > > In the meantime, I'm putting the accel code on hold. I have only > managed to do the very basic infra and add a demo driver that shows > how to register and unregister from it. > You can check the code at: > https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux.git/log/?h=accel > > It has two commits. The first adds the subsystem code and the second > adds the demo driver. > The subsystem code is basically drm code copied and renamed and > slightly modified, but I really only worked on it for a couple of > hours so take that into consideration. > > The important thing is that the demo driver shows the basic steps are > really simple. You need to add two function calls in your probe and > one function call in your release. Of course you will need to supply > some function callbacks, but I haven't got to fill that in the demo > driver. Once you register, you get /dev/accel/ac0 and > /dev/accel/ac_controlD64 (if you want a control device). If I were to > continue this, the next step is to do the open and close part. > > I will update once we know where things are heading. As I said, I > imagine it can take a few weeks. > > Thanks, > Oded > Hi Oded, Thanks for sharing your code; it looks like a good start, built from the basic drm code. I hope the discussion leads to a good result. 
Thanks, Jiho Chu ^ permalink raw reply [flat|nested] 78+ messages in thread
* RE: New subsystem for acceleration devices 2022-08-09 21:42 ` Oded Gabbay @ 2022-08-10 14:05 ` yuji2.ishikawa -1 siblings, 0 replies; 78+ messages in thread From: yuji2.ishikawa @ 2022-08-10 14:05 UTC (permalink / raw) To: oded.gabbay, airlied, gregkh, jiho.chu; +Cc: dri-devel, arnd, linux-kernel, jgg > -----Original Message----- > From: Oded Gabbay <oded.gabbay@gmail.com> > Sent: Wednesday, August 10, 2022 6:42 AM > To: Dave Airlie <airlied@gmail.com>; Greg Kroah-Hartman > <gregkh@linuxfoundation.org>; ishikawa yuji(石川 悠司 ○RDC□AITC○ > EA開) <yuji2.ishikawa@toshiba.co.jp>; Jiho Chu <jiho.chu@samsung.com> > Cc: dri-devel <dri-devel@lists.freedesktop.org>; Arnd Bergmann > <arnd@arndb.de>; Linux-Kernel@Vger. Kernel. Org > <linux-kernel@vger.kernel.org>; Jason Gunthorpe <jgg@nvidia.com> > Subject: Re: New subsystem for acceleration devices > > Hi Jiho, Yuji. > > I want to update that I'm currently in discussions with Dave to figure out what's > the best way to move forward. We are writing it down to do a proper comparison > between the two paths (new accel subsystem or using drm). I guess it will take > a week or so. > > In the meantime, I'm putting the accel code on hold. I have only managed to do > the very basic infra and add a demo driver that shows how to register and > unregister from it. > You can check the code at: > https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux.git/log/?h=ac > cel > > It has two commits. The first adds the subsystem code and the second adds the > demo driver. > The subsystem code is basically drm code copied and renamed and slightly > modified, but I really only worked on it for a couple of hours so take that into > consideration. > > The important thing is that the demo driver shows the basic steps are really > simple. You need to add two function calls in your probe and one function call in > your release. Of course you will need to supply some function callbacks, but I > haven't got to fill that in the demo driver. 
Once you register, you get > /dev/accel/ac0 and > /dev/accel/ac_controlD64 (if you want a control device). If I were to continue > this, the next step is to do the open and close part. > > I will update once we know where things are heading. As I said, I imagine it can > take a few weeks. > > Thanks, > Oded Hi Oded, Thank you for uploading the framework as well as a sample. It's exciting to see new software growing up. Since Visconti DNN is a platform device, I'll write some test code to initialize the driver and see if it works. Regards, Yuji ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-10 14:05 ` yuji2.ishikawa @ 2022-08-10 14:37 ` Oded Gabbay -1 siblings, 0 replies; 78+ messages in thread From: Oded Gabbay @ 2022-08-10 14:37 UTC (permalink / raw) To: Yuji Ishikawa Cc: Dave Airlie, Greg Kroah-Hartman, Jiho Chu, Maling list - DRI developers, Arnd Bergmann, Linux-Kernel@Vger. Kernel. Org, Jason Gunthorpe On Wed, Aug 10, 2022 at 5:10 PM <yuji2.ishikawa@toshiba.co.jp> wrote: > > > -----Original Message----- > > From: Oded Gabbay <oded.gabbay@gmail.com> > > Sent: Wednesday, August 10, 2022 6:42 AM > > To: Dave Airlie <airlied@gmail.com>; Greg Kroah-Hartman > > <gregkh@linuxfoundation.org>; ishikawa yuji(石川 悠司 ○RDC□AITC○ > > EA開) <yuji2.ishikawa@toshiba.co.jp>; Jiho Chu <jiho.chu@samsung.com> > > Cc: dri-devel <dri-devel@lists.freedesktop.org>; Arnd Bergmann > > <arnd@arndb.de>; Linux-Kernel@Vger. Kernel. Org > > <linux-kernel@vger.kernel.org>; Jason Gunthorpe <jgg@nvidia.com> > > Subject: Re: New subsystem for acceleration devices > > > > Hi Jiho, Yuji. > > > > I want to update that I'm currently in discussions with Dave to figure out what's > > the best way to move forward. We are writing it down to do a proper comparison > > between the two paths (new accel subsystem or using drm). I guess it will take > > a week or so. > > > > In the meantime, I'm putting the accel code on hold. I have only managed to do > > the very basic infra and add a demo driver that shows how to register and > > unregister from it. > > You can check the code at: > > https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux.git/log/?h=ac > > cel > > > > It has two commits. The first adds the subsystem code and the second adds the > > demo driver. > > The subsystem code is basically drm code copied and renamed and slightly > > modified, but I really only worked on it for a couple of hours so take that into > > consideration. > > > > The important thing is that the demo driver shows the basic steps are really > > simple. 
You need to add two function calls in your probe and one function call in > > your release. Of course you will need to supply some function callbacks, but I > > haven't got to fill that in the demo driver. Once you register, you get > > /dev/accel/ac0 and > > /dev/accel/ac_controlD64 (if you want a control device). If I were to continue > > this, the next step is to do the open and close part. > > > > I will update once we know where things are heading. As I said, I imagine it can > > take a few weeks. > > > > Thanks, > > Oded > > Hi Odded, > Thank you for uploading the framework as well as a sample. > It's exciting to see new software is growing up. > > Since Visconti DNN is a platform device, I'll write some test code to initialize driver and see if it works. > > Regards, > Yuji Platform or PCI, it doesn't matter. You just call it from the probe. But really, this is something I did in a few hours and I stopped because there were some objections and I wanted to first talk about it with Dave. I don't know if it's worth it for you to waste time on it at this point. Thanks, Oded ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-09 21:42 ` Oded Gabbay @ 2022-08-23 18:23 ` Kevin Hilman -1 siblings, 0 replies; 78+ messages in thread From: Kevin Hilman @ 2022-08-23 18:23 UTC (permalink / raw) To: Oded Gabbay, Dave Airlie, Greg Kroah-Hartman, Yuji Ishikawa, Jiho Chu, Alexandre Bailon Cc: dri-devel, Arnd Bergmann, Jason Gunthorpe, Linux-Kernel@Vger. Kernel. Org Hi Oded, Oded Gabbay <oded.gabbay@gmail.com> writes: [...] > I want to update that I'm currently in discussions with Dave to figure > out what's the best way to move forward. We are writing it down to do > a proper comparison between the two paths (new accel subsystem or > using drm). I guess it will take a week or so. Any update on the discussions with Dave? and/or are there any plans to discuss this further at LPC/ksummit yet? We (BayLibre) are upstreaming support for APUs on Mediatek SoCs, and are using the DRM-based approach. I'll also be at LPC and happy to discuss in person. For some context on my/our interest: back in Sept 2020 we initially submitted an rpmsg-based driver for kernel communication[1]. After review comments, we rewrote that based on DRM[2] and are now using it for some MTK SoCs[3] and supporting our MTK customers with it. Hopefully we will get the kernel interfaces sorted out soon, but next, there's the userspace side of things. To that end, we're also working on libAPU, a common, open userspace stack. Alex Bailon recently presented a proposal earlier this year at Embedded Recipes in Paris (video[4], slides[5].) libAPU would include abstractions of the kernel interfaces for DRM (using libdrm), remoteproc/rpmsg, virtio etc. but also goes farther and proposes an open firmware for the accelerator side using libMetal/OpenAMP + rpmsg for communication with (most likely closed source) vendor firmware. Think of this like sound open firmware (SOF[6]), but for accelerators. 
We've been using this successfully for Mediatek SoCs (which have a Cadence VP6 APU) and have submitted/published the code, including the OpenAMP[7] and libmetal[8] parts in addition to the kernel parts already mentioned. We're to the point where we're pretty happy with how this works for MTK SoCs, and wanting to collaborate with folks working on other platforms and to see what's needed to support other kinds of accelerators with a common userspace and open firmware infrastructure. Kevin [1] https://lore.kernel.org/r/20200930115350.5272-1-abailon@baylibre.com [2] https://lore.kernel.org/r/20210917125945.620097-1-abailon@baylibre.com [3] https://lore.kernel.org/r/20210819151340.741565-1-abailon@baylibre.com [4] https://www.youtube.com/watch?v=Uj1FZoF8MMw&t=18211s [5] https://embedded-recipes.org/2022/wp-content/uploads/2022/06/bailon.pdf [6] https://www.sofproject.org/ [7] https://github.com/BayLibre/open-amp/tree/v2021.10-mtk [8] https://github.com/BayLibre/libmetal/tree/v2021.10-mtk ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices @ 2022-08-23 18:23 ` Kevin Hilman 0 siblings, 0 replies; 78+ messages in thread From: Kevin Hilman @ 2022-08-23 18:23 UTC (permalink / raw) To: Oded Gabbay, Dave Airlie, Greg Kroah-Hartman, Yuji Ishikawa, Jiho Chu, Alexandre Bailon Cc: Jason Gunthorpe, Arnd Bergmann, dri-devel, Linux-Kernel@Vger. Kernel. Org Hi Obed, Oded Gabbay <oded.gabbay@gmail.com> writes: [...] > I want to update that I'm currently in discussions with Dave to figure > out what's the best way to move forward. We are writing it down to do > a proper comparison between the two paths (new accel subsystem or > using drm). I guess it will take a week or so. Any update on the discussions with Dave? and/or are there any plans to discuss this further at LPC/ksummit yet? We (BayLibre) are upstreaming support for APUs on Mediatek SoCs, and are using the DRM-based approach. I'll also be at LPC and happy to discuss in person. For some context on my/our interest: back in Sept 2020 we initially submitted an rpmesg based driver for kernel communication[1]. After review comments, we rewrote that based on DRM[2] and are now using it for some MTK SoCs[3] and supporting our MTK customers with it. Hopefully we will get the kernel interfaces sorted out soon, but next, there's the userspace side of things. To that end, we're also working on libAPU, a common, open userspace stack. Alex Bailon recently presented a proposal earlier this year at Embedded Recipes in Paris (video[4], slides[5].) libAPU would include abstractions of the kernel interfaces for DRM (using libdrm), remoteproc/rpmsg, virtio etc. but also goes farther and proposes an open firmware for the accelerator side using libMetal/OpenAMP + rpmsg for communication with (most likely closed source) vendor firmware. Think of this like sound open firmware (SOF[6]), but for accelerators. 
We've been using this successfully for Mediatek SoCs (which have a Cadence VP6 APU) and have submitted/published the code, including the OpenAMP[7] and libmetal[8] parts in addition to the kernel parts already mentioned. We're to the point where we're pretty happy with how this works for MTK SoCs, and wanting to collaborate with folks working on other platforms and to see what's needed to support other kinds of accelerators with a common userspace and open firmware infrastructure. Kevin [1] https://lore.kernel.org/r/20200930115350.5272-1-abailon@baylibre.com [2] https://lore.kernel.org/r/20210917125945.620097-1-abailon@baylibre.com [3] https://lore.kernel.org/r/20210819151340.741565-1-abailon@baylibre.com [4] https://www.youtube.com/watch?v=Uj1FZoF8MMw&t=18211s [5] https://embedded-recipes.org/2022/wp-content/uploads/2022/06/bailon.pdf [6] https://www.sofproject.org/ [7] https://github.com/BayLibre/open-amp/tree/v2021.10-mtk [8] https://github.com/BayLibre/libmetal/tree/v2021.10-mtk ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-23 18:23 ` Kevin Hilman @ 2022-08-23 20:45 ` Oded Gabbay 1 sibling, 0 replies; 78+ messages in thread From: Oded Gabbay @ 2022-08-23 20:45 UTC (permalink / raw) To: Kevin Hilman Cc: Arnd Bergmann, Greg Kroah-Hartman, linux-kernel@vger.kernel.org, dri-devel, Alexandre Bailon, Jason Gunthorpe, Jiho Chu, Yuji Ishikawa On Tue, Aug 23, 2022 at 9:24 PM Kevin Hilman <khilman@baylibre.com> wrote: > > Hi Obed, > > Oded Gabbay <oded.gabbay@gmail.com> writes: > > [...] > > > I want to update that I'm currently in discussions with Dave to figure > > out what's the best way to move forward. We are writing it down to do > > a proper comparison between the two paths (new accel subsystem or > > using drm). I guess it will take a week or so. > > Any update on the discussions with Dave? and/or are there any plans to > discuss this further at LPC/ksummit yet? Hi Kevin. We are still discussing the details, as at least the habanalabs driver is very complex and there are multiple parts that I need to see if and how they can be mapped to drm. Some of us will attend LPC so we will probably take advantage of that to talk more about this. > > We (BayLibre) are upstreaming support for APUs on Mediatek SoCs, and are > using the DRM-based approach. I'll also be at LPC and happy to discuss > in person. > > For some context on my/our interest: back in Sept 2020 we initially > submitted an rpmsg-based driver for kernel communication[1]. After > review comments, we rewrote that based on DRM[2] and are now using it > for some MTK SoCs[3] and supporting our MTK customers with it. > > Hopefully we will get the kernel interfaces sorted out soon, but next, > there's the userspace side of things. To that end, we're also working > on libAPU, a common, open userspace stack. Alex Bailon recently > presented a proposal earlier this year at Embedded Recipes in Paris > (video[4], slides[5]). 
> > libAPU would include abstractions of the kernel interfaces for DRM > (using libdrm), remoteproc/rpmsg, virtio etc. but also goes farther and > proposes an open firmware for the accelerator side using > libMetal/OpenAMP + rpmsg for communication with (most likely closed > source) vendor firmware. Think of this like sound open firmware (SOF[6]), > but for accelerators. I think your device and the habana device are very different in nature, and it is part of what Dave and I discussed, whether these two classes of devices can live together. I guess they can live together in the kernel, but in the userspace, not so much imo. The first class is the edge inference devices (usually as part of some SoC). I think your description of the APU on MTK SoC is a classic example of such a device. You usually have some firmware you load, you give it a graph and pointers for input and output and then you just execute the graph again and again to perform inference and just replace the inputs. The second class is the data-center training accelerators, a class to which habana's gaudi device belongs. These devices usually have a number of different compute engines, a fabric for scaling out, on-device memory, internal MMUs and RAS monitoring requirements. Those devices are usually operated via command queues, either through their kernel driver or directly from user-space. They have multiple APIs for memory management, RAS, scaling-out and command-submissions. > > We've been using this successfully for Mediatek SoCs (which have a > Cadence VP6 APU) and have submitted/published the code, including the > OpenAMP[7] and libmetal[8] parts in addition to the kernel parts already > mentioned. What's the difference between libmetal and other open-source low-level runtime drivers, such as oneAPI level-zero? Currently we have our own runtime driver which is tightly coupled with our h/w. 
For example, the method by which userspace "talks" to the data-plane firmware is very proprietary, as it is hard-wired into the architecture of the entire ASIC and how it performs deep-learning training. Therefore, I don't see how this can be shared with other vendors. Not because of secrecy but because it is simply not relevant to any other ASIC. > > We're to the point where we're pretty happy with how this works for MTK > SoCs, and wanting to collaborate with folks working on other platforms > and to see what's needed to support other kinds of accelerators with a > common userspace and open firmware infrastructure. > > Kevin > > [1] https://lore.kernel.org/r/20200930115350.5272-1-abailon@baylibre.com > [2] https://lore.kernel.org/r/20210917125945.620097-1-abailon@baylibre.com > [3] https://lore.kernel.org/r/20210819151340.741565-1-abailon@baylibre.com > [4] https://www.youtube.com/watch?v=Uj1FZoF8MMw&t=18211s > [5] https://embedded-recipes.org/2022/wp-content/uploads/2022/06/bailon.pdf > [6] https://www.sofproject.org/ > [7] https://github.com/BayLibre/open-amp/tree/v2021.10-mtk > [8] https://github.com/BayLibre/libmetal/tree/v2021.10-mtk ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-23 20:45 ` Oded Gabbay @ 2022-08-29 20:54 ` Kevin Hilman 1 sibling, 0 replies; 78+ messages in thread From: Kevin Hilman @ 2022-08-29 20:54 UTC (permalink / raw) To: Oded Gabbay Cc: Dave Airlie, Greg Kroah-Hartman, Yuji Ishikawa, Jiho Chu, Alexandre Bailon, Jason Gunthorpe, Arnd Bergmann, dri-devel, linux-kernel@vger.kernel.org Hi Oded (and sorry I misspelled your name last time), Oded Gabbay <oded.gabbay@gmail.com> writes: > On Tue, Aug 23, 2022 at 9:24 PM Kevin Hilman <khilman@baylibre.com> wrote: >> >> Hi Obed, >> >> Oded Gabbay <oded.gabbay@gmail.com> writes: >> >> [...] >> >> > I want to update that I'm currently in discussions with Dave to figure >> > out what's the best way to move forward. We are writing it down to do >> > a proper comparison between the two paths (new accel subsystem or >> > using drm). I guess it will take a week or so. >> >> Any update on the discussions with Dave? and/or are there any plans to >> discuss this further at LPC/ksummit yet? > Hi Kevin. > > We are still discussing the details, as at least the habanalabs driver > is very complex and there are multiple parts that I need to see if and > how they can be mapped to drm. > Some of us will attend LPC so we will probably take advantage of that > to talk more about this. OK, looking forward to some more conversations at LPC. >> >> We (BayLibre) are upstreaming support for APUs on Mediatek SoCs, and are >> using the DRM-based approach. I'll also be at LPC and happy to discuss >> in person. >> >> For some context on my/our interest: back in Sept 2020 we initially >> submitted an rpmsg-based driver for kernel communication[1]. After >> review comments, we rewrote that based on DRM[2] and are now using it >> for some MTK SoCs[3] and supporting our MTK customers with it. >> >> Hopefully we will get the kernel interfaces sorted out soon, but next, >> there's the userspace side of things. 
To that end, we're also working >> on libAPU, a common, open userspace stack. Alex Bailon recently >> presented a proposal earlier this year at Embedded Recipes in Paris >> (video[4], slides[5]). >> >> libAPU would include abstractions of the kernel interfaces for DRM >> (using libdrm), remoteproc/rpmsg, virtio etc. but also goes farther and >> proposes an open firmware for the accelerator side using >> libMetal/OpenAMP + rpmsg for communication with (most likely closed >> source) vendor firmware. Think of this like sound open firmware (SOF[6]), >> but for accelerators. > > I think your device and the habana device are very different in > nature, and it is part of what Dave and I discussed, whether these two > classes of devices can live together. I guess they can live together > in the kernel, but in the userspace, not so much imo. Yeah, for now I think focusing on how to handle both classes of devices in the kernel is the most important. > The first class is the edge inference devices (usually as part of some > SoC). I think your description of the APU on MTK SoC is a classic > example of such a device. Correct. > You usually have some firmware you load, you give it a graph and > pointers for input and output and then you just execute the graph > again and again to perform inference and just replace the inputs. > > The second class is the data-center training accelerators, a class to > which habana's gaudi device belongs. These devices usually > have a number of different compute engines, a fabric for scaling out, > on-device memory, internal MMUs and RAS monitoring requirements. Those > devices are usually operated via command queues, either through their > kernel driver or directly from user-space. They have multiple APIs for > memory management, RAS, scaling-out and command-submissions. OK, I see. 
>> >> We've been using this successfully for Mediatek SoCs (which have a >> Cadence VP6 APU) and have submitted/published the code, including the >> OpenAMP[7] and libmetal[8] parts in addition to the kernel parts already >> mentioned. > What's the difference between libmetal and other open-source low-level > runtime drivers, such as oneAPI level-zero? TBH, I'd never heard of oneAPI before, so I'm assuming it's mainly focused on the data center. libmetal/openAMP are widely used in the consumer, industrial embedded space, and heavily used by FPGAs in many market segments. > Currently we have our own runtime driver which is tightly coupled with > our h/w. For example, the method by which userspace "talks" to the > data-plane firmware is very proprietary, as it is hard-wired into the > architecture of the entire ASIC and how it performs deep-learning > training. Therefore, I don't see how this can be shared with other > vendors. Not because of secrecy but because it is simply not relevant > to any other ASIC. OK, makes sense. Thanks for clarifying your use case in more detail. Kevin ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-29 20:54 ` Kevin Hilman @ 2022-09-23 16:21 ` Oded Gabbay 1 sibling, 0 replies; 78+ messages in thread From: Oded Gabbay @ 2022-09-23 16:21 UTC (permalink / raw) To: linux-kernel@vger.kernel.org, Yuji Ishikawa, Jiho Chu, Alexandre Bailon, Kevin Hilman Cc: Dave Airlie, Greg Kroah-Hartman, Jason Gunthorpe, Arnd Bergmann, dri-devel, Daniel Vetter On Mon, Aug 29, 2022 at 11:54 PM Kevin Hilman <khilman@baylibre.com> wrote: > > Hi Oded (and sorry I misspelled your name last time), > > Oded Gabbay <oded.gabbay@gmail.com> writes: > > > On Tue, Aug 23, 2022 at 9:24 PM Kevin Hilman <khilman@baylibre.com> wrote: > >> > >> Hi Obed, > >> > >> Oded Gabbay <oded.gabbay@gmail.com> writes: > >> > >> [...] > >> > >> > I want to update that I'm currently in discussions with Dave to figure > >> > out what's the best way to move forward. We are writing it down to do > >> > a proper comparison between the two paths (new accel subsystem or > >> > using drm). I guess it will take a week or so. > >> > >> Any update on the discussions with Dave? and/or are there any plans to > >> discuss this further at LPC/ksummit yet? > > Hi Kevin. > > > > We are still discussing the details, as at least the habanalabs driver > > is very complex and there are multiple parts that I need to see if and > > how they can be mapped to drm. > > Some of us will attend LPC so we will probably take advantage of that > > to talk more about this. > > OK, looking forward to some more conversations at LPC. > > >> > >> We (BayLibre) are upstreaming support for APUs on Mediatek SoCs, and are > >> using the DRM-based approach. I'll also be at LPC and happy to discuss > >> in person. > >> > >> For some context on my/our interest: back in Sept 2020 we initially > >> submitted an rpmsg-based driver for kernel communication[1]. 
After > >> review comments, we rewrote that based on DRM[2] and are now using it > >> for some MTK SoCs[3] and supporting our MTK customers with it. > >> > >> Hopefully we will get the kernel interfaces sorted out soon, but next, > >> there's the userspace side of things. To that end, we're also working > >> on libAPU, a common, open userspace stack. Alex Bailon recently > >> presented a proposal earlier this year at Embedded Recipes in Paris > >> (video[4], slides[5]). > >> > >> libAPU would include abstractions of the kernel interfaces for DRM > >> (using libdrm), remoteproc/rpmsg, virtio etc. but also goes farther and > >> proposes an open firmware for the accelerator side using > >> libMetal/OpenAMP + rpmsg for communication with (most likely closed > >> source) vendor firmware. Think of this like sound open firmware (SOF[6]), > >> but for accelerators. > > > > I think your device and the habana device are very different in > > nature, and it is part of what Dave and I discussed, whether these two > > classes of devices can live together. I guess they can live together > > in the kernel, but in the userspace, not so much imo. > > Yeah, for now I think focusing on how to handle both classes of devices > in the kernel is the most important. > > > The first class is the edge inference devices (usually as part of some > > SoC). I think your description of the APU on MTK SoC is a classic > > example of such a device. > > Correct. > > > You usually have some firmware you load, you give it a graph and > > pointers for input and output and then you just execute the graph > > again and again to perform inference and just replace the inputs. > > > > The second class is the data-center training accelerators, a class to > > which habana's gaudi device belongs. These devices usually > > have a number of different compute engines, a fabric for scaling out, > > on-device memory, internal MMUs and RAS monitoring requirements. 
Those > > devices are usually operated via command queues, either through their > > kernel driver or directly from user-space. They have multiple APIs for > > memory management, RAS, scaling-out and command-submissions. > > OK, I see. > > >> > >> We've been using this successfully for Mediatek SoCs (which have a > >> Cadence VP6 APU) and have submitted/published the code, including the > >> OpenAMP[7] and libmetal[8] parts in addition to the kernel parts already > >> mentioned. > > What's the difference between libmetal and other open-source low-level > > runtime drivers, such as oneAPI level-zero? > > TBH, I'd never heard of oneAPI before, so I'm assuming it's mainly > focused on the data center. libmetal/openAMP are widely used > in the consumer, industrial embedded space, and heavily used by FPGAs in > many market segments. > > > Currently we have our own runtime driver which is tightly coupled with > > our h/w. For example, the method by which userspace "talks" to the > > data-plane firmware is very proprietary, as it is hard-wired into the > > architecture of the entire ASIC and how it performs deep-learning > > training. Therefore, I don't see how this can be shared with other > > vendors. Not because of secrecy but because it is simply not relevant > > to any other ASIC. > > OK, makes sense. > > Thanks for clarifying your use case in more detail. > > Kevin Hi all, I wanted to update on this issue for those of you who weren't in LPC. We had a BOF session about this topic with most, if not all, of the relevant people - DRM maintainers, Greg and other subsystem and device driver maintainers. Dave Airlie summarized the session in his blog - https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html TL;DR Accelerator drivers will use the DRM subsystem code, but they will be located in a different directory (drivers/accel) and will be exposed to userspace using a new major number and a new char device name (/dev/accelX). 
I'm currently working on preparing some prerequisite patches for the DRM subsystem to support the new subsystem (e.g. new major number). After that is merged, I plan to move the habanalabs driver to the new location and convert it to use the modified DRM framework code. Thanks, Oded ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices @ 2022-09-23 16:21 ` Oded Gabbay 0 siblings, 0 replies; 78+ messages in thread From: Oded Gabbay @ 2022-09-23 16:21 UTC (permalink / raw) To: Linux-Kernel@Vger. Kernel. Org, Yuji Ishikawa, Jiho Chu, Alexandre Bailon, Kevin Hilman Cc: Arnd Bergmann, Greg Kroah-Hartman, dri-devel, Jason Gunthorpe, Daniel Vetter On Mon, Aug 29, 2022 at 11:54 PM Kevin Hilman <khilman@baylibre.com> wrote: > > Hi Oded (and sorry I misspelled your name last time), > > Oded Gabbay <oded.gabbay@gmail.com> writes: > > > On Tue, Aug 23, 2022 at 9:24 PM Kevin Hilman <khilman@baylibre.com> wrote: > >> > >> Hi Obed, > >> > >> Oded Gabbay <oded.gabbay@gmail.com> writes: > >> > >> [...] > >> > >> > I want to update that I'm currently in discussions with Dave to figure > >> > out what's the best way to move forward. We are writing it down to do > >> > a proper comparison between the two paths (new accel subsystem or > >> > using drm). I guess it will take a week or so. > >> > >> Any update on the discussions with Dave? and/or are there any plans to > >> discuss this further at LPC/ksummit yet? > > Hi Kevin. > > > > We are still discussing the details, as at least the habanalabs driver > > is very complex and there are multiple parts that I need to see if and > > how they can be mapped to drm. > > Some of us will attend LPC so we will probably take advantage of that > > to talk more about this. > > OK, looking forward to some more conversations at LPC. > > >> > >> We (BayLibre) are upstreaming support for APUs on Mediatek SoCs, and are > >> using the DRM-based approach. I'll also be at LPC and happy to discuss > >> in person. > >> > >> For some context on my/our interest: back in Sept 2020 we initially > >> submitted an rpmesg based driver for kernel communication[1]. After > >> review comments, we rewrote that based on DRM[2] and are now using it > >> for some MTK SoCs[3] and supporting our MTK customers with it. 
> >> > >> Hopefully we will get the kernel interfaces sorted out soon, but next, > >> there's the userspace side of things. To that end, we're also working > >> on libAPU, a common, open userspace stack. Alex Bailon recently > >> presented a proposal earlier this year at Embedded Recipes in Paris > >> (video[4], slides[5].) > >> > >> libAPU would include abstractions of the kernel interfaces for DRM > >> (using libdrm), remoteproc/rpmsg, virtio etc. but also goes farther and > >> proposes an open firmware for the accelerator side using > >> libMetal/OpenAMP + rpmsg for communication with (most likely closed > >> source) vendor firmware. Think of this like sound open firmware (SOF[6]), > >> but for accelerators. > > > > I think your device and the habana device are very different in > > nature, and it is part of what Dave and I discussed, whether these two > > classes of devices can live together. I guess they can live together > > in the kernel, but in the userspace, not so much imo. > > Yeah, for now I think focusing on how to handle both classes of devices > in the kernel is the most important. > > > The first class is the edge inference devices (usually as part of some > > SoC). I think your description of the APU on MTK SoC is a classic > > example of such a device. > > Correct. > > > You usually have some firmware you load, you give it a graph and > > pointers for input and output and then you just execute the graph > > again and again to perform inference and just replace the inputs. > > > > The second class is the data-center, training accelerators, which > > habana's gaudi device is classified as such. These devices usually > > have a number of different compute engines, a fabric for scaling out, > > on-device memory, internal MMUs and RAS monitoring requirements. Those > > devices are usually operated via command queues, either through their > > kernel driver or directly from user-space. 
They have multiple APIs for > > memory management, RAS, scaling-out and command-submissions. > > OK, I see. > > >> > >> We've been using this succesfully for Mediatek SoCs (which have a > >> Cadence VP6 APU) and have submitted/published the code, including the > >> OpenAMP[7] and libmetal[8] parts in addition to the kernel parts already > >> mentioned. > > What's the difference between libmetal and other open-source low-level > > runtime drivers, such as oneAPI level-zero ? > > TBH, I'd never heard of oneAPI before, so I'm assuming it's mainly > focused in the data center. libmetal/openAMP are widely used > in the consumer, industrial embedded space, and heavily used by FPGAs in > many market segments. > > > Currently we have our own runtime driver which is tightly coupled with > > our h/w. For example, the method the userspace "talks" to the > > data-plane firmware is very proprietary as it is hard-wired into the > > architecture of the entire ASIC and how it performs deep-learning > > training. Therefore, I don't see how this can be shared with other > > vendors. Not because of secrecy but because it is simply not relevant > > to any other ASIC. > > OK, makes sense. > > Thanks for clarifying your use case in more detail. > > Kevin Hi all, I wanted to update on this issue for those of you who weren't in LPC. We had a BOF session about this topic with most, if not all, of the relevant people - DRM maintainers, Greg and other subsystem and device drivers maintainers. Dave Airlie summarized the session in his blog - https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html TL;DR Accelerators drivers will use the DRM subsystem code, but they will be located in a different directory (drivers/accel) and will be exposed to the userspace using a new major and a new device char name (/dev/accelX). I'm currently working on preparing some prerequisite patches for the DRM subsystem to support the new subsystem (e.g. new major number). 
Once that is merged, I plan to move the habanalabs driver to the new location and convert it to use the modified DRM framework code. Thanks, Oded ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-09-23 16:21 ` Oded Gabbay (?) @ 2022-09-26 8:16 ` Christoph Hellwig 2022-09-29 6:50 ` Oded Gabbay -1 siblings, 1 reply; 78+ messages in thread From: Christoph Hellwig @ 2022-09-26 8:16 UTC (permalink / raw) To: Oded Gabbay Cc: Linux-Kernel@Vger. Kernel. Org, Yuji Ishikawa, Jiho Chu, Alexandre Bailon, Kevin Hilman, Dave Airlie, Greg Kroah-Hartman, Jason Gunthorpe, Arnd Bergmann, dri-devel, Daniel Vetter Btw, there is another interesting thing around the block: NVMe Computational Storage devices. Don't be fooled by the name, much of it is about neither computation nor storage, but it allows using the existing NVMe queuing model to access arbitrary accelerators, including a way to expose access to on-device memory. The most current version is probably here: https://www.snia.org/educational-library/nvme-computational-storage-update-standard-2022 The first version will be rather limited and miss some important functionality like directly accessing host DRAM or CXL integration, but much of that is planned. The initial version also probably won't be able to be supported by Linux at all, but we need to think hard about how to support it. It would also be really helpful to get more people with accelerator experience into the NVMe working group. ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-09-26 8:16 ` Christoph Hellwig @ 2022-09-29 6:50 ` Oded Gabbay 0 siblings, 0 replies; 78+ messages in thread From: Oded Gabbay @ 2022-09-29 6:50 UTC (permalink / raw) To: Christoph Hellwig Cc: Linux-Kernel@Vger. Kernel. Org, Yuji Ishikawa, Jiho Chu, Alexandre Bailon, Kevin Hilman, Dave Airlie, Greg Kroah-Hartman, Jason Gunthorpe, Arnd Bergmann, dri-devel, Daniel Vetter On Mon, Sep 26, 2022 at 11:16 AM Christoph Hellwig <hch@infradead.org> wrote: > > Btw, there is another interesting thing around on the block: > > NVMe Computational Storage devices. Don't be fooled by the name, much > of it is not about neither computation not storage, but it allows to > use the existing NVMe queuing model model to allow access to arbitrary > accelerators, including a way to expose access to on-device memory. > > The probably most current version is here: > > https://www.snia.org/educational-library/nvme-computational-storage-update-standard-2022 Thanks for the link. Indeed, there were some people in the BOF that mentioned computational storage as something that is relevant. I'll watch the presentation to understand the direction it is going and how it maps to what we were planning to do. > > The first version will be rather limited and miss some important > functionality like directly accessing host DRAM or CXL integration, > but much of that is planned. The initial version also probably won't > be able to be supported by Linux at all, but we need to think hard about > how to support it. > > It woud also be really elpful to get more people with accelerator > experience into the NVMe working group. I will be happy to help and contribute. Oded ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-03 23:54 ` Dave Airlie (?) (?) @ 2022-08-04 12:00 ` Tvrtko Ursulin 2022-08-04 15:03 ` Jeffrey Hugo -1 siblings, 1 reply; 78+ messages in thread From: Tvrtko Ursulin @ 2022-08-04 12:00 UTC (permalink / raw) To: Dave Airlie, Oded Gabbay, dri-devel Cc: Greg Kroah-Hartman, Yuji Ishikawa, Linux-Kernel@Vger. Kernel. Org, Arnd Bergmann, Jiho Chu On 04/08/2022 00:54, Dave Airlie wrote: > On Thu, 4 Aug 2022 at 06:21, Oded Gabbay <oded.gabbay@gmail.com> wrote: >> >> On Wed, Aug 3, 2022 at 10:04 PM Dave Airlie <airlied@gmail.com> wrote: >>> >>> On Sun, 31 Jul 2022 at 22:04, Oded Gabbay <oded.gabbay@gmail.com> wrote: >>>> >>>> Hi, >>>> Greg and I talked a couple of months ago about preparing a new accel >>>> subsystem for compute/acceleration devices that are not GPUs and I >>>> think your drivers that you are now trying to upstream fit it as well. >>> >>> We've had some submissions for not-GPUs to the drm subsystem recently. >>> >>> Intel GNA, Intel VPU, NVDLA, rpmsg AI processor unit. >>> >>> why is creating a new subsystem at this time necessary? >>> >>> Are we just creating a subsystem to avoid the open source userspace >>> consumer rules? Or do we have some concrete reasoning behind it? >>> >>> Dave. >> >> Hi Dave. >> The reason it happened now is because I saw two drivers, which are >> doing h/w acceleration for AI, trying to be accepted to the misc >> subsystem. >> Add to that the fact I talked with Greg a couple of months ago about >> doing a subsystem for any compute accelerators, which he was positive >> about, I thought it is a good opportunity to finally do it. >> >> I also honestly think that I can contribute much to these drivers from >> my experience with the habana driver (which is now deployed in mass at >> AWS) and contribute code from the habana driver to a common framework >> for AI drivers. > > Why not port the habana driver to drm now instead? I don't get why it > wouldn't make sense? 
> > Stepping up to create a new subsystem is great, but we need rules > around what belongs where, we can't just spawn new subsystems when we > have no clear guidelines on where drivers should land. > > What are the rules for a new accel subsystem? Do we have to now > retarget the 3 drivers that are queued up to use drm for accelerators, > because 2 drivers don't? Aren't there three on the "don't prefer drm" side as well? Habana, Toshiba and Samsung? Just so the numbers argument is not misrepresented. Perhaps a poll like a) prefer DRM, b) prefer a new subsystem, c) don't care in principle; is in order? More to the point, code sharing is a very compelling argument if it can be demonstrated to be significant, aka not needing to reinvent the same wheel. Perhaps one route forward could be a) to rename DRM to something more appropriate, removing rendering from the name and replacing it with accelerators, co-processors, I don't know... Although I am not sure renaming the codebase, character device node names and userspace headers is all that feasible. Thought to mention it nevertheless, maybe it gives someone an idea of how it could be done. And b) allow the userspace rules to be considered per driver, or per class (is it a gpu or not should be a question that can be answered). Shouldn't be a blocker if it still matches the rules present elsewhere in the kernel. Those two would remove the two most contentious points as far as I understood the thread. Regards, Tvrtko > There's a lot more to figure out here than merge some structures and > it will be fine. > > I think the one area I can see a divide where a new subsystem is for > accelerators that are single-user, one shot type things like media > drivers (though maybe they could be just media drivers). > > I think anything that does command offloading to firmware or queues > belongs in drm, because that is pretty much what the framework does.
I > think it might make sense to enhance some parts of drm to fit things > in better, but that shouldn't block things getting started. > > I'm considering if we should add an accelerator staging area to drm > and land the 2-3 submissions we have and try and steer things towards > commonality that way instead of holding them out of tree. > > I'd like to offload things from Greg by just not having people submit > misc drivers at all for things that should go elsewhere. > > Dave. > > >> >> Regarding the open source userspace rules in drm - yes, I think your >> rules are too limiting for the relatively young AI scene, and I saw at >> the 2021 kernel summit that other people from the kernel community >> think that as well. >> But that's not the main reason, or even a reason at all for doing >> this. After all, at least for habana, we open-sourced our compiler and >> a runtime library. And Greg also asked those two drivers if they have >> matching open-sourced user-space code. >> >> And a final reason is that I thought this can also help in somewhat >> reducing the workload on Greg. I saw in the last kernel summit there >> was a concern about bringing more people to be kernel maintainers so I >> thought this is a step in the right direction. >> >> Oded ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-04 12:00 ` Tvrtko Ursulin @ 2022-08-04 15:03 ` Jeffrey Hugo 2022-08-04 17:53 ` Oded Gabbay 0 siblings, 1 reply; 78+ messages in thread From: Jeffrey Hugo @ 2022-08-04 15:03 UTC (permalink / raw) To: Tvrtko Ursulin, Dave Airlie, Oded Gabbay, dri-devel Cc: Greg Kroah-Hartman, Yuji Ishikawa, Linux-Kernel@Vger. Kernel. Org, Arnd Bergmann, Jiho Chu On 8/4/2022 6:00 AM, Tvrtko Ursulin wrote: > > On 04/08/2022 00:54, Dave Airlie wrote: >> On Thu, 4 Aug 2022 at 06:21, Oded Gabbay <oded.gabbay@gmail.com> wrote: >>> >>> On Wed, Aug 3, 2022 at 10:04 PM Dave Airlie <airlied@gmail.com> wrote: >>>> >>>> On Sun, 31 Jul 2022 at 22:04, Oded Gabbay <oded.gabbay@gmail.com> >>>> wrote: >>>>> >>>>> Hi, >>>>> Greg and I talked a couple of months ago about preparing a new accel >>>>> subsystem for compute/acceleration devices that are not GPUs and I >>>>> think your drivers that you are now trying to upstream fit it as well. >>>> >>>> We've had some submissions for not-GPUs to the drm subsystem recently. >>>> >>>> Intel GNA, Intel VPU, NVDLA, rpmsg AI processor unit. >>>> >>>> why is creating a new subsystem at this time necessary? >>>> >>>> Are we just creating a subsystem to avoid the open source userspace >>>> consumer rules? Or do we have some concrete reasoning behind it? >>>> >>>> Dave. >>> >>> Hi Dave. >>> The reason it happened now is because I saw two drivers, which are >>> doing h/w acceleration for AI, trying to be accepted to the misc >>> subsystem. >>> Add to that the fact I talked with Greg a couple of months ago about >>> doing a subsystem for any compute accelerators, which he was positive >>> about, I thought it is a good opportunity to finally do it. >>> >>> I also honestly think that I can contribute much to these drivers from >>> my experience with the habana driver (which is now deployed in mass at >>> AWS) and contribute code from the habana driver to a common framework >>> for AI drivers. 
>> >> Why not port the habana driver to drm now instead? I don't get why it >> wouldn't make sense? >> >> Stepping up to create a new subsystem is great, but we need rules >> around what belongs where, we can't just spawn new subsystems when we >> have no clear guidelines on where drivers should land. >> >> What are the rules for a new accel subsystem? Do we have to now >> retarget the 3 drivers that are queued up to use drm for accelerators, >> because 2 drivers don't? > > Isn't there three on the "don't prefer drm" side as well? Habana, > Toshiba and Samsung? Just so the numbers argument is not misrepresented. > Perhaps a poll like a) prefer DRM, b) prefer a new subsystem, c) don't > care in principle; is in order? I'll chime in with my opinions. Take them for what you will. I would say I fall into the C category, but I'm targeting DRM and will be the 5th(?) accel device to do so. I'll say that the ksummit (from what I see in the LWN article) made me very happy. Finally, the community had clear rules for accel drivers. When I targeted misc in the past, it seemed like Greg moved the goal post just for me, which stalled our attempt. It was even more frustrating to see that the high bar Greg set for us was not applied to other devices of the same "class" in following submissions. However, the past is the past, and based on ksummit, we've spent a number of months retargeting DRM. In a week (or two), I plan to post something to start up the discussions again. As far as the DRM userspace requirements, unless we've misunderstood something, they've been easier to satisfy (pending review I suppose) than what misc has set. I would say that Dave Airlie's feedback on this discussion resonates with me. From the perspective of a vendor wanting to be a part of the community, clear rules are important and ksummit seemed to set that. Oded's announcement has thrown all of that into the wind. 
Without a proposal to evaluate (eg show me the code with clear guidelines), I cannot seriously consider Oded's idea, and I'm not sure I want to sit by another few years to see it settle out. I expect to move forward with what we were planning prior to seeing this thread which is targeting DRM. We'll see what the DRM folks say when they have something to look at. If our device doesn't fit in DRM per an assessment of the DRM folks, then I sure hope they can suggest where we do fit because then we'll have tried misc and DRM, and not found a home. Since "drivers/accel" doesn't exist, and realistically won't for a long time if ever, I don't see why we should consider it. Why DRM? We consume dma_buf and might look to p2pdma in the future. ksummit appears clear - we are a DRM device. Also, someone could probably run openCL on our device if they were so inclined to wire it up. Over time, I've come to the thinking that we are a GPU, just without display. Yes, it would have helped if DRM and/or drivers/gpu were renamed, but I think I'm past that point. Once you have everything written, it doesn't seem like it matters if the uAPI device is called /dev/drmX, /dev/miscX, or /dev/magic. I will not opine on other devices as I am no expert on them. Today, my opinion is that DRM is the best place for me. We'll see where that goes. > More to the point, code sharing is a very compelling argument if it can > be demonstrated to be significant, aka not needing to reinvent the same > wheel. > > Perhaps one route forward could be a) to consider is to rename DRM to > something more appropriate, removing rendering from the name and > replacing with accelerators, co-processors, I don't know... Although I > am not sure renaming the codebase, character device node names and > userspace headers is all that feasible. Thought to mention it > nevertheless, maybe it gives an idea to someone how it could be done. 
> > And b) allow the userspace rules to be considered per driver, or per > class (is it a gpu or not should be a question that can be answered). > Shouldn't be a blocker if it still matches the rules present elsewhere > in the kernel. > > Those two would remove the two most contentions points as far as I > understood the thread. > > Regards, > > Tvrtko > ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: New subsystem for acceleration devices 2022-08-04 15:03 ` Jeffrey Hugo @ 2022-08-04 17:53 ` Oded Gabbay 0 siblings, 0 replies; 78+ messages in thread From: Oded Gabbay @ 2022-08-04 17:53 UTC (permalink / raw) To: Jeffrey Hugo Cc: Tvrtko Ursulin, Dave Airlie, dri-devel, Greg Kroah-Hartman, Yuji Ishikawa, Linux-Kernel@Vger. Kernel. Org, Arnd Bergmann, Jiho Chu On Thu, Aug 4, 2022 at 6:04 PM Jeffrey Hugo <quic_jhugo@quicinc.com> wrote: > > On 8/4/2022 6:00 AM, Tvrtko Ursulin wrote: > > > > On 04/08/2022 00:54, Dave Airlie wrote: > >> On Thu, 4 Aug 2022 at 06:21, Oded Gabbay <oded.gabbay@gmail.com> wrote: > >>> > >>> On Wed, Aug 3, 2022 at 10:04 PM Dave Airlie <airlied@gmail.com> wrote: > >>>> > >>>> On Sun, 31 Jul 2022 at 22:04, Oded Gabbay <oded.gabbay@gmail.com> > >>>> wrote: > >>>>> > >>>>> Hi, > >>>>> Greg and I talked a couple of months ago about preparing a new accel > >>>>> subsystem for compute/acceleration devices that are not GPUs and I > >>>>> think your drivers that you are now trying to upstream fit it as well. > >>>> > >>>> We've had some submissions for not-GPUs to the drm subsystem recently. > >>>> > >>>> Intel GNA, Intel VPU, NVDLA, rpmsg AI processor unit. > >>>> > >>>> why is creating a new subsystem at this time necessary? > >>>> > >>>> Are we just creating a subsystem to avoid the open source userspace > >>>> consumer rules? Or do we have some concrete reasoning behind it? > >>>> > >>>> Dave. > >>> > >>> Hi Dave. > >>> The reason it happened now is because I saw two drivers, which are > >>> doing h/w acceleration for AI, trying to be accepted to the misc > >>> subsystem. > >>> Add to that the fact I talked with Greg a couple of months ago about > >>> doing a subsystem for any compute accelerators, which he was positive > >>> about, I thought it is a good opportunity to finally do it. 
> >>> > >>> I also honestly think that I can contribute much to these drivers from > >>> my experience with the habana driver (which is now deployed in mass at > >>> AWS) and contribute code from the habana driver to a common framework > >>> for AI drivers. > >> > >> Why not port the habana driver to drm now instead? I don't get why it > >> wouldn't make sense? > >> > >> Stepping up to create a new subsystem is great, but we need rules > >> around what belongs where, we can't just spawn new subsystems when we > >> have no clear guidelines on where drivers should land. > >> > >> What are the rules for a new accel subsystem? Do we have to now > >> retarget the 3 drivers that are queued up to use drm for accelerators, > >> because 2 drivers don't? > > > > Isn't there three on the "don't prefer drm" side as well? Habana, > > Toshiba and Samsung? Just so the numbers argument is not misrepresented. > > Perhaps a poll like a) prefer DRM, b) prefer a new subsystem, c) don't > > care in principle; is in order? > > I'll chime in with my opinions. Take them for what you will. > > I would say I fall into the C category, but I'm targeting DRM and will > be the 5th(?) accel device to do so. > > I'll say that the ksummit (from what I see in the LWN article) made me > very happy. Finally, the community had clear rules for accel drivers. > When I targeted misc in the past, it seemed like Greg moved the goal > post just for me, which stalled our attempt. It was even more > frustrating to see that the high bar Greg set for us was not applied to > other devices of the same "class" in following submissions. > > However, the past is the past, and based on ksummit, we've spent a > number of months retargeting DRM. In a week (or two), I plan to post > something to start up the discussions again. > > As far as the DRM userspace requirements, unless we've misunderstood > something, they've been easier to satisfy (pending review I suppose) > than what misc has set. 
I think it is quite the opposite. In misc originally there were very minimal userspace requirements, but when my driver started to use dma-buf, Dave asked for more. E.g. a driver that wants to get accepted to DRM and use a fork of LLVM must not only open-source its code, but also upstream its fork to the mainline LLVM tree. In misc there is nothing that comes close to that requirement afaik. > > I would say that Dave Airlie's feedback on this discussion resonates > with me. From the perspective of a vendor wanting to be a part of the > community, clear rules are important and ksummit seemed to set that. > Oded's announcement has thrown all of that into the wind. Without a That wasn't my intention. I simply wanted to: 1. Relieve Greg of these types of drivers. 2. Offer the new drivers standard char device handling. 3. Start a community of kernel hackers that are writing device drivers for compute accelerators. > proposal to evaluate (eg show me the code with clear guidelines), I > cannot seriously consider Oded's idea, and I'm not sure I want to sit by > another few years to see it settle out. I thought of posting something quick (but not dirty) but this backlash has made me rethink that. > > I expect to move forward with what we were planning prior to seeing this > thread which is targeting DRM. We'll see what the DRM folks say when > they have something to look at. If our device doesn't fit in DRM per an > assessment of the DRM folks, then I sure hope they can suggest where we > do fit because then we'll have tried misc and DRM, and not found a home. > Since "drivers/accel" doesn't exist, and realistically won't for a > long time if ever, I don't see why we should consider it. > > Why DRM? We consume dma_buf and might look to p2pdma in the future. > ksummit appears clear - we are a DRM device. Also, someone could > probably run openCL on our device if they were so inclined to wire it > up.
Over time, I've come to the thinking that we are a GPU, just > without display. Yes, it would have helped if DRM and/or drivers/gpu > were renamed, but I think I'm past that point. Once you have everything > written, it doesn't seem like it matters if the uAPI device is called > /dev/drmX, /dev/miscX, or /dev/magic. > > I will not opine on other devices as I am no expert on them. Today, my > opinion is that DRM is the best place for me. We'll see where that goes. > > > More to the point, code sharing is a very compelling argument if it can > > be demonstrated to be significant, aka not needing to reinvent the same > > wheel. > > > > Perhaps one route forward could be a) to consider is to rename DRM to > > something more appropriate, removing rendering from the name and > > replacing with accelerators, co-processors, I don't know... Although I > > am not sure renaming the codebase, character device node names and > > userspace headers is all that feasible. Thought to mention it > > nevertheless, maybe it gives an idea to someone how it could be done. > > > > And b) allow the userspace rules to be considered per driver, or per > > class (is it a gpu or not should be a question that can be answered). > > Shouldn't be a blocker if it still matches the rules present elsewhere > > in the kernel. > > > > Those two would remove the two most contentions points as far as I > > understood the thread. > > > > Regards, > > > > Tvrtko > > > ^ permalink raw reply [flat|nested] 78+ messages in thread
end of thread, other threads:[~2022-09-29 6:51 UTC | newest] Thread overview: 78+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <CGME20220731114605epcas1p1afff6b948f542e2062b60d49a8023f6f@epcas1p1.samsung.com> 2022-07-31 11:45 ` New subsystem for acceleration devices Oded Gabbay 2022-07-31 15:37 ` Greg Kroah-Hartman 2022-08-01 2:29 ` yuji2.ishikawa 2022-08-01 8:21 ` Oded Gabbay 2022-08-03 4:39 ` yuji2.ishikawa 2022-08-03 5:34 ` Greg KH 2022-08-03 20:28 ` Oded Gabbay 2022-08-02 17:25 ` Jiho Chu 2022-08-02 19:07 ` Oded Gabbay 2022-08-03 19:04 ` Dave Airlie 2022-08-03 20:20 ` Oded Gabbay 2022-08-03 23:31 ` Daniel Stone 2022-08-04 6:46 ` Oded Gabbay 2022-08-04 9:27 ` Jiho Chu 2022-08-03 23:54 ` Dave Airlie 2022-08-03 23:54 ` Dave Airlie 2022-08-04 7:43 ` Oded Gabbay 2022-08-04 7:43 ` Oded Gabbay 2022-08-04 14:50 ` Jason Gunthorpe 2022-08-04 14:50 ` Jason Gunthorpe 2022-08-04 17:48 ` Oded Gabbay 2022-08-04 17:48 ` Oded Gabbay 2022-08-05 0:22 ` Jason Gunthorpe 2022-08-05 0:22 ` Jason Gunthorpe 2022-08-07 6:43 ` Oded Gabbay 2022-08-07 6:43 ` Oded Gabbay 2022-08-07 11:25 ` Oded Gabbay 2022-08-07 11:25 ` Oded Gabbay 2022-08-08 6:10 ` Greg Kroah-Hartman 2022-08-08 6:10 ` Greg Kroah-Hartman 2022-08-08 17:55 ` Jason Gunthorpe 2022-08-08 17:55 ` Jason Gunthorpe 2022-08-09 6:23 ` Greg Kroah-Hartman 2022-08-09 6:23 ` Greg Kroah-Hartman 2022-08-09 8:04 ` Christoph Hellwig 2022-08-09 8:32 ` Arnd Bergmann 2022-08-09 8:32 ` Arnd Bergmann 2022-08-09 12:18 ` Jason Gunthorpe 2022-08-09 12:18 ` Jason Gunthorpe 2022-08-09 12:46 ` Arnd Bergmann 2022-08-09 12:46 ` Arnd Bergmann 2022-08-09 14:22 ` Jason Gunthorpe 2022-08-09 14:22 ` Jason Gunthorpe 2022-08-09 8:45 ` Greg Kroah-Hartman 2022-08-09 8:45 ` Greg Kroah-Hartman 2022-08-08 17:46 ` Jason Gunthorpe 2022-08-08 17:46 ` Jason Gunthorpe 2022-08-08 20:26 ` Oded Gabbay 2022-08-08 20:26 ` Oded Gabbay 2022-08-09 12:43 ` Jason Gunthorpe 2022-08-09 12:43 ` Jason Gunthorpe 
2022-08-05 3:02 ` Dave Airlie 2022-08-05 3:02 ` Dave Airlie 2022-08-07 6:50 ` Oded Gabbay 2022-08-07 6:50 ` Oded Gabbay 2022-08-09 21:42 ` Oded Gabbay 2022-08-09 21:42 ` Oded Gabbay 2022-08-10 9:00 ` Jiho Chu 2022-08-10 9:00 ` Jiho Chu 2022-08-10 14:05 ` yuji2.ishikawa 2022-08-10 14:05 ` yuji2.ishikawa 2022-08-10 14:37 ` Oded Gabbay 2022-08-10 14:37 ` Oded Gabbay 2022-08-23 18:23 ` Kevin Hilman 2022-08-23 18:23 ` Kevin Hilman 2022-08-23 20:45 ` Oded Gabbay 2022-08-23 20:45 ` Oded Gabbay 2022-08-29 20:54 ` Kevin Hilman 2022-08-29 20:54 ` Kevin Hilman 2022-09-23 16:21 ` Oded Gabbay 2022-09-23 16:21 ` Oded Gabbay 2022-09-26 8:16 ` Christoph Hellwig 2022-09-29 6:50 ` Oded Gabbay 2022-09-29 6:50 ` Oded Gabbay 2022-08-04 12:00 ` Tvrtko Ursulin 2022-08-04 15:03 ` Jeffrey Hugo 2022-08-04 17:53 ` Oded Gabbay 2022-08-04 17:53 ` Oded Gabbay