linux-kernel.vger.kernel.org archive mirror
From: Jerome Glisse <jglisse@redhat.com>
To: Olof Johansson <olof@lixom.net>
Cc: Dave Airlie <airlied@gmail.com>,
	Oded Gabbay <oded.gabbay@gmail.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Daniel Vetter <daniel.vetter@ffwll.ch>,
	LKML <linux-kernel@vger.kernel.org>,
	ogabbay@habana.ai, Arnd Bergmann <arnd@arndb.de>,
	fbarrat@linux.ibm.com,
	Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Subject: Re: [PATCH 00/15] Habana Labs kernel driver
Date: Wed, 23 Jan 2019 18:48:18 -0500
Message-ID: <20190123234817.GE1257@redhat.com>
In-Reply-To: <CAOesGMhU6wzX=sBC0iqDrzBYinP8POp=9-d4oEQoaNee-dU1zQ@mail.gmail.com>

On Wed, Jan 23, 2019 at 03:40:25PM -0800, Olof Johansson wrote:
> On Wed, Jan 23, 2019 at 3:20 PM Jerome Glisse <jglisse@redhat.com> wrote:
> >
> > On Wed, Jan 23, 2019 at 03:04:33PM -0800, Olof Johansson wrote:
> > > On Wed, Jan 23, 2019 at 2:45 PM Dave Airlie <airlied@gmail.com> wrote:
> > > >
> > > > On Thu, 24 Jan 2019 at 08:32, Oded Gabbay <oded.gabbay@gmail.com> wrote:
> > > > >
> > > > > On Thu, Jan 24, 2019 at 12:02 AM Dave Airlie <airlied@gmail.com> wrote:
> > > > > >
> > > > > > Adding Daniel as well.
> > > > > >
> > > > > > Dave.
> > > > > >
> > > > > > On Thu, 24 Jan 2019 at 07:57, Dave Airlie <airlied@gmail.com> wrote:
> > > > > > >
> > > > > > > On Wed, 23 Jan 2019 at 10:01, Oded Gabbay <oded.gabbay@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > For those who don't know me, my name is Oded Gabbay (Kernel Maintainer
> > > > > > > > for AMD's amdkfd driver, worked at RedHat's Desktop group) and I work at
> > > > > > > > Habana Labs since its inception two and a half years ago.
> > > > > > >
> > > > > > > Hey Oded,
> > > > > > >
> > > > > > > So this creates a driver with a userspace facing API via ioctls.
> > > > > > > Although this isn't a "GPU" driver we have a rule in the graphics
> > > > > > > > drivers are for accelerators that we don't merge userspace API without an
> > > > > > > appropriate userspace user.
> > > > > > >
> > > > > > > https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements
> > > > > > >
> > > > > > > I see nothing in these accelerator drivers that make me think we
> > > > > > > should be treating them different.
> > > > > > >
> > > > > > > Having large closed userspaces that we have no insight into means we
> > > > > > > get suboptimal locked for ever uAPIs. If someone in the future creates
> > > > > > > an open source userspace, we will end up in a place where they get
> > > > > > > suboptimal behaviour because they are locked into a uAPI that we can't
> > > > > > > change.
> > > > > > >
> > > > > > > Dave.
> > > > >
> > > > > Hi Dave,
> > > > > > While I always appreciate your opinion and am happy to hear it, I totally
> > > > > disagree with you on this point.
> > > > >
> > > > > First of all, as you said, this device is NOT a GPU. Hence, I wasn't
> > > > > aware that this rule might apply to this driver or to any other driver
> > > > > outside of drm. Has this rule been applied to all the current drivers
> > > > > in the kernel tree with userspace facing API via IOCTLs, which are not
> > > > > in the drm subsystem ?  I see the logic for GPUs as they drive the
> > > > > display of the entire machine, but this is an accelerator for a
> > > > > specific purpose, not something generic as GPU. I just don't see how
> > > > > one can treat them in the same way.
> > > >
> > > > > The logic isn't there for GPUs for the reason that we have an
> > > > established library or that GPUs are in laptops. They are just where
> > > > we learned the lessons of merging things whose primary reason for
> > > > being in the kernel is to execute stuff from misc userspace stacks,
> > > > where the uAPI has to remain stable indefinitely.
> > > >
> > > > a) security - without knowledge of what the accelerator can do how can
> > > > we know if the API you expose isn't just a giant root hole?
> > > >
> > > > b) uAPI stability. Without a userspace for this, there is no way for
> > > > anyone even if in possession of the hardware to validate the uAPI you
> > > > provide and are asking the kernel to commit to supporting indefinitely
> > > > > is optimal or secure. If an open source userspace appears, is it to be
> > > > > limited to the API the closed userspace has created? It limits the future
> > > > unnecessarily.
> > > >
> > > > > There is no way that "someone" will create a userspace
> > > > > for our H/W without the intimate knowledge of the H/W or without the
> > > > > ISA of our programmable cores. Maybe for large companies this request
> > > > > is valid, but for startups complying to this request is not realistic.
> > > >
> > > > So what benefit does the Linux kernel get from having support for this
> > > > feature upstream?
> > > >
> > > > > If users can't access the necessary code to use it, why does this
> > > > > need to be maintained in the kernel?
> > > >
> > > > > > To conclude, I think this approach discourages other companies from
> > > > > open sourcing their drivers and is counter-productive. I'm not sure
> > > > > you are aware of how difficult it is to convince startup management to
> > > > > opensource the code...
> > > >
> > > > Oh I am, but I'm also more aware how quickly startups go away and
> > > > leave the kernel holding a lot of code we don't know how to validate
> > > > or use.
> > > >
> > > > > I'm open to being convinced but I think defining new userspace
> > > > facing APIs is a task that we should take a lot more seriously going
> > > > forward to avoid mistakes of the past.
> > >
> > > I think the most important thing here is to know that things are
> > > likely to change quite a bit over the next couple of years, and that
> > > we don't know yet what we actually need. If we hold off picking up
> > > support for hardware while all of this is ironed out, we'll miss out
> > > on being exposed to it, and will have a very tall hill to climb once
> > > we try to convince vendors to come into the fold. It's also not been a
> > > requirement for the other two drivers we have merged, as far as I can
> > > tell (CAPI and OpenCAPI) so the cat's already out of the bag.
> > >
> > > I'd rather not get stuck in a stand-off needing the longterm solution
> > > to pick up the short term contribution. That way we can move over to a
> > > _new_ API once there's been a better chance of finding common grounds
> > > and once things settle down a bit, instead of trying to bring some
> > > larger legacy codebase for devices that people might no longer care
> > > much about over to the newer APIs.
> > >
> > > It's better to be exposed to the HW and drivers now, than having
> > > people build large elaborate out-of-tree software stacks for this.
> > > It's also better to get them to come and collaborate now, instead of
> > > pushing them away until things are perfect.
> > >
> > > Having a way to validate and exercise the userspace API is important,
> > > including ability to change it if needed. Would it be possible to open
> > > up the lowest userspace pieces (driver interactions), even if some
> > > other layers might not yet be, to exercise the device/kernel/userspace
> > > interfaces without "live" workload, etc?
> >
> > Yes, and to exercise the userspace API you need at the very least to
> > know the ISA so that you can write programs for the accelerator.
> > You also need to know the set of commands the hardware has. The
> > ioctl and how to create a userspace that interacts with the kernel
> > is the easy part; the hard part is the compiler.
> >
> > So if we want any kind of freedom to play with the UAPI, enhance
> > it or change it in any way, we must be free to build programs for the
> > device ourselves.
> >
> > I believe that the GPU subsystem requirements are a good guideline
> > to follow, and the only exception within drivers/ that I am aware of
> > is the FPGA. Everything else in drivers/ either has an open source
> > userspace, exposes a common API (like network) or is so simple that
> > anyone can write a userspace for it.
> 
> Once we have a common framework I agree that we need enough tools to
> exercise everything needed. I don't agree that this includes full
> sources to everything. We don't expect this for most PCIe cards today
> either.

We do expect this today, except for FPGAs. I do not know of a single
PCIe device with an upstream driver that we do not know how to program.
The biggest chunk of PCIe devices are straightforward (network, sound,
media, ...).

So in effect, today the lowest common denominator is either an open
source userspace or a device API so simple that the userspace is
obvious (various media devices).

> 
> If the GPU subsystem is to be followed, I fear that we will end up
> with Nvidia-equivalent vendors from day 1, where they will just build
> a bigger and bigger software stack on the side instead of joining in,
> and someone will need to best-effort bridge the gap by reverse
> engineering. I don't want that situation long-term, which is why I
> think it's reasonable to be more relaxed during the early days with
> upfront, clear, expectations for the longer term that hardware/kernel
> interfaces need to be exercisable.

I think it works the other way around: allow people to push an upstream
driver with no open source userspace and they lose any motivation to
work on open sourcing their userspace. Not being upstream is painful
enough that they will get pressure to go upstream, and if upstream
means open source userspace then they have to comply.

> 
> > For any complex device that executes programs we should really enforce
> > the open source userspace so that we can properly audit the driver,
> > as otherwise we only have half of the story with no idea what the
> > other half might imply.
> 
> What you're demanding is open userspace _and_ firmware. Since without
> firmware sources, you can't audit any on-chip behavior either (in
> reality, most commands passed down are likely parsed by said
> firmware).

No, I am not asking for firmware. If we have any doubt about what the
firmware can let through, then we can lock down the ioctl, i.e. parse
commands from userspace and only allow the kernel to write sanitized
commands to the command queue. By auditing I mean being able to
understand the overall flow that is expected from a program, so that
from that program flow we can work out the best UAPI, the one that
achieves it with minimum overhead. Sorry if that was not clear.
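
To make the "lock down the ioctl" idea concrete, here is a minimal
sketch in Python that models the parsing step in user space. It is not
kernel code and not the habanalabs format: the two-word command layout,
the opcode names and the mem_size bound are all hypothetical, made up
purely for illustration.

```python
import struct

# Hypothetical command layout (an assumption, not a real device format):
# two little-endian 32-bit words, an opcode followed by one operand.
CMD = struct.Struct("<II")

# Hypothetical opcodes the kernel is willing to forward to the device.
OP_NOP, OP_DMA_COPY, OP_EXEC = 0, 1, 2
# Hypothetical privileged opcode that userspace must never submit.
OP_WRITE_REG = 3
ALLOWED_OPCODES = {OP_NOP, OP_DMA_COPY, OP_EXEC}


def sanitize(user_buf: bytes, mem_size: int) -> bytes:
    """Parse a userspace command buffer and return the copy to be queued.

    Raises ValueError on any malformed or disallowed command, so nothing
    is queued unless the whole buffer validates.
    """
    if len(user_buf) % CMD.size != 0:
        raise ValueError("truncated command")
    queued = bytearray()
    for opcode, operand in CMD.iter_unpack(user_buf):
        if opcode not in ALLOWED_OPCODES:
            raise ValueError(f"opcode {opcode} not allowed")
        # For the made-up DMA opcode, treat the operand as an address and
        # require it to fall inside the memory owned by this context.
        if opcode == OP_DMA_COPY and operand >= mem_size:
            raise ValueError("address outside the context's memory")
        # Only the re-packed, validated command reaches the queue.
        queued += CMD.pack(opcode, operand)
    return bytes(queued)
```

With these made-up values, a buffer holding a NOP and an in-range DMA
copy comes back unchanged, while a buffer containing OP_WRITE_REG is
rejected as a whole. Writing even this much requires knowing what each
command does, which is the point about needing the command set.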

Cheers,
Jérôme
