linux-kernel.vger.kernel.org archive mirror
From: Oded Gabbay <oded.gabbay@gmail.com>
To: Olof Johansson <olof@lixom.net>
Cc: Dave Airlie <airlied@redhat.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	ogabbay@habana.ai, Arnd Bergmann <arnd@arndb.de>,
	fbarrat@linux.ibm.com, andrew.donnellan@au1.ibm.com
Subject: Re: [PATCH 00/15] Habana Labs kernel driver
Date: Thu, 24 Jan 2019 00:40:40 +0200
Message-ID: <CAFCwf11sxVAi1fxeZ698rBoJbaV3WHRJAyqB3RyddDLzfOysxA@mail.gmail.com>
In-Reply-To: <CAOesGMjU0tjJwAqCADaAv6XrCGbjB8G2oT=4LxOgSQBHO7Gptw@mail.gmail.com>

On Wed, Jan 23, 2019 at 11:52 PM Olof Johansson <olof@lixom.net> wrote:
>
> Hi,
>
> On Tue, Jan 22, 2019 at 4:01 PM Oded Gabbay <oded.gabbay@gmail.com> wrote:
> >
> > Hello,
> >
> > For those who don't know me, my name is Oded Gabbay (kernel maintainer
> > of AMD's amdkfd driver, formerly of Red Hat's Desktop group), and I have
> > worked at Habana Labs since its inception two and a half years ago.
> >
> > Habana is a leading startup in the emerging AI processor space and we have
> > already started production of our first Goya inference processor PCIe card
> > and delivered it to customers. The Goya processor silicon has been tested
> > since June of 2018 and is production-qualified by now. The Gaudi training
> > processor solution is slated to sample in the second quarter of 2019.
> >
> > This patch-set contains the kernel driver for Habana's AI Processors
> > (AIP), which are designed to accelerate Deep Learning inference and
> > training workloads. The current version supports only the Goya
> > processor; support for Gaudi will be upstreamed after the ASIC becomes
> > available to customers.
> [...]
>
> As others have mentioned, thanks for the amount of background and
> information in this patch set; it's great to see.
>
> Some have pointed out style and formatting issues; I'm not going to
> repeat that here, but I do have some higher-level comments:
>
>  - There's a whole bunch of register definition headers. Outside of
> GPUs, traditionally we don't include the full sets unless they're
> needed in the driver since they tend to be very verbose.

And that is not even the entire list :)
I trimmed the set down to only the files I actually use registers
from, but I didn't go into those files and remove the individual
registers I don't use.
I hope this isn't a hard requirement, because that's really dirty work.
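
To give a feel for what these headers look like after trimming, here is
a made-up fragment; the block name, register names and offsets below are
invented for illustration and are not taken from the actual Goya files:

/* goya_example_dma_regs.h - hypothetical trimmed register header.
 * Only registers the driver actually touches are kept; the names and
 * offsets here are placeholders, not real Goya values.
 */
#ifndef GOYA_EXAMPLE_DMA_REGS_H_
#define GOYA_EXAMPLE_DMA_REGS_H_

#define mmEXAMPLE_DMA_QM_GLBL_CFG0      0x400000  /* queue manager config */
#define mmEXAMPLE_DMA_QM_GLBL_ERR_CFG   0x400008  /* error reporting */
#define mmEXAMPLE_DMA_QM_PQ_BASE_LO     0x400060  /* queue base, low 32 bits */
#define mmEXAMPLE_DMA_QM_PQ_BASE_HI     0x400064  /* queue base, high 32 bits */

#endif /* GOYA_EXAMPLE_DMA_REGS_H_ */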

>  - I see a good amount of HW setup code that's mostly just writing
> hardcoded values to a large number of registers. I don't have any
> specific recommendation on how to do it better, but doing as much as
> possible of this through on-device firmware tends to be a little
> cleaner (or rather, hides it from the kernel. :). I don't know if that
> fits your design though.

This doesn't really fit our design. In our design, the host driver is
the "king" of the device, and we prefer that every initialization which
can be done from the host is done from the host.
I know that's not a hard "technical" reason, but on the other hand, I
don't think this initialization is so terrible that it can't be done
from the driver.
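
To make that concrete, the host-side static setup mostly boils down to a
table of (offset, value) pairs walked with writel(). The sketch below is
purely illustrative -- the offsets and values are placeholders, not the
real Goya init sequence:

#include <linux/io.h>
#include <linux/kernel.h>

/* Hypothetical host-driven static init table; offsets and values are fake. */
struct hw_init_entry {
        u32 offset;     /* register offset from the BAR base */
        u32 value;      /* hardcoded value to program */
};

static const struct hw_init_entry goya_example_init_table[] = {
        { 0x400000, 0x00000001 },       /* e.g. enable a queue manager */
        { 0x400008, 0xffffffff },       /* e.g. unmask error reporting */
        { 0x400060, 0x00000000 },       /* e.g. clear a queue base (low) */
};

static void goya_example_init_hw(void __iomem *base)
{
        int i;

        for (i = 0; i < ARRAY_SIZE(goya_example_init_table); i++)
                writel(goya_example_init_table[i].value,
                       base + goya_example_init_table[i].offset);
}

Doing the same thing from on-device firmware would just move this table
out of the kernel; the trade-off is who owns and updates it.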

>  - Are there any pointers to the userspace pieces that are used to run
> on this card, or any kind of test suites that can be used when someone
> has the hardware and is looking to change the driver?

Not right now. I do hope we can release a package with some
pre-compiled libraries and binaries that can be used to exercise the
driver, but I don't believe it will be open source. At least, not in
2019.

>
> But, I think the largest question I have (for a broader audience) is:
>
> I predict that we will see a handful of these kinds of devices in the
> near future -- definitely ML accelerators, but maybe also other kinds
> of processing, where there's a command-based, buffer-based setup
> sending workloads to an offload engine and getting results back.
> While the first waves will all look different due to design trade-offs
> made in isolation, I think it makes sense to group them in one bucket
> instead of merging them through drivers/misc, if nothing else to
> encourage more cross-collaboration over time. A first step in figuring
> out a suitable long-term framework is to get a survey of a few
> non-shared implementations.
>
> So, I'd like to propose a drivers/accel subtree, and I'd be happy to
> bootstrap it with a small group (@Dave Airlie: I think your input from
> GPU land would be very useful, want to join in?). Individual drivers
> would stay maintained by their existing maintainers, of course.
>
> I think it might make sense to move the CAPI/OpenCAPI drivers over as
> well -- not necessarily to change those drivers, but to group them
> with the rest as more show up.

I would actually prefer not to go down that path, at least not from the
start. AFAIK, there is no other device driver in the kernel for AI
acceleration, and I don't want to presume I know all the answers for
such devices.
You said it yourself: there will be many devices and they won't be
similar, at least not in the next few years. So I think that trying to
set up a subsystem for this now would be premature optimization.
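
That said, the common denominator you describe -- hand the device a
command buffer, get a completion back -- usually reduces to an ioctl
along these lines. This is a purely hypothetical uapi sketch; none of
the names or numbers come from our driver:

#include <linux/ioctl.h>
#include <linux/types.h>

/* Hypothetical command-submission uapi for a generic offload engine. */
struct accel_cmd_submit {
        __u64 cmd_buf_ptr;      /* userspace pointer to the command buffer */
        __u32 cmd_buf_size;     /* command buffer size in bytes */
        __u32 queue_index;      /* hardware queue to submit to */
        __u64 out_seq;          /* returned: sequence number to wait on */
};

#define ACCEL_IOCTL_CMD_SUBMIT \
        _IOWR('A', 0x01, struct accel_cmd_submit)

The interesting differences between devices then live in the command
buffer format and in how completion/wait is exposed, which is exactly
where the early implementations will diverge.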

Oded

>
>
> -Olof

Thread overview: 103+ messages
2019-01-23  0:00 [PATCH 00/15] Habana Labs kernel driver Oded Gabbay
2019-01-23  0:00 ` [PATCH 01/15] habanalabs: add skeleton driver Oded Gabbay
2019-01-23  0:49   ` Joe Perches
2019-01-25 19:18     ` Oded Gabbay
2019-01-23 12:28   ` Mike Rapoport
2019-01-23 12:40     ` Greg KH
2019-01-23 12:55       ` Mike Rapoport
2019-01-25 20:09         ` Oded Gabbay
2019-01-25 20:05     ` Oded Gabbay
2019-01-26 16:05   ` Arnd Bergmann
2019-01-26 16:24     ` Oded Gabbay
2019-01-26 21:14       ` Arnd Bergmann
2019-01-26 21:48         ` Oded Gabbay
2019-01-27  8:32           ` gregkh
2019-01-29 22:49             ` Oded Gabbay
2019-01-23  0:00 ` [PATCH 03/15] habanalabs: add basic Goya support Oded Gabbay
2019-01-23 12:28   ` Mike Rapoport
2019-01-25 20:32     ` Oded Gabbay
2019-01-27  6:39       ` Mike Rapoport
2019-01-28  7:44         ` Oded Gabbay
2019-01-23  0:00 ` [PATCH 04/15] habanalabs: add context and ASID modules Oded Gabbay
2019-01-23 12:28   ` Mike Rapoport
2019-01-25 21:07     ` Oded Gabbay
2019-01-23  0:00 ` [PATCH 05/15] habanalabs: add command buffer module Oded Gabbay
2019-01-23 12:28   ` Mike Rapoport
2019-01-25 21:47     ` Oded Gabbay
2019-01-27  6:49       ` Mike Rapoport
2019-01-28  7:55         ` Oded Gabbay
2019-01-28  8:41           ` Mike Rapoport
2019-01-23  0:00 ` [PATCH 06/15] habanalabs: add basic Goya h/w initialization Oded Gabbay
2019-01-25  7:46   ` Mike Rapoport
2019-01-28 10:35     ` Oded Gabbay
2019-01-23  0:00 ` [PATCH 07/15] habanalabs: add h/w queues module Oded Gabbay
2019-01-25  7:50   ` Mike Rapoport
2019-01-28 10:50     ` Oded Gabbay
2019-01-23  0:00 ` [PATCH 08/15] habanalabs: add event queue and interrupts Oded Gabbay
2019-01-25  7:51   ` Mike Rapoport
2019-01-28 11:14     ` Oded Gabbay
2019-01-23  0:00 ` [PATCH 09/15] habanalabs: add sysfs and hwmon support Oded Gabbay
2019-01-25  7:54   ` Mike Rapoport
2019-01-28 11:26     ` Oded Gabbay
2019-01-23  0:00 ` [PATCH 10/15] habanalabs: add device reset support Oded Gabbay
2019-01-27  7:51   ` Mike Rapoport
2019-01-28 12:53     ` Oded Gabbay
2019-01-23  0:00 ` [PATCH 11/15] habanalabs: add command submission module Oded Gabbay
2019-01-27 15:11   ` Mike Rapoport
2019-01-28 13:51     ` Oded Gabbay
2019-01-23  0:00 ` [PATCH 12/15] habanalabs: add virtual memory and MMU modules Oded Gabbay
2019-01-27 16:13   ` Mike Rapoport
2019-01-30 10:34     ` Oded Gabbay
2019-01-23  0:00 ` [PATCH 13/15] habanalabs: implement INFO IOCTL Oded Gabbay
2019-01-23  0:00 ` [PATCH 14/15] habanalabs: add debugfs support Oded Gabbay
2019-01-23  0:00 ` [PATCH 15/15] Update MAINTAINERS and CREDITS with habanalabs info Oded Gabbay
2019-01-23 12:27 ` [PATCH 00/15] Habana Labs kernel driver Mike Rapoport
2019-01-23 22:43   ` Oded Gabbay
2019-01-23 21:52 ` Olof Johansson
2019-01-23 22:40   ` Oded Gabbay [this message]
2019-01-23 23:16     ` Olof Johansson
2019-01-24  1:03   ` Andrew Donnellan
2019-01-24 11:59     ` Jonathan Cameron
2019-01-25 17:13     ` Olof Johansson
2019-02-24 22:23   ` Pavel Machek
2019-01-23 21:57 ` Dave Airlie
2019-01-23 22:02   ` Dave Airlie
2019-01-23 22:31     ` Oded Gabbay
2019-01-23 22:45       ` Dave Airlie
2019-01-23 23:04         ` Olof Johansson
2019-01-23 23:20           ` Jerome Glisse
2019-01-23 23:35             ` Oded Gabbay
2019-01-23 23:41               ` Olof Johansson
2019-01-23 23:40             ` Olof Johansson
2019-01-23 23:48               ` Jerome Glisse
2019-01-24  7:35                 ` Daniel Vetter
2019-01-24  9:50                   ` Oded Gabbay
2019-01-24 10:22                     ` Dave Airlie
2019-01-25  0:13                       ` Olof Johansson
2019-01-25  7:43                         ` Daniel Vetter
2019-01-25 15:02                           ` Olof Johansson
2019-01-25 16:00                             ` Daniel Vetter
2019-01-24 23:51                   ` Olof Johansson
2019-01-23 23:23           ` Oded Gabbay
2019-01-25  7:37   ` Greg Kroah-Hartman
2019-01-25 15:33     ` Olof Johansson
2019-01-25 16:06       ` Greg Kroah-Hartman
2019-01-25 17:12         ` Olof Johansson
2019-01-25 18:16           ` [PATCH/RFC 0/5] HW accel subsystem Olof Johansson
2019-01-25 18:16             ` [PATCH 1/5] drivers/accel: Introduce subsystem Olof Johansson
2019-01-25 21:13               ` [PATCH v2 " Olof Johansson
2019-01-26 17:09                 ` Randy Dunlap
2019-01-27  4:31                 ` Andrew Donnellan
2019-01-28 19:36                   ` Frederic Barrat
2019-01-25 22:23               ` [PATCH " Daniel Vetter
2019-01-27 16:31                 ` Daniel Vetter
2019-01-25 18:16             ` [PATCH 2/5] cxl: Move to drivers/accel Olof Johansson
2019-01-25 18:16             ` [PATCH 3/5] drivers/accel: cxl: Move non-uapi include files Olof Johansson
2019-01-25 18:16             ` [PATCH 4/5] ocxl: Move to drivers/accel Olof Johansson
2019-01-25 18:16             ` [PATCH 5/5] drivers/accel: ocxl: Move non-uapi include files Olof Johansson
2019-01-26 13:51               ` Greg Kroah-Hartman
2019-01-26 21:11             ` [PATCH/RFC 0/5] HW accel subsystem Arnd Bergmann
2019-02-01  9:10             ` Kenneth Lee
2019-02-01 10:07               ` Greg Kroah-Hartman
2019-02-01 12:09                 ` Kenneth Lee
2019-01-26 13:52           ` [PATCH 00/15] Habana Labs kernel driver Greg Kroah-Hartman
