All of lore.kernel.org
 help / color / mirror / Atom feed
From: Leon Romanovsky <leon@kernel.org>
To: Oded Gabbay <oded.gabbay@gmail.com>
Cc: Gal Pressman <galpress@amazon.com>,
	Jason Gunthorpe <jgg@ziepe.ca>, Jakub Kicinski <kuba@kernel.org>,
	"Linux-Kernel@Vger. Kernel. Org" <linux-kernel@vger.kernel.org>,
	netdev@vger.kernel.org, SW_Drivers <SW_Drivers@habana.ai>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"David S. Miller" <davem@davemloft.net>,
	Andrew Lunn <andrew@lunn.ch>,
	Florian Fainelli <f.fainelli@gmail.com>,
	linux-rdma@vger.kernel.org, izur@habana.ai,
	Olof Johansson <olof@lixom.net>
Subject: Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
Date: Fri, 18 Sep 2020 16:09:55 +0300	[thread overview]
Message-ID: <20200918130955.GV869610@unreal> (raw)
In-Reply-To: <CAFCwf12KEa=chCZCWWkJ5bvGDeRCrmBcY9fB8CrtzjOknRQ5Qg@mail.gmail.com>

On Fri, Sep 18, 2020 at 03:31:51PM +0300, Oded Gabbay wrote:
> On Fri, Sep 18, 2020 at 3:19 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Fri, Sep 18, 2020 at 03:07:19PM +0300, Oded Gabbay wrote:
> > > On Fri, Sep 18, 2020 at 3:03 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > On Fri, Sep 18, 2020 at 02:56:09PM +0300, Oded Gabbay wrote:
> > > > > On Fri, Sep 18, 2020 at 2:52 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > > > >
> > > > > > On Fri, Sep 18, 2020 at 02:36:10PM +0300, Gal Pressman wrote:
> > > > > > > On 17/09/2020 20:18, Jason Gunthorpe wrote:
> > > > > > > > On Tue, Sep 15, 2020 at 11:46:58PM +0300, Oded Gabbay wrote:
> > > > > > > >> infrastructure for communication between multiple accelerators. Same
> > > > > > > >> as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
> > > > > > > >> The RDMA implementation we did does NOT support some basic RDMA
> > > > > > > >> IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
> > > > > > > >> library or to connect to the rdma infrastructure in the kernel.
> > > > > > > >
> > > > > > > > You can't create a parallel RDMA subsystem in netdev, or in misc, and
> > > > > > > > you can't add random device offloads as IOCTL to nedevs.
> > > > > > > >
> > > > > > > > RDMA is the proper home for all the networking offloads that don't fit
> > > > > > > > into netdev.
> > > > > > > >
> > > > > > > > EFA was able to fit into rdma-core/etc and it isn't even RoCE at
> > > > > > > > all. I'm sure this can too.
> > > > > > >
> > > > > > > Well, EFA wasn't welcomed to the RDMA subsystem with open arms ;), initially it
> > > > > > > was suggested to go through the vfio subsystem instead.
> > > > > > >
> > > > > > > I think this comes back to the discussion we had when EFA was upstreamed, which
> > > > > > > is what's the bar to get accepted to the RDMA subsystem.
> > > > > > > IIRC, what we eventually agreed on is having a userspace rdma-core provider and
> > > > > > > ibv_{ud,rc}_pingpong working (or just supporting one of the IB spec's QP types?).
> > > > > > >
> > > > > > > Does GAUDI fit these requirements? If not, should it be in a different subsystem
> > > > > > > or should we open the "what qualifies as an RDMA device" question again?
> > > > > >
> > > > > > I want to remind you that rdma-core requirement came to make sure that
> > > > > > anything exposed from the RDMA to the userspace is strict with proper
> > > > > > UAPI header hygiene.
> > > > > >
> > > > > > I doubt that Havana's ioctls are backed by anything like this.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Why do you doubt that ? Have you looked at our code ?
> > > > > Our uapi and IOCTLs interface is based on drm subsystem uapi interface
> > > > > and it is very safe and protected.
> > > >
> > > > Yes, I looked and didn't find open-source users of your UAPI headers.
> > > > It is not related to being safe or protected by to the common request
> > > > to present userspace that relies on those exported interfaces.
> > > >
> > > > > Otherwise Greg would have never allowed me to go upstream in the first place.
> > > >
> > > > Nice, can we get a link?
> > > >
> > > > >
> > > > > We have a single function which is the entry point for all the IOCTLs
> > > > > of our drivers (only one IOCTL is RDMA related, all the others are
> > > > > compute related).
> > > > > That function is almost 1:1 copy of the function in drm.
> > > >
> > > > DRM has same rules as RDMA, no kernel code will be merged without seeing
> > > > open-source userspace.
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks,
> > > > > Oded
> > >
> > > So we do have an open-source library called hl-thunk, which uses our
> > > driver and indeed that was part of the requirement.
> > > It is similar to libdrm.
> > > Here is the link:
> > > https://github.com/HabanaAI/hl-thunk
> >
> > Are you kidding?
> >
> > This is mirror of some internal repository that looks like dumpster
> > with ChangeId, internal bug tracker numbers, not part of major OS
> > distributions.
> >
> > It is not open-source library and shows very clear why you chose
> > to upstream your driver through driver/misc/ tree.
> >
> > Thanks
>
> Adding Olof here.
>
> No, usually not.
> But are you kidding ?
> What did you exactly expect to find ? Is there an open-source project
> somewhere that encapsulates Deep-learning accelerators which I could
> connect to ?

I would expect certain level of code quality, collaboration and review
that distros require for inclusion. It is not the case for the github
repo you presented.

> AFAIK, the only thing remotely relevant is CUDA and that is
> closed-source (strange to hear lectures about open-source from NVIDIA
> people here...)

Please check git log statistics to estimate Nvidia/Mellanox/Cumulus
contributions to the Linux kernel and the open-source. You will be
surprised.

>
> So we are trying to give to the community such an open source library,
> or at least an example. Hopefully one day, when more companies
> upstream their drivers for deep-learning accelerators we could do
> something like libdrm or rdma-core, but for now, it's just our driver.

AFAIR, your driver is not unique, HiSilicon tried to submit something
similar years ago (warpdrive) and they are not alone.

>
> I have been in this community since 2013 with AMD and then RedHat, and
> I come with good intentions and a desire to open source and upstream
> as much as I can. I don't think I deserve this kind of response.

There is no need to take it personal. It was you who posted a link
to the github repo. What did you expect?

>
> The bottom line is that we had this discussion with Greg and Olof and
> DRM people almost 2 years ago and if there was some open-source
> project in user-space or some subsystem in the kernel we could connect
> to, we would have done that instead of what we did, but the fact of
> the matter there isn't such thing. Olof tried and is trying to create
> a h/w accelerator subsystem but it still hasn't got up from the ground
> yet.

Maybe it is a time to do it right.

>
> Oded

  reply	other threads:[~2020-09-18 13:10 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-15 17:10 [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 01/14] habanalabs/gaudi: add NIC H/W and registers definitions Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 02/14] habanalabs/gaudi: add NIC firmware-related definitions Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 03/14] habanalabs/gaudi: add NIC security configuration Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 04/14] habanalabs/gaudi: add support for NIC QMANs Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 05/14] habanalabs/gaudi: add NIC Ethernet support Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 06/14] habanalabs/gaudi: add NIC PHY code Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 07/14] habanalabs/gaudi: allow user to get MAC addresses in INFO IOCTL Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 08/14] habanalabs/gaudi: add a new IOCTL for NIC control operations Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 09/14] habanalabs/gaudi: add CQ " Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 10/14] habanalabs/gaudi: add WQ " Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 11/14] habanalabs/gaudi: add QP error handling Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 12/14] habanalabs/gaudi: Add ethtool support using coresight Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 13/14] habanalabs/gaudi: support DCB protocol Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 14/14] habanalabs/gaudi: add NIC init/fini calls from common code Oded Gabbay
2020-09-15 20:35 ` [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Jakub Kicinski
2020-09-15 20:46   ` Oded Gabbay
2020-09-15 21:04     ` Jakub Kicinski
2020-09-15 21:20       ` Oded Gabbay
2020-09-15 21:37         ` Andrew Lunn
2020-09-15 21:43           ` Oded Gabbay
2020-09-15 22:35             ` David Miller
2020-09-15 22:36           ` David Miller
2020-09-15 22:34         ` David Miller
2020-09-16  4:26           ` Oded Gabbay
2020-09-17 17:18     ` Jason Gunthorpe
2020-09-18 11:36       ` Gal Pressman
2020-09-18 11:52         ` Leon Romanovsky
2020-09-18 11:56           ` Oded Gabbay
2020-09-18 12:03             ` Leon Romanovsky
2020-09-18 12:07               ` Oded Gabbay
2020-09-18 12:19                 ` Leon Romanovsky
2020-09-18 12:31                   ` Oded Gabbay
2020-09-18 13:09                     ` Leon Romanovsky [this message]
2020-09-19  6:40                   ` Greg Kroah-Hartman
2020-09-19  8:20                     ` Leon Romanovsky
2020-09-19  8:30                       ` Greg Kroah-Hartman
2020-09-19  8:58                         ` Leon Romanovsky
2020-09-19 16:43                         ` Oded Gabbay
2020-09-19 17:27                           ` Greg Kroah-Hartman
2020-09-19 19:22                             ` Jason Gunthorpe
2020-09-20  8:47                               ` Greg Kroah-Hartman
2020-09-20 19:05                                 ` Oded Gabbay
2020-09-21 10:39                                   ` Leon Romanovsky
2020-09-21 11:52                                 ` Jason Gunthorpe
2020-09-21 21:20                                   ` Jakub Kicinski
2020-09-22 11:49                                     ` Jason Gunthorpe
2020-09-19 18:49                           ` Andrew Lunn
2020-09-18 11:56         ` Jason Gunthorpe
2020-09-18 11:59           ` Oded Gabbay
2020-09-18 12:16             ` Jason Gunthorpe
2020-09-18 12:34               ` Oded Gabbay
2020-09-18 12:50                 ` Jason Gunthorpe
2020-09-18 13:02                   ` Oded Gabbay
2020-09-18 13:26                     ` Jason Gunthorpe
2020-09-18 13:49                       ` Oded Gabbay
2020-09-18 13:59                         ` Jason Gunthorpe
2020-09-18 14:12                           ` Oded Gabbay
2020-09-18 14:19                             ` Jason Gunthorpe
2020-09-18 14:45                               ` Oded Gabbay
2020-09-18 15:07                                 ` Jason Gunthorpe
2020-09-18 15:15                                   ` Oded Gabbay
2020-09-18 15:28                                     ` Jason Gunthorpe
2020-09-21 11:22                                       ` Gal Pressman
2020-09-21 11:49                                         ` Leon Romanovsky
2020-09-22 11:41                                         ` Jason Gunthorpe
2020-09-22 12:46                                           ` Gal Pressman
2020-09-22 16:14                                             ` Jason Gunthorpe
2020-09-22 16:30                                               ` Gal Pressman
2020-09-22 16:52                                                 ` Jason Gunthorpe
2020-09-18 12:10         ` Oded Gabbay
2020-09-15 20:42 ` David Miller
2020-09-15 20:49   ` Oded Gabbay
2020-09-16  6:26     ` Greg Kroah-Hartman
2020-09-16  6:36       ` Oded Gabbay
2020-09-16  7:42         ` Greg Kroah-Hartman
2020-09-16  8:02           ` Oded Gabbay
2020-09-16  8:22             ` Greg Kroah-Hartman
2020-09-16  8:47               ` Oded Gabbay
2020-09-16 12:00                 ` Greg Kroah-Hartman
2020-09-20 16:45                   ` Daniel Vetter
2020-09-16 23:04               ` Williams, Dan J
2020-09-18 12:00 ` Jason Gunthorpe
2020-09-18 12:01   ` Oded Gabbay

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200918130955.GV869610@unreal \
    --to=leon@kernel.org \
    --cc=SW_Drivers@habana.ai \
    --cc=andrew@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=f.fainelli@gmail.com \
    --cc=galpress@amazon.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=izur@habana.ai \
    --cc=jgg@ziepe.ca \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=oded.gabbay@gmail.com \
    --cc=olof@lixom.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.