linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg KH <gregkh@linuxfoundation.org>
To: Oded Gabbay <oded.gabbay@gmail.com>
Cc: linux-kernel@vger.kernel.org, SW_Drivers@habana.ai
Subject: Re: [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver
Date: Thu, 10 Sep 2020 17:54:40 +0200	[thread overview]
Message-ID: <20200910155440.GC1151284@kroah.com> (raw)
In-Reply-To: <20200910150328.20545-1-oded.gabbay@gmail.com>

On Thu, Sep 10, 2020 at 06:03:13PM +0300, Oded Gabbay wrote:
> This patch-set adds support for initializing and using the GAUDI NIC ports,
> functioning as scale-out interconnect when doing distributed Deep Learning
> training. The training can be performed over tens of thousands of GAUDIs
> and it is done using the RDMA-over-converged-Ethernet (RoCE) v2 protocol.
> 
> Each GAUDI exposes 10x100GbE ports that are designed to scale-out the
> inter-GAUDI communication by integrating a complete communication engine
> on-die. This native integration allows users to use the same scaling
> technology, both inside the server and rack (termed as scale-up), as well
> as for scaling across racks (scale-out). The racks can be connected
> directly between GAUDI processors, or through any number of standard
> Ethernet switches.
> 
> The driver exposes the NIC ports to the user as standard Ethernet ports by
> registering each port to the networking subsystem. This allows the user to
> manage the ports with standard tools such as ifconfig, ethtool, etc. It
> also enables us to connect to the Linux networking stack and thus support
> standard networking protocols, such as IPv4, IPv6, TCP, etc. In addition,
> we can also leverage protocols such as DCB for dynamically configuring
> priorities to avoid congestion.
> 
> For each NIC port there is a matching QMAN entity. For RoCE, the user
> submits workloads to the NIC through the QMAN, same as he does for the
> compute engines. For regular Ethernet, the user sends and receives packets
> through the standard Ethernet sockets. Those sockets are used only as a
> control path. The data path that is used for AI training goes through the
> RoCE interface.
> 
> It is important to note that there are some limitations and uniqueness
> in GAUDI's NIC H/W, compared to other networking adapters that enforced us
> to use a less-than-common driver design:
> 
> 1. The NIC functionality is NOT exposed as different PCI Physical
>    Functions. There is a single PF which is used for compute and
>    networking, as the main goal of the NIC ports is to be used as
>    intra-communication and not as standard network interfaces. This
>    implies we can't connect different drivers to handle the networking
>    ports because it is the same device, from the kernel POV, as the
>    compute. Therefore, we must integrate the networking code into the
>    main habanalabs driver.

That's kind of common, see the long threads on the netdev and IB mailing
lists about this type of issue on other networking cards today.  The
whole "virtual bus" code should help solve this, if Intel ever gets
around to posting a new version of that patch series one day...

But, because you are writing networking driver code here, you really
should run all of this by the netdev@vger.kernel.org maintainers and
developers, as they know how to review this interaction with the network
stack better than anyone else.

Care to resend it and cc: them too?

thanks,

greg k-h

  parent reply	other threads:[~2020-09-10 19:18 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-10 15:03 [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
2020-09-10 15:03 ` [PATCH 02/15] habanalabs/gaudi: add NIC firmware-related definitions Oded Gabbay
2020-09-10 15:03 ` [PATCH 03/15] habanalabs/gaudi: add NIC security configuration Oded Gabbay
2020-09-10 15:03 ` [PATCH 04/15] habanalabs/gaudi: add support for NIC QMANs Oded Gabbay
2020-09-10 15:03 ` [PATCH 05/15] habanalabs/gaudi: add NIC Ethernet support Oded Gabbay
2020-09-10 15:03 ` [PATCH 06/15] habanalabs/gaudi: add NIC PHY code Oded Gabbay
2020-09-10 15:03 ` [PATCH 07/15] habanalabs/gaudi: allow user to get MAC addresses in INFO IOCTL Oded Gabbay
2020-09-10 15:03 ` [PATCH 08/15] habanalabs/gaudi: add a new IOCTL for NIC control operations Oded Gabbay
2020-09-10 15:03 ` [PATCH 09/15] habanalabs/gaudi: add CQ " Oded Gabbay
2020-09-10 15:03 ` [PATCH 10/15] habanalabs/gaudi: add WQ " Oded Gabbay
2020-09-10 15:03 ` [PATCH 11/15] habanalabs/gaudi: add QP error handling Oded Gabbay
2020-09-10 15:03 ` [PATCH 12/15] habanalabs/gaudi: add debugfs entries for the NIC Oded Gabbay
2020-09-10 15:03 ` [PATCH 13/15] habanalabs/gaudi: Add ethtool support using coresight Oded Gabbay
2020-09-10 15:03 ` [PATCH 14/15] habanalabs/gaudi: support DCB protocol Oded Gabbay
2020-09-10 15:03 ` [PATCH 15/15] habanalabs/gaudi: add NIC init/fini calls from common code Oded Gabbay
2020-09-10 15:54 ` Greg KH [this message]
2020-09-10 15:59   ` [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
2020-09-10 16:08     ` Greg KH
2020-09-10 16:11 Oded Gabbay
2020-09-10 20:01 ` Jakub Kicinski
2020-09-10 20:16   ` Oded Gabbay
2020-09-10 20:25     ` Andrew Lunn
2020-09-10 20:30       ` Oded Gabbay
2020-09-10 20:38         ` Andrew Lunn
2020-09-10 20:52           ` Oded Gabbay
2020-09-11  6:22           ` Greg Kroah-Hartman
2020-09-10 20:28     ` Jakub Kicinski
2020-09-10 20:32       ` Oded Gabbay
2020-09-10 21:05         ` Florian Fainelli
2020-09-10 21:15           ` Oded Gabbay
2020-09-10 21:23             ` Florian Fainelli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200910155440.GC1151284@kroah.com \
    --to=gregkh@linuxfoundation.org \
    --cc=SW_Drivers@habana.ai \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oded.gabbay@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).