From mboxrd@z Thu Jan  1 00:00:00 1970
From: Guy Zadicario <guy.zadicario@intel.com>
To: gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org
Cc: olof@lixom.net, alexander.shishkin@linux.intel.com,
	andriy.shevchenko@intel.com, yochai.shefi-simchon@intel.com,
	guy.zadicario@intel.com
Subject: [PATCH v2 01/15] misc: nnpi: Document NNP-I's driver overview
Date: Thu, 13 May 2021 11:57:11 +0300
Message-Id: <20210513085725.45528-2-guy.zadicario@intel.com>
X-Mailer: git-send-email 2.25.0
In-Reply-To: <20210513085725.45528-1-guy.zadicario@intel.com>
References: <20210513085725.45528-1-guy.zadicario@intel.com>

Introduce overview documentation for the NNP-I card and driver, to help
new readers better understand the driver and the NNP-I device.

Signed-off-by: Guy Zadicario <guy.zadicario@intel.com>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 Documentation/misc-devices/index.rst      |   1 +
 Documentation/misc-devices/intel-nnpi.rst | 237 ++++++++++++++++++++++++++++++
 2 files changed, 238 insertions(+)
 create mode 100644 Documentation/misc-devices/intel-nnpi.rst

diff --git a/Documentation/misc-devices/index.rst b/Documentation/misc-devices/index.rst
index 30ac58f..7f14fc4 100644
--- a/Documentation/misc-devices/index.rst
+++ b/Documentation/misc-devices/index.rst
@@ -22,6 +22,7 @@ fit into other categories.
    dw-xdata-pcie
    ibmvmc
    ics932s401
+   intel-nnpi
    isl29003
    lis3lv02d
    max6875
diff --git a/Documentation/misc-devices/intel-nnpi.rst b/Documentation/misc-devices/intel-nnpi.rst
new file mode 100644
index 0000000..cb67fb4
--- /dev/null
+++ b/Documentation/misc-devices/intel-nnpi.rst
@@ -0,0 +1,237 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+======================================================================
+Device driver for Intel NNP-I (Neural Network Processor for Inference)
+======================================================================
+
+Terminology
+===========
+To avoid confusion and to help new reviewers understand the code, here
+is a list of definitions used throughout this document, in commit
+messages and in the code:
+
+card:
+   The NNP-I card, including the hardware and the embedded software
+   running inside it, which is connected to the host through PCIe.
+
+host:
+   The computer the NNP-I card is attached to through PCIe, which runs
+   the Linux kernel with the NNP-I driver.
+
+IPC protocol:
+   The protocol of messages exchanged between the host and the card.
+
+command:
+   A message sent from the host to the card, typically a command to
+   execute on the card.
+
+response:
+   A message sent from the card to the host, typically in response to
+   a command previously sent from the host, or a message the card
+   sends to indicate an event, such as an error condition or the
+   completion of an asynchronous request.
+
+nnp_user:
+   An object created for each file descriptor opened for
+   /dev/nnpi_host. It manages host memory resources. Each nnp_user
+   has its own set of host resources, which cannot be shared with
+   other nnp_user objects.
+
+channel, cmd_chan:
+   An object which manages communication between a user-space
+   application and a card: it provides an interface to send commands
+   to the card and receive responses from it. One channel connects a
+   single nnp_user to a single card; an application can create
+   multiple channels to the same or different cards. A channel can
+   only access host memory resources belonging to its nnp_user.
+
+ELBI:
+   The set of hardware registers of the NNP-I card which are
+   accessible through BAR0. It stands for "External Local Bus
+   Interface".
+
+model:
+   A neural network used to process input data and generate output
+   data.
+
+Description
+===========
+NNP-I is a PCIe card which accelerates deep-learning inference
+applications. The card is equipped with LPDDR4 memory, a DMA engine,
+x86 cores and specialized compute units called Inference Compute
+Engines (ICE). These compute engines are designed for power-efficient
+inference-related computations.
+
+The card DRAM, the ICEs and the other hardware components on the card
+are not accessible from the host CPU; they are programmed and
+controlled by software components running on the card's x86 cores.
+Communication between the software components on the card and the
+host is done through three interfaces:
+
+- A small "command queue" and "response queue", through which
+  commands and responses are sent to and received from the card.
+- Two doorbell registers, through which the host and card communicate
+  state.
+- A DMA engine on the card, which copies large memory chunks between
+  host and card DRAM.
+
+A typical flow of an inference application running on the host is the
+following (sketched in code after the list):
+
+1) Allocate memory on the host to hold the model, input and output
+   data.
+2) Request the card's software to allocate space in the card DRAM for
+   the model, input and output data.
+3) Load the model into host DRAM, and request the card's DMA engine
+   to copy it over to the card's DRAM.
+4) Load host memory with input data.
+5) Schedule commands to the card to copy the input data to the card
+   DRAM, execute the model on this input, and copy the output data
+   back to host memory.
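+
+For illustration only, this flow might look like the pseudo-C below.
+Every function and field name here is a hypothetical placeholder, not
+part of the driver's ABI; the real interaction goes through the
+character devices and IOCTLs described in the ABI section below::
+
+	/* Hypothetical pseudo-code of the typical flow above. */
+	host_res = host_alloc(model_sz + in_sz + out_sz);       /* step 1 */
+	card_res = card_alloc(card, model_sz + in_sz + out_sz); /* step 2 */
+	load_model(host_res.model, model_file);                 /* step 3 */
+	dma_copy_to_card(card, host_res.model, card_res.model);
+	fill_input(host_res.input);                             /* step 4 */
+	dma_copy_to_card(card, host_res.input, card_res.input); /* step 5 */
+	run_model(card, card_res.model, card_res.input, card_res.output);
+	dma_copy_from_card(card, card_res.output, host_res.output);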
+
+The card's software stack consists of a BIOS image, which is flashed
+on the card and kept in dedicated persistent memory, and a full
+embedded Linux image, which is loaded to the card during the
+boot/reset process. During power-on, the card BIOS and the host
+driver communicate through the doorbell registers and the "command
+queue". The card boot flow consists of loading a "boot image" into
+host memory and communicating the location of this image to the
+card's BIOS; the card's BIOS copies that boot image to card DRAM
+using the DMA engine and starts booting the embedded Linux running on
+the card's x86 cores.
+
+Each NNP-I card can support multiple inference application contexts;
+each context has its own space of card and host resource IDs. There
+is no hardware-level restriction preventing one context from
+accessing the resources of another context; the card software stack
+prevents it by keeping a separate resource ID space for each context.
+
+There may be multiple NNP-I cards connected to one host. An inference
+application can hold channels to multiple NNP-I cards and can map and
+access the same host resource memory on all of them, but only if the
+host resource and all device contexts were created by the same
+application. Possible use cases for this capability are running
+different models on different cards in parallel on the same data, or
+passing the output of a model run on one card as input to another
+model loaded on another card. The driver implements this by exporting
+two char devices, one for host resource management and another for
+card access. The application must provide the host resource file
+descriptor to the card's character device to allow access to host
+resources created through that same file descriptor.
+
+ABI
+===
+The driver creates two classes of character devices with an IOCTL
+interface, provided by include/uapi/misc/intel_nnpi.h:
+
+/dev/nnpi_host:
+
+   A character device which is not related to a specific physical
+   NNP-I card. It has four IOCTLs for creating, destroying, locking
+   and unlocking host resources. A "host resource" is a set of pinned
+   memory pages on the host which can be mapped to PCI space and
+   accessed by the card's DMA engine. This character device is
+   created when the first NNP-I card is probed, so it is not present
+   on systems with no NNP-I cards.
+
+/dev/nnpi%d:
+
+   A character device with one instance per NNP-I card. It supports
+   five IOCTLs for (a usage sketch follows this list):
+
+   * Creating a channel - A "channel" gives user space the ability to
+     send commands to and receive responses from the card. For each
+     channel an anonymous file descriptor is created and returned to
+     the user. Commands and responses are sent and received using
+     write and read operations on the channel file descriptor. The
+     driver validates each command sent and rejects unsupported or
+     invalid commands. Commands written to a channel are added to a
+     queue; each channel has its own command queue. For each card in
+     the system there is one kernel thread (msg_scheduler) which
+     drains the command queues into the card's hardware command
+     queue. The channel is destroyed by closing the returned channel
+     file descriptor. When creating a channel, an open file
+     descriptor for /dev/nnpi_host must be provided. The channel
+     object holds a reference to that file, and the channel can map
+     and unmap only host resources which were created through that
+     same file descriptor. Each channel has a unique 10-bit ID
+     allocated by the driver. Channel IDs in the range [0, 255] are
+     used for inference contexts; channels with IDs greater than 255
+     are used for non-inference communication with the card (mainly
+     maintenance, stats queries, etc.).
+   * Map/Unmap host resource - Maps a host resource to card PCI space
+     and sends the card a pointer to a page table holding the
+     physical addresses of the resource pages. Each mapping has a
+     unique 16-bit ID; commands sent to the card can include such an
+     ID in order to reference a host resource. The ID space for host
+     resources is private to each channel.
+   * Create/Delete "ringbuffer" - Exactly the same as map/unmap host
+     resource, but for special host resources, called ring buffers,
+     which are used to transfer data along with some commands. There
+     may be up to two host-to-card ring buffers and two card-to-host
+     ring buffers.
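+
+A hypothetical usage sketch of this ABI follows. The IOCTL names and
+argument structs here are illustrative placeholders (the
+authoritative definitions are in include/uapi/misc/intel_nnpi.h);
+only the open/ioctl/write/read pattern is taken from the description
+above::
+
+	int host_fd = open("/dev/nnpi_host", O_RDWR);
+	int card_fd = open("/dev/nnpi0", O_RDWR);
+
+	/* Create a channel; passing host_fd ties the channel to host
+	 * resources created through that same file descriptor.
+	 * IOCTL_NNPI_CREATE_CHANNEL and its struct are placeholders. */
+	struct nnpi_create_channel req = { .host_fd = host_fd };
+	ioctl(card_fd, IOCTL_NNPI_CREATE_CHANNEL, &req);
+	int chan_fd = req.out_channel_fd;  /* anonymous channel fd */
+
+	/* Commands go out with write(), responses come back with
+	 * read(); each is one to three 64-bit words (see the card's
+	 * command protocol below). */
+	__u64 cmd[3] = { /* opcode and payload */ };
+	write(chan_fd, cmd, sizeof(cmd));
+
+	__u64 resp[3];
+	read(chan_fd, resp, sizeof(resp));
+
+	close(chan_fd);  /* destroys the channel */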
+
+sysfs
+=====
+There are multiple sysfs attributes for each NNP-I card which display
+card information and status, and some which provide control
+operations such as enabling and disabling the card.
+
+The attributes are documented in
+Documentation/ABI/testing/sysfs-driver-intel_nnpi.
+
+PCI BARs
+========
+The card exposes two 64-bit BARs:
+
+BAR0-1:
+   4KB containing the card registers which control the command and
+   response hardware queues (FIFOs), the doorbell registers and the
+   control/interrupt status registers. The offsets and bitfields of
+   those registers are defined in if_include/nnpi_elbi.h.
+
+BAR2-3:
+   A card memory region of 64MB, to which the host has read/write
+   access. The first 16KB of this region holds a card crash dump in
+   case the card software stack has crashed (an erroneous state which
+   should never happen, but when it does, the crash dump helps
+   debugging). The layout of this 16KB is defined in
+   if_include/nnp_inbound_mem.h. This region is filled by the card in
+   the event of a crash and can be read by the host for debugging
+   purposes; when a crash is detected on the card, the card sends an
+   event response message to indicate it. The rest of this memory
+   region (64MB - 16KB) is used by peer-to-peer applications to
+   transfer data between two NNP-I cards. A single application can
+   use multiple cards; there are card commands which allow such an
+   application to allocate device resources in this BAR space, as
+   well as commands to issue DMA copy requests which copy data from
+   one card's DRAM to a second card's memory when the destination
+   resource address is within BAR2-3 space.
+
+Card's command protocol
+=======================
+Commands to the card consist of one, two or three 64-bit values. The
+lower 6 bits of the command hold the command opcode. The opcode also
+defines the command size, as each command has a constant size.
+Commands which target a specific channel carry the channel ID in bits
+15:6 of the command and must use an opcode value greater than or
+equal to 32.
+
+The meaning of the other bits is specific to each command. Responses
+from the card have the same format.
+
+The opcodes and the structure of the commands and responses are
+defined in drivers/misc/intel-nnpi/ipc_include/ipc_protocol.h.
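+
+As a sketch of the bit layout just described (the macro and function
+names below are illustrative, not the driver's; the real definitions
+live in ipc_protocol.h)::
+
+	#include <linux/bits.h>
+	#include <linux/bitfield.h>
+
+	#define NNP_CMD_OP_MASK    GENMASK_ULL(5, 0)   /* opcode        */
+	#define NNP_CMD_CHAN_MASK  GENMASK_ULL(15, 6)  /* 10-bit chan ID */
+
+	/* Pack the first 64-bit word of a channel command; channel
+	 * commands must use an opcode >= 32. */
+	static inline u64 nnp_pack_chan_cmd(u64 opcode, u64 chan_id)
+	{
+		return FIELD_PREP(NNP_CMD_OP_MASK, opcode) |
+		       FIELD_PREP(NNP_CMD_CHAN_MASK, chan_id);
+	}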
+
+Driver architecture
+===================
+Since the NNP-I card has its own software stack and is programmed
+mainly by sending commands and receiving responses, the driver is
+split into two modules.
+
+The PCIe device driver module, intel_nnpi_pcie, is responsible for
+detecting the NNP-I card and provides the functionality for sending
+commands to it, receiving responses and performing a few control
+operations. This module does not understand the IPC protocol; it only
+provides the mechanism to send and receive raw data, and it provides
+no user-space interfaces. The intel_nnpi_pcie module is a PCI driver
+for the Intel NNP-I card and glues it into the NNP-I framework
+module.
+
+The NNP-I framework module, intel_nnpi, is the "logical" module which
+provides the user-space interfaces, understands the IPC protocol,
+packs and unpacks commands and responses, and controls the card by
+calling interfaces implemented by the device driver module. The main
+structure for an NNP-I card device in the framework is
+&struct nnp_device.
+
+These two modules serve roles similar to the "transport" and
+"application" layers in standard TCP/IP terminology.
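+
+The boundary between the two modules might be pictured as below. This
+ops structure is only a hypothetical sketch of the layering, not the
+driver's actual interface::
+
+	/* Hypothetical shape of the module boundary: the framework
+	 * (intel_nnpi) drives the hardware only through callbacks
+	 * implemented by the PCIe module (intel_nnpi_pcie). */
+	struct nnp_device;
+
+	struct nnp_hw_ops {
+		int  (*cmdq_write_mesg)(struct nnp_device *nnpdev,
+					u64 *msg, u32 size);
+		void (*set_host_doorbell)(struct nnp_device *nnpdev,
+					  u32 value);
+		int  (*reset)(struct nnp_device *nnpdev);
+	};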
-- 
1.8.3.1