linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* RE: [PATCH v10 01/20] dlb: add skeleton for DLB driver
       [not found] ` <20210210175423.1873-2-mike.ximing.chen@intel.com>
@ 2021-02-18  7:34   ` Chen, Mike Ximing
  2021-02-18  7:52     ` gregkh
  0 siblings, 1 reply; 6+ messages in thread
From: Chen, Mike Ximing @ 2021-02-18  7:34 UTC (permalink / raw)
  To: Chen, Mike Ximing, netdev, Linux Kernel Mailing List
  Cc: davem, kuba, arnd, gregkh, Williams, Dan J, pierre-louis.bossart



> -----Original Message-----
> From: Mike Ximing Chen <mike.ximing.chen@intel.com>
> Sent: Wednesday, February 10, 2021 12:54 PM
> To: netdev@vger.kernel.org
> Cc: davem@davemloft.net; kuba@kernel.org; arnd@arndb.de;
> gregkh@linuxfoundation.org; Williams, Dan J <dan.j.williams@intel.com>; pierre-
> louis.bossart@linux.intel.com; Gage Eads <gage.eads@intel.com>
> Subject: [PATCH v10 01/20] dlb: add skeleton for DLB driver
> 
> diff --git a/Documentation/misc-devices/dlb.rst b/Documentation/misc-
> devices/dlb.rst
> new file mode 100644
> index 000000000000..aa79be07ee49
> --- /dev/null
> +++ b/Documentation/misc-devices/dlb.rst
> @@ -0,0 +1,259 @@
> +.. SPDX-License-Identifier: GPL-2.0-only
> +
> +===========================================
> +Intel(R) Dynamic Load Balancer Overview
> +===========================================
> +
> +:Authors: Gage Eads and Mike Ximing Chen
> +
> +Contents
> +========
> +
> +- Introduction
> +- Scheduling
> +- Queue Entry
> +- Port
> +- Queue
> +- Credits
> +- Scheduling Domain
> +- Interrupts
> +- Power Management
> +- User Interface
> +- Reset
> +
> +Introduction
> +============
> +
> +The Intel(r) Dynamic Load Balancer (Intel(r) DLB) is a PCIe device that
> +provides load-balanced, prioritized scheduling of core-to-core communication.
> +
> +Intel DLB is an accelerator for the event-driven programming model of
> +DPDK's Event Device Library[2]. The library is used in packet processing
> +pipelines that arrange for multi-core scalability, dynamic load-balancing, and
> +variety of packet distribution and synchronization schemes.
> +
> +Intel DLB device consists of queues and arbiters that connect producer
> +cores and consumer cores. The device implements load-balanced queueing
> features
> +including:
> +- Lock-free multi-producer/multi-consumer operation.
> +- Multiple priority levels for varying traffic types.
> +- 'Direct' traffic (i.e. multi-producer/single-consumer)
> +- Simple unordered load-balanced distribution.
> +- Atomic lock free load balancing across multiple consumers.
> +- Queue element reordering feature allowing ordered load-balanced distribution.
> +

Hi Jakub/Dave,
This is a device driver for a HW core-to-core communication accelerator. It is submitted 
to "linux-kernel" for a module under device/misc. Greg suggested (see below) that we
also sent it to you for any potential feedback in case there is any interaction with
networking initiatives. The device is used to handle the load balancing among CPU cores
after the packets are received and forwarded to CPU. We don't think it interferes
with networking operations, but would appreciate very much your review/comment on this.
 
Thanks for your help.
Mike

> As this is a networking related thing, I would like you to get the
>proper reviews/acks from the networking maintainers before I can take
>this.
>
>Or, if they think it has nothing to do with networking, that's fine too,
>but please do not try to route around them.
>
>thanks,
>
>greg k-h


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v10 01/20] dlb: add skeleton for DLB driver
  2021-02-18  7:34   ` [PATCH v10 01/20] dlb: add skeleton for DLB driver Chen, Mike Ximing
@ 2021-02-18  7:52     ` gregkh
  2021-02-18 15:37       ` Chen, Mike Ximing
  2021-03-07 13:59       ` Chen, Mike Ximing
  0 siblings, 2 replies; 6+ messages in thread
From: gregkh @ 2021-02-18  7:52 UTC (permalink / raw)
  To: Chen, Mike Ximing
  Cc: netdev, Linux Kernel Mailing List, davem, kuba, arnd, Williams,
	Dan J, pierre-louis.bossart

On Thu, Feb 18, 2021 at 07:34:31AM +0000, Chen, Mike Ximing wrote:
> 
> 
> > -----Original Message-----
> > From: Mike Ximing Chen <mike.ximing.chen@intel.com>
> > Sent: Wednesday, February 10, 2021 12:54 PM
> > To: netdev@vger.kernel.org
> > Cc: davem@davemloft.net; kuba@kernel.org; arnd@arndb.de;
> > gregkh@linuxfoundation.org; Williams, Dan J <dan.j.williams@intel.com>; pierre-
> > louis.bossart@linux.intel.com; Gage Eads <gage.eads@intel.com>
> > Subject: [PATCH v10 01/20] dlb: add skeleton for DLB driver
> > 
> > diff --git a/Documentation/misc-devices/dlb.rst b/Documentation/misc-
> > devices/dlb.rst
> > new file mode 100644
> > index 000000000000..aa79be07ee49
> > --- /dev/null
> > +++ b/Documentation/misc-devices/dlb.rst
> > @@ -0,0 +1,259 @@
> > +.. SPDX-License-Identifier: GPL-2.0-only
> > +
> > +===========================================
> > +Intel(R) Dynamic Load Balancer Overview
> > +===========================================
> > +
> > +:Authors: Gage Eads and Mike Ximing Chen
> > +
> > +Contents
> > +========
> > +
> > +- Introduction
> > +- Scheduling
> > +- Queue Entry
> > +- Port
> > +- Queue
> > +- Credits
> > +- Scheduling Domain
> > +- Interrupts
> > +- Power Management
> > +- User Interface
> > +- Reset
> > +
> > +Introduction
> > +============
> > +
> > +The Intel(r) Dynamic Load Balancer (Intel(r) DLB) is a PCIe device that
> > +provides load-balanced, prioritized scheduling of core-to-core communication.
> > +
> > +Intel DLB is an accelerator for the event-driven programming model of
> > +DPDK's Event Device Library[2]. The library is used in packet processing
> > +pipelines that arrange for multi-core scalability, dynamic load-balancing, and
> > +variety of packet distribution and synchronization schemes.
> > +
> > +Intel DLB device consists of queues and arbiters that connect producer
> > +cores and consumer cores. The device implements load-balanced queueing
> > features
> > +including:
> > +- Lock-free multi-producer/multi-consumer operation.
> > +- Multiple priority levels for varying traffic types.
> > +- 'Direct' traffic (i.e. multi-producer/single-consumer)
> > +- Simple unordered load-balanced distribution.
> > +- Atomic lock free load balancing across multiple consumers.
> > +- Queue element reordering feature allowing ordered load-balanced distribution.
> > +
> 
> Hi Jakub/Dave,
> This is a device driver for a HW core-to-core communication accelerator. It is submitted 
> to "linux-kernel" for a module under device/misc. Greg suggested (see below) that we
> also sent it to you for any potential feedback in case there is any interaction with
> networking initiatives. The device is used to handle the load balancing among CPU cores
> after the packets are received and forwarded to CPU. We don't think it interferes
> with networking operations, but would appreciate very much your review/comment on this.

It's the middle of the merge window, getting maintainers to review new
stuff until after 5.12-rc1 is out is going to be a very difficult thing
to do.

In the meantime, why don't you all help out and review submitted patches
to the mailing lists for the subsystems you all are trying to get this
patch into.  I know maintainers would appreciate the help, right?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH v10 01/20] dlb: add skeleton for DLB driver
  2021-02-18  7:52     ` gregkh
@ 2021-02-18 15:37       ` Chen, Mike Ximing
  2021-03-07 13:59       ` Chen, Mike Ximing
  1 sibling, 0 replies; 6+ messages in thread
From: Chen, Mike Ximing @ 2021-02-18 15:37 UTC (permalink / raw)
  To: gregkh
  Cc: netdev, Linux Kernel Mailing List, davem, kuba, arnd, Williams,
	Dan J, pierre-louis.bossart



> -----Original Message-----
> From: gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>
> Sent: Thursday, February 18, 2021 2:53 AM
> To: Chen, Mike Ximing <mike.ximing.chen@intel.com>
> Cc: netdev@vger.kernel.org; Linux Kernel Mailing List <linux-
> kernel@vger.kernel.org>; davem@davemloft.net; kuba@kernel.org; arnd@arndb.de;
> Williams, Dan J <dan.j.williams@intel.com>; pierre-louis.bossart@linux.intel.com
> Subject: Re: [PATCH v10 01/20] dlb: add skeleton for DLB driver
> 
> On Thu, Feb 18, 2021 at 07:34:31AM +0000, Chen, Mike Ximing wrote:
> >
> >
> > > -----Original Message-----
> > > From: Mike Ximing Chen <mike.ximing.chen@intel.com>
> > > Sent: Wednesday, February 10, 2021 12:54 PM
> > > To: netdev@vger.kernel.org
> > > Cc: davem@davemloft.net; kuba@kernel.org; arnd@arndb.de;
> > > gregkh@linuxfoundation.org; Williams, Dan J <dan.j.williams@intel.com>;
> pierre-
> > > louis.bossart@linux.intel.com; Gage Eads <gage.eads@intel.com>
> > > Subject: [PATCH v10 01/20] dlb: add skeleton for DLB driver
> > >
> > > diff --git a/Documentation/misc-devices/dlb.rst b/Documentation/misc-
> > > devices/dlb.rst
> > > new file mode 100644
> > > index 000000000000..aa79be07ee49
> > > --- /dev/null
> > > +++ b/Documentation/misc-devices/dlb.rst
> > > @@ -0,0 +1,259 @@
> > > +.. SPDX-License-Identifier: GPL-2.0-only
> > > +
> > > +===========================================
> > > +Intel(R) Dynamic Load Balancer Overview
> > > +===========================================
> > > +
> > > +:Authors: Gage Eads and Mike Ximing Chen
> > > +
> > > +Contents
> > > +========
> > > +
> > > +- Introduction
> > > +- Scheduling
> > > +- Queue Entry
> > > +- Port
> > > +- Queue
> > > +- Credits
> > > +- Scheduling Domain
> > > +- Interrupts
> > > +- Power Management
> > > +- User Interface
> > > +- Reset
> > > +
> > > +Introduction
> > > +============
> > > +
> > > +The Intel(r) Dynamic Load Balancer (Intel(r) DLB) is a PCIe device that
> > > +provides load-balanced, prioritized scheduling of core-to-core communication.
> > > +
> > > +Intel DLB is an accelerator for the event-driven programming model of
> > > +DPDK's Event Device Library[2]. The library is used in packet processing
> > > +pipelines that arrange for multi-core scalability, dynamic load-balancing, and
> > > +variety of packet distribution and synchronization schemes.
> > > +
> > > +Intel DLB device consists of queues and arbiters that connect producer
> > > +cores and consumer cores. The device implements load-balanced queueing
> > > features
> > > +including:
> > > +- Lock-free multi-producer/multi-consumer operation.
> > > +- Multiple priority levels for varying traffic types.
> > > +- 'Direct' traffic (i.e. multi-producer/single-consumer)
> > > +- Simple unordered load-balanced distribution.
> > > +- Atomic lock free load balancing across multiple consumers.
> > > +- Queue element reordering feature allowing ordered load-balanced distribution.
> > > +
> >
> > Hi Jakub/Dave,
> > This is a device driver for a HW core-to-core communication accelerator. It is
> submitted
> > to "linux-kernel" for a module under device/misc. Greg suggested (see below) that
> we
> > also sent it to you for any potential feedback in case there is any interaction with
> > networking initiatives. The device is used to handle the load balancing among CPU
> cores
> > after the packets are received and forwarded to CPU. We don't think it interferes
> > with networking operations, but would appreciate very much your
> review/comment on this.
> 
> It's the middle of the merge window, getting maintainers to review new
> stuff until after 5.12-rc1 is out is going to be a very difficult thing
> to do.
> 
> In the meantime, why don't you all help out and review submitted patches
> to the mailing lists for the subsystems you all are trying to get this
> patch into.  I know maintainers would appreciate the help, right?
> 
> thanks,
> 
> greg k-h

Sure. I am a little new to the community and process, but will try to help.
Thanks
Mike

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH v10 01/20] dlb: add skeleton for DLB driver
  2021-02-18  7:52     ` gregkh
  2021-02-18 15:37       ` Chen, Mike Ximing
@ 2021-03-07 13:59       ` Chen, Mike Ximing
  1 sibling, 0 replies; 6+ messages in thread
From: Chen, Mike Ximing @ 2021-03-07 13:59 UTC (permalink / raw)
  To: gregkh
  Cc: netdev, Linux Kernel Mailing List, davem, kuba, arnd, Williams,
	Dan J, pierre-louis.bossart, Brandeburg, Jesse



> -----Original Message-----
> From: gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>
> Sent: Thursday, February 18, 2021 2:53 AM
> To: Chen, Mike Ximing <mike.ximing.chen@intel.com>
> Cc: netdev@vger.kernel.org; Linux Kernel Mailing List <linux-
> kernel@vger.kernel.org>; davem@davemloft.net; kuba@kernel.org; arnd@arndb.de;
> Williams, Dan J <dan.j.williams@intel.com>; pierre-louis.bossart@linux.intel.com
> Subject: Re: [PATCH v10 01/20] dlb: add skeleton for DLB driver
> 
> On Thu, Feb 18, 2021 at 07:34:31AM +0000, Chen, Mike Ximing wrote:
> >
> >
> > > -----Original Message-----
> > > From: Mike Ximing Chen <mike.ximing.chen@intel.com>
> > > Sent: Wednesday, February 10, 2021 12:54 PM
> > > To: netdev@vger.kernel.org
> > > Cc: davem@davemloft.net; kuba@kernel.org; arnd@arndb.de;
> > > gregkh@linuxfoundation.org; Williams, Dan J <dan.j.williams@intel.com>;
> pierre-
> > > louis.bossart@linux.intel.com; Gage Eads <gage.eads@intel.com>
> > > Subject: [PATCH v10 01/20] dlb: add skeleton for DLB driver
> > >
> > > diff --git a/Documentation/misc-devices/dlb.rst b/Documentation/misc-
> > > devices/dlb.rst
> > > new file mode 100644
> > > index 000000000000..aa79be07ee49
> > > --- /dev/null
> > > +++ b/Documentation/misc-devices/dlb.rst
> > > @@ -0,0 +1,259 @@
> > > +.. SPDX-License-Identifier: GPL-2.0-only
> > > +
> > > +===========================================
> > > +Intel(R) Dynamic Load Balancer Overview
> > > +===========================================
> > > +
> > > +:Authors: Gage Eads and Mike Ximing Chen
> > > +
> > > +Contents
> > > +========
> > > +
> > > +- Introduction
> > > +- Scheduling
> > > +- Queue Entry
> > > +- Port
> > > +- Queue
> > > +- Credits
> > > +- Scheduling Domain
> > > +- Interrupts
> > > +- Power Management
> > > +- User Interface
> > > +- Reset
> > > +
> > > +Introduction
> > > +============
> > > +
> > > +The Intel(r) Dynamic Load Balancer (Intel(r) DLB) is a PCIe device that
> > > +provides load-balanced, prioritized scheduling of core-to-core communication.
> > > +
> > > +Intel DLB is an accelerator for the event-driven programming model of
> > > +DPDK's Event Device Library[2]. The library is used in packet processing
> > > +pipelines that arrange for multi-core scalability, dynamic load-balancing, and
> > > +variety of packet distribution and synchronization schemes.
> > > +
> > > +Intel DLB device consists of queues and arbiters that connect producer
> > > +cores and consumer cores. The device implements load-balanced queueing
> > > features
> > > +including:
> > > +- Lock-free multi-producer/multi-consumer operation.
> > > +- Multiple priority levels for varying traffic types.
> > > +- 'Direct' traffic (i.e. multi-producer/single-consumer)
> > > +- Simple unordered load-balanced distribution.
> > > +- Atomic lock free load balancing across multiple consumers.
> > > +- Queue element reordering feature allowing ordered load-balanced distribution.
> > > +
> >
> > Hi Jakub/Dave,
> > This is a device driver for a HW core-to-core communication accelerator. It is
> submitted
> > to "linux-kernel" for a module under device/misc. Greg suggested (see below) that
> we
> > also sent it to you for any potential feedback in case there is any interaction with
> > networking initiatives. The device is used to handle the load balancing among CPU
> cores
> > after the packets are received and forwarded to CPU. We don't think it interferes
> > with networking operations, but would appreciate very much your
> review/comment on this.
> 
> It's the middle of the merge window, getting maintainers to review new
> stuff until after 5.12-rc1 is out is going to be a very difficult thing
> to do.
> 

Hi Jakub/Dave,
Just wonder if you had a chance to take a look at our patch. With the close of 5.12
merge window, we would like to get the process moving again.

> In the meantime, why don't you all help out and review submitted patches
> to the mailing lists for the subsystems you all are trying to get this
> patch into.  I know maintainers would appreciate the help, right?
> 
> thanks,
> 
> greg k-h

Did a few reviews last weekend, and will continue to help.

Thanks
Mike

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH v10 00/20] dlb: introduce DLB device driver
       [not found]               ` <YJ6KHznWTKvKQZEl@kroah.com>
@ 2021-07-16  1:04                 ` Chen, Mike Ximing
  0 siblings, 0 replies; 6+ messages in thread
From: Chen, Mike Ximing @ 2021-07-16  1:04 UTC (permalink / raw)
  To: Greg KH, Williams, Dan J
  Cc: Netdev, David Miller, Jakub Kicinski, Arnd Bergmann,
	Pierre-Louis Bossart, Brandeburg, Jesse, KVM list, Raj, Ashok,
	Linux Kernel Mailing List

> -----Original Message-----
> From: Greg KH <gregkh@linuxfoundation.org>
> Sent: Friday, May 14, 2021 10:33 AM
> To: Williams, Dan J <dan.j.williams@intel.com>
> > Hi Greg,
> >
> > So, for the last few weeks Mike and company have patiently waded
> > through my questions and now I think we are at a point to work through
> > the upstream driver architecture options and tradeoffs. You were not
> > alone in struggling to understand what this device does because it is
> > unlike any other accelerator Linux has ever considered. It shards /
> > load balances a data stream for processing by CPU threads. This is
> > typically a network appliance function / protocol, but could also be
> > any other generic thread pool like the kernel's padata. It saves the
> > CPU cycles spent load balancing work items and marshaling them through
> > a thread pool pipeline. For example, in DPDK applications, DLB2 frees
> > up entire cores that would otherwise be consumed with scheduling and
> > work distribution. A separate proof-of-concept, using DLB2 to
> > accelerate the kernel's "padata" thread pool for a crypto workload,
> > demonstrated ~150% higher throughput with hardware employed to manage
> > work distribution and result ordering. Yes, you need a sufficiently
> > high touch / high throughput protocol before the software load
> > balancing overhead coordinating CPU threads starts to dominate the
> > performance, but there are some specific workloads willing to switch
> > to this regime.
> >
> > The primary consumer to date has been as a backend for the event
> > handling in the userspace networking stack, DPDK. DLB2 has an existing
> > polled-mode-userspace driver for that use case. So I said, "great,
> > just add more features to that userspace driver and you're done". In
> > fact there was DLB1 hardware that also had a polled-mode-userspace
> > driver. So, the next question is "what's changed in DLB2 where a
> > userspace driver is no longer suitable?". The new use case for DLB2 is
> > new hardware support for a host driver to carve up device resources
> > into smaller sets (vfio-mdevs) that can be assigned to guests (Intel
> > calls this new hardware capability SIOV: Scalable IO Virtualization).
> >
> > Hardware resource management is difficult to handle in userspace
> > especially when bare-metal hardware events need to coordinate with
> > guest-VM device instances. This includes a mailbox interface for the
> > guest VM to negotiate resources with the host driver. Another more
> > practical roadblock for a "DLB2 in userspace" proposal is the fact
> > that it implements what are in-effect software-defined-interrupts to
> > go beyond the scalability limits of PCI MSI-x (Intel calls this
> > Interrupt Message Store: IMS). So even if hardware resource management
> > was awkwardly plumbed into a userspace daemon there would still need
> > to be kernel enabling for device-specific extensions to
> > drivers/vfio/pci/vfio_pci_intrs.c for it to understand the IMS
> > interrupts of DLB2 in addition to PCI MSI-x.
> >
> > While that still might be solvable in userspace if you squint at it, I
> > don't think Linux end users are served by pushing all of hardware
> > resource management to userspace. VFIO is mostly built to pass entire
> > PCI devices to guests, or in coordination with a kernel driver to
> > describe a subset of the hardware to a virtual-device (vfio-mdev)
> > interface. The rub here is that to date kernel drivers using VFIO to
> > provision mdevs have some existing responsibilities to the core kernel
> > like a network driver or DMA offload driver. The DLB2 driver offers no
> > such service to the kernel for its primary role of accelerating a
> > userspace data-plane. I am assuming here that  the padata
> > proof-of-concept is interesting, but not a compelling reason to ship a
> > driver compared to giving end users competent kernel-driven
> > hardware-resource assignment for deploying DLB2 virtual instances into
> > guest VMs.
> >
> > My "just continue in userspace" suggestion has no answer for the IMS
> > interrupt and reliable hardware resource management support
> > requirements. If you're with me so far we can go deeper into the
> > details, but in answer to your previous questions most of the TLAs
> > were from the land of "SIOV" where the VFIO community should be
> > brought in to review. The driver is mostly a configuration plane where
> > the fast path data-plane is entirely in userspace. That configuration
> > plane needs to manage hardware events and resourcing on behalf of
> > guest VMs running on a partitioned subset of the device. There are
> > worthwhile questions about whether some of the uapi can be refactored
> > to common modules like uacce, but I think we need to get to a first
> > order understanding on what DLB2 is and why the kernel has a role
> > before diving into the uapi discussion.
> >
> > Any clearer?
> 
> A bit, yes, thanks.
> 
> > So, in summary drivers/misc/ appears to be the first stop in the
> > review since a host driver needs to be established to start the VFIO
> > enabling campaign. With my community hat on, I think requiring
> > standalone host drivers is healthier for Linux than broaching the
> > subject of VFIO-only drivers. Even if, as in this case, the initial
> > host driver is mostly implementing a capability that could be achieved
> > with a userspace driver.
> 
> Ok, then how about a much "smaller" kernel driver for all of this, and a whole lot of documentation to
> describe what is going on and what all of the TLAs are.
> 
> thanks,
> 
> greg k-h

Hi Greg,

tl;dr: We have been looking into various options to reduce the kernel driver size and ABI surface, such as moving more responsibility to user space, reusing existing kernel modules (uacce, for example), and converting functionality from ioctl to sysfs. End result 10 ioctls will be replaced by sysfs, the rest of them (20 ioctls) will be replaced by configfs. Some concepts are moved to device-special files rather than ioctls that produce file descriptors.

Details:
We investigated the possibility of using uacce (https://www.kernel.org/doc/html/latest/misc-devices/uacce.html) in our kernel driver.  The uacce interface fits well with accelerators that process user data with known source and destination addresses. For a DLB (Dynamic Load Balancer), however,  the destination port depends on the system load and is unknown to the application. While uacce exposes  "queues" to user, the dlb driver has to handle much complicated resource managements, such as credits, ports, queues and domains. We would have to add a lot of more concepts and code, which are not useful for other accelerators,  in uacce to make it working for DLB. This may also lead to a bigger code size over all.

We also took a another look at moving resource management functionality from kernel space to user space. Much of kernel driver supports both PF (Physical Function) on host and VFs (Virtual Functions) on VMs. Since only the PF on the host has permissions to setup resource and configure the DLB HW, all the requests on VFs are forwarded to PF via the VF-PF mail boxes, which are handled by the kernel driver. The driver also maintains various virtual id to physical id translations (for VFs, ports, queues, etc), and provides virtual-to-physical id mapping info DLB HW so that an application in VM can access the resources with virtual IDs only. Because of the VF/VDEV support, we have to keep the resource management, which is more than one half of the code size, in the driver.

To simplify the user interface, we explored the ways to reduce/eliminate ioctl interface, and found that we can utilize configfs for many of the DLB functionalities. Our current plan is to replace all the ioctls in the driver with sysfs and configfs. We will use configfs for most of setup and configuration for both physical function and virtual functions. This may not reduce the overall driver size greatly, but it will lessen much of ABI maintenance burden (with the elimination of ioctls).  I hope this is something that is in line with what you like to see for the driver.

Thanks
Mike

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v10 00/20] dlb: introduce DLB device driver
@ 2021-01-27 22:56 Mike Ximing Chen
  0 siblings, 0 replies; 6+ messages in thread
From: Mike Ximing Chen @ 2021-01-27 22:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: arnd, gregkh, dan.j.williams, pierre-louis.bossart

Introduce a new misc device driver for the Intel(r) Dynamic Load Balancer
(Intel(r) DLB). The Intel DLB is a PCIe device that provides
load-balanced, prioritized scheduling of core-to-core communication.

Intel DLB is an accelerator for the event-driven programming model of
DPDK's Event Device Library[2]. The library is used in packet processing
pipelines that arrange for multi-core scalability, dynamic load-balancing,
and variety of packet distribution and synchronization schemes

These distribution schemes include "parallel" (packets are load-balanced
across multiple cores and processed in parallel), "ordered" (similar to
"parallel" but packets are reordered into ingress order by the device), and
"atomic" (packet flows are scheduled to a single core at a time such that
locks are not required to access per-flow data, and dynamically migrated to
ensure load-balance).

This submission supports Intel DLB 2.0 only.

The Intel DLB consists of queues and arbiters that connect producer
cores and consumer cores. The device implements load-balanced queueing
features including:
- Lock-free multi-producer/multi-consumer operation.
- Multiple priority levels for varying traffic types.
- 'Direct' traffic (i.e. multi-producer/single-consumer)
- Simple unordered load-balanced distribution.
- Atomic lock free load balancing across multiple consumers.
- Queue element reordering feature allowing ordered load-balanced
  distribution.

The fundamental unit of communication through the device is a queue entry
(QE), which consists of 8B of data and 8B of metadata (destination queue,
priority, etc.). The data field can be any type that fits within 8B.

A core's interface to the device, a "port," consists of a memory-mappable
region through which the core enqueues a queue entry, and an in-memory
queue (the "consumer queue") to which the device schedules QEs. Each QE
is enqueued to a device-managed queue, and from there scheduled to a port.
Software specifies the "linking" of queues and ports; i.e. which ports the
device is allowed to schedule to for a given queue. The device uses a
credit scheme to prevent overflow of the on-device queue storage.

Applications can interface directly with the device by mapping the port's
memory and MMIO regions into the application's address space for enqueue
and dequeue operations, but call into the kernel driver for configuration
operations. An application can also be polling- or interrupt-driven;
Intel DLB supports both modes of operation.

Device resources -- i.e. ports, queues, and credits -- are contained within
a scheduling domain. Scheduling domains are isolated from one another; a
port can only enqueue to and dequeue from queues within its scheduling
domain. A scheduling domain's resources are configured through a scheduling
domain file, which is acquired through an ioctl.

Intel DLB supports SR-IOV and Scalable IOV, and allows for a flexible
division of its resources among the PF and its virtual devices. The virtual
devices are incapable of configuring the device directly; they use a
hardware mailbox to proxy configuration requests to the PF driver. This
driver supports both PF and virtual devices, as there is significant code
re-use between the two, with device-specific behavior handled through a
callback interface.  Virtualization support will be added in a later patch
set.

The dlb driver uses ioctls as its primary interface (it makes use of sysfs
as well, to a lesser extent). The dlb device file supports a different
ioctl interface than the scheduling domain file; the dlb device file
is used for device-wide operations (including scheduling domain creation),
and the scheduling domain file supports operations on the scheduling
domain's resources (primarily resource configuration). Scheduling domains
are created dynamically (using a dlb device file ioctl) by user-space
software, and the scheduling domain file is created from an anonymous file
that is installed in the ioctl's calling process's file descriptor table.

[1] https://builders.intel.com/docs/networkbuilders/SKU-343247-001US-queue-management-and-load-balancing-on-intel-architecture.pdf
[2] https://doc.dpdk.org/guides/prog_guide/eventdev.html

v10:
- Addressed an issue reported by kernel test robot <lkp@intel.com>
-- Add "WITH Linux-syscall-note" to the SPDX-License-Identifier in uapi
   header file dlb.h.

v9:
- Addressed all of Greg's feecback on v8, including
-- Remove function name (__func__) from dev_err() messages, that could spam log.
-- Replace list and function pointer calls in dlb_ioctl() with switch-case
   and real function calls for ioctl.
-- Drop the compat_ptr_ioctl in dlb_ops (struct file_operations).
-- Change ioctl magic number for DLB to unused 0x81 (from 'h').
-- Remove all placeholder/dummy functions in the patch set.
-- Re-arrange the comments in dlb.h so that the order is consistent with that
   of data structures referred.
-- Correct the comments on SPDX License and DLB versions in dlb.h.
-- Replace BIT_SET() and BITS_CLR() marcos with direct coding.   
-- Remove NULL pointer checking (f->private_data) in dlb_ioctl().
-- Use whole line whenever possible and not wrapping lines unnecessarily.
-- Remove __attribute__((unused)).
-- Merge dlb_ioctl.h and dlb_file.h into dlb_main.h

v8:
- Add a functional block diagram in dlb.rst 
- Modify change logs to reflect the links between patches and DPDK
  eventdev library.
- Add a check of power-of-2 for CQ depth.
- Move call to INIT_WORK() to dlb_open().
- Clean dlb workqueue by calling flush_scheduled_work().
- Add unmap_mapping_range() in dlb_port_close().

v7 (Intel internal version):
- Address all of Dan's feedback, including
-- Drop DLB 2.0 throughout the patch set, use DLB only.
-- Fix license and copyright statements
-- Use pcim_enable_device() and pcim_iomap_regions(), instead of
   unmanaged version.
-- Move cdev_add() to dlb_init() and add all devices at once.
-- Fix Makefile, using "+=" style.
-- Remove FLR description and mention movdir64/enqcmd usage in doc.
-- Make the permission for the domain same as that for device for
   ioctl access.
-- Use idr instead of ida.
-- Add a lock in dlb_close() to prevent driver unbinding while ioctl
   coomands are in progress.
-- Remove wrappers that are used for code sharing between kernel driver
   and DPDK. 
- Address Pierre-Louis' feedback, including
-- Clean the warinings from checkpatch
-- Fix the warnings from "make W=1"

v6 (Intel internal version):
- Change the module name to dlb(from dlb2), which currently supports Intel
  DLB 2.0 only.
- Address all of Pierre-Louis' feedback on v5, including
-- Consolidate the two near-identical for loops in dlb2_release_domain_memory().
-- Remove an unnecessary "port = NULL" initialization
-- Consistently use curly braces on the *_LIST_FOR macros
   when the for-loop contents spans multiple lines.
-- Add a comment to the definition of DLB2FS_MAGIC
-- Remove always true if statemnets
-- Move the get_cos_bw mutex unlock call earlier to shorten the critical
   section.
- Address all of Dan's feedbacks, including
-- Replace the unions for register bits access with bitmask and shifts
-- Centralize the "to/from" user memory copies for ioctl functions.
-- Review ioctl design against Documentation/process/botching-up-ioctls.rst
-- Remove wraper functions for memory barriers.
-- Use ilog() to simplify a switch code block.
-- Add base-commit to cover letter.

v5 (Intel internal version):
- Reduce the scope of the initial patch set (drop the last 8 patches)
- Further decompose some of the remaining patches into multiple patches.
- Address all of Pierre-Louis' feedback, including:
-- Move kerneldoc to *.c files
-- Fix SPDX comment style
-- Add BAR macros
-- Improve/clarify struct dlb2_dev and struct device variable naming
-- Add const where missing
-- Clarify existing comments and add new ones in various places
-- Remove unnecessary memsets and zero-initialization
-- Remove PM abstraction, fix missing pm_runtime_allow(), and don't
   update PM refcnt when port files are opened and closed.
-- Convert certain ternary operations into if-statements
-- Out-line the CQ depth valid check
-- De-duplicate the logic in dlb2_release_device_memory()
-- Limit use of devm functions to allocating/freeing struct dlb2
- Address Ira's comments on dlb2.rst and correct commit messages that
  don't use the imperative voice.

v4:
- Move PCI device ID definitions into dlb2_hw_types.h, drop the VF definition
- Remove dlb2_dev_list
- Remove open/close functions and fops structure (unused)
- Remove "(char *)" cast from PCI driver name
- Unwind init failures properly
- Remove ID alloc helper functions and call IDA interfaces directly instead

v3:
- Remove DLB2_PCI_REG_READ/WRITE macros

v2:
- Change driver license to GPLv2 only
- Expand Kconfig help text and remove unnecessary (R)s
- Remove unnecessary prints
- Add a new entry in ioctl-number.rst
- Convert the ioctl handler into a switch statement
- Correct some instances of IOWR that should have been IOR
- Align macro blocks
- Don't break ioctl ABI when introducing new commands
- Remove indirect pointers from ioctl data structures
- Remove the get-sched-domain-fd ioctl command

Mike Ximing Chen (20):
  dlb: add skeleton for DLB driver
  dlb: initialize device
  dlb: add resource and device initialization
  dlb: add device ioctl layer and first three ioctls
  dlb: add scheduling domain configuration
  dlb: add domain software reset
  dlb: add low-level register reset operations
  dlb: add runtime power-management support
  dlb: add queue create, reset, get-depth ioctls
  dlb: add register operations for queue management
  dlb: add ioctl to configure ports and query poll mode
  dlb: add register operations for port management
  dlb: add port mmap support
  dlb: add start domain ioctl
  dlb: add queue map, unmap, and pending unmap operations
  dlb: add port map/unmap state machine
  dlb: add static queue map register operations
  dlb: add dynamic queue map register operations
  dlb: add queue unmap register operations
  dlb: queue map/unmap workqueue

 Documentation/misc-devices/dlb.rst            |  259 +
 Documentation/misc-devices/index.rst          |    1 +
 .../userspace-api/ioctl/ioctl-number.rst      |    1 +
 MAINTAINERS                                   |    8 +
 drivers/misc/Kconfig                          |    1 +
 drivers/misc/Makefile                         |    1 +
 drivers/misc/dlb/Kconfig                      |   18 +
 drivers/misc/dlb/Makefile                     |   11 +
 drivers/misc/dlb/dlb_bitmap.h                 |  210 +
 drivers/misc/dlb/dlb_file.c                   |  149 +
 drivers/misc/dlb/dlb_hw_types.h               |  311 +
 drivers/misc/dlb/dlb_ioctl.c                  |  498 ++
 drivers/misc/dlb/dlb_main.c                   |  614 ++
 drivers/misc/dlb/dlb_main.h                   |  178 +
 drivers/misc/dlb/dlb_pf_ops.c                 |  277 +
 drivers/misc/dlb/dlb_regs.h                   | 3640 +++++++++++
 drivers/misc/dlb/dlb_resource.c               | 5469 +++++++++++++++++
 drivers/misc/dlb/dlb_resource.h               |   94 +
 include/uapi/linux/dlb.h                      |  602 ++
 19 files changed, 12342 insertions(+)
 create mode 100644 Documentation/misc-devices/dlb.rst
 create mode 100644 drivers/misc/dlb/Kconfig
 create mode 100644 drivers/misc/dlb/Makefile
 create mode 100644 drivers/misc/dlb/dlb_bitmap.h
 create mode 100644 drivers/misc/dlb/dlb_file.c
 create mode 100644 drivers/misc/dlb/dlb_hw_types.h
 create mode 100644 drivers/misc/dlb/dlb_ioctl.c
 create mode 100644 drivers/misc/dlb/dlb_main.c
 create mode 100644 drivers/misc/dlb/dlb_main.h
 create mode 100644 drivers/misc/dlb/dlb_pf_ops.c
 create mode 100644 drivers/misc/dlb/dlb_regs.h
 create mode 100644 drivers/misc/dlb/dlb_resource.c
 create mode 100644 drivers/misc/dlb/dlb_resource.h
 create mode 100644 include/uapi/linux/dlb.h


base-commit: e71ba9452f0b5b2e8dc8aa5445198cd9214a6a62
-- 
2.17.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-07-16  1:04 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20210210175423.1873-1-mike.ximing.chen@intel.com>
     [not found] ` <20210210175423.1873-2-mike.ximing.chen@intel.com>
2021-02-18  7:34   ` [PATCH v10 01/20] dlb: add skeleton for DLB driver Chen, Mike Ximing
2021-02-18  7:52     ` gregkh
2021-02-18 15:37       ` Chen, Mike Ximing
2021-03-07 13:59       ` Chen, Mike Ximing
     [not found] ` <YEiLI8fGoa9DoCnF@kroah.com>
     [not found]   ` <CAPcyv4gCMjoDCc2azLEc8QC5mVhdKeLibic9gj4Lm=Xwpft9ZA@mail.gmail.com>
     [not found]     ` <BYAPR11MB30950965A223EDE5414EAE08D96F9@BYAPR11MB3095.namprd11.prod.outlook.com>
     [not found]       ` <CAPcyv4htddEBB9ePPSheH+rO+=VJULeHzx0gc384if7qXTUHHg@mail.gmail.com>
     [not found]         ` <BYAPR11MB309515F449B8660043A559E5D96C9@BYAPR11MB3095.namprd11.prod.outlook.com>
     [not found]           ` <YFBz1BICsjDsSJwv@kroah.com>
     [not found]             ` <CAPcyv4g89PjKqPuPp2ag0vB9Vq8igTqh0gdP0h+7ySTVPagQ9w@mail.gmail.com>
     [not found]               ` <YJ6KHznWTKvKQZEl@kroah.com>
2021-07-16  1:04                 ` [PATCH v10 00/20] dlb: introduce DLB device driver Chen, Mike Ximing
2021-01-27 22:56 Mike Ximing Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).