> -----Original Message-----
> From: Dan Williams
> Sent: Friday, March 12, 2021 2:18 AM
> To: Greg KH
> Cc: Chen, Mike Ximing ; Netdev ; David Miller ; Jakub Kicinski ;
> Arnd Bergmann ; Pierre-Louis Bossart
> Subject: Re: [PATCH v10 00/20] dlb: introduce DLB device driver
>
> On Wed, Mar 10, 2021 at 1:02 AM Greg KH wrote:
> >
> > On Wed, Feb 10, 2021 at 11:54:03AM -0600, Mike Ximing Chen wrote:
> > > Intel DLB is an accelerator for the event-driven programming model of
> > > DPDK's Event Device Library[2]. The library is used in packet processing
> > > pipelines that arrange for multi-core scalability, dynamic load-balancing,
> > > and variety of packet distribution and synchronization schemes
> >
> > The more that I look at this driver, the more I think this is a "run
> > around" the networking stack.  Why are you all adding kernel code to
> > support DPDK which is an out-of-kernel networking stack?  We can't
> > support that at all.
> >
> > Why not just use the normal networking functionality instead of this
> > custom char-device-node-monstrosity?
>
> Hey Greg,
>
> I've come to find out that this driver does not bypass kernel
> networking, and the kernel functionality I thought it bypassed, IPC /
> Scheduling, is not even in the picture in the non-accelerated case. So
> given you and I are both confused by this submission that tells me
> that the problem space needs to be clarified and assumptions need to
> be enumerated.
>
> > What is missing from todays kernel networking code that requires this
> > run-around?
>
> Yes, first and foremost Mike, what are the kernel infrastructure gaps
> and pain points that led up to this proposal?

Hi Greg/Dan,

Sorry for the confusion. The cover letter and documentation did not clearly
articulate the problem being solved by DLB. We will update the documentation
in the next revision.

In brief, Intel DLB is an accelerator that replaces shared-memory queuing
systems. On large modern server-class CPUs, with local caches for each core,
such queues tend to incur costly cache misses, cross-core snoops, and
contention. The impact becomes noticeable at high message rates
(messages/sec), such as those seen in high-throughput packet processing and
HPC applications. DLB is used in high-rate pipelines that require a variety
of packet distribution and synchronization schemes. It can be leveraged to
accelerate user-space libraries, such as DPDK eventdev. It could show similar
benefits in kernel frameworks such as padata, if the messaging rate is
sufficiently high.

As can be seen in the following diagram, DLB operations come into the picture
only after packets are received by the Rx core from the networking devices.
The WCs are the worker cores, which process packets distributed by DLB.
(In case the diagram gets mis-formatted, please see the attached file.)

                            WC1              WC4
+-----+   +----+   +---+   /    \   +---+   /    \   +---+   +----+   +-----+
|NIC  |   |Rx  |   |DLB| /        \ |DLB| /        \ |DLB|   |Tx  |   |NIC  |
|Ports|---|Core|---|   |-----WC2----|   |-----WC5----|   |---|Core|---|Ports|
+-----+   +----+   +---+ \        / +---+ \        / +---+   +----+   +-----+
                           \    /            \    /
                            WC3              WC6

At its heart, DLB consists of resources that can be assigned to
VDEVs/applications in a flexible manner, such as ports, queues, credits to
use queues, sequence numbers, etc. We support up to 16/32 VFs/VDEVs
(depending on the version) with SR-IOV and SIOV. The kernel driver's role
includes VDEV composition (the vdcm module), function-level reset, live
migration, error handling, power management, etc.

Thanks
Mike
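P.S. For concreteness, below is a minimal sketch (not taken from this patch
set) of what one of the WC loops in the diagram looks like at the DPDK
eventdev API level. The dev_id/port_id/queue ids, BURST_SIZE, and
process_packet() are placeholders assumed to be configured by the application
during init (rte_event_dev_configure(), rte_event_queue_setup(),
rte_event_port_setup(), rte_event_port_link()); the same dequeue/forward/
enqueue calls are used whether events are scheduled by a software eventdev or
offloaded to DLB through its eventdev driver.

#include <stdint.h>
#include <rte_eventdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32	/* placeholder burst size */

/* Placeholder for the application's per-packet work. */
static void process_packet(struct rte_mbuf *m)
{
	(void)m;
}

/*
 * One worker core (WC): pull the events handed to this port by the event
 * scheduler, do the work, and forward the events to the next stage's queue.
 */
static void worker_loop(uint8_t dev_id, uint8_t port_id, uint8_t next_queue_id)
{
	struct rte_event ev[BURST_SIZE];

	for (;;) {
		uint16_t i, n;

		/* Non-blocking dequeue of a burst of load-balanced events. */
		n = rte_event_dequeue_burst(dev_id, port_id, ev, BURST_SIZE, 0);

		for (i = 0; i < n; i++) {
			process_packet(ev[i].mbuf);

			/* Send the event on to the next pipeline stage. */
			ev[i].queue_id = next_queue_id;
			ev[i].op = RTE_EVENT_OP_FORWARD;
		}

		if (n)
			rte_event_enqueue_burst(dev_id, port_id, ev, n);
	}
}

With a software eventdev, the dequeue/enqueue path above is where shared queue
state gets bounced between producer and consumer cores' caches; with DLB
underneath, the queueing, load balancing, ordering, and credit accounting are
handled by the device.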