From: Greg KH <gregkh@linuxfoundation.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Chen, Mike Ximing" <mike.ximing.chen@intel.com>,
	Netdev <netdev@vger.kernel.org>,
	David Miller <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>,
	"Brandeburg, Jesse" <jesse.brandeburg@intel.com>,
	KVM list <kvm@vger.kernel.org>,
	"Raj, Ashok" <ashok.raj@intel.com>
Subject: Re: [PATCH v10 00/20] dlb: introduce DLB device driver
Date: Fri, 14 May 2021 16:33:03 +0200
Message-ID: <YJ6KHznWTKvKQZEl@kroah.com>
In-Reply-To: <CAPcyv4g89PjKqPuPp2ag0vB9Vq8igTqh0gdP0h+7ySTVPagQ9w@mail.gmail.com>

On Wed, May 12, 2021 at 12:07:31PM -0700, Dan Williams wrote:
> [ add kvm@vger.kernel.org for VFIO discussion ]
> 
> 
> On Tue, Mar 16, 2021 at 2:01 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> [..]
> > > Ioctl interface
> > > The kernel driver provides an ioctl interface for user applications to set up and configure DLB domains, ports, queues, scheduling types, credits,
> > > sequence numbers, and links between ports and queues.  Applications also use the interface to start, stop, and query DLB operations.
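
(As an aside for readers unfamiliar with this style of uapi: the flow being
described is an application opening a character device and issuing ioctls to
carve out a scheduling domain before it ever touches the fast path. The sketch
below is only illustrative -- the device node, ioctl name, and argument layout
are placeholders, not the definitions from this patch series.)

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>

/* Placeholder uapi; the real names and layouts come from the dlb uapi header. */
struct dlb_create_domain_args {
	unsigned int num_ldb_queues;	/* load-balanced queues to reserve */
	unsigned int num_ldb_ports;	/* ports that CPU threads will poll */
	unsigned int domain_id;		/* returned by the driver */
};
#define DLB_IOC_CREATE_SCHED_DOMAIN _IOWR('D', 0, struct dlb_create_domain_args)

int main(void)
{
	struct dlb_create_domain_args args = {
		.num_ldb_queues = 2,
		.num_ldb_ports = 4,
	};
	int fd = open("/dev/dlb0", O_RDWR);	/* device node name assumed */

	if (fd < 0 || ioctl(fd, DLB_IOC_CREATE_SCHED_DOMAIN, &args) < 0) {
		perror("dlb domain setup");
		return 1;
	}
	printf("created scheduling domain %u\n", args.domain_id);
	return 0;
}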
> >
> > What applications use any of this?  What userspace implementation today
> > interacts with this?  Where is that code located?
> >
> > Too many TLAs here; I now have even less of an understanding of what
> > this driver is supposed to be doing, and of what this hardware is,
> > than before.
> >
> > And here I thought I understood hardware devices, and if I am confused,
> > I pity anyone else looking at this code...
> >
> > You all need to get some real documentation together to explain
> > everything here in terms that anyone can understand.  Without that, this
> > code is going nowhere.
> 
> Hi Greg,
> 
> So, for the last few weeks Mike and company have patiently waded
> through my questions and now I think we are at a point to work through
> the upstream driver architecture options and tradeoffs. You were not
> alone in struggling to understand what this device does because it is
> unlike any other accelerator Linux has ever considered. It shards /
> load balances a data stream for processing by CPU threads. This is
> typically a network appliance function / protocol, but could also be
> any other generic thread pool like the kernel's padata. It saves the
> CPU cycles spent load balancing work items and marshaling them through
> a thread pool pipeline. For example, in DPDK applications, DLB2 frees
> up entire cores that would otherwise be consumed with scheduling and
> work distribution. A separate proof-of-concept, using DLB2 to
> accelerate the kernel's "padata" thread pool for a crypto workload,
> demonstrated ~150% higher throughput with hardware employed to manage
> work distribution and result ordering. Yes, you need a sufficiently
> high-touch / high-throughput protocol before the software overhead of
> load balancing and coordinating CPU threads starts to dominate
> performance, but some specific workloads are willing to switch to this
> regime.
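
(To make the "CPU cycles spent load balancing" point concrete: below is a
deliberately generic dispatcher sketch, not DPDK or padata code. Picking a
worker for each item, enqueueing it, and later restoring completion order is
the per-item work that a device like DLB2 is claimed to take off the CPU.)

#include <stddef.h>

struct item {
	unsigned long seq;	/* used to restore ordering at the egress stage */
	void *data;
};

/* One core burns cycles doing this for every item on the stream. */
static void dispatch_loop(struct item *(*poll_stream)(void),
			  void (*enqueue)(int worker, struct item *it),
			  int nr_workers)
{
	struct item *it;
	int next = 0;

	while ((it = poll_stream()) != NULL) {
		enqueue(next, it);		/* load-balance: choose a worker */
		next = (next + 1) % nr_workers;	/* round-robin here; real code also
						   tracks queue depth, flow affinity */
	}
	/* A second stage would re-order completions by it->seq before egress. */
}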
> 
> The primary use to date has been as a backend for the event
> handling in the userspace networking stack, DPDK. DLB2 has an existing
> polled-mode-userspace driver for that use case. So I said, "great,
> just add more features to that userspace driver and you're done". In
> fact there was DLB1 hardware that also had a polled-mode-userspace
> driver. So, the next question is "what's changed in DLB2 where a
> userspace driver is no longer suitable?". The new use case for DLB2 is
> new hardware support for a host driver to carve up device resources
> into smaller sets (vfio-mdevs) that can be assigned to guests (Intel
> calls this new hardware capability SIOV: Scalable IO Virtualization).
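
(For those who have not used vfio-mdev before: "carving up" the device
typically shows up to an administrator as child devices created through sysfs
and then handed to QEMU. The sketch below shows that management-side flow; the
parent PCI address, type name, and UUID are invented examples, and the exact
attributes depend on what a DLB2 host driver would actually expose.)

#include <stdio.h>

int main(void)
{
	/* Parent device path and mdev type are illustrative placeholders. */
	const char *create =
		"/sys/bus/pci/devices/0000:6d:00.0/mdev_supported_types/dlb2-vdev/create";
	FILE *f = fopen(create, "w");

	if (!f || fputs("83b8f4f2-509f-382f-3c1e-e6bfe0fa1001\n", f) == EOF) {
		perror("mdev create");
		return 1;
	}
	fclose(f);

	/*
	 * The new instance appears under /sys/bus/mdev/devices/<uuid> and can
	 * be assigned to a guest, e.g.:
	 *   qemu -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/<uuid>
	 */
	return 0;
}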
> 
> Hardware resource management is difficult to handle in userspace
> especially when bare-metal hardware events need to coordinate with
> guest-VM device instances. This includes a mailbox interface for the
> guest VM to negotiate resources with the host driver. Another more
> practical roadblock for a "DLB2 in userspace" proposal is the fact
> that it implements what are in-effect software-defined-interrupts to
> go beyond the scalability limits of PCI MSI-x (Intel calls this
> Interrupt Message Store: IMS). So even if hardware resource management
> were awkwardly plumbed into a userspace daemon, there would still need
> to be kernel enabling for device-specific extensions to
> drivers/vfio/pci/vfio_pci_intrs.c for it to understand the IMS
> interrupts of DLB2 in addition to PCI MSI-x.
> 
> While that still might be solvable in userspace if you squint at it, I
> don't think Linux end users are served by pushing all of hardware
> resource management to userspace. VFIO is mostly built to pass entire
> PCI devices to guests, or in coordination with a kernel driver to
> describe a subset of the hardware to a virtual-device (vfio-mdev)
> interface. The rub here is that to date kernel drivers using VFIO to
> provision mdevs have some existing responsibilities to the core kernel
> like a network driver or DMA offload driver. The DLB2 driver offers no
> such service to the kernel for its primary role of accelerating a
> userspace data-plane. I am assuming here that the padata
> proof-of-concept is interesting, but not a compelling reason to ship a
> driver compared to giving end users competent kernel-driven
> hardware-resource assignment for deploying DLB2 virtual instances into
> guest VMs.
> 
> My "just continue in userspace" suggestion has no answer for the IMS
> interrupt and reliable hardware resource management support
> requirements. If you're with me so far we can go deeper into the
> details, but in answer to your previous questions most of the TLAs
> were from the land of "SIOV" where the VFIO community should be
> brought in to review. The driver is mostly a configuration plane where
> the fast path data-plane is entirely in userspace. That configuration
> plane needs to manage hardware events and resourcing on behalf of
> guest VMs running on a partitioned subset of the device. There are
> worthwhile questions about whether some of the uapi can be refactored
> to common modules like uacce, but I think we need to get to a first
> order understanding on what DLB2 is and why the kernel has a role
> before diving into the uapi discussion.
> 
> Any clearer?

A bit, yes, thanks.

> So, in summary, drivers/misc/ appears to be the first stop in the
> review since a host driver needs to be established to start the VFIO
> enabling campaign. With my community hat on, I think requiring
> standalone host drivers is healthier for Linux than broaching the
> subject of VFIO-only drivers. Even if, as in this case, the initial
> host driver is mostly implementing a capability that could be achieved
> with a userspace driver.

Ok, then how about a much "smaller" kernel driver for all of this, and a
whole lot of documentation to describe what is going on and what all of
the TLAs are.

thanks,

greg k-h


Thread overview: 49+ messages
2021-02-10 17:54 [PATCH v10 00/20] dlb: introduce DLB device driver Mike Ximing Chen
2021-02-10 17:54 ` [PATCH v10 01/20] dlb: add skeleton for DLB driver Mike Ximing Chen
2021-02-18  7:34   ` Chen, Mike Ximing
2021-02-18  7:52     ` gregkh
2021-02-18 15:37       ` Chen, Mike Ximing
2021-03-07 13:59       ` Chen, Mike Ximing
2021-02-10 17:54 ` [PATCH v10 02/20] dlb: initialize device Mike Ximing Chen
2021-02-10 17:54 ` [PATCH v10 03/20] dlb: add resource and device initialization Mike Ximing Chen
2021-03-09  9:24   ` Greg KH
2021-03-10  1:33     ` Chen, Mike Ximing
2021-03-10  8:13       ` Greg KH
2021-03-10 20:26         ` Chen, Mike Ximing
2021-02-10 17:54 ` [PATCH v10 04/20] dlb: add device ioctl layer and first three ioctls Mike Ximing Chen
2021-03-09  9:26   ` Greg KH
2021-03-10  1:34     ` Chen, Mike Ximing
2021-02-10 17:54 ` [PATCH v10 05/20] dlb: add scheduling domain configuration Mike Ximing Chen
2021-03-09  9:28   ` Greg KH
2021-03-10  1:35     ` Chen, Mike Ximing
2021-02-10 17:54 ` [PATCH v10 06/20] dlb: add domain software reset Mike Ximing Chen
2021-02-10 17:54 ` [PATCH v10 07/20] dlb: add low-level register reset operations Mike Ximing Chen
2021-02-10 17:54 ` [PATCH v10 08/20] dlb: add runtime power-management support Mike Ximing Chen
2021-02-10 17:54 ` [PATCH v10 09/20] dlb: add queue create, reset, get-depth ioctls Mike Ximing Chen
2021-02-10 17:54 ` [PATCH v10 10/20] dlb: add register operations for queue management Mike Ximing Chen
2021-02-10 17:54 ` [PATCH v10 11/20] dlb: add ioctl to configure ports and query poll mode Mike Ximing Chen
2021-02-10 17:54 ` [PATCH v10 12/20] dlb: add register operations for port management Mike Ximing Chen
2021-02-10 17:54 ` [PATCH v10 13/20] dlb: add port mmap support Mike Ximing Chen
2021-02-10 17:54 ` [PATCH v10 14/20] dlb: add start domain ioctl Mike Ximing Chen
2021-03-09  9:29   ` Greg KH
2021-03-10  2:45     ` Chen, Mike Ximing
2021-03-10  8:14       ` Greg KH
2021-03-10 20:19         ` Dan Williams
2021-03-10 20:26         ` Chen, Mike Ximing
2021-02-10 17:54 ` [PATCH v10 15/20] dlb: add queue map, unmap, and pending unmap operations Mike Ximing Chen
2021-02-10 17:54 ` [PATCH v10 16/20] dlb: add port map/unmap state machine Mike Ximing Chen
2021-02-10 17:54 ` [PATCH v10 17/20] dlb: add static queue map register operations Mike Ximing Chen
2021-02-10 17:54 ` [PATCH v10 18/20] dlb: add dynamic " Mike Ximing Chen
2021-02-10 17:54 ` [PATCH v10 19/20] dlb: add queue unmap " Mike Ximing Chen
2021-02-10 17:54 ` [PATCH v10 20/20] dlb: queue map/unmap workqueue Mike Ximing Chen
2021-03-10  9:02 ` [PATCH v10 00/20] dlb: introduce DLB device driver Greg KH
2021-03-12  7:18   ` Dan Williams
2021-03-12 21:55     ` Chen, Mike Ximing
2021-03-13  1:39       ` Dan Williams
2021-03-15 20:04         ` Chen, Mike Ximing
2021-03-15 20:08         ` Chen, Mike Ximing
2021-03-15 20:18         ` Chen, Mike Ximing
2021-03-16  9:01           ` Greg KH
2021-05-12 19:07             ` Dan Williams
2021-05-14 14:33               ` Greg KH [this message]
2021-07-16  1:04                 ` Chen, Mike Ximing
