* Re: [PATCH v10 00/20] dlb: introduce DLB device driver
       [not found]           ` <YFBz1BICsjDsSJwv@kroah.com>
@ 2021-05-12 19:07             ` Dan Williams
  2021-05-14 14:33               ` Greg KH
  0 siblings, 1 reply; 3+ messages in thread
From: Dan Williams @ 2021-05-12 19:07 UTC (permalink / raw)
  To: Greg KH
  Cc: Chen, Mike Ximing, Netdev, David Miller, Jakub Kicinski,
	Arnd Bergmann, Pierre-Louis Bossart, Brandeburg, Jesse, KVM list,
	Raj, Ashok

[ add kvm@vger.kernel.org for VFIO discussion ]


On Tue, Mar 16, 2021 at 2:01 AM Greg KH <gregkh@linuxfoundation.org> wrote:
[..]
> > Ioctl interface
> > Kernel driver provides ioctl interface for user applications to setup and configure dlb domains, ports, queues, scheduling types, credits,
> > sequence numbers, and links between ports and queues.  Applications also use the interface to start, stop and inquire the dlb operations.
>
> What applications use any of this?  What userspace implementation today
> interacts with this?  Where is that code located?
>
> Too many TLAs here, I have even less of an understanding of what this
> driver is supposed to be doing, and what this hardware is now than
> before.
>
> And here I thought I understood hardware devices, and if I am confused,
> I pity anyone else looking at this code...
>
> You all need to get some real documentation together to explain
> everything here in terms that anyone can understand.  Without that, this
> code is going nowhere.

Hi Greg,

So, for the last few weeks Mike and company have patiently waded
through my questions and now I think we are at a point to work through
the upstream driver architecture options and tradeoffs. You were not
alone in struggling to understand what this device does because it is
unlike any other accelerator Linux has ever considered. It shards /
load balances a data stream for processing by CPU threads. This is
typically a network appliance function / protocol, but could also be
any other generic thread pool like the kernel's padata. It saves the
CPU cycles spent load balancing work items and marshaling them through
a thread pool pipeline. For example, in DPDK applications, DLB2 frees
up entire cores that would otherwise be consumed with scheduling and
work distribution. A separate proof-of-concept, using DLB2 to
accelerate the kernel's "padata" thread pool for a crypto workload,
demonstrated ~150% higher throughput with hardware employed to manage
work distribution and result ordering. Yes, you need a sufficiently
high touch / high throughput protocol before the software load
balancing overhead coordinating CPU threads starts to dominate the
performance, but there are some specific workloads willing to switch
to this regime.
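
(To make the "CPU cycles spent load balancing" point concrete, here is a rough, purely illustrative C sketch, not taken from the DLB driver or DPDK, with all names invented, of the kind of software dispatch loop the device offloads to hardware:)

/*
 * Purely illustrative sketch (not from the DLB driver): the software
 * dispatch loop that the device offloads.  One core hashes each work
 * item to a worker ring and handles ordering/backpressure by hand.
 */
#include <stddef.h>
#include <stdint.h>

#define NWORKERS 4
#define RING_SZ  1024

struct work_item { uint32_t flow_id; void *data; };

struct ring { struct work_item slot[RING_SZ]; unsigned int head, tail; };
static struct ring worker_ring[NWORKERS];

static int ring_push(struct ring *r, const struct work_item *w)
{
	unsigned int next = (r->head + 1) % RING_SZ;

	if (next == r->tail)
		return -1;		/* ring full: backpressure */
	r->slot[r->head] = *w;
	r->head = next;
	return 0;
}

/* This whole loop, and the core it occupies, is what DLB2 replaces. */
static void dispatch(const struct work_item *items, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		/* Hash on flow_id so each flow stays ordered on one worker. */
		unsigned int target = items[i].flow_id % NWORKERS;

		while (ring_push(&worker_ring[target], &items[i]))
			;		/* spin until the worker drains */
	}
}

DLB2 moves the hashing, ordering, and backpressure handling in that loop into the device, so the dispatch core is freed up.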

The primary consumer to date has been as a backend for the event
handling in the userspace networking stack, DPDK. DLB2 has an existing
polled-mode-userspace driver for that use case. So I said, "great,
just add more features to that userspace driver and you're done". In
fact there was DLB1 hardware that also had a polled-mode-userspace
driver. So, the next question is "what's changed in DLB2 where a
userspace driver is no longer suitable?". The new use case for DLB2 is
new hardware support for a host driver to carve up device resources
into smaller sets (vfio-mdevs) that can be assigned to guests (Intel
calls this new hardware capability SIOV: Scalable IO Virtualization).
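
(For reference, the polled-mode usage in DPDK looks roughly like the generic eventdev worker loop sketched below. This is a simplification for illustration, not the dlb2 PMD itself, and the device/port/queue IDs and process_packet() are placeholders.)

/* Simplified DPDK eventdev worker loop (illustrative, not the dlb2 PMD). */
#include <rte_eventdev.h>
#include <rte_mbuf.h>

#define BURST 32

static void process_packet(struct rte_mbuf *m)
{
	(void)m;	/* application-specific work would go here */
}

static void worker_loop(uint8_t dev_id, uint8_t port_id, uint8_t next_queue)
{
	struct rte_event ev[BURST];

	for (;;) {
		/* Poll the device for up to BURST scheduled events. */
		uint16_t n = rte_event_dequeue_burst(dev_id, port_id, ev,
						     BURST, 0);

		for (uint16_t i = 0; i < n; i++) {
			process_packet(ev[i].mbuf);

			/* Hand the event to the next pipeline stage. */
			ev[i].op = RTE_EVENT_OP_FORWARD;
			ev[i].queue_id = next_queue;
		}
		if (n)
			rte_event_enqueue_burst(dev_id, port_id, ev, n);
	}
}

The scheduling decisions behind rte_event_dequeue_burst() are the part the device makes in hardware.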

Hardware resource management is difficult to handle in userspace
especially when bare-metal hardware events need to coordinate with
guest-VM device instances. This includes a mailbox interface for the
guest VM to negotiate resources with the host driver. Another more
practical roadblock for a "DLB2 in userspace" proposal is the fact
that it implements what are in-effect software-defined-interrupts to
go beyond the scalability limits of PCI MSI-x (Intel calls this
Interrupt Message Store: IMS). So even if hardware resource management
was awkwardly plumbed into a userspace daemon there would still need
to be kernel enabling for device-specific extensions to
drivers/vfio/pci/vfio_pci_intrs.c for it to understand the IMS
interrupts of DLB2 in addition to PCI MSI-x.

While that still might be solvable in userspace if you squint at it, I
don't think Linux end users are served by pushing all of hardware
resource management to userspace. VFIO is mostly built to pass entire
PCI devices to guests, or in coordination with a kernel driver to
describe a subset of the hardware to a virtual-device (vfio-mdev)
interface. The rub here is that to date kernel drivers using VFIO to
provision mdevs have some existing responsibilities to the core kernel
like a network driver or DMA offload driver. The DLB2 driver offers no
such service to the kernel for its primary role of accelerating a
userspace data-plane. I am assuming here that  the padata
proof-of-concept is interesting, but not a compelling reason to ship a
driver compared to giving end users competent kernel-driven
hardware-resource assignment for deploying DLB2 virtual instances into
guest VMs.

My "just continue in userspace" suggestion has no answer for the IMS
interrupt and reliable hardware resource management support
requirements. If you're with me so far we can go deeper into the
details, but in answer to your previous questions most of the TLAs
were from the land of "SIOV" where the VFIO community should be
brought in to review. The driver is mostly a configuration plane where
the fast path data-plane is entirely in userspace. That configuration
plane needs to manage hardware events and resourcing on behalf of
guest VMs running on a partitioned subset of the device. There are
worthwhile questions about whether some of the uapi can be refactored
to common modules like uacce, but I think we need to get to a first
order understanding on what DLB2 is and why the kernel has a role
before diving into the uapi discussion.

Any clearer?

So, in summary drivers/misc/ appears to be the first stop in the
review since a host driver needs to be established to start the VFIO
enabling campaign. With my community hat on, I think requiring
standalone host drivers is healthier for Linux than broaching the
subject of VFIO-only drivers. Even if, as in this case, the initial
host driver is mostly implementing a capability that could be achieved
with a userspace driver.


* Re: [PATCH v10 00/20] dlb: introduce DLB device driver
  2021-05-12 19:07             ` [PATCH v10 00/20] dlb: introduce DLB device driver Dan Williams
@ 2021-05-14 14:33               ` Greg KH
  2021-07-16  1:04                 ` Chen, Mike Ximing
  0 siblings, 1 reply; 3+ messages in thread
From: Greg KH @ 2021-05-14 14:33 UTC (permalink / raw)
  To: Dan Williams
  Cc: Chen, Mike Ximing, Netdev, David Miller, Jakub Kicinski,
	Arnd Bergmann, Pierre-Louis Bossart, Brandeburg, Jesse, KVM list,
	Raj, Ashok

On Wed, May 12, 2021 at 12:07:31PM -0700, Dan Williams wrote:
> [ add kvm@vger.kernel.org for VFIO discussion ]
> 
> 
> On Tue, Mar 16, 2021 at 2:01 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> [..]
> > > Ioctl interface
> > > Kernel driver provides ioctl interface for user applications to setup and configure dlb domains, ports, queues, scheduling types, credits,
> > > sequence numbers, and links between ports and queues.  Applications also use the interface to start, stop and inquire the dlb operations.
> >
> > What applications use any of this?  What userspace implementation today
> > interacts with this?  Where is that code located?
> >
> > Too many TLAs here, I have even less of an understanding of what this
> > driver is supposed to be doing, and what this hardware is now than
> > before.
> >
> > And here I thought I understood hardware devices, and if I am confused,
> > I pity anyone else looking at this code...
> >
> > You all need to get some real documentation together to explain
> > everything here in terms that anyone can understand.  Without that, this
> > code is going nowhere.
> 
> Hi Greg,
> 
> So, for the last few weeks Mike and company have patiently waded
> through my questions and now I think we are at a point to work through
> the upstream driver architecture options and tradeoffs. You were not
> alone in struggling to understand what this device does because it is
> unlike any other accelerator Linux has ever considered. It shards /
> load balances a data stream for processing by CPU threads. This is
> typically a network appliance function / protocol, but could also be
> any other generic thread pool like the kernel's padata. It saves the
> CPU cycles spent load balancing work items and marshaling them through
> a thread pool pipeline. For example, in DPDK applications, DLB2 frees
> up entire cores that would otherwise be consumed with scheduling and
> work distribution. A separate proof-of-concept, using DLB2 to
> accelerate the kernel's "padata" thread pool for a crypto workload,
> demonstrated ~150% higher throughput with hardware employed to manage
> work distribution and result ordering. Yes, you need a sufficiently
> high touch / high throughput protocol before the software load
> balancing overhead coordinating CPU threads starts to dominate the
> performance, but there are some specific workloads willing to switch
> to this regime.
> 
> The primary consumer to date has been as a backend for the event
> handling in the userspace networking stack, DPDK. DLB2 has an existing
> polled-mode-userspace driver for that use case. So I said, "great,
> just add more features to that userspace driver and you're done". In
> fact there was DLB1 hardware that also had a polled-mode-userspace
> driver. So, the next question is "what's changed in DLB2 where a
> userspace driver is no longer suitable?". The new use case for DLB2 is
> new hardware support for a host driver to carve up device resources
> into smaller sets (vfio-mdevs) that can be assigned to guests (Intel
> calls this new hardware capability SIOV: Scalable IO Virtualization).
> 
> Hardware resource management is difficult to handle in userspace
> especially when bare-metal hardware events need to coordinate with
> guest-VM device instances. This includes a mailbox interface for the
> guest VM to negotiate resources with the host driver. Another more
> practical roadblock for a "DLB2 in userspace" proposal is the fact
> that it implements what are in-effect software-defined-interrupts to
> go beyond the scalability limits of PCI MSI-x (Intel calls this
> Interrupt Message Store: IMS). So even if hardware resource management
> was awkwardly plumbed into a userspace daemon there would still need
> to be kernel enabling for device-specific extensions to
> drivers/vfio/pci/vfio_pci_intrs.c for it to understand the IMS
> interrupts of DLB2 in addition to PCI MSI-x.
> 
> While that still might be solvable in userspace if you squint at it, I
> don't think Linux end users are served by pushing all of hardware
> resource management to userspace. VFIO is mostly built to pass entire
> PCI devices to guests, or in coordination with a kernel driver to
> describe a subset of the hardware to a virtual-device (vfio-mdev)
> interface. The rub here is that to date kernel drivers using VFIO to
> provision mdevs have some existing responsibilities to the core kernel
> like a network driver or DMA offload driver. The DLB2 driver offers no
> such service to the kernel for its primary role of accelerating a
> userspace data-plane. I am assuming here that  the padata
> proof-of-concept is interesting, but not a compelling reason to ship a
> driver compared to giving end users competent kernel-driven
> hardware-resource assignment for deploying DLB2 virtual instances into
> guest VMs.
> 
> My "just continue in userspace" suggestion has no answer for the IMS
> interrupt and reliable hardware resource management support
> requirements. If you're with me so far we can go deeper into the
> details, but in answer to your previous questions most of the TLAs
> were from the land of "SIOV" where the VFIO community should be
> brought in to review. The driver is mostly a configuration plane where
> the fast path data-plane is entirely in userspace. That configuration
> plane needs to manage hardware events and resourcing on behalf of
> guest VMs running on a partitioned subset of the device. There are
> worthwhile questions about whether some of the uapi can be refactored
> to common modules like uacce, but I think we need to get to a first
> order understanding on what DLB2 is and why the kernel has a role
> before diving into the uapi discussion.
> 
> Any clearer?

A bit, yes, thanks.

> So, in summary drivers/misc/ appears to be the first stop in the
> review since a host driver needs to be established to start the VFIO
> enabling campaign. With my community hat on, I think requiring
> standalone host drivers is healthier for Linux than broaching the
> subject of VFIO-only drivers. Even if, as in this case, the initial
> host driver is mostly implementing a capability that could be achieved
> with a userspace driver.

Ok, then how about a much "smaller" kernel driver for all of this, and a
whole lot of documentation to describe what is going on and what all of
the TLAs are.

thanks,

greg k-h


* RE: [PATCH v10 00/20] dlb: introduce DLB device driver
  2021-05-14 14:33               ` Greg KH
@ 2021-07-16  1:04                 ` Chen, Mike Ximing
  0 siblings, 0 replies; 3+ messages in thread
From: Chen, Mike Ximing @ 2021-07-16  1:04 UTC (permalink / raw)
  To: Greg KH, Williams, Dan J
  Cc: Netdev, David Miller, Jakub Kicinski, Arnd Bergmann,
	Pierre-Louis Bossart, Brandeburg, Jesse, KVM list, Raj, Ashok,
	Linux Kernel Mailing List

> -----Original Message-----
> From: Greg KH <gregkh@linuxfoundation.org>
> Sent: Friday, May 14, 2021 10:33 AM
> To: Williams, Dan J <dan.j.williams@intel.com>
> > Hi Greg,
> >
> > So, for the last few weeks Mike and company have patiently waded
> > through my questions and now I think we are at a point to work through
> > the upstream driver architecture options and tradeoffs. You were not
> > alone in struggling to understand what this device does because it is
> > unlike any other accelerator Linux has ever considered. It shards /
> > load balances a data stream for processing by CPU threads. This is
> > typically a network appliance function / protocol, but could also be
> > any other generic thread pool like the kernel's padata. It saves the
> > CPU cycles spent load balancing work items and marshaling them through
> > a thread pool pipeline. For example, in DPDK applications, DLB2 frees
> > up entire cores that would otherwise be consumed with scheduling and
> > work distribution. A separate proof-of-concept, using DLB2 to
> > accelerate the kernel's "padata" thread pool for a crypto workload,
> > demonstrated ~150% higher throughput with hardware employed to manage
> > work distribution and result ordering. Yes, you need a sufficiently
> > high touch / high throughput protocol before the software load
> > balancing overhead coordinating CPU threads starts to dominate the
> > performance, but there are some specific workloads willing to switch
> > to this regime.
> >
> > The primary consumer to date has been as a backend for the event
> > handling in the userspace networking stack, DPDK. DLB2 has an existing
> > polled-mode-userspace driver for that use case. So I said, "great,
> > just add more features to that userspace driver and you're done". In
> > fact there was DLB1 hardware that also had a polled-mode-userspace
> > driver. So, the next question is "what's changed in DLB2 where a
> > userspace driver is no longer suitable?". The new use case for DLB2 is
> > new hardware support for a host driver to carve up device resources
> > into smaller sets (vfio-mdevs) that can be assigned to guests (Intel
> > calls this new hardware capability SIOV: Scalable IO Virtualization).
> >
> > Hardware resource management is difficult to handle in userspace
> > especially when bare-metal hardware events need to coordinate with
> > guest-VM device instances. This includes a mailbox interface for the
> > guest VM to negotiate resources with the host driver. Another more
> > practical roadblock for a "DLB2 in userspace" proposal is the fact
> > that it implements what are in-effect software-defined-interrupts to
> > go beyond the scalability limits of PCI MSI-x (Intel calls this
> > Interrupt Message Store: IMS). So even if hardware resource management
> > was awkwardly plumbed into a userspace daemon there would still need
> > to be kernel enabling for device-specific extensions to
> > drivers/vfio/pci/vfio_pci_intrs.c for it to understand the IMS
> > interrupts of DLB2 in addition to PCI MSI-x.
> >
> > While that still might be solvable in userspace if you squint at it, I
> > don't think Linux end users are served by pushing all of hardware
> > resource management to userspace. VFIO is mostly built to pass entire
> > PCI devices to guests, or in coordination with a kernel driver to
> > describe a subset of the hardware to a virtual-device (vfio-mdev)
> > interface. The rub here is that to date kernel drivers using VFIO to
> > provision mdevs have some existing responsibilities to the core kernel
> > like a network driver or DMA offload driver. The DLB2 driver offers no
> > such service to the kernel for its primary role of accelerating a
> > userspace data-plane. I am assuming here that  the padata
> > proof-of-concept is interesting, but not a compelling reason to ship a
> > driver compared to giving end users competent kernel-driven
> > hardware-resource assignment for deploying DLB2 virtual instances into
> > guest VMs.
> >
> > My "just continue in userspace" suggestion has no answer for the IMS
> > interrupt and reliable hardware resource management support
> > requirements. If you're with me so far we can go deeper into the
> > details, but in answer to your previous questions most of the TLAs
> > were from the land of "SIOV" where the VFIO community should be
> > brought in to review. The driver is mostly a configuration plane where
> > the fast path data-plane is entirely in userspace. That configuration
> > plane needs to manage hardware events and resourcing on behalf of
> > guest VMs running on a partitioned subset of the device. There are
> > worthwhile questions about whether some of the uapi can be refactored
> > to common modules like uacce, but I think we need to get to a first
> > order understanding on what DLB2 is and why the kernel has a role
> > before diving into the uapi discussion.
> >
> > Any clearer?
> 
> A bit, yes, thanks.
> 
> > So, in summary drivers/misc/ appears to be the first stop in the
> > review since a host driver needs to be established to start the VFIO
> > enabling campaign. With my community hat on, I think requiring
> > standalone host drivers is healthier for Linux than broaching the
> > subject of VFIO-only drivers. Even if, as in this case, the initial
> > host driver is mostly implementing a capability that could be achieved
> > with a userspace driver.
> 
> Ok, then how about a much "smaller" kernel driver for all of this, and a whole lot of documentation to
> describe what is going on and what all of the TLAs are.
> 
> thanks,
> 
> greg k-h

Hi Greg,

tl;dr: We have been looking into various options to reduce the kernel driver size and ABI surface, such as moving more responsibility to user space, reusing existing kernel modules (uacce, for example), and converting functionality from ioctl to sysfs. The end result: 10 ioctls will be replaced by sysfs and the remaining 20 ioctls will be replaced by configfs. Some concepts move to device special files rather than ioctls that produce file descriptors.

Details:
We investigated the possibility of using uacce (https://www.kernel.org/doc/html/latest/misc-devices/uacce.html) in our kernel driver. The uacce interface fits well with accelerators that process user data with known source and destination addresses. For a DLB (Dynamic Load Balancer), however, the destination port depends on the system load and is unknown to the application. While uacce exposes "queues" to user space, the dlb driver has to handle much more complicated resource management, such as credits, ports, queues, and domains. We would have to add many concepts and a lot of code to uacce that are not useful for other accelerators in order to make it work for DLB. This may also lead to a larger overall code size.
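
(For illustration only, not our actual structures: the per-domain configuration state looks roughly like the sketch below, which is the part that does not map onto uacce's simple queue abstraction.)

/* Invented-for-illustration view of a DLB scheduling domain's config. */
struct dlb_domain_cfg {
	unsigned int num_ldb_queues;	/* load-balanced queues */
	unsigned int num_ldb_ports;	/* ports that CPU threads poll */
	unsigned int num_dir_ports;	/* directed (non-balanced) ports */
	unsigned int num_ldb_credits;	/* enqueue backpressure budget */
	unsigned int num_dir_credits;
	unsigned int num_seq_numbers;	/* for ordered scheduling */
};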

We also took another look at moving resource management functionality from kernel space to user space. Much of the kernel driver supports both the PF (Physical Function) on the host and VFs (Virtual Functions) in VMs. Since only the PF on the host has permission to set up resources and configure the DLB hardware, all requests from VFs are forwarded to the PF via the VF-PF mailboxes, which are handled by the kernel driver. The driver also maintains various virtual-ID-to-physical-ID translations (for VFs, ports, queues, etc.), and provides the virtual-to-physical ID mapping info to the DLB hardware so that an application in a VM can access the resources with virtual IDs only. Because of the VF/VDEV support, we have to keep the resource management, which is more than half of the code size, in the driver.
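
(As a made-up illustration of that flow, not the actual dlb mailbox ABI: a VF configuration request and the PF-side ID translation might look like the sketch below.)

#include <linux/types.h>

/* Hypothetical VF->PF mailbox request/response (illustration only). */
struct dlb_mbox_create_port_req {	/* written by the VF driver */
	__u32 domain_id;		/* VF-local (virtual) domain ID */
	__u32 cq_depth;
	__u32 num_credits;
};

struct dlb_mbox_create_port_resp {	/* filled in by the PF driver */
	__u32 status;
	__u32 virt_port_id;		/* the only ID the VM ever sees */
};

/* The PF driver keeps the virtual-to-physical translation, roughly: */
struct dlb_vf_port_map {
	__u16 vf_id;
	__u16 virt_port_id;		/* exposed to the VF/VM */
	__u16 phys_port_id;		/* programmed into the hardware */
};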

To simplify the user interface, we explored ways to reduce or eliminate the ioctl interface, and found that we can use configfs for much of the DLB functionality. Our current plan is to replace all the ioctls in the driver with sysfs and configfs. We will use configfs for most of the setup and configuration of both the physical function and the virtual functions. This may not reduce the overall driver size greatly, but it will lessen much of the ABI maintenance burden (with the elimination of ioctls). I hope this is in line with what you would like to see for the driver.
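
(To give a feel for what this could look like from user space, here is a hypothetical sketch; the directory layout and attribute names below are invented for illustration and are not a committed ABI.)

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static int write_attr(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0 || write(fd, val, strlen(val)) < 0) {
		perror(path);
		if (fd >= 0)
			close(fd);
		return -1;
	}
	return close(fd);
}

int main(void)
{
	/* Create a scheduling domain by making a configfs directory ... */
	if (mkdir("/sys/kernel/config/dlb/dlb0/domain0", 0755))
		perror("mkdir domain0");

	/* ... configure it through plain attribute files ... */
	write_attr("/sys/kernel/config/dlb/dlb0/domain0/num_ldb_queues", "4");
	write_attr("/sys/kernel/config/dlb/dlb0/domain0/num_ldb_ports", "8");
	write_attr("/sys/kernel/config/dlb/dlb0/domain0/num_ldb_credits", "2048");

	/* ... and commit/start it with a final attribute write. */
	return write_attr("/sys/kernel/config/dlb/dlb0/domain0/start", "1");
}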

Thanks
Mike

