* RFC: Network Plugin Architecture (NPA) for vmxnet3
@ 2010-05-04 23:02 Pankaj Thakkar
  0 siblings, 0 replies; 33+ messages in thread
From: Pankaj Thakkar @ 2010-05-04 23:02 UTC (permalink / raw)
  To: linux-kernel, netdev, virtualization; +Cc: pv-drivers

Device passthrough technology allows a guest to bypass the hypervisor and drive
the underlying physical device. VMware has been exploring various ways to
deliver this technology to users in a manner which is easy to adopt. In this
process we have prepared an architecture along with Intel - NPA (Network Plugin
Architecture). NPA allows the guest to use the virtualized NIC vmxnet3 to pass
through to a number of physical NICs which support it. The document below
provides an overview of NPA.

We intend to upgrade the upstreamed vmxnet3 driver to implement NPA so that
Linux users can exploit the benefits provided by passthrough devices in a
seamless manner while retaining the benefits of virtualization. The document
below tries to answer most of the questions which we anticipated. Please let us
know your comments and queries.

Thank you.

Signed-off-by: Pankaj Thakkar <pthakkar@vmware.com>


Network Plugin Architecture
---------------------------

VMware has been working on various device passthrough technologies for the past
few years. Passthrough technology is interesting as it can result in better
performance/CPU utilization for certain demanding applications. In our vSphere
product we support direct assignment of PCI devices like networking adapters to
a guest virtual machine. This allows the guest to drive the device using the
device drivers installed inside the guest. This is similar to the way KVM
allows for passthrough of PCI devices to the guests. The hypervisor is bypassed
for all I/O and control operations and hence it cannot provide any value-add
features such as live migration, suspend/resume, etc.

Network Plugin Architecture (NPA) is an approach which VMware has developed in
partnership with Intel and which allows us to retain the best of passthrough
technology and virtualization. NPA allows for passthrough of the fast data
(I/O) path and lets the hypervisor deal with the slow control path using
traditional emulation/paravirtualization techniques. By splitting the data and
control paths in this way, the hypervisor can still provide the above-mentioned
value-add features while exploiting the performance benefits of passthrough.

NPA requires SR-IOV hardware, which allows a single NIC adapter to be shared by
multiple guests. SR-IOV hardware has many logically separate functions called
virtual functions (VFs) which can be independently assigned to the guest OS. It
also has one or more physical functions (PFs), managed by a PF driver, which
are used by the hypervisor to control certain aspects of the VFs and the rest
of the hardware. NPA splits the guest driver into two components called
the Shell and the Plugin. The shell is responsible for interacting with the
guest networking stack and funneling the control operations to the hypervisor.
The plugin is responsible for driving the data path of the virtual function
exposed to the guest and is specific to the NIC hardware. NPA also requires an
embedded switch in the NIC to allow for switching traffic among the virtual
functions. The PF is also used as an uplink to provide connectivity to other
VMs which are in emulation mode. The figure below shows the major components in
a block diagram.

        +------------------------------+
        |         Guest VM             |
        |                              |
        |      +----------------+      |
        |      | vmxnet3 driver |      |
        |      |     Shell      |      |
        |      | +============+ |      |
        |      | |   Plugin   | |      |
        +------+-+------------+-+------+
                |           .
               +---------+  .
               | vmxnet3 |  .
               |___+-----+  .
                     |      .
                     |      .
                +----------------------------+
                |                            |
                |       virtual switch       |
                +----------------------------+
                  |         .               \
                  |         .                \
           +=============+  .                 \
           | PF control  |  .                  \
           |             |  .                   \
           |  L2 driver  |  .                    \
           +-------------+  .                     \
                  |         .                      \
                  |         .                       \
                +------------------------+     +------------+
                | PF   VF1 VF2 ...   VFn |     |            |
                |                        |     |  regular   |
                |       SR-IOV NIC       |     |    nic     |
                |    +--------------+    |     |   +--------+
                |    |   embedded   |    |     +---+
                |    |    switch    |    |
                |    +--------------+    |
                |        +---------------+
                +--------+
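
To make the split concrete, the sketch below shows one possible shape of the
contract between the two halves. This is illustrative only: the actual Shell
API is not spelled out in this note, and every name here is made up.

/*
 * Hypothetical Shell <-> Plugin contract (illustrative names only).  The
 * plugin sees nothing but the VF's TX/RX BAR plus a few shell services
 * (buffer allocation, packet indication); every control operation stays
 * in the shell, where it is trapped and handled by the hypervisor.
 */
struct npa_plugin_ops {
        int  (*init)(void *priv, void *vf_tx_rx_bar);   /* program the rings */
        int  (*tx)(void *priv, const void *frame, unsigned int len);
        int  (*rx_poll)(void *priv, int budget);        /* indicate packets  */
        void (*quiesce)(void *priv);                    /* stop the VF       */
};

struct npa_shell {
        const struct npa_plugin_ops *ops;  /* s/w vmxnet3 or injected VF plugin */
        void *plugin_priv;                 /* plugin-private state              */
        void *plugin_image;                /* image injected by the hypervisor  */
        void *vf_bar;                      /* mapped TX/RX BAR of the VF        */
};

The guest networking stack only ever talks to the shell; swapping the ops
pointer is what lets the hypervisor move a VM in and out of passthrough.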

NPA offers several benefits:
1. Performance: Critical performance-sensitive paths are not trapped and the
guest can directly drive the hardware without incurring virtualization
overheads.

2. Hypervisor control: All control operations from the guest, such as
programming the MAC address, go through the hypervisor layer and hence can be
subjected to hypervisor policies. The PF driver can further be used to enforce
policy decisions such as which VLAN the guest should be on.

3. Guest Management: No hardware specific drivers need to be installed in the
guest virtual machine and hence no overheads are incurred for guest management.
All software for the driver (including the PF driver and the plugin) is
installed in the hypervisor.

4. IHV independence: The architecture provides guidelines for splitting the
functionality between the VFs and PF but does not dictate how the hardware
should be implemented. It gives the IHV the freedom to do asynchronous updates
either to the software or the hardware to work around any defects.

The fundamental tenet in NPA is to let the hypervisor control the passthrough
functionality with minimal guest intervention. This gives a lot of flexibility
to the hypervisor which can then treat passthrough as an offload feature (just
like TSO, LRO, etc) which is offered to the guest virtual machine when there
are no conflicting features present. For example, if the hypervisor wants to
migrate the virtual machine from one host to another, the hypervisor can switch
the virtual machine out of passthrough mode into paravirtualized/emulated mode
and use existing techniques to migrate the virtual machine. Once the virtual
machine is migrated to the destination host the hypervisor can switch it back
to passthrough mode if a supporting SR-IOV NIC is present. This may involve
loading a different plugin corresponding to the new SR-IOV hardware.

Internally we have explored various other options before settling on the NPA
approach. For example there are approaches which create a bonding driver on top
of a complete passthrough of a NIC device and an emulated/paravirtualized
device. Though this approach allows live migration to work, it adds a lot of
complexity and dependency. First, the hypervisor has to rely on a guest with
hot-add support. Second, the hypervisor has to depend on the guest networking
stack to cooperate in performing the migration. Third, the guest has to carry
driver images for all possible hardware to which it may migrate. Fourth, the
hypervisor does not get full control over all the policy decisions.
Another approach we have considered is to have a uniform interface for the data
path between the emulated/paravirtualized device and the hardware device which
allows the hypervisor to seamlessly switch from the emulated interface to the
hardware interface. Though this approach is very attractive and can work
without any guest involvement, it is not acceptable to the IHVs as it does not
give them the freedom to fix bugs/errata and differentiate from each other. We
believe the NPA approach provides the right level of control and flexibility to
the hypervisor while letting the guest exploit the benefits of passthrough.

The plugin image is provided by the IHVs along with the PF driver and is
packaged in the hypervisor. The plugin image is OS agnostic and can be loaded
either into a Linux VM or a Windows VM. The plugin is written against the Shell
API interface which the shell is responsible for implementing. The API
interface allows the plugin to do TX and RX only by programming the hardware
rings (along with things like buffer allocation and basic initialization). The
virtual machine comes up in paravirtualized/emulated mode when it is booted.
The hypervisor allocates the VF and other resources and notifies the shell of
the availability of the VF. The hypervisor injects the plugin into a memory
location specified by the shell. The shell initializes the plugin by calling
into a known entry point and the plugin initializes the data path. The control
path is already initialized by the PF driver when the VF is allocated. At this
point the shell switches to using the loaded plugin to do all further TX and RX
operations. The guest networking stack does not participate in these operations
and continues to function normally. All the control operations continue being
trapped by the hypervisor and are directed to the PF driver as needed. For
example, if the MAC address changes the hypervisor updates its internal state
and changes the state of the embedded switch as well through the PF control
API.
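
In rough pseudo-C the handoff described above might look like the following.
This is a sketch only; npa_plugin_entry(), the field names and the error
handling are all invented for illustration.

/* Illustrative: how the shell could react when the hypervisor notifies it
 * that a VF (and an injected plugin image) is available. */
static void npa_shell_vf_available(struct npa_shell *shell)
{
        const struct npa_plugin_ops *ops;

        /* The plugin image was injected by the hypervisor at a location
         * the shell advertised earlier; call its known entry point. */
        ops = npa_plugin_entry(shell->plugin_image, &shell->plugin_priv);
        if (!ops || ops->init(shell->plugin_priv, shell->vf_bar) != 0)
                return;                 /* stay in emulated/paravirt mode */

        shell->ops = ops;               /* all further TX/RX go to the VF */
}

The control path needs no work here because the PF driver already set it up
when the VF was allocated.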

We have reworked our existing Linux vmxnet3 driver to accommodate NPA by
splitting the driver into two parts: Shell and Plugin. The new split driver is
backwards compatible and continues to work on old/existing vmxnet3 device
emulations. The shell implements the API interface and contains code to do the
bookkeeping for TX/RX buffers along with interrupt management. The shell code
also handles the loading of the plugin and verifying the license of the loaded
plugin. The plugin contains the code specific to vmxnet3 ring and descriptor
management. The plugin uses the same Shell API interface which would be used by
other IHVs. This vmxnet3 plugin is compiled statically along with the shell as
this is needed to provide connectivity when there is no underlying SR-IOV
device present. The IHV plugins are required to be distributed under the GPL
license and we are currently looking at ways to verify this both within the
hypervisor and within the shell.
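
As an illustration of why the split stays invisible to the guest stack, a
sketch of the transmit hook is below. The names are not from the actual
driver, and a real shell would keep the skb until the plugin reports TX
completion rather than freeing it immediately.

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* The netdev ops always point at the shell; the shell just forwards to
 * whichever plugin is active -- the statically linked vmxnet3 s/w plugin
 * or an injected hardware VF plugin. */
static netdev_tx_t npa_shell_start_xmit(struct sk_buff *skb,
                                        struct net_device *netdev)
{
        struct npa_shell *shell = netdev_priv(netdev);

        if (shell->ops->tx(shell->plugin_priv, skb->data, skb->len) != 0)
                return NETDEV_TX_BUSY;

        dev_kfree_skb_any(skb);         /* sketch only: see note above */
        return NETDEV_TX_OK;
}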

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
  2010-05-06  8:58       ` Avi Kivity
@ 2010-05-10 20:46         ` Pankaj Thakkar
  -1 siblings, 0 replies; 33+ messages in thread
From: Pankaj Thakkar @ 2010-05-10 20:46 UTC (permalink / raw)
  To: Avi Kivity
  Cc: linux-kernel, netdev, virtualization, pv-drivers, Shreyas Bhatewara

On Thu, May 06, 2010 at 01:58:54AM -0700, Avi Kivity wrote:
> > We don't pass the whole VF to the guest. Only the BAR which is responsible for
> > TX/RX/intr is mapped into guest space.
> 
> Does the SR/IOV spec guarantee that you will have such a separation?

No. This is a guideline which we provided to IHVs and would have to be enforced
through testing/certification.

> How can you unmap the VF without guest cooperation?  If you're executing 
> Plugin code, you can't yank anything out.

In our Kawela plugin we don't have any reads from the memory space at all.
Hence you can yank the VF anytime (the code loaded in the guest address space
will keep on executing). Even if there were reads, we could back the memory
pages with a dummy page whose reads return 0xffffffff, so that the plugin can
detect this and return an error to the shell. Remember there are no control
operations in
the plugin and the code is really small (about 1k lines compared to 5k lines in
the full VF driver).
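
For illustration only (the structure, register offset and function are made
up, not the actual plugin code), the detection could be as simple as:

#include <linux/io.h>
#include <linux/errno.h>

#define TX_RING_STATUS 0x0              /* invented register offset */

struct plugin_state {
        void __iomem *vf_bar;           /* mapped TX/RX BAR of the VF */
};

/* If the hypervisor has unmapped the VF and backed its BAR with a page
 * that reads as all-ones, the next register read tells the plugin the
 * device is gone and it can fail the call back to the shell. */
static int plugin_poll_tx_ring(struct plugin_state *ps)
{
        u32 val = readl(ps->vf_bar + TX_RING_STATUS);

        if (val == 0xffffffff)
                return -ENODEV;         /* shell falls back to emulation */

        /* ... normal TX completion processing ... */
        return 0;
}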

> 
> Are plugins executed with preemption/interrupts disabled?

Depends on the model. Today the plugin code for checking the TX/RX rings runs
in the deferred NAPI context.

> What ISAs do those plugins support?

x86 and x64.

Thanks,

-pankaj


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
  2010-05-05 19:44     ` Pankaj Thakkar
@ 2010-05-06  8:58       ` Avi Kivity
  -1 siblings, 0 replies; 33+ messages in thread
From: Avi Kivity @ 2010-05-06  8:58 UTC (permalink / raw)
  To: Pankaj Thakkar
  Cc: linux-kernel, netdev, virtualization, pv-drivers, Shreyas Bhatewara

On 05/05/2010 10:44 PM, Pankaj Thakkar wrote:
> On Wed, May 05, 2010 at 10:59:51AM -0700, Avi Kivity wrote:
>    
>> Date: Wed, 5 May 2010 10:59:51 -0700
>> From: Avi Kivity<avi@redhat.com>
>> To: Pankaj Thakkar<pthakkar@vmware.com>
>> CC: "linux-kernel@vger.kernel.org"<linux-kernel@vger.kernel.org>,
>> 	"netdev@vger.kernel.org"<netdev@vger.kernel.org>,
>> 	"virtualization@lists.linux-foundation.org"
>>   <virtualization@lists.linux-foundation.org>,
>> 	"pv-drivers@vmware.com"<pv-drivers@vmware.com>,
>> 	Shreyas Bhatewara<sbhatewara@vmware.com>
>> Subject: Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
>>
>> On 05/05/2010 02:02 AM, Pankaj Thakkar wrote:
>>      
>>> 2. Hypervisor control: All control operations from the guest such as programming
>>> MAC address go through the hypervisor layer and hence can be subjected to
>>> hypervisor policies. The PF driver can be further used to put policy decisions
>>> like which VLAN the guest should be on.
>>>
>>>        
>> Is this enforced?  Since you pass the hardware through, you can't rely
>> on the guest actually doing this, yes?
>>      
> We don't pass the whole VF to the guest. Only the BAR which is responsible for
> TX/RX/intr is mapped into guest space.

Does the SR/IOV spec guarantee that you will have such a separation?



>
>>
>>> We have reworked our existing Linux vmxnet3 driver to accomodate NPA by
>>> splitting the driver into two parts: Shell and Plugin. The new split driver is
>>>
>>>        
>> So the Shell would be the reworked or new bond driver, and Plugins would
>> be ordinary Linux network drivers.
>>      
> In NPA we do not rely on the guest OS to provide any of these services like
> bonding or PCI hotplug.

Well the Shell does some sort of bonding (there are two links and the 
shell selects which one to exercise) and some sort of hotplug.  Since 
the Shell is part of the guest OS, you do rely on it.

It's certainly simpler than PCI hotplug or ordinary bonding.

> We don't rely on the guest OS to unmap a VF and switch
> a VM out of passthrough. In a bonding approach that becomes an issue you can't
> just yank a device from underneath, you have to wait for the OS to process the
> request and switch from using VF to the emulated device and this makes the
> hypervisor dependent on the guest OS.

How can you unmap the VF without guest cooperation?  If you're executing 
Plugin code, you can't yank anything out.

Are plugins executed with preemption/interrupts disabled?

> Also we don't rely on the presence of all
> the drivers inside the guest OS (be it Linux or Windows), the ESX hypervisor
> carries all the plugins and the PF drivers and injects the right one as needed.
> These plugins are guest agnostic and the IHVs do not have to write plugins for
> different OS.
>    

What ISAs do those plugins support?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
  2010-05-05 17:59 ` Avi Kivity
@ 2010-05-05 19:44     ` Pankaj Thakkar
  2010-05-05 19:44   ` Pankaj Thakkar
  1 sibling, 0 replies; 33+ messages in thread
From: Pankaj Thakkar @ 2010-05-05 19:44 UTC (permalink / raw)
  To: Avi Kivity
  Cc: linux-kernel, netdev, virtualization, pv-drivers, Shreyas Bhatewara

On Wed, May 05, 2010 at 10:59:51AM -0700, Avi Kivity wrote:
> Date: Wed, 5 May 2010 10:59:51 -0700
> From: Avi Kivity <avi@redhat.com>
> To: Pankaj Thakkar <pthakkar@vmware.com>
> CC: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
> 	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
> 	"virtualization@lists.linux-foundation.org"
>  <virtualization@lists.linux-foundation.org>,
> 	"pv-drivers@vmware.com" <pv-drivers@vmware.com>,
> 	Shreyas Bhatewara <sbhatewara@vmware.com>
> Subject: Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
> 
> On 05/05/2010 02:02 AM, Pankaj Thakkar wrote:
> > 2. Hypervisor control: All control operations from the guest such as programming
> > MAC address go through the hypervisor layer and hence can be subjected to
> > hypervisor policies. The PF driver can be further used to put policy decisions
> > like which VLAN the guest should be on.
> >    
> 
> Is this enforced?  Since you pass the hardware through, you can't rely 
> on the guest actually doing this, yes?

We don't pass the whole VF to the guest. Only the BAR which is responsible for
TX/RX/intr is mapped into guest space. The interface between the shell and the
plugin only allows operations related to TX and RX, such as sending a packet to
the VF, allocating RX buffers, and indicating a packet up to the shell. All
control operations are handled by the shell, and the shell does what the
existing vmxnet3 driver does (touch a specific register and let the device
emulation do the work). When a VF is mapped to the guest the hypervisor knows
this and
programs the h/w accordingly on behalf of the shell. So for example if the VM
does a MAC address change inside the guest, the shell would write to the
VMXNET3_REG_MAC{L|H} registers, which would trigger the device emulation to
read the new MAC address and update its internal virtual port information for
the virtual switch; if the VF is mapped it would also program the embedded
switch RX filters to reflect the new MAC address.
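
A rough sketch of the shell side of that write is below; it is simplified
from what the upstream vmxnet3 driver already does today (the register
names are real, the helper itself is only illustrative).

/* The shell only pokes the emulated BAR1 registers.  The write is trapped,
 * and the device emulation / PF driver update the virtual port and the
 * embedded switch RX filters on the other side. */
static void npa_shell_set_mac(struct vmxnet3_adapter *adapter, const u8 *mac)
{
        u32 lo = mac[0] | (mac[1] << 8) | (mac[2] << 16) | ((u32)mac[3] << 24);
        u32 hi = mac[4] | (mac[5] << 8);

        VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_MACL, lo);
        VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_MACH, hi);
}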

> 
> > The plugin image is provided by the IHVs along with the PF driver and is
> > packaged in the hypervisor. The plugin image is OS agnostic and can be loaded
> > either into a Linux VM or a Windows VM. The plugin is written against the Shell
> > API interface which the shell is responsible for implementing. The API
> > interface allows the plugin to do TX and RX only by programming the hardware
> > rings (along with things like buffer allocation and basic initialization). The
> > virtual machine comes up in paravirtualized/emulated mode when it is booted.
> > The hypervisor allocates the VF and other resources and notifies the shell of
> > the availability of the VF. The hypervisor injects the plugin into memory
> > location specified by the shell. The shell initializes the plugin by calling
> > into a known entry point and the plugin initializes the data path. The control
> > path is already initialized by the PF driver when the VF is allocated. At this
> > point the shell switches to using the loaded plugin to do all further TX and RX
> > operations. The guest networking stack does not participate in these operations
> > and continues to function normally. All the control operations continue being
> > trapped by the hypervisor and are directed to the PF driver as needed. For
> > example, if the MAC address changes the hypervisor updates its internal state
> > and changes the state of the embedded switch as well through the PF control
> > API.
> >    
> 
> This is essentially a miniature network stack with a its own mini 
> bonding layer, mini hotplug, and mini API, except s/API/ABI/.  Is this a 
> correct view?

To some extent yes, but there is no complicated bonding nor is there anything
like PCI hotplug. The shell interface is small and the OS always interacts
with the shell as the main driver. Based on the underlying VF the plugin
changes, and the plugin itself is really small. Our vmxnet3 s/w plugin is
about 1300 lines with whitespace and comments and the Intel Kawela plugin is
about 1100 lines with whitespace and comments. The design principle is to put
more of the complexity related to initialization/control into the PF driver
rather than in the plugin.

> 
> If so, the Linuxy approach would be to use the ordinary drivers and the 
> Linux networking API, and hide the bond setup using namespaces.  The 
> bond driver, or perhaps a new, similar, driver can be enhanced to 
> propagate ethtool commands to its (hidden) components, and to have a 
> control channel with the hypervisor.
> 
> This would make the approach hypervisor agnostic, you're just pairing 
> two devices and presenting them to the rest of the stack as a single device.
> 
> > We have reworked our existing Linux vmxnet3 driver to accomodate NPA by
> > splitting the driver into two parts: Shell and Plugin. The new split driver is
> >    
> 
> So the Shell would be the reworked or new bond driver, and Plugins would 
> be ordinary Linux network drivers.

In NPA we do not rely on the guest OS to provide any of these services like
bonding or PCI hotplug. We don't rely on the guest OS to unmap a VF and switch
a VM out of passthrough. In a bonding approach that becomes an issue: you can't
just yank a device from underneath, you have to wait for the OS to process the
request and switch from using the VF to the emulated device, and this makes the
hypervisor dependent on the guest OS. Also, we don't rely on the presence of all
the drivers inside the guest OS (be it Linux or Windows); the ESX hypervisor
carries all the plugins and the PF drivers and injects the right one as needed.
These plugins are guest agnostic and the IHVs do not have to write plugins for
different OSes.


Thanks,

-pankaj

 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
  2010-05-05  0:58   ` Chris Wright
@ 2010-05-05 19:00     ` Pankaj Thakkar
  -1 siblings, 0 replies; 33+ messages in thread
From: Pankaj Thakkar @ 2010-05-05 19:00 UTC (permalink / raw)
  To: Chris Wright
  Cc: linux-kernel, netdev, virtualization, pv-drivers, Shreyas Bhatewara, kvm

On Tue, May 04, 2010 at 05:58:52PM -0700, Chris Wright wrote:
> Date: Tue, 4 May 2010 17:58:52 -0700
> From: Chris Wright <chrisw@sous-sol.org>
> To: Pankaj Thakkar <pthakkar@vmware.com>
> CC: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
> 	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
> 	"virtualization@lists.linux-foundation.org"
>  <virtualization@lists.linux-foundation.org>,
> 	"pv-drivers@vmware.com" <pv-drivers@vmware.com>,
> 	Shreyas Bhatewara <sbhatewara@vmware.com>,
> 	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
> Subject: Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
> 
> * Pankaj Thakkar (pthakkar@vmware.com) wrote:
> > We intend to upgrade the upstreamed vmxnet3 driver to implement NPA so that
> > Linux users can exploit the benefits provided by passthrough devices in a
> > seamless manner while retaining the benefits of virtualization. The document
> > below tries to answer most of the questions which we anticipated. Please let us
> > know your comments and queries.
> 
> How does the throughput, latency, and host CPU utilization for normal
> data path compare with say NetQueue?

NetQueue is really for scaling across multiple VMs. NPA allows similar scaling
and also helps in improving the CPU efficiency for a single VM since the
hypervisor is bypassed. Throughput-wise, both emulation and passthrough (NPA)
can obtain line rates on 10gig, but passthrough saves up to 40% CPU depending on
the workload. We did a demo at IDF 2009 where we compared 8 VMs running on NetQueue
v/s 8 VMs running on NPA (using Niantic) and we obtained similar CPU efficiency
gains.

> 
> And does this obsolete your UPT implementation?

NPA and UPT share a lot of code in the hypervisor. UPT was adopted only by very
limited IHVs and hence NPA is our way forward to have all IHVs onboard.

> How many cards actually support this NPA interface?  What does it look
> like, i.e. where is the NPA specification?  (AFAIK, we never got the UPT
> one).

We have it working internally with Intel Niantic (10G) and Kawela (1G) SR-IOV
NIC. We are also working with upcoming Broadcom 10G card and plan to support
other IHVs. Unlike UPT, we don't dictate the register sets or rings. Rather we
have guidelines, such as that the card should have an embedded switch for
inter-VF switching, or should support programming (RX filters, VLAN, etc.)
through the PF driver rather than the VF driver.
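
As an illustration of the kind of PF-side interface those guidelines imply
(the names below are invented, not a published specification):

struct npa_pf;                          /* PF driver instance (opaque here) */

/* The hypervisor, not the guest, asks the PF driver to program per-VF
 * state such as RX filters and VLAN membership on the embedded switch. */
struct npa_pf_ops {
        int (*set_rx_filter)(struct npa_pf *pf, int vf_id,
                             const u8 *mac, unsigned int vlan);
        int (*set_link)(struct npa_pf *pf, int vf_id, bool up);
        int (*quiesce_vf)(struct npa_pf *pf, int vf_id);
};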

> How do you handle hardware which has a more symmetric view of the
> SR-IOV world (SR-IOV is only a PCI specification, not a network driver
> specification)?  Or hardware which has multiple functions per physical
> port (multiqueue, hw filtering, embedded switch, etc.)?

I am not sure what you mean by a symmetric view of the SR-IOV world.

NPA allows multi-queue VFs and requires an embedded switch currently. As far as
the PF driver is concerned we require IHVs to support all existing and upcoming
features like NetQueue, FCoE, etc. The PF driver is considered special and is
used to drive the traffic for the emulated/paravirtualized VMs and is also used
to program things on behalf of the VFs through the hypervisor. If the hardware
has multiple physical functions they are treated as separate adapters (with
their own set of VFs) and we require the embedded switch to maintain that
distinction as well.


> > NPA offers several benefits:
> > 1. Performance: Critical performance sensitive paths are not trapped and the
> > guest can directly drive the hardware without incurring virtualization
> > overheads.
> 
> Can you demonstrate with data?

The setup is a 2.667GHz Nehalem server running a SLES11 VM talking to a 2.33GHz
Barcelona client box running RHEL 5.1. We had netperf streams with 16k msg size
over 64k socket size running between server VM and client and they are using
Intel Niantic 10G cards. In both cases (NPA and regular) the VM was CPU
saturated (used one full core).

TX: regular vmxnet3 = 3085.5 Mbps/GHz; NPA vmxnet3 = 4397.2 Mbps/GHz
RX: regular vmxnet3 = 1379.6 Mbps/GHz; NPA vmxnet3 = 2349.7 Mbps/GHz

We have similar results for other configurations, and in general we have seen
that NPA is better in terms of CPU cost and can save up to 40% of it.

> 
> > 2. Hypervisor control: All control operations from the guest such as programming
> > MAC address go through the hypervisor layer and hence can be subjected to
> > hypervisor policies. The PF driver can be further used to put policy decisions
> > like which VLAN the guest should be on.
> 
> This can happen without NPA as well.  VF simply needs to request
> the change via the PF (in fact, hw does that right now).  Also, we
> already have a host side management interface via PF (see, for example,
> RTM_SETLINK IFLA_VF_MAC interface).
> 
> What is control plane interface?  Just something like a fixed register set?

All operations other than TX/RX go through the vmxnet3 shell to the vmxnet3
device emulation. So the control plane is really the vmxnet3 device emulation
as far as the guest is concerned.

> 
> > 3. Guest Management: No hardware specific drivers need to be installed in the
> > guest virtual machine and hence no overheads are incurred for guest management.
> > All software for the driver (including the PF driver and the plugin) is
> > installed in the hypervisor.
> 
> So we have a plugin per hardware VF implementation?  And the hypervisor
> injects this code into the guest?

One guest-agnostic plugin per VF implementation. Yes, the plugin is injected
into the guest by the hypervisor.

> > The plugin image is provided by the IHVs along with the PF driver and is
> > packaged in the hypervisor. The plugin image is OS agnostic and can be loaded
> > either into a Linux VM or a Windows VM. The plugin is written against the Shell
> 
> And it will need to be GPL AFAICT from what you've said thus far.  It
> does sound worrisome, although I suppose hw firmware isn't particularly
> different.

Yes it would be GPL and we are thinking of enforcing the license in the
hypervisor as well as in the shell.

> How does the shell switch back to emulated mode for live migration?

The hypervisor sends a notification to the shell to switch out of passthrough
and it quiesces the VF and tears down the mapping between the VF and the guest.
The shell frees up the buffers and other resources on behalf of the plugin and
reinitializes the s/w vmxnet3 emulation plugin.
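
In outline (all names invented, not actual code), the shell side of that
switch-out could look like:

/* Illustrative sequence for leaving passthrough, e.g. before migration. */
static void npa_shell_leave_passthrough(struct npa_shell *shell)
{
        shell->ops->quiesce(shell->plugin_priv);  /* stop VF DMA, drain rings */
        npa_shell_free_buffers(shell);            /* shell owns the buffers   */

        shell->ops = &vmxnet3_sw_plugin_ops;      /* statically linked plugin */
        shell->ops->init(shell->plugin_priv, shell->emu_bar);
        /* the hypervisor tears down the VF<->guest mapping on its side */
}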

> Please make this shell API interface and the PF/VF requirements available.

We have an internal prototype working but we are not yet ready to post the
patch to LKML. We are still in the process of making changes to our Windows
driver and want to ensure that we take into account all changes that could
happen.

Thanks,

-pankaj


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
@ 2010-05-05 19:00     ` Pankaj Thakkar
  0 siblings, 0 replies; 33+ messages in thread
From: Pankaj Thakkar @ 2010-05-05 19:00 UTC (permalink / raw)
  To: Chris Wright
  Cc: linux-kernel, netdev, virtualization, pv-drivers, Shreyas Bhatewara, kvm

On Tue, May 04, 2010 at 05:58:52PM -0700, Chris Wright wrote:
> Date: Tue, 4 May 2010 17:58:52 -0700
> From: Chris Wright <chrisw@sous-sol.org>
> To: Pankaj Thakkar <pthakkar@vmware.com>
> CC: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
> 	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
> 	"virtualization@lists.linux-foundation.org"
>  <virtualization@lists.linux-foundation.org>,
> 	"pv-drivers@vmware.com" <pv-drivers@vmware.com>,
> 	Shreyas Bhatewara <sbhatewara@vmware.com>,
> 	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
> Subject: Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
> 
> * Pankaj Thakkar (pthakkar@vmware.com) wrote:
> > We intend to upgrade the upstreamed vmxnet3 driver to implement NPA so that
> > Linux users can exploit the benefits provided by passthrough devices in a
> > seamless manner while retaining the benefits of virtualization. The document
> > below tries to answer most of the questions which we anticipated. Please let us
> > know your comments and queries.
> 
> How does the throughput, latency, and host CPU utilization for normal
> data path compare with say NetQueue?

NetQueue is really for scaling across multiple VMs. NPA allows similar scaling
and also helps in improving the CPU efficiency for a single VM since the
hypervisor is bypassed. Througput wise both emulation and passthrough (NPA) can
obtain line rates on 10gig but passthrough saves upto 40% cpu based on the
workload. We did a demo at IDF 2009 where we compared 8 VMs running on NetQueue
v/s 8 VMs running on NPA (using Niantic) and we obtained similar CPU efficiency
gains.

> 
> And does this obsolete your UPT implementation?

NPA and UPT share a lot of code in the hypervisor. UPT was adopted only by very
limited IHVs and hence NPA is our way forward to have all IHVs onboard.

> How many cards actually support this NPA interface?  What does it look
> like, i.e. where is the NPA specification?  (AFAIK, we never got the UPT
> one).

We have it working internally with Intel Niantic (10G) and Kawela (1G) SR-IOV
NIC. We are also working with upcoming Broadcom 10G card and plan to support
other IHVs. This is unlike UPT so we don't dictate the register sets or rings
like we did in UPT. Rather we have guidelines like that the card should have an
embedded switch for inter VF switching or should support programming (rx
filters, VLAN, etc) though the PF driver rather than the VF driver.

> How do you handle hardware which has a more symmetric view of the
> SR-IOV world (SR-IOV is only PCI sepcification, not a network driver
> specification)?  Or hardware which has multiple functions per physical
> port (multiqueue, hw filtering, embedded switch, etc.)?

I am not sure what do you mean by symmetric view of SR-IOV world?

NPA allows multi-queue VFs and requires an embedded switch currently. As far as
the PF driver is concerned we require IHVs to support all existing and upcoming
features like NetQueue, FCoE, etc. The PF driver is considered special and is
used to drive the traffic for the emulated/paravirtualized VMs and is also used
to program things on behalf of the VFs through the hypervisor. If the hardware
has multiple physical functions they are treated as separate adapters (with
their own set of VFs) and we require the embedded switch to maintain that
distinction as well.


> > NPA offers several benefits:
> > 1. Performance: Critical performance sensitive paths are not trapped and the
> > guest can directly drive the hardware without incurring virtualization
> > overheads.
> 
> Can you demonstrate with data?

The setup is 2.667Ghz Nehalem server running SLES11 VM talking to a 2.33Ghz
Barcelona client box running RHEL 5.1. We had netperf streams with 16k msg size
over 64k socket size running between server VM and client and they are using
Intel Niantic 10G cards. In both cases (NPA and regular) the VM was CPU
saturated (used one full core).

TX: regular vmxnet3 = 3085.5 Mbps/GHz; NPA vmxnet3 = 4397.2 Mbps/GHz
RX: regular vmxnet3 = 1379.6 Mbps/GHz; NPA vmxnet3 = 2349.7 Mbps/GHz

We have similar results for other configuration and in general we have seen NPA
is better in terms of CPU cost and can save upto 40% of CPU cost.

> 
> > 2. Hypervisor control: All control operations from the guest such as programming
> > MAC address go through the hypervisor layer and hence can be subjected to
> > hypervisor policies. The PF driver can be further used to put policy decisions
> > like which VLAN the guest should be on.
> 
> This can happen without NPA as well.  VF simply needs to request
> the change via the PF (in fact, hw does that right now).  Also, we
> already have a host side management interface via PF (see, for example,
> RTM_SETLINK IFLA_VF_MAC interface).
> 
> What is control plane interface?  Just something like a fixed register set?

All operations other than TX/RX go through the vmxnet3 shell to the vmxnet3
device emulation. So the control plane is really the vmxnet3 device emulation
as far as the guest is concerned.

> 
> > 3. Guest Management: No hardware specific drivers need to be installed in the
> > guest virtual machine and hence no overheads are incurred for guest management.
> > All software for the driver (including the PF driver and the plugin) is
> > installed in the hypervisor.
> 
> So we have a plugin per hardware VF implementation?  And the hypervisor
> injects this code into the guest?

One guest-agnostic plugin per VF implementation. Yes, the plugin is injected
into the guest by the hypervisor.

> > The plugin image is provided by the IHVs along with the PF driver and is
> > packaged in the hypervisor. The plugin image is OS agnostic and can be loaded
> > either into a Linux VM or a Windows VM. The plugin is written against the Shell
> 
> And it will need to be GPL AFAICT from what you've said thus far.  It
> does sound worrisome, although I suppose hw firmware isn't particularly
> different.

Yes, it would be GPL, and we are thinking of enforcing the license in the
hypervisor as well as in the shell.

> How does the shell switch back to emulated mode for live migration?

The hypervisor sends a notification to the shell to switch out of passthrough;
the VF is quiesced and the mapping between the VF and the guest is torn down.
The shell frees the buffers and other resources on behalf of the plugin and
reinitializes the software vmxnet3 emulation plugin.
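
Roughly, the sequence is as follows (a sketch with made-up helper names; the
real shell of course does this against the kernel's netdev and DMA APIs, with
proper locking):

/* Sketch only: hypothetical helpers; real code lives in shell/hypervisor. */
#include <stdio.h>

static void stop_tx_queue(void)       { puts("shell: stop guest TX"); }
static void plugin_quiesce(void)      { puts("plugin: drain rings, mask irqs"); }
static void free_plugin_buffers(void) { puts("shell: free plugin RX/TX bufs"); }
static void unmap_vf(void)            { puts("hyp: tear down VF<->guest map"); }
static void reinit_emulation(void)    { puts("shell: re-arm sw vmxnet3 path"); }
static void start_tx_queue(void)      { puts("shell: resume guest TX"); }

/* Invoked when the hypervisor notifies the shell to leave passthrough,
 * e.g. just before live migration. */
static void shell_switch_to_emulation(void)
{
    stop_tx_queue();
    plugin_quiesce();
    free_plugin_buffers();
    unmap_vf();
    reinit_emulation();
    start_tx_queue();
}

int main(void)
{
    shell_switch_to_emulation();
    return 0;
}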

> Please make this shell API interface and the PF/VF requirments available.

We have an internal prototype working but we are not yet ready to post the
patch to LKML. We are still making changes to our Windows driver and want to
make sure the interface takes into account any changes that come out of that
work.

Thanks,

-pankaj


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
  2010-05-04 23:02 Pankaj Thakkar
                   ` (4 preceding siblings ...)
  2010-05-05 17:23 ` Christoph Hellwig
@ 2010-05-05 17:59 ` Avi Kivity
  2010-05-05 19:44     ` Pankaj Thakkar
  2010-05-05 19:44   ` Pankaj Thakkar
  2010-05-05 17:59 ` Avi Kivity
  6 siblings, 2 replies; 33+ messages in thread
From: Avi Kivity @ 2010-05-05 17:59 UTC (permalink / raw)
  To: Pankaj Thakkar
  Cc: linux-kernel, netdev, virtualization, pv-drivers, sbhatewara

On 05/05/2010 02:02 AM, Pankaj Thakkar wrote:
> 2. Hypervisor control: All control operations from the guest such as programming
> MAC address go through the hypervisor layer and hence can be subjected to
> hypervisor policies. The PF driver can be further used to put policy decisions
> like which VLAN the guest should be on.
>    

Is this enforced?  Since you pass the hardware through, you can't rely 
on the guest actually doing this, yes?

> The plugin image is provided by the IHVs along with the PF driver and is
> packaged in the hypervisor. The plugin image is OS agnostic and can be loaded
> either into a Linux VM or a Windows VM. The plugin is written against the Shell
> API interface which the shell is responsible for implementing. The API
> interface allows the plugin to do TX and RX only by programming the hardware
> rings (along with things like buffer allocation and basic initialization). The
> virtual machine comes up in paravirtualized/emulated mode when it is booted.
> The hypervisor allocates the VF and other resources and notifies the shell of
> the availability of the VF. The hypervisor injects the plugin into memory
> location specified by the shell. The shell initializes the plugin by calling
> into a known entry point and the plugin initializes the data path. The control
> path is already initialized by the PF driver when the VF is allocated. At this
> point the shell switches to using the loaded plugin to do all further TX and RX
> operations. The guest networking stack does not participate in these operations
> and continues to function normally. All the control operations continue being
> trapped by the hypervisor and are directed to the PF driver as needed. For
> example, if the MAC address changes the hypervisor updates its internal state
> and changes the state of the embedded switch as well through the PF control
> API.
>    

This is essentially a miniature network stack with its own mini 
bonding layer, mini hotplug, and mini API, except s/API/ABI/.  Is this a 
correct view?

If so, the Linuxy approach would be to use the ordinary drivers and the 
Linux networking API, and hide the bond setup using namespaces.  The 
bond driver, or perhaps a new, similar, driver can be enhanced to 
propagate ethtool commands to its (hidden) components, and to have a 
control channel with the hypervisor.

This would make the approach hypervisor agnostic: you're just pairing 
two devices and presenting them to the rest of the stack as a single device.
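
Schematically (hypothetical types, nothing like the real bonding or ethtool 
code), the pairing I have in mind is just:

/* Schematic only: one visible master device, two hidden slaves
 * (paravirtual path and VF); queries/commands are forwarded to
 * whichever slave currently carries the traffic. */
#include <stdio.h>

struct slave_dev {
    const char *name;
    int         link_up;
};

struct master_dev {
    struct slave_dev *active;   /* VF in passthrough, paravirt otherwise */
};

/* An ethtool-style query propagated to the hidden active slave. */
static int master_get_link(const struct master_dev *m)
{
    return m->active->link_up;
}

int main(void)
{
    struct slave_dev paravirt = { "vmxnet3-emul", 1 };
    struct slave_dev vf       = { "vf0",          1 };
    struct master_dev m       = { &vf };

    printf("link=%d via %s\n", master_get_link(&m), m.active->name);
    m.active = &paravirt;       /* e.g. the hypervisor revoked the VF */
    printf("link=%d via %s\n", master_get_link(&m), m.active->name);
    return 0;
}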

> We have reworked our existing Linux vmxnet3 driver to accomodate NPA by
> splitting the driver into two parts: Shell and Plugin. The new split driver is
>    

So the Shell would be the reworked or new bond driver, and Plugins would 
be ordinary Linux network drivers.


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
  2010-05-04 23:02 Pankaj Thakkar
                   ` (3 preceding siblings ...)
  2010-05-05 17:23 ` Christoph Hellwig
@ 2010-05-05 17:23 ` Christoph Hellwig
  2010-05-05 17:59 ` Avi Kivity
  2010-05-05 17:59 ` Avi Kivity
  6 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2010-05-05 17:23 UTC (permalink / raw)
  To: Pankaj Thakkar
  Cc: linux-kernel, netdev, virtualization, pv-drivers, sbhatewara

On Tue, May 04, 2010 at 04:02:25PM -0700, Pankaj Thakkar wrote:
> The plugin image is provided by the IHVs along with the PF driver and is
> packaged in the hypervisor. The plugin image is OS agnostic and can be loaded
> either into a Linux VM or a Windows VM. The plugin is written against the Shell
> API interface which the shell is responsible for implementing. The API

We're not going to add any kind of loader for binary blobs into kernel
space, sorry.  Don't even bother wasting your time on this.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
  2010-05-05  0:18     ` Pankaj Thakkar
@ 2010-05-05  2:44       ` Stephen Hemminger
  -1 siblings, 0 replies; 33+ messages in thread
From: Stephen Hemminger @ 2010-05-05  2:44 UTC (permalink / raw)
  To: Pankaj Thakkar
  Cc: linux-kernel, netdev, virtualization, pv-drivers, Shreyas Bhatewara

On Tue, 4 May 2010 17:18:57 -0700
Pankaj Thakkar <pthakkar@vmware.com> wrote:

> The purpose of this email is to introduce the architecture and the design principles. The overall project involves more than just changes to vmxnet3 driver and hence we though an overview email would be better. Once people agree to the design in general we intend to provide the code changes to the vmxnet3 driver.

As Dave said, we care more about what the implementation looks like than the
high-level goals of the design. I think we all agree that better management of
virtualized devices is necessary; the problem is that there are so many of them
(vmware, xen, HV, Xen), and vendors tend to lean on their own specific
implementation of offloading, which makes a general solution more difficult.
Please, please solve this cleanly.

The little things like APIs, locking semantics, and the handling of dynamic
versus static control can make a design that is good in principle fall apart
when someone does a bad job of implementing them.

Lastly, projects that have had multiple people working in the dark for long
periods of time often end up with a legacy mentality ("but we convinced vendor
XXX to include it in their Enterprise version 666") and require lots of
"retraining" before the code becomes acceptable.

-- 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
  2010-05-04 23:02 Pankaj Thakkar
@ 2010-05-05  0:58   ` Chris Wright
  2010-05-05  0:05 ` Stephen Hemminger
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Chris Wright @ 2010-05-05  0:58 UTC (permalink / raw)
  To: Pankaj Thakkar
  Cc: linux-kernel, netdev, virtualization, pv-drivers, sbhatewara, kvm

* Pankaj Thakkar (pthakkar@vmware.com) wrote:
> We intend to upgrade the upstreamed vmxnet3 driver to implement NPA so that
> Linux users can exploit the benefits provided by passthrough devices in a
> seamless manner while retaining the benefits of virtualization. The document
> below tries to answer most of the questions which we anticipated. Please let us
> know your comments and queries.

How do the throughput, latency, and host CPU utilization for the normal
data path compare with, say, NetQueue?

And does this obsolete your UPT implementation?

> Network Plugin Architecture
> ---------------------------
> 
> VMware has been working on various device passthrough technologies for the past
> few years. Passthrough technology is interesting as it can result in better
> performance/cpu utilization for certain demanding applications. In our vSphere
> product we support direct assignment of PCI devices like networking adapters to
> a guest virtual machine. This allows the guest to drive the device using the
> device drivers installed inside the guest. This is similar to the way KVM
> allows for passthrough of PCI devices to the guests. The hypervisor is bypassed
> for all I/O and control operations and hence it can not provide any value add
> features such as live migration, suspend/resume, etc.
> 
> 
> Network Plugin Architecture (NPA) is an approach which VMware has developed in
> joint partnership with Intel which allows us to retain the best of passthrough
> technology and virtualization. NPA allows for passthrough of the fast data
> (I/O) path and lets the hypervisor deal with the slow control path using
> traditional emulation/paravirtualization techniques. Through this splitting of
> data and control path the hypervisor can still provide the above mentioned
> value add features and exploit the performance benefits of passthrough.

How many cards actually support this NPA interface?  What does it look
like, i.e. where is the NPA specification?  (AFAIK, we never got the UPT
one).

> NPA requires SR-IOV hardware which allows for sharing of one single NIC adapter
> by multiple guests. SR-IOV hardware has many logically separate functions
> called virtual functions (VF) which can be independently assigned to the guest
> OS. They also have one or more physical functions (PF) (managed by a PF driver)
> which are used by the hypervisor to control certain aspects of the VFs and the
> rest of the hardware.

How do you handle hardware which has a more symmetric view of the
SR-IOV world (SR-IOV is only a PCI specification, not a network driver
specification)?  Or hardware which has multiple functions per physical
port (multiqueue, hw filtering, embedded switch, etc.)?

> NPA splits the guest driver into two components called
> the Shell and the Plugin. The shell is responsible for interacting with the
> guest networking stack and funneling the control operations to the hypervisor.
> The plugin is responsible for driving the data path of the virtual function
> exposed to the guest and is specific to the NIC hardware. NPA also requires an
> embedded switch in the NIC to allow for switching traffic among the virtual
> functions. The PF is also used as an uplink to provide connectivity to other
> VMs which are in emulation mode. The figure below shows the major components in
> a block diagram.
> 
>         +------------------------------+
>         |         Guest VM             |
>         |                              |
>         |      +----------------+      |
>         |      | vmxnet3 driver |      |
>         |      |     Shell      |      |
>         |      | +============+ |      |
>         |      | |   Plugin   | |      |
>         +------+-+------------+-+------+
>                 |           .
>                +---------+  .
>                | vmxnet3 |  .
>                |___+-----+  .
>                      |      .
>                      |      .
>                 +----------------------------+
>                 |                            |
>                 |       virtual switch       |
>                 +----------------------------+
>                   |         .               \
>                   |         .                \
>            +=============+  .                 \
>            | PF control  |  .                  \
>            |             |  .                   \
>            |  L2 driver  |  .                    \
>            +-------------+  .                     \
>                   |         .                      \
>                   |         .                       \
>                 +------------------------+     +------------+
>                 | PF   VF1 VF2 ...   VFn |     |            |
>                 |                        |     |  regular   |
>                 |       SR-IOV NIC       |     |    nic     |
>                 |    +--------------+    |     |   +--------+
>                 |    |   embedded   |    |     +---+
>                 |    |    switch    |    |
>                 |    +--------------+    |
>                 |        +---------------+
>                 +--------+
> 
> NPA offers several benefits:
> 1. Performance: Critical performance sensitive paths are not trapped and the
> guest can directly drive the hardware without incurring virtualization
> overheads.

Can you demonstrate with data?

> 2. Hypervisor control: All control operations from the guest such as programming
> MAC address go through the hypervisor layer and hence can be subjected to
> hypervisor policies. The PF driver can be further used to put policy decisions
> like which VLAN the guest should be on.

This can happen without NPA as well.  The VF simply needs to request
the change via the PF (in fact, hw does that right now).  Also, we
already have a host-side management interface via the PF (see, for example,
the RTM_SETLINK IFLA_VF_MAC interface).

What is the control plane interface?  Just something like a fixed register set?

> 3. Guest Management: No hardware specific drivers need to be installed in the
> guest virtual machine and hence no overheads are incurred for guest management.
> All software for the driver (including the PF driver and the plugin) is
> installed in the hypervisor.

So we have a plugin per hardware VF implementation?  And the hypervisor
injects this code into the guest?

> 4. IHV independence: The architecture provides guidelines for splitting the
> functionality between the VFs and PF but does not dictate how the hardware
> should be implemented. It gives the IHV the freedom to do asynchronous updates
> either to the software or the hardware to work around any defects.

Yes, this is important, especially as opposed to requiring hw to
implement a specific interface (I suspect you know all about this issue
already).

> The fundamental tenet in NPA is to let the hypervisor control the passthrough
> functionality with minimal guest intervention. This gives a lot of flexibility
> to the hypervisor which can then treat passthrough as an offload feature (just
> like TSO, LRO, etc) which is offered to the guest virtual machine when there
> are no conflicting features present. For example, if the hypervisor wants to
> migrate the virtual machine from one host to another, the hypervisor can switch
> the virtual machine out of passthrough mode into paravirtualized/emulated mode
> and it can use existing technique to migrate the virtual machine. Once the
> virtual machine is migrated to the destination host the hypervisor can switch
> the virtual machine back to passthrough mode if a supporting SR-IOV nic is
> present. This may involve reloading of a different plugin corresponding to the
> new SR-IOV hardware.
> 
> Internally we have explored various other options before settling on the NPA
> approach. For example there are approaches which create a bonding driver on top
> of a complete passthrough of a NIC device and an emulated/paravirtualized
> device. Though this approach allows for live migration to work it adds a lot of
> complexity and dependency. First the hypervisor has to rely on a guest with
> hot-add support. Second the hypervisor has to depend on the guest networking
> stack to cooperate to perform migration. Third the guest has to carry the
> driver images for all possible hardware to which the guest may migrate to.
> Fourth the hypervisor does not get full control for all the policy decisions.
> Another approach we have considered is to have a uniform interface for the data
> path between the emulated/paravirtualized device and the hardware device which
> allows the hypervisor to seamlessly switch from the emulated interface to the
> hardware interface. Though this approach is very attractive and can work
> without any guest involvement it is not acceptable to the IHVs as it does not
> give them the freedom to fix bugs/erratas and differentiate from each other. We
> believe NPA approach provides the right level of control and flexibility to the
> hypervisors while letting the guest exploit the benefits of passthrough.

> The plugin image is provided by the IHVs along with the PF driver and is
> packaged in the hypervisor. The plugin image is OS agnostic and can be loaded
> either into a Linux VM or a Windows VM. The plugin is written against the Shell

And it will need to be GPL AFAICT from what you've said thus far.  It
does sound worrisome, although I suppose hw firmware isn't particularly
different.

> API interface which the shell is responsible for implementing. The API
> interface allows the plugin to do TX and RX only by programming the hardware
> rings (along with things like buffer allocation and basic initialization). The
> virtual machine comes up in paravirtualized/emulated mode when it is booted.
> The hypervisor allocates the VF and other resources and notifies the shell of
> the availability of the VF. The hypervisor injects the plugin into memory
> location specified by the shell. The shell initializes the plugin by calling
> into a known entry point and the plugin initializes the data path. The control
> path is already initialized by the PF driver when the VF is allocated. At this
> point the shell switches to using the loaded plugin to do all further TX and RX
> operations. The guest networking stack does not participate in these operations
> and continues to function normally. All the control operations continue being
> trapped by the hypervisor and are directed to the PF driver as needed. For
> example, if the MAC address changes the hypervisor updates its internal state
> and changes the state of the embedded switch as well through the PF control
> API.

How does the shell switch back to emulated mode for live migration?

> We have reworked our existing Linux vmxnet3 driver to accomodate NPA by
> splitting the driver into two parts: Shell and Plugin. The new split driver is
> backwards compatible and continues to work on old/existing vmxnet3 device
> emulations. The shell implements the API interface and contains code to do the
> bookkeeping for TX/RX buffers along with interrupt management. The shell code
> also handles the loading of the plugin and verifying the license of the loaded
> plugin. The plugin contains the code specific to vmxnet3 ring and descriptor
> management. The plugin uses the same Shell API interface which would be used by
> other IHVs. This vmxnet3 plugin is compiled statically along with the shell as
> this is needed to provide connectivity when there is no underlying SR-IOV
> device present. The IHV plugins are required to be distributed under GPL
> license and we are currently looking at ways to verify this both within the
> hypervisor and within the shell.

Please make this shell API interface and the PF/VF requirements available.

thanks,
-chris

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
  2010-05-05  0:32       ` David Miller
@ 2010-05-05  0:38         ` Pankaj Thakkar
  -1 siblings, 0 replies; 33+ messages in thread
From: Pankaj Thakkar @ 2010-05-05  0:38 UTC (permalink / raw)
  To: David Miller
  Cc: shemminger, linux-kernel, netdev, virtualization, pv-drivers,
	Shreyas Bhatewara

Sure. We have been working on NPA for a while and have the code up and running
internally. Let me sync up with the team on how and when we can provide the
vmxnet3 driver code so that people can look at it.


On Tue, May 04, 2010 at 05:32:36PM -0700, David Miller wrote:
> Date: Tue, 4 May 2010 17:32:36 -0700
> From: David Miller <davem@davemloft.net>
> To: Pankaj Thakkar <pthakkar@vmware.com>
> CC: "shemminger@vyatta.com" <shemminger@vyatta.com>,
> 	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
> 	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
> 	"virtualization@lists.linux-foundation.org"
>  <virtualization@lists.linux-foundation.org>,
> 	"pv-drivers@vmware.com" <pv-drivers@vmware.com>,
> 	Shreyas Bhatewara <sbhatewara@vmware.com>
> Subject: Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
> 
> From: Pankaj Thakkar <pthakkar@vmware.com>
> Date: Tue, 4 May 2010 17:18:57 -0700
> 
> > The purpose of this email is to introduce the architecture and the
> > design principles. The overall project involves more than just
> > changes to vmxnet3 driver and hence we though an overview email
> > would be better. Once people agree to the design in general we
> > intend to provide the code changes to the vmxnet3 driver.
> 
> Stephen's point is that code talks and bullshit walks.
> 
> Talk about high level designs rarely gets any traction, and often goes
> nowhere.  Give us an example implementation so there is something
> concrete for us to sink our teeth into.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
  2010-05-05  0:18     ` Pankaj Thakkar
@ 2010-05-05  0:32       ` David Miller
  -1 siblings, 0 replies; 33+ messages in thread
From: David Miller @ 2010-05-05  0:32 UTC (permalink / raw)
  To: pthakkar
  Cc: shemminger, linux-kernel, netdev, virtualization, pv-drivers, sbhatewara

From: Pankaj Thakkar <pthakkar@vmware.com>
Date: Tue, 4 May 2010 17:18:57 -0700

> The purpose of this email is to introduce the architecture and the
> design principles. The overall project involves more than just
> changes to vmxnet3 driver and hence we though an overview email
> would be better. Once people agree to the design in general we
> intend to provide the code changes to the vmxnet3 driver.

Stephen's point is that code talks and bullshit walks.

Talk about high-level designs rarely gets any traction and often goes
nowhere.  Give us an example implementation so there is something
concrete for us to sink our teeth into.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
  2010-05-05  0:05 ` Stephen Hemminger
@ 2010-05-05  0:18     ` Pankaj Thakkar
  2010-05-05  0:18     ` Pankaj Thakkar
  1 sibling, 0 replies; 33+ messages in thread
From: Pankaj Thakkar @ 2010-05-05  0:18 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: linux-kernel, netdev, virtualization, pv-drivers, Shreyas Bhatewara

The purpose of this email is to introduce the architecture and the design principles. The overall project involves more than just changes to the vmxnet3 driver and hence we thought an overview email would be better. Once people agree to the design in general we intend to provide the code changes to the vmxnet3 driver.

The architecture supports more than Intel NICs. We started the project with Intel but plan to support all major IHVs, including Broadcom, Qlogic, Emulex and others, through a certification program. The architecture works only on VMware ESX Server as it requires significant support from the hypervisor, and the vmxnet3 driver works only on the VMware platform. AFAICT Xen has a different model for supporting SR-IOV devices and allowing live migration; the document briefly talks about it (paragraph 6).

Thanks,

-pankaj


On Tue, May 04, 2010 at 05:05:31PM -0700, Stephen Hemminger wrote:
> Date: Tue, 4 May 2010 17:05:31 -0700
> From: Stephen Hemminger <shemminger@vyatta.com>
> To: Pankaj Thakkar <pthakkar@vmware.com>
> CC: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
> 	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
> 	"virtualization@lists.linux-foundation.org"
>  <virtualization@lists.linux-foundation.org>,
> 	"pv-drivers@vmware.com" <pv-drivers@vmware.com>,
> 	Shreyas Bhatewara <sbhatewara@vmware.com>
> Subject: Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
> 
> On Tue, 4 May 2010 16:02:25 -0700
> Pankaj Thakkar <pthakkar@vmware.com> wrote:
> 
> > Device passthrough technology allows a guest to bypass the hypervisor and drive
> > the underlying physical device. VMware has been exploring various ways to
> > deliver this technology to users in a manner which is easy to adopt. In this
> > process we have prepared an architecture along with Intel - NPA (Network Plugin
> > Architecture). NPA allows the guest to use the virtualized NIC vmxnet3 to
> > passthrough to a number of physical NICs which support it. The document below
> > provides an overview of NPA.
> > 
> > We intend to upgrade the upstreamed vmxnet3 driver to implement NPA so that
> > Linux users can exploit the benefits provided by passthrough devices in a
> > seamless manner while retaining the benefits of virtualization. The document
> > below tries to answer most of the questions which we anticipated. Please let us
> > know your comments and queries.
> > 
> > Thank you.
> > 
> > Signed-off-by: Pankaj Thakkar <pthakkar@vmware.com>
> 
> 
> Code please. Also, it has to work for all architectures not just VMware and
> Intel.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
  2010-05-04 23:02 Pankaj Thakkar
@ 2010-05-05  0:05 ` Stephen Hemminger
  2010-05-05  0:18   ` Pankaj Thakkar
  2010-05-05  0:18     ` Pankaj Thakkar
  2010-05-05  0:05 ` Stephen Hemminger
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 33+ messages in thread
From: Stephen Hemminger @ 2010-05-05  0:05 UTC (permalink / raw)
  To: Pankaj Thakkar
  Cc: linux-kernel, netdev, virtualization, pv-drivers, sbhatewara

On Tue, 4 May 2010 16:02:25 -0700
Pankaj Thakkar <pthakkar@vmware.com> wrote:

> Device passthrough technology allows a guest to bypass the hypervisor and drive
> the underlying physical device. VMware has been exploring various ways to
> deliver this technology to users in a manner which is easy to adopt. In this
> process we have prepared an architecture along with Intel - NPA (Network Plugin
> Architecture). NPA allows the guest to use the virtualized NIC vmxnet3 to
> passthrough to a number of physical NICs which support it. The document below
> provides an overview of NPA.
> 
> We intend to upgrade the upstreamed vmxnet3 driver to implement NPA so that
> Linux users can exploit the benefits provided by passthrough devices in a
> seamless manner while retaining the benefits of virtualization. The document
> below tries to answer most of the questions which we anticipated. Please let us
> know your comments and queries.
> 
> Thank you.
> 
> Signed-off-by: Pankaj Thakkar <pthakkar@vmware.com>


Code please. Also, it has to work for all architectures not just VMware and
Intel.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RFC: Network Plugin Architecture (NPA) for vmxnet3
@ 2010-05-04 23:02 Pankaj Thakkar
  2010-05-05  0:05 ` Stephen Hemminger
                   ` (6 more replies)
  0 siblings, 7 replies; 33+ messages in thread
From: Pankaj Thakkar @ 2010-05-04 23:02 UTC (permalink / raw)
  To: linux-kernel, netdev, virtualization; +Cc: pv-drivers, sbhatewara

Device passthrough technology allows a guest to bypass the hypervisor and drive
the underlying physical device. VMware has been exploring various ways to
deliver this technology to users in a manner which is easy to adopt. In this
process we have prepared an architecture along with Intel - NPA (Network Plugin
Architecture). NPA allows the guest to use the virtualized NIC vmxnet3 to
passthrough to a number of physical NICs which support it. The document below
provides an overview of NPA.

We intend to upgrade the upstreamed vmxnet3 driver to implement NPA so that
Linux users can exploit the benefits provided by passthrough devices in a
seamless manner while retaining the benefits of virtualization. The document
below tries to answer most of the questions which we anticipated. Please let us
know your comments and queries.

Thank you.

Signed-off-by: Pankaj Thakkar <pthakkar@vmware.com>


Network Plugin Architecture
---------------------------

VMware has been working on various device passthrough technologies for the past
few years. Passthrough technology is interesting as it can result in better
performance/cpu utilization for certain demanding applications. In our vSphere
product we support direct assignment of PCI devices like networking adapters to
a guest virtual machine. This allows the guest to drive the device using the
device drivers installed inside the guest. This is similar to the way KVM
allows for passthrough of PCI devices to the guests. The hypervisor is bypassed
for all I/O and control operations and hence it can not provide any value add
features such as live migration, suspend/resume, etc.

Network Plugin Architecture (NPA) is an approach which VMware has developed in
joint partnership with Intel which allows us to retain the best of passthrough
technology and virtualization. NPA allows for passthrough of the fast data
(I/O) path and lets the hypervisor deal with the slow control path using
traditional emulation/paravirtualization techniques. Through this splitting of
data and control path the hypervisor can still provide the above mentioned
value add features and exploit the performance benefits of passthrough.

NPA requires SR-IOV hardware which allows for sharing of one single NIC adapter
by multiple guests. SR-IOV hardware has many logically separate functions
called virtual functions (VF) which can be independently assigned to the guest
OS. They also have one or more physical functions (PF) (managed by a PF driver)
which are used by the hypervisor to control certain aspects of the VFs and the
rest of the hardware. NPA splits the guest driver into two components called
the Shell and the Plugin. The shell is responsible for interacting with the
guest networking stack and funneling the control operations to the hypervisor.
The plugin is responsible for driving the data path of the virtual function
exposed to the guest and is specific to the NIC hardware. NPA also requires an
embedded switch in the NIC to allow for switching traffic among the virtual
functions. The PF is also used as an uplink to provide connectivity to other
VMs which are in emulation mode. The figure below shows the major components in
a block diagram.

        +------------------------------+
        |         Guest VM             |
        |                              |
        |      +----------------+      |
        |      | vmxnet3 driver |      |
        |      |     Shell      |      |
        |      | +============+ |      |
        |      | |   Plugin   | |      |
        +------+-+------------+-+------+
                |           .
               +---------+  .
               | vmxnet3 |  .
               |___+-----+  .
                     |      .
                     |      .
                +----------------------------+
                |                            |
                |       virtual switch       |
                +----------------------------+
                  |         .               \
                  |         .                \
           +=============+  .                 \
           | PF control  |  .                  \
           |             |  .                   \
           |  L2 driver  |  .                    \
           +-------------+  .                     \
                  |         .                      \
                  |         .                       \
                +------------------------+     +------------+
                | PF   VF1 VF2 ...   VFn |     |            |
                |                        |     |  regular   |
                |       SR-IOV NIC       |     |    nic     |
                |    +--------------+    |     |   +--------+
                |    |   embedded   |    |     +---+
                |    |    switch    |    |
                |    +--------------+    |
                |        +---------------+
                +--------+

NPA offers several benefits:
1. Performance: Critical, performance-sensitive paths are not trapped and the
guest can drive the hardware directly without incurring virtualization
overhead.

2. Hypervisor control: All control operations from the guest, such as
programming the MAC address, go through the hypervisor layer and hence can be
subjected to hypervisor policies. The PF driver can further be used to enforce
policy decisions such as which VLAN the guest should be on.

3. Guest management: No hardware-specific drivers need to be installed in the
guest virtual machine and hence no guest-management overhead is incurred. All
software for the driver (including the PF driver and the plugin) is installed
in the hypervisor.

4. IHV independence: The architecture provides guidelines for splitting the
functionality between the VFs and the PF but does not dictate how the hardware
should be implemented. It gives IHVs the freedom to update the software or the
hardware asynchronously to work around any defects.

The fundamental tenet of NPA is to let the hypervisor control the passthrough
functionality with minimal guest intervention. This gives a lot of flexibility
to the hypervisor, which can then treat passthrough as an offload feature (just
like TSO, LRO, etc.) that is offered to the guest virtual machine when no
conflicting features are present. For example, if the hypervisor wants to
migrate the virtual machine from one host to another, it can switch the
virtual machine out of passthrough mode into paravirtualized/emulated mode and
use existing techniques to migrate the virtual machine. Once the virtual
machine is migrated to the destination host, the hypervisor can switch it back
to passthrough mode if a supported SR-IOV NIC is present. This may involve
loading a different plugin corresponding to the new SR-IOV hardware.
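
To make the offload analogy concrete, here is a minimal C sketch of the
hypervisor-side flow described above. Every name in it (npa_vm,
switch_to_emulated, live_migrate, dest_sriov_plugin) is invented for
illustration and does not correspond to any real VMware, NPA or kernel
interface; it only shows passthrough being torn down before migration and
conditionally re-established, possibly with a different plugin, on the
destination host.

/*
 * Hypothetical sketch of the migration flow described above. None of
 * these names come from the NPA proposal or any real API; they only
 * illustrate treating passthrough as an offload around live migration.
 */
#include <stddef.h>

enum npa_mode { NPA_MODE_EMULATED, NPA_MODE_PASSTHROUGH };

struct npa_vm {
    enum npa_mode mode;
    const char   *plugin_name;  /* plugin matching the current VF, if any */
};

/* Illustrative helpers, assumed to exist only for this sketch. */
static void switch_to_emulated(struct npa_vm *vm) { vm->mode = NPA_MODE_EMULATED; }
static void live_migrate(struct npa_vm *vm)       { (void)vm; /* existing path */ }
static const char *dest_sriov_plugin(void)        { return NULL; /* or an IHV plugin name */ }

void migrate_with_npa(struct npa_vm *vm)
{
    /* 1. Drop out of passthrough so the VM is in a migratable, emulated state. */
    if (vm->mode == NPA_MODE_PASSTHROUGH)
        switch_to_emulated(vm);

    /* 2. Migrate with the existing, unchanged migration machinery. */
    live_migrate(vm);

    /* 3. Re-enable passthrough only if the destination host has a supported
     *    SR-IOV NIC; this may mean loading a different plugin for it. */
    const char *plugin = dest_sriov_plugin();
    if (plugin) {
        vm->plugin_name = plugin;
        vm->mode = NPA_MODE_PASSTHROUGH;
    }
}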

Internally we explored various other options before settling on the NPA
approach. For example, there are approaches which create a bonding driver on
top of a complete passthrough of a NIC device and an emulated/paravirtualized
device. Though this approach allows live migration to work, it adds a lot of
complexity and dependencies. First, the hypervisor has to rely on a guest with
hot-add support. Second, the hypervisor has to depend on the guest networking
stack to cooperate in performing the migration. Third, the guest has to carry
the driver images for all possible hardware to which it may migrate. Fourth,
the hypervisor does not get full control over all the policy decisions.
Another approach we considered is to have a uniform interface for the data
path between the emulated/paravirtualized device and the hardware device,
which allows the hypervisor to seamlessly switch from the emulated interface
to the hardware interface. Though this approach is very attractive and can
work without any guest involvement, it is not acceptable to the IHVs as it
does not give them the freedom to fix bugs/errata and differentiate from each
other. We believe the NPA approach provides the right level of control and
flexibility to the hypervisor while letting the guest exploit the benefits of
passthrough.

The plugin image is provided by the IHV along with the PF driver and is
packaged in the hypervisor. The plugin image is OS agnostic and can be loaded
either into a Linux VM or a Windows VM. The plugin is written against the
Shell API, which the shell is responsible for implementing. The API allows the
plugin to do TX and RX only by programming the hardware rings (along with
things like buffer allocation and basic initialization). The virtual machine
comes up in paravirtualized/emulated mode when it is booted. The hypervisor
allocates the VF and other resources and notifies the shell of the
availability of the VF. The hypervisor injects the plugin into a memory
location specified by the shell. The shell initializes the plugin by calling
into a known entry point, and the plugin initializes the data path. The
control path is already initialized by the PF driver when the VF is allocated.
At this point the shell switches to using the loaded plugin for all further TX
and RX operations. The guest networking stack does not participate in these
operations and continues to function normally. All control operations continue
to be trapped by the hypervisor and are directed to the PF driver as needed.
For example, if the MAC address changes, the hypervisor updates its internal
state and also changes the state of the embedded switch through the PF control
API.
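
As a reading aid for the handshake just described, the following is a minimal,
hypothetical C sketch of what the plugin entry point and the shell-side
switch-over could look like. The Shell API itself has not been posted, so every
identifier here (npa_shell_api, npa_plugin_ops, npa_plugin_entry_t,
npa_shell_attach_plugin) is an assumption made up for this sketch, not the
real interface.

/*
 * Minimal, hypothetical sketch of the load sequence described above.
 * The structure and function names are invented for illustration and
 * are not the actual Shell API from the NPA proposal.
 */
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* Services the shell exposes to the plugin (buffer allocation, etc.). */
struct npa_shell_api {
    void *(*alloc_rx_buffer)(size_t len, uint64_t *dma_addr);
    void  (*free_rx_buffer)(void *buf);
    void  (*notify_rx_packet)(void *buf, size_t len); /* hand a frame to the shell/stack */
};

/* Data-path operations the plugin implements against the VF rings. */
struct npa_plugin_ops {
    int  (*init_data_path)(void *vf_bar, const struct npa_shell_api *shell);
    int  (*tx)(void *pkt, size_t len);
    void (*rx_poll)(int budget);
    void (*shutdown)(void);
};

/* The "known entry point" the shell calls in the injected plugin image. */
typedef int (*npa_plugin_entry_t)(struct npa_plugin_ops *ops);

struct npa_shell {
    struct npa_shell_api  api;
    struct npa_plugin_ops ops;
    bool                  use_plugin;   /* false -> emulated/paravirtualized TX/RX */
};

/*
 * Called once the hypervisor has allocated the VF, injected the plugin
 * image at a shell-specified location, and notified the shell.
 */
int npa_shell_attach_plugin(struct npa_shell *sh,
                            npa_plugin_entry_t entry, void *vf_bar)
{
    if (entry(&sh->ops))                           /* plugin fills in its ops */
        return -1;
    if (sh->ops.init_data_path(vf_bar, &sh->api))  /* plugin programs the VF rings */
        return -1;

    sh->use_plugin = true;  /* all further TX/RX goes through the plugin;
                               control operations are still trapped by the
                               hypervisor and routed to the PF driver */
    return 0;
}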

We have reworked our existing Linux vmxnet3 driver to accommodate NPA by
splitting the driver into two parts: the Shell and the Plugin. The new split
driver is backwards compatible and continues to work on old/existing vmxnet3
device emulations. The shell implements the Shell API and contains the code
for TX/RX buffer bookkeeping along with interrupt management. The shell code
also handles loading the plugin and verifying the license of the loaded
plugin. The plugin contains the code specific to vmxnet3 ring and descriptor
management. The plugin uses the same Shell API which would be used by other
IHVs. This vmxnet3 plugin is compiled statically along with the shell, as this
is needed to provide connectivity when no underlying SR-IOV device is present.
The IHV plugins are required to be distributed under the GPL license, and we
are currently looking at ways to verify this both within the hypervisor and
within the shell.
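
A small sketch of how the statically linked vmxnet3 plugin and a dynamically
loaded IHV plugin could sit behind one ops table follows; as before, the
structure layout, the vmxnet3_plugin_ops name and the selection logic are
illustrative assumptions made for this email, not the actual split-driver code.

/*
 * Hypothetical sketch only: the statically linked vmxnet3 plugin is the
 * default data path when no SR-IOV VF has been granted; an IHV plugin
 * loaded by the shell replaces it behind the same (invented) interface.
 */
#include <stddef.h>

/* Abbreviated version of the ops table sketched earlier. */
struct npa_plugin_ops {
    int  (*tx)(void *pkt, size_t len);
    void (*rx_poll)(int budget);
};

/* Built-in plugin driving the emulated vmxnet3 rings (bodies elided). */
static int  vmxnet3_tx(void *pkt, size_t len) { (void)pkt; (void)len; return 0; }
static void vmxnet3_rx_poll(int budget)       { (void)budget; }

static const struct npa_plugin_ops vmxnet3_plugin_ops = {
    .tx      = vmxnet3_tx,
    .rx_poll = vmxnet3_rx_poll,
};

struct npa_shell_state {
    const struct npa_plugin_ops *active;  /* what TX/RX currently uses        */
    const struct npa_plugin_ops *loaded;  /* IHV plugin, or NULL if no VF yet */
};

/* Use the loaded IHV plugin when a VF is available, else fall back to the
 * built-in vmxnet3 plugin so connectivity never depends on SR-IOV hardware. */
void npa_select_plugin(struct npa_shell_state *st)
{
    st->active = st->loaded ? st->loaded : &vmxnet3_plugin_ops;
}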

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2010-05-10 20:46 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-05-04 23:02 RFC: Network Plugin Architecture (NPA) for vmxnet3 Pankaj Thakkar
  -- strict thread matches above, loose matches on Subject: below --
2010-05-04 23:02 Pankaj Thakkar
2010-05-05  0:05 ` Stephen Hemminger
2010-05-05  0:18   ` Pankaj Thakkar
2010-05-05  0:18   ` Pankaj Thakkar
2010-05-05  0:18     ` Pankaj Thakkar
2010-05-05  0:32     ` David Miller
2010-05-05  0:32       ` David Miller
2010-05-05  0:38       ` Pankaj Thakkar
2010-05-05  0:38       ` Pankaj Thakkar
2010-05-05  0:38         ` Pankaj Thakkar
2010-05-05  2:44     ` Stephen Hemminger
2010-05-05  2:44     ` Stephen Hemminger
2010-05-05  2:44       ` Stephen Hemminger
2010-05-05  0:05 ` Stephen Hemminger
2010-05-05  0:58 ` Chris Wright
2010-05-05  0:58   ` Chris Wright
2010-05-05 19:00   ` Pankaj Thakkar
2010-05-05 19:00   ` Pankaj Thakkar
2010-05-05 19:00     ` Pankaj Thakkar
2010-05-05 17:23 ` Christoph Hellwig
2010-05-05 17:23 ` Christoph Hellwig
2010-05-05 17:59 ` Avi Kivity
2010-05-05 19:44   ` Pankaj Thakkar
2010-05-05 19:44     ` Pankaj Thakkar
2010-05-06  8:58     ` Avi Kivity
2010-05-06  8:58     ` Avi Kivity
2010-05-06  8:58       ` Avi Kivity
2010-05-10 20:46       ` Pankaj Thakkar
2010-05-10 20:46       ` Pankaj Thakkar
2010-05-10 20:46         ` Pankaj Thakkar
2010-05-05 19:44   ` Pankaj Thakkar
2010-05-05 17:59 ` Avi Kivity
