linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 6/6 v3] PCI: document the change
@ 2008-09-27  8:28 Zhao, Yu
  2008-10-01 16:07 ` Matthew Wilcox
  0 siblings, 1 reply; 12+ messages in thread
From: Zhao, Yu @ 2008-09-27  8:28 UTC (permalink / raw)
  To: linux-pci
  Cc: Jesse Barnes, Randy Dunlap, Grant Grundler, Alex Chiang,
	Matthew Wilcox, Roland Dreier, Greg KH, linux-kernel, kvm,
	virtualization

Create how-to for SR-IOV user and device driver developer.

Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Alex Chiang <achiang@hp.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>

---
 Documentation/DocBook/kernel-api.tmpl |    2 +
 Documentation/PCI/pci-iov-howto.txt   |  227 +++++++++++++++++++++++++++++++++
 2 files changed, 229 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/PCI/pci-iov-howto.txt

diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl
index b7b1482..c6ceb39 100644
--- a/Documentation/DocBook/kernel-api.tmpl
+++ b/Documentation/DocBook/kernel-api.tmpl
@@ -239,6 +239,7 @@ X!Ekernel/module.c
      </sect1>

      <sect1><title>PCI Support Library</title>
+!Iinclude/linux/pci.h
 !Edrivers/pci/pci.c
 !Edrivers/pci/pci-driver.c
 !Edrivers/pci/remove.c
@@ -251,6 +252,7 @@ X!Edrivers/pci/hotplug.c
 -->
 !Edrivers/pci/probe.c
 !Edrivers/pci/rom.c
+!Edrivers/pci/iov.c
      </sect1>
      <sect1><title>PCI Hotplug Support Library</title>
 !Edrivers/pci/hotplug/pci_hotplug_core.c
diff --git a/Documentation/PCI/pci-iov-howto.txt b/Documentation/PCI/pci-iov-howto.txt
new file mode 100644
index 0000000..ff1969e
--- /dev/null
+++ b/Documentation/PCI/pci-iov-howto.txt
@@ -0,0 +1,227 @@
+               PCI Express Single Root I/O Virtualization HOWTO
+                       Copyright (C) 2008 Intel Corporation
+
+
+1. Overview
+
+1.1 What is SR-IOV
+
+Single Root I/O Virtualization (SR-IOV) is a PCI Express Extended
+capability which makes one physical device appear as multiple virtual
+devices. The physical device is referred to as Physical Function while
+the virtual devices are referred to as Virtual Functions. Allocation
+of Virtual Functions can be dynamically controlled by Physical Function
+via registers encapsulated in the capability. By default, this feature
+is not enabled and the Physical Function behaves as traditional PCIe
+device. Once it's turned on, each Virtual Function's PCI configuration
+space can be accessed by its own Bus, Device and Function Number (Routing
+ID). And each Virtual Function also has PCI Memory Space, which is used
+to map its register set. Virtual Function device driver operates on the
+register set so it can be functional and appear as a real existing PCI
+device.
+
+1.2 What is ARI
+
+Alternative Routing-ID Interpretation (ARI) allows a PCI Express Endpoint
+to use its device number field as part of function number. Traditionally,
+an Endpoint can only have 8 functions, and the device number of all
+Endpoints is zero. With ARI enabled, an Endpoint can have up to 256
+functions by using device number in conjunction with function number to
+indicate a function in the device. This is almost transparent to the Linux
+kernel because the Linux kernel still can use 8-bit bus number field plus
+8-bit devfn number field to locate a function. ARI is managed via the ARI
+Forwarding bit in the Device Capabilities 2 register of the PCI Express
+Capability on the Root Port or the Downstream Port and a new ARI Capability
+on the Endpoint.
+
+
+2. User Guide
+
+2.1 How can I manage SR-IOV
+
+If a device supports SR-IOV, then there should be some entries under
+Physical Function's PCI device directory. These entries are in directory:
+       - /sys/bus/pci/devices/BB:DD.F/iov/
+               (BB:DD:F is bus:dev:fun)
+and
+       - /sys/bus/pci/devices/BB:DD.F/iov/N
+               (N is VF number from 0 to initialvfs-1)
+
+To enable or disable SR-IOV:
+       - /sys/bus/pci/devices/BB:DD.F/iov/enable
+               (writing 1/0 means enable/disable VFs, state change will
+                notify PF driver)
+
+To change number of Virtual Functions:
+       - /sys/bus/pci/devices/BB:DD.F/iov/numvfs
+               (writing positive integer to this file will change NumVFs)
+
+The total and initial number of VFs can get from:
+       - /sys/bus/pci/devices/BB:DD.F/iov/totalvfs
+       - /sys/bus/pci/devices/BB:DD.F/iov/initialvfs
+
+The identifier of a VF that belongs to this PF can get from:
+       - /sys/bus/pci/devices/BB:DD.F/iov/N/rid
+               (for all class of devices)
+
+For network device, there are:
+       - /sys/bus/pci/devices/BB:DD.F/iov/N/mac
+       - /sys/bus/pci/devices/BB:DD.F/iov/N/vlan
+               (value update will notify PF driver)
+
+2.2 How can I use Virtual Functions
+
+Virtual Functions are treated as hot-plugged PCI devices in the kernel,
+so they should be able to work in the same way as real PCI devices.
+NOTE: Virtual Function device driver must be loaded to make it work.
+
+
+3. Developer Guide
+
+3.1 SR-IOV APIs
+
+To register SR-IOV service, Physical Function device driver needs to call:
+       int pci_iov_register(struct pci_dev *dev,
+               int (*notify)(struct pci_dev *, u32), char **entries)
+
+Note: entries could be NULL if PF driver doesn't want to create new entries
+under /sys/bus/pci/devices/BB:DD.F/iov/N/.
+
+To unregister SR-IOV service, Physical Function device driver needs to call:
+       void pci_iov_unregister(struct pci_dev *dev)
+
+To enable SR-IOV, Physical Function device driver needs to call:
+       int pci_iov_enable(struct pci_dev *dev, int numvfs)
+
+To disable SR-IOV, Physical Function device driver needs to call:
+       void pci_iov_disable(struct pci_dev *dev)
+
+Note: above two functions sleeps 1 second waiting on hardware transaction
+completion according to SR-IOV specification.
+
+To read or write VFs configuration:
+       - int pci_iov_read_config(struct pci_dev *dev, int id,
+                       char *entry, char *buf, int size);
+       - int pci_iov_write_config(struct pci_dev *dev, int id,
+                       char *entry, char *buf);
+3.2 Usage example
+
+Following piece of code illustrates the usage of APIs above.
+
+static char *entries[] = { "foo", "bar", NULL };
+
+static int callback(struct pci_dev *dev, u32 event)
+{
+       int err;
+       int vfn;
+       int numvfs;
+
+       if (event & PCI_IOV_ENABLE) {
+               /*
+                * request to enable SR-IOV, NumVFs is available.
+                * Note: if the PF want to support PM, it has to
+                * check the device power state here to see if
+                * the request is allowed or not.
+                */
+
+               numvfs = event & PCI_IOV_NUM_VIRTFN;
+
+       } else if (event & PCI_IOV_DISABLE) {
+               /*
+                * request to disable SR-IOV.
+                */
+               ...
+
+       } else if (event & PCI_IOV_RD_CONF) {
+               /*
+                * request to read VF configuration, Virtual
+                * Function Number is available.
+                */
+
+               vfn = event & PCI_IOV_VIRTFN_ID;
+
+               /* pass the config to SR-IOV code so user can read it */
+               err = pci_iov_write_config(dev, vfn, entry, buf);
+
+       } else if (event & PCI_IOV_WR_CONF) {
+               /*
+                * request to write VF configuration, Virtual
+                * Function Number is available.
+                */
+
+               vfn = event & PCI_IOV_VIRTFN_ID;
+
+               /* read the config that has been written by user */
+               err = pci_iov_read_config(dev, vfn, entry, buf, size);
+
+       } else
+               return -EINVAL;
+
+       return err;
+}
+
+static int __devinit dev_probe(struct pci_dev *dev,
+                               const struct pci_device_id *id)
+{
+       int err;
+
+       err = pci_iov_register(dev, callback, entries);
+       ...
+
+       err = pci_iov_enable(dev, nr_virtfn, callback);
+
+       ...
+
+       return err;
+}
+
+static void __devexit dev_remove(struct pci_dev *dev)
+{
+       ...
+
+       pci_iov_disable(dev);
+
+       ...
+
+       pci_iov_unregister(dev);
+
+       ...
+}
+
+#ifdef CONFIG_PM
+/*
+ * If Physical Function supports the power management, then the
+ * SR-IOV needs to be disabled before the adapter goes to sleep,
+ * because Virtual Functions will not work when the adapter is in
+ * the power-saving mode.
+ * The SR-IOV can be enabled again after the adapter wakes up.
+ */
+static int dev_suspend(struct pci_dev *dev, pm_message_t state)
+{
+       ...
+
+       pci_iov_disable(dev);
+
+       ...
+}
+
+static int dev_resume(struct pci_dev *dev)
+{
+       ...
+
+       pci_iov_enable(dev, numvfs);
+
+       ...
+}
+#endif
+
+static struct pci_driver dev_driver = {
+       .name =         "SR-IOV Physical Function driver",
+       .id_table =     dev_id_table,
+       .probe =        dev_probe,
+       .remove =       __devexit_p(dev_remove),
+#ifdef CONFIG_PM
+       .suspend =      dev_suspend,
+       .resume =       dev_resume,
+#endif
+};
--
1.5.6.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/6 v3] PCI: document the change
  2008-09-27  8:28 [PATCH 6/6 v3] PCI: document the change Zhao, Yu
@ 2008-10-01 16:07 ` Matthew Wilcox
  2008-10-14  0:23   ` Dong, Eddie
  2008-11-15 12:38   ` Yu Zhao
  0 siblings, 2 replies; 12+ messages in thread
From: Matthew Wilcox @ 2008-10-01 16:07 UTC (permalink / raw)
  To: Zhao, Yu
  Cc: linux-pci, Jesse Barnes, Randy Dunlap, Grant Grundler,
	Alex Chiang, Roland Dreier, Greg KH, linux-kernel, kvm,
	virtualization

On Sat, Sep 27, 2008 at 04:28:45PM +0800, Zhao, Yu wrote:
> +++ b/Documentation/DocBook/kernel-api.tmpl
> @@ -239,6 +239,7 @@ X!Ekernel/module.c
>       </sect1>
> 
>       <sect1><title>PCI Support Library</title>
> +!Iinclude/linux/pci.h

Why do you need to do this?  Thus far, all the documentation has been
with the implementation, not in the header file.

> +1.2 What is ARI
> +
> +Alternative Routing-ID Interpretation (ARI) allows a PCI Express Endpoint
> +to use its device number field as part of function number. Traditionally,
> +an Endpoint can only have 8 functions, and the device number of all
> +Endpoints is zero. With ARI enabled, an Endpoint can have up to 256
> +functions by using device number in conjunction with function number to
> +indicate a function in the device. This is almost transparent to the Linux
> +kernel because the Linux kernel still can use 8-bit bus number field plus
> +8-bit devfn number field to locate a function. ARI is managed via the ARI
> +Forwarding bit in the Device Capabilities 2 register of the PCI Express
> +Capability on the Root Port or the Downstream Port and a new ARI Capability
> +on the Endpoint.

I don't think this section actually helps a software developer use
SR-IOV, does it?

> +2. User Guide
> +
> +2.1 How can I manage SR-IOV
> +
> +If a device supports SR-IOV, then there should be some entries under
> +Physical Function's PCI device directory. These entries are in directory:
> +       - /sys/bus/pci/devices/BB:DD.F/iov/
> +               (BB:DD:F is bus:dev:fun)

The 'domain:' prefix has been there for a long time now.

> +and
> +       - /sys/bus/pci/devices/BB:DD.F/iov/N
> +               (N is VF number from 0 to initialvfs-1)
> +
> +To enable or disable SR-IOV:
> +       - /sys/bus/pci/devices/BB:DD.F/iov/enable
> +               (writing 1/0 means enable/disable VFs, state change will
> +                notify PF driver)
> +
> +To change number of Virtual Functions:
> +       - /sys/bus/pci/devices/BB:DD.F/iov/numvfs
> +               (writing positive integer to this file will change NumVFs)
> +
> +The total and initial number of VFs can get from:
> +       - /sys/bus/pci/devices/BB:DD.F/iov/totalvfs
> +       - /sys/bus/pci/devices/BB:DD.F/iov/initialvfs
> +
> +The identifier of a VF that belongs to this PF can get from:
> +       - /sys/bus/pci/devices/BB:DD.F/iov/N/rid
> +               (for all class of devices)

Wouldn't it be more useful to have the iov/N directories be a symlink to
the actual pci_dev used by the virtual function?

> +For network device, there are:
> +       - /sys/bus/pci/devices/BB:DD.F/iov/N/mac
> +       - /sys/bus/pci/devices/BB:DD.F/iov/N/vlan
> +               (value update will notify PF driver)

We already have tools to set the MAC and VLAN parameters for network
devices.

> +To register SR-IOV service, Physical Function device driver needs to call:
> +       int pci_iov_register(struct pci_dev *dev,
> +               int (*notify)(struct pci_dev *, u32), char **entries)

I think a better interface would put the 'notify' into the struct
pci_driver.  That would make 'notify' a bad name .... how about
'virtual'?  There's also no documentation for the second parameter to
'notify'.

> +Note: entries could be NULL if PF driver doesn't want to create new entries
> +under /sys/bus/pci/devices/BB:DD.F/iov/N/.

So 'entries' is a list of names to create sysfs entries for?

> +To enable SR-IOV, Physical Function device driver needs to call:
> +       int pci_iov_enable(struct pci_dev *dev, int numvfs)
> +
> +To disable SR-IOV, Physical Function device driver needs to call:
> +       void pci_iov_disable(struct pci_dev *dev)

I'm not 100% convinced about this API.  The assumption here is that the
driver will do it, but I think it should probably be in the core.  The
driver probably wants to be notified that the PCI core is going to
create a virtual function, and would it please prepare to do so, but I'm
not convinced this should be triggered by the driver.  How would the
driver decide to create a new virtual function?

> +To read or write VFs configuration:
> +       - int pci_iov_read_config(struct pci_dev *dev, int id,
> +                       char *entry, char *buf, int size);
> +       - int pci_iov_write_config(struct pci_dev *dev, int id,
> +                       char *entry, char *buf);

I think we'd be better off having the driver create its own sysfs
entries if it really needs to.

> +3.2 Usage example
> +
> +Following piece of code illustrates the usage of APIs above.
> +
[...]
> +static struct pci_driver dev_driver = {
> +       .name =         "SR-IOV Physical Function driver",
> +       .id_table =     dev_id_table,
> +       .probe =        dev_probe,
> +       .remove =       __devexit_p(dev_remove),
> +#ifdef CONFIG_PM
> +       .suspend =      dev_suspend,
> +       .resume =       dev_resume,
> +#endif
> +};

>From my reading of the SR-IOV spec, this isn't how it's supposed to
work.  The device is supposed to be a fully functional PCI device that
on demand can start peeling off virtual functions; it's not supposed to
boot up and initialise all its virtual functions at once.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH 6/6 v3] PCI: document the change
  2008-10-01 16:07 ` Matthew Wilcox
@ 2008-10-14  0:23   ` Dong, Eddie
  2008-10-14  1:08     ` Matthew Wilcox
  2008-11-15 12:38   ` Yu Zhao
  1 sibling, 1 reply; 12+ messages in thread
From: Dong, Eddie @ 2008-10-14  0:23 UTC (permalink / raw)
  To: Matthew Wilcox, Zhao, Yu
  Cc: linux-pci, Jesse Barnes, Randy Dunlap, Grant Grundler,
	Alex Chiang, Roland Dreier, Greg KH, linux-kernel, kvm,
	virtualization, Dong, Eddie

Matthew Wilcox wrote:
 
> Wouldn't it be more useful to have the iov/N directories
> be a symlink to the actual pci_dev used by the virtual
> function? 

The main concern here is that a VF may be disabed such as when PF enter
D3 state or undergo an reset and thus be plug-off, but user won't
re-configure the VF in case the PF return back to working state.

> 
>> +For network device, there are:
>> +       - /sys/bus/pci/devices/BB:DD.F/iov/N/mac
>> +       - /sys/bus/pci/devices/BB:DD.F/iov/N/vlan
>> +               (value update will notify PF driver)
> 
> We already have tools to set the MAC and VLAN parameters
> for network devices.

Do you mean Ethtool? If yes, it is impossible for SR-IOV since the
configuration has to be done in PF side, rather than VF.

> 
> I'm not 100% convinced about this API.  The assumption
> here is that the driver will do it, but I think it should
> probably be in the core.  The driver probably wants to be

Our concern is that the PF driver may put an default state when it is
loaded so that SR-IOV can work without any user level configuration, but
of course the driver won't dynamically change it.
Do u mean we remove this ability?

> notified that the PCI core is going to create a virtual
> function, and would it please prepare to do so, but I'm
> not convinced this should be triggered by the driver. 
> How would the driver decide to create a new virtual
> function?  
> 
> 
> From my reading of the SR-IOV spec, this isn't how it's
> supposed to work.  The device is supposed to be a fully
> functional PCI device that on demand can start peeling
> off virtual functions; it's not supposed to boot up and
> initialise all its virtual functions at once. 

The spec defines either we enable all VFs or Disable. Per VF enabling is
not supported.
Is this what you concern?

Thanks, eddie

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/6 v3] PCI: document the change
  2008-10-14  0:23   ` Dong, Eddie
@ 2008-10-14  1:08     ` Matthew Wilcox
  2008-10-14  2:31       ` Dong, Eddie
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2008-10-14  1:08 UTC (permalink / raw)
  To: Dong, Eddie
  Cc: Zhao, Yu, linux-pci, Jesse Barnes, Randy Dunlap, Grant Grundler,
	Alex Chiang, Roland Dreier, Greg KH, linux-kernel, kvm,
	virtualization

On Tue, Oct 14, 2008 at 08:23:34AM +0800, Dong, Eddie wrote:
> Matthew Wilcox wrote:
> > Wouldn't it be more useful to have the iov/N directories
> > be a symlink to the actual pci_dev used by the virtual
> > function? 
> 
> The main concern here is that a VF may be disabed such as when PF enter
> D3 state or undergo an reset and thus be plug-off, but user won't
> re-configure the VF in case the PF return back to working state.

If we're relying on the user to reconfigure virtual functions on return
to D0 from D3, that's a very fragile system.

> >> +For network device, there are:
> >> +       - /sys/bus/pci/devices/BB:DD.F/iov/N/mac
> >> +       - /sys/bus/pci/devices/BB:DD.F/iov/N/vlan
> >> +               (value update will notify PF driver)
> > 
> > We already have tools to set the MAC and VLAN parameters
> > for network devices.
> 
> Do you mean Ethtool? If yes, it is impossible for SR-IOV since the
> configuration has to be done in PF side, rather than VF.

I don't think ethtool has that ability; ip(8) can set mac addresses and
vconfig(8) sets vlan parameters.

The device driver already has to be aware of SR-IOV.  If it's going to
support the standard tools (and it damn well ought to), then it should
call the PF driver to set these kinds of parameters.

> > I'm not 100% convinced about this API.  The assumption
> > here is that the driver will do it, but I think it should
> > probably be in the core.  The driver probably wants to be
> 
> Our concern is that the PF driver may put an default state when it is
> loaded so that SR-IOV can work without any user level configuration, but
> of course the driver won't dynamically change it.
> Do u mean we remove this ability?
> 
> > notified that the PCI core is going to create a virtual
> > function, and would it please prepare to do so, but I'm
> > not convinced this should be triggered by the driver. 
> > How would the driver decide to create a new virtual
> > function?  

Let me try to explain this a bit better.

The user decides they want a new ethernet virtual function.  In the
scheme as you have set up:

1. User communicates to ethernet driver "I want a new VF"
2. Ethernet driver tells PCI core "create new VF".

I propose:

1. User tells PCI core "I want a new VF on PCI device 0000:01:03.0"
2. PCI core tells driver "User wants a new VF"

My scheme gives us a unified way of creating new VFs, yours requires each
driver to invent a way for the user to tell them to create a new VF.
Unless I've misunderstood your code and docs.

> > From my reading of the SR-IOV spec, this isn't how it's
> > supposed to work.  The device is supposed to be a fully
> > functional PCI device that on demand can start peeling
> > off virtual functions; it's not supposed to boot up and
> > initialise all its virtual functions at once. 
> 
> The spec defines either we enable all VFs or Disable. Per VF enabling is
> not supported.
> Is this what you concern?

I don't think that's true.  The spec requires you to enable all the
VFs from 0 to NumVFs, but NumVFs can be lower than TotalVFs.  At least,
that's how I read it.

But no, that isn't my concern.  My concern is that you've written a
driver here that seems to be a stub driver.  That doesn't seem to be
how SR-IOV is supposed to work; it's supposed to be a fully-functional
driver that has SR-IOV knowledge added to it.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/6 v3] PCI: document the change
  2008-10-14  2:31       ` Dong, Eddie
@ 2008-10-14  2:14         ` Yu Zhao
  2008-10-14  4:01           ` Matthew Wilcox
  0 siblings, 1 reply; 12+ messages in thread
From: Yu Zhao @ 2008-10-14  2:14 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: eddie.dong, linux-pci, Jesse Barnes, Randy Dunlap,
	Grant Grundler, Alex Chiang, Roland Dreier, Greg KH,
	linux-kernel, kvm, virtualization

On Tue, Oct 14, 2008 at 10:31:03AM +0800, Dong, Eddie wrote:
> Matthew Wilcox wrote:
> > On Tue, Oct 14, 2008 at 08:23:34AM +0800, Dong, Eddie
> > wrote:
> >> Matthew Wilcox wrote:
> >>> Wouldn't it be more useful to have the iov/N directories
> >>> be a symlink to the actual pci_dev used by the virtual
> >>> function?
> >>
> >> The main concern here is that a VF may be disabed such
> >> as when PF enter D3 state or undergo an reset and thus
> >> be plug-off, but user won't re-configure the VF in case
> >> the PF return back to working state.
> >
> > If we're relying on the user to reconfigure virtual
> > functions on return to D0 from D3, that's a very fragile
> > system.
> 
> No. that is the concern we don't put those configuration under VF nodes because it will disappear.
> Do I miss something?
> 
> >
> >>>> +For network device, there are:
> >>>> +       - /sys/bus/pci/devices/BB:DD.F/iov/N/mac
> >>>> +       - /sys/bus/pci/devices/BB:DD.F/iov/N/vlan
> >>>> +               (value update will notify PF driver)
> >>>
> >>> We already have tools to set the MAC and VLAN parameters
> >>> for network devices.
> >>
> >> Do you mean Ethtool? If yes, it is impossible for SR-IOV
> >> since the configuration has to be done in PF side,
> >> rather than VF.
> >
> > I don't think ethtool has that ability; ip(8) can set mac
> > addresses and vconfig(8) sets vlan parameters.
> >
> > The device driver already has to be aware of SR-IOV.  If
> > it's going to support the standard tools (and it damn
> > well ought to), then it should call the PF driver to set
> > these kinds of parameters.
> 
> OK, as if it has the VF parameter, will look into details.

Neither ip(8) nor vconfig(8) can set MAC and VLAN address for VF when
the VF driver is not loaded.

> BTW, the SR-IOV patch is not only for network, some other devices such as IDE will use same code base as well and we image it could have other parameter to set such as starting LBA of a IDE VF.

As Eddie said, we have two problems here:
1) User has to set device specific parameters of a VF when he wants to
use this VF with KVM (assign this device to KVM guest). In this case,
VF driver is not loaded in the host environment. So operations which
are implemented as driver callback (e.g. set_mac_address()) are not
supported.
2) For security reason, some SR-IOV devices prohibit the VF driver
configuring the VF via its own register space. Instead, the configurations
must be done through the PF which the VF is associated with. This means PF
driver has to receive parameters that are used to configure its VFs. These
parameters obviously can be passed by traditional tools, if without
modification for SR-IOV.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH 6/6 v3] PCI: document the change
  2008-10-14  1:08     ` Matthew Wilcox
@ 2008-10-14  2:31       ` Dong, Eddie
  2008-10-14  2:14         ` Yu Zhao
  0 siblings, 1 reply; 12+ messages in thread
From: Dong, Eddie @ 2008-10-14  2:31 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Zhao, Yu, linux-pci, Jesse Barnes, Randy Dunlap, Grant Grundler,
	Alex Chiang, Roland Dreier, Greg KH, linux-kernel, kvm,
	virtualization, Dong, Eddie

Matthew Wilcox wrote:
> On Tue, Oct 14, 2008 at 08:23:34AM +0800, Dong, Eddie
> wrote: 
>> Matthew Wilcox wrote:
>>> Wouldn't it be more useful to have the iov/N directories
>>> be a symlink to the actual pci_dev used by the virtual
>>> function?
>> 
>> The main concern here is that a VF may be disabed such
>> as when PF enter D3 state or undergo an reset and thus
>> be plug-off, but user won't re-configure the VF in case
>> the PF return back to working state. 
> 
> If we're relying on the user to reconfigure virtual
> functions on return to D0 from D3, that's a very fragile
> system. 

No. that is the concern we don't put those configuration under VF nodes
because it will disappear.
Do I miss something?

> 
>>>> +For network device, there are:
>>>> +       - /sys/bus/pci/devices/BB:DD.F/iov/N/mac
>>>> +       - /sys/bus/pci/devices/BB:DD.F/iov/N/vlan
>>>> +               (value update will notify PF driver)
>>> 
>>> We already have tools to set the MAC and VLAN parameters
>>> for network devices.
>> 
>> Do you mean Ethtool? If yes, it is impossible for SR-IOV
>> since the configuration has to be done in PF side,
>> rather than VF. 
> 
> I don't think ethtool has that ability; ip(8) can set mac
> addresses and vconfig(8) sets vlan parameters.
> 
> The device driver already has to be aware of SR-IOV.  If
> it's going to support the standard tools (and it damn
> well ought to), then it should call the PF driver to set
> these kinds of parameters. 

OK, as if it has the VF parameter, will look into details.
BTW, the SR-IOV patch is not only for network, some other devices such
as IDE will use same code base as well and we image it could have other
parameter to set such as starting LBA of a IDE VF.



> 
>>> I'm not 100% convinced about this API.  The assumption
>>> here is that the driver will do it, but I think it
>>> should probably be in the core.  The driver probably
>>> wants to be 
>> 
>> Our concern is that the PF driver may put an default
>> state when it is loaded so that SR-IOV can work without
>> any user level configuration, but of course the driver
>> won't dynamically change it. 
>> Do u mean we remove this ability?
>> 
>>> notified that the PCI core is going to create a virtual
>>> function, and would it please prepare to do so, but I'm
>>> not convinced this should be triggered by the driver.
>>> How would the driver decide to create a new virtual
>>> function?
> 
> Let me try to explain this a bit better.
> 
> The user decides they want a new ethernet virtual
> function.  In the scheme as you have set up:
> 
> 1. User communicates to ethernet driver "I want a new VF"
> 2. Ethernet driver tells PCI core "create new VF".
> 
> I propose:
> 
> 1. User tells PCI core "I want a new VF on PCI device
> 0000:01:03.0" 
> 2. PCI core tells driver "User wants a new VF"

If user need a new VF, the VF must be already enabled or existed in OS.
Otherwise, we need to disable all VFs first and then change NumVFs to
re-enable VFs.
Spec says: "NumVFs may only be written while VF Enable is Clear"

> 
> My scheme gives us a unified way of creating new VFs,
> yours requires each driver to invent a way for the user
> to tell them to create a new VF. Unless I've
> misunderstood your code and docs. 

Assign a VF is kind of user & core job. 

> 
>>> From my reading of the SR-IOV spec, this isn't how it's
>>> supposed to work.  The device is supposed to be a fully
>>> functional PCI device that on demand can start peeling
>>> off virtual functions; it's not supposed to boot up and
>>> initialise all its virtual functions at once.
>> 
>> The spec defines either we enable all VFs or Disable.
>> Per VF enabling is not supported. Is this what you
>> concern? 
> 
> I don't think that's true.  The spec requires you to
> enable all the 
> VFs from 0 to NumVFs, but NumVFs can be lower than
> TotalVFs.  At least, that's how I read it.

Yes, but setting NumVFs can only occur when VFs are disabled.
Following are from spec.

NumVFs may only be written while VF Enable is Clear. If NumVFs is
written when VF Enable is
Set, the results are undefined.
The initial value of NumVFs is undefined.

> 
> But no, that isn't my concern.  My concern is that you've
> written a driver here that seems to be a stub driver. 
> That doesn't seem to be 
> how SR-IOV is supposed to work; it's supposed to be a
> fully-functional driver that has SR-IOV knowledge added
> to it. 

Yes, it is a full feature driver as if PF has resource in, for example
not all queues are assigned to VFs.

Thx, eddie

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/6 v3] PCI: document the change
  2008-10-14  2:14         ` Yu Zhao
@ 2008-10-14  4:01           ` Matthew Wilcox
  2008-10-14  4:06             ` Yu Zhao
  2008-10-14  4:18             ` Dong, Eddie
  0 siblings, 2 replies; 12+ messages in thread
From: Matthew Wilcox @ 2008-10-14  4:01 UTC (permalink / raw)
  To: Yu Zhao
  Cc: eddie.dong, linux-pci, Jesse Barnes, Randy Dunlap,
	Grant Grundler, Alex Chiang, Roland Dreier, Greg KH,
	linux-kernel, kvm, virtualization

On Tue, Oct 14, 2008 at 10:14:35AM +0800, Yu Zhao wrote:
> > BTW, the SR-IOV patch is not only for network, some other devices such as IDE will use same code base as well and we image it could have other parameter to set such as starting LBA of a IDE VF.
> 
> As Eddie said, we have two problems here:
> 1) User has to set device specific parameters of a VF when he wants to
> use this VF with KVM (assign this device to KVM guest). In this case,
> VF driver is not loaded in the host environment. So operations which
> are implemented as driver callback (e.g. set_mac_address()) are not
> supported.

I suspect what you want to do is create, then configure the device in
the host, then assign it to the guest.

> 2) For security reason, some SR-IOV devices prohibit the VF driver
> configuring the VF via its own register space. Instead, the configurations
> must be done through the PF which the VF is associated with. This means PF
> driver has to receive parameters that are used to configure its VFs. These
> parameters obviously can be passed by traditional tools, if without
> modification for SR-IOV.

I think that idea also covers this point.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/6 v3] PCI: document the change
  2008-10-14  4:01           ` Matthew Wilcox
@ 2008-10-14  4:06             ` Yu Zhao
  2008-10-14  4:18             ` Dong, Eddie
  1 sibling, 0 replies; 12+ messages in thread
From: Yu Zhao @ 2008-10-14  4:06 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Dong, Eddie, linux-pci, Jesse Barnes, Randy Dunlap,
	Grant Grundler, Alex Chiang, Roland Dreier, Greg KH,
	linux-kernel, kvm, virtualization

On Tue, Oct 14, 2008 at 12:01:05PM +0800, Matthew Wilcox wrote:
> On Tue, Oct 14, 2008 at 10:14:35AM +0800, Yu Zhao wrote:
> > > BTW, the SR-IOV patch is not only for network, some other devices such as IDE will use same code base as well and we image it could have other parameter to set such as starting LBA of a IDE VF.
> >
> > As Eddie said, we have two problems here:
> > 1) User has to set device specific parameters of a VF when he wants to
> > use this VF with KVM (assign this device to KVM guest). In this case,
> > VF driver is not loaded in the host environment. So operations which
> > are implemented as driver callback (e.g. set_mac_address()) are not
> > supported.
> 
> I suspect what you want to do is create, then configure the device in
> the host, then assign it to the guest.
> 
> > 2) For security reason, some SR-IOV devices prohibit the VF driver
> > configuring the VF via its own register space. Instead, the configurations
> > must be done through the PF which the VF is associated with. This means PF
> > driver has to receive parameters that are used to configure its VFs. These
> > parameters obviously can be passed by traditional tools, if without
                         ^^^
Sorry, here I meant to say 'can not'.

> > modification for SR-IOV.
> 
> I think that idea also covers this point.

Can you please elaborate this?

Thanks a lot.

> 
> --
> Matthew Wilcox                          Intel Open Source Technology Centre
> "Bill, look, we understand that you're interested in selling us this
> operating system, but compare it to ours.  We can't possibly take such
> a retrograde step."
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH 6/6 v3] PCI: document the change
  2008-10-14  4:01           ` Matthew Wilcox
  2008-10-14  4:06             ` Yu Zhao
@ 2008-10-14  4:18             ` Dong, Eddie
  2008-10-14  4:46               ` Matthew Wilcox
  1 sibling, 1 reply; 12+ messages in thread
From: Dong, Eddie @ 2008-10-14  4:18 UTC (permalink / raw)
  To: Matthew Wilcox, Zhao, Yu
  Cc: linux-pci, Jesse Barnes, Randy Dunlap, Grant Grundler,
	Alex Chiang, Roland Dreier, Greg KH, linux-kernel, kvm,
	virtualization, Dong, Eddie

Matthew Wilcox wrote:
> On Tue, Oct 14, 2008 at 10:14:35AM +0800, Yu Zhao wrote:
>>> BTW, the SR-IOV patch is not only for network, some
>>> other devices such as IDE will use same code base as
>>> well and we image it could have other parameter to set
>>> such as starting LBA of a IDE VF.   
>> 
>> As Eddie said, we have two problems here:
>> 1) User has to set device specific parameters of a VF
>> when he wants to use this VF with KVM (assign this
>> device to KVM guest). In this case, 
>> VF driver is not loaded in the host environment. So
>> operations which 
>> are implemented as driver callback (e.g.
>> set_mac_address()) are not supported.
> 
> I suspect what you want to do is create, then configure
> the device in the host, then assign it to the guest.

That is not true. Rememver the created VFs will be destroyed no matter
for PF power event or error recovery conducted reset.
So what we want is:

Config, create, assign, and then deassign and destroy and then
recreate...

> 
>> 2) For security reason, some SR-IOV devices prohibit the
>> VF driver configuring the VF via its own register space.
>> Instead, the configurations must be done through the PF
>> which the VF is associated with. This means PF driver
>> has to receive parameters that are used to configure its
>> VFs. These parameters obviously can be passed by
>> traditional tools, if without modification for SR-IOV. 
> 
> I think that idea also covers this point.
> 
Sorry can u explain a little bit more? The SR-IOV patch won't define
what kind of entries should be created or not, we leave network
subsystem to decide what to do. Same for disk subsstem etc.

Thx, eddie

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/6 v3] PCI: document the change
  2008-10-14  4:18             ` Dong, Eddie
@ 2008-10-14  4:46               ` Matthew Wilcox
  2008-10-17  5:48                 ` Anirban Chakraborty
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2008-10-14  4:46 UTC (permalink / raw)
  To: Dong, Eddie
  Cc: Zhao, Yu, linux-pci, Jesse Barnes, Randy Dunlap, Grant Grundler,
	Alex Chiang, Roland Dreier, Greg KH, linux-kernel, kvm,
	virtualization

On Tue, Oct 14, 2008 at 12:18:40PM +0800, Dong, Eddie wrote:
> Matthew Wilcox wrote:
> > On Tue, Oct 14, 2008 at 10:14:35AM +0800, Yu Zhao wrote:
> >> As Eddie said, we have two problems here:
> >> 1) User has to set device specific parameters of a VF
> >> when he wants to use this VF with KVM (assign this
> >> device to KVM guest). In this case, 
> >> VF driver is not loaded in the host environment. So
> >> operations which 
> >> are implemented as driver callback (e.g.
> >> set_mac_address()) are not supported.
> > 
> > I suspect what you want to do is create, then configure
> > the device in the host, then assign it to the guest.
> 
> That is not true. Rememver the created VFs will be destroyed no matter
> for PF power event or error recovery conducted reset.
> So what we want is:
> 
> Config, create, assign, and then deassign and destroy and then
> recreate...

Yes, but my point is this all happens in the _host_, not in the _guest_.

> Sorry can u explain a little bit more? The SR-IOV patch won't define
> what kind of entries should be created or not, we leave network
> subsystem to decide what to do. Same for disk subsstem etc.

No entries should be created.  This needs to be not SR-IOV specific.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/6 v3] PCI: document the change
  2008-10-14  4:46               ` Matthew Wilcox
@ 2008-10-17  5:48                 ` Anirban Chakraborty
  0 siblings, 0 replies; 12+ messages in thread
From: Anirban Chakraborty @ 2008-10-17  5:48 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Dong, Eddie, Zhao, Yu, linux-pci, Jesse Barnes, Randy Dunlap,
	Grant Grundler, Alex Chiang, Roland Dreier, Greg KH,
	linux-kernel, kvm, virtualization


On Oct 13, 2008, at 9:46 PM, Matthew Wilcox wrote:

> On Tue, Oct 14, 2008 at 12:18:40PM +0800, Dong, Eddie wrote:
>> Matthew Wilcox wrote:
>>> On Tue, Oct 14, 2008 at 10:14:35AM +0800, Yu Zhao wrote:
>>>> As Eddie said, we have two problems here:
>>>> 1) User has to set device specific parameters of a VF
>>>> when he wants to use this VF with KVM (assign this
>>>> device to KVM guest). In this case,
>>>> VF driver is not loaded in the host environment. So
>>>> operations which
>>>> are implemented as driver callback (e.g.
>>>> set_mac_address()) are not supported.
>>>
>>> I suspect what you want to do is create, then configure
>>> the device in the host, then assign it to the guest.
>>
>> That is not true. Rememver the created VFs will be destroyed no  
>> matter
>> for PF power event or error recovery conducted reset.
>> So what we want is:
>>
>> Config, create, assign, and then deassign and destroy and then
>> recreate...
>
> Yes, but my point is this all happens in the _host_, not in the  
> _guest_.
>
>> Sorry can u explain a little bit more? The SR-IOV patch won't define
>> what kind of entries should be created or not, we leave network
>> subsystem to decide what to do. Same for disk subsstem etc.
>
> No entries should be created.  This needs to be not SR-IOV specific.

I think we need to cover both the scenarios here, virtualization and  
non virtualization. In the absence of virtualization, the VF and PF  
driver should be identical. In this context, how does the PF driver  
allocates a VF? Is dynamic allocation of VFs possible, or does it have  
to allocate all the VFs that the device supports when the PF driver  
loads? Also, will the probe function be called for the VFs, or does  
the PF driver handle only the probe for the physical function? In  
virtualization context things get bit more complex as the the VF  
driver in guest would like to treat the VF as a physical function but  
that may not be possible from the device perspective as the control  
registers may well be shared between VF and PF.
I would think that the VF allocation is the job of SR PCIM. PCIM may  
well ask the PF driver to configure a VF upon user request.

Thanks much,
Anirban Chakraborty

> -- 
> Matthew Wilcox				Intel Open Source Technology Centre
> "Bill, look, we understand that you're interested in selling us this
> operating system, but compare it to ours.  We can't possibly take such
> a retrograde step."
> --
> To unsubscribe from this list: send the line "unsubscribe linux- 
> kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/6 v3] PCI: document the change
  2008-10-01 16:07 ` Matthew Wilcox
  2008-10-14  0:23   ` Dong, Eddie
@ 2008-11-15 12:38   ` Yu Zhao
  1 sibling, 0 replies; 12+ messages in thread
From: Yu Zhao @ 2008-11-15 12:38 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Zhao, Yu, linux-pci, Jesse Barnes, Randy Dunlap, Grant Grundler,
	Alex Chiang, Roland Dreier, Greg KH, linux-kernel, kvm,
	virtualization

Matthew Wilcox wrote:
> On Sat, Sep 27, 2008 at 04:28:45PM +0800, Zhao, Yu wrote:
>> +To register SR-IOV service, Physical Function device driver needs to call:
>> +       int pci_iov_register(struct pci_dev *dev,
>> +               int (*notify)(struct pci_dev *, u32), char **entries)
> 
> I think a better interface would put the 'notify' into the struct
> pci_driver.  That would make 'notify' a bad name .... how about
> 'virtual'?  There's also no documentation for the second parameter to
> 'notify'.

Yes, putting the callback function to the 'pci_driver' is better. Looks 
like the 'virtual' is not very descriptive (and it's a adj. while other 
callbacks are verb). Any other candidates?

Thanks,
Yu



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2008-11-15 12:46 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-09-27  8:28 [PATCH 6/6 v3] PCI: document the change Zhao, Yu
2008-10-01 16:07 ` Matthew Wilcox
2008-10-14  0:23   ` Dong, Eddie
2008-10-14  1:08     ` Matthew Wilcox
2008-10-14  2:31       ` Dong, Eddie
2008-10-14  2:14         ` Yu Zhao
2008-10-14  4:01           ` Matthew Wilcox
2008-10-14  4:06             ` Yu Zhao
2008-10-14  4:18             ` Dong, Eddie
2008-10-14  4:46               ` Matthew Wilcox
2008-10-17  5:48                 ` Anirban Chakraborty
2008-11-15 12:38   ` Yu Zhao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).