* New commands to configure IOV features
@ 2012-05-07 11:17 Yuval Mintz
  2012-05-07 15:16 ` Greg Rose
  0 siblings, 1 reply; 52+ messages in thread
From: Yuval Mintz @ 2012-05-07 11:17 UTC (permalink / raw)
  To: gregory.v.rose; +Cc: Ben Hutchings, netdev

I've tried to figure out whether there is a standard interface
(ethtool/iproute) through which a user can configure the number
of VFs in their system.

I've seen the RFCs suggested in http://markmail.org/thread/qblfcv7zbxsxp7q6
and http://markmail.org/thread/fw54dcppmxuxoe6n, but failed to find any
later references to them (commits or further discussion of this topic).

Where exactly do things stand with these RFCs? Were they abandoned?

Thanks,
Yuval

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-05-07 11:17 New commands to configure IOV features Yuval Mintz
@ 2012-05-07 15:16 ` Greg Rose
  2012-06-26 12:21   ` Yuval Mintz
  0 siblings, 1 reply; 52+ messages in thread
From: Greg Rose @ 2012-05-07 15:16 UTC (permalink / raw)
  To: Yuval Mintz; +Cc: Ben Hutchings, netdev

On Mon, 7 May 2012 14:17:54 +0300
Yuval Mintz <yuvalmin@broadcom.com> wrote:

> I've tried to figure out if there was a standard interface
> (ethtool/iproute) through which a user could configure the number
> of vfs in his system.
> 
> I've seen the RFC suggested in
> http://markmail.org/thread/qblfcv7zbxsxp7q6, and
> http://markmail.org/thread/fw54dcppmxuxoe6n, but failed to see any
> later references to it (commits or further discussion on this topic).
> 
> How exactly are things standing with these RFCs? Were they abandoned?

The only way to configure the number of VFs continues to be through the
max_vfs module parameter.  I've got a patch to do it through ethtool
sitting on the back burner but due to other requirements of my day job
I've not been able to work on it since last fall.
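For reference, the module-parameter approach is driver-specific; with the Intel drivers under discussion it looks roughly like this (the driver name and VF count here are illustrative assumptions, not taken from the thread):

```shell
# Load the PF driver with a fixed number of VFs. max_vfs is a
# driver-specific parameter (ixgbe is used only as an example), and
# changing it requires unloading and reloading the module.
modprobe ixgbe max_vfs=8

# The spawned VFs then appear as ordinary PCI functions:
lspci | grep -i 'Virtual Function'
```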

- Greg

> 
> Thanks,
> Yuval
> 


* Re: New commands to configure IOV features
  2012-05-07 15:16 ` Greg Rose
@ 2012-06-26 12:21   ` Yuval Mintz
  2012-06-26 16:13     ` Alexander Duyck
  2012-06-26 17:19     ` Greg Rose
  0 siblings, 2 replies; 52+ messages in thread
From: Yuval Mintz @ 2012-06-26 12:21 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Greg Rose, netdev

On 05/07/2012 06:16 PM, Greg Rose wrote:

> On Mon, 7 May 2012 14:17:54 +0300
> Yuval Mintz <yuvalmin@broadcom.com> wrote:
> 
>> I've tried to figure out if there was a standard interface
>> (ethtool/iproute) through which a user could configure the number
>> of vfs in his system.
>>
>> I've seen the RFC suggested in
>> http://markmail.org/thread/qblfcv7zbxsxp7q6, and
>> http://markmail.org/thread/fw54dcppmxuxoe6n, but failed to see any
>> later references to it (commits or further discussion on this topic).
>>
>> How exactly are things standing with these RFCs? Were they abandoned?
> 
> The only way to configure the number of VFs continues to be through the
> max_vfs module parameter.  I've got a patch to do it through ethtool
> sitting on the back burner but due to other requirements of my day job
> I've not been able to work on it since last fall.
> 
> - Greg


Hi Ben,

If I want to pick up the RFCs and add support for configuring the number of
VFs - do you think ethtool is the right place for such added support?

I'm asking since, as far as I can see, ethtool (today) doesn't contain any
features related to virtual functions.

Thanks,
Yuval


* Re: New commands to configure IOV features
  2012-06-26 12:21   ` Yuval Mintz
@ 2012-06-26 16:13     ` Alexander Duyck
  2012-06-26 17:19     ` Greg Rose
  1 sibling, 0 replies; 52+ messages in thread
From: Alexander Duyck @ 2012-06-26 16:13 UTC (permalink / raw)
  To: Yuval Mintz; +Cc: Ben Hutchings, Greg Rose, netdev

On 06/26/2012 05:21 AM, Yuval Mintz wrote:
> On 05/07/2012 06:16 PM, Greg Rose wrote:
>
>> On Mon, 7 May 2012 14:17:54 +0300
>> Yuval Mintz <yuvalmin@broadcom.com> wrote:
>>
>>> I've tried to figure out if there was a standard interface
>>> (ethtool/iproute) through which a user could configure the number
>>> of vfs in his system.
>>>
>>> I've seen the RFC suggested in
>>> http://markmail.org/thread/qblfcv7zbxsxp7q6, and
>>> http://markmail.org/thread/fw54dcppmxuxoe6n, but failed to see any
>>> later references to it (commits or further discussion on this topic).
>>>
>>> How exactly are things standing with these RFCs? Were they abandoned?
>> The only way to configure the number of VFs continues to be through the
>> max_vfs module parameter.  I've got a patch to do it through ethtool
>> sitting on the back burner but due to other requirements of my day job
>> I've not been able to work on it since last fall.
>>
>> - Greg
>
> Hi Ben,
>
> If I want to pick the RFCs and add support for configuring the number of
> VFs - do you think ethtool's the right place for such added support?
>
> I'm asking since as far as I can see, ethtool (today) doesn't contain any
> features related to virtual functions.
>
> Thanks,
> Yuval

I think the issue is that any class of PCI device could theoretically
support SR-IOV.  For example, there could be a storage controller out
there that supports spawning VFs, and an ethtool solution wouldn't work
for a device like that.  Personally, what I would like to see is a
solution focused on the PCI side of the network adapters
rather than the network side when it comes to enabling VFs.
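As context for that PCI-side framing: SR-IOV is advertised as a standard PCIe extended capability on the PF, visible with lspci (the device address below is purely illustrative):

```shell
# Show the SR-IOV extended capability of a physical function
# (the bus address is an example; pick a real PF on your system):
lspci -s 01:00.0 -vvv | grep -A 4 'SR-IOV'
```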

Thanks,

Alex


* Re: New commands to configure IOV features
  2012-06-26 12:21   ` Yuval Mintz
  2012-06-26 16:13     ` Alexander Duyck
@ 2012-06-26 17:19     ` Greg Rose
  2012-07-01 11:09       ` Yuval Mintz
  2012-07-09 18:39       ` Ben Hutchings
  1 sibling, 2 replies; 52+ messages in thread
From: Greg Rose @ 2012-06-26 17:19 UTC (permalink / raw)
  To: Yuval Mintz; +Cc: Ben Hutchings, netdev

On Tue, 26 Jun 2012 15:21:55 +0300
Yuval Mintz <yuvalmin@broadcom.com> wrote:

> On 05/07/2012 06:16 PM, Greg Rose wrote:
> 
> > On Mon, 7 May 2012 14:17:54 +0300
> > Yuval Mintz <yuvalmin@broadcom.com> wrote:
> > 
> >> I've tried to figure out if there was a standard interface
> >> (ethtool/iproute) through which a user could configure the number
> >> of vfs in his system.
> >>
> >> I've seen the RFC suggested in
> >> http://markmail.org/thread/qblfcv7zbxsxp7q6, and
> >> http://markmail.org/thread/fw54dcppmxuxoe6n, but failed to see any
> >> later references to it (commits or further discussion on this
> >> topic).
> >>
> >> How exactly are things standing with these RFCs? Were they
> >> abandoned?
> > 
> > The only way to configure the number of VFs continues to be through
> > the max_vfs module parameter.  I've got a patch to do it through
> > ethtool sitting on the back burner but due to other requirements of
> > my day job I've not been able to work on it since last fall.
> > 
> > - Greg
> 
> 
> Hi Ben,
> 
> If I want to pick the RFCs and add support for configuring the number
> of VFs - do you think ethtool's the right place for such added
> support?
> 
> I'm asking since as far as I can see, ethtool (today) doesn't contain
> any features related to virtual functions.

I think a PCI utility tool would be better; SR-IOV is not limited to
network devices.  That's one of the reasons I dropped the RFC.  I
haven't gotten back to the idea since then due to my day job keeping me
pretty busy.

- Greg


* Re: New commands to configure IOV features
  2012-06-26 17:19     ` Greg Rose
@ 2012-07-01 11:09       ` Yuval Mintz
  2012-07-09 18:39       ` Ben Hutchings
  1 sibling, 0 replies; 52+ messages in thread
From: Yuval Mintz @ 2012-07-01 11:09 UTC (permalink / raw)
  To: linux-pci; +Cc: Greg Rose, Ben Hutchings, netdev


>>>> I've tried to figure out if there was a standard interface
>>>> (ethtool/iproute) through which a user could configure the number
>>>> of vfs in his system.
>>>>
>>>> I've seen the RFC suggested in
>>>> http://markmail.org/thread/qblfcv7zbxsxp7q6, and
>>>> http://markmail.org/thread/fw54dcppmxuxoe6n, but failed to see any
>>>> later references to it (commits or further discussion on this
>>>> topic).
>>>>
>>>> How exactly are things standing with these RFCs? Were they
>>>> abandoned?
>>>
>>> The only way to configure the number of VFs continues to be through
>>> the max_vfs module parameter.  I've got a patch to do it through
>>> ethtool sitting on the back burner but due to other requirements of
>>> my day job I've not been able to work on it since last fall.
>>>
>>> - Greg
>>
>>
>> Hi Ben,
>>
>> If I want to pick the RFCs and add support for configuring the number
>> of VFs - do you think ethtool's the right place for such added
>> support?
>>
>> I'm asking since as far as I can see, ethtool (today) doesn't contain
>> any features related to virtual functions.
> 
> I think a PCI utility tool would be better, SR-IOV is not limited to
> network devices.  That's one of the reasons I dropped the RFC.  I
> haven't gotten back to the idea since then due to my day job keeping me
> pretty busy.
> 
> - Greg


I'm adding the linux-pci guys, since they (obviously) are the experts in
these topics.

Is there any PCI utility (or at least an existing kernel API) which is
suitable for adding this sort of functionality?

Thanks,
Yuval


* Re: New commands to configure IOV features
  2012-06-26 17:19     ` Greg Rose
  2012-07-01 11:09       ` Yuval Mintz
@ 2012-07-09 18:39       ` Ben Hutchings
  2012-07-09 21:13         ` Chris Friesen
  1 sibling, 1 reply; 52+ messages in thread
From: Ben Hutchings @ 2012-07-09 18:39 UTC (permalink / raw)
  To: Greg Rose; +Cc: Yuval Mintz, netdev

On Tue, 2012-06-26 at 10:19 -0700, Greg Rose wrote:
> On Tue, 26 Jun 2012 15:21:55 +0300
> Yuval Mintz <yuvalmin@broadcom.com> wrote:
> 
> > On 05/07/2012 06:16 PM, Greg Rose wrote:
> > 
> > > On Mon, 7 May 2012 14:17:54 +0300
> > > Yuval Mintz <yuvalmin@broadcom.com> wrote:
> > > 
> > >> I've tried to figure out if there was a standard interface
> > >> (ethtool/iproute) through which a user could configure the number
> > >> of vfs in his system.
> > >>
> > >> I've seen the RFC suggested in
> > >> http://markmail.org/thread/qblfcv7zbxsxp7q6, and
> > >> http://markmail.org/thread/fw54dcppmxuxoe6n, but failed to see any
> > >> later references to it (commits or further discussion on this
> > >> topic).
> > >>
> > >> How exactly are things standing with these RFCs? Were they
> > >> abandoned?
> > > 
> > > The only way to configure the number of VFs continues to be through
> > > the max_vfs module parameter.  I've got a patch to do it through
> > > ethtool sitting on the back burner but due to other requirements of
> > > my day job I've not been able to work on it since last fall.
> > > 
> > > - Greg
> > 
> > 
> > Hi Ben,
> > 
> > If I want to pick the RFCs and add support for configuring the number
> > of VFs - do you think ethtool's the right place for such added
> > support?
> > 
> > I'm asking since as far as I can see, ethtool (today) doesn't contain
> > any features related to virtual functions.
> 
> I think a PCI utility tool would be better, SR-IOV is not limited to
> network devices.  That's one of the reasons I dropped the RFC.  I
> haven't gotten back to the idea since then due to my day job keeping me
> pretty busy.

For what it's worth, I agree with this.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: New commands to configure IOV features
  2012-07-09 18:39       ` Ben Hutchings
@ 2012-07-09 21:13         ` Chris Friesen
  2012-07-16  9:19           ` Yuval Mintz
  0 siblings, 1 reply; 52+ messages in thread
From: Chris Friesen @ 2012-07-09 21:13 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Greg Rose, Yuval Mintz, netdev

On 07/09/2012 12:39 PM, Ben Hutchings wrote:
> On Tue, 2012-06-26 at 10:19 -0700, Greg Rose wrote:
>> On Tue, 26 Jun 2012 15:21:55 +0300
>> Yuval Mintz<yuvalmin@broadcom.com>  wrote:
>>
>>> On 05/07/2012 06:16 PM, Greg Rose wrote:
>>>> The only way to configure the number of VFs continues to be through
>>>> the max_vfs module parameter.  I've got a patch to do it through
>>>> ethtool sitting on the back burner but due to other requirements of
>>>> my day job I've not been able to work on it since last fall.
>>>>
>>>> - Greg
>>>
>>>
>>> Hi Ben,
>>>
>>> If I want to pick the RFCs and add support for configuring the number
>>> of VFs - do you think ethtool's the right place for such added
>>> support?
>>>
>>> I'm asking since as far as I can see, ethtool (today) doesn't contain
>>> any features related to virtual functions.
>>
>> I think a PCI utility tool would be better, SR-IOV is not limited to
>> network devices.  That's one of the reasons I dropped the RFC.  I
>> haven't gotten back to the idea since then due to my day job keeping me
>> pretty busy.
>
> For what it's worth, I agree with this.

I'd love to have the ability to set the number of VFs on the fly.  From
my perspective it would be ideal if this could be exported via /sys or
something rather than a standalone tool using a binary API.  It's fewer
packages for userspace to deal with, as well as being easily scripted.
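A sketch of how such a sysfs interface could be scripted from userspace; note that no such attribute existed at the time of this thread, so the sriov_numvfs name and path here are purely illustrative:

```python
def numvfs_path(pci_addr: str) -> str:
    """Build the path of a hypothetical per-device VF-count attribute."""
    return f"/sys/bus/pci/devices/{pci_addr}/sriov_numvfs"

def set_numvfs(pci_addr: str, count: int) -> None:
    """Enable `count` VFs by writing the attribute (needs root and an
    SR-IOV capable device; writing 0 would disable the VFs again)."""
    with open(numvfs_path(pci_addr), "w") as f:
        f.write(str(count))
```

Being a plain file write, the same operation is equally available to shell scripts, which is the scriptability argument being made here.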

Chris


* Re: New commands to configure IOV features
  2012-07-09 21:13         ` Chris Friesen
@ 2012-07-16  9:19           ` Yuval Mintz
  2012-07-17 19:29             ` Don Dutile
  0 siblings, 1 reply; 52+ messages in thread
From: Yuval Mintz @ 2012-07-16  9:19 UTC (permalink / raw)
  To: davem; +Cc: Chris Friesen, Ben Hutchings, Greg Rose, netdev, linux-pci


>>>> If I want to pick the RFCs and add support for configuring the number
>>>> of VFs - do you think ethtool's the right place for such added
>>>> support?
>>>>
>>> I think a PCI utility tool would be better, SR-IOV is not limited to
>>> network devices.  That's one of the reasons I dropped the RFC.  I
>>> haven't gotten back to the idea since then due to my day job keeping me
>>> pretty busy.
>>
>> For what it's worth, I agree with this.
> 
> From my perspective it would be ideal if this could be exported via /sys or something
> 


Well, obviously, unless there has been a sudden change in our stance
regarding sysfs, we will not head that way.

This thread got no replies from the PCI community, and I'm unfamiliar
with such a tool.

Dave, what's your stance on the matter - do you want us to continue pursuing
some PCI tool (which may or may not exist), or instead work on
a networking solution to this issue?

Do you happen to know such a tool?

Thanks,
Yuval


* Re: New commands to configure IOV features
  2012-07-16  9:19           ` Yuval Mintz
@ 2012-07-17 19:29             ` Don Dutile
  2012-07-17 21:08               ` Chris Friesen
  0 siblings, 1 reply; 52+ messages in thread
From: Don Dutile @ 2012-07-17 19:29 UTC (permalink / raw)
  To: Yuval Mintz
  Cc: davem, Chris Friesen, Ben Hutchings, Greg Rose, netdev, linux-pci

On 07/16/2012 05:19 AM, Yuval Mintz wrote:
>
>>>>> If I want to pick the RFCs and add support for configuring the number
>>>>> of VFs - do you think ethtool's the right place for such added
>>>>> support?
>>>>>
>>>> I think a PCI utility tool would be better, SR-IOV is not limited to
>>>> network devices.  That's one of the reasons I dropped the RFC.  I
>>>> haven't gotten back to the idea since then due to my day job keeping me
>>>> pretty busy.
>>>
>>> For what it's worth, I agree with this.
>>
>>  From my perspective it would be ideal if this could be exported via /sys or something
>>
>
>
> Well, obviously unless there was a sudden change in our stance regarding
> sysfs we will not head that way.
>
> This thread got no replies from the pci community, and I'm unfamiliar
> with such a tool.
>
> Dave, What's your stance in the matter - do you wish us to continue pursuing
> some pci tool (which might or might not exist), or instead work on
> a networking solution to this issue?
>
> Do you happen to know such a tool?
>
> Thanks,
> Yuval
>
Yuval (et. al.),

Not seeing the original thread on netdev,
I just had a recent discussion w/Greg Rose about providing
sysfs-based, VF enable/disable methods.
I was told that historically, VF enablement started as a sysfs-based
function and was then debated and pushed toward a device/driver-specific
method, as it is implemented today.   Now, with some experience with SR-IOV
and its use in the virtualization space, the discussion has been renewed as
to whether a sysfs-based enable/disable method should be resurrected, so
that it provides a more generic method for virtualization tools/APIs to
manage SR-IOV/VF devices.

I was hoping to discuss this topic with a number of folks at
LinuxCon/Plumbers/KS when the PCI mini-summit is held, to gain
further insight, or be brought up to speed on past history,
and review current uses/status of VFs.

WRT SR-IOV NIC devices, the thinking goes that protocol-level
parameters associated with VFs should use protocol-specific interfaces,
e.g., ethtool, ip link set, etc. for Ethernet VFs.
Thus, the various protocol control functions/tools should
be used to control VF parameters, as one would for a physical device
of that protocol/class.

- Don


* Re: New commands to configure IOV features
  2012-07-17 19:29             ` Don Dutile
@ 2012-07-17 21:08               ` Chris Friesen
  2012-07-17 21:11                 ` David Miller
  0 siblings, 1 reply; 52+ messages in thread
From: Chris Friesen @ 2012-07-17 21:08 UTC (permalink / raw)
  To: Don Dutile
  Cc: Yuval Mintz, davem, Ben Hutchings, Greg Rose, netdev, linux-pci

On 07/17/2012 01:29 PM, Don Dutile wrote:

> WRT SRIOV-nic devices, the thinking goes that protocol-level
> parameters associated with VFs should use protocol-specific interfaces,
> e.g., ethtool, ip link set, etc. for Ethernet VFs.
> Thus, the various protocol control functions/tools should
> be used to control VF parameters, as one would for a physical device
> of that protocol/class.

It seems to me that the mere act of creating one or more VFs is 
something generic, applicable to all devices that are capable of it. 
The details of configuring those VFs can then be handled by 
protocol-specific interfaces.

I'm not too worried about the exact mechanism, as long as it's
ultimately scriptable - that is, if it's a C API then it would be
appreciated if there were a standard tool that implements it.
From that perspective a sysfs-based interface is ideal, since it is
directly scriptable.

Chris


* Re: New commands to configure IOV features
  2012-07-17 21:08               ` Chris Friesen
@ 2012-07-17 21:11                 ` David Miller
  2012-07-20 15:27                   ` Chris Friesen
  0 siblings, 1 reply; 52+ messages in thread
From: David Miller @ 2012-07-17 21:11 UTC (permalink / raw)
  To: chris.friesen
  Cc: ddutile, yuvalmin, bhutchings, gregory.v.rose, netdev, linux-pci

From: Chris Friesen <chris.friesen@genband.com>
Date: Tue, 17 Jul 2012 15:08:45 -0600

> From that perspective a sysfs-based interface is ideal since it is
> directly scriptable.

As is anything ethtool or netlink based, since we have 'ethtool'
and 'ip' for scripting.


* Re: New commands to configure IOV features
  2012-07-17 21:11                 ` David Miller
@ 2012-07-20 15:27                   ` Chris Friesen
  2012-07-20 15:56                     ` Don Dutile
  0 siblings, 1 reply; 52+ messages in thread
From: Chris Friesen @ 2012-07-20 15:27 UTC (permalink / raw)
  To: David Miller
  Cc: ddutile, yuvalmin, bhutchings, gregory.v.rose, netdev, linux-pci

On 07/17/2012 03:11 PM, David Miller wrote:
> From: Chris Friesen<chris.friesen@genband.com>
> Date: Tue, 17 Jul 2012 15:08:45 -0600
>
>>  From that perspective a sysfs-based interface is ideal since it is
>> directly scriptable.
>
> As is anything ethtool or netlink based, since we have 'ethtool'
> and 'ip' for scripting.

I'm not picky...whatever works.

To me the act of creating virtual functions seems generic enough (I'm
aware of SR-IOV capable storage controllers, and I'm sure there is other
hardware as well) that ethtool/ip don't really seem like the most
appropriate tools for the job.

I would have thought it would make more sense as generic PCI
functionality, in which case I'm not aware of an existing binary tool
that would be a logical choice to extend.

Chris


* Re: New commands to configure IOV features
  2012-07-20 15:27                   ` Chris Friesen
@ 2012-07-20 15:56                     ` Don Dutile
  2012-07-20 17:42                       ` Ben Hutchings
  0 siblings, 1 reply; 52+ messages in thread
From: Don Dutile @ 2012-07-20 15:56 UTC (permalink / raw)
  To: Chris Friesen
  Cc: David Miller, yuvalmin, bhutchings, gregory.v.rose, netdev, linux-pci

On 07/20/2012 11:27 AM, Chris Friesen wrote:
> On 07/17/2012 03:11 PM, David Miller wrote:
>> From: Chris Friesen<chris.friesen@genband.com>
>> Date: Tue, 17 Jul 2012 15:08:45 -0600
>>
>>> From that perspective a sysfs-based interface is ideal since it is
>>> directly scriptable.
>>
>> As is anything ethtool or netlink based, since we have 'ethtool'
>> and 'ip' for scripting.
>
> I'm not picky...whatever works.
>
> To me the act of creating virtual functions seems generic enough (I'm aware of SR-IOV capable storage controllers, I'm sure there is other hardware as well) that ethtool/ip don't really seem like the most appropriate tools for the job.
>
Yes, and then there are 'other network' controllers too ... IB, which I
don't know whether it adheres to ethtool, since it's not an Ethernet
device ... isn't that why they call it InfiniBand ... ;-) )
In the telecom space, they use NTBs and PCI as a 'network'.  I know that's
not common in the Linux space, and VFs in that space aren't being discussed
(that I've ever heard), but it's another example where 'network' != Ethernet,
so ethtool doesn't solve PCI-level configuration/use.

So, VFs are a PCI-defined entity, and their enablement/disablement should
be handled by PCI.

Conversely, when dealing with the networking attributes of a PCI VF
Ethernet NIC, networking tools should be used, not PCI tools.

> I would have thought it would make more sense as a generic PCI functionality, in which case I'm not aware of an existing binary tool that would be a logical choice to extend.
>
> Chris


* Re: New commands to configure IOV features
  2012-07-20 15:56                     ` Don Dutile
@ 2012-07-20 17:42                       ` Ben Hutchings
  2012-07-20 19:29                         ` Chris Friesen
  0 siblings, 1 reply; 52+ messages in thread
From: Ben Hutchings @ 2012-07-20 17:42 UTC (permalink / raw)
  To: Don Dutile
  Cc: Chris Friesen, David Miller, yuvalmin, gregory.v.rose, netdev, linux-pci

On Fri, 2012-07-20 at 11:56 -0400, Don Dutile wrote:
> On 07/20/2012 11:27 AM, Chris Friesen wrote:
> > On 07/17/2012 03:11 PM, David Miller wrote:
> >> From: Chris Friesen<chris.friesen@genband.com>
> >> Date: Tue, 17 Jul 2012 15:08:45 -0600
> >>
> >>> From that perspective a sysfs-based interface is ideal since it is
> >>> directly scriptable.
> >>
> >> As is anything ethtool or netlink based, since we have 'ethtool'
> >> and 'ip' for scripting.
> >
> > I'm not picky...whatever works.
> >
> > To me the act of creating virtual functions seems generic enough (I'm aware of SR-IOV capable storage controllers, I'm sure there is other hardware as well) that ethtool/ip don't really seem like the most appropriate tools for the job.
> >
> Yes, and then there are 'other network' controllers too ... IB  which
> I don't know if it adheres to ethtool, since it's not an Ethernet
> device ... isn't that why they call it Infiniband ... ;-) )
> In the telecom space, they use NTBs and PCI as a 'network' ... I know,
> not common in Linux space, and VFs in that space aren't being
> discussed (that I've ever heard), but another example where
> 'network' != Ethernet, so ethtool doesn't solve PCI-level
> configuration/use.
[...]

The ethtool API is typically used for net device operations that can be
largely devolved to individual drivers, and which the network stack can
mostly ignore (though offload features are an historical exception to
this).  It started with Ethernet link settings, but many operations are
applicable (and implemented by) other types of network device.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: New commands to configure IOV features
  2012-07-20 17:42                       ` Ben Hutchings
@ 2012-07-20 19:29                         ` Chris Friesen
  2012-07-20 20:01                           ` Ben Hutchings
  0 siblings, 1 reply; 52+ messages in thread
From: Chris Friesen @ 2012-07-20 19:29 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Don Dutile, David Miller, yuvalmin, gregory.v.rose, netdev, linux-pci

On 07/20/2012 11:42 AM, Ben Hutchings wrote:
>
> The ethtool API is typically used for net device operations that can be
> largely devolved to individual drivers, and which the network stack can
> mostly ignore (though offload features are an historical exception to
> this).  It started with Ethernet link settings, but many operations are
> applicable (and implemented by) other types of network device.

That (potentially) accounts for all network devices, but it leaves out
all the other devices that could export virtual functions.

Why should I need to use a different API to enable virtual functions on 
my network device and my storage controller?  (And why should "ethtool" 
or "ip" care that it's a virtual function?)

What Don and I are suggesting is that the concept of virtual functions
is a PCI thing, so it should be dealt with at the PCI layer.  Regardless
of the type of device, the export of virtual functions is conceptually
the same thing, so it should use the same API.

Once the device exists, then domain-specific APIs would be used to 
configure it the same way that they would configure a physical device.
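The two-layer split being proposed might look like this in practice (the sysfs attribute is hypothetical, since no such interface existed yet; the ip invocation uses standard iproute2 syntax, and the device names are illustrative):

```shell
# 1. Generic PCI layer: create two VFs on the PF at 0000:01:00.0
#    (attribute name is an assumption for illustration).
echo 2 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs

# 2. Domain-specific layer: configure the resulting network VF via the
#    PF's net device, exactly as one would configure a physical NIC.
ip link set dev eth0 vf 0 mac 52:54:00:12:34:56
```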

Chris


* Re: New commands to configure IOV features
  2012-07-20 19:29                         ` Chris Friesen
@ 2012-07-20 20:01                           ` Ben Hutchings
  2012-07-20 20:15                             ` Don Dutile
  2012-07-20 23:42                             ` Chris Friesen
  0 siblings, 2 replies; 52+ messages in thread
From: Ben Hutchings @ 2012-07-20 20:01 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Don Dutile, David Miller, yuvalmin, gregory.v.rose, netdev, linux-pci

On Fri, 2012-07-20 at 13:29 -0600, Chris Friesen wrote:
> On 07/20/2012 11:42 AM, Ben Hutchings wrote:
> >
> > The ethtool API is typically used for net device operations that can be
> > largely devolved to individual drivers, and which the network stack can
> > mostly ignore (though offload features are an historical exception to
> > this).  It started with Ethernet link settings, but many operations are
> > applicable (and implemented by) other types of network device.
> 
> That (potentially) accounts for all network devices, but it leaves all 
> the other devices that could export virtual functions.
> 
> Why should I need to use a different API to enable virtual functions on 
> my network device and my storage controller?

Indeed; I was merely making the point that it would be quite valid to
use that means for setting VF network parameters for any network device
that supports IOV.

> (And why should "ethtool" or "ip" care that it's a virtual function?)

VFs may be assigned to a guest which is not fully trusted by the
hypervisor or privileged domain.  (This can sometimes be true for PFs
too, depending on the capabilities of the hypervisor and guest OS.)
Some configuration may therefore need to be done via a trusted PF.

> What Don and I are suggesting is that the concept of virtual functions 
> is a PCI thing, so it should be dealt with at the PCI layer.  Regardless 
> of the type of device the export of virtual functions is conceptually 
> the same thing, so it should use the same API.
> 
> Once the device exists, then domain-specific APIs would be used to 
> configure it the same way that they would configure a physical device.

To an extent, but not entirely.

Currently, the assigned MAC address and (optional) VLAN tag for each
networking VF are configured via the PF net device (though this is done
through the rtnetlink API rather than ethtool).
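The rtnetlink path referred to here is exposed through iproute2's 'ip link set ... vf' subcommand; for example (interface name and addresses are illustrative):

```shell
# Set the MAC address and an optional VLAN tag for VF 0, via the
# trusted PF net device (eth0 here is an example):
ip link set dev eth0 vf 0 mac 52:54:00:aa:bb:cc
ip link set dev eth0 vf 0 vlan 100
```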

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: New commands to configure IOV features
  2012-07-20 20:01                           ` Ben Hutchings
@ 2012-07-20 20:15                             ` Don Dutile
  2012-07-20 23:42                             ` Chris Friesen
  1 sibling, 0 replies; 52+ messages in thread
From: Don Dutile @ 2012-07-20 20:15 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Chris Friesen, David Miller, yuvalmin, gregory.v.rose, netdev, linux-pci

On 07/20/2012 04:01 PM, Ben Hutchings wrote:
> On Fri, 2012-07-20 at 13:29 -0600, Chris Friesen wrote:
>> On 07/20/2012 11:42 AM, Ben Hutchings wrote:
>>>
>>> The ethtool API is typically used for net device operations that can be
>>> largely devolved to individual drivers, and which the network stack can
>>> mostly ignore (though offload features are an historical exception to
>>> this).  It started with Ethernet link settings, but many operations are
>>> applicable (and implemented by) other types of network device.
>>
>> That (potentially) accounts for all network devices, but it leaves all
>> the other devices that could export virtual functions.
>>
>> Why should I need to use a different API to enable virtual functions on
>> my network device and my storage controller?
>
> Indeed; I was merely making the point that it would be quite valid to
> use that means for setting VF network parameters for any network device
> that supports IOV.
>
Yes, I read Ben's reply as supporting the proposition of VF enablement
at the PCI level.

>> (And why should "ethtool" or "ip" care that it's a virtual function?)
>
> VFs may be assigned to a guest which is not fully trusted by the
> hypervisor or privileged domain.  (This can sometimes be true for PFs
> too, depending on the capabilities of the hypervisor and guest OS.)
> Some configuration may therefore need to be done via a trusted PF.
>
Correct!  The security domain (for KVM) is the host; thus, the host
assigns VF attributes *before* they are given to the guest.  The guest
is just a consumer - at least that's been my experience with VF devices to
date - but I could see how an improper VF design could allow untrusted/guest
(ethtool/netlink) ops on the VF.

>> What Don and I are suggesting is that the concept of virtual functions
>> is a PCI thing, so it should be dealt with at the PCI layer.  Regardless
>> of the type of device the export of virtual functions is conceptually
>> the same thing, so it should use the same API.
>>
>> Once the device exists, then domain-specific APIs would be used to
>> configure it the same way that they would configure a physical device.
>
> To an extent, but not entirely.
>
> Currently, the assigned MAC address and (optional) VLAN tag for each
> networking VF are configured via the PF net device (though this is done
> though the rtnetlink API rather than ethtool).
Yes, through the PF, which is supposed to remain in the trusted
host/hypervisor domain.  (Do a 'man ip' on RHEL6 and look at 'ip link
set', where it mentions the 'vf' parameter.)

>
> Ben.
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-07-20 20:01                           ` Ben Hutchings
  2012-07-20 20:15                             ` Don Dutile
@ 2012-07-20 23:42                             ` Chris Friesen
  2012-07-21  0:52                               ` Rose, Gregory V
  2012-07-23 14:03                               ` Don Dutile
  1 sibling, 2 replies; 52+ messages in thread
From: Chris Friesen @ 2012-07-20 23:42 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Don Dutile, David Miller, yuvalmin, gregory.v.rose, netdev, linux-pci

On 07/20/2012 02:01 PM, Ben Hutchings wrote:
> On Fri, 2012-07-20 at 13:29 -0600, Chris Friesen wrote:

>> Once the device exists, then domain-specific APIs would be used to
>> configure it the same way that they would configure a physical device.
>
> To an extent, but not entirely.
>
> Currently, the assigned MAC address and (optional) VLAN tag for each
> networking VF are configured via the PF net device (though this is done
> though the rtnetlink API rather than ethtool).

I actually have a use-case where the guest needs to be able to modify 
the MAC addresses of network devices that are actually VFs.

The guest is bonding the network devices together, so the bonding driver 
in the guest expects to be able to set all the slaves to the same MAC 
address.

As I read the ixgbe driver, this should be possible as long as the host 
hasn't explicitly set the MAC address of the VF.  Is that correct?
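As a guest-side illustration of what the bonding driver attempts (interface names are hypothetical, and whether this succeeds with ixgbe/ixgbevf depends on exactly the condition asked about above):

```shell
# Inside the guest: what bonding effectively does when enslaving eth1
# alongside eth0 -- copy eth0's MAC address onto eth1.  With a VF this
# only succeeds if the host has not pinned the VF's MAC address.
ip link set dev eth1 down
ip link set dev eth1 address "$(cat /sys/class/net/eth0/address)"
ip link set dev eth1 up
```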

Chris

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: New commands to configure IOV features
  2012-07-20 23:42                             ` Chris Friesen
@ 2012-07-21  0:52                               ` Rose, Gregory V
  2012-07-23 14:03                               ` Don Dutile
  1 sibling, 0 replies; 52+ messages in thread
From: Rose, Gregory V @ 2012-07-21  0:52 UTC (permalink / raw)
  To: Chris Friesen, Ben Hutchings
  Cc: Don Dutile, David Miller, yuvalmin, netdev, linux-pci

> -----Original Message-----
> From: Chris Friesen [mailto:chris.friesen@genband.com]
> Sent: Friday, July 20, 2012 4:42 PM
> To: Ben Hutchings
> Cc: Don Dutile; David Miller; yuvalmin@broadcom.com; Rose, Gregory V;
> netdev@vger.kernel.org; linux-pci@vger.kernel.org
> Subject: Re: New commands to configure IOV features
> 
> On 07/20/2012 02:01 PM, Ben Hutchings wrote:
> > On Fri, 2012-07-20 at 13:29 -0600, Chris Friesen wrote:
> 
> >> Once the device exists, then domain-specific APIs would be used to
> >> configure it the same way that they would configure a physical device.
> >
> > To an extent, but not entirely.
> >
> > Currently, the assigned MAC address and (optional) VLAN tag for each
> > networking VF are configured via the PF net device (though this is
> > done though the rtnetlink API rather than ethtool).
> 
> I actually have a use-case where the guest needs to be able to modify the
> MAC addresses of network devices that are actually VFs.
> 
> The guest is bonding the network devices together, so the bonding driver
> in the guest expects to be able to set all the slaves to the same MAC
> address.
> 
> As I read the ixgbe driver, this should be possible as long as the host
> hasn't explicitly set the MAC address of the VF.  Is that correct?

That is correct.

- Greg
Intel Corp.
LAN Access Division

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-07-20 23:42                             ` Chris Friesen
  2012-07-21  0:52                               ` Rose, Gregory V
@ 2012-07-23 14:03                               ` Don Dutile
  2012-07-23 15:09                                 ` Chris Friesen
  2012-07-23 16:37                                 ` Rose, Gregory V
  1 sibling, 2 replies; 52+ messages in thread
From: Don Dutile @ 2012-07-23 14:03 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Ben Hutchings, David Miller, yuvalmin, gregory.v.rose, netdev, linux-pci

On 07/20/2012 07:42 PM, Chris Friesen wrote:
> On 07/20/2012 02:01 PM, Ben Hutchings wrote:
>> On Fri, 2012-07-20 at 13:29 -0600, Chris Friesen wrote:
>
>>> Once the device exists, then domain-specific APIs would be used to
>>> configure it the same way that they would configure a physical device.
>>
>> To an extent, but not entirely.
>>
>> Currently, the assigned MAC address and (optional) VLAN tag for each
>> networking VF are configured via the PF net device (though this is done
>> though the rtnetlink API rather than ethtool).
>
> I actually have a use-case where the guest needs to be able to modify the MAC addresses of network devices that are actually VFs.
>
> The guest is bonding the network devices together, so the bonding driver in the guest expects to be able to set all the slaves to the same MAC address.
>
> As I read the ixgbe driver, this should be possible as long as the host hasn't explicitly set the MAC address of the VF. Is that correct?
>
> Chris

Interesting tug of war: hypervisors will want to set the macaddrs for
security reasons, while some guests may want to set the macaddr for
(valid?) config reasons.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-07-23 14:03                               ` Don Dutile
@ 2012-07-23 15:09                                 ` Chris Friesen
  2012-07-23 17:06                                   ` Rose, Gregory V
  2012-07-23 18:36                                   ` Stephen Hemminger
  2012-07-23 16:37                                 ` Rose, Gregory V
  1 sibling, 2 replies; 52+ messages in thread
From: Chris Friesen @ 2012-07-23 15:09 UTC (permalink / raw)
  To: Don Dutile
  Cc: Ben Hutchings, David Miller, yuvalmin, gregory.v.rose, netdev, linux-pci

On 07/23/2012 08:03 AM, Don Dutile wrote:
> On 07/20/2012 07:42 PM, Chris Friesen wrote:
>>
>> I actually have a use-case where the guest needs to be able to modify 
>> the MAC addresses of network devices that are actually VFs.
>>
>> The guest is bonding the network devices together, so the bonding 
>> driver in the guest expects to be able to set all the slaves to the 
>> same MAC address.
>>
>> As I read the ixgbe driver, this should be possible as long as the 
>> host hasn't explicitly set the MAC address of the VF. Is that correct?
>>
>> Chris
>
> Interesting tug of war: hypervisors will want to set the macaddrs for 
> security reasons,
>                         some guests may want to set macaddr for 
> (valid?) config reasons.
>

In our case we have control over both guest and host anyway, so it's 
less of a security issue.  In the general case though I could see it 
being an interesting problem.

Back to the original discussion though--has anyone got any ideas about 
the best way to trigger runtime creation of VFs?  I don't know what the 
binary APIs look like, but via sysfs I could see something like

echo number_of_new_vfs_to_create >  
/sys/bus/pci/devices/<address>/create_vfs

Something else that occurred to me--is there buy-in from driver 
maintainers?  I know the Intel ethernet drivers (what I'm most familiar 
with) would need to be substantially modified to support on-the-fly 
addition of new VFs.  Currently they assume that the number of VFs is 
known at module init time.

Chris

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: New commands to configure IOV features
  2012-07-23 14:03                               ` Don Dutile
  2012-07-23 15:09                                 ` Chris Friesen
@ 2012-07-23 16:37                                 ` Rose, Gregory V
  1 sibling, 0 replies; 52+ messages in thread
From: Rose, Gregory V @ 2012-07-23 16:37 UTC (permalink / raw)
  To: Don Dutile, Chris Friesen
  Cc: Ben Hutchings, David Miller, yuvalmin, netdev, linux-pci

> -----Original Message-----
> From: Don Dutile [mailto:ddutile@redhat.com]
> Sent: Monday, July 23, 2012 7:04 AM
> To: Chris Friesen
> Cc: Ben Hutchings; David Miller; yuvalmin@broadcom.com; Rose, Gregory V;
> netdev@vger.kernel.org; linux-pci@vger.kernel.org
> Subject: Re: New commands to configure IOV features
> 
> On 07/20/2012 07:42 PM, Chris Friesen wrote:
> > On 07/20/2012 02:01 PM, Ben Hutchings wrote:
> >> On Fri, 2012-07-20 at 13:29 -0600, Chris Friesen wrote:
> >
> >>> Once the device exists, then domain-specific APIs would be used to
> >>> configure it the same way that they would configure a physical device.
> >>
> >> To an extent, but not entirely.
> >>
> >> Currently, the assigned MAC address and (optional) VLAN tag for each
> >> networking VF are configured via the PF net device (though this is
> >> done though the rtnetlink API rather than ethtool).
> >
> > I actually have a use-case where the guest needs to be able to modify
> the MAC addresses of network devices that are actually VFs.
> >
> > The guest is bonding the network devices together, so the bonding driver
> in the guest expects to be able to set all the slaves to the same MAC
> address.
> >
> > As I read the ixgbe driver, this should be possible as long as the host
> hasn't explicitly set the MAC address of the VF. Is that correct?
> >
> > Chris
> 
> Interesting tug of war: hypervisors will want to set the macaddrs for
> security reasons,
>                          some guests may want to set macaddr for (valid?)
> config reasons.

It is a matter of trust.  The ability to set your own MAC address filters is a potential security issue, so host administrators have the ability to determine whether they trust the VF (and implicitly, the domain in which the VF resides).  There is also a sort of half-way solution.  By turning off anti-spoofing you can allow the VF to use source MAC addresses that are not actually assigned in HW filters.  This was done to support some bonding scenarios where the VF will need to transmit with a different source address.
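The "half-way solution" described here is exposed through the same per-VF rtnetlink interface (device and VF numbers are placeholders; the spoofchk knob requires a driver and an iproute2 build recent enough to support it):

```shell
# On the host: allow VF 0 of PF eth2 to transmit with source MAC
# addresses other than the one programmed in its hardware filter,
# e.g. for bonding fail-over inside the guest.
ip link set dev eth2 vf 0 spoofchk off

# Re-enable anti-spoof checking once it is no longer needed:
ip link set dev eth2 vf 0 spoofchk on
```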

Many applications using SR-IOV are embedded devices such as switches, edge relay devices, IP forwarding/filtering appliances, routers, etc.  More often than not the host administrator can trust domains that the VFs are assigned to because those domains are completely under the control of the local host.  In those cases the VFs are trusted and can be allowed to set their own MAC filters and use any source MAC address they please.

Other applications might need to assign VF devices to non-trusted domains.  Perhaps a service provider has leased a virtual machine domain to a subscriber who has purchased QoS levels that can only be met with the performance levels available with SR-IOV VF devices.  Other scenarios exist.  In these cases it is worthwhile to be able to restrict the VF's ability to set MAC filters and use source MAC addresses not assigned to it.

Rather than a tug of war I just view it as balancing security concerns with levels of additional capability and functionality.  That goes on all the time.

- Greg

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: New commands to configure IOV features
  2012-07-23 15:09                                 ` Chris Friesen
@ 2012-07-23 17:06                                   ` Rose, Gregory V
  2012-07-23 18:36                                   ` Stephen Hemminger
  1 sibling, 0 replies; 52+ messages in thread
From: Rose, Gregory V @ 2012-07-23 17:06 UTC (permalink / raw)
  To: Chris Friesen, Don Dutile
  Cc: Ben Hutchings, David Miller, yuvalmin, netdev, linux-pci

> -----Original Message-----
> From: Chris Friesen [mailto:chris.friesen@genband.com]
> Sent: Monday, July 23, 2012 8:10 AM
> To: Don Dutile
> Cc: Ben Hutchings; David Miller; yuvalmin@broadcom.com; Rose, Gregory V;
> netdev@vger.kernel.org; linux-pci@vger.kernel.org
> Subject: Re: New commands to configure IOV features
> 
> On 07/23/2012 08:03 AM, Don Dutile wrote:
> > On 07/20/2012 07:42 PM, Chris Friesen wrote:
> >>
> >> I actually have a use-case where the guest needs to be able to modify
> >> the MAC addresses of network devices that are actually VFs.
> >>
> >> The guest is bonding the network devices together, so the bonding
> >> driver in the guest expects to be able to set all the slaves to the
> >> same MAC address.
> >>
> >> As I read the ixgbe driver, this should be possible as long as the
> >> host hasn't explicitly set the MAC address of the VF. Is that correct?
> >>
> >> Chris
> >
> > Interesting tug of war: hypervisors will want to set the macaddrs for
> > security reasons,
> >                         some guests may want to set macaddr for
> > (valid?) config reasons.
> >
> 
> In our case we have control over both guest an host anyways, so it's
> less of a security issue.  In the general case though I could see it
> being an interesting problem.
> 
> Back to the original discussion though--has anyone got any ideas about
> the best way to trigger runtime creation of VFs?  I don't know what the
> binary APIs looks like, but via sysfs I could see something like
> 
> echo number_of_new_vfs_to_create >
> /sys/bus/pci/devices/<address>/create_vfs

The original proposals for creation and management of virtual functions were very much along these lines.  However, at the time most of the distributions that used virtualization were based upon the 2.6.18 kernel.  Red Hat Enterprise Linux 5.x and Citrix Xen Server were both using kernels derived from the 2.6.18 kernel.  In order to implement a sysfs based management approach we would have had to break the 2.6.18 kernel ABI, which is of course a non-starter.  We were able to implement an SR-IOV solution without breaking the ABI but it required use of a module parameter to inform the PF driver of how many VFs it should create.

This was fine during the first couple of years in which not many folks were using SR-IOV outside of a lab and the number of platforms that supported SR-IOV was very limited.  As an experimental solution it has worked pretty well.

The last year and a half or so we have seen SR-IOV go from an experimental technology to one that is becoming increasingly deployed in real world applications and now the limitations of that original approach are more apparent.  

> 
> Something else that occurred to me--is there buy-in from driver
> maintainers?

That's a good question.  I've seen a lot of resistance to using sysfs-based interfaces in drivers, but usually that is when a driver wants to implement a private interface that other drivers wouldn't want or be able to use.  I'm less sure about a sysfs solution that could be deployed by all SR-IOV capable devices of any type, be they Ethernet controllers, SCSI controllers, etc.  I'm not sure what the objection would be in the case of a general-purpose solution; however, I've been told that sysfs-based solutions are often frowned upon by kernel maintainers.  Perhaps that is because many of them are not generic solutions to a well-defined problem.

> I know the Intel ethernet drivers (what I'm most familiar
> with) would need to be substantially modified to support on-the-fly
> addition of new vfs.

Actually, it wouldn't be that bad.  But yes, there'd have to be some way for a driver to register a callback routine with the PCI interface so that it could be notified when changes have been made to the SR-IOV configuration of the device.  This would require a new API and some driver changes.  In the case of the Intel drivers it wouldn't be too intrusive, and it would definitely help us meet some customer requirements.  The current model using a module parameter forces all ports controlled by a PF driver to use the same number of VFs per function.  This is clunky, and a lot of users would like the ability to assign differing numbers of VFs to the respective physical functions.

I should also note that we need to be careful about what we mean by the phrase "support on-the-fly addition of new vfs".

You cannot just add VFs without first tearing down the VFs that are currently allocated.  This is a limitation of the PCIe SR-IOV spec IIRC, and in any case it is true of Intel SR-IOV capable devices.  The NumVFs parameter in the SR-IOV capability structure is not writeable while the VF Enable bit is set.  To change that value you must first clear the VF Enable bit, and when you do that all your current VFs cease to exist.  You can then write a different number of VFs and re-enable them, but during a short interval the VFs that were already there are destroyed, and they are completely reset when they come back on-line.
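To sketch what this constraint means for any runtime interface, here is how the cycle looks with the generic per-device sysfs knob (`sriov_numvfs`) that the kernel eventually grew for this purpose -- shown only as an illustration, since this thread predates it; the device address is a placeholder:

```shell
# Changing the VF count is always a disable-then-re-enable cycle,
# never an in-place resize: NumVFs cannot change while VF Enable is set.
BDF=0000:01:00.0
echo 0 > /sys/bus/pci/devices/$BDF/sriov_numvfs   # clear VF Enable; all VFs vanish
echo 8 > /sys/bus/pci/devices/$BDF/sriov_numvfs   # re-enable with the new count
```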

- Greg

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-07-23 15:09                                 ` Chris Friesen
  2012-07-23 17:06                                   ` Rose, Gregory V
@ 2012-07-23 18:36                                   ` Stephen Hemminger
  2012-07-23 18:40                                     ` Rose, Gregory V
  1 sibling, 1 reply; 52+ messages in thread
From: Stephen Hemminger @ 2012-07-23 18:36 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Don Dutile, Ben Hutchings, David Miller, yuvalmin,
	gregory.v.rose, netdev, linux-pci

On Mon, 23 Jul 2012 09:09:38 -0600
Chris Friesen <chris.friesen@genband.com> wrote:

> On 07/23/2012 08:03 AM, Don Dutile wrote:
> > On 07/20/2012 07:42 PM, Chris Friesen wrote:
> >>
> >> I actually have a use-case where the guest needs to be able to modify 
> >> the MAC addresses of network devices that are actually VFs.
> >>
> >> The guest is bonding the network devices together, so the bonding 
> >> driver in the guest expects to be able to set all the slaves to the 
> >> same MAC address.
> >>
> >> As I read the ixgbe driver, this should be possible as long as the 
> >> host hasn't explicitly set the MAC address of the VF. Is that correct?
> >>
> >> Chris
> >
> > Interesting tug of war: hypervisors will want to set the macaddrs for 
> > security reasons,
> >                         some guests may want to set macaddr for 
> > (valid?) config reasons.
> >
> 
> In our case we have control over both guest an host anyways, so it's 
> less of a security issue.  In the general case though I could see it 
> being an interesting problem.
> 
> Back to the original discussion though--has anyone got any ideas about 
> the best way to trigger runtime creation of VFs?  I don't know what the 
> binary APIs looks like, but via sysfs I could see something like
> 
> echo number_of_new_vfs_to_create >  
> /sys/bus/pci/devices/<address>/create_vfs
> 
> Something else that occurred to me--is there buy-in from driver 
> maintainers?  I know the Intel ethernet drivers (what I'm most familiar 
> with) would need to be substantially modified to support on-the-fly 
> addition of new vfs.  Currently they assume that the number of vfs is 
> known at module init time.
> 

Why couldn't rtnl_link_ops be used for this?  It is already the
preferred interface for creating VLANs, bond devices, and other virtual
devices.  The one open issue is whether the created VFs exist in the
kernel as devices or are visible only to the guest.
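For context, the rtnl_link_ops path referred to here is the machinery behind `ip link add` (the parent device and chosen names below are examples):

```shell
# Examples of virtual devices already created through rtnl_link_ops:
ip link add link eth0 name eth0.100 type vlan id 100   # VLAN device
ip link add link eth0 name macvlan0 type macvlan       # macvlan device
```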

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: New commands to configure IOV features
  2012-07-23 18:36                                   ` Stephen Hemminger
@ 2012-07-23 18:40                                     ` Rose, Gregory V
  2012-09-19 11:07                                       ` Yuval Mintz
  0 siblings, 1 reply; 52+ messages in thread
From: Rose, Gregory V @ 2012-07-23 18:40 UTC (permalink / raw)
  To: Stephen Hemminger, Chris Friesen
  Cc: Don Dutile, Ben Hutchings, David Miller, yuvalmin, netdev, linux-pci

> -----Original Message-----
> From: Stephen Hemminger [mailto:shemminger@vyatta.com]
> Sent: Monday, July 23, 2012 11:36 AM
> To: Chris Friesen
> Cc: Don Dutile; Ben Hutchings; David Miller; yuvalmin@broadcom.com; Rose,
> Gregory V; netdev@vger.kernel.org; linux-pci@vger.kernel.org
> Subject: Re: New commands to configure IOV features
> 
> On Mon, 23 Jul 2012 09:09:38 -0600
> Chris Friesen <chris.friesen@genband.com> wrote:
> 
> > On 07/23/2012 08:03 AM, Don Dutile wrote:
> > > On 07/20/2012 07:42 PM, Chris Friesen wrote:
> > >>
> > >> I actually have a use-case where the guest needs to be able to
> > >> modify the MAC addresses of network devices that are actually VFs.
> > >>
> > >> The guest is bonding the network devices together, so the bonding
> > >> driver in the guest expects to be able to set all the slaves to the
> > >> same MAC address.
> > >>
> > >> As I read the ixgbe driver, this should be possible as long as the
> > >> host hasn't explicitly set the MAC address of the VF. Is that
> correct?
> > >>
> > >> Chris
> > >
> > > Interesting tug of war: hypervisors will want to set the macaddrs
> > > for security reasons,
> > >                         some guests may want to set macaddr for
> > > (valid?) config reasons.
> > >
> >
> > In our case we have control over both guest an host anyways, so it's
> > less of a security issue.  In the general case though I could see it
> > being an interesting problem.
> >
> > Back to the original discussion though--has anyone got any ideas about
> > the best way to trigger runtime creation of VFs?  I don't know what
> > the binary APIs looks like, but via sysfs I could see something like
> >
> > echo number_of_new_vfs_to_create >
> > /sys/bus/pci/devices/<address>/create_vfs
> >
> > Something else that occurred to me--is there buy-in from driver
> > maintainers?  I know the Intel ethernet drivers (what I'm most
> > familiar
> > with) would need to be substantially modified to support on-the-fly
> > addition of new vfs.  Currently they assume that the number of vfs is
> > known at module init time.
> >
> 
> Why couldn't rtnl_link_ops be used for this. It is already the preferred
> interface to create vlan's, bond devices, and other virtual devices?
> The one issue is that do the created VF's exist in kernel as devices or
> only visible to guest?

I would say that rtnl_link_ops are network-oriented and not appropriate for something like a storage controller or graphics device, which are two other common SR-IOV capable devices.

I think it should be oriented toward the PCIe interface and subsystems in the kernel.

- Greg

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-07-23 18:40                                     ` Rose, Gregory V
@ 2012-09-19 11:07                                       ` Yuval Mintz
  2012-09-19 15:53                                         ` Greg Rose
  2012-09-19 17:49                                         ` David Miller
  0 siblings, 2 replies; 52+ messages in thread
From: Yuval Mintz @ 2012-09-19 11:07 UTC (permalink / raw)
  To: davem, netdev; +Cc: Ariel Elior, Eilon Greenstein

>>> Back to the original discussion though--has anyone got any ideas about
>>> the best way to trigger runtime creation of VFs?  I don't know what
>>> the binary APIs looks like, but via sysfs I could see something like
>>>
>>> echo number_of_new_vfs_to_create >
>>> /sys/bus/pci/devices/<address>/create_vfs
>>>
>>> Something else that occurred to me--is there buy-in from driver
>>> maintainers?  I know the Intel ethernet drivers (what I'm most
>>> familiar
>>> with) would need to be substantially modified to support on-the-fly
>>> addition of new vfs.  Currently they assume that the number of vfs is
>>> known at module init time.
>>
>> Why couldn't rtnl_link_ops be used for this. It is already the preferred
>> interface to create vlan's, bond devices, and other virtual devices?
>> The one issue is that do the created VF's exist in kernel as devices or
>> only visible to guest?
> 
> I would say that rtnl_link_ops are network oriented and not appropriate for something like a storage controller or graphics device, which are two other common SR-IOV capable devices.

Hi Dave,

We're currently fine-tuning our SRIOV support, which we will shortly
send upstream.

We've encountered a problem, though - all drivers currently supporting
SRIOV do so via a module parameter: e.g., 'max_vfs' for ixgbe,
'num_vfs' for benet, etc.
The SRIOV feature is disabled by default in all of these drivers; it
can only be enabled via the module parameter.

We don't want the lack of an SRIOV module parameter in the bnx2x driver
to be the bottleneck when we submit the SRIOV feature upstream, and we
also don't want to enable SRIOV by default (following the same logic as
the other drivers: most users don't use SRIOV, and it would strain
their resources).

As we see it, there are several possible ways of solving the issue:
 1. Use some network-tool (e.g., ethtool).
 2. Implement a standard sysfs interface for PCIe devices, as SRIOV is
    not solely network-related (this should be done via the PCI linux
    tree).
 3. Implement a module param in our bnx2x code.

We would like to know your preferred method for solving this issue,
and to hear if you have another (better?) method by which we can add
this kind of support.

Thanks,
Yuval Mintz

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-09-19 11:07                                       ` Yuval Mintz
@ 2012-09-19 15:53                                         ` Greg Rose
  2012-09-19 19:44                                           ` Ben Hutchings
  2012-09-19 17:49                                         ` David Miller
  1 sibling, 1 reply; 52+ messages in thread
From: Greg Rose @ 2012-09-19 15:53 UTC (permalink / raw)
  To: Yuval Mintz; +Cc: davem, netdev, Ariel Elior, Eilon Greenstein

On Wed, 19 Sep 2012 14:07:19 +0300
Yuval Mintz <yuvalmin@broadcom.com> wrote:

> >>> Back to the original discussion though--has anyone got any ideas
> >>> about the best way to trigger runtime creation of VFs?  I don't
> >>> know what the binary APIs looks like, but via sysfs I could see
> >>> something like
> >>>
> >>> echo number_of_new_vfs_to_create >
> >>> /sys/bus/pci/devices/<address>/create_vfs
> >>>
> >>> Something else that occurred to me--is there buy-in from driver
> >>> maintainers?  I know the Intel ethernet drivers (what I'm most
> >>> familiar
> >>> with) would need to be substantially modified to support
> >>> on-the-fly addition of new vfs.  Currently they assume that the
> >>> number of vfs is known at module init time.
> >>
> >> Why couldn't rtnl_link_ops be used for this. It is already the
> >> preferred interface to create vlan's, bond devices, and other
> >> virtual devices? The one issue is that do the created VF's exist
> >> in kernel as devices or only visible to guest?
> > 
> > I would say that rtnl_link_ops are network oriented and not
> > appropriate for something like a storage controller or graphics
> > device, which are two other common SR-IOV capable devices.
> 
> Hi Dave,
> 
> We're currently fine-tuning our SRIOV support, which we will shortly
> send upstream.
> 
> We've encountered a problem though - all drivers currently supporting
> SRIOV do so with the usage of a module param: e.g., 'max_vfs' for
> ixgbe, 'num_vfs' for benet, etc.
> The SRIOV feature is disabled by default on all the drivers; it can
> only be enabled via usage of the module param.
> 
> We don't want the lack of SRIOV module param in the bnx2x driver to be
> the bottle-neck when we'll submit the SRIOV feature upstream, and we
> also don't want to enable SRIOV by default (following the same logic
> of other drivers; most users don't use SRIOV and it would strain their
> resources).
> 
> As we see it, there are several possible ways of solving the issue:
>  1. Use some network-tool (e.g., ethtool).
>  2. Implement a standard sysfs interface for PCIe devices, as SRIOV is
>     not solely network-related (this should be done via the PCI linux
>     tree).

I was not able to attend the Linux conference held at the end of August
myself, but coworkers of mine here at Intel informed me that method 2
here seems to be the preferred approach.  Perhaps some folks who
attended the conference can chime in with more specifics.

- Greg
LAN Access Division
Intel Corp.



>  3. Implement a module param in our bnx2x code.
> 
> We would like to know what's your preferred method for solving this
> issue, and to hear if you have another (better?) method by which we
> can add this kind of support.
> 
> Thanks,
> Yuval Mintz
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-09-19 11:07                                       ` Yuval Mintz
  2012-09-19 15:53                                         ` Greg Rose
@ 2012-09-19 17:49                                         ` David Miller
  1 sibling, 0 replies; 52+ messages in thread
From: David Miller @ 2012-09-19 17:49 UTC (permalink / raw)
  To: yuvalmin; +Cc: netdev, ariele, eilong

From: "Yuval Mintz" <yuvalmin@broadcom.com>
Date: Wed, 19 Sep 2012 14:07:19 +0300

>  3. Implement a module param in our bnx2x code.

Just because I was a moron and didn't notice the other cases
doesn't mean I should let you guys do it too.

So I just want to say that #3 is absolutely not an option.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-09-19 15:53                                         ` Greg Rose
@ 2012-09-19 19:44                                           ` Ben Hutchings
  2012-09-19 22:17                                             ` Yinghai Lu
  0 siblings, 1 reply; 52+ messages in thread
From: Ben Hutchings @ 2012-09-19 19:44 UTC (permalink / raw)
  To: Greg Rose
  Cc: Yuval Mintz, davem, netdev, Ariel Elior, Eilon Greenstein, linux-pci

On Wed, 2012-09-19 at 08:53 -0700, Greg Rose wrote:
> On Wed, 19 Sep 2012 14:07:19 +0300
> Yuval Mintz <yuvalmin@broadcom.com> wrote:
[...]
> > We've encountered a problem though - all drivers currently supporting
> > SRIOV do so with the usage of a module param: e.g., 'max_vfs' for
> > ixgbe, 'num_vfs' for benet, etc.
> > The SRIOV feature is disabled by default on all the drivers; it can
> > only be enabled via usage of the module param.
> > 
> > We don't want the lack of SRIOV module param in the bnx2x driver to be
> > the bottle-neck when we'll submit the SRIOV feature upstream, and we
> > also don't want to enable SRIOV by default (following the same logic
> > of other drivers; most users don't use SRIOV and it would strain their
> > resources).
> > 
> > As we see it, there are several possible ways of solving the issue:
> >  1. Use some network-tool (e.g., ethtool).
> >  2. Implement a standard sysfs interface for PCIe devices, as SRIOV is
> >     not solely network-related (this should be done via the PCI linux
> >     tree).

There is another interim option, which is to put this setting somewhere
in NVRAM on the device.  This solves the one-size-for-all-devices
problem, though obviously not the inconsistent-interface problem.

> I was not able to attend the Linux conference held at the end of August
> myself, but coworkers of mine here at Intel informed me that method 2 here
> seems to be the preferred approach.  Perhaps some folks who attended
> the conference can chime in with more specifics.

There really wasn't much more specific discussion.  Bjorn's summary of
the mini-summit <http://lwn.net/Articles/514113/> says:

> SR-IOV Management
> 
>     Currently drivers implement module parameters like "max_vfs".  This means
>     all devices claimed by the driver get the same number of VFs, and you can't
>     change anything without unloading and reloading the driver.
> 
>     Consensus that we should try to implement a knob for this in sysfs so it
>     can be generic (not in each driver) and set individually for each device.

I don't think any implementation has been posted to the linux-pci list
yet...?

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-09-19 19:44                                           ` Ben Hutchings
@ 2012-09-19 22:17                                             ` Yinghai Lu
  2012-09-19 22:46                                               ` Ben Hutchings
  0 siblings, 1 reply; 52+ messages in thread
From: Yinghai Lu @ 2012-09-19 22:17 UTC (permalink / raw)
  To: Ben Hutchings, Bjorn Helgaas
  Cc: Greg Rose, Yuval Mintz, davem, netdev, Ariel Elior,
	Eilon Greenstein, linux-pci

[-- Attachment #1: Type: text/plain, Size: 1100 bytes --]

On Wed, Sep 19, 2012 at 12:44 PM, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> On Wed, 2012-09-19 at 08:53 -0700, Greg Rose wrote:
>> I was not able to attend the Linux conference held at the end of August
>> myself, but coworkers of mine here at Intel informed me that method 2 here
>> seems to be the preferred approach.  Perhaps some folks who attended
>> the conference can chime in with more specifics.
>
> There really wasn't much more specific discussion.  Bjorn's summary of
> the mini-summit <http://lwn.net/Articles/514113/> says:
>
>> SR-IOV Management
>>
>>     Currently drivers implement module parameters like "max_vfs".  This means
>>     all devices claimed by the driver get the same number of VFs, and you can't
>>     change anything without unloading and reloading the driver.
>>
>>     Consensus that we should try to implement a knob for this in sysfs so it
>>     can be generic (not in each driver) and set individually for each device.
>
> I don't think any implementation has been posted to the linux-pci list

please check attached three patches...

Thanks

Yinghai

[-- Attachment #2: pci_dev_type.patch --]
[-- Type: application/octet-stream, Size: 2042 bytes --]

Subject: [PATCH] PCI: Add pci_dev_type

Later it will be used for visible attribute control.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/pci-sysfs.c |   24 ++++++++++++++++++++++++
 drivers/pci/pci.h       |    1 +
 drivers/pci/probe.c     |    1 +
 3 files changed, 26 insertions(+)

Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -1411,3 +1411,27 @@ static int __init pci_sysfs_init(void)
 }
 
 late_initcall(pci_sysfs_init);
+
+static struct attribute *pci_dev_dev_attrs[] = {
+	NULL,
+};
+
+static umode_t pci_dev_attrs_are_visible(struct kobject *kobj,
+						struct attribute *a, int n)
+{
+	return a->mode;
+}
+
+static struct attribute_group pci_dev_attr_group = {
+	.attrs = pci_dev_dev_attrs,
+	.is_visible = pci_dev_attrs_are_visible,
+};
+
+static const struct attribute_group *pci_dev_attr_groups[] = {
+	&pci_dev_attr_group,
+	NULL,
+};
+
+struct device_type pci_dev_type = {
+	.groups = pci_dev_attr_groups,
+};
Index: linux-2.6/drivers/pci/pci.h
===================================================================
--- linux-2.6.orig/drivers/pci/pci.h
+++ linux-2.6/drivers/pci/pci.h
@@ -157,6 +157,7 @@ static inline int pci_no_d1d2(struct pci
 }
 extern struct device_attribute pci_dev_attrs[];
 extern struct device_attribute pcibus_dev_attrs[];
+extern struct device_type pci_dev_type;
 #ifdef CONFIG_HOTPLUG
 extern struct bus_attribute pci_bus_attrs[];
 #else
Index: linux-2.6/drivers/pci/probe.c
===================================================================
--- linux-2.6.orig/drivers/pci/probe.c
+++ linux-2.6/drivers/pci/probe.c
@@ -975,6 +975,7 @@ int pci_setup_device(struct pci_dev *dev
 	dev->sysdata = dev->bus->sysdata;
 	dev->dev.parent = dev->bus->bridge;
 	dev->dev.bus = &pci_bus_type;
+	dev->dev.type = &pci_dev_type;
 	dev->hdr_type = hdr_type & 0x7f;
 	dev->multifunction = !!(hdr_type & 0x80);
 	dev->error_state = pci_channel_io_normal;

[-- Attachment #3: pci_010_pci_dev_boot_vga_x.patch --]
[-- Type: application/octet-stream, Size: 1868 bytes --]

Subject: [PATCH 10/10] PCI, sysfs: Use is_visible() with the boot_vga attribute for pci_dev

That makes pci_create_sysfs_dev_files() simpler.

Also fix a possible memory leak in the removal path.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/pci-sysfs.c |   21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -1304,29 +1304,20 @@ int __must_check pci_create_sysfs_dev_fi
 		pdev->rom_attr = attr;
 	}
 
-	if ((pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA) {
-		retval = device_create_file(&pdev->dev, &vga_attr);
-		if (retval)
-			goto err_rom_file;
-	}
-
 	/* add platform-specific attributes */
 	retval = pcibios_add_platform_entries(pdev);
 	if (retval)
-		goto err_vga_file;
+		goto err_rom_file;
 
 	/* add sysfs entries for various capabilities */
 	retval = pci_create_capabilities_sysfs(pdev);
 	if (retval)
-		goto err_vga_file;
+		goto err_rom_file;
 
 	pci_create_firmware_label_files(pdev);
 
 	return 0;
 
-err_vga_file:
-	if ((pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
-		device_remove_file(&pdev->dev, &vga_attr);
 err_rom_file:
 	if (rom_size) {
 		sysfs_remove_bin_file(&pdev->dev.kobj, pdev->rom_attr);
@@ -1414,12 +1405,20 @@ static int __init pci_sysfs_init(void)
 late_initcall(pci_sysfs_init);
 
 static struct attribute *pci_dev_dev_attrs[] = {
+	&vga_attr.attr,
 	NULL,
 };
 
 static umode_t pci_dev_attrs_are_visible(struct kobject *kobj,
 						struct attribute *a, int n)
 {
+	struct device *dev = container_of(kobj, struct device, kobj);
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	if (a == &vga_attr.attr)
+		if ((pdev->class >> 8) != PCI_CLASS_DISPLAY_VGA)
+			return 0;
+
 	return a->mode;
 }
 

[-- Attachment #4: sriov_xxx.patch --]
[-- Type: application/octet-stream, Size: 2158 bytes --]

Subject: [PATCH] PCI: Add max_vfs in sysfs for each PCI device that supports it

Drivers will later need to check dev->max_vfs as set via /sys.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/pci-sysfs.c |   29 +++++++++++++++++++++++++++++
 include/linux/pci.h     |    1 +
 2 files changed, 30 insertions(+)

Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -459,6 +459,30 @@ boot_vga_show(struct device *dev, struct
 }
 struct device_attribute vga_attr = __ATTR_RO(boot_vga);
 
+static ssize_t
+max_vfs_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	return sprintf(buf, "%u\n", pdev->max_vfs);
+}
+static ssize_t
+max_vfs_store(struct device *dev, struct device_attribute *attr,
+		 const char *buf, size_t count)
+{
+	unsigned long val;
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	if (strict_strtoul(buf, 0, &val) < 0)
+		return -EINVAL;
+
+	pdev->max_vfs = val;
+
+	return count;
+}
+struct device_attribute max_vfs_attr =
+	__ATTR(max_vfs, 0644, max_vfs_show, max_vfs_store);
+
 static void
 pci_config_pm_runtime_get(struct pci_dev *pdev)
 {
@@ -1406,6 +1430,7 @@ late_initcall(pci_sysfs_init);
 
 static struct attribute *pci_dev_dev_attrs[] = {
 	&vga_attr.attr,
+	&max_vfs_attr.attr,
 	NULL,
 };
 
@@ -1419,6 +1444,10 @@ static umode_t pci_dev_attrs_are_visible
 		if ((pdev->class >> 8) != PCI_CLASS_DISPLAY_VGA)
 			return 0;
 
+	if (a == &max_vfs_attr.attr)
+		if (!pdev->is_physfn)
+			return 0;
+
 	return a->mode;
 }
 
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -359,6 +359,7 @@ struct pci_dev {
 	unsigned int	broken_intx_masking:1;
 	unsigned int	io_window_1k:1;	/* Intel P2P bridge 1K I/O windows */
 	pci_dev_flags_t dev_flags;
+	unsigned int	max_vfs;
 	atomic_t	enable_cnt;	/* pci_enable_device has been called */
 
 	u32		saved_config_space[16]; /* config space saved at suspend time */

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-09-19 22:17                                             ` Yinghai Lu
@ 2012-09-19 22:46                                               ` Ben Hutchings
  2012-09-20  0:19                                                 ` Yinghai Lu
  0 siblings, 1 reply; 52+ messages in thread
From: Ben Hutchings @ 2012-09-19 22:46 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Greg Rose, Yuval Mintz, davem, netdev,
	Ariel Elior, Eilon Greenstein, linux-pci

On Wed, 2012-09-19 at 15:17 -0700, Yinghai Lu wrote:
> On Wed, Sep 19, 2012 at 12:44 PM, Ben Hutchings
> <bhutchings@solarflare.com> wrote:
> > On Wed, 2012-09-19 at 08:53 -0700, Greg Rose wrote:
> >> I was not able to attend the Linux conference held at the end of August
> >> myself, but coworkers of mine here at Intel informed me that method 2 here
> >> seems to be the preferred approach.  Perhaps some folks who attended
> >> the conference can chime in with more specifics.
> >
> > There really wasn't much more specific discussion.  Bjorn's summary of
> > the mini-summit <http://lwn.net/Articles/514113/> says:
> >
> >> SR-IOV Management
> >>
> >>     Currently drivers implement module parameters like "max_vfs".  This means
> >>     all devices claimed by the driver get the same number of VFs, and you can't
> >>     change anything without unloading and reloading the driver.
> >>
> >>     Consensus that we should try to implement a knob for this in sysfs so it
> >>     can be generic (not in each driver) and set individually for each device.
> >
> > I don't think any implementation has been posted to the linux-pci list
> 
> please check attached three patches...
[...]
> --- linux-2.6.orig/drivers/pci/pci-sysfs.c
> +++ linux-2.6/drivers/pci/pci-sysfs.c
> @@ -459,6 +459,30 @@ boot_vga_show(struct device *dev, struct
>  }
>  struct device_attribute vga_attr = __ATTR_RO(boot_vga);
>  
> +static ssize_t
> +max_vfs_show(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> +       struct pci_dev *pdev = to_pci_dev(dev);
> +
> +       return sprintf(buf, "%u\n", pdev->max_vfs);
> +}
> +static ssize_t
> +max_vfs_store(struct device *dev, struct device_attribute *attr,
> +                const char *buf, size_t count)
> +{
> +       unsigned long val;
> +       struct pci_dev *pdev = to_pci_dev(dev);
> +
> +       if (strict_strtoul(buf, 0, &val) < 0)
> +               return -EINVAL;
> +
> +       pdev->max_vfs = val;
> +
> +       return count;
> +}
[...]

Then what would actually trigger creation of the VFs?  There's no way we
can assume that some sysfs attribute will be written before the PF
driver is loaded (what if it's built-in?).  I thought the idea was to
add a driver callback that would be called when the sysfs attribute was
written.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-09-19 22:46                                               ` Ben Hutchings
@ 2012-09-20  0:19                                                 ` Yinghai Lu
  2012-09-20  1:23                                                   ` Ben Hutchings
  0 siblings, 1 reply; 52+ messages in thread
From: Yinghai Lu @ 2012-09-20  0:19 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Bjorn Helgaas, Greg Rose, Yuval Mintz, davem, netdev,
	Ariel Elior, Eilon Greenstein, linux-pci

[-- Attachment #1: Type: text/plain, Size: 911 bytes --]

On Wed, Sep 19, 2012 at 3:46 PM, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> On Wed, 2012-09-19 at 15:17 -0700, Yinghai Lu wrote:
>> +max_vfs_store(struct device *dev, struct device_attribute *attr,
>> +                const char *buf, size_t count)
>> +{
>> +       unsigned long val;
>> +       struct pci_dev *pdev = to_pci_dev(dev);
>> +
>> +       if (strict_strtoul(buf, 0, &val) < 0)
>> +               return -EINVAL;
>> +
>> +       pdev->max_vfs = val;
>> +
>> +       return count;
>> +}
> [...]
>
> Then what would actually trigger creation of the VFs?  There's no way we
> can assume that some sysfs attribute will be written before the PF
> driver is loaded (what if it's built-in?).  I thought the idea was to
> add a driver callback that would be called when the sysfs attribute was
> written.

Could we just stop the device and add it back again?

Just like in the updated patch.

Thanks

Yinghai

[-- Attachment #2: sriov_xxx.patch --]
[-- Type: application/octet-stream, Size: 3664 bytes --]

Subject: [PATCH] PCI: Add max_vfs in sysfs for each PCI device that supports it

Drivers will later need to check dev->max_vfs as set via /sys.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/pci-sysfs.c |   57 ++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/pci/remove.c    |    2 -
 include/linux/pci.h     |    2 +
 3 files changed, 60 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -459,6 +459,58 @@ boot_vga_show(struct device *dev, struct
 }
 struct device_attribute vga_attr = __ATTR_RO(boot_vga);
 
+static ssize_t
+max_vfs_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	return sprintf(buf, "%u\n", pdev->max_vfs);
+}
+
+static void rebind_callback(struct device *dev)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	int err;
+
+	mutex_lock(&pci_remove_rescan_mutex);
+	pci_stop_bus_device(pdev);
+	err = pci_bus_add_device(pdev);
+	if (!err) {
+		struct pci_bus *child = pdev->subordinate;
+
+		if (child) {
+			pci_bus_add_devices(child);
+			if (!child->is_added)
+				pci_bus_add_child(child);
+		}
+	}
+	mutex_unlock(&pci_remove_rescan_mutex);
+}
+static ssize_t
+max_vfs_store(struct device *dev, struct device_attribute *attr,
+		 const char *buf, size_t count)
+{
+	unsigned long val;
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	if (strict_strtoul(buf, 0, &val) < 0)
+		return -EINVAL;
+
+	pdev->max_vfs = val;
+
+	if (pdev->is_added) {
+		int err;
+		err = device_schedule_callback(dev, rebind_callback);
+
+		if (err)
+			return err;
+	}
+
+	return count;
+}
+struct device_attribute max_vfs_attr =
+	__ATTR(max_vfs, 0644, max_vfs_show, max_vfs_store);
+
 static void
 pci_config_pm_runtime_get(struct pci_dev *pdev)
 {
@@ -1406,6 +1458,7 @@ late_initcall(pci_sysfs_init);
 
 static struct attribute *pci_dev_dev_attrs[] = {
 	&vga_attr.attr,
+	&max_vfs_attr.attr,
 	NULL,
 };
 
@@ -1419,6 +1472,10 @@ static umode_t pci_dev_attrs_are_visible
 		if ((pdev->class >> 8) != PCI_CLASS_DISPLAY_VGA)
 			return 0;
 
+	if (a == &max_vfs_attr.attr)
+		if (!pdev->is_physfn)
+			return 0;
+
 	return a->mode;
 }
 
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -359,6 +359,7 @@ struct pci_dev {
 	unsigned int	broken_intx_masking:1;
 	unsigned int	io_window_1k:1;	/* Intel P2P bridge 1K I/O windows */
 	pci_dev_flags_t dev_flags;
+	unsigned int	max_vfs;
 	atomic_t	enable_cnt;	/* pci_enable_device has been called */
 
 	u32		saved_config_space[16]; /* config space saved at suspend time */
@@ -812,6 +813,7 @@ u8 pci_common_swizzle(struct pci_dev *de
 extern struct pci_dev *pci_dev_get(struct pci_dev *dev);
 extern void pci_dev_put(struct pci_dev *dev);
 extern void pci_remove_bus(struct pci_bus *b);
+void pci_stop_bus_device(struct pci_dev *dev);
 extern void pci_stop_and_remove_bus_device(struct pci_dev *dev);
 void pci_stop_root_bus(struct pci_bus *bus);
 void pci_remove_root_bus(struct pci_bus *bus);
Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -55,7 +55,7 @@ void pci_remove_bus(struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pci_remove_bus);
 
-static void pci_stop_bus_device(struct pci_dev *dev)
+void pci_stop_bus_device(struct pci_dev *dev)
 {
 	struct pci_bus *bus = dev->subordinate;
 	struct pci_dev *child, *tmp;

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-09-20  0:19                                                 ` Yinghai Lu
@ 2012-09-20  1:23                                                   ` Ben Hutchings
  2012-09-20  2:27                                                     ` Yinghai Lu
  2012-09-20 15:39                                                       ` Rose, Gregory V
  0 siblings, 2 replies; 52+ messages in thread
From: Ben Hutchings @ 2012-09-20  1:23 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Greg Rose, Yuval Mintz, davem, netdev,
	Ariel Elior, Eilon Greenstein, linux-pci

On Wed, 2012-09-19 at 17:19 -0700, Yinghai Lu wrote:
> On Wed, Sep 19, 2012 at 3:46 PM, Ben Hutchings
> <bhutchings@solarflare.com> wrote:
> > On Wed, 2012-09-19 at 15:17 -0700, Yinghai Lu wrote:
> >> +max_vfs_store(struct device *dev, struct device_attribute *attr,
> >> +                const char *buf, size_t count)
> >> +{
> >> +       unsigned long val;
> >> +       struct pci_dev *pdev = to_pci_dev(dev);
> >> +
> >> +       if (strict_strtoul(buf, 0, &val) < 0)
> >> +               return -EINVAL;
> >> +
> >> +       pdev->max_vfs = val;
> >> +
> >> +       return count;
> >> +}
> > [...]
> >
> > Then what would actually trigger creation of the VFs?  There's no way we
> > can assume that some sysfs attribute will be written before the PF
> > driver is loaded (what if it's built-in?).  I thought the idea was to
> > add a driver callback that would be called when the sysfs attribute was
> > written.
> 
> could just stop the device and add it back again?

This is highly disruptive and I think it would be totally unacceptable
for at least networking devices.

Ben.

> just like updated patch.
> 
> Thanks
> 
> Yinghai

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-09-20  1:23                                                   ` Ben Hutchings
@ 2012-09-20  2:27                                                     ` Yinghai Lu
  2012-09-20  3:08                                                       ` Subhendu Ghosh
  2012-09-20 15:39                                                       ` Rose, Gregory V
  1 sibling, 1 reply; 52+ messages in thread
From: Yinghai Lu @ 2012-09-20  2:27 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Bjorn Helgaas, Greg Rose, Yuval Mintz, davem, netdev,
	Ariel Elior, Eilon Greenstein, linux-pci

On Wed, Sep 19, 2012 at 6:23 PM, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> On Wed, 2012-09-19 at 17:19 -0700, Yinghai Lu wrote:
>> could just stop the device and add it back again?
>
> This is highly disruptive and I think it would be totally unacceptable
> for at least networking devices.

So you want the PF network device to keep working while VFs are enabled on it?

Yinghai

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-09-20  2:27                                                     ` Yinghai Lu
@ 2012-09-20  3:08                                                       ` Subhendu Ghosh
  0 siblings, 0 replies; 52+ messages in thread
From: Subhendu Ghosh @ 2012-09-20  3:08 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ben Hutchings, Bjorn Helgaas, Greg Rose, Yuval Mintz, davem,
	netdev, Ariel Elior, Eilon Greenstein, linux-pci

On Wed, Sep 19, 2012 at 10:27 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Wed, Sep 19, 2012 at 6:23 PM, Ben Hutchings
> <bhutchings@solarflare.com> wrote:
>> On Wed, 2012-09-19 at 17:19 -0700, Yinghai Lu wrote:
>>> could just stop the device and add it back again?
>>
>> This is highly disruptive and I think it would be totally unacceptable
>> for at least networking devices.
>
> so you want PF network device continue work while enable VF on it?
>

Yes - in the normal scenario a system is installed through automation over
a network, and the installer's default initrd does not configure VFs.
Post-install, a management tool such as OpenStack Quantum is used to
enable VFs. If the PF NIC happens to be the communication path,
disrupting the PF during a remote configuration cycle can leave you
blind and the system unreachable.

-subhendu

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: New commands to configure IOV features
  2012-09-20  1:23                                                   ` Ben Hutchings
@ 2012-09-20 15:39                                                       ` Rose, Gregory V
  2012-09-20 15:39                                                       ` Rose, Gregory V
  1 sibling, 0 replies; 52+ messages in thread
From: Rose, Gregory V @ 2012-09-20 15:39 UTC (permalink / raw)
  To: Ben Hutchings, Yinghai Lu
  Cc: Bjorn Helgaas, Yuval Mintz, davem, netdev, Ariel Elior,
	Eilon Greenstein, linux-pci

> -----Original Message-----
> From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> Sent: Wednesday, September 19, 2012 6:23 PM
> To: Yinghai Lu
> Cc: Bjorn Helgaas; Rose, Gregory V; Yuval Mintz; davem@davemloft.net;
> netdev@vger.kernel.org; Ariel Elior; Eilon Greenstein; linux-pci
> Subject: Re: New commands to configure IOV features
> 
> On Wed, 2012-09-19 at 17:19 -0700, Yinghai Lu wrote:
> > On Wed, Sep 19, 2012 at 3:46 PM, Ben Hutchings
> > <bhutchings@solarflare.com> wrote:
> > > On Wed, 2012-09-19 at 15:17 -0700, Yinghai Lu wrote:
> > >> +max_vfs_store(struct device *dev, struct device_attribute *attr,
> > >> +                const char *buf, size_t count) {
> > >> +       unsigned long val;
> > >> +       struct pci_dev *pdev = to_pci_dev(dev);
> > >> +
> > >> +       if (strict_strtoul(buf, 0, &val) < 0)
> > >> +               return -EINVAL;
> > >> +
> > >> +       pdev->max_vfs = val;
> > >> +
> > >> +       return count;
> > >> +}
> > > [...]
> > >
> > > Then what would actually trigger creation of the VFs?  There's no
> > > way we can assume that some sysfs attribute will be written before
> > > the PF driver is loaded (what if it's built-in?).  I thought the
> > > idea was to add a driver callback that would be called when the
> > > sysfs attribute was written.
> >
> > could just stop the device and add it back again?
> 
> This is highly disruptive and I think it would be totally unacceptable for
> at least networking devices.

Agreed.

We need the driver callback.

- Greg


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-09-20 15:39                                                       ` Rose, Gregory V
  (?)
@ 2012-09-21  5:50                                                       ` Yinghai Lu
  2012-09-21 17:35                                                         ` Ben Hutchings
  2012-09-21 18:06                                                           ` Don Dutile
  -1 siblings, 2 replies; 52+ messages in thread
From: Yinghai Lu @ 2012-09-21  5:50 UTC (permalink / raw)
  To: Rose, Gregory V
  Cc: Ben Hutchings, Bjorn Helgaas, Yuval Mintz, davem, netdev,
	Ariel Elior, Eilon Greenstein, linux-pci

[-- Attachment #1: Type: text/plain, Size: 486 bytes --]

On Thu, Sep 20, 2012 at 8:39 AM, Rose, Gregory V
<gregory.v.rose@intel.com> wrote:
>> -----Original Message-----
>> From: Ben Hutchings [mailto:bhutchings@solarflare.com]
>> >
>> > could just stop the device and add it back again?
>>
>> This is highly disruptive and I think it would be totally unacceptable for
>> at least networking devices.
>
> Agreed.
>
> We need the driver callback.

OK, something like the attached patches?

The ixgbe change needs more cleanup from the ixgbe folks.

-Yinghai

[-- Attachment #2: pci_dev_type.patch --]
[-- Type: application/octet-stream, Size: 2044 bytes --]

Subject: [PATCH] PCI: Add pci_dev_type

Later it will be used for visible attribute control.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/pci-sysfs.c |   24 ++++++++++++++++++++++++
 drivers/pci/pci.h       |    1 +
 drivers/pci/probe.c     |    1 +
 3 files changed, 26 insertions(+)

Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -1412,3 +1412,27 @@ static int __init pci_sysfs_init(void)
 }
 
 late_initcall(pci_sysfs_init);
+
+static struct attribute *pci_dev_dev_attrs[] = {
+	NULL,
+};
+
+static umode_t pci_dev_attrs_are_visible(struct kobject *kobj,
+						struct attribute *a, int n)
+{
+	return a->mode;
+}
+
+static struct attribute_group pci_dev_attr_group = {
+	.attrs = pci_dev_dev_attrs,
+	.is_visible = pci_dev_attrs_are_visible,
+};
+
+static const struct attribute_group *pci_dev_attr_groups[] = {
+	&pci_dev_attr_group,
+	NULL,
+};
+
+struct device_type pci_dev_type = {
+	.groups = pci_dev_attr_groups,
+};
Index: linux-2.6/drivers/pci/pci.h
===================================================================
--- linux-2.6.orig/drivers/pci/pci.h
+++ linux-2.6/drivers/pci/pci.h
@@ -158,6 +158,7 @@ static inline int pci_no_d1d2(struct pci
 }
 extern struct device_attribute pci_dev_attrs[];
 extern struct device_attribute pcibus_dev_attrs[];
+extern struct device_type pci_dev_type;
 #ifdef CONFIG_HOTPLUG
 extern struct bus_attribute pci_bus_attrs[];
 #else
Index: linux-2.6/drivers/pci/probe.c
===================================================================
--- linux-2.6.orig/drivers/pci/probe.c
+++ linux-2.6/drivers/pci/probe.c
@@ -1277,6 +1277,7 @@ int pci_setup_device(struct pci_dev *dev
 	dev->sysdata = dev->bus->sysdata;
 	dev->dev.parent = dev->bus->bridge;
 	dev->dev.bus = &pci_bus_type;
+	dev->dev.type = &pci_dev_type;
 	dev->hdr_type = hdr_type & 0x7f;
 	dev->multifunction = !!(hdr_type & 0x80);
 	dev->error_state = pci_channel_io_normal;

[-- Attachment #3: pci_010_pci_dev_boot_vga_x.patch --]
[-- Type: application/octet-stream, Size: 1868 bytes --]

Subject: [PATCH 10/10] PCI, sysfs: Use is_visible() with the boot_vga attribute for pci_dev

That makes pci_create_sysfs_dev_files() simpler.

Also fix a possible memory leak in the removal path.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/pci-sysfs.c |   21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -1304,29 +1304,20 @@ int __must_check pci_create_sysfs_dev_fi
 		pdev->rom_attr = attr;
 	}
 
-	if ((pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA) {
-		retval = device_create_file(&pdev->dev, &vga_attr);
-		if (retval)
-			goto err_rom_file;
-	}
-
 	/* add platform-specific attributes */
 	retval = pcibios_add_platform_entries(pdev);
 	if (retval)
-		goto err_vga_file;
+		goto err_rom_file;
 
 	/* add sysfs entries for various capabilities */
 	retval = pci_create_capabilities_sysfs(pdev);
 	if (retval)
-		goto err_vga_file;
+		goto err_rom_file;
 
 	pci_create_firmware_label_files(pdev);
 
 	return 0;
 
-err_vga_file:
-	if ((pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
-		device_remove_file(&pdev->dev, &vga_attr);
 err_rom_file:
 	if (rom_size) {
 		sysfs_remove_bin_file(&pdev->dev.kobj, pdev->rom_attr);
@@ -1414,12 +1405,20 @@ static int __init pci_sysfs_init(void)
 late_initcall(pci_sysfs_init);
 
 static struct attribute *pci_dev_dev_attrs[] = {
+	&vga_attr.attr,
 	NULL,
 };
 
 static umode_t pci_dev_attrs_are_visible(struct kobject *kobj,
 						struct attribute *a, int n)
 {
+	struct device *dev = container_of(kobj, struct device, kobj);
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	if (a == &vga_attr.attr)
+		if ((pdev->class >> 8) != PCI_CLASS_DISPLAY_VGA)
+			return 0;
+
 	return a->mode;
 }
 

[-- Attachment #4: pci_drv_set_max_vfs.patch --]
[-- Type: application/octet-stream, Size: 830 bytes --]

---
 include/linux/pci.h |    1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -663,6 +663,7 @@ struct pci_driver {
 	const struct pci_device_id *id_table;	/* must be non-NULL for probe to be called */
 	int  (*probe)  (struct pci_dev *dev, const struct pci_device_id *id);	/* New device inserted */
 	void (*remove) (struct pci_dev *dev);	/* Device removed (NULL if not a hot-plug capable driver) */
+	void (*set_max_vfs) (struct pci_dev *dev); /* enable sriov */
 	int  (*suspend) (struct pci_dev *dev, pm_message_t state);	/* Device suspended */
 	int  (*suspend_late) (struct pci_dev *dev, pm_message_t state);
 	int  (*resume_early) (struct pci_dev *dev);

[-- Attachment #5: sriov_xxx.patch --]
[-- Type: application/octet-stream, Size: 2657 bytes --]

Subject: [PATCH] PCI: Add max_vfs in sysfs for PCI devices that support it

Drivers will later need to check dev->max_vfs in /sys.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/pci-sysfs.c |   53 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h     |    1 
 2 files changed, 54 insertions(+)

Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -459,6 +459,54 @@ boot_vga_show(struct device *dev, struct
 }
 struct device_attribute vga_attr = __ATTR_RO(boot_vga);
 
+static ssize_t
+max_vfs_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	return sprintf(buf, "%u\n", pdev->max_vfs);
+}
+
+static void max_vfs_callback(struct device *dev)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	mutex_lock(&pci_remove_rescan_mutex);
+	if (pdev->is_added && dev->driver){
+		struct pci_driver *pdrv;
+
+		pdrv = to_pci_driver(dev->driver);
+		if (pdrv->set_max_vfs)
+			pdrv->set_max_vfs(pdev);
+
+	}
+	mutex_unlock(&pci_remove_rescan_mutex);
+}
+static ssize_t
+max_vfs_store(struct device *dev, struct device_attribute *attr,
+		 const char *buf, size_t count)
+{
+	unsigned long val;
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	if (strict_strtoul(buf, 0, &val) < 0)
+		return -EINVAL;
+
+	pdev->max_vfs = val;
+
+	if (pdev->is_added) {
+		int err;
+		err = device_schedule_callback(dev, max_vfs_callback);
+
+		if (err)
+			return err;
+	}
+
+	return count;
+}
+struct device_attribute max_vfs_attr =
+	__ATTR(max_vfs, 0644, max_vfs_show, max_vfs_store);
+
 static void
 pci_config_pm_runtime_get(struct pci_dev *pdev)
 {
@@ -1406,6 +1454,7 @@ late_initcall(pci_sysfs_init);
 
 static struct attribute *pci_dev_dev_attrs[] = {
 	&vga_attr.attr,
+	&max_vfs_attr.attr,
 	NULL,
 };
 
@@ -1419,6 +1468,10 @@ static umode_t pci_dev_attrs_are_visible
 		if ((pdev->class >> 8) != PCI_CLASS_DISPLAY_VGA)
 			return 0;
 
+	if (a == &max_vfs_attr.attr)
+		if (!pdev->is_physfn)
+			return 0;
+
 	return a->mode;
 }
 
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -359,6 +359,7 @@ struct pci_dev {
 	unsigned int	broken_intx_masking:1;
 	unsigned int	io_window_1k:1;	/* Intel P2P bridge 1K I/O windows */
 	pci_dev_flags_t dev_flags;
+	unsigned int	max_vfs;
 	atomic_t	enable_cnt;	/* pci_enable_device has been called */
 
 	u32		saved_config_space[16]; /* config space saved at suspend time */

[-- Attachment #6: ixgbe_max_vfs.patch --]
[-- Type: application/octet-stream, Size: 3087 bytes --]

---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |    2 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   43 ++++++++++++++++++++------
 2 files changed, 36 insertions(+), 9 deletions(-)

Index: linux-2.6/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
===================================================================
--- linux-2.6.orig/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ linux-2.6/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -129,13 +129,6 @@ static struct notifier_block dca_notifie
 };
 #endif
 
-#ifdef CONFIG_PCI_IOV
-static unsigned int max_vfs;
-module_param(max_vfs, uint, 0);
-MODULE_PARM_DESC(max_vfs,
-		 "Maximum number of virtual functions to allocate per physical function - default is zero and maximum value is 63");
-#endif /* CONFIG_PCI_IOV */
-
 static unsigned int allow_unsupported_sfp;
 module_param(allow_unsupported_sfp, uint, 0);
 MODULE_PARM_DESC(allow_unsupported_sfp,
@@ -4528,7 +4521,7 @@ static int __devinit ixgbe_sw_init(struc
 #ifdef CONFIG_PCI_IOV
 	/* assign number of SR-IOV VFs */
 	if (hw->mac.type != ixgbe_mac_82598EB)
-		adapter->num_vfs = (max_vfs > 63) ? 0 : max_vfs;
+		adapter->num_vfs = (pdev->max_vfs > 63) ? 0 : pdev->max_vfs;
 
 #endif
 	/* enable itr by default in dynamic mode */
@@ -7249,8 +7242,9 @@ static int __devinit ixgbe_probe(struct
 
 #ifdef CONFIG_PCI_IOV
 	ixgbe_enable_sriov(adapter, ii);
-
 #endif
+	adapter->ixgbe_info = ii;
+
 	netdev->features = NETIF_F_SG |
 			   NETIF_F_IP_CSUM |
 			   NETIF_F_IPV6_CSUM |
@@ -7720,11 +7714,42 @@ static const struct pci_error_handlers i
 	.resume = ixgbe_io_resume,
 };
 
+static void ixgbe_set_max_vfs(struct pci_dev *pdev)
+{
+#ifdef CONFIG_PCI_IOV
+	struct ixgbe_adapter *adapter = pci_get_drvdata(pdev);
+	struct net_device *netdev = adapter->netdev;
+	struct ixgbe_hw *hw = &adapter->hw;
+	int num_vfs = 0;
+
+	/* assign number of SR-IOV VFs */
+	if (hw->mac.type != ixgbe_mac_82598EB)
+		num_vfs = (pdev->max_vfs > 63) ? 0 : pdev->max_vfs;
+
+	/* no change */
+	if (adapter->num_vfs == num_vfs)
+		return;
+
+	if (!num_vfs) {
+		/* disable sriov */
+		ixgbe_disable_sriov(adapter);
+		adapter->num_vfs = 0;
+	} else if (!adapter->num_vfs && num_vfs) {
+		/* enable sriov */
+		adapter->num_vfs = num_vfs;
+		ixgbe_enable_sriov(adapter, adapter->ixgbe_info);
+	} else {
+		/* increase or decrease */
+	}
+#endif
+}
+
 static struct pci_driver ixgbe_driver = {
 	.name     = ixgbe_driver_name,
 	.id_table = ixgbe_pci_tbl,
 	.probe    = ixgbe_probe,
 	.remove   = __devexit_p(ixgbe_remove),
+	.set_max_vfs = ixgbe_set_max_vfs,
 #ifdef CONFIG_PM
 	.suspend  = ixgbe_suspend,
 	.resume   = ixgbe_resume,
Index: linux-2.6/drivers/net/ethernet/intel/ixgbe/ixgbe.h
===================================================================
--- linux-2.6.orig/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ linux-2.6/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -571,6 +571,8 @@ struct ixgbe_adapter {
 	u32 interrupt_event;
 	u32 led_reg;
 
+	struct ixgbe_info *ixgbe_info;
+
 #ifdef CONFIG_IXGBE_PTP
 	struct ptp_clock *ptp_clock;
 	struct ptp_clock_info ptp_caps;

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-09-21  5:50                                                       ` Yinghai Lu
@ 2012-09-21 17:35                                                         ` Ben Hutchings
  2012-09-21 19:23                                                           ` Yinghai Lu
  2012-09-21 18:06                                                           ` Don Dutile
  1 sibling, 1 reply; 52+ messages in thread
From: Ben Hutchings @ 2012-09-21 17:35 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Rose, Gregory V, Bjorn Helgaas, Yuval Mintz, davem, netdev,
	Ariel Elior, Eilon Greenstein, linux-pci

On Thu, 2012-09-20 at 22:50 -0700, Yinghai Lu wrote:
[...]
> Index: linux-2.6/include/linux/pci.h
> ===================================================================
> --- linux-2.6.orig/include/linux/pci.h
> +++ linux-2.6/include/linux/pci.h
> @@ -663,6 +663,7 @@ struct pci_driver {
>         const struct pci_device_id *id_table;   /* must be non-NULL for probe to be called */
>         int  (*probe)  (struct pci_dev *dev, const struct pci_device_id *id);   /* New device inserted */
>         void (*remove) (struct pci_dev *dev);   /* Device removed (NULL if not a hot-plug capable driver) */
> +       void (*set_max_vfs) (struct pci_dev *dev); /* enable sriov */
>         int  (*suspend) (struct pci_dev *dev, pm_message_t state);      /* Device suspended */
>         int  (*suspend_late) (struct pci_dev *dev, pm_message_t state);
>         int  (*resume_early) (struct pci_dev *dev);

Not sure about this; the driver may fail to enable those VFs and it
would be nice to be able to report that failure directly.

That said, this is 'max_vfs' (a limit) and if the driver fails to set up
all the VFs then the *limit* may still be changed.

> Subject: [PATCH] PCI: Add max_vfs in sysfs for PCI devices that support it
> 
> Drivers will later need to check dev->max_vfs in /sys.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[...]
> +static ssize_t
> +max_vfs_store(struct device *dev, struct device_attribute *attr,
> +                const char *buf, size_t count)
> +{
> +       unsigned long val;
> +       struct pci_dev *pdev = to_pci_dev(dev);
> +
> +       if (strict_strtoul(buf, 0, &val) < 0)
> +               return -EINVAL;
> +
> +       pdev->max_vfs = val;
> +
> +       if (pdev->is_added) {

No locking required here?

> +               int err;
> +               err = device_schedule_callback(dev, max_vfs_callback);

Any particular reason this should be a callback?

[...]
> Index: linux-2.6/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> ===================================================================
> --- linux-2.6.orig/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ linux-2.6/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -129,13 +129,6 @@ static struct notifier_block dca_notifie
>  };
>  #endif
>  
> -#ifdef CONFIG_PCI_IOV
> -static unsigned int max_vfs;
> -module_param(max_vfs, uint, 0);
> -MODULE_PARM_DESC(max_vfs,
> -                "Maximum number of virtual functions to allocate per
> physical function - default is zero and maximum value is 63");
> -#endif /* CONFIG_PCI_IOV */
> -
>  static unsigned int allow_unsupported_sfp;
>  module_param(allow_unsupported_sfp, uint, 0);
>  MODULE_PARM_DESC(allow_unsupported_sfp,
> @@ -4528,7 +4521,7 @@ static int __devinit ixgbe_sw_init(struc
>  #ifdef CONFIG_PCI_IOV
>         /* assign number of SR-IOV VFs */
>         if (hw->mac.type != ixgbe_mac_82598EB)
> -               adapter->num_vfs = (max_vfs > 63) ? 0 : max_vfs;
> +               adapter->num_vfs = (pdev->max_vfs > 63) ? 0 : pdev->max_vfs;

We are trying to make all SR-IOV capable drivers work the same, so this
weird limiting behaviour should not be retained.

So I think the correct assignment is:
		adapter->num_vfs = min(pdev->max_vfs, 63);

>  #endif
>         /* enable itr by default in dynamic mode */
> @@ -7249,8 +7242,9 @@ static int __devinit ixgbe_probe(struct
>  
>  #ifdef CONFIG_PCI_IOV
>         ixgbe_enable_sriov(adapter, ii);
> -
>  #endif
> +       adapter->ixgbe_info = ii;
> +
>         netdev->features = NETIF_F_SG |
>                            NETIF_F_IP_CSUM |
>                            NETIF_F_IPV6_CSUM |
> @@ -7720,11 +7714,42 @@ static const struct pci_error_handlers i
>         .resume = ixgbe_io_resume,
>  };
>  
> +static void ixgbe_set_max_vfs(struct pci_dev *pdev)
> +{
> +#ifdef CONFIG_PCI_IOV
> +       struct ixgbe_adapter *adapter = pci_get_drvdata(pdev);
> +       struct net_device *netdev = adapter->netdev;
> +       struct ixgbe_hw *hw = &adapter->hw;
> +       int num_vfs = 0;
> +
> +       /* assign number of SR-IOV VFs */
> +       if (hw->mac.type != ixgbe_mac_82598EB)
> +               num_vfs = (pdev->max_vfs > 63) ? 0 : pdev->max_vfs;
> +
> +       /* no change */
> +       if (adapter->num_vfs == num_vfs)
> +               return;
> +
> +       if (!num_vfs) {
> +               /* disable sriov */
> +               ixgbe_disable_sriov(adapter);
> +               adapter->num_vfs = 0;
> +       } else if (!adapter->num_vfs && num_vfs) {
> +               /* enable sriov */
> +               adapter->num_vfs = num_vfs;
> +               ixgbe_enable_sriov(adapter, adapter->ixgbe_info);
> +       } else {
> +               /* increase or decrease */

Indeed, increase or decrease is not supported either in our PCI API or
in the SR-IOV spec.  I think I would prefer the PCI core to filter out
unsupported changes (i.e. not call set_max_vfs and maybe report an
error), but I'm not sure about that.

Ben.

> +       }
> +#endif
> +}
> +
[...]

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-09-21  5:50                                                       ` Yinghai Lu
@ 2012-09-21 18:06                                                           ` Don Dutile
  2012-09-21 18:06                                                           ` Don Dutile
  1 sibling, 0 replies; 52+ messages in thread
From: Don Dutile @ 2012-09-21 18:06 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Rose, Gregory V, Ben Hutchings, Bjorn Helgaas, Yuval Mintz,
	davem, netdev, Ariel Elior, Eilon Greenstein, linux-pci

On 09/21/2012 01:50 AM, Yinghai Lu wrote:
> On Thu, Sep 20, 2012 at 8:39 AM, Rose, Gregory V
> <gregory.v.rose@intel.com>  wrote:
>>> -----Original Message-----
>>> From: Ben Hutchings [mailto:bhutchings@solarflare.com]
>>>>
>>>> could just stop the device and add it back again?
>>>
>>> This is highly disruptive and I think it would be totally unacceptable for
>>> at least networking devices.
>>
>> Agreed.
>>
>> We need the driver callback.
>
> ok, something like attached patches?
>
> ixgbe change need more cleanup from ixgbe guys.
>
> -Yinghai
yuk, no.
I have a set of patches almost done.
It provides 3 sysfs files (enable, disable, num_max_vfs);
callbacks for enable & disable.

i'm tied up until Monday on RHEL6, then I'll switch gears & post a set of patches.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-09-21 17:35                                                         ` Ben Hutchings
@ 2012-09-21 19:23                                                           ` Yinghai Lu
  0 siblings, 0 replies; 52+ messages in thread
From: Yinghai Lu @ 2012-09-21 19:23 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Rose, Gregory V, Bjorn Helgaas, Yuval Mintz, davem, netdev,
	Ariel Elior, Eilon Greenstein, linux-pci

On Fri, Sep 21, 2012 at 10:35 AM, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> On Thu, 2012-09-20 at 22:50 -0700, Yinghai Lu wrote:
> [...]
>> Index: linux-2.6/include/linux/pci.h
>> ===================================================================
>> --- linux-2.6.orig/include/linux/pci.h
>> +++ linux-2.6/include/linux/pci.h
>> @@ -663,6 +663,7 @@ struct pci_driver {
>>         const struct pci_device_id *id_table;   /* must be non-NULL for probe to be called */
>>         int  (*probe)  (struct pci_dev *dev, const struct pci_device_id *id);   /* New device inserted */
>>         void (*remove) (struct pci_dev *dev);   /* Device removed (NULL if not a hot-plug capable driver) */
>> +       void (*set_max_vfs) (struct pci_dev *dev); /* enable sriov */
>>         int  (*suspend) (struct pci_dev *dev, pm_message_t state);      /* Device suspended */
>>         int  (*suspend_late) (struct pci_dev *dev, pm_message_t state);
>>         int  (*resume_early) (struct pci_dev *dev);
>
> Not sure about this; the driver may fail to enable those VFs and it
> would be nice to be able to report that failure directly.
>
> That said, this is 'max_vfs' (a limit) and if the driver fails to set up
> all the VFs then the *limit* may still be changed.
>
>> Subject: [PATCH] PCI: Add max_vfs in sysfs for PCI devices that support it
>>
>> Drivers will later need to check dev->max_vfs in /sys.
>>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> [...]
>> +static ssize_t
>> +max_vfs_store(struct device *dev, struct device_attribute *attr,
>> +                const char *buf, size_t count)
>> +{
>> +       unsigned long val;
>> +       struct pci_dev *pdev = to_pci_dev(dev);
>> +
>> +       if (strict_strtoul(buf, 0, &val) < 0)
>> +               return -EINVAL;
>> +
>> +       pdev->max_vfs = val;
>> +
>> +       if (pdev->is_added) {
>
> No locking required here?

should be removed.

>
>> +               int err;
>> +               err = device_schedule_callback(dev, max_vfs_callback);
>
> Any particular reason this should be a callback?

no, just copied that from bus rescan.

>
> [...]
>> Index: linux-2.6/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>> +++ linux-2.6/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>> @@ -129,13 +129,6 @@ static struct notifier_block dca_notifie
>>  };
>>  #endif
>>
>> -#ifdef CONFIG_PCI_IOV
>> -static unsigned int max_vfs;
>> -module_param(max_vfs, uint, 0);
>> -MODULE_PARM_DESC(max_vfs,
>> -                "Maximum number of virtual functions to allocate per
>> physical function - default is zero and maximum value is 63");
>> -#endif /* CONFIG_PCI_IOV */
>> -
>>  static unsigned int allow_unsupported_sfp;
>>  module_param(allow_unsupported_sfp, uint, 0);
>>  MODULE_PARM_DESC(allow_unsupported_sfp,
>> @@ -4528,7 +4521,7 @@ static int __devinit ixgbe_sw_init(struc
>>  #ifdef CONFIG_PCI_IOV
>>         /* assign number of SR-IOV VFs */
>>         if (hw->mac.type != ixgbe_mac_82598EB)
>> -               adapter->num_vfs = (max_vfs > 63) ? 0 : max_vfs;
>> +               adapter->num_vfs = (pdev->max_vfs > 63) ? 0 : pdev->max_vfs;
>
> We are trying to make all SR-IOV capable drivers work the same, so this
> weird limiting behaviour should not be retained.
>
> So I think the correct assignment is:
>                 adapter->num_vfs = min(pdev->max_vfs, 63);

this will treat >63 as 63; the old one treated >63 as 0, aka disabled.

looks like your version is more reasonable...

>
>>  #endif
>>         /* enable itr by default in dynamic mode */
>> @@ -7249,8 +7242,9 @@ static int __devinit ixgbe_probe(struct
>>
>>  #ifdef CONFIG_PCI_IOV
>>         ixgbe_enable_sriov(adapter, ii);
>> -
>>  #endif
>> +       adapter->ixgbe_info = ii;
>> +
>>         netdev->features = NETIF_F_SG |
>>                            NETIF_F_IP_CSUM |
>>                            NETIF_F_IPV6_CSUM |
>> @@ -7720,11 +7714,42 @@ static const struct pci_error_handlers i
>>         .resume = ixgbe_io_resume,
>>  };
>>
>> +static void ixgbe_set_max_vfs(struct pci_dev *pdev)
>> +{
>> +#ifdef CONFIG_PCI_IOV
>> +       struct ixgbe_adapter *adapter = pci_get_drvdata(pdev);
>> +       struct net_device *netdev = adapter->netdev;
>> +       struct ixgbe_hw *hw = &adapter->hw;
>> +       int num_vfs = 0;
>> +
>> +       /* assign number of SR-IOV VFs */
>> +       if (hw->mac.type != ixgbe_mac_82598EB)
>> +               num_vfs = (pdev->max_vfs > 63) ? 0 : pdev->max_vfs;
>> +
>> +       /* no change */
>> +       if (adapter->num_vfs == num_vfs)
>> +               return;
>> +
>> +       if (!num_vfs) {
>> +               /* disable sriov */
>> +               ixgbe_disable_sriov(adapter);
>> +               adapter->num_vfs = 0;
>> +       } else if (!adapter->num_vfs && num_vfs) {
>> +               /* enable sriov */
>> +               adapter->num_vfs = num_vfs;
>> +               ixgbe_enable_sriov(adapter, adapter->ixgbe_info);
>> +       } else {
>> +               /* increase or decrease */
>
> Indeed, increase or decrease is not supported either in our PCI API or
> in the SR-IOV spec.  I think I would prefer the PCI core to filter out
> unsupported changes (i.e. not call set_max_vfs and maybe report an
> error), but I'm not sure about that.

should still call set_max_vfs, and let it settle on the final valid max_vfs.

the pci driver should know better which max_vfs is right for the exact ...

-Yinghai

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-09-21 18:06                                                           ` Don Dutile
  (?)
@ 2012-09-21 19:49                                                           ` Yinghai Lu
  2012-09-21 20:08                                                               ` Don Dutile
  -1 siblings, 1 reply; 52+ messages in thread
From: Yinghai Lu @ 2012-09-21 19:49 UTC (permalink / raw)
  To: Don Dutile
  Cc: Rose, Gregory V, Ben Hutchings, Bjorn Helgaas, Yuval Mintz,
	davem, netdev, Ariel Elior, Eilon Greenstein, linux-pci

On Fri, Sep 21, 2012 at 11:06 AM, Don Dutile <ddutile@redhat.com> wrote:
>>
>> ok, something like attached patches?
>>
>> ixgbe change need more cleanup from ixgbe guys.
>>
> yuk, no.
> I have a set of patches almost done.

good. includes update to those network drivers?

> It provides 3 sysfs files (enable, disable, num_max_vfs);
> callbacks for enable & disable.

why using three? only pass max_vfs should be enough...

aka pass 0 mean disabling, pass other value mean enabling.

>
> i'm tied up until Monday on RHEL6, then I'll switch gears & post a set of
> patches.

so is that your employer's internal policy? RHEL 6 kernel first,
then upstream kernel?
thought RH only accepts backporting patches that are already upstream.

-Yinghai

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-09-21 19:49                                                           ` Yinghai Lu
@ 2012-09-21 20:08                                                               ` Don Dutile
  0 siblings, 0 replies; 52+ messages in thread
From: Don Dutile @ 2012-09-21 20:08 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Rose, Gregory V, Ben Hutchings, Bjorn Helgaas, Yuval Mintz,
	davem, netdev, Ariel Elior, Eilon Greenstein, linux-pci

On 09/21/2012 03:49 PM, Yinghai Lu wrote:
> On Fri, Sep 21, 2012 at 11:06 AM, Don Dutile <ddutile@redhat.com> wrote:
>>>
>>> ok, something like attached patches?
>>>
>>> ixgbe change need more cleanup from ixgbe guys.
>>>
>> yuk, no.
>> I have a set of patches almost done.
>
> good. includes update to those network drivers?
>
>> It provides 3 sysfs files (enable, disable, num_max_vfs);
>> callbacks for enable & disable.
>
> why using three? only pass max_vfs should be enough...
>
> aka pass 0 mean disabling, pass other value mean enabling.
>
could do that.  but I wouldn't use 'max_vfs'; I would recommend
'num_vfs', as max implies, er, um, the maximum, and what one
wants is to be able to enable a number from 1->max_vfs.
'max_vfs' will be provided by another file.

>>
>> i'm tied up until Monday on RHEL6, then I'll switch gears & post a set of
>> patches.
>
> so is that your employer's internal policy? RHEL 6 kernel first,
> then upstream kernel?
No, I have deadlines for RHEL6 for *other work* until Monday.  After that,
I can re-focus on upstream work.  Some of us actually have other work than
just upstream.... crazy talk, I know! ;-)

> thought RH only accepts backporting patches that are already upstream.
>
yup. 'upstream first' is the rule at RH before it gets into a RHEL release...
and sometimes that isn't sufficient!

Have a beer, enjoy the weekend, posting forthcoming next week!

> -Yinghai

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-09-21 20:08                                                               ` Don Dutile
  (?)
@ 2012-09-23 15:49                                                               ` Yuval Mintz
  2012-09-24 17:37                                                                   ` Don Dutile
  -1 siblings, 1 reply; 52+ messages in thread
From: Yuval Mintz @ 2012-09-23 15:49 UTC (permalink / raw)
  To: Don Dutile
  Cc: Yinghai Lu, Rose, Gregory V, Ben Hutchings, Bjorn Helgaas, davem,
	netdev, Ariel Elior, Eilon Greenstein, linux-pci

>>> It provides 3 sysfs files (enable, disable, num_max_vfs);
>>> callbacks for enable & disable.
>>
>> why using three? only pass max_vfs should be enough...
>>
>> aka pass 0 mean disabling, pass other value mean enabling.
>>
> could do that.  but I wouldn't use 'max_vfs'; I would recommend
> 'num_vfs', as max implies, er, um, the maximum, and what one
> wants is to be able to enable a number from 1->max_vfs.
> 'max_vfs' will be provided by another file.
> 

If we're at it, how do you suggest the configurations of the VFs
resources/properties should be made?
Notice I'm asking about properties which relate to all PCI devices,
E.g., number of interrupts assigned to each VF (for VF RSS).

(For properties which relate only to networking devices, we could
use standard network API such as netlink).

Perhaps given a configurable property that applies to all PCI devices,
once SRIOV is enabled and the VFs are created, there would be an
additional sysfs node under the VFs through which the user could
configure that property.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-09-23 15:49                                                               ` Yuval Mintz
@ 2012-09-24 17:37                                                                   ` Don Dutile
  0 siblings, 0 replies; 52+ messages in thread
From: Don Dutile @ 2012-09-24 17:37 UTC (permalink / raw)
  To: Yuval Mintz
  Cc: Yinghai Lu, Rose, Gregory V, Ben Hutchings, Bjorn Helgaas, davem,
	netdev, Ariel Elior, Eilon Greenstein, linux-pci

On 09/23/2012 11:49 AM, Yuval Mintz wrote:
>>>> It provides 3 sysfs files (enable, disable, num_max_vfs);
>>>> callbacks for enable & disable.
>>>
>>> why using three? only pass max_vfs should be enough...
>>>
>>> aka pass 0 mean disabling, pass other value mean enabling.
>>>
>> could do that.  but I wouldn't use 'max_vfs'; I would recommend
>> 'num_vfs', as max implies, er, um, the maximum, and what one
>> wants is to be able to enable a number from 1->max_vfs.
>> 'max_vfs' will be provided by another file.
>>
>
> If we're at it, how do you suggest the configurations of the VFs
> resources/properties should be made?
> Notice I'm asking about properties which relate to all PCI devices,
> E.g., number of interrupts assigned to each VF (for VF RSS).
>
This issue/question is independent of VFs & SR-IOV.
It's a generic problem for PFs and how many (msi) intr's a device
is given, based on its request, i.e., system provides 8 MSI per cpu
per device ... how are those 8 assigned/used by the device? ...
how to free 4 of the 8 if one device doesn't use them, and another
device could use them..   cache/memory affinity to intr's.... the list
goes on...

> (For properties which relate only to networking devices, we could
> use standard network API such as netlink).
>
> Perhaps given a configurable property that applies to all PCI devices,
A VF is 'just another PCI device', so configurable properties
for all PCI devices should encompass VF/SRIOV devices equally.

> once SRIOV is enabled and the VFs are created, there would be an
> additional sysfs node under the VFs through which the user could
> configure that property.
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
@ 2012-09-24 17:37                                                                   ` Don Dutile
  0 siblings, 0 replies; 52+ messages in thread
From: Don Dutile @ 2012-09-24 17:37 UTC (permalink / raw)
  To: Yuval Mintz
  Cc: Yinghai Lu, Rose, Gregory V, Ben Hutchings, Bjorn Helgaas, davem,
	netdev, Ariel Elior, Eilon Greenstein, linux-pci

On 09/23/2012 11:49 AM, Yuval Mintz wrote:
>>>> It provides 3 sysfs files (enable, disable, num_max_vfs);
>>>> callbacks for enable&   disable.
>>>
>>> why using three? only pass max_vfs should be enough...
>>>
>>> aka pass 0 mean disabling, pass other value mean enabling.
>>>
>> could do that.  but I wouldn't use 'max_vfs'; I would recommend
>> 'num_vfs', as max implies, er, um, the maximum, and what one
>> wants is to be able to enable a number from 1->max_vfs.
>> 'max_vfs' will be provided by another file.
>>
>
> If we're at it, how do you suggest the configurations of the VFs
> resources/properties should be made?
> Notice I'm asking about properties which relate to all PCI devices,
> E.g., number of interrupts assigned to each VF (for VF RSS).
>
This issue/question is independent of VFs & SRIOV.
It's a generic problem for PFs as well: how many (MSI) interrupts a
device is given, based on its request. E.g., the system provides 8 MSI
vectors per cpu per device... how are those 8 assigned/used by the
device? how to free 4 of the 8 if one device doesn't use them and
another device could use them? cache/memory affinity of interrupts...
the list goes on...

> (For properties which relate only to networking devices, we could
> use a standard network API such as netlink.)
>
> Perhaps given a configurable property that applies to all PCI devices,
A VF is 'just another PCI device', so configurable properties
for all PCI devices should encompass VF/SRIOV devices equally.

> once SRIOV is enabled and the VFs are created, there would be an
> additional sysfs node under the VFs through which the user could
> configure that property.
>
>
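The single-file scheme debated above (write 0 to disable SR-IOV, a
nonzero count to enable that many VFs) can be sketched against a mock
directory. The file names 'sriov_max_vfs' and 'num_vfs' are taken from
this thread and are assumptions, not a committed ABI; a throwaway temp
directory stands in for a real PF's sysfs node:

```shell
# Mock of the proposed per-device SR-IOV sysfs flow.  A temp directory
# stands in for /sys/bus/pci/devices/<DOMAIN:B:D.F>, since no real PF
# is assumed; file names follow the discussion, not a final ABI.
dev=$(mktemp -d)/0000:01:00.0
mkdir -p "$dev"
echo 7 > "$dev/sriov_max_vfs"      # read-only limit the PF would report
max=$(cat "$dev/sriov_max_vfs")
want=4
if [ "$want" -le "$max" ]; then
    echo "$want" > "$dev/num_vfs"  # nonzero: enable that many VFs
else
    echo "refusing: $want > $max VFs" >&2
fi
cat "$dev/num_vfs"                 # prints 4
echo 0 > "$dev/num_vfs"            # zero: disable SR-IOV again
```

(The interface that eventually went upstream follows a similar
write-a-count model, via 'sriov_numvfs' alongside a read-only
'sriov_totalvfs'.)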


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-09-21 20:08                                                               ` Don Dutile
@ 2012-09-30  6:39                                                               ` Yuval Mintz
  2012-10-01 14:12                                                                   ` Don Dutile
  -1 siblings, 1 reply; 52+ messages in thread
From: Yuval Mintz @ 2012-09-30  6:39 UTC (permalink / raw)
  To: Don Dutile
  Cc: Yinghai Lu, Rose, Gregory V, Ben Hutchings, Bjorn Helgaas, davem,
	netdev, Ariel Elior, Eilon Greenstein, linux-pci

>>> yuk, no.
>>> I have a set of patches almost done.
>>> i'm tied up until Monday on RHEL6, then I'll switch gears & post a
>>> set of patches.

hi Don - any news on this upcoming patch series?

>>
>> so that is your employer's internal policy? RHEL6 kernel first,
>> then upstream kernel?
> No, I have deadlines for RHEL6 for *other work* until Monday.  After that,
> I can re-focus on upstream work.  Some of us actually have other work than
> just upstream.... crazy talk, I know! ;-)

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: New commands to configure IOV features
  2012-09-30  6:39                                                               ` Yuval Mintz
@ 2012-10-01 14:12                                                                   ` Don Dutile
  0 siblings, 0 replies; 52+ messages in thread
From: Don Dutile @ 2012-10-01 14:12 UTC (permalink / raw)
  To: Yuval Mintz
  Cc: Yinghai Lu, Rose, Gregory V, Ben Hutchings, Bjorn Helgaas, davem,
	netdev, Ariel Elior, Eilon Greenstein, linux-pci

On 09/30/2012 02:39 AM, Yuval Mintz wrote:
>>>> yuk, no.
>>>> I have a set of patches almost done.
>>>> i'm tied up until Monday on RHEL6, then I'll switch gears & post a
>>>> set of patches.
>
> hi Don - any news on this upcoming patch series?
>
>>>
>>> so that is your employer's internal policy? RHEL6 kernel first,
>>> then upstream kernel?
>> No, I have deadlines for RHEL6 for *other work* until Monday.  After that,
>> I can re-focus on upstream work.  Some of us actually have other work than
>> just upstream.... crazy talk, I know! ;-)
>
>
I should have the sysfs-level patches done & posted today.
I need another day or two to refactor a driver (ixgbe) to use them.

Although the patches will enable per-device sriov enable/disable, after
further thought about the tools that will stack on top of them, I'll
probably need to add a new pci sysfs directory so that tools can find
sriov devices more quickly/easily, without traversing the entire
/sys/bus/pci/devices tree searching for an 'sriov_max_vfs' file; but I
can do that as a follow-on patch.
I'm thinking of what is done now with the virtfn<X> linked files in the
pf, but with the inverse link in a directory, like /sys/bus/pci/sriov.

Then there are the additional patches needed to do an 'all', or
<DOMAIN:B:D.F>-qualified, enable/disable from a per-sysfs-driver
perspective. ... fun stuff!
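The link layout described above can be mocked up to show the intent.
The virtfn<X> symlinks under the PF are the existing convention; the
/sys/bus/pci/sriov inverse-link directory is only the proposal from
this mail, so the sketch builds everything in a temp directory rather
than touching real sysfs:

```shell
# Mock of the symlink layout: virtfnN links under the PF (existing
# convention) plus the proposed inverse-link directory so tools can find
# SR-IOV PFs without walking all of /sys/bus/pci/devices.
root=$(mktemp -d)                       # stands in for /sys/bus/pci
pf="$root/devices/0000:01:00.0"
mkdir -p "$pf" "$root/sriov"
for i in 0 1; do
    vf="$root/devices/0000:01:10.$i"
    mkdir -p "$vf"
    ln -s "$vf" "$pf/virtfn$i"          # PF -> VF links, as done today
done
ln -s "$pf" "$root/sriov/0000:01:00.0"  # proposed: one lookup, no tree walk
ls "$root/sriov"                        # prints 0000:01:00.0
```

A tool enumerating SR-IOV-capable PFs would then read one directory
instead of stat'ing every device node under /sys/bus/pci/devices.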

^ permalink raw reply	[flat|nested] 52+ messages in thread


end of thread, other threads:[~2012-10-01 14:12 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-07 11:17 New commands to configure IOV features Yuval Mintz
2012-05-07 15:16 ` Greg Rose
2012-06-26 12:21   ` Yuval Mintz
2012-06-26 16:13     ` Alexander Duyck
2012-06-26 17:19     ` Greg Rose
2012-07-01 11:09       ` Yuval Mintz
2012-07-09 18:39       ` Ben Hutchings
2012-07-09 21:13         ` Chris Friesen
2012-07-16  9:19           ` Yuval Mintz
2012-07-17 19:29             ` Don Dutile
2012-07-17 21:08               ` Chris Friesen
2012-07-17 21:11                 ` David Miller
2012-07-20 15:27                   ` Chris Friesen
2012-07-20 15:56                     ` Don Dutile
2012-07-20 17:42                       ` Ben Hutchings
2012-07-20 19:29                         ` Chris Friesen
2012-07-20 20:01                           ` Ben Hutchings
2012-07-20 20:15                             ` Don Dutile
2012-07-20 23:42                             ` Chris Friesen
2012-07-21  0:52                               ` Rose, Gregory V
2012-07-23 14:03                               ` Don Dutile
2012-07-23 15:09                                 ` Chris Friesen
2012-07-23 17:06                                   ` Rose, Gregory V
2012-07-23 18:36                                   ` Stephen Hemminger
2012-07-23 18:40                                     ` Rose, Gregory V
2012-09-19 11:07                                       ` Yuval Mintz
2012-09-19 15:53                                         ` Greg Rose
2012-09-19 19:44                                           ` Ben Hutchings
2012-09-19 22:17                                             ` Yinghai Lu
2012-09-19 22:46                                               ` Ben Hutchings
2012-09-20  0:19                                                 ` Yinghai Lu
2012-09-20  1:23                                                   ` Ben Hutchings
2012-09-20  2:27                                                     ` Yinghai Lu
2012-09-20  3:08                                                       ` Subhendu Ghosh
2012-09-20 15:39                                                     ` Rose, Gregory V
2012-09-20 15:39                                                       ` Rose, Gregory V
2012-09-21  5:50                                                       ` Yinghai Lu
2012-09-21 17:35                                                         ` Ben Hutchings
2012-09-21 19:23                                                           ` Yinghai Lu
2012-09-21 18:06                                                         ` Don Dutile
2012-09-21 18:06                                                           ` Don Dutile
2012-09-21 19:49                                                           ` Yinghai Lu
2012-09-21 20:08                                                             ` Don Dutile
2012-09-21 20:08                                                               ` Don Dutile
2012-09-23 15:49                                                               ` Yuval Mintz
2012-09-24 17:37                                                                 ` Don Dutile
2012-09-24 17:37                                                                   ` Don Dutile
2012-09-30  6:39                                                               ` Yuval Mintz
2012-10-01 14:12                                                                 ` Don Dutile
2012-10-01 14:12                                                                   ` Don Dutile
2012-09-19 17:49                                         ` David Miller
2012-07-23 16:37                                 ` Rose, Gregory V
