linux-pci.vger.kernel.org archive mirror
* PCI mini-summit notes
@ 2012-08-29  7:28 Bjorn Helgaas
  2012-08-29 15:48 ` Jiang Liu
  0 siblings, 1 reply; 8+ messages in thread
From: Bjorn Helgaas @ 2012-08-29  7:28 UTC
  To: ksummit-2012-discuss; +Cc: linux-pci

We held a PCI mini-summit on Aug 28 in San Diego in conjunction with
Kernel Summit and the Linux Plumbers Conference.  I want to thank
everybody who participated.  We had a good discussion and I really
appreciate all the input and ideas everybody provided.

My summary of the major discussions we had is below.

Bjorn



Host bridge hotplug

    There's a lot of interest in this functionality, mostly on x86 using
    ACPI-mediated hotplug.

    The acpiphp driver handles both host bridge and PCI device hotplug.  We
    believe these should be separated.

    Host bridge hotplug requires IOAPIC and DMAR hotplug with proper sequencing
    (started before PCI enumeration and removed after PCI drivers are removed).
    On x86, we think this should happen naturally if we add this support to the
    ACPI pci_root.c driver.  We do need some tweaks to x86 IOAPIC init and
    IOMMU drivers.

    We'd like a sysfs interface to this, and it's not clear what form
    it should take.  One way is to add hooks in the PCI side, e.g.,
    /sys/devices/.../pci_bus/remove.  This has the advantage of looking the
    same across all architectures, but it doesn't map well to firmware
    interfaces and it's not obvious how to deal with hot-adds, when the pci_bus
    doesn't exist yet.  Another way would be to have them connected to the host
    bridge and its enclosing scope, e.g., /sys/devices/.../PNP0A08:00/remove
    and /sys/devices/.../LNXSYBUS:00/rescan.  This is architecture-specific but
    has the advantage of matching the logical system topology.

Hot Plug Issues

    We know we have locking issues and races in the PCI device hotplug area.
    We have some pending patches to address these.  They may be merged for 3.7
    or 3.8.

    We still have some device setup being done by initcalls, and obviously this
    doesn't work for hot-added devices.  We've fixed some of these areas, but
    there are a few more to do.

    What about CONFIG_HOTPLUG?  We didn't discuss this in the mini-summit,
    but it was raised on the ksummit-discuss list.

SR-IOV Management

    Currently drivers implement module parameters like "max_vfs".  This means
    all devices claimed by the driver get the same number of VFs, and you can't
    change anything without unloading and reloading the driver.

    The consensus was that we should try to implement a knob for this in sysfs
    so it can be generic (not in each driver) and set individually for each
    device.

SR-IOV Implementation Issues

    VFs of a single device can appear on several "virtual" buses as well as on
    the PF's bus.  The virtual buses are not connected to an upstream bridge,
    so typical code that iterates over bus->devices lists misses these VFs.

    We had several ideas for fixing this, but the right answer is not
    obvious yet.
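
    A minimal sketch of why VFs can end up on other buses, assuming the
    SR-IOV rule that a VF's routing ID is the PF's routing ID plus First VF
    Offset plus VF Stride times the VF index.  The function names and the
    example offset/stride values are hypothetical, chosen only to show the
    overflow into the next bus number; this is not the kernel's code.

```python
def vf_routing_id(pf_bus, pf_devfn, first_vf_offset, vf_stride, vf_index):
    """Return (bus, devfn) for the VF with 0-based index vf_index.

    The routing ID is a 16-bit value: bus number in the high byte,
    device/function (devfn) in the low byte.
    """
    pf_rid = (pf_bus << 8) | pf_devfn
    vf_rid = pf_rid + first_vf_offset + vf_stride * vf_index
    return (vf_rid >> 8) & 0xFF, vf_rid & 0xFF


def vf_buses(pf_bus, pf_devfn, first_vf_offset, vf_stride, num_vfs):
    """Return the sorted list of bus numbers occupied by the PF's VFs."""
    return sorted({
        vf_routing_id(pf_bus, pf_devfn, first_vf_offset, vf_stride, i)[0]
        for i in range(num_vfs)
    })


if __name__ == "__main__":
    # Hypothetical PF at 03:00.0 with offset 0x80, stride 4, 64 VFs.
    # Once devfn + offset + stride * index exceeds 0xFF, the VF lands on
    # bus 4, which has no upstream bridge of its own.
    print(vf_buses(3, 0x00, 0x80, 4, 64))
```

    Code that walks only the PF's bus->devices list would see the VFs whose
    bus number equals the PF's and miss the rest.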

PCI Device Resources

    We've been moving more resource management from architecture code into
    the core.  For example, the core now supports host bridge address
    translation.  However, this exposes inconsistencies in how we decide
    whether a BAR contains a valid address.  We may need a new pcibios
    interface to handle special cases here.

    We plan to continue moving this code out of architectures so that,
    for example, pci_claim_resource() is done consistently in the core.

    In the longer term, we'd like to pull pcibios_assign_resources() into
    the core as well, along with the flags that tell us to either pay attention
    to or ignore what firmware has done.

    We've had patches circulating that do reassignment of bus numbers to make
    space for hot-added devices.  We're very concerned about the safety of this
    because we fear that ACPI AML, DMAR tables, and other firmware may assume
    that bus/device/function addresses stay constant.

Max Payload Size

    MPS is a knob that can improve performance but has to be set consistently
    on all communicating devices.  We have code to do this, but were burned in
    the past by defective devices, so it's currently turned off.  We have
    quirks for those devices, and we hope to try turning this on again in the
    3.7 merge window.

    There are also potential issues when hot-adding a device that requires
    something other than the current MPS setting.  This needs to be
    investigated more.
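
    A minimal sketch of the consistency requirement, assuming the simplest
    possible policy: program every device with the largest payload size
    that all of them support.  It uses the PCIe encoding in which a field
    value n means 128 << n bytes.  The function names are hypothetical and
    this is not necessarily the policy the kernel code implements.

```python
def mps_bytes(encoding):
    """Convert a PCIe Max_Payload_Size encoding (0..5) to bytes."""
    if not 0 <= encoding <= 5:
        raise ValueError(f"invalid MPS encoding: {encoding}")
    return 128 << encoding


def common_mps_encoding(supported_encodings):
    """Return the largest MPS encoding that every device supports.

    supported_encodings holds one Max_Payload_Size Supported value per
    device in the group of communicating devices.
    """
    if not supported_encodings:
        raise ValueError("no devices given")
    return min(supported_encodings)


def hot_add_fits(current_encoding, new_device_supported):
    """True if a hot-added device can run at the MPS already in use."""
    return new_device_supported >= current_encoding


if __name__ == "__main__":
    enc = common_mps_encoding([2, 1, 5])
    print(enc, mps_bytes(enc))
    # A hot-added device that only supports 128 bytes does not fit.
    print(hot_add_fits(enc, 0))
```

    The hot_add_fits() check illustrates the hot-add concern: when it
    returns False, the existing devices would have to be reconfigured to a
    smaller MPS before the new device could be used safely.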


* Re: PCI mini-summit notes
  2012-08-29  7:28 PCI mini-summit notes Bjorn Helgaas
@ 2012-08-29 15:48 ` Jiang Liu
  2012-08-29 22:44   ` Jon Mason
  0 siblings, 1 reply; 8+ messages in thread
From: Jiang Liu @ 2012-08-29 15:48 UTC
  To: Bjorn Helgaas; +Cc: ksummit-2012-discuss, linux-pci

On 08/29/2012 03:28 PM, Bjorn Helgaas wrote:
> We held a PCI mini-summit on Aug 28 in San Diego in conjunction with
> Kernel Summit and the Linux Plumbers Conference.  I want to thank
> everybody who participated.  We had a good discussion and I really
> appreciate all the input and ideas everybody provided.
> 
> My summary of the major discussions we had is below.
> 
> Bjorn
> 
> 
> 
> Host bridge hotplug
> 
>     There's a lot of interest in this functionality, mostly on x86 using
>     ACPI-mediated hotplug.
> 
>     The acpiphp driver handles both host bridge and PCI device hotplug.  We
>     believe these should be separated.
> 
>     Host bridge hotplug requires IOAPIC and DMAR hotplug with proper sequencing
>     (started before PCI enumeration and removed after PCI drivers are removed).
>     On x86, we think this should happen naturally if we add this support to the
>     ACPI pci_root.c driver.  We do need some tweaks to x86 IOAPIC init and
>     IOMMU drivers.
> 
>     We'd like a sysfs interface to this, and it's not clear what form
>     it should take.  One way is to add hooks in the PCI side, e.g.,
>     /sys/devices/.../pci_bus/remove.  This has the advantage of looking the
>     same across all architectures, but it doesn't map well to firmware
>     interfaces and it's not obvious how to deal with hot-adds, when the pci_bus
>     doesn't exist yet.  Another way would be to have them connected to the host
>     bridge and its enclosing scope, e.g., /sys/devices/.../PNP0A08:00/remove
>     and /sys/devices/.../LNXSYBUS:00/rescan.  This is architecture-specific but
>     has the advantage of matching the logical system topology.
> 
> Hot Plug Issues
> 
>     We know we have locking issues and races in the PCI device hotplug area.
>     We have some pending patches to address these.  They may be merged for 3.7
>     or 3.8.
> 
>     We still have some device setup being done by initcalls, and obviously this
>     doesn't work for hot-added devices.  We've fixed some of these areas, but
>     there are a few more to do.
> 
>     What about CONFIG_HOTPLUG?  We didn't discuss this in the mini-summit,
>     but it was raised on the ksummit-discuss list.
> 
> SR-IOV Management
> 
>     Currently drivers implement module parameters like "max_vfs".  This means
>     all devices claimed by the driver get the same number of VFs, and you can't
>     change anything without unloading and reloading the driver.
> 
>     Consensus that we should try to implement a knob for this in sysfs so it
>     can be generic (not in each driver) and set individually for each device.
Hi Bjorn,
	One of my team members reported another corner case for SR-IOV. There
are two NICs in the system driven by the same driver, but one supports
SR-IOV and the other doesn't. It runs into trouble if the "max_vfs" parameter
is set for the NIC driver.
	--Gerry




* Re: PCI mini-summit notes
  2012-08-29 15:48 ` Jiang Liu
@ 2012-08-29 22:44   ` Jon Mason
  2012-08-29 22:54     ` Yinghai Lu
  0 siblings, 1 reply; 8+ messages in thread
From: Jon Mason @ 2012-08-29 22:44 UTC
  To: Jiang Liu; +Cc: Bjorn Helgaas, ksummit-2012-discuss, linux-pci

On Wed, Aug 29, 2012 at 8:48 AM, Jiang Liu <liuj97@gmail.com> wrote:
> On 08/29/2012 03:28 PM, Bjorn Helgaas wrote:
>> We held a PCI mini-summit on Aug 28 in San Diego in conjunction with
>> Kernel Summit and the Linux Plumbers Conference.  I want to thank
>> everybody who participated.  We had a good discussion and I really
>> appreciate all the input and ideas everybody provided.
>>
>> My summary of the major discussions we had is below.
>>
>> Bjorn
>>
>>
>>
>> Host bridge hotplug
>>
>>     There's a lot of interest in this functionality, mostly on x86 using
>>     ACPI-mediated hotplug.
>>
>>     The acpiphp driver handles both host bridge and PCI device hotplug.  We
>>     believe these should be separated.
>>
>>     Host bridge hotplug requires IOAPIC and DMAR hotplug with proper sequencing
>>     (started before PCI enumeration and removed after PCI drivers are removed).
>>     On x86, we think this should happen naturally if we add this support to the
>>     ACPI pci_root.c driver.  We do need some tweaks to x86 IOAPIC init and
>>     IOMMU drivers.
>>
>>     We'd like a sysfs interface to this, and it's not clear what form
>>     it should take.  One way is to add hooks in the PCI side, e.g.,
>>     /sys/devices/.../pci_bus/remove.  This has the advantage of looking the
>>     same across all architectures, but it doesn't map well to firmware
>>     interfaces and it's not obvious how to deal with hot-adds, when the pci_bus
>>     doesn't exist yet.  Another way would be to have them connected to the host
>>     bridge and its enclosing scope, e.g., /sys/devices/.../PNP0A08:00/remove
>>     and /sys/devices/.../LNXSYBUS:00/rescan.  This is architecture-specific but
>>     has the advantage of matching the logical system topology.
>>
>> Hot Plug Issues
>>
>>     We know we have locking issues and races in the PCI device hotplug area.
>>     We have some pending patches to address these.  They may be merged for 3.7
>>     or 3.8.
>>
>>     We still have some device setup being done by initcalls, and obviously this
>>     doesn't work for hot-added devices.  We've fixed some of these areas, but
>>     there are a few more to do.
>>
>>     What about CONFIG_HOTPLUG?  We didn't discuss this in the mini-summit,
>>     but it was raised on the ksummit-discuss list.
>>
>> SR-IOV Management
>>
>>     Currently drivers implement module parameters like "max_vfs".  This means
>>     all devices claimed by the driver get the same number of VFs, and you can't
>>     change anything without unloading and reloading the driver.
>>
>>     Consensus that we should try to implement a knob for this in sysfs so it
>>     can be generic (not in each driver) and set individually for each device.
> Hi Bjorn,
>         One of my team member reported another corner case for SR-IOV. There's
> are two NIC cards in the system driven by the same driver, but one supports
> SR-IOV and the other doesn't. It runs into trouble if "max_vfs" parameter is
> set for the NIC driver.
>         --Gerry

I believe it was decided that a per-PF sysfs interface would be used
to replace the current module parameter that specifies the number of
VFs.  This should allow a different number of VFs for each physical
device.  The driver interface that was discussed would introduce new
function pointers for handlers to set up and tear down the VFs.  I believe
this will solve your problem once it has been implemented.

Thanks,
Jon


* Re: PCI mini-summit notes
  2012-08-29 22:44   ` Jon Mason
@ 2012-08-29 22:54     ` Yinghai Lu
  2012-08-30  6:02       ` Bjorn Helgaas
  2012-09-03  8:51       ` Ram Pai
  0 siblings, 2 replies; 8+ messages in thread
From: Yinghai Lu @ 2012-08-29 22:54 UTC
  To: Jon Mason; +Cc: Jiang Liu, Bjorn Helgaas, ksummit-2012-discuss, linux-pci

On Wed, Aug 29, 2012 at 3:44 PM, Jon Mason <jdmason@kudzu.us> wrote:
>> Hi Bjorn,
>>         One of my team member reported another corner case for SR-IOV. There's
>> are two NIC cards in the system driven by the same driver, but one supports
>> SR-IOV and the other doesn't. It runs into trouble if "max_vfs" parameter is
>> set for the NIC driver.
>>         --Gerry
>
> I believe it was decided that a per-pf sysfs interface would be used
> to replace the current module parameter that specifies the number of
> vf's.  This should enable different numbers of vf's for each physical
> device.  The driver interface that was discussed would introduce new
> function pointers for handlers to setup/teardown the vf's.  I believe
> this will solve your problem once it has been implemented.

Right now we have ixgbe.max_vfs=63.

If this changes to a per-PCI-device (PF) setting, what happens when the
driver is built in?  What kind of kernel parameters would be passed?

Thanks

Yinghai


* Re: PCI mini-summit notes
  2012-08-29 22:54     ` Yinghai Lu
@ 2012-08-30  6:02       ` Bjorn Helgaas
  2012-09-05  1:33         ` Don Dutile
  2012-09-03  8:51       ` Ram Pai
  1 sibling, 1 reply; 8+ messages in thread
From: Bjorn Helgaas @ 2012-08-30  6:02 UTC
  To: Yinghai Lu; +Cc: Jon Mason, Jiang Liu, linux-pci

[removed cc ksummit-2012-discuss]

On Wed, Aug 29, 2012 at 3:54 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Wed, Aug 29, 2012 at 3:44 PM, Jon Mason <jdmason@kudzu.us> wrote:
>>> Hi Bjorn,
>>>         One of my team member reported another corner case for SR-IOV. There's
>>> are two NIC cards in the system driven by the same driver, but one supports
>>> SR-IOV and the other doesn't. It runs into trouble if "max_vfs" parameter is
>>> set for the NIC driver.
>>>         --Gerry
>>
>> I believe it was decided that a per-pf sysfs interface would be used
>> to replace the current module parameter that specifies the number of
>> vf's.  This should enable different numbers of vf's for each physical
>> device.  The driver interface that was discussed would introduce new
>> function pointers for handlers to setup/teardown the vf's.  I believe
>> this will solve your problem once it has been implemented.
>
> now we have ixgbe.max_vfs=63
>
> so if change to per pci device (PF),
>
> how about having the driver built-in?
> what kind of kernel parameters will be passed?

I don't think we really discussed things at this level, and I
personally don't know enough about the current SR-IOV support to even
know what the possible strategies are.  I think it's really up to the
person doing the implementation to figure out what makes sense given
the constraints of the SR-IOV specs and the current Linux support.

Bjorn


* Re: PCI mini-summit notes
  2012-08-29 22:54     ` Yinghai Lu
  2012-08-30  6:02       ` Bjorn Helgaas
@ 2012-09-03  8:51       ` Ram Pai
  1 sibling, 0 replies; 8+ messages in thread
From: Ram Pai @ 2012-09-03  8:51 UTC
  To: Yinghai Lu
  Cc: Jon Mason, Jiang Liu, Bjorn Helgaas, ksummit-2012-discuss, linux-pci

On Wed, Aug 29, 2012 at 03:54:56PM -0700, Yinghai Lu wrote:
> On Wed, Aug 29, 2012 at 3:44 PM, Jon Mason <jdmason@kudzu.us> wrote:
> >> Hi Bjorn,
> >>         One of my team member reported another corner case for SR-IOV. There's
> >> are two NIC cards in the system driven by the same driver, but one supports
> >> SR-IOV and the other doesn't. It runs into trouble if "max_vfs" parameter is
> >> set for the NIC driver.
> >>         --Gerry
> >
> > I believe it was decided that a per-pf sysfs interface would be used
> > to replace the current module parameter that specifies the number of
> > vf's.  This should enable different numbers of vf's for each physical
> > device.  The driver interface that was discussed would introduce new
> > function pointers for handlers to setup/teardown the vf's.  I believe
> > this will solve your problem once it has been implemented.
> 
> now we have ixgbe.max_vfs=63
> 
> so if change to per pci device (PF),
> 
> how about having the driver built-in?
> what kind of kernel parameters will be passed?

 The driver parameter has to go away. The setting makes more sense
 associated with a PF than with the driver. It has to be a run-time
 interface to set the number of VFs enabled for a given PF.


 RP



* Re: PCI mini-summit notes
  2012-08-30  6:02       ` Bjorn Helgaas
@ 2012-09-05  1:33         ` Don Dutile
  2012-09-05  1:41           ` Don Dutile
  0 siblings, 1 reply; 8+ messages in thread
From: Don Dutile @ 2012-09-05  1:33 UTC
  To: Bjorn Helgaas; +Cc: Yinghai Lu, Jon Mason, Jiang Liu, linux-pci

On 08/30/2012 02:02 AM, Bjorn Helgaas wrote:
> [removed cc ksummit-2012-discuss]
>
> On Wed, Aug 29, 2012 at 3:54 PM, Yinghai Lu<yinghai@kernel.org>  wrote:
>> On Wed, Aug 29, 2012 at 3:44 PM, Jon Mason<jdmason@kudzu.us>  wrote:
>>>> Hi Bjorn,
>>>>          One of my team member reported another corner case for SR-IOV. There's
>>>> are two NIC cards in the system driven by the same driver, but one supports
>>>> SR-IOV and the other doesn't. It runs into trouble if "max_vfs" parameter is
>>>> set for the NIC driver.
>>>>          --Gerry
>>>
>>> I believe it was decided that a per-pf sysfs interface would be used
>>> to replace the current module parameter that specifies the number of
>>> vf's.  This should enable different numbers of vf's for each physical
>>> device.  The driver interface that was discussed would introduce new
>>> function pointers for handlers to setup/teardown the vf's.  I believe
>>> this will solve your problem once it has been implemented.
>>
>> now we have ixgbe.max_vfs=63
>>
>> so if change to per pci device (PF),
>>
>> how about having the driver built-in?
>> what kind of kernel parameters will be passed?

Having the driver built-in won't make a difference.

The model is to have files under sysfs, e.g.,
/sys/bus/pci/devices/0000:03:01.0/[sriov_vf_enable, sriov_vf_disable]

where one echoes the number of VFs one wants to configure for a PF
into the sriov_vf_enable file; to disable/deconfigure the VFs,
one echoes a 1 to sriov_vf_disable (all-or-nothing disable).

The PCI core will be able to check that the max number of VFs
is not exceeded.  I'm also looking to add a file like 'sriov_num_vfs' to
indicate the max number of VFs a PF supports.  Whether the bus configuration
supports it (enough MMIO space, enough PCI bus numbers, etc.) won't be
known until the enable count is written.
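
A minimal user-space sketch of the model described above, assuming the
file names proposed in this message (sriov_vf_enable, sriov_vf_disable,
sriov_num_vfs); these are work-in-progress names and may not match what is
eventually merged.  The helper names are hypothetical.  The range check is
only a convenience here, since the proposal is for the PCI core to enforce
the limit itself.

```python
from pathlib import Path


def enable_vfs(pf_dir, count):
    """Ask for `count` VFs on the PF whose sysfs directory is pf_dir."""
    pf = Path(pf_dir)
    # Proposed file reporting the maximum number of VFs the PF supports.
    max_vfs = int((pf / "sriov_num_vfs").read_text().strip())
    if not 0 < count <= max_vfs:
        raise ValueError(f"count {count} outside 1..{max_vfs}")
    (pf / "sriov_vf_enable").write_text(f"{count}\n")


def disable_vfs(pf_dir):
    """All-or-nothing disable: write 1 to the proposed disable file."""
    (Path(pf_dir) / "sriov_vf_disable").write_text("1\n")


if __name__ == "__main__":
    # Example device path taken from the message; requires root and a
    # kernel that actually provides these files.
    enable_vfs("/sys/bus/pci/devices/0000:03:01.0", 4)
```

Because the setting is a write to a per-device file, two PFs bound to the
same driver can be given different VF counts, and a device without SR-IOV
is simply left alone.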

I have the code down to create the sysfs files & report the num_vfs.
Hope to have the first enable/disable working within another week.
The tough part is to re-factor a pf driver that enables/configures pf;
I'm working with the igb(_main.c) driver right now.

- Don
>
> I don't think we really discussed things at this level, and I
> personally don't know enough about the current SR-IOV support to even
> know what the possible strategies are.  I think it's really up to the
> person doing the implementation to figure out what makes sense given
> the constraints of the SR-IOV specs and the current Linux support.
>
> Bjorn

* Re: PCI mini-summit notes
  2012-09-05  1:33         ` Don Dutile
@ 2012-09-05  1:41           ` Don Dutile
  0 siblings, 0 replies; 8+ messages in thread
From: Don Dutile @ 2012-09-05  1:41 UTC
  To: Bjorn Helgaas; +Cc: Yinghai Lu, Jon Mason, Jiang Liu, linux-pci

On 09/04/2012 09:33 PM, Don Dutile wrote:
> On 08/30/2012 02:02 AM, Bjorn Helgaas wrote:
>> [removed cc ksummit-2012-discuss]
>>
>> On Wed, Aug 29, 2012 at 3:54 PM, Yinghai Lu<yinghai@kernel.org> wrote:
>>> On Wed, Aug 29, 2012 at 3:44 PM, Jon Mason<jdmason@kudzu.us> wrote:
>>>>> Hi Bjorn,
>>>>> One of my team member reported another corner case for SR-IOV. There's
>>>>> are two NIC cards in the system driven by the same driver, but one supports
>>>>> SR-IOV and the other doesn't. It runs into trouble if "max_vfs" parameter is
>>>>> set for the NIC driver.
>>>>> --Gerry
>>>>
>>>> I believe it was decided that a per-pf sysfs interface would be used
>>>> to replace the current module parameter that specifies the number of
>>>> vf's. This should enable different numbers of vf's for each physical
>>>> device. The driver interface that was discussed would introduce new
>>>> function pointers for handlers to setup/teardown the vf's. I believe
>>>> this will solve your problem once it has been implemented.
>>>
>>> now we have ixgbe.max_vfs=63
>>>
>>> so if change to per pci device (PF),
>>>
>>> how about having the driver built-in?
>>> what kind of kernel parameters will be passed?
>
> Having the driver built-in won't make a difference.
>
> The model is to have files under sys fs, i.e.,
> /sys/bus/pci/devices/0000:03:01.0/[sriov_vf_enable, sriov_vf_disable]
>
> where one echo's the number of vf's one wants to configure for a pf
> into the sriov_vf_enable file; if want to disable/deconfigure the vf's,
> one echo's a 1 to sriov_vf_disable (all or nothing disable).
>
> the pci core will be able to check & filter that the max num of vf's
> is not exceeded. Also looking to add a file like 'sriov_num_vfs' to
> indicate max number of vf's a PF supports. Whether the bus configuration
> supports it (enough mmio space, enough pci bus nums, etc.) won't be
> known until the enable count is written.
>
> I have the code down to create the sysfs files & report the num_vfs.
                   ^^^^ should be 'done' ...

> Hope to have the first enable/disable working within another week.
> The tough part is to re-factor a pf driver that enables/configures pf;
> I'm working with the igb(_main.c) driver right now.
>
> - Don

btw -- what we didn't resolve at the summit, and I haven't taken a crack at
yet, is a method to set the VF enablement on a per-PF basis at boot time...
-- kernel cmdline (sounds gnarly for a large PCIe config)
-- /etc/module.d/sriov.conf ?
-- other ?

>>
>> I don't think we really discussed things at this level, and I
>> personally don't know enough about the current SR-IOV support to even
>> know what the possible strategies are. I think it's really up to the
>> person doing the implementation to figure out what makes sense given
>> the constraints of the SR-IOV specs and the current Linux support.
>>
>> Bjorn