All of lore.kernel.org
 help / color / mirror / Atom feed
* SR-IOV setup race between PCI and rtnetlink
@ 2012-02-16 18:22 Ben Hutchings
  2012-02-16 19:55 ` Rose, Gregory V
  0 siblings, 1 reply; 3+ messages in thread
From: Ben Hutchings @ 2012-02-16 18:22 UTC (permalink / raw)
  To: netdev; +Cc: Shradha Shah

A customer hit this WARNING from rtnetlink:

------------[ cut here ]------------
WARNING: at net/core/rtnetlink.c:1568 rtmsg_ifinfo+0x25a/0x260() (Not tainted)
Hardware name: ProLiant DL380 G7
Modules linked in: bonding ipv6 dm_mirror dm_region_hash dm_log power_meter
hpilo hpwdt bnx2 onload(U) sfc_char(U) sfc_resource(U) sfc_affinity(U)
sfc_tune(U) sfc(U) mdio sg microcode serio_raw iTCO_wdt iTCO_vendor_support
i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif hpsa radeon
ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_multipath dm_mod [last
unloaded: scsi_wait_scan]
Pid: 1927, comm: ifup-eth Not tainted 2.6.32-131.6.1.el6.x86_64 #1
Call Trace:
[<ffffffff810670f7>] ? warn_slowpath_common+0x87/0xc0
[<ffffffff8106714a>] ? warn_slowpath_null+0x1a/0x20
[<ffffffff8142da9a>] ? rtmsg_ifinfo+0x25a/0x260
[<ffffffff8108a928>] ? synchronize_sched+0x58/0x60
[<ffffffff8108a8b0>] ? wakeme_after_rcu+0x0/0x20
[<ffffffff8141c42e>] ? netdev_set_master+0x6e/0xd0
[<ffffffffa041191f>] ? bond_enslave+0x22f/0xd00 [bonding]
[<ffffffff814da624>] ? printk+0x41/0x45
[<ffffffffa041afd7>] ? bonding_store_slaves+0x2a7/0x420 [bonding]
[<ffffffff81336900>] ? dev_attr_store+0x20/0x30
[<ffffffff811e4d95>] ? sysfs_write_file+0xe5/0x170
[<ffffffff81172748>] ? vfs_write+0xb8/0x1a0
[<ffffffff810d1ad2>] ? audit_syscall_entry+0x272/0x2a0
[<ffffffff81173181>] ? sys_write+0x51/0x90
[<ffffffff8100b172>] ? system_call_fastpath+0x16/0x1b
---[ end trace 7676a5a34ad7b8ee ]---
------------[ cut here ]------------

This was seen with OpenOnload on RHEL 6.1, but I believe the same issue
exists with the changes I just submitted against net-next.

The WARNING is produced by:
		/* -EMSGSIZE implies BUG in if_nlmsg_size() */
		WARN_ON(err == -EMSGSIZE);

rtnl_vfinfo_size(), rtnl_fill_ifinfo(), etc. use dev_num_vf() to get the
number of VFs that will be included in the message, i.e. they ask the
PCI device and not the net device.

The number of VFs is changed by pci_enable_sriov(), which obviously does
not acquire the RTNL lock.  Further, it is unsafe for its callers to
hold the RTNL lock, because it may synchronously bind the new VFs to
drivers that themselves acquire the RTNL in their probe functions.  So
the number of VFs may change between the time at which the message size
is calculated and the time at which it is built.

Now rtnl_fill_ifinfo() will stop trying to add VF information as soon as
ndo_get_vf_config() returns an error.  If the driver implementation
ensures that it returns errors until after pci_enable_sriov() returns
and it has reacquired the RTNL lock, the message doesn't actually get
any bigger and the WARNING won't be hit.

However we really need the VFs to be configurable immediately, so that
the VF driver can communicate with the PF driver (sfc).  I could add an
separate flag to keep the RTNL interface disabled while the inter-driver
interface is enabled, but that doesn't seem like the right thing to do.

Perhaps there should be a net device op to return the number of VFs,
which the net driver must then only change while holding the RTNL lock?
RTNL would then use that instead of dev_num_vf().

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* RE: SR-IOV setup race between PCI and rtnetlink
  2012-02-16 18:22 SR-IOV setup race between PCI and rtnetlink Ben Hutchings
@ 2012-02-16 19:55 ` Rose, Gregory V
  2012-02-16 21:25   ` Ben Hutchings
  0 siblings, 1 reply; 3+ messages in thread
From: Rose, Gregory V @ 2012-02-16 19:55 UTC (permalink / raw)
  To: Ben Hutchings, netdev; +Cc: Shradha Shah

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Ben Hutchings
> Sent: Thursday, February 16, 2012 10:23 AM
> To: netdev
> Cc: Shradha Shah
> Subject: SR-IOV setup race between PCI and rtnetlink
> 
> A customer hit this WARNING from rtnetlink:
> 
> ------------[ cut here ]------------
> WARNING: at net/core/rtnetlink.c:1568 rtmsg_ifinfo+0x25a/0x260() (Not
> tainted)
> Hardware name: ProLiant DL380 G7
> Modules linked in: bonding ipv6 dm_mirror dm_region_hash dm_log
> power_meter
> hpilo hpwdt bnx2 onload(U) sfc_char(U) sfc_resource(U) sfc_affinity(U)
> sfc_tune(U) sfc(U) mdio sg microcode serio_raw iTCO_wdt
> iTCO_vendor_support
> i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif hpsa
> radeon
> ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_multipath dm_mod
> [last
> unloaded: scsi_wait_scan]
> Pid: 1927, comm: ifup-eth Not tainted 2.6.32-131.6.1.el6.x86_64 #1
> Call Trace:
> [<ffffffff810670f7>] ? warn_slowpath_common+0x87/0xc0
> [<ffffffff8106714a>] ? warn_slowpath_null+0x1a/0x20
> [<ffffffff8142da9a>] ? rtmsg_ifinfo+0x25a/0x260
> [<ffffffff8108a928>] ? synchronize_sched+0x58/0x60
> [<ffffffff8108a8b0>] ? wakeme_after_rcu+0x0/0x20
> [<ffffffff8141c42e>] ? netdev_set_master+0x6e/0xd0
> [<ffffffffa041191f>] ? bond_enslave+0x22f/0xd00 [bonding]
> [<ffffffff814da624>] ? printk+0x41/0x45
> [<ffffffffa041afd7>] ? bonding_store_slaves+0x2a7/0x420 [bonding]
> [<ffffffff81336900>] ? dev_attr_store+0x20/0x30
> [<ffffffff811e4d95>] ? sysfs_write_file+0xe5/0x170
> [<ffffffff81172748>] ? vfs_write+0xb8/0x1a0
> [<ffffffff810d1ad2>] ? audit_syscall_entry+0x272/0x2a0
> [<ffffffff81173181>] ? sys_write+0x51/0x90
> [<ffffffff8100b172>] ? system_call_fastpath+0x16/0x1b
> ---[ end trace 7676a5a34ad7b8ee ]---
> ------------[ cut here ]------------
> 
> This was seen with OpenOnload on RHEL 6.1, but I believe the same issue
> exists with the changes I just submitted against net-next.
> 
> The WARNING is produced by:
> 		/* -EMSGSIZE implies BUG in if_nlmsg_size() */
> 		WARN_ON(err == -EMSGSIZE);
> 
> rtnl_vfinfo_size(), rtnl_fill_ifinfo(), etc. use dev_num_vf() to get the
> number of VFs that will be included in the message, i.e. they ask the
> PCI device and not the net device.
> 
> The number of VFs is changed by pci_enable_sriov(), which obviously does
> not acquire the RTNL lock.  Further, it is unsafe for its callers to
> hold the RTNL lock, because it may synchronously bind the new VFs to
> drivers that themselves acquire the RTNL in their probe functions.  So
> the number of VFs may change between the time at which the message size
> is calculated and the time at which it is built.
> 
> Now rtnl_fill_ifinfo() will stop trying to add VF information as soon as
> ndo_get_vf_config() returns an error.  If the driver implementation
> ensures that it returns errors until after pci_enable_sriov() returns
> and it has reacquired the RTNL lock, the message doesn't actually get
> any bigger and the WARNING won't be hit.
> 
> However we really need the VFs to be configurable immediately, so that
> the VF driver can communicate with the PF driver (sfc).  I could add an
> separate flag to keep the RTNL interface disabled while the inter-driver
> interface is enabled, but that doesn't seem like the right thing to do.
> 
> Perhaps there should be a net device op to return the number of VFs,
> which the net driver must then only change while holding the RTNL lock?
> RTNL would then use that instead of dev_num_vf().

Intel drivers aren't subject to this problem at this time because the number of VFs are configured on driver load and don't change until you unload the driver and then reload it with a new max_vfs parameter.  However, if/when I get some patches done that allow the user to set the number of VFs via ethtool instead of just the module parameter that will change and this change would be very helpful in that circumstance also.

Sounds like a good idea to me.

- Greg


^ permalink raw reply	[flat|nested] 3+ messages in thread

* RE: SR-IOV setup race between PCI and rtnetlink
  2012-02-16 19:55 ` Rose, Gregory V
@ 2012-02-16 21:25   ` Ben Hutchings
  0 siblings, 0 replies; 3+ messages in thread
From: Ben Hutchings @ 2012-02-16 21:25 UTC (permalink / raw)
  To: Rose, Gregory V; +Cc: netdev, Shradha Shah

On Thu, 2012-02-16 at 19:55 +0000, Rose, Gregory V wrote:
[...]
> > However we really need the VFs to be configurable immediately, so that
> > the VF driver can communicate with the PF driver (sfc).  I could add an
> > separate flag to keep the RTNL interface disabled while the inter-driver
> > interface is enabled, but that doesn't seem like the right thing to do.
> > 
> > Perhaps there should be a net device op to return the number of VFs,
> > which the net driver must then only change while holding the RTNL lock?
> > RTNL would then use that instead of dev_num_vf().
> 
> Intel drivers aren't subject to this problem at this time because the
> number of VFs are configured on driver load and don't change until you
> unload the driver and then reload it with a new max_vfs parameter. 

sfc also fixes the number of VFs at driver load, but this race can occur
*during* driver load if the driver calls pci_enable_sriov() after
register_netdevice().

It looks like most other drivers with SR-IOV capability avoid this
problem by calling pci_enable_sriov() before registering the net device.
(cxgb4 is an exception and should also suffer from this.)

The reason for the current ordering in sfc is that we also use VFs for
user-level networking and in that case the VF driver's probe function
needs to identify the corresponding net device.  If the net device isn't
registered until later then we'll have to defer that to a netdevice
notifier.

> However, if/when I get some patches done that allow the user to set
> the number of VFs via ethtool instead of just the module parameter
> that will change and this change would be very helpful in that
> circumstance also.

Wouldn't it make more sense to do that through rtnetlink, along with the
configuration of the individual VFs?

Ben.

> Sounds like a good idea to me.
> 
> - Greg
> 

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2012-02-16 21:25 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-16 18:22 SR-IOV setup race between PCI and rtnetlink Ben Hutchings
2012-02-16 19:55 ` Rose, Gregory V
2012-02-16 21:25   ` Ben Hutchings

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.