linux-kernel.vger.kernel.org archive mirror
* ip_auto_config() prevents network device to be registered
@ 2017-01-31 17:49 Javier Martinez Canillas
  2017-03-17 17:18 ` Javier Martinez Canillas
  0 siblings, 1 reply; 2+ messages in thread
From: Javier Martinez Canillas @ 2017-01-31 17:49 UTC
  To: netdev; +Cc: David S. Miller, Sjoerd Simons, Kevin Hilman, Shuah Khan, LKML

Hello,

The kernelci folks pointed out that a Samsung Exynos-based board was failing
to boot when trying to mount the rootfs via NFS, due to a networking issue [0].

I looked at the issue and it turned out to be a race between ip_auto_config()
and register_netdev() when using the ip=dhcp param in the kernel command line.

The problem is that ip_auto_config() calls wait_for_devices() [1], which
returns as soon as it finds a registered network device. ic_open_devs() [2] is
then called to bring the network devs up and wait for their carrier signals.

But ic_open_devs() grabs the rtnl_mutex lock [3] when doing this, which is the
same lock that register_netdev() [4] grabs before registering a network device.

And so once a network dev is found and wait_for_devices() returns,
ic_open_devs() is called and no new network dev can be registered while it runs.

So since ic_open_devs() waits up to CONF_CARRIER_TIMEOUT (120 secs) with this
lock held, if the network dev that's supposed to get its IP over DHCP isn't the
first to be registered, the boot test job may time out and be considered a
failure.
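
To make the sequence above concrete, here is a minimal userspace sketch in
Python. It is only a model under stated assumptions: a plain threading.Lock
stands in for rtnl_mutex, the timeout is scaled down from 120 secs, and the
function names mirror the kernel ones for readability but are not the kernel
code.

```python
import threading
import time

# Illustrative stand-ins only; none of this is kernel code.
rtnl = threading.Lock()    # plays the role of rtnl_mutex
registered = []            # devices that have been "registered" so far
events = []                # ordered log of what happened

CARRIER_TIMEOUT = 0.3      # scaled-down stand-in for CONF_CARRIER_TIMEOUT


def register_netdev(name):
    """Model of a driver registering a device: it needs the lock."""
    with rtnl:
        registered.append(name)
        events.append(f"registered {name}")


def open_devs_and_wait(lock_taken):
    """Model of ic_open_devs(): the lock is held for the whole carrier wait."""
    with rtnl:
        events.append(f"opening {list(registered)}")
        lock_taken.set()
        time.sleep(CARRIER_TIMEOUT)   # the first device never reports carrier
        events.append("carrier wait timed out")


def run():
    """Register eth0, start the carrier wait, then try to register eth1."""
    registered.clear()
    events.clear()
    register_netdev("eth0")           # the device wait_for_devices() would see
    lock_taken = threading.Event()
    opener = threading.Thread(target=open_devs_and_wait, args=(lock_taken,))
    opener.start()
    lock_taken.wait()                 # the lock is now held by the opener
    start = time.monotonic()
    register_netdev("eth1")           # blocks until the carrier wait is over
    blocked_for = time.monotonic() - start
    opener.join()
    return blocked_for
```

In this model the second device only shows up in the log after the carrier
wait has expired, which is the ordering I believe we are hitting on the board.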

A workaround is to use ip=:::::eth0:dhcp instead of ip=dhcp, so that
wait_for_devices() waits for this specific device. Another workaround is to
increase the timeout for the job to be much bigger than CONF_CARRIER_TIMEOUT,
so ip_auto_config() can retry and the network devices can be registered
between tries.
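
For reference, the colons in the first workaround are just empty positional
fields. The sketch below assumes the field order given in the kernel's nfsroot
documentation (client-ip, server-ip, gw-ip, netmask, hostname, device,
autoconf); build_ip_param() is a hypothetical helper written for this mail,
not an existing tool.

```python
# Field order assumed from the kernel nfsroot documentation; trailing
# optional fields (DNS, NTP) are left out of this sketch.
IP_FIELDS = (
    "client_ip",
    "server_ip",
    "gw_ip",
    "netmask",
    "hostname",
    "device",
    "autoconf",
)


def build_ip_param(**fields):
    """Build an ip= kernel parameter, leaving unspecified fields empty."""
    unknown = set(fields) - set(IP_FIELDS)
    if unknown:
        raise ValueError(f"unknown ip= fields: {sorted(unknown)}")
    return "ip=" + ":".join(fields.get(name, "") for name in IP_FIELDS)


# Pin autoconfiguration to one interface, as in the workaround above.
print(build_ip_param(device="eth0", autoconf="dhcp"))
```

Called with device="eth0" and autoconf="dhcp", this produces the same string
as the workaround: five empty fields, then the device, then the method.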

But I wonder if someone can suggest a proper way to fix this. Grabbing a mutex
that prevents network devs from being registered for 120 secs doesn't sound
correct.

Thanks a lot for your help and please let me know if I misunderstood something.

[0]: https://storage.kernelci.org/mainline/v4.9/arm-exynos_defconfig/lab-collabora/boot-exynos5422-odroidxu3_rootfs:nfs.html
[1]: http://lxr.free-electrons.com/source/net/ipv4/ipconfig.c#L1368
[2]: http://lxr.free-electrons.com/source/net/ipv4/ipconfig.c#L202
[3]: http://lxr.free-electrons.com/source/net/core/rtnetlink.c#L68
[4]: http://lxr.free-electrons.com/source/net/core/dev.c#L7326

Best regards,
-- 
Javier Martinez Canillas
Open Source Group
Samsung Research America

* Re: ip_auto_config() prevents network device to be registered
  2017-01-31 17:49 ip_auto_config() prevents network device to be registered Javier Martinez Canillas
@ 2017-03-17 17:18 ` Javier Martinez Canillas
  0 siblings, 0 replies; 2+ messages in thread
From: Javier Martinez Canillas @ 2017-03-17 17:18 UTC
  To: netdev; +Cc: David S. Miller, Sjoerd Simons, Kevin Hilman, Shuah Khan, LKML

Hello,

On 01/31/2017 02:49 PM, Javier Martinez Canillas wrote:
> 
> The kernelci folks pointed out that a Samsung Exynos based board was failing
> to boot when trying to mount the rootfs via NFS, due a networking issue [0].
> 
> I looked at the issue and it turned out to be a race between ip_auto_config()
> and register_netdev() when using the ip=dhcp param in the kernel command line.
> 
> The problem is that ip_auto_config() calls wait_for_devices() [1] and returns
> as soon as it finds a network device registered. Then ic_open_devs() [2] is
> called then to bring the network devs up and wait for their carrier signals.
> 
> But ic_open_devs() grabs the rtnl_mutex lock [3] when doing this, which is the
> same lock that register_netdev() [4] grabs before registering a network device.
> 
> And so if a network dev is found and wait_for_devices() returns, ic_open_devs()
> will be called and no new network dev could be registered in the meantime.
> 
> So since ic_open_devs() waits up to CONF_CARRIER_TIMEOUT (120 secs) with this
> lock held, if the network dev that's supposed to get its IP over DHCP isn't the
> first to be registered, the boot test job may timeout and be considered a fail.
> 
> A workaround is to use ip=:::::eth0:dhcp instead ip=dhcp, so wait_for_devices()
> waits for this specific device. Another workaround is to increase the timeout
> for the job to be much bigger than CONF_CARRIER_TIMEOUT so ip_auto_config() can
> retry and the network devices can be registered between tries.
> 
> But I wonder if someone can suggest a proper way to fix this. Grabbing a mutex
> that prevents network devs to be registered for 120 secs doesn't sound correct.
> 
> Thanks a lot for your help and please let me know if I misunderstood something.
> 
> [0]: https://storage.kernelci.org/mainline/v4.9/arm-exynos_defconfig/lab-collabora/boot-exynos5422-odroidxu3_rootfs:nfs.html
> [1]: http://lxr.free-electrons.com/source/net/ipv4/ipconfig.c#L1368
> [2]: http://lxr.free-electrons.com/source/net/ipv4/ipconfig.c#L202
> [3]: http://lxr.free-electrons.com/source/net/core/rtnetlink.c#L68
> [4]: http://lxr.free-electrons.com/source/net/core/dev.c#L7326
> 
> 

Any comments on this?

We are still seeing this problem with today's -next (20170310):

https://storage.kernelci.org/next/next-20170310/arm-exynos_defconfig/lab-collabora/boot-exynos5422-odroidxu3.html

Best regards,
-- 
Javier Martinez Canillas
Open Source Group
Samsung Research America
