All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] net: core: explicitly call linkwatch_fire_event to speed up the startup of network services
@ 2020-08-01  8:58 Wen Yang
  2020-08-04 22:58 ` David Miller
  0 siblings, 1 reply; 4+ messages in thread
From: Wen Yang @ 2020-08-01  8:58 UTC (permalink / raw)
  To: davem, Jakub Kicinski
  Cc: Xunlei Pang, Caspar Zhang, Wen Yang, Andrew Lunn, Eric Dumazet,
	Jiri Pirko, Leon Romanovsky, Julian Wiedmann, netdev,
	linux-kernel

The linkwatch_event work queue runs up to one second later.
When the MicroVM starts, it takes 300+ms for the ethX flag
to change from '+UP +LOWER_UP' to '+RUNNING', as follows:
Jul 20 22:00:47.432552 systemd-networkd[210]: eth0: bringing link up
...
Jul 20 22:00:47.446108 systemd-networkd[210]: eth0: flags change: +UP +LOWER_UP
...
Jul 20 22:00:47.781463 systemd-networkd[210]: eth0: flags change: +RUNNING

Let's manually trigger it here to make the network service start faster.

After applying this patch, the time consumption of
systemd-networkd.service was reduced from 366ms to 50ms.

Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Julian Wiedmann <jwi@linux.ibm.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 net/core/link_watch.c | 3 +++
 net/core/rtnetlink.c  | 1 +
 2 files changed, 4 insertions(+)

diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index 75431ca..6b9d44b 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -98,6 +98,9 @@ static bool linkwatch_urgent_event(struct net_device *dev)
 	if (netif_is_lag_port(dev) || netif_is_lag_master(dev))
 		return true;
 
+	if ((dev->flags & IFF_UP) && dev->operstate == IF_OPER_DOWN)
+		return true;
+
 	return netif_carrier_ok(dev) &&	qdisc_tx_changing(dev);
 }
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 58c484a..fd0b3b6 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2604,6 +2604,7 @@ static int do_setlink(const struct sk_buff *skb,
 				       extack);
 		if (err < 0)
 			goto errout;
+		linkwatch_fire_event(dev);
 	}
 
 	if (tb[IFLA_MASTER]) {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] net: core: explicitly call linkwatch_fire_event to speed up the startup of network services
  2020-08-01  8:58 [PATCH] net: core: explicitly call linkwatch_fire_event to speed up the startup of network services Wen Yang
@ 2020-08-04 22:58 ` David Miller
  2020-08-06  9:09   ` Wen Yang
  0 siblings, 1 reply; 4+ messages in thread
From: David Miller @ 2020-08-04 22:58 UTC (permalink / raw)
  To: wenyang
  Cc: kuba, xlpang, caspar, andrew, edumazet, jiri, leon, jwi, netdev,
	linux-kernel

From: Wen Yang <wenyang@linux.alibaba.com>
Date: Sat,  1 Aug 2020 16:58:45 +0800

> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
> index 75431ca..6b9d44b 100644
> --- a/net/core/link_watch.c
> +++ b/net/core/link_watch.c
> @@ -98,6 +98,9 @@ static bool linkwatch_urgent_event(struct net_device *dev)
>  	if (netif_is_lag_port(dev) || netif_is_lag_master(dev))
>  		return true;
>  
> +	if ((dev->flags & IFF_UP) && dev->operstate == IF_OPER_DOWN)
> +		return true;
> +
>  	return netif_carrier_ok(dev) &&	qdisc_tx_changing(dev);
>  }
>  

You're bypassing explicitly the logic here:

	/*
	 * Limit the number of linkwatch events to one
	 * per second so that a runaway driver does not
	 * cause a storm of messages on the netlink
	 * socket.  This limit does not apply to up events
	 * while the device qdisc is down.
	 */
	if (!urgent_only)
		linkwatch_nextevent = jiffies + HZ;
	/* Limit wrap-around effect on delay. */
	else if (time_after(linkwatch_nextevent, jiffies + HZ))
		linkwatch_nextevent = jiffies;

Something about this isn't right.  We need to analyze what you are seeing,
what device you are using, and what systemd is doing to figure out what
the right place for the fix.

Thank you.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] net: core: explicitly call linkwatch_fire_event to speed up the startup of network services
  2020-08-04 22:58 ` David Miller
@ 2020-08-06  9:09   ` Wen Yang
  2020-09-16  3:07     ` Wen Yang
  0 siblings, 1 reply; 4+ messages in thread
From: Wen Yang @ 2020-08-06  9:09 UTC (permalink / raw)
  To: David Miller
  Cc: kuba, xlpang, caspar, andrew, edumazet, jiri, leon, jwi, netdev,
	linux-kernel



在 2020/8/5 上午6:58, David Miller 写道:
> From: Wen Yang <wenyang@linux.alibaba.com>
> Date: Sat,  1 Aug 2020 16:58:45 +0800
> 
>> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
>> index 75431ca..6b9d44b 100644
>> --- a/net/core/link_watch.c
>> +++ b/net/core/link_watch.c
>> @@ -98,6 +98,9 @@ static bool linkwatch_urgent_event(struct net_device *dev)
>>   	if (netif_is_lag_port(dev) || netif_is_lag_master(dev))
>>   		return true;
>>   
>> +	if ((dev->flags & IFF_UP) && dev->operstate == IF_OPER_DOWN)
>> +		return true;
>> +
>>   	return netif_carrier_ok(dev) &&	qdisc_tx_changing(dev);
>>   }
>>   
> 
> You're bypassing explicitly the logic here:
> 
> 	/*
> 	 * Limit the number of linkwatch events to one
> 	 * per second so that a runaway driver does not
> 	 * cause a storm of messages on the netlink
> 	 * socket.  This limit does not apply to up events
> 	 * while the device qdisc is down.
> 	 */
> 	if (!urgent_only)
> 		linkwatch_nextevent = jiffies + HZ;
> 	/* Limit wrap-around effect on delay. */
> 	else if (time_after(linkwatch_nextevent, jiffies + HZ))
> 		linkwatch_nextevent = jiffies;
> 
> Something about this isn't right.  We need to analyze what you are seeing,
> what device you are using, and what systemd is doing to figure out what
> the right place for the fix.
> 
> Thank you.
> 

Thank you very much for your comments.
We are using virtio_net and the environment is a microvm similar to 
firecracker.

Let's briefly explain.
net_device->operstate is assigned through linkwatch_event, and the call 
stack is as follows:
process_one_work
-> linkwatch_event
  -> __linkwatch_run_queue
   -> linkwatch_do_dev
    -> rfc2863_policy
     -> default_operstate

During the machine startup process, net_device->operstate has the 
following two-step state changes:

STEP A: virtnet_probe detects the network card and triggers the 
execution of linkwatch_fire_event.
Since linkwatch_nextevent is initialized to 0, linkwatch_work will run.
And since net_device->state is 6 (__LINK_STATE_PRESENT | 
__LINK_STATE_NOCARRIER), net_device->operstate will be changed from 
IF_OPER_UNKNOWN to IF_OPER_DOWN:
eth0 operstate:0 (IF_OPER_UNKNOWN) -> operstate:2 (IF_OPER_DOWN)

virtnet_probe then executes netif_carrier_on to update 
net_device->state, it will be changed from ‘__LINK_STATE_PRESENT | 
__LINK_STATE_NOCARRIER’ to __LINK_STATE_PRESENT:
eth0 state: 6 (__LINK_STATE_PRESENT | __LINK_STATE_NOCARRIER) -> 2 
(__LINK_STATE_PRESENT)

STEP B: One second later (because linkwatch_nextevent = jiffies + HZ), 
linkwatch_work is executed again.
At this time, since net_device->state is __LINK_STATE_PRESENT, so the 
net_device->operstate will be changed from IF_OPER_DOWN to IF_OPER_UP:
eth0 operstate:2 (IF_OPER_DOWN) -> operstate:6 (IF_OPER_UP)


The above state change can be completed within 2 seconds.
Generally, the machine will load the initramfs first, and do some 
initialization in the initramfs, which takes some time; then switch_root 
to the system disk and continue the initialization, which will also take 
some time, and finally start the systemd-networkd service, bringing 
link, etc.,
In this way, the linkwatch_work work queue has enough time to run twice, 
and the state of net_device->operstate is already IF_OPER_UP,
So bringing link up quickly returns the following information:
Aug 06 16:35:55.966121 iZuf6h1kfgutxc3el68z2lZ systemd-networkd[580]: 
eth0: bringing link up
...
Aug 06 16:35:55.990461 iZuf6h1kfgutxc3el68z2lZ systemd-networkd[580]: 
eth0: flags change: +UP +LOWER_UP +RUNNING

But we are now using MicroVM, which requires extreme speed to start, 
bypassing the initramfs and directly booting the trimmed system on the disk.
systemd-networkd starts in less than 1 second after booting. the STEP B 
has not been run yet, so it will wait for several hundred milliseconds 
here, as follows:
Jul 20 22:00:47.432552 systemd-networkd[210]: eth0: bringing link up
...
Jul 20 22:00:47.446108 systemd-networkd[210]: eth0: flags change: +UP 
+LOWER_UP
...
Jul 20 22:00:47.781463 systemd-networkd[210]: eth0: flags change: +RUNNING


Note: dhcp pays attention to IFF_RUNNING status, we may refer to:
https://www.kernel.org/doc/Documentation/networking/operstates.txt

A routing daemon or dhcp client just needs to care for IFF_RUNNING or
waiting for operstate to go IF_OPER_UP/IF_OPER_UNKNOWN before
considering the interface / querying a DHCP address.

Finally, the STEP B above only updates the value of operstate based on 
the known state (operstate/state) on the net_device, without any 
hardware interaction involved, so it is not very reasonable to wait for 
1 second there.

By adding:
+	if ((dev->flags & IFF_UP) && dev->operstate == IF_OPER_DOWN)
+		return true;
+
We hope to improve the linkwatch_urgent_event function a bit.

Hope to get more of your advice and guidance.

Best wishes,
Wen

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] net: core: explicitly call linkwatch_fire_event to speed up the startup of network services
  2020-08-06  9:09   ` Wen Yang
@ 2020-09-16  3:07     ` Wen Yang
  0 siblings, 0 replies; 4+ messages in thread
From: Wen Yang @ 2020-09-16  3:07 UTC (permalink / raw)
  To: David Miller
  Cc: kuba, andrew, edumazet, jiri, leon, jwi, netdev, linux-kernel


on 2020/8/6 PM5:09, Wen Yang wrote:
>
>
> 在 2020/8/5 上午6:58, David Miller 写道:
>> From: Wen Yang <wenyang@linux.alibaba.com>
>> Date: Sat,  1 Aug 2020 16:58:45 +0800
>>
>>> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
>>> index 75431ca..6b9d44b 100644
>>> --- a/net/core/link_watch.c
>>> +++ b/net/core/link_watch.c
>>> @@ -98,6 +98,9 @@ static bool linkwatch_urgent_event(struct 
>>> net_device *dev)
>>>       if (netif_is_lag_port(dev) || netif_is_lag_master(dev))
>>>           return true;
>>>   +    if ((dev->flags & IFF_UP) && dev->operstate == IF_OPER_DOWN)
>>> +        return true;
>>> +
>>>       return netif_carrier_ok(dev) && qdisc_tx_changing(dev);
>>>   }
>>
>> You're bypassing explicitly the logic here:
>>
>>     /*
>>      * Limit the number of linkwatch events to one
>>      * per second so that a runaway driver does not
>>      * cause a storm of messages on the netlink
>>      * socket.  This limit does not apply to up events
>>      * while the device qdisc is down.
>>      */
>>     if (!urgent_only)
>>         linkwatch_nextevent = jiffies + HZ;
>>     /* Limit wrap-around effect on delay. */
>>     else if (time_after(linkwatch_nextevent, jiffies + HZ))
>>         linkwatch_nextevent = jiffies;
>>
>> Something about this isn't right.  We need to analyze what you are 
>> seeing,
>> what device you are using, and what systemd is doing to figure out what
>> the right place for the fix.
>>
>> Thank you.
>>
>
> Thank you very much for your comments.
> We are using virtio_net and the environment is a microvm similar to 
> firecracker.
>
> Let's briefly explain.
> net_device->operstate is assigned through linkwatch_event, and the 
> call stack is as follows:
> process_one_work
> -> linkwatch_event
>  -> __linkwatch_run_queue
>   -> linkwatch_do_dev
>    -> rfc2863_policy
>     -> default_operstate
>
> During the machine startup process, net_device->operstate has the 
> following two-step state changes:
>
> STEP A: virtnet_probe detects the network card and triggers the 
> execution of linkwatch_fire_event.
> Since linkwatch_nextevent is initialized to 0, linkwatch_work will run.
> And since net_device->state is 6 (__LINK_STATE_PRESENT | 
> __LINK_STATE_NOCARRIER), net_device->operstate will be changed from 
> IF_OPER_UNKNOWN to IF_OPER_DOWN:
> eth0 operstate:0 (IF_OPER_UNKNOWN) -> operstate:2 (IF_OPER_DOWN)
>
> virtnet_probe then executes netif_carrier_on to update 
> net_device->state, it will be changed from ‘__LINK_STATE_PRESENT | 
> __LINK_STATE_NOCARRIER’ to __LINK_STATE_PRESENT:
> eth0 state: 6 (__LINK_STATE_PRESENT | __LINK_STATE_NOCARRIER) -> 2 
> (__LINK_STATE_PRESENT)
>
> STEP B: One second later (because linkwatch_nextevent = jiffies + HZ), 
> linkwatch_work is executed again.
> At this time, since net_device->state is __LINK_STATE_PRESENT, so the 
> net_device->operstate will be changed from IF_OPER_DOWN to IF_OPER_UP:
> eth0 operstate:2 (IF_OPER_DOWN) -> operstate:6 (IF_OPER_UP)
>
>
> The above state change can be completed within 2 seconds.
> Generally, the machine will load the initramfs first, and do some 
> initialization in the initramfs, which takes some time; then 
> switch_root to the system disk and continue the initialization, which 
> will also take some time, and finally start the systemd-networkd 
> service, bringing link, etc.,
> In this way, the linkwatch_work work queue has enough time to run 
> twice, and the state of net_device->operstate is already IF_OPER_UP,
> So bringing link up quickly returns the following information:
> Aug 06 16:35:55.966121 iZuf6h1kfgutxc3el68z2lZ systemd-networkd[580]: 
> eth0: bringing link up
> ...
> Aug 06 16:35:55.990461 iZuf6h1kfgutxc3el68z2lZ systemd-networkd[580]: 
> eth0: flags change: +UP +LOWER_UP +RUNNING
>
> But we are now using MicroVM, which requires extreme speed to start, 
> bypassing the initramfs and directly booting the trimmed system on the 
> disk.
> systemd-networkd starts in less than 1 second after booting. the STEP 
> B has not been run yet, so it will wait for several hundred 
> milliseconds here, as follows:
> Jul 20 22:00:47.432552 systemd-networkd[210]: eth0: bringing link up
> ...
> Jul 20 22:00:47.446108 systemd-networkd[210]: eth0: flags change: +UP 
> +LOWER_UP
> ...
> Jul 20 22:00:47.781463 systemd-networkd[210]: eth0: flags change: 
> +RUNNING
>
>
> Note: dhcp pays attention to IFF_RUNNING status, we may refer to:
> https://www.kernel.org/doc/Documentation/networking/operstates.txt
>
> A routing daemon or dhcp client just needs to care for IFF_RUNNING or
> waiting for operstate to go IF_OPER_UP/IF_OPER_UNKNOWN before
> considering the interface / querying a DHCP address.
>
> Finally, the STEP B above only updates the value of operstate based on 
> the known state (operstate/state) on the net_device, without any 
> hardware interaction involved, so it is not very reasonable to wait 
> for 1 second there.
>
> By adding:
> +    if ((dev->flags & IFF_UP) && dev->operstate == IF_OPER_DOWN)
> +        return true;
> +
> We hope to improve the linkwatch_urgent_event function a bit.
>
> Hope to get more of your advice and guidance.
>
> Best wishes,
> Wen

hi, this issue is worth continuing discussion.
In the microVM scenario, it is valuable to increase the startup speed by 
a few hundred milliseconds.

Best wishes,
Wen




^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-09-16  3:07 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-01  8:58 [PATCH] net: core: explicitly call linkwatch_fire_event to speed up the startup of network services Wen Yang
2020-08-04 22:58 ` David Miller
2020-08-06  9:09   ` Wen Yang
2020-09-16  3:07     ` Wen Yang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.