Netdev Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] net: core: explicitly call linkwatch_fire_event to speed up the startup of network services
@ 2020-08-01  8:58 Wen Yang
  2020-08-04 22:58 ` David Miller
  0 siblings, 1 reply; 3+ messages in thread
From: Wen Yang @ 2020-08-01  8:58 UTC (permalink / raw)
  To: davem, Jakub Kicinski
  Cc: Xunlei Pang, Caspar Zhang, Wen Yang, Andrew Lunn, Eric Dumazet,
	Jiri Pirko, Leon Romanovsky, Julian Wiedmann, netdev,
	linux-kernel

The linkwatch_event work queue runs up to one second later.
When the MicroVM starts, it takes 300+ms for the ethX flag
to change from '+UP +LOWER_UP' to '+RUNNING', as follows:
Jul 20 22:00:47.432552 systemd-networkd[210]: eth0: bringing link up
...
Jul 20 22:00:47.446108 systemd-networkd[210]: eth0: flags change: +UP +LOWER_UP
...
Jul 20 22:00:47.781463 systemd-networkd[210]: eth0: flags change: +RUNNING

Let's manually trigger it here to make the network service start faster.

After applying this patch, the time consumption of
systemd-networkd.service was reduced from 366ms to 50ms.

Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Julian Wiedmann <jwi@linux.ibm.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 net/core/link_watch.c | 3 +++
 net/core/rtnetlink.c  | 1 +
 2 files changed, 4 insertions(+)

diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index 75431ca..6b9d44b 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -98,6 +98,9 @@ static bool linkwatch_urgent_event(struct net_device *dev)
 	if (netif_is_lag_port(dev) || netif_is_lag_master(dev))
 		return true;
 
+	if ((dev->flags & IFF_UP) && dev->operstate == IF_OPER_DOWN)
+		return true;
+
 	return netif_carrier_ok(dev) &&	qdisc_tx_changing(dev);
 }
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 58c484a..fd0b3b6 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2604,6 +2604,7 @@ static int do_setlink(const struct sk_buff *skb,
 				       extack);
 		if (err < 0)
 			goto errout;
+		linkwatch_fire_event(dev);
 	}
 
 	if (tb[IFLA_MASTER]) {
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] net: core: explicitly call linkwatch_fire_event to speed up the startup of network services
  2020-08-01  8:58 [PATCH] net: core: explicitly call linkwatch_fire_event to speed up the startup of network services Wen Yang
@ 2020-08-04 22:58 ` David Miller
  2020-08-06  9:09   ` Wen Yang
  0 siblings, 1 reply; 3+ messages in thread
From: David Miller @ 2020-08-04 22:58 UTC (permalink / raw)
  To: wenyang
  Cc: kuba, xlpang, caspar, andrew, edumazet, jiri, leon, jwi, netdev,
	linux-kernel

From: Wen Yang <wenyang@linux.alibaba.com>
Date: Sat,  1 Aug 2020 16:58:45 +0800

> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
> index 75431ca..6b9d44b 100644
> --- a/net/core/link_watch.c
> +++ b/net/core/link_watch.c
> @@ -98,6 +98,9 @@ static bool linkwatch_urgent_event(struct net_device *dev)
>  	if (netif_is_lag_port(dev) || netif_is_lag_master(dev))
>  		return true;
>  
> +	if ((dev->flags & IFF_UP) && dev->operstate == IF_OPER_DOWN)
> +		return true;
> +
>  	return netif_carrier_ok(dev) &&	qdisc_tx_changing(dev);
>  }
>  

You're bypassing explicitly the logic here:

	/*
	 * Limit the number of linkwatch events to one
	 * per second so that a runaway driver does not
	 * cause a storm of messages on the netlink
	 * socket.  This limit does not apply to up events
	 * while the device qdisc is down.
	 */
	if (!urgent_only)
		linkwatch_nextevent = jiffies + HZ;
	/* Limit wrap-around effect on delay. */
	else if (time_after(linkwatch_nextevent, jiffies + HZ))
		linkwatch_nextevent = jiffies;

Something about this isn't right.  We need to analyze what you are seeing,
what device you are using, and what systemd is doing to figure out what
the right place for the fix.

Thank you.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] net: core: explicitly call linkwatch_fire_event to speed up the startup of network services
  2020-08-04 22:58 ` David Miller
@ 2020-08-06  9:09   ` Wen Yang
  0 siblings, 0 replies; 3+ messages in thread
From: Wen Yang @ 2020-08-06  9:09 UTC (permalink / raw)
  To: David Miller
  Cc: kuba, xlpang, caspar, andrew, edumazet, jiri, leon, jwi, netdev,
	linux-kernel



在 2020/8/5 上午6:58, David Miller 写道:
> From: Wen Yang <wenyang@linux.alibaba.com>
> Date: Sat,  1 Aug 2020 16:58:45 +0800
> 
>> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
>> index 75431ca..6b9d44b 100644
>> --- a/net/core/link_watch.c
>> +++ b/net/core/link_watch.c
>> @@ -98,6 +98,9 @@ static bool linkwatch_urgent_event(struct net_device *dev)
>>   	if (netif_is_lag_port(dev) || netif_is_lag_master(dev))
>>   		return true;
>>   
>> +	if ((dev->flags & IFF_UP) && dev->operstate == IF_OPER_DOWN)
>> +		return true;
>> +
>>   	return netif_carrier_ok(dev) &&	qdisc_tx_changing(dev);
>>   }
>>   
> 
> You're bypassing explicitly the logic here:
> 
> 	/*
> 	 * Limit the number of linkwatch events to one
> 	 * per second so that a runaway driver does not
> 	 * cause a storm of messages on the netlink
> 	 * socket.  This limit does not apply to up events
> 	 * while the device qdisc is down.
> 	 */
> 	if (!urgent_only)
> 		linkwatch_nextevent = jiffies + HZ;
> 	/* Limit wrap-around effect on delay. */
> 	else if (time_after(linkwatch_nextevent, jiffies + HZ))
> 		linkwatch_nextevent = jiffies;
> 
> Something about this isn't right.  We need to analyze what you are seeing,
> what device you are using, and what systemd is doing to figure out what
> the right place for the fix.
> 
> Thank you.
> 

Thank you very much for your comments.
We are using virtio_net and the environment is a microvm similar to 
firecracker.

Let's briefly explain.
net_device->operstate is assigned through linkwatch_event, and the call 
stack is as follows:
process_one_work
-> linkwatch_event
  -> __linkwatch_run_queue
   -> linkwatch_do_dev
    -> rfc2863_policy
     -> default_operstate

During the machine startup process, net_device->operstate has the 
following two-step state changes:

STEP A: virtnet_probe detects the network card and triggers the 
execution of linkwatch_fire_event.
Since linkwatch_nextevent is initialized to 0, linkwatch_work will run.
And since net_device->state is 6 (__LINK_STATE_PRESENT | 
__LINK_STATE_NOCARRIER), net_device->operstate will be changed from 
IF_OPER_UNKNOWN to IF_OPER_DOWN:
eth0 operstate:0 (IF_OPER_UNKNOWN) -> operstate:2 (IF_OPER_DOWN)

virtnet_probe then executes netif_carrier_on to update 
net_device->state, it will be changed from ‘__LINK_STATE_PRESENT | 
__LINK_STATE_NOCARRIER’ to __LINK_STATE_PRESENT:
eth0 state: 6 (__LINK_STATE_PRESENT | __LINK_STATE_NOCARRIER) -> 2 
(__LINK_STATE_PRESENT)

STEP B: One second later (because linkwatch_nextevent = jiffies + HZ), 
linkwatch_work is executed again.
At this time, since net_device->state is __LINK_STATE_PRESENT, so the 
net_device->operstate will be changed from IF_OPER_DOWN to IF_OPER_UP:
eth0 operstate:2 (IF_OPER_DOWN) -> operstate:6 (IF_OPER_UP)


The above state change can be completed within 2 seconds.
Generally, the machine will load the initramfs first, and do some 
initialization in the initramfs, which takes some time; then switch_root 
to the system disk and continue the initialization, which will also take 
some time, and finally start the systemd-networkd service, bringing 
link, etc.,
In this way, the linkwatch_work work queue has enough time to run twice, 
and the state of net_device->operstate is already IF_OPER_UP,
So bringing link up quickly returns the following information:
Aug 06 16:35:55.966121 iZuf6h1kfgutxc3el68z2lZ systemd-networkd[580]: 
eth0: bringing link up
...
Aug 06 16:35:55.990461 iZuf6h1kfgutxc3el68z2lZ systemd-networkd[580]: 
eth0: flags change: +UP +LOWER_UP +RUNNING

But we are now using MicroVM, which requires extreme speed to start, 
bypassing the initramfs and directly booting the trimmed system on the disk.
systemd-networkd starts in less than 1 second after booting. the STEP B 
has not been run yet, so it will wait for several hundred milliseconds 
here, as follows:
Jul 20 22:00:47.432552 systemd-networkd[210]: eth0: bringing link up
...
Jul 20 22:00:47.446108 systemd-networkd[210]: eth0: flags change: +UP 
+LOWER_UP
...
Jul 20 22:00:47.781463 systemd-networkd[210]: eth0: flags change: +RUNNING


Note: dhcp pays attention to IFF_RUNNING status, we may refer to:
https://www.kernel.org/doc/Documentation/networking/operstates.txt

A routing daemon or dhcp client just needs to care for IFF_RUNNING or
waiting for operstate to go IF_OPER_UP/IF_OPER_UNKNOWN before
considering the interface / querying a DHCP address.

Finally, the STEP B above only updates the value of operstate based on 
the known state (operstate/state) on the net_device, without any 
hardware interaction involved, so it is not very reasonable to wait for 
1 second there.

By adding:
+	if ((dev->flags & IFF_UP) && dev->operstate == IF_OPER_DOWN)
+		return true;
+
We hope to improve the linkwatch_urgent_event function a bit.

Hope to get more of your advice and guidance.

Best wishes,
Wen

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, back to index

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-01  8:58 [PATCH] net: core: explicitly call linkwatch_fire_event to speed up the startup of network services Wen Yang
2020-08-04 22:58 ` David Miller
2020-08-06  9:09   ` Wen Yang

Netdev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/netdev/0 netdev/git/0.git
	git clone --mirror https://lore.kernel.org/netdev/1 netdev/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 netdev netdev/ https://lore.kernel.org/netdev \
		netdev@vger.kernel.org
	public-inbox-index netdev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.netdev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git