* [PATCH v3] netpoll: Remove 4s sleep during carrier detection
@ 2023-01-25 18:52 Breno Leitao
  2023-01-26  9:04 ` David Laight
  2023-01-28  8:40 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 5+ messages in thread
From: Breno Leitao @ 2023-01-25 18:52 UTC (permalink / raw)
  To: kuba, netdev
  Cc: leitao, leit, davem, edumazet, pabeni, andrew, linux-kernel,
	Michael van der Westhuizen

Remove the msleep(4000) that netpoll_setup() performs when the carrier
appears instantly.

This workaround is counter-productive on modern systems in at least two
scenarios:

Servers whose BMC communicates over NC-SI via the same NIC that is used
for netconsole. The BMC keeps the PHY up, so the carrier appears
instantly.

The link is fibre. SERDES sync can complete within HZ/10 jiffies
(0.1 s), so the carrier also appears instantly.

Other than that, if a driver reports instant carrier and then loses it,
this is probably a driver bug.

Reported-by: Michael van der Westhuizen <rmikey@meta.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
--
v1->v2: added "RFC" in the subject
v2->v3: improved the commit message
---
 net/core/netpoll.c | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 9be762e1d..a089b704b 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -682,7 +682,7 @@ int netpoll_setup(struct netpoll *np)
 	}
 
 	if (!netif_running(ndev)) {
-		unsigned long atmost, atleast;
+		unsigned long atmost;
 
 		np_info(np, "device %s not up yet, forcing it\n", np->dev_name);
 
@@ -694,7 +694,6 @@ int netpoll_setup(struct netpoll *np)
 		}
 
 		rtnl_unlock();
-		atleast = jiffies + HZ/10;
 		atmost = jiffies + carrier_timeout * HZ;
 		while (!netif_carrier_ok(ndev)) {
 			if (time_after(jiffies, atmost)) {
@@ -704,15 +703,6 @@ int netpoll_setup(struct netpoll *np)
 			msleep(1);
 		}
 
-		/* If carrier appears to come up instantly, we don't
-		 * trust it and pause so that we don't pump all our
-		 * queued console messages into the bitbucket.
-		 */
-
-		if (time_before(jiffies, atleast)) {
-			np_notice(np, "carrier detect appears untrustworthy, waiting 4 seconds\n");
-			msleep(4000);
-		}
 		rtnl_lock();
 	}
 
-- 
2.30.2
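
For readers following the diff, here is a minimal userspace sketch of the wait loop as it looks once the 4-second pause is gone. It is not kernel code: `struct model_dev`, `model_carrier_ok()`, `model_wait_for_carrier()` and `model_time_after()` are hypothetical stand-ins for the net_device, netif_carrier_ok(), the loop in netpoll_setup() and time_after(), and a "tick" stands in for one msleep(1) iteration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Wraparound-safe "a is later than b", modelled on the kernel's time_after(). */
#define model_time_after(a, b) ((long)((b) - (a)) < 0)

/* Hypothetical stand-in for a net_device: carrier comes up at this tick. */
struct model_dev {
	unsigned long carrier_up_at;
};

/* Stand-in for netif_carrier_ok(): true once `now` has reached carrier_up_at. */
bool model_carrier_ok(const struct model_dev *dev, unsigned long now)
{
	return !model_time_after(dev->carrier_up_at, now);
}

/*
 * Model of the post-patch wait: poll once per tick until the carrier is up
 * or the timeout has passed. Returns the number of ticks waited and sets
 * *timed_out if the loop gave up. There is no extra pause afterwards, which
 * is the behaviour change the patch makes.
 */
unsigned long model_wait_for_carrier(const struct model_dev *dev,
				     unsigned long start,
				     unsigned long timeout_ticks,
				     bool *timed_out)
{
	unsigned long now = start;
	unsigned long atmost = start + timeout_ticks;

	assert(dev != NULL && timed_out != NULL);

	*timed_out = false;
	while (!model_carrier_ok(dev, now)) {
		if (model_time_after(now, atmost)) {
			*timed_out = true;
			break;
		}
		now++;	/* stands in for msleep(1) */
	}
	return now - start;
}
```

In this model a device whose carrier is already up when the loop starts returns zero ticks waited, which corresponds to the "carrier appears instantly" case that previously triggered the 4-second sleep.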



* RE: [PATCH v3] netpoll: Remove 4s sleep during carrier detection
  2023-01-25 18:52 [PATCH v3] netpoll: Remove 4s sleep during carrier detection Breno Leitao
@ 2023-01-26  9:04 ` David Laight
  2023-01-26 10:52   ` Breno Leitao
  2023-01-26 13:22   ` Andrew Lunn
  2023-01-28  8:40 ` patchwork-bot+netdevbpf
  1 sibling, 2 replies; 5+ messages in thread
From: David Laight @ 2023-01-26  9:04 UTC (permalink / raw)
  To: 'Breno Leitao', kuba, netdev
  Cc: leit, davem, edumazet, pabeni, andrew, linux-kernel,
	Michael van der Westhuizen

From: Breno Leitao
> Sent: 25 January 2023 18:53
> This patch removes the msleep(4s) during netpoll_setup() if the carrier
> appears instantly.
> 
> Here are some scenarios where this workaround is counter-productive in
> modern ages:
> 
> Servers which have BMC communicating over NC-SI via the same NIC as gets
> used for netconsole. BMC will keep the PHY up, hence the carrier
> appearing instantly.
> 
> The link is fibre, SERDES getting sync could happen within HZ/10
> (0.1 s), and the carrier also appears instantly.
> 
> Other than that, if a driver is reporting instant carrier and then
> losing it, this is probably a driver bug.

I can't help feeling that this will break something.
The 4 second delay does look counter productive though.
Obvious alternatives are 'wait a bit before the first check'
and 'require carrier to be present for a few checks'.

It also has to be said that checking every ms seems over enthusiastic.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
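
One of the alternatives raised above, "require carrier to be present for a few checks", could look roughly like the following userspace sketch. It was not proposed as code in this thread and is not what was merged; `stable_carrier_index()` and its parameters are hypothetical names, and the array of samples stands in for successive netif_carrier_ok() polls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical debounce helper: return the index of the poll at which the
 * carrier has been seen up for `required` consecutive samples, or -1 if
 * that never happens within the n samples given.
 */
long stable_carrier_index(const bool *samples, size_t n, unsigned int required)
{
	unsigned int streak = 0;

	assert(samples != NULL && required > 0);

	for (size_t i = 0; i < n; i++) {
		/* Any "carrier down" sample resets the streak. */
		streak = samples[i] ? streak + 1 : 0;
		if (streak >= required)
			return (long)i;
	}
	return -1;
}
```

A design like this bounds the extra delay to `required` polling intervals, rather than a fixed 4 seconds, but it would not help with hardware that reports carrier continuously while still dropping packets.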



* Re: [PATCH v3] netpoll: Remove 4s sleep during carrier detection
  2023-01-26  9:04 ` David Laight
@ 2023-01-26 10:52   ` Breno Leitao
  2023-01-26 13:22   ` Andrew Lunn
  1 sibling, 0 replies; 5+ messages in thread
From: Breno Leitao @ 2023-01-26 10:52 UTC (permalink / raw)
  To: David Laight, kuba, netdev
  Cc: leit, leit, davem, edumazet, pabeni, andrew, linux-kernel,
	Michael van der Westhuizen

On 26/01/2023 09:04, David Laight wrote:

>> This patch removes the msleep(4s) during netpoll_setup() if the carrier
>> appears instantly.
>>
>> Here are some scenarios where this workaround is counter-productive in
>> modern ages:
>>
>> Servers which have BMC communicating over NC-SI via the same NIC as gets
>> used for netconsole. BMC will keep the PHY up, hence the carrier
>> appearing instantly.
>>
>> The link is fibre, SERDES getting sync could happen within HZ/10
>> (0.1 s), and the carrier also appears instantly.
>>
>> Other than that, if a driver is reporting instant carrier and then
>> losing it, this is probably a driver bug.
> 
> I can't help feeling that this will break something.

If we see breakage after this patch, we can identify the broken
drivers and fix the drivers themselves.

On the other hand, if we keep this workaround, we penalize the boot of
every modern machine by 4 seconds just because there might be a broken
driver somewhere.


* Re: [PATCH v3] netpoll: Remove 4s sleep during carrier detection
  2023-01-26  9:04 ` David Laight
  2023-01-26 10:52   ` Breno Leitao
@ 2023-01-26 13:22   ` Andrew Lunn
  1 sibling, 0 replies; 5+ messages in thread
From: Andrew Lunn @ 2023-01-26 13:22 UTC (permalink / raw)
  To: David Laight
  Cc: 'Breno Leitao',
	kuba, netdev, leit, davem, edumazet, pabeni, linux-kernel,
	Michael van der Westhuizen

On Thu, Jan 26, 2023 at 09:04:42AM +0000, David Laight wrote:
> From: Breno Leitao
> > Sent: 25 January 2023 18:53
> > This patch removes the msleep(4s) during netpoll_setup() if the carrier
> > appears instantly.
> > 
> > Here are some scenarios where this workaround is counter-productive in
> > modern ages:
> > 
> > Servers which have BMC communicating over NC-SI via the same NIC as gets
> > used for netconsole. BMC will keep the PHY up, hence the carrier
> > appearing instantly.
> > 
> > The link is fibre, SERDES getting sync could happen within HZ/10
> > (0.1 s), and the carrier also appears instantly.
> > 
> > Other than that, if a driver is reporting instant carrier and then
> > losing it, this is probably a driver bug.
> 
> I can't help feeling that this will break something.
> The 4 second delay does look counter productive though.
> Obvious alternatives are 'wait a bit before the first check'
> and 'require carrier to be present for a few checks'.

I'm guessing, but I think the issue is that the MAC reports the
carrier is up even though autoneg has not completed, and so packets
are getting dropped. Autoneg takes around 1.5 seconds, so you need to
wait this long before starting to send to prevent packets landing in
the bit bucket. And I guess polling as you suggest does not help,
since it never returns the true status.

But this is pure guesswork. Maybe some mailing list archaeology can
help explain this code.

I guess the likely breaking scenario is simply that the first 1.5
seconds of the kernel log go to the bit bucket for broken
MACs. Which is not fatal, just annoying for somebody trying to debug a
crash in the first few seconds. I suppose DHCP might also take longer
for broken MACs, since its first requests also get lost, and it might
get into exponential back-off.

I guess the risks are small here. But I use the word guess a lot...

  Andrew


* Re: [PATCH v3] netpoll: Remove 4s sleep during carrier detection
  2023-01-25 18:52 [PATCH v3] netpoll: Remove 4s sleep during carrier detection Breno Leitao
  2023-01-26  9:04 ` David Laight
@ 2023-01-28  8:40 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 5+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-01-28  8:40 UTC (permalink / raw)
  To: Breno Leitao
  Cc: kuba, netdev, leit, davem, edumazet, pabeni, andrew,
	linux-kernel, rmikey

Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 25 Jan 2023 10:52:30 -0800 you wrote:
> This patch removes the msleep(4s) during netpoll_setup() if the carrier
> appears instantly.
> 
> Here are some scenarios where this workaround is counter-productive in
> modern ages:
> 
> Servers which have BMC communicating over NC-SI via the same NIC as gets
> used for netconsole. BMC will keep the PHY up, hence the carrier
> appearing instantly.
> 
> [...]

Here is the summary with links:
  - [v3] netpoll: Remove 4s sleep during carrier detection
    https://git.kernel.org/netdev/net-next/c/d8afe2f8a92d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


