* 2nd RTM_NEWLINK notification with operstate down is always 1 second delayed
@ 2024-04-17 17:37 Tom, Deepak Abraham
  2024-04-17 22:33 ` Stephen Hemminger
  0 siblings, 1 reply; 4+ messages in thread
From: Tom, Deepak Abraham @ 2024-04-17 17:37 UTC
  To: netdev

Hi!

I have a system configured with two physical Ethernet interfaces connected to a switch.
When I reboot the switch, I see that the userspace RTM_NEWLINK notifications for the interfaces are always 1 second apart, although both links actually go down almost simultaneously!
The subsequent RTM_NEWLINK notifications when the switch comes back up are, however, only a few microseconds apart, which is as expected.
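
A minimal sketch of one way to observe these timestamps from user space (the rtnetlink socket setup is standard; the program itself is only illustrative and keys off IFF_RUNNING rather than decoding the full IFLA_OPERSTATE attribute):

/* Illustrative sketch: timestamp RTM_NEWLINK notifications.
 * Build (hypothetical name): cc -o linkmon linkmon.c
 * Subscribes to the standard rtnetlink link multicast group; for
 * brevity it reports IFF_RUNNING instead of decoding IFLA_OPERSTATE. */
#include <stdio.h>
#include <time.h>
#include <net/if.h>              /* IFF_RUNNING */
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

int main(void)
{
        struct sockaddr_nl addr = {
                .nl_family = AF_NETLINK,
                .nl_groups = RTMGRP_LINK,  /* link state notifications */
        };
        char buf[8192];
        int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

        if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                perror("netlink");
                return 1;
        }

        for (;;) {
                ssize_t len = recv(fd, buf, sizeof(buf), 0);
                struct nlmsghdr *nh;
                struct timespec ts;

                if (len < 0)
                        break;
                clock_gettime(CLOCK_MONOTONIC, &ts);
                for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
                     nh = NLMSG_NEXT(nh, len)) {
                        struct ifinfomsg *ifi = NLMSG_DATA(nh);

                        if (nh->nlmsg_type != RTM_NEWLINK)
                                continue;
                        printf("[%ld.%06ld] RTM_NEWLINK ifindex=%d %s\n",
                               (long)ts.tv_sec, ts.tv_nsec / 1000,
                               ifi->ifi_index,
                               (ifi->ifi_flags & IFF_RUNNING) ?
                                        "running" : "not running");
                }
        }
        return 0;
}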

It turns out this delay is intentionally introduced by the Linux kernel networking code in net/core/link_watch.c, last modified 17 years ago in commit 294cc44:
         /*
          * Limit the number of linkwatch events to one
          * per second so that a runaway driver does not
          * cause a storm of messages on the netlink
          * socket.  This limit does not apply to up events
          * while the device qdisc is down.
          */


On modern high-performance systems, limiting the number of down events to just one per second has far-reaching consequences.
I was wondering if it would be advisable to reduce this delay to something smaller, say 5ms (so 5ms plus scheduling delay in practice):
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -130,8 +130,8 @@ static void linkwatch_schedule_work(int urgent)
                delay = 0;
        }

-       /* If we wrap around we'll delay it by at most HZ. */
-       if (delay > HZ)
+       /* If we wrap around we'll delay it by at most HZ/200. */
+       if (delay > (HZ/200))
                delay = 0;

        /*
@@ -187,15 +187,15 @@ static void __linkwatch_run_queue(int urgent_only)

        /*
         * Limit the number of linkwatch events to one
-        * per second so that a runaway driver does not
+        * per 5 milliseconds so that a runaway driver does not
         * cause a storm of messages on the netlink
         * socket.  This limit does not apply to up events
         * while the device qdisc is down.
         */
        if (!urgent_only)
-               linkwatch_nextevent = jiffies + HZ;
+               linkwatch_nextevent = jiffies + (HZ/200);
        /* Limit wrap-around effect on delay. */
-       else if (time_after(linkwatch_nextevent, jiffies + HZ))
+       else if (time_after(linkwatch_nextevent, jiffies + (HZ/200)))
                linkwatch_nextevent = jiffies;

        clear_bit(LW_URGENT, &linkwatch_flags);
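

As a quick sanity check on the arithmetic (plain user-space C, purely illustrative, not kernel code): HZ/200 is integer division, so the effective delay depends on CONFIG_HZ. With HZ=1000 it is exactly 5ms, but with HZ=100 it rounds down to zero jiffies, i.e. no limit at all; the kernel's msecs_to_jiffies(5) would avoid that rounding.

/* Illustrative check of the HZ/200 rounding (ordinary C, not
 * kernel code): for HZ below 200, integer division yields zero
 * jiffies, i.e. no rate limit at all. */
#include <stdio.h>

int main(void)
{
        const int hz_values[] = { 100, 250, 300, 1000 }; /* common CONFIG_HZ */
        unsigned int i;

        for (i = 0; i < sizeof(hz_values) / sizeof(hz_values[0]); i++) {
                int hz = hz_values[i];
                int jiffies = hz / 200;

                printf("HZ=%4d: HZ/200 = %d jiffies = %d ms\n",
                       hz, jiffies, jiffies * 1000 / hz);
        }
        return 0;
}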


I have tested this change in my environment, and it works as expected. I don't see any new issues popping up because of this.

Are there any concerns with making this change today? Hoping to get some feedback.


Thank You,
Deepak Abraham Tom


* Re: 2nd RTM_NEWLINK notification with operstate down is always 1 second delayed
  2024-04-17 17:37 2nd RTM_NEWLINK notification with operstate down is always 1 second delayed Tom, Deepak Abraham
@ 2024-04-17 22:33 ` Stephen Hemminger
  2024-04-18 19:26   ` Tom, Deepak Abraham
  0 siblings, 1 reply; 4+ messages in thread
From: Stephen Hemminger @ 2024-04-17 22:33 UTC
  To: Tom, Deepak Abraham; +Cc: netdev

On Wed, 17 Apr 2024 17:37:40 +0000
"Tom, Deepak Abraham" <deepak-abraham.tom@hpe.com> wrote:

> Hi!
> 
> I have a system configured with two physical Ethernet interfaces connected to a switch.
> When I reboot the switch, I see that the userspace RTM_NEWLINK notifications for the interfaces are always 1 second apart, although both links actually go down almost simultaneously!
> The subsequent RTM_NEWLINK notifications when the switch comes back up are, however, only a few microseconds apart, which is as expected.
> 
> It turns out this delay is intentionally introduced by the Linux kernel networking code in net/core/link_watch.c, last modified 17 years ago in commit 294cc44:
>          /*
>           * Limit the number of linkwatch events to one
>           * per second so that a runaway driver does not
>           * cause a storm of messages on the netlink
>           * socket.  This limit does not apply to up events
>           * while the device qdisc is down.
>           */
> 
> 
> On modern high-performance systems, limiting the number of down events to just one per second has far-reaching consequences.
> I was wondering if it would be advisable to reduce this delay to something smaller, say 5ms (so 5ms plus scheduling delay in practice):

The reason is that for systems connected to the Internet with routing daemons,
the impact of a link state change is huge. A single link transition may keep FRR (née Quagga)
busy for several seconds as it linearly evaluates 3 million route entries. Maybe more recent
versions of FRR have gotten smarter. This is also to avoid the routing daemon propagating
lots of changes, a.k.a. route flap.


* RE: 2nd RTM_NEWLINK notification with operstate down is always 1 second delayed
  2024-04-17 22:33 ` Stephen Hemminger
@ 2024-04-18 19:26   ` Tom, Deepak Abraham
  2024-04-19 16:31     ` Stephen Hemminger
  0 siblings, 1 reply; 4+ messages in thread
From: Tom, Deepak Abraham @ 2024-04-18 19:26 UTC
  To: Stephen Hemminger; +Cc: netdev

Maybe I'm missing something, but could you please explain how this really helps avoid keeping FRR busy?
If I understood this right, the link watch code does not ignore events but merely delays them, so any link transition will be propagated whether it is scheduled urgently or not.
So FRR will still have to deal with each transition, keeping it busy with or without this change, unless FRR dampens flaps on its own?

Also, from a design perspective, would it be better if FRR's issues with route flaps were dealt with directly in FRR code itself? That way, in use cases where FRR does not come into play, such a delay would not cause other consequences. Are there other situations where such a delay is absolutely required?
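
To make that concrete, a hypothetical sketch of user-space damping (illustrative only; none of this is actual FRR code, and the names and holddown value are made up): accept at most one link event per interface per holddown window, so a flapping port triggers one recomputation instead of many. A real daemon would also re-read the final link state once the window expires.

/* Hypothetical flap-damping helper (illustrative, not FRR code):
 * accept at most one link event per interface per holddown window.
 * A real daemon would also re-read the final link state when the
 * window expires. */
#include <stdbool.h>
#include <time.h>

#define HOLDDOWN_MS 1000
#define MAX_IFINDEX 256

static long long last_event_ms[MAX_IFINDEX];

static long long now_ms(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1000LL + ts.tv_nsec / 1000000;
}

/* Return true if the caller should act on this link event now. */
bool link_event_accept(int ifindex)
{
        long long now = now_ms();

        if (ifindex < 0 || ifindex >= MAX_IFINDEX)
                return true;    /* no state tracked: fail open */
        if (now - last_event_ms[ifindex] < HOLDDOWN_MS)
                return false;   /* within holddown: dampen */
        last_event_ms[ifindex] = now;
        return true;
}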

Thank You,
Deepak Abraham Tom

-----Original Message-----
From: Stephen Hemminger <stephen@networkplumber.org> 
Sent: Wednesday, April 17, 2024 4:34 PM
To: Tom, Deepak Abraham <deepak-abraham.tom@hpe.com>
Cc: netdev@vger.kernel.org
Subject: Re: 2nd RTM_NEWLINK notification with operstate down is always 1 second delayed

On Wed, 17 Apr 2024 17:37:40 +0000
"Tom, Deepak Abraham" <deepak-abraham.tom@hpe.com> wrote:

> Hi!
> 
> I have a system configured with two physical Ethernet interfaces connected to a switch.
> When I reboot the switch, I see that the userspace RTM_NEWLINK notifications for the interfaces are always 1 second apart, although both links actually go down almost simultaneously!
> The subsequent RTM_NEWLINK notifications when the switch comes back up are, however, only a few microseconds apart, which is as expected.
> 
> It turns out this delay is intentionally introduced by the Linux kernel networking code in net/core/link_watch.c, last modified 17 years ago in commit 294cc44:
>          /*
>           * Limit the number of linkwatch events to one
>           * per second so that a runaway driver does not
>           * cause a storm of messages on the netlink
>           * socket.  This limit does not apply to up events
>           * while the device qdisc is down.
>           */
> 
> 
> On modern high-performance systems, limiting the number of down events to just one per second has far-reaching consequences.
> I was wondering if it would be advisable to reduce this delay to something smaller, say 5ms (so 5ms plus scheduling delay in practice):

The reason is that for systems connected to the Internet with routing daemons, the impact of a link state change is huge. A single link transition may keep FRR (née Quagga) busy for several seconds as it linearly evaluates 3 million route entries. Maybe more recent versions of FRR have gotten smarter. This is also to avoid the routing daemon propagating lots of changes, a.k.a. route flap.


* Re: 2nd RTM_NEWLINK notification with operstate down is always 1 second delayed
  2024-04-18 19:26   ` Tom, Deepak Abraham
@ 2024-04-19 16:31     ` Stephen Hemminger
  0 siblings, 0 replies; 4+ messages in thread
From: Stephen Hemminger @ 2024-04-19 16:31 UTC
  To: Tom, Deepak Abraham; +Cc: netdev

On Thu, 18 Apr 2024 19:26:51 +0000
"Tom, Deepak Abraham" <deepak-abraham.tom@hpe.com> wrote:

> Maybe I'm missing something, but could you please explain how this really helps avoid keeping FRR busy?
> If I understood this right, the link watch code does not ignore events but merely delays them, so any link transition will be propagated whether it is scheduled urgently or not.
> So FRR will still have to deal with each transition, keeping it busy with or without this change, unless FRR dampens flaps on its own?
>

A poor connection to a switch can cause repeated link down/up transitions. I haven't seen it firsthand,
but I have had to deal with user reports of poor router connections.

> Also, from a design perspective, would it be better if FRR's issues with route flaps were dealt with directly in FRR code itself? That way, in use cases where FRR does not come into play, such a delay would not cause other consequences. Are there other situations where such a delay is absolutely required?

Too late now. Can't change Linux semantics without breaking many things, and it impacts not just FRR.

