Re: Excessive network latency when using Realtek R8168/R8111 et al NIC

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Rod Webster <rod@vmn.com.au>
Cc: linux-rt-users@vger.kernel.org, Marcelo Tosatti <mtosatti@redhat.com>
Subject: Re: Excessive network latency when using Realtek R8168/R8111 et al NIC
Date: Fri, 19 May 2023 10:37:45 +0200	[thread overview]
Message-ID: <20230519083745.AB0-5kD9@linutronix.de> (raw)
In-Reply-To: <CANV1gkfsAfDt76=STFrekQA4M6sfVKyq7bujA=Tu+S6k+EGYcg@mail.gmail.com>

On 2023-05-18 20:15:20 [+1000], Rod Webster wrote:
> Sebastian,
Hi Rod,

> I have cc'd Marcelo here as he has offered to help.
> 
> I am attaching the last 1000 lines or so of our trace (I hope
> attachments are allowed)
> 
> Our bash script to terminate the trace was called PacketErrorTest. It
> loops until the hardware reports a packet error.
> The script contains:
> #!/usr/bin/bash
> stat=0
> while (($stat < 1))
> do
> stat=`halcmd getp hm2_7i96s.0.packet-error-total`
> done
> trace-cmd stop
> 
> I must say we have no idea what we are looking at but can't see
> anything obvious to a layperson.
> Please advise next steps

I don't see much wrong here. Maybe the trace isn't long enough.
However what I saw is that your networking interrupt isn't threaded and
then all networking is outsourced to ksoftirqrd which might get less
processing time if there is much going on.
It would help if you enable NAPI threads: As root, please do

	echo 1 > /sys/class/net/enp2s0/threaded

This will spawn a few threads named "napi/enp2s0-$napi_id. The
actual number of threads depends on the number of NAPI instances
(usually queues) used by the driver.
Once you identified them, please pin them the right CPU and adjust the
priority:
	taskset -pc 2 $pid
	chrt -p -f 50 $pid

The taskset command will pin the NAPI thread to CPU2. This is what I see
in your trace where the interrupt is active. It is good to have them
(the interrupt and the interrupt thread or the NAPI thread in your
case) running on the same CPU. The chrt will adjust the priority of the
thread from SCHED_OTHER to a SCHED_FIFO priority 50.

By default all interrupt threads run at priority 50 (middle priority).
They get preferred over all tasks with lower priority including
SCHED_OTHER tasks. SCHED_OTHER (the default scheduling class) tasks get
preempted once they used their time slice so that they don't lock up the
system. This isn't the case with SCHED_FIFO.

This realtek/r8169 driver is not using interrupts threads in your
configuration. While I don't see any API breakage I am rethinking the
whole softirq concept and moving to NAPI threads might not be that bad.
Be aware that _now_ you don't have any "overloading" concept. I.e. what
should be done if the system is receiving a lot of packages that it
didn't ask for (say ping flood or so). Before this change, the NAPI
would hand over to ksoftirqd running at SCHED_OTHER so it would be
balanced with other tasks within the system.

An explanation of the trace you sent:
|  PacketErrorTest-2552  [002]   191.050337: irq_handler_entry:    irq=125 name=enp2s0
   Running Task          <CPU>               Interrupt occurred    network card

|  PacketErrorTest-2552  [002]   191.050339: softirq_raise:        vec=3 [action=NET_RX]
|  PacketErrorTest-2552  [002]   191.050339: irq_handler_exit:     irq=125 ret=handled
|  PacketErrorTest-2552  [002]   191.050340: sched_waking:         comm=ksoftirqd/2 pid=37 prio=120 target_cpu=002
|  PacketErrorTest-2552  [002]   191.050343: sched_wakeup:         ksoftirqd/2:37 [120] CPU:002

It scheduled NAPI (NET_RX) which is processed in softirq. Since this
happened in hardirq context ("trace-cmd report -l" would reveal that) it
wakes ksoftirqd for the processing.

|  PacketErrorTest-2552  [002]   191.050344: sched_stat_runtime:   comm=PacketErrorTest pid=2552 runtime=587973 [ns] vruntime=15925574432 [ns]
|  PacketErrorTest-2552  [002]   191.050346: sched_switch:         PacketErrorTest:2552 [120] R ==> ksoftirqd/2:37 [120]
|      ksoftirqd/2-37    [002]   191.050347: softirq_entry:        vec=3 [action=NET_RX]
|      ksoftirqd/2-37    [002]   191.050351: softirq_exit:         vec=3 [action=NET_RX]

Here receiving of network packets is completed. Timing wise,
191.050351 - 191.050337 = 14us after the interrupt triggered. Looking
through the whole file, it ranges from 6us to 14us (17 samples).
So it doesn't look bad. If processing would get preempted and would need
a few millisecond then it would be bad.

> Rod Webster
> 
> VMN®
> 
> www.vmn.com.au

Sebastian