From: Rod Webster <rod@vmn.com.au>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users@vger.kernel.org, Marcelo Tosatti <mtosatti@redhat.com>
Subject: Re: Excessive network latency when using Realtek R8168/R8111 et al NIC
Date: Fri, 19 May 2023 21:41:30 +1000 [thread overview]
Message-ID: <CANV1gkcah59yVOMZ-s_0TMeMEgpwOaDKSWbdL3Y2Y86RQ+UsKA@mail.gmail.com> (raw)
In-Reply-To: <20230519083745.AB0-5kD9@linutronix.de>
There is something weird going on, I have an i5 PC with R8111 NIC
which was problematic until I installed the R8168-dkms driver and the
6.3 kernel I compiled. It regularly threw packet errors on a RT thread
that fires every 1 ms.
I wanted to revert it so I could do these tests.
So I purged the R8168-dkms driver to revert to the R8169 and rolled
back to a Debian 6.1 RT kernel. It's still running isolcpus=2,3
The darn thing refused to generate a packet error until I dropped the
thread timing to fire every 0.15 ms.
So it seems the process of installing and uninstalling has changed the
driver environment.
I do recall having something similar accidentally happen to another PC
once before. I think it was related to the install and uninstall of
the 8168-dkms driver.
Any ideas what could be going on?
I will do some testing tomorrow now I know what's going on but I
wonder how relevant it will be given this turnaround?
Rod Webster
Rod Webster
VMN®
www.vmn.com.au
Ph: 1300 896 832
Mob: +61 435 765 611
On Fri, 19 May 2023 at 18:37, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2023-05-18 20:15:20 [+1000], Rod Webster wrote:
> > Sebastian,
> Hi Rod,
>
> > I have cc'd Marcelo here as he has offered to help.
> >
> > I am attaching the last 1000 lines or so of our trace (I hope
> > attachments are allowed)
> >
> > Our bash script to terminate the trace was called PacketErrorTest. It
> > loops until the hardware reports a packet error.
> > The script contains:
> > #!/usr/bin/bash
> > stat=0
> > while (($stat < 1))
> > do
> > stat=`halcmd getp hm2_7i96s.0.packet-error-total`
> > done
> > trace-cmd stop
> >
> > I must say we have no idea what we are looking at but can't see
> > anything obvious to a layperson.
> > Please advise next steps
>
> I don't see much wrong here. Maybe the trace isn't long enough.
> However what I saw is that your networking interrupt isn't threaded and
> then all networking is outsourced to ksoftirqrd which might get less
> processing time if there is much going on.
> It would help if you enable NAPI threads: As root, please do
>
> echo 1 > /sys/class/net/enp2s0/threaded
>
> This will spawn a few threads named "napi/enp2s0-$napi_id. The
> actual number of threads depends on the number of NAPI instances
> (usually queues) used by the driver.
> Once you identified them, please pin them the right CPU and adjust the
> priority:
> taskset -pc 2 $pid
> chrt -p -f 50 $pid
>
> The taskset command will pin the NAPI thread to CPU2. This is what I see
> in your trace where the interrupt is active. It is good to have them
> (the interrupt and the interrupt thread or the NAPI thread in your
> case) running on the same CPU. The chrt will adjust the priority of the
> thread from SCHED_OTHER to a SCHED_FIFO priority 50.
>
> By default all interrupt threads run at priority 50 (middle priority).
> They get preferred over all tasks with lower priority including
> SCHED_OTHER tasks. SCHED_OTHER (the default scheduling class) tasks get
> preempted once they used their time slice so that they don't lock up the
> system. This isn't the case with SCHED_FIFO.
>
> This realtek/r8169 driver is not using interrupts threads in your
> configuration. While I don't see any API breakage I am rethinking the
> whole softirq concept and moving to NAPI threads might not be that bad.
> Be aware that _now_ you don't have any "overloading" concept. I.e. what
> should be done if the system is receiving a lot of packages that it
> didn't ask for (say ping flood or so). Before this change, the NAPI
> would hand over to ksoftirqd running at SCHED_OTHER so it would be
> balanced with other tasks within the system.
>
> An explanation of the trace you sent:
> | PacketErrorTest-2552 [002] 191.050337: irq_handler_entry: irq=125 name=enp2s0
> Running Task <CPU> Interrupt occurred network card
>
> | PacketErrorTest-2552 [002] 191.050339: softirq_raise: vec=3 [action=NET_RX]
> | PacketErrorTest-2552 [002] 191.050339: irq_handler_exit: irq=125 ret=handled
> | PacketErrorTest-2552 [002] 191.050340: sched_waking: comm=ksoftirqd/2 pid=37 prio=120 target_cpu=002
> | PacketErrorTest-2552 [002] 191.050343: sched_wakeup: ksoftirqd/2:37 [120] CPU:002
>
> It scheduled NAPI (NET_RX) which is processed in softirq. Since this
> happened in hardirq context ("trace-cmd report -l" would reveal that) it
> wakes ksoftirqd for the processing.
>
> | PacketErrorTest-2552 [002] 191.050344: sched_stat_runtime: comm=PacketErrorTest pid=2552 runtime=587973 [ns] vruntime=15925574432 [ns]
> | PacketErrorTest-2552 [002] 191.050346: sched_switch: PacketErrorTest:2552 [120] R ==> ksoftirqd/2:37 [120]
> | ksoftirqd/2-37 [002] 191.050347: softirq_entry: vec=3 [action=NET_RX]
> | ksoftirqd/2-37 [002] 191.050351: softirq_exit: vec=3 [action=NET_RX]
>
> Here receiving of network packets is completed. Timing wise,
> 191.050351 - 191.050337 = 14us after the interrupt triggered. Looking
> through the whole file, it ranges from 6us to 14us (17 samples).
> So it doesn't look bad. If processing would get preempted and would need
> a few millisecond then it would be bad.
>
> > Rod Webster
> >
> > VMN®
> >
> > www.vmn.com.au
>
> Sebastian
next prev parent reply other threads:[~2023-05-19 11:41 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-18 3:23 Excessive network latency when using Realtek R8168/R8111 et al NIC Rod Webster
2023-04-28 8:51 ` Sebastian Andrzej Siewior
[not found] ` <CANV1gkcr4jBUY-iH-iJJPdrVMp+1Nq1YrNPONferMC1AutJgkg@mail.gmail.com>
2023-04-28 13:12 ` Sebastian Andrzej Siewior
2023-04-28 21:37 ` Rod Webster
2023-04-29 1:00 ` Rod Webster
2023-05-16 10:59 ` Sebastian Andrzej Siewior
[not found] ` <CANV1gkftrZvhUhXV-mJ-mYmsue3ER33cXCNmVD1bGAc6TmTHuA@mail.gmail.com>
[not found] ` <CANV1gkfsAfDt76=STFrekQA4M6sfVKyq7bujA=Tu+S6k+EGYcg@mail.gmail.com>
2023-05-19 8:37 ` Sebastian Andrzej Siewior
2023-05-19 11:41 ` Rod Webster [this message]
2023-05-22 9:32 ` Sebastian Andrzej Siewior
2023-05-22 10:06 ` Rod Webster
2023-05-22 14:45 ` Marcelo Tosatti
2023-05-22 20:02 ` Rod Webster
2023-05-22 20:37 ` Peter Wallace
2023-05-22 20:50 ` Rod Webster
2023-05-23 23:21 ` Marcelo Tosatti
2023-05-24 14:09 ` Peter Wallace
2023-05-23 23:04 ` Marcelo Tosatti
2023-05-24 9:37 ` Stephane ANCELOT
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CANV1gkcah59yVOMZ-s_0TMeMEgpwOaDKSWbdL3Y2Y86RQ+UsKA@mail.gmail.com \
--to=rod@vmn.com.au \
--cc=bigeasy@linutronix.de \
--cc=linux-rt-users@vger.kernel.org \
--cc=mtosatti@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).