linux-rt-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Rod Webster <rod@vmn.com.au>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users@vger.kernel.org, Marcelo Tosatti <mtosatti@redhat.com>
Subject: Re: Excessive network latency when using Realtek R8168/R8111 et al NIC
Date: Fri, 19 May 2023 21:41:30 +1000	[thread overview]
Message-ID: <CANV1gkcah59yVOMZ-s_0TMeMEgpwOaDKSWbdL3Y2Y86RQ+UsKA@mail.gmail.com> (raw)
In-Reply-To: <20230519083745.AB0-5kD9@linutronix.de>

There is something weird going on, I have an i5 PC with R8111 NIC
which was problematic until I installed the R8168-dkms driver and the
6.3 kernel I compiled. It regularly threw packet errors on a RT thread
that fires every 1 ms.
I wanted to revert it so I could do these tests.
So I purged  the R8168-dkms driver to revert to the R8169 and rolled
back to a Debian 6.1 RT kernel. It's still running isolcpus=2,3
The darn thing refused to generate a packet error until I dropped the
thread timing to fire every 0.15 ms.
So it seems the process of installing and uninstalling has changed the
driver environment.
I do recall having something similar accidentally happen to another PC
once before.  I think it was related to the install and uninstall of
the 8168-dkms driver.
Any ideas what could be going on?

I will do some testing tomorrow now I know what's going on but I
wonder how relevant it will be given this turnaround?


Rod Webster





Rod Webster

VMN®

www.vmn.com.au

Ph: 1300 896 832

Mob: +61 435 765 611




On Fri, 19 May 2023 at 18:37, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2023-05-18 20:15:20 [+1000], Rod Webster wrote:
> > Sebastian,
> Hi Rod,
>
> > I have cc'd Marcelo here as he has offered to help.
> >
> > I am attaching the last 1000 lines or so of our trace (I hope
> > attachments are allowed)
> >
> > Our bash script to terminate the trace was called PacketErrorTest. It
> > loops until the hardware reports a packet error.
> > The script contains:
> > #!/usr/bin/bash
> > stat=0
> > while (($stat < 1))
> > do
> > stat=`halcmd getp hm2_7i96s.0.packet-error-total`
> > done
> > trace-cmd stop
> >
> > I must say we have no idea what we are looking at but can't see
> > anything obvious to a layperson.
> > Please advise next steps
>
> I don't see much wrong here. Maybe the trace isn't long enough.
> However what I saw is that your networking interrupt isn't threaded and
> then all networking is outsourced to ksoftirqrd which might get less
> processing time if there is much going on.
> It would help if you enable NAPI threads: As root, please do
>
>         echo 1 > /sys/class/net/enp2s0/threaded
>
> This will spawn a few threads named "napi/enp2s0-$napi_id. The
> actual number of threads depends on the number of NAPI instances
> (usually queues) used by the driver.
> Once you identified them, please pin them the right CPU and adjust the
> priority:
>         taskset -pc 2 $pid
>         chrt -p -f 50 $pid
>
> The taskset command will pin the NAPI thread to CPU2. This is what I see
> in your trace where the interrupt is active. It is good to have them
> (the interrupt and the interrupt thread or the NAPI thread in your
> case) running on the same CPU. The chrt will adjust the priority of the
> thread from SCHED_OTHER to a SCHED_FIFO priority 50.
>
> By default all interrupt threads run at priority 50 (middle priority).
> They get preferred over all tasks with lower priority including
> SCHED_OTHER tasks. SCHED_OTHER (the default scheduling class) tasks get
> preempted once they used their time slice so that they don't lock up the
> system. This isn't the case with SCHED_FIFO.
>
> This realtek/r8169 driver is not using interrupts threads in your
> configuration. While I don't see any API breakage I am rethinking the
> whole softirq concept and moving to NAPI threads might not be that bad.
> Be aware that _now_ you don't have any "overloading" concept. I.e. what
> should be done if the system is receiving a lot of packages that it
> didn't ask for (say ping flood or so). Before this change, the NAPI
> would hand over to ksoftirqd running at SCHED_OTHER so it would be
> balanced with other tasks within the system.
>
> An explanation of the trace you sent:
> |  PacketErrorTest-2552  [002]   191.050337: irq_handler_entry:    irq=125 name=enp2s0
>    Running Task          <CPU>               Interrupt occurred    network card
>
> |  PacketErrorTest-2552  [002]   191.050339: softirq_raise:        vec=3 [action=NET_RX]
> |  PacketErrorTest-2552  [002]   191.050339: irq_handler_exit:     irq=125 ret=handled
> |  PacketErrorTest-2552  [002]   191.050340: sched_waking:         comm=ksoftirqd/2 pid=37 prio=120 target_cpu=002
> |  PacketErrorTest-2552  [002]   191.050343: sched_wakeup:         ksoftirqd/2:37 [120] CPU:002
>
> It scheduled NAPI (NET_RX) which is processed in softirq. Since this
> happened in hardirq context ("trace-cmd report -l" would reveal that) it
> wakes ksoftirqd for the processing.
>
> |  PacketErrorTest-2552  [002]   191.050344: sched_stat_runtime:   comm=PacketErrorTest pid=2552 runtime=587973 [ns] vruntime=15925574432 [ns]
> |  PacketErrorTest-2552  [002]   191.050346: sched_switch:         PacketErrorTest:2552 [120] R ==> ksoftirqd/2:37 [120]
> |      ksoftirqd/2-37    [002]   191.050347: softirq_entry:        vec=3 [action=NET_RX]
> |      ksoftirqd/2-37    [002]   191.050351: softirq_exit:         vec=3 [action=NET_RX]
>
> Here receiving of network packets is completed. Timing wise,
> 191.050351 - 191.050337 = 14us after the interrupt triggered. Looking
> through the whole file, it ranges from 6us to 14us (17 samples).
> So it doesn't look bad. If processing would get preempted and would need
> a few millisecond then it would be bad.
>
> > Rod Webster
> >
> > VMN®
> >
> > www.vmn.com.au
>
> Sebastian

  reply	other threads:[~2023-05-19 11:41 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-18  3:23 Excessive network latency when using Realtek R8168/R8111 et al NIC Rod Webster
2023-04-28  8:51 ` Sebastian Andrzej Siewior
     [not found]   ` <CANV1gkcr4jBUY-iH-iJJPdrVMp+1Nq1YrNPONferMC1AutJgkg@mail.gmail.com>
2023-04-28 13:12     ` Sebastian Andrzej Siewior
2023-04-28 21:37       ` Rod Webster
2023-04-29  1:00         ` Rod Webster
2023-05-16 10:59         ` Sebastian Andrzej Siewior
     [not found]           ` <CANV1gkftrZvhUhXV-mJ-mYmsue3ER33cXCNmVD1bGAc6TmTHuA@mail.gmail.com>
     [not found]             ` <CANV1gkfsAfDt76=STFrekQA4M6sfVKyq7bujA=Tu+S6k+EGYcg@mail.gmail.com>
2023-05-19  8:37               ` Sebastian Andrzej Siewior
2023-05-19 11:41                 ` Rod Webster [this message]
2023-05-22  9:32                   ` Sebastian Andrzej Siewior
2023-05-22 10:06                     ` Rod Webster
2023-05-22 14:45                       ` Marcelo Tosatti
2023-05-22 20:02                         ` Rod Webster
2023-05-22 20:37                           ` Peter Wallace
2023-05-22 20:50                             ` Rod Webster
2023-05-23 23:21                             ` Marcelo Tosatti
2023-05-24 14:09                               ` Peter Wallace
2023-05-23 23:04                           ` Marcelo Tosatti
2023-05-24  9:37                             ` Stephane ANCELOT

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CANV1gkcah59yVOMZ-s_0TMeMEgpwOaDKSWbdL3Y2Y86RQ+UsKA@mail.gmail.com \
    --to=rod@vmn.com.au \
    --cc=bigeasy@linutronix.de \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).