linux-rt-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Rod Webster <rod@vmn.com.au>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	linux-rt-users@vger.kernel.org
Subject: Re: Excessive network latency when using Realtek R8168/R8111 et al NIC
Date: Tue, 23 May 2023 20:04:17 -0300	[thread overview]
Message-ID: <ZG1GcdTf5q9ObM5N@tpad> (raw)
In-Reply-To: <CANV1gkcuPVZz6=BrNxWBd9SqB4ydNmJOge6Tir2xG9C050oH1w@mail.gmail.com>

On Tue, May 23, 2023 at 06:02:13AM +1000, Rod Webster wrote:
> This stuff is hard! I just realised that rtapi_app is a red herring!
> rtapi_app is Linuxcnc and there is nothing wrong with it. Its thread
> is on a 1000us cycle so it seems it gets all its jobs done in 200us
> and then sleeps for 800us which makes perfect sense!
> 
> The issue we have is deeper than that. I think we should be looking at
> the NIC interrupt (but don't trust the novice!).
> The network communication is consuming more than the 800us slack from
> time to time. When that happens, our hardware sees the timing overrun
> and increments an internal packet error count. If too many of these
> happen in succession, the hardware decides the RT environment can't be
> relied on, disables further communication and returns an "error
> finishing read" to Linuxcnc to say it's given up.
> 
> Marcelo, we didn't resort to C. We were able to use a bash script and
> use a linuxcnc tool called halcmd to query the hardware as shown here.
> #!/usr/bin/bash
> stat=0
> while (($stat < 1))
> do
> stat=`halcmd getp hm2_7i96s.0.packet-error-total`
> done
> trace-cmd stop

You can also write a value to the trace before stopping it:

echo "pin high, stopping tracing!" > /sys/kernel/debug/tracing/trace_marker

Just to make sure you are seeing the trace up to the correct event.

> I think we need to increase the stat threshold  so we get more samples
> in our trace before stopping it. The current trace will only have one
> instance.
> Thanks for letting me see the issue more clearly.
> 
> 
> Rod Webster
> 
> 
> 
> Rod Webster
> 
> VMN®
> 
> www.vmn.com.au
> 
> Ph: 1300 896 832
> 
> Mob: +61 435 765 611
> 
> 
> 
> 
> On Tue, 23 May 2023 at 03:13, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> >
> > On Mon, May 22, 2023 at 08:06:39PM +1000, Rod Webster wrote:
> > > Yes, I won't be using a DKMS driver and will test with the in-tree driver.
> > > I rolled a PC back to the Debian 6.1 RT kernel and the default in-tree
> > > R8169 driver yesterday with isolcpus=2,3.
> > >
> > > I had a shot at it last night with the napi thread but was confused by
> > > the trace results so will try again.
> > > I could not see our PacketErrorTest script appearing in the trace but
> > > it stopped the trace when it should.
> >
> > Rod,
> >
> > Can you describe the PacketErrorTest script and how do you know, from
> > it, that latency has crossed the threshold? What was confusing about
> > the traces?
> >
> > Earlier you wrote:
> > I've had some feedback from a hardware supplier.
> > The Linuxcnc RT ethernet driver has a pin that reports ethernet packet
> > error count.
> > This increments when the ethernet response exceeds our real time servo
> > thread period (typically 1 ms)
> > We should be able to monitor this pin and stop the trace if it is >  0
> > without too much effort.
> >
> > So you are seeing that pin increase its value?
> >
> > When you do, you should write to the "tracing_on" file (a zero):
> >
> >         /*
> >          * if we're not tracing and the tracing_on fd is not open,
> >          * open the tracing_on file so that we can stop the trace
> >          * if we hit a breaktrace threshold
> >          */
> >         if (trace_fd < 0) {
> >                 sprintf(path, "%s/%s", fileprefix, "tracing_on");
> >                 if ((trace_fd = open(path, O_WRONLY)) < 0)
> >                         warn("unable to open tracing_on file: %s\n", path);
> >         }
> >
> >         /* now stop any trace */
> >         write(trace_fd, "0\n", 2);
> >
> > The trace file (/sys/kernel/debug/tracing/trace) should now contain
> > the trace up to the write to tracing_on file.
> >
> > Sebastian: didnt see anything weird in the traces as well:
> >
> > "Follows the delta between a list of timestamped events,
> > where each event is: "TIMESTAMP: rtapi is scheduled IN CPU-3"
> > (there is at one time only a single process scheduled in cpu-3).
> >
> > [191.050245, 191.050278]: 0.000033
> > [191.050278, 191.050293]: 0.000015
> > [191.050293, 191.050307]: 0.000014
> > [191.050307, 191.050322]: 0.000015
> > [191.050322, 191.050336]: 0.000014
> >
> > rtapi_app goes to sleep for approximately 15us, and is awakened
> > by a timer.
> >
> > [191.050336, 191.050350]: 0.000014
> > [191.050350, 191.050365]: 0.000015
> > [191.050365, 191.050379]: 0.000014
> > [191.050379, 191.050393]: 0.000014
> > [191.050393, 191.050408]: 0.000015
> > [191.050408, 191.051243]: 0.000835
> > [191.051243, 191.051272]: 0.000029
> >
> > It eventually sleeps for 800us (and this pattern repeats)".
> >
> > Perhaps tracing the system call which sends out the packet would be
> > useful as well.
> >
> > > Also our RT process did not appear to be on an isolated thread which
> > > seemed odd. I did not get as far as setting the affinity for napi.
> > >
> > > I've also installed  Bookworm RC3 on another PC with the default RT
> > > kernel on a Gigabyte Brix N3160 with an R8111/R8168 NIC and the
> > > in-tree R8169 driver.
> > > This PC historically had max latency of about 130000 ns  back with
> > > kernel 4.x and now its reporting 23000 ns with the 6.1 kernel over 24
> > > hours testing with the linuxcnc tools. Quite amazing what you guys
> > > have achieved! It will be a few days before I can lug my hardware to
> > > work to test on it as well for another data point.
> > >
> > > I really appreciate your willingness to help. This remains a
> > > frustrating issue for many Linuxcnc users who are using my compiled
> > > 6.3 kernel to get out of jail.
> > > Bear with me as I bumble my way through!
> > >
> > > Rod Webster
> > >
> > >
> > >
> > >
> > >
> > > Rod Webster
> > >
> > > VMN®
> > >
> > > www.vmn.com.au
> > >
> > > Ph: 1300 896 832
> > >
> > > Mob: +61 435 765 611
> > >
> > >
> > >
> > >
> > > On Mon, 22 May 2023 at 19:32, Sebastian Andrzej Siewior
> > > <bigeasy@linutronix.de> wrote:
> > > >
> > > > On 2023-05-19 21:41:30 [+1000], Rod Webster wrote:
> > > > > I will do some testing tomorrow now I know what's going on but I
> > > > > wonder how relevant it will be given this turnaround?
> > > >
> > > > I'm not going to look at the dkms/OOT driver. If there is an actual
> > > > problem or something odd with mainline and RT and I will look into it.
> > > > I managed to find an older box which is using the r8169 and I'm going to
> > > > boot and see if there the driver is doing something odd what I didn't
> > > > see earlier…
> > > >
> > > > Please make sure that you are using the in-tree driver. If you
> > > > experience high latencies then I'm willing to help investigate.
> > > >
> > > > > Rod Webster
> > > >
> > > > Sebastian
> > >
> > >
> >
> 
> 


  parent reply	other threads:[~2023-05-23 23:06 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-18  3:23 Excessive network latency when using Realtek R8168/R8111 et al NIC Rod Webster
2023-04-28  8:51 ` Sebastian Andrzej Siewior
     [not found]   ` <CANV1gkcr4jBUY-iH-iJJPdrVMp+1Nq1YrNPONferMC1AutJgkg@mail.gmail.com>
2023-04-28 13:12     ` Sebastian Andrzej Siewior
2023-04-28 21:37       ` Rod Webster
2023-04-29  1:00         ` Rod Webster
2023-05-16 10:59         ` Sebastian Andrzej Siewior
     [not found]           ` <CANV1gkftrZvhUhXV-mJ-mYmsue3ER33cXCNmVD1bGAc6TmTHuA@mail.gmail.com>
     [not found]             ` <CANV1gkfsAfDt76=STFrekQA4M6sfVKyq7bujA=Tu+S6k+EGYcg@mail.gmail.com>
2023-05-19  8:37               ` Sebastian Andrzej Siewior
2023-05-19 11:41                 ` Rod Webster
2023-05-22  9:32                   ` Sebastian Andrzej Siewior
2023-05-22 10:06                     ` Rod Webster
2023-05-22 14:45                       ` Marcelo Tosatti
2023-05-22 20:02                         ` Rod Webster
2023-05-22 20:37                           ` Peter Wallace
2023-05-22 20:50                             ` Rod Webster
2023-05-23 23:21                             ` Marcelo Tosatti
2023-05-24 14:09                               ` Peter Wallace
2023-05-23 23:04                           ` Marcelo Tosatti [this message]
2023-05-24  9:37                             ` Stephane ANCELOT

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZG1GcdTf5q9ObM5N@tpad \
    --to=mtosatti@redhat.com \
    --cc=bigeasy@linutronix.de \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=rod@vmn.com.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).