From: David Laight <David.Laight@ACULAB.COM>
To: 'Jakub Kicinski' <kuba@kernel.org>
Cc: 'Pavan Chebbi' <pavan.chebbi@broadcom.com>,
	Paolo Abeni <pabeni@redhat.com>,
	Michael Chan <michael.chan@broadcom.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"mchan@broadcom.com" <mchan@broadcom.com>,
	David Miller <davem@davemloft.net>
Subject: RE: tg3 dropping packets at high packet rates
Date: Wed, 25 May 2022 21:48:15 +0000	[thread overview]
Message-ID: <4ff290b53a3945669f2057eddd8441b2@AcuMS.aculab.com> (raw)
In-Reply-To: <20220525085647.6dfb7ed0@kernel.org>

From: Jakub Kicinski
> Sent: 25 May 2022 16:57
> 
> On Wed, 25 May 2022 07:28:42 +0000 David Laight wrote:
> > > As the trace below shows I think the underlying problem
> > > is that the napi callbacks aren't being made in a timely manner.
> >
> > Further investigations have shown that this is actually
> > a generic problem with the way napi callbacks are called
> > from the softint handler.
> >
> > The underlying problem is the effect of this code
> > in __do_softirq().
> >
> >         pending = local_softirq_pending();
> >         if (pending) {
> >                 if (time_before(jiffies, end) && !need_resched() &&
> >                     --max_restart)
> >                         goto restart;
> >
> >                 wakeup_softirqd();
> >         }
> >
> > The napi processing can loop through here and needs to do
> > the 'goto restart' - not doing so will drop packets.
> > The need_resched() test is particularly troublesome.
> > I've also had to increase the limit for 'max_restart' from
> > its (hard coded) 10 to 1000 (100 isn't enough).
> > I'm not sure whether I'm hitting the jiffies limit,
> > but that is hard coded at 2.
> >
> > I'm going to start another thread.
> 
> If you share the core between the application and NAPI try using prefer
> busy polling (SO_PREFER_BUSY_POLL), and manage polling from user space.
> If you have separate cores use threaded NAPI and isolate the core
> running NAPI or give it high prio.

The application is looking at 10000 UDP sockets, each of which
typically has 1 packet every 20ms (but there might be an extra
one).
About the only way to handle this is with an array of 100 epoll
fds, each of which has 100 UDP sockets.
Every 10ms (we do our RTP in 10ms epochs) each application thread
picks the next epoll fd (using atomic_in_ov()) and then reads all
the 'ready' sockets.
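
Roughly, in code (a minimal sketch - socket setup and error
handling are elided, and a plain C11 atomic_fetch_add() stands in
for our atomic_in_ov()):

  #include <stdatomic.h>
  #include <sys/epoll.h>
  #include <sys/socket.h>

  #define NUM_EPFDS  100	/* 100 epoll fds ... */
  #define MAX_EVENTS 128

  static int epfds[NUM_EPFDS];	/* ... each holding 100 UDP sockets */
  static atomic_uint next_epfd;	/* shared round-robin counter */

  /* Called by an application thread once per 10ms epoch. */
  static void poll_next_epfd(void)
  {
	struct epoll_event ev[MAX_EVENTS];
	unsigned char buf[2048];
	int epfd, n, i;

	/* Atomically claim the next epoll fd; unsigned wrap is safe. */
	epfd = epfds[atomic_fetch_add(&next_epfd, 1) % NUM_EPFDS];

	/* Timeout 0: just collect whatever is ready right now. */
	n = epoll_wait(epfd, ev, MAX_EVENTS, 0);
	for (i = 0; i < n; i++) {
		/* Drain the socket - usually one datagram, sometimes two. */
		while (recvfrom(ev[i].data.fd, buf, sizeof(buf),
				MSG_DONTWAIT, NULL, NULL) > 0)
			/* process the RTP packet */;
	}
  }
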
The application can't afford to take a mutex in any hot path
because mutex contention can happen while the owning process is
'stolen' by a hardware interrupt and/or softint.
That then stalls all the waiting application threads.

Even then I've got 35 application threads that call epoll_wait()
and recvfrom() and run at about 50% cpu.

The ethernet napi code is using about 50% of two cpus.
I'm using RPS to move the IP/UDP processing to other cpus
(manually avoiding the ones taking the hardware interrupts).
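
For reference, RPS is just a per-rx-queue cpumask in sysfs - a
sketch (the interface name, queue number and cpu mask below are
examples only, not what I'm actually running):

  #include <stdio.h>

  /* Write an RPS cpumask for one rx queue, e.g.
   * set_rps_mask("eth0", 0, "3c") steers IP/UDP receive processing
   * for eth0 rx-0 onto cpus 2-5.  All the values are examples. */
  static int set_rps_mask(const char *dev, int queue, const char *mask)
  {
	char path[128];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/class/net/%s/queues/rx-%d/rps_cpus", dev, queue);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%s\n", mask);
	return fclose(f);
  }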

> YMMV but I've spent more time than I'd like to admit looking at the
> softirq yielding conditions, they are hard to beat :(

I've spent a long time discovering that it is one reason I'm losing
a lot of packets on a system with a reasonable amount of idle time.

> If you control
> the app much better use of your time to arrange busy poll or pin things.

Pinning things gets annoying.
I've been running the 'important' application threads under the
RT scheduler. This makes their cpu assignment very sticky.
So it is nearly as good as pinning, but the scheduler decides
where they go.
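
For the application threads that's a one-liner at thread start
(sketch; the priority value is just an example):

  #include <pthread.h>
  #include <sched.h>

  /* Put the calling thread under SCHED_RR.  The scheduler then
   * keeps it on the same cpu unless it really has to move it -
   * the 'sticky' behaviour described above. */
  static int make_thread_rt(int prio)
  {
	struct sched_param sp = { .sched_priority = prio };

	return pthread_setschedparam(pthread_self(), SCHED_RR, &sp);
  }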

This afternoon I tried using threaded napi for the ethernet interface.
(Suggested by Eric; it gets enabled by writing 1 to
/sys/class/net/<dev>/threaded.)
This can only really work if the napi threads run under the RT
scheduler. I can't see an easy way to do that, apart from:
  # Give every napi/<dev>-<n> kernel thread SCHED_RR priority 50.
  (cd /proc; for pid in [1-9]*; do
	comm=$(cat "$pid/comm" 2>/dev/null);
	[ "${comm#napi/}" != "$comm" ] && chrt --pid 50 "$pid";
  done)
Since I was only running 35 RT application threads, the extra 5
napi threads make it exactly one RT thread for each cpu, and
AFAICT they all run on separate cpus.

With threaded napi (and RPS) I'm only seeing 250 (or so) busy
ethernet ring entries in the napi code - not the 1000+ I was
getting with the default __do_softirq() code.
That is similar to what I got by stopping the softint code from
falling back to its thread.
I'm still losing packets though (sometimes over 100/sec); I'm not
sure whether the hardware has run out of free ring buffers or
whether the switch is dropping some of them.

Apart from Python (which needs pinning to a single cpu) I'm not
sure how much effect pinning a normal-priority process to a
cpu has - unless you also pin more general processes away from
that cpu.
Running under the RT scheduler (provided you don't create too
many RT processes) sort of gives each one its own cpu while
allowing other processes to use the cpu when it is idle.
Straightforward pinning of processes doesn't do that.
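
For comparison, hard pinning is just an affinity mask (sketch):

  #define _GNU_SOURCE
  #include <sched.h>

  /* Pin the calling thread to a single cpu.  Unlike SCHED_RR this
   * does nothing to keep other processes off that cpu - for that
   * you also have to pin everything else away (cpusets, isolcpus). */
  static int pin_to_cpu(int cpu)
  {
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);	/* 0 = this thread */
  }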

	David

