* Softirq latencies causing lost ethernet packets
From: David Laight @ 2022-05-25  9:01 UTC (permalink / raw)
  To: netdev, Eric Dumazet, 'greearb@candelatech.com',
	'tglx@linutronix.de'
  Cc: 'tj@kernel.org', 'priikone@iki.fi',
	'peterz@infradead.org'

I've finally discovered why I'm getting a lot of lost ethernet
packets in one of my high packet rate tests (400k/sec short UDP).

The underlying problem is that the napi callbacks need to keep
looping in the softirq code.
For my test the cpu needs to be running at well over 50% 'softint'.
(And that is just for the ethernet receive; RPS moves the IP/UDP
processing to other cpus.)
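
(For reference, RPS steering is configured per rx queue through
sysfs; the device name and cpu mask below are illustrative:)

# spread IP/UDP processing of eth0's rx-0 queue over cpus 1-3 (mask 0xe)
echo e > /sys/class/net/eth0/queues/rx-0/rps_cpus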

The problems are caused by this bit of code in __do_softirq():

        pending = local_softirq_pending();
        if (pending) {
                if (time_before(jiffies, end) && !need_resched() &&
                    --max_restart)
                        goto restart;

                wakeup_softirqd();
        }
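
For reference, the two limits tested there are set at the top of
kernel/softirq.c and of __do_softirq() (values as in current kernels):

        #define MAX_SOFTIRQ_TIME  msecs_to_jiffies(2)
        #define MAX_SOFTIRQ_RESTART 10

        asmlinkage __visible void __softirq_entry __do_softirq(void)
        {
                unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
                ...
                int max_restart = MAX_SOFTIRQ_RESTART;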

Eric's c10d73671 changed it from:
        if (pending) {
                if (--max_restart)
                        goto restart;

                wakeup_softirqd();
        }

to
        if (pending) {
                if (time_before(jiffies, end) && !need_resched())
                        goto restart;

                wakeup_softirqd();
        }

That was because just letting the loop restart 10 times caused
excessive latencies.

The good work was then partly undone by 34376a50f, which added the
'max_restart' check back (with its limit of 10) to avoid an issue
with stop_machine getting stuck (jiffies doesn't increment).

This can (probably) be fixed by setting the limit to 1000.
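
Something like this (untested sketch; the exact value is a guess):

        #define MAX_SOFTIRQ_RESTART 1000        /* was 10 */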

However there is a separate issue with the need_resched() check.
In my tests this is stopping the softint/napi callbacks for
anything up to 9 milliseconds - more than enough to drop packets.

The problem here is that the softirqd threads are low priority
processes.
The application processes that receive the UDP all run under the
realtime scheduler (priority -51).
If the softint interrupts my RT process it is fine.
But the following sequence isn't:
 - softint runs in the context of the idle process.
 - an RT process is scheduled on the same cpu.
 - __do_softirq() detects need_resched() and calls wakeup_softirqd().
 - the scheduler switches from the idle process to my RT process.
 - the RT process runs for several milliseconds.
 - finally softirqd is scheduled.

The softint is usually higher priority than any RT thread
(because it just steals the context).
But in the more unusual case of an RT process being scheduled
while the softint is active, the softint work suddenly becomes
lower priority than the RT process.

I'm not sure what the intended purpose of the need_resched() check is.
I think it was Eric's first thought for a limit, but he had to
add the jiffies test as well to avoid RCU stalls.

The jiffies test itself might be problematic.
It is fixed at 2 jiffies - 1ms to 2ms at 1000Hz.
I'm expecting the softint code to be running at (maybe) 80% cpu.
So that limit would need increasing.
There is a similar limit in the napi code - but that one is
configurable (and, I think, just causes the softirq code to loop).
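
(That is the net_rx_action() budget; paraphrasing from
net/core/dev.c, both knobs are sysctls under net.core:)

        /* net_rx_action() bounds each softirq pass with a packet
         * budget (netdev_budget, default 300) and a time limit
         * (netdev_budget_usecs, default 2000us):
         */
        unsigned long time_limit = jiffies +
                usecs_to_jiffies(netdev_budget_usecs);
        int budget = netdev_budget;
        ...
        if (unlikely(budget <= 0 ||
                     time_after_eq(jiffies, time_limit))) {
                sd->time_squeeze++;
                break;
        }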

But if RCU stalls are the problem, maybe the rcu read lock ought to
disable softints?
Then any pending softint would run when the rcu lock is released.
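
Something like this hypothetical wrapper - NOT what mainline
rcu_read_lock() does (rcu_read_lock_bh() already exists for callers
that want BH exclusion):

        /* Hypothetical: an RCU read lock that also defers softirqs.
         * local_bh_enable() runs any softirqs raised while BHs were
         * disabled, so the softint runs at unlock time.
         */
        static inline void rcu_read_lock_defer_bh(void)
        {
                local_bh_disable();
                rcu_read_lock();
        }

        static inline void rcu_read_unlock_defer_bh(void)
        {
                rcu_read_unlock();
                local_bh_enable();      /* pending softirqs run here */
        }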

I did try setting the softirqd processes to a much higher priority
but that didn't seem to help - I didn't look into exactly why.
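
(One way to do that - a sketch, the cpu number and priority value
are illustrative:)

# give cpu 3's softirq daemon SCHED_FIFO priority 60
chrt -f -p 60 $(pgrep -x ksoftirqd/3)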

While I could use processor affinities to stop the application's
RT threads running on the softint-heavy cpu, that is awkward to
arrange.
In any case the application can make use of the non-softint time
on those cpus.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)



* Re: Softirq latencies causing lost ethernet packets
From: Paolo Abeni @ 2022-05-25 11:01 UTC (permalink / raw)
  To: David Laight, netdev, Eric Dumazet,
	'greearb@candelatech.com', 'tglx@linutronix.de'
  Cc: 'tj@kernel.org', 'priikone@iki.fi',
	'peterz@infradead.org'

On Wed, 2022-05-25 at 09:01 +0000, David Laight wrote:
> I've finally discovered why I'm getting a lot of lost ethernet
> packets in one of my high packet rate tests (400k/sec short UDP).
>
> [...]

Overall this looks like a scenario where the napi threaded model could
help?

echo 1 > /sys/class/net/<dev name>/threaded

and then set the napi threads' scheduling parameters as best fits
your workload.
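
With threaded napi each napi instance gets its own kernel thread
(named napi/<dev>-<id>), so it can be given e.g. an RT priority.
A sketch, device name and priority illustrative:

# give eth0's napi threads SCHED_FIFO priority 50
for p in $(pgrep napi/eth0); do chrt -f -p 50 $p; done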

Cheers,

Paolo



* Re: Softirq latencies causing lost ethernet packets
From: Eric Dumazet @ 2022-05-25 12:00 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: David Laight, netdev, greearb, tglx, tj, priikone, peterz

On Wed, May 25, 2022 at 4:01 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Wed, 2022-05-25 at 09:01 +0000, David Laight wrote:
> > I've finally discovered why I'm getting a lot of lost ethernet
> > packets in one of my high packet rate tests (400k/sec short UDP).
> >
> > [...]
>
> Overall this looks like a scenario where the napi threaded model could
> help?
>
> echo 1 > /sys/class/net/<dev name>/threaded
>
> and then set the napi threads' scheduling parameters as best fits
> your workload.

Also, make sure your user threads are not allowed to run on the cpu
servicing NIC interrupts.
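
(A sketch of that kind of split - the irq number, device and cpu set
are illustrative:)

# pin eth0's rx interrupt to cpu 0 ...
echo 1 > /proc/irq/30/smp_affinity
# ... and keep the application's threads on cpus 1-7
taskset -c 1-7 ./udp_app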

