All of lore.kernel.org
 help / color / mirror / Atom feed
* Question on tw_timer TIMER_PINNED
@ 2023-09-06 11:58 Juri Lelli
  2023-09-06 12:10 ` Eric Dumazet
  0 siblings, 1 reply; 4+ messages in thread
From: Juri Lelli @ 2023-09-06 11:58 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: LKML, Paolo Abeni, netdev, Valentin Schneider

Hi Eric,

I'm bothering you with a question about timewait_sock tw_timer, as I
believe you are one of the last persons touching it sometime ago. Please
feel free to redirect if I failed to git blame it correctly.

At my end, latency spikes (entering the kernel) have been reported when
running latency sensitive applications in the field (essentially a
polling userspace application that doesn't want any interruption at
all). I think I've been able to track down one of such interruptions to
the servicing of tw_timer_handler. This system isolates application CPUs
dynamically, so what I think it happens is that at some point tw_timer
is armed on a CPU, and it is PINNED to that CPU, meanwhile (before the
60s timeout) such CPU is 'isolated' and the latency sensitive app
started on it. After 60s the timer fires and interrupts the app
generating a spike.

I'm not very familiar with this part of the kernel and from staring
at code for a while I had mixed feeling about the need to keep tw_timer
as TIMER_PINNED. Could you please shed some light on it? Is it a strict
functional requirement or maybe a nice to have performance (locality I'd
guess) improvement? Could we in principle make it !PINNED (so that it
can be moved/queued away and prevent interruptions)?

Thanks a lot in advance!
Juri


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Question on tw_timer TIMER_PINNED
  2023-09-06 11:58 Question on tw_timer TIMER_PINNED Juri Lelli
@ 2023-09-06 12:10 ` Eric Dumazet
  2023-09-06 13:04   ` Juri Lelli
  2023-10-03 14:09   ` Valentin Schneider
  0 siblings, 2 replies; 4+ messages in thread
From: Eric Dumazet @ 2023-09-06 12:10 UTC (permalink / raw)
  To: Juri Lelli; +Cc: LKML, Paolo Abeni, netdev, Valentin Schneider

On Wed, Sep 6, 2023 at 1:58 PM Juri Lelli <juri.lelli@redhat.com> wrote:
>
> Hi Eric,
>
> I'm bothering you with a question about timewait_sock tw_timer, as I
> believe you are one of the last persons touching it sometime ago. Please
> feel free to redirect if I failed to git blame it correctly.
>
> At my end, latency spikes (entering the kernel) have been reported when
> running latency sensitive applications in the field (essentially a
> polling userspace application that doesn't want any interruption at
> all). I think I've been able to track down one of such interruptions to
> the servicing of tw_timer_handler. This system isolates application CPUs
> dynamically, so what I think it happens is that at some point tw_timer
> is armed on a CPU, and it is PINNED to that CPU, meanwhile (before the
> 60s timeout) such CPU is 'isolated' and the latency sensitive app
> started on it. After 60s the timer fires and interrupts the app
> generating a spike.
>
> I'm not very familiar with this part of the kernel and from staring
> at code for a while I had mixed feeling about the need to keep tw_timer
> as TIMER_PINNED. Could you please shed some light on it? Is it a strict
> functional requirement or maybe a nice to have performance (locality I'd
> guess) improvement? Could we in principle make it !PINNED (so that it
> can be moved/queued away and prevent interruptions)?
>

It is a functional requirement in current implementation.

cfac7f836a71 ("tcp/dccp: block bh before arming time_wait timer")
changelog has some details about it.

Can this be changed to non pinned ? Probably, but with some care.

You could simply disable tw completely, it is a best effort mechanism.


> Thanks a lot in advance!
> Juri
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Question on tw_timer TIMER_PINNED
  2023-09-06 12:10 ` Eric Dumazet
@ 2023-09-06 13:04   ` Juri Lelli
  2023-10-03 14:09   ` Valentin Schneider
  1 sibling, 0 replies; 4+ messages in thread
From: Juri Lelli @ 2023-09-06 13:04 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: LKML, Paolo Abeni, netdev, Valentin Schneider

On 06/09/23 14:10, Eric Dumazet wrote:
> On Wed, Sep 6, 2023 at 1:58 PM Juri Lelli <juri.lelli@redhat.com> wrote:
> >
> > Hi Eric,
> >
> > I'm bothering you with a question about timewait_sock tw_timer, as I
> > believe you are one of the last persons touching it sometime ago. Please
> > feel free to redirect if I failed to git blame it correctly.
> >
> > At my end, latency spikes (entering the kernel) have been reported when
> > running latency sensitive applications in the field (essentially a
> > polling userspace application that doesn't want any interruption at
> > all). I think I've been able to track down one of such interruptions to
> > the servicing of tw_timer_handler. This system isolates application CPUs
> > dynamically, so what I think it happens is that at some point tw_timer
> > is armed on a CPU, and it is PINNED to that CPU, meanwhile (before the
> > 60s timeout) such CPU is 'isolated' and the latency sensitive app
> > started on it. After 60s the timer fires and interrupts the app
> > generating a spike.
> >
> > I'm not very familiar with this part of the kernel and from staring
> > at code for a while I had mixed feeling about the need to keep tw_timer
> > as TIMER_PINNED. Could you please shed some light on it? Is it a strict
> > functional requirement or maybe a nice to have performance (locality I'd
> > guess) improvement? Could we in principle make it !PINNED (so that it
> > can be moved/queued away and prevent interruptions)?
> >
> 
> It is a functional requirement in current implementation.
> 
> cfac7f836a71 ("tcp/dccp: block bh before arming time_wait timer")
> changelog has some details about it.

Thanks for the reference.

> Can this be changed to non pinned ? Probably, but with some care.

I see. I will need to ponder about this.

> You could simply disable tw completely, it is a best effort mechanism.

But, first I think we are going to experiment with this route.

Thanks a lot for the super quick reply!

Best,
Juri


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Question on tw_timer TIMER_PINNED
  2023-09-06 12:10 ` Eric Dumazet
  2023-09-06 13:04   ` Juri Lelli
@ 2023-10-03 14:09   ` Valentin Schneider
  1 sibling, 0 replies; 4+ messages in thread
From: Valentin Schneider @ 2023-10-03 14:09 UTC (permalink / raw)
  To: Eric Dumazet, Juri Lelli; +Cc: LKML, Paolo Abeni, netdev

Hi,

On 06/09/23 14:10, Eric Dumazet wrote:
> On Wed, Sep 6, 2023 at 1:58 PM Juri Lelli <juri.lelli@redhat.com> wrote:
>>
>> Hi Eric,
>>
>> I'm bothering you with a question about timewait_sock tw_timer, as I
>> believe you are one of the last persons touching it sometime ago. Please
>> feel free to redirect if I failed to git blame it correctly.
>>
>> At my end, latency spikes (entering the kernel) have been reported when
>> running latency sensitive applications in the field (essentially a
>> polling userspace application that doesn't want any interruption at
>> all). I think I've been able to track down one of such interruptions to
>> the servicing of tw_timer_handler. This system isolates application CPUs
>> dynamically, so what I think it happens is that at some point tw_timer
>> is armed on a CPU, and it is PINNED to that CPU, meanwhile (before the
>> 60s timeout) such CPU is 'isolated' and the latency sensitive app
>> started on it. After 60s the timer fires and interrupts the app
>> generating a spike.
>>
>> I'm not very familiar with this part of the kernel and from staring
>> at code for a while I had mixed feeling about the need to keep tw_timer
>> as TIMER_PINNED. Could you please shed some light on it? Is it a strict
>> functional requirement or maybe a nice to have performance (locality I'd
>> guess) improvement? Could we in principle make it !PINNED (so that it
>> can be moved/queued away and prevent interruptions)?
>>
>
> It is a functional requirement in current implementation.
>
> cfac7f836a71 ("tcp/dccp: block bh before arming time_wait timer")
> changelog has some details about it.
>
> Can this be changed to non pinned ? Probably, but with some care.
>
> You could simply disable tw completely, it is a best effort mechanism.
>

So it's looking like doing that is not acceptable for our use-case, as
we still want timewait sockets for the traffic happening on the
housekepeing (non-isolated) CPUs.


I had a look at these commits to figure out what it would take to make it
not pinned:

  cfac7f836a71 ("tcp/dccp: block bh before arming time_wait timer")
  ed2e92394589 ("tcp/dccp: fix timewait races in timer handling")

and I'm struggling to understand why we want the timer to be armed before
inet_twsk_hashdance(). I found this discussion on LKML:

  https://lore.kernel.org/all/56941035.9040000@fastly.com/

And I can see that __inet_lookup_established() and tw_timer_handler()
both operate on __tw_common.skc_nulls_node and __tw_common.skc_refcnt, but:
- the timer has its own count in the refcount
- sk_nulls_for_each_rcu() is (on paper) safe to run concurrently with
  tw_timer_handler
  `\
    inet_twsk_kill()
    `\
      sk_nulls_del_node_init_rcu()

So I'm thinking we could let the timer be armed after the *hashdance(), so
it wouldn't need to be pinned anymore, but that's pretty much a revert of
  ed2e92394589 ("tcp/dccp: fix timewait races in timer handling")
which fixed a race.

Now this is the first time I poke my nose into this area and I can't
properly reason how said race is laid out. I'm sorry for asking about such
an old commit, but would you have any pointers on that?

Thanks


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-10-03 14:10 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-06 11:58 Question on tw_timer TIMER_PINNED Juri Lelli
2023-09-06 12:10 ` Eric Dumazet
2023-09-06 13:04   ` Juri Lelli
2023-10-03 14:09   ` Valentin Schneider

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.