netdev.vger.kernel.org archive mirror
* igc: missing HW timestamps at TX
@ 2022-07-17 14:42 Ferenc Fejes
  2022-07-17 14:46 ` Ferenc Fejes
  0 siblings, 1 reply; 21+ messages in thread
From: Ferenc Fejes @ 2022-07-17 14:42 UTC (permalink / raw)
  To: netdev; +Cc: Vinicius Costa Gomes, jesse.brandeburg, anthony.l.nguyen

Hello!

There is a short story of how I ended up here: [0]. We did a few measurements with isochron and noticed many missing timestamps. Running isochron on the same CPU core as the igc PTP worker makes the situation much better, yet we still have missing timestamps. Then, as Vladimir Oltean pointed out, the problem is actually documented: at high TX packet rates the HW timestamps might be skipped. There is also a counter for that, so this is expected behavior.


[0] https://github.com/vladimiroltean/isochron/issues/13
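
For context, requesting TX hardware timestamps from user space usually looks like the minimal sketch below (illustrative C only, error handling trimmed; the driver-side enable via the SIOCSHWTSTAMP ioctl is assumed to be done separately, as isochron does internally):

#include <linux/errqueue.h>
#include <linux/net_tstamp.h>
#include <sys/socket.h>
#include <time.h>

/* Ask the kernel for hardware TX timestamps on an existing socket.
 * Completed timestamps are delivered via the socket error queue.
 */
static int enable_tx_hw_timestamping(int fd)
{
	int flags = SOF_TIMESTAMPING_TX_HARDWARE |	/* stamp in the NIC */
		    SOF_TIMESTAMPING_RAW_HARDWARE |	/* report raw PHC time */
		    SOF_TIMESTAMPING_OPT_ID;		/* tag stamps with an ID */

	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
			  &flags, sizeof(flags));
}

/* Fetch one completed TX timestamp back from the error queue. */
static int read_tx_timestamp(int fd, struct timespec *ts)
{
	char data[256], ctrl[512];
	struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = ctrl, .msg_controllen = sizeof(ctrl),
	};
	struct cmsghdr *cm;

	if (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0)
		return -1;

	for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
		if (cm->cmsg_level == SOL_SOCKET &&
		    cm->cmsg_type == SCM_TIMESTAMPING) {
			struct scm_timestamping *tss =
				(struct scm_timestamping *)CMSG_DATA(cm);

			*ts = tss->ts[2];	/* ts[2] = hardware timestamp */
			return 0;
		}
	}

	return -1;	/* no timestamp cmsg in this message */
}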


* Re: igc: missing HW timestamps at TX
  2022-07-17 14:42 igc: missing HW timestamps at TX Ferenc Fejes
@ 2022-07-17 14:46 ` Ferenc Fejes
  2022-07-18 14:46   ` Vinicius Costa Gomes
  0 siblings, 1 reply; 21+ messages in thread
From: Ferenc Fejes @ 2022-07-17 14:46 UTC (permalink / raw)
  To: netdev; +Cc: Vinicius Costa Gomes, jesse.brandeburg, anthony.l.nguyen

(Ctrl+Enter'd by mistake)

My question here: is there anything I can quickly try to avoid that behavior? Even when I send only a few (like 10) packets but at a fast rate (5 us between packets), I get missing TX HW timestamps. The receive side looks much more robust; I haven't noticed missing HW timestamps there.

Thanks,
Ferenc


* Re: igc: missing HW timestamps at TX
  2022-07-17 14:46 ` Ferenc Fejes
@ 2022-07-18 14:46   ` Vinicius Costa Gomes
  2022-07-19  7:40     ` Ferenc Fejes
  0 siblings, 1 reply; 21+ messages in thread
From: Vinicius Costa Gomes @ 2022-07-18 14:46 UTC (permalink / raw)
  To: Ferenc Fejes, netdev; +Cc: jesse.brandeburg, anthony.l.nguyen

Hi Ferenc,

Ferenc Fejes <ferenc.fejes@ericsson.com> writes:

> (Ctrl+Enter'd by mistake)
>
> My question here: is there anything I can quickly try to avoid that
> behavior? Even when I send only a few (like 10) packets but at a fast
> rate (5 us between packets), I get missing TX HW timestamps. The
> receive side looks much more robust; I haven't noticed missing HW
> timestamps there.

There's a limitation in the i225/i226 in the number of "in flight" TX
timestamps they are able to handle. The hardware has 4 sets of registers
to handle timestamps.

There's an additional issue: the driver, as it is right now, only uses
one set of those registers.

I have an only-briefly-tested series that enables the driver to use the
full set of TX timestamp registers. Another reason it has not been
proposed yet is that I still have to benchmark it and see what the
performance impact is.

If you are feeling adventurous and feel like helping test it, here is
the link:

https://github.com/vcgomes/net-next/tree/igc-multiple-tstamp-timers-lock-new


Cheers,
-- 
Vinicius


* Re: igc: missing HW timestamps at TX
  2022-07-18 14:46   ` Vinicius Costa Gomes
@ 2022-07-19  7:40     ` Ferenc Fejes
  2022-08-11  8:54       ` Ferenc Fejes
  0 siblings, 1 reply; 21+ messages in thread
From: Ferenc Fejes @ 2022-07-19  7:40 UTC (permalink / raw)
  To: vinicius.gomes, netdev; +Cc: marton12050, peti.antal99

Hi Vinicius!

On Mon, 2022-07-18 at 11:46 -0300, Vinicius Costa Gomes wrote:
> Hi Ferenc,
> 
> Ferenc Fejes <ferenc.fejes@ericsson.com> writes:
> 
> > (Ctrl+Enter'd by mistake)
> > 
> > My question here: is there anything I can quickly try to avoid that
> > behavior? Even when I send only a few (like 10) packets but at a
> > fast rate (5 us between packets), I get missing TX HW timestamps.
> > The receive side looks much more robust; I haven't noticed missing
> > HW timestamps there.
> 
> There's a limitation in the i225/i226 in the number of "in flight" TX
> timestamps they are able to handle. The hardware has 4 sets of
> registers to handle timestamps.
> 
> There's an additional issue: the driver, as it is right now, only
> uses one set of those registers.
> 
> I have an only-briefly-tested series that enables the driver to use
> the full set of TX timestamp registers. Another reason it has not
> been proposed yet is that I still have to benchmark it and see what
> the performance impact is.

Thank you for the quick reply! I'm glad you already have this series
right off the bat. I'll be back when we're done with some quick
testing, hopefully sooner rather than later.
> 
> If you are feeling adventurous and feel like helping test it, here is
> the link:
> 
> https://github.com/vcgomes/net-next/tree/igc-multiple-tstamp-timers-lock-new
> 
> 
> Cheers,

Best,
Ferenc


* Re: igc: missing HW timestamps at TX
  2022-07-19  7:40     ` Ferenc Fejes
@ 2022-08-11  8:54       ` Ferenc Fejes
  2022-08-11 13:33         ` Vinicius Costa Gomes
  0 siblings, 1 reply; 21+ messages in thread
From: Ferenc Fejes @ 2022-08-11  8:54 UTC (permalink / raw)
  To: vinicius.gomes, netdev; +Cc: marton12050, peti.antal99

Hi Vinicius!

On Tue, 2022-07-19 at 09:40 +0200, Fejes Ferenc wrote:
> Hi Vinicius!
> 
> On Mon, 2022-07-18 at 11:46 -0300, Vinicius Costa Gomes wrote:
> > Hi Ferenc,
> > 
> > Ferenc Fejes <ferenc.fejes@ericsson.com> writes:
> > 
> > > (Ctrl+Enter'd by mistake)
> > > 
> > > My question here: is there anything I can quickly try to avoid
> > > that behavior? Even when I send only a few (like 10) packets but
> > > at a fast rate (5 us between packets), I get missing TX HW
> > > timestamps. The receive side looks much more robust; I haven't
> > > noticed missing HW timestamps there.
> > 
> > There's a limitation in the i225/i226 in the number of "in flight"
> > TX timestamps they are able to handle. The hardware has 4 sets of
> > registers to handle timestamps.
> > 
> > There's an additional issue: the driver, as it is right now, only
> > uses one set of those registers.
> > 
> > I have an only-briefly-tested series that enables the driver to
> > use the full set of TX timestamp registers. Another reason it has
> > not been proposed yet is that I still have to benchmark it and see
> > what the performance impact is.
> 
> Thank you for the quick reply! I'm glad you already have this series
> right off the bat. I'll be back when we're done with some quick
> testing, hopefully sooner rather than later.

Sorry for the late reply. I had time for a few tests with the patch.
In my tests it looks much better. I send a packet every 500 us with
isochron-send, with TX SW and HW timestamping enabled, and for 10000
packets I see zero lost timestamps. Even for 100000 packets only a few
dropped HW timestamps are visible.

With an iperf TCP test, line rate is achievable just like without the
patch.

> > 
> > If you are feeling adventurous and feel like helping test it, here
> > is
> > the link:
> > 
> > https://github.com/vcgomes/net-next/tree/igc-multiple-tstamp-timers-lock-new
> > 

Is there any test in particular you're interested in? My testbed is
still configured, so I can run some.

> > 
> > Cheers,
> 
> Best,
> Ferenc

Best,
Ferenc



* Re: igc: missing HW timestamps at TX
  2022-08-11  8:54       ` Ferenc Fejes
@ 2022-08-11 13:33         ` Vinicius Costa Gomes
  2022-08-12 14:13           ` Ferenc Fejes
  0 siblings, 1 reply; 21+ messages in thread
From: Vinicius Costa Gomes @ 2022-08-11 13:33 UTC (permalink / raw)
  To: Ferenc Fejes, netdev; +Cc: marton12050, peti.antal99

Hi Ferenc,

Ferenc Fejes <ferenc.fejes@ericsson.com> writes:

> Hi Vinicius!
>
> On Tue, 2022-07-19 at 09:40 +0200, Fejes Ferenc wrote:
>> Hi Vinicius!
>> 
>> On Mon, 2022-07-18 at 11:46 -0300, Vinicius Costa Gomes wrote:
>> > Hi Ferenc,
>> > 
>> > Ferenc Fejes <ferenc.fejes@ericsson.com> writes:
>> > 
>> > > (Ctrl+Enter'd by mistake)
>> > > 
>> > > My question here: is there anything I can quickly try to avoid
>> > > that behavior? Even when I send only a few (like 10) packets
>> > > but at a fast rate (5 us between packets), I get missing TX HW
>> > > timestamps. The receive side looks much more robust; I haven't
>> > > noticed missing HW timestamps there.
>> > 
>> > There's a limitation in the i225/i226 in the number of "in
>> > flight" TX timestamps they are able to handle. The hardware has 4
>> > sets of registers to handle timestamps.
>> > 
>> > There's an additional issue: the driver, as it is right now, only
>> > uses one set of those registers.
>> > 
>> > I have an only-briefly-tested series that enables the driver to
>> > use the full set of TX timestamp registers. Another reason it has
>> > not been proposed yet is that I still have to benchmark it and
>> > see what the performance impact is.
>> 
>> Thank you for the quick reply! I'm glad you already have this
>> series right off the bat. I'll be back when we're done with some
>> quick testing, hopefully sooner rather than later.
>
> Sorry for the late reply. I had time for a few tests with the patch.
> In my tests it looks much better. I send a packet every 500 us with
> isochron-send, with TX SW and HW timestamping enabled, and for 10000
> packets I see zero lost timestamps. Even for 100000 packets only a
> few dropped HW timestamps are visible.
>
> With an iperf TCP test, line rate is achievable just like without
> the patch.
>

That's very good to know.

>> > 
>> > If you are feeling adventurous and feel like helping test it, here
>> > is
>> > the link:
>> > 
>> > https://github.com/vcgomes/net-next/tree/igc-multiple-tstamp-timers-lock-new
>> > 
>
> Is there any test in particular you're interested in? My testbed is
> still configured, so I can run some.
>

The only thing I am worried about is whether, in the "dropped" HW
timestamps case, all the timestamp slots are indeed full, or whether
there's a bug and we missed a timestamp.

Can you verify that, for every dropped HW timestamp in your
application, 'tx_hwtstamp_skipped' (from 'ethtool -S') increases every
time the drop happens? Seeing whether 'tx_hwtstamp_timeouts' also
increases would be useful as well.

If for every drop there's one 'tx_hwtstamp_skipped' increment, then it
means that the driver is doing its best, and the workload is requesting
more timestamps than the system is able to handle.

If only 'tx_hwtstamp_timeouts' increases, then it's possible that there
is still a bug hiding.

>> > 
>> > Cheers,
>> 
>> Best,
>> Ferenc
>
> Best,
> Ferenc
>

Cheers,
-- 
Vinicius


* Re: igc: missing HW timestamps at TX
  2022-08-11 13:33         ` Vinicius Costa Gomes
@ 2022-08-12 14:13           ` Ferenc Fejes
  2022-08-12 20:16             ` Vladimir Oltean
  0 siblings, 1 reply; 21+ messages in thread
From: Ferenc Fejes @ 2022-08-12 14:13 UTC (permalink / raw)
  To: vinicius.gomes, netdev; +Cc: marton12050, peti.antal99, vladimir.oltean

Hi Vinicius!

On Thu, 2022-08-11 at 10:33 -0300, Vinicius Costa Gomes wrote:
> Hi Ferenc,
> 
> > 
> > With an iperf TCP test, line rate is achievable just like without
> > the patch.
> > 
> 
> That's very good to know.
> 
> > > > 
> > > > If you are feeling adventurous and feel like helping test it,
> > > > here
> > > > is
> > > > the link:
> > > > 
> > > > https://github.com/vcgomes/net-next/tree/igc-multiple-tstamp-timers-lock-new
> > > > 
> > 
> > Is there any test in particular you're interested in? My testbed
> > is still configured, so I can run some.
> > 
> 
> The only thing I am worried about is whether, in the "dropped" HW
> timestamps case, all the timestamp slots are indeed full, or whether
> there's a bug and we missed a timestamp.
> 
> Can you verify that, for every dropped HW timestamp in your
> application, 'tx_hwtstamp_skipped' (from 'ethtool -S') increases
> every time the drop happens? Seeing whether 'tx_hwtstamp_timeouts'
> also increases would be useful as well.

Yes, it's increasing. Let me illustrate it:

Ethtool before the measurement:
$ ethtool -S enp3s0 | grep hwtstamp
     tx_hwtstamp_timeouts: 1
     tx_hwtstamp_skipped: 409
     rx_hwtstamp_cleared: 0

Measurement:
$ sudo isochron send -i enp3s0 -s 64 -c 0.0000005 --client 10.0.0.20 \
  --num-frames 10000000 -F isochron.dat --sync-threshold 2000 \
  -M $((1 << 2)) --sched-fifo --sched-priority 99

(note: isochron would try to send a packet every 500 ns, but the rate
is actually limited by the sleep/syscall latency, so it's sending
packets about every 15-20 us)

Output:
isochron[1660315948.335677744]: local ptpmon -7 sysmon -25 receiver ptpmon 0 sysmon 4
Timed out waiting for TX timestamps, 10 timestamps unacknowledged
seqid 3441 missing timestamps: hw, 
seqid 3442 missing timestamps: hw, 
seqid 3443 missing timestamps: hw, 
seqid 3449 missing timestamps: hw, 
seqid 5530 missing timestamps: hw, 
seqid 5531 missing timestamps: hw, 
seqid 7597 missing timestamps: hw, 
seqid 7598 missing timestamps: hw, 
seqid 7599 missing timestamps: hw, 
seqid 7605 missing timestamps: hw, 


Ethtool after the measurement:
ethtool -S enp3s0 | grep hwtstamp
     tx_hwtstamp_timeouts: 1
     tx_hwtstamp_skipped: 419
     rx_hwtstamp_cleared: 0

Which is in line with what isochron sees.

But that only happens if I forcibly put the affinity of the sender on
a different CPU core than the igc PTP worker. If they run on the same
core, I don't lose any HW timestamps, even for 10 million packets.
Worth mentioning: I actually saw many lost timestamps, which confused
me a little at first, but those were lost because of the small
MSG_ERRQUEUE buffer. When I increased it from a few kbytes to 20
mbytes I got every timestamp successfully.
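
A minimal sketch of the buffer bump described above (illustrative C; the helper name is made up, and SO_RCVBUFFORCE needs CAP_NET_ADMIN - plain SO_RCVBUF is capped by net.core.rmem_max):

#include <sys/socket.h>

/* TX timestamps delivered via MSG_ERRQUEUE are charged against the
 * socket receive buffer, so a small SO_RCVBUF can silently drop them.
 * The 20 MB value mirrors the experiment above.
 */
static int grow_errqueue_buffer(int fd)
{
	int rcvbuf = 20 * 1024 * 1024;

	return setsockopt(fd, SOL_SOCKET, SO_RCVBUFFORCE,
			  &rcvbuf, sizeof(rcvbuf));
}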

> 
> If for every drop there's one 'tx_hwtstamp_skipped' increment, then
> it
> means that the driver is doing its best, and the workload is
> requesting
> more timestamps than the system is able to handle.
> 
> If only 'tx_hwtstamp_timeouts' increases then it's possible that
> there
> could be a bug hiding still.

On the other hand, I'm a little bit confused by the ETF behavior.
Without HW offload I lose almost every timestamp even at a low sending
rate (one packet every 500 us), and with HW offload I still lose a
lot. But that might be beyond igc, and rather some config issue in my
setup (I have to apply mqprio, do the PTP sync on the default
priority, and send the data packets with an SO_TXTIME cmsg to ETF at
prio 2). Does the TX queue affect the timestamping?
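
For reference, the SO_TXTIME path mentioned here typically looks like the minimal sketch below (illustrative C; assumes a UDP socket and an ETF qdisc installed on the matching TX queue, and the helper names are made up):

#include <linux/net_tstamp.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

/* Opt the socket into SO_TXTIME; the clock must match the ETF qdisc. */
static int enable_txtime(int fd)
{
	struct sock_txtime cfg = {
		.clockid = CLOCK_TAI,
		.flags = 0,
	};

	return setsockopt(fd, SOL_SOCKET, SO_TXTIME, &cfg, sizeof(cfg));
}

/* Attach a launch time (nanoseconds, CLOCK_TAI) to a single packet. */
static ssize_t send_at(int fd, void *buf, size_t len, __u64 txtime)
{
	char ctrl[CMSG_SPACE(sizeof(txtime))] = { 0 };
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = ctrl, .msg_controllen = sizeof(ctrl),
	};
	struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);

	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_TXTIME;
	cm->cmsg_len = CMSG_LEN(sizeof(txtime));
	memcpy(CMSG_DATA(cm), &txtime, sizeof(txtime));

	return sendmsg(fd, &msg, 0);
}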

CC Vladimir, the author of isochron.

> 
> > > > 
> > > > Cheers,
> > > 
> > > Best,
> > > Ferenc

Best,
Ferenc



* Re: igc: missing HW timestamps at TX
  2022-08-12 14:13           ` Ferenc Fejes
@ 2022-08-12 20:16             ` Vladimir Oltean
  2022-08-15  6:47               ` Ferenc Fejes
  2022-08-15 21:39               ` Vinicius Costa Gomes
  0 siblings, 2 replies; 21+ messages in thread
From: Vladimir Oltean @ 2022-08-12 20:16 UTC (permalink / raw)
  To: Ferenc Fejes; +Cc: vinicius.gomes, netdev, marton12050, peti.antal99

Hi Ferenc,

On Fri, Aug 12, 2022 at 02:13:52PM +0000, Ferenc Fejes wrote:
> Ethtool after the measurement:
> ethtool -S enp3s0 | grep hwtstamp
>      tx_hwtstamp_timeouts: 1
>      tx_hwtstamp_skipped: 419
>      rx_hwtstamp_cleared: 0
> 
> Which is in line with what isochron sees.
> 
> But that only happens if I forcibly put the affinity of the sender on
> a different CPU core than the igc PTP worker. If they run on the same
> core, I don't lose any HW timestamps, even for 10 million packets.
> Worth mentioning: I actually saw many lost timestamps, which confused
> me a little at first, but those were lost because of the small
> MSG_ERRQUEUE buffer. When I increased it from a few kbytes to 20
> mbytes I got every timestamp successfully.

I have zero knowledge of Intel hardware. That being said, I've looked at
the driver for about 5 minutes, and the design seems to be one where
the timestamp is not available in band from the TX completion NAPI as
part of the BD ring metadata; rather, a TX timestamp complete interrupt
is raised, and this results in igc_tsync_interrupt() being called.
However, there are 2 paths in the driver which call this: one is
igc_msix_other() and the other is igc_intr_msi() - the latter is also
the interrupt that triggers the napi_schedule(). It would be
interesting to see exactly which MSI-X interrupt is the one that
triggers igc_tsync_interrupt().

It's also interesting to understand what you mean precisely by the
affinity of isochron. It has a main thread (used for PTP monitoring and
for TX timestamps) and a pthread for the sending process. The main
thread's affinity is controlled via taskset; the sender thread's via
--cpu-mask. Is it the *sender* thread that makes the TX timestamps
available to user space more quickly, rather than the main thread,
which actually dequeues them from the error queue? If so, it might be
because the TX packets will trigger the TX completion interrupt, and
this will accelerate the processing of the TX timestamps. I'm unclear
what happens when the sender thread runs on a different CPU core than
the TX timestamp thread.

Your need to increase the SO_RCVBUF is also interesting. Keep in mind
that isochron at that scheduling priority and policy is a CPU hog, and
that igc_tsync_interrupt() calls schedule_work() - which uses the system
workqueue that runs at a very low priority (this begs the question: how
do you match the CPU on which isochron runs with the CPU of the system
workqueue?). So isochron, high priority, competes for CPU time with
igc_ptp_tx_work(), low priority. One produces data, one consumes it;
queues are bound to get full at some point.
On the other hand, other drivers use the PTP aux kworker that the PTP
core creates specifically for this purpose. It is a dedicated kthread
whose scheduling policy and priority can be adjusted using chrt. I think
it would be interesting to see how things behave when you replace
schedule_work() with ptp_schedule_worker().
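
For illustration, a rough sketch of what that conversion could look like in igc (hypothetical and untested; assumes adapter->ptp_clock is registered before the first tsync interrupt fires):

/* Hypothetical sketch: run the TX timestamp retrieval on the PTP core's
 * dedicated kthread instead of the low-priority system workqueue.
 */
static long igc_ptp_aux_work(struct ptp_clock_info *ptp)
{
	struct igc_adapter *adapter = container_of(ptp, struct igc_adapter,
						   ptp_caps);

	igc_ptp_tx_work(&adapter->ptp_tx_work);	/* existing retrieval logic */

	return -1;	/* one-shot: the ISR reschedules us when needed */
}

/* In igc_ptp_init():
 *	adapter->ptp_caps.do_aux_work = igc_ptp_aux_work;
 *
 * In igc_tsync_interrupt(), instead of:
 *	schedule_work(&adapter->ptp_tx_work);
 * do:
 *	ptp_schedule_worker(adapter->ptp_clock, 0);
 *
 * The kthread's policy/priority can then be tuned with chrt.
 */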


* Re: igc: missing HW timestamps at TX
  2022-08-12 20:16             ` Vladimir Oltean
@ 2022-08-15  6:47               ` Ferenc Fejes
  2022-08-15 22:04                 ` Vinicius Costa Gomes
  2022-08-16  9:33                 ` Vladimir Oltean
  2022-08-15 21:39               ` Vinicius Costa Gomes
  1 sibling, 2 replies; 21+ messages in thread
From: Ferenc Fejes @ 2022-08-15  6:47 UTC (permalink / raw)
  To: vladimir.oltean; +Cc: vinicius.gomes, marton12050, netdev, peti.antal99

Hi Vladimir!

Thank you for the reply!

On Fri, 2022-08-12 at 20:16 +0000, Vladimir Oltean wrote:
> Hi Ferenc,
> 
> On Fri, Aug 12, 2022 at 02:13:52PM +0000, Ferenc Fejes wrote:
> > Ethtool after the measurement:
> > ethtool -S enp3s0 | grep hwtstamp
> >      tx_hwtstamp_timeouts: 1
> >      tx_hwtstamp_skipped: 419
> >      rx_hwtstamp_cleared: 0
> > 
> > Which is in line with what isochron sees.
> > 
> > But that only happens if I forcibly put the affinity of the sender
> > on a different CPU core than the igc PTP worker. If they run on
> > the same core, I don't lose any HW timestamps, even for 10 million
> > packets. Worth mentioning: I actually saw many lost timestamps,
> > which confused me a little at first, but those were lost because
> > of the small MSG_ERRQUEUE buffer. When I increased it from a few
> > kbytes to 20 mbytes I got every timestamp successfully.
> 
> I have zero knowledge of Intel hardware. That being said, I've looked
> at the driver for about 5 minutes, and the design seems to be one
> where the timestamp is not available in band from the TX completion
> NAPI as part of the BD ring metadata; rather, a TX timestamp complete
> interrupt is raised, and this results in igc_tsync_interrupt() being
> called. However, there are 2 paths in the driver which call this: one
> is igc_msix_other() and the other is igc_intr_msi() - the latter is
> also the interrupt that triggers the napi_schedule(). It would be
> interesting to see exactly which MSI-X interrupt is the one that
> triggers igc_tsync_interrupt().
> 
> It's also interesting to understand what you mean precisely by the
> affinity of isochron. It has a main thread (used for PTP monitoring
> and for TX timestamps) and a pthread for the sending process. The
> main thread's affinity is controlled via taskset; the sender thread's
> via --cpu-mask.

I just played with those a little. It looks like --cpu-mask is the one
that helps in my case. For example, I checked the CPU core of
igc_ptp_tx_work:

# bpftrace -e 'kprobe:igc_ptp_tx_work { printf("%d\n", cpu); exit(); }'
Attaching 1 probe...
0

Looks like it's running on core 0, so I ran isochron:
taskset -c 0 isochron ... --cpu-mask $((1 << 0)) - no lost timestamps
taskset -c 1 isochron ... --cpu-mask $((1 << 0)) - no lost timestamps
taskset -c 0 isochron ... --cpu-mask $((1 << 1)) - losing timestamps
taskset -c 1 isochron ... --cpu-mask $((1 << 1)) - losing timestamps

> Is it the *sender* thread that makes the TX timestamps available to
> user space more quickly, rather than the main thread, which actually
> dequeues them from the error queue? If so, it might be because the
> TX packets will trigger the TX completion interrupt, and this will
> accelerate the processing of the TX timestamps. I'm unclear what
> happens when the sender thread runs on a different CPU core than the
> TX timestamp thread.

Well, I have no clue unfortunately, but your theory makes sense.
Vinicius might help us out here.

> 
> Your need to increase the SO_RCVBUF is also interesting. Keep in mind
> that isochron at that scheduling priority and policy is a CPU hog,
> and that igc_tsync_interrupt() calls schedule_work() - which uses the
> system workqueue that runs at a very low priority (this begs the
> question: how do you match the CPU on which isochron runs with the
> CPU of the system workqueue?). So isochron, high priority, competes
> for CPU time with igc_ptp_tx_work(), low priority. One produces data,
> one consumes it; queues are bound to get full at some point.

Maybe this is what helps in my case? With the funccount tracer I
checked that when the sender thread and igc_ptp_tx_work run on the
same core, the worker is called exactly as many times as the number of
packets I sent.

However, if the worker runs on a different core, funccount shows some
random number (less than the packets sent), and in that case I also
lose timestamps.

I'm not sure what is happening here; maybe the "deferred" scheduling
of the worker is sometimes too slow to enqueue every timestamp into
the error queue? And because I force both the sender and the worker
onto the same core, they execute in order (my system is pretty much
idle other than these processes), introducing some sort of throttling
to the timestamp processing?

> On the other hand, other drivers use the PTP aux kworker that the
> PTP core creates specifically for this purpose. It is a dedicated
> kthread whose scheduling policy and priority can be adjusted using
> chrt. I think it would be interesting to see how things behave when
> you replace schedule_work() with ptp_schedule_worker().

I will try to take a look at that. Anyway, thank you for the insights;
I'm happy with the way it works now (at least I can do my experiments
with it).

Best,
Ferenc



* Re: igc: missing HW timestamps at TX
  2022-08-12 20:16             ` Vladimir Oltean
  2022-08-15  6:47               ` Ferenc Fejes
@ 2022-08-15 21:39               ` Vinicius Costa Gomes
  2022-08-15 22:26                 ` Vladimir Oltean
  1 sibling, 1 reply; 21+ messages in thread
From: Vinicius Costa Gomes @ 2022-08-15 21:39 UTC (permalink / raw)
  To: Vladimir Oltean, Ferenc Fejes; +Cc: netdev, marton12050, peti.antal99

Hi Vladimir,

Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> Hi Ferenc,
>
> On Fri, Aug 12, 2022 at 02:13:52PM +0000, Ferenc Fejes wrote:
>> Ethtool after the measurement:
>> ethtool -S enp3s0 | grep hwtstamp
>>      tx_hwtstamp_timeouts: 1
>>      tx_hwtstamp_skipped: 419
>>      rx_hwtstamp_cleared: 0
>> 
>> Which is in line with what isochron sees.
>> 
>> But that only happens if I forcibly put the affinity of the sender
>> on a different CPU core than the igc PTP worker. If they run on the
>> same core, I don't lose any HW timestamps, even for 10 million
>> packets. Worth mentioning: I actually saw many lost timestamps,
>> which confused me a little at first, but those were lost because of
>> the small MSG_ERRQUEUE buffer. When I increased it from a few kbytes
>> to 20 mbytes I got every timestamp successfully.
>
> I have zero knowledge of Intel hardware. That being said, I've looked
> at the driver for about 5 minutes, and the design seems to be one
> where the timestamp is not available in band from the TX completion
> NAPI as part of the BD ring metadata; rather, a TX timestamp complete
> interrupt is raised, and this results in igc_tsync_interrupt() being
> called. However, there are 2 paths in the driver which call this: one
> is igc_msix_other() and the other is igc_intr_msi() - the latter is
> also the interrupt that triggers the napi_schedule(). It would be
> interesting to see exactly which MSI-X interrupt is the one that
> triggers igc_tsync_interrupt().

Just some additional information (note that I know very little about
interrupt internals): igc_intr_msi() is called when MSI-X is not
enabled (i.e. an "MSI only" system), and igc_msix_other() is called
when MSI-X is available. When MSI-X is available, the i225/i226 sets up
a separate interrupt handler for "general" events, the TX timestamp
being available to be read from the registers being one of those
events.

>
> It's also interesting to understand what you mean precisely by the
> affinity of isochron. It has a main thread (used for PTP monitoring
> and for TX timestamps) and a pthread for the sending process. The
> main thread's affinity is controlled via taskset; the sender thread's
> via --cpu-mask. Is it the *sender* thread that makes the TX
> timestamps available to user space more quickly, rather than the main
> thread, which actually dequeues them from the error queue? If so, it
> might be because the TX packets will trigger the TX completion
> interrupt, and this will accelerate the processing of the TX
> timestamps. I'm unclear what happens when the sender thread runs on a
> different CPU core than the TX timestamp thread.
>
> Your need to increase the SO_RCVBUF is also interesting. Keep in mind
> that isochron at that scheduling priority and policy is a CPU hog, and
> that igc_tsync_interrupt() calls schedule_work() - which uses the system
> workqueue that runs at a very low priority (this begs the question: how
> do you match the CPU on which isochron runs with the CPU of the system
> workqueue?). So isochron, high priority, competes for CPU time with
> igc_ptp_tx_work(), low priority. One produces data, one consumes it;
> queues are bound to get full at some point.
> On the other hand, other drivers use the PTP aux kworker that the PTP
> core creates specifically for this purpose. It is a dedicated kthread
> whose scheduling policy and priority can be adjusted using chrt. I think
> it would be interesting to see how things behave when you replace
> schedule_work() with ptp_schedule_worker().

I was planning to do the conversion to use the PTP aux worker thread at
some point; perhaps this is the "excuse" I was looking for.


Cheers,
-- 
Vinicius


* Re: igc: missing HW timestamps at TX
  2022-08-15  6:47               ` Ferenc Fejes
@ 2022-08-15 22:04                 ` Vinicius Costa Gomes
  2022-08-16  9:33                 ` Vladimir Oltean
  1 sibling, 0 replies; 21+ messages in thread
From: Vinicius Costa Gomes @ 2022-08-15 22:04 UTC (permalink / raw)
  To: Ferenc Fejes, vladimir.oltean; +Cc: marton12050, netdev, peti.antal99

Hi Ferenc,

Ferenc Fejes <ferenc.fejes@ericsson.com> writes:

> Hi Vladimir!
>
> Thank you for the reply!
>
> On Fri, 2022-08-12 at 20:16 +0000, Vladimir Oltean wrote:
>> Hi Ferenc,
>> 
>> On Fri, Aug 12, 2022 at 02:13:52PM +0000, Ferenc Fejes wrote:
>> > Ethtool after the measurement:
>> > ethtool -S enp3s0 | grep hwtstamp
>> >      tx_hwtstamp_timeouts: 1
>> >      tx_hwtstamp_skipped: 419
>> >      rx_hwtstamp_cleared: 0
>> > 
>> > Which is in line with what isochron sees.
>> > 
>> > But that only happens if I forcibly put the affinity of the
>> > sender on a different CPU core than the igc PTP worker. If they
>> > run on the same core, I don't lose any HW timestamps, even for 10
>> > million packets. Worth mentioning: I actually saw many lost
>> > timestamps, which confused me a little at first, but those were
>> > lost because of the small MSG_ERRQUEUE buffer. When I increased
>> > it from a few kbytes to 20 mbytes I got every timestamp
>> > successfully.
>> 
>> I have zero knowledge of Intel hardware. That being said, I've
>> looked at the driver for about 5 minutes, and the design seems to be
>> one where the timestamp is not available in band from the TX
>> completion NAPI as part of the BD ring metadata; rather, a TX
>> timestamp complete interrupt is raised, and this results in
>> igc_tsync_interrupt() being called. However, there are 2 paths in
>> the driver which call this: one is igc_msix_other() and the other is
>> igc_intr_msi() - the latter is also the interrupt that triggers the
>> napi_schedule(). It would be interesting to see exactly which MSI-X
>> interrupt is the one that triggers igc_tsync_interrupt().
>> 
>> It's also interesting to understand what you mean precisely by the
>> affinity of isochron. It has a main thread (used for PTP monitoring
>> and for TX timestamps) and a pthread for the sending process. The
>> main thread's affinity is controlled via taskset; the sender
>> thread's via --cpu-mask.
>
> I just played with those a little. It looks like --cpu-mask is the
> one that helps in my case. For example, I checked the CPU core of
> igc_ptp_tx_work:
>
> # bpftrace -e 'kprobe:igc_ptp_tx_work { printf("%d\n", cpu); exit(); }'
> Attaching 1 probe...
> 0
>
> Looks like it's running on core 0, so I ran isochron:
> taskset -c 0 isochron ... --cpu-mask $((1 << 0)) - no lost timestamps
> taskset -c 1 isochron ... --cpu-mask $((1 << 0)) - no lost timestamps
> taskset -c 0 isochron ... --cpu-mask $((1 << 1)) - losing timestamps
> taskset -c 1 isochron ... --cpu-mask $((1 << 1)) - losing timestamps
>
>> Is it the *sender* thread that makes the TX timestamps available to
>> user space more quickly, rather than the main thread, which actually
>> dequeues them from the error queue? If so, it might be because the
>> TX packets will trigger the TX completion interrupt, and this will
>> accelerate the processing of the TX timestamps. I'm unclear what
>> happens when the sender thread runs on a different CPU core than the
>> TX timestamp thread.
>
> Well, I have no clue unfortunately, but your theory makes sense.
> Vinicius might help us out here.
>

I think it's more related to the synchronization (and communication)
overhead of having two CPUs involved. And the fact that the workqueue
runs with low priority (as Vladimir pointed out below) on a different
CPU doesn't help.

>> 
>> Your need to increase the SO_RCVBUF is also interesting. Keep in
>> mind that isochron at that scheduling priority and policy is a CPU
>> hog, and that igc_tsync_interrupt() calls schedule_work() - which
>> uses the system workqueue that runs at a very low priority (this
>> begs the question: how do you match the CPU on which isochron runs
>> with the CPU of the system workqueue?). So isochron, high priority,
>> competes for CPU time with igc_ptp_tx_work(), low priority. One
>> produces data, one consumes it; queues are bound to get full at some
>> point.
>
> Maybe this is what helps in my case? With the funccount tracer I
> checked that when the sender thread and igc_ptp_tx_work run on the
> same core, the worker is called exactly as many times as the number
> of packets I sent.
>
> However, if the worker runs on a different core, funccount shows some
> random number (less than the packets sent), and in that case I also
> lose timestamps.
>
> I'm not sure what is happening here; maybe the "deferred" scheduling
> of the worker is sometimes too slow to enqueue every timestamp into
> the error queue? And because I force both the sender and the worker
> onto the same core, they execute in order (my system is pretty much
> idle other than these processes), introducing some sort of throttling
> to the timestamp processing?
>
>> On the other hand, other drivers use the PTP aux kworker that the
>> PTP core creates specifically for this purpose. It is a dedicated
>> kthread whose scheduling policy and priority can be adjusted using
>> chrt. I think it would be interesting to see how things behave when
>> you replace schedule_work() with ptp_schedule_worker().
>
> I will try to take a look at that. Anyway, thank you for the
> insights; I'm happy with the way it works now (at least I can do my
> experiments with it).
>
> Best,
> Ferenc
>

-- 
Vinicius


* Re: igc: missing HW timestamps at TX
  2022-08-15 21:39               ` Vinicius Costa Gomes
@ 2022-08-15 22:26                 ` Vladimir Oltean
  2022-08-15 23:07                   ` Vinicius Costa Gomes
  0 siblings, 1 reply; 21+ messages in thread
From: Vladimir Oltean @ 2022-08-15 22:26 UTC (permalink / raw)
  To: Vinicius Costa Gomes; +Cc: Ferenc Fejes, netdev, marton12050, peti.antal99

Hi Vinicius,

On Mon, Aug 15, 2022 at 02:39:33PM -0700, Vinicius Costa Gomes wrote:
> Just some additional information (note that I know very little about
> interrupt internals): igc_intr_msi() is called when MSI-X is not
> enabled (i.e. an "MSI only" system), and igc_msix_other() is called
> when MSI-X is available. When MSI-X is available, the i225/i226 sets
> up a separate interrupt handler for "general" events, the TX
> timestamp being available to be read from the registers being one of
> those events.

Thanks for the extra information.

Why is the i225/i226 emitting an interrupt about the availability of a
new TX timestamp, if the igc driver polls for its availability anyway?
In other words, when IGC_TSICR_TXTS is found set, is a TX timestamp
available or is it not? Why does the driver schedule a deferred work
item to retrieve it?


* Re: igc: missing HW timestamps at TX
  2022-08-15 22:26                 ` Vladimir Oltean
@ 2022-08-15 23:07                   ` Vinicius Costa Gomes
  2022-08-16  8:51                     ` Kurt Kanzenbach
  2022-08-16  9:10                     ` Vladimir Oltean
  0 siblings, 2 replies; 21+ messages in thread
From: Vinicius Costa Gomes @ 2022-08-15 23:07 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: Ferenc Fejes, netdev, marton12050, peti.antal99

Hi Vladimir,

Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> Hi Vinicius,
>
> On Mon, Aug 15, 2022 at 02:39:33PM -0700, Vinicius Costa Gomes wrote:
>> Just some additional information (note that I know very little about
>> interrupt internals): igc_intr_msi() is called when MSI-X is not
>> enabled (i.e. an "MSI only" system), and igc_msix_other() is called
>> when MSI-X is available. When MSI-X is available, the i225/i226 sets
>> up a separate interrupt handler for "general" events, the TX
>> timestamp being available to be read from the registers being one of
>> those events.
>
> Thanks for the extra information.
>
> Why is the i225/i226 emitting an interrupt about the availability of a
> new TX timestamp, if the igc driver polls for its availability anyway?
> In other words, when IGC_TSICR_TXTS is found set, is a TX timestamp
> available or is it not? Why does the driver schedule a deferred work
> item to retrieve it?

The interrupt that is generated is a general/misc interrupt; we have to
check the interrupt cause bit to see that it's something TimeSync
related, and only then check that it's indeed a TX timestamp that is
ready. And then there's another register with some bits saying which
one of the 4 timestamp registers is ready. There are a few levels of
indirection, but no polling.

I think your question is more "why is there that workqueue in igc?" /
"why don't you retrieve the TX timestamp 'inline' with the
interrupt?". If I got that right, then I don't have a good reason,
apart from the feeling that reading all those (5-6?) registers may
take too long for an interrupt handler. And it's something that's done
the same way in most (all?) Intel drivers.

I have a TODO to experiment with removing the workqueue, and retrieving
the TX timestamp in the same context as the interrupt handler, but other
things always come up.


Cheers,
-- 
Vinicius


* Re: igc: missing HW timestamps at TX
  2022-08-15 23:07                   ` Vinicius Costa Gomes
@ 2022-08-16  8:51                     ` Kurt Kanzenbach
  2022-08-16 20:45                       ` Vinicius Costa Gomes
  2022-08-16  9:10                     ` Vladimir Oltean
  1 sibling, 1 reply; 21+ messages in thread
From: Kurt Kanzenbach @ 2022-08-16  8:51 UTC (permalink / raw)
  To: Vinicius Costa Gomes, Vladimir Oltean
  Cc: Ferenc Fejes, netdev, marton12050, peti.antal99


Hi Vinicius,

On Mon Aug 15 2022, Vinicius Costa Gomes wrote:
> I think your question is more "why is there that workqueue in igc?" /
> "why don't you retrieve the TX timestamp 'inline' with the
> interrupt?". If I got that right, then I don't have a good reason,
> apart from the feeling that reading all those (5-6?) registers may
> take too long for an interrupt handler. And it's something that's
> done the same way in most (all?) Intel drivers.

We do have one optimization for igb which attempts to read the Tx
timestamp directly from the ISR. If that's not ready, *only* then do we
schedule the worker. I assume igb and igc have the same logic for
retrieving the timestamps here.

The problem with workqueues is that under heavy system load, it might be
deferred and timestamps will be lost. I guess that workqueue was added
because of something like this: 1f6e8178d685 ("igb: Prevent dropped Tx
timestamps via work items and interrupts.").

>
> I have a TODO to experiment with removing the workqueue, and retrieving
> the TX timestamp in the same context as the interrupt handler, but other
> things always come up.

Let me know if you have interest in that igb patch.

Thanks,
Kurt



* Re: igc: missing HW timestamps at TX
  2022-08-15 23:07                   ` Vinicius Costa Gomes
  2022-08-16  8:51                     ` Kurt Kanzenbach
@ 2022-08-16  9:10                     ` Vladimir Oltean
  2022-08-16 18:11                       ` Vinicius Costa Gomes
  1 sibling, 1 reply; 21+ messages in thread
From: Vladimir Oltean @ 2022-08-16  9:10 UTC (permalink / raw)
  To: Vinicius Costa Gomes; +Cc: Ferenc Fejes, netdev, marton12050, peti.antal99

Hi Vinicius,

On Mon, Aug 15, 2022 at 04:07:13PM -0700, Vinicius Costa Gomes wrote:
>> The interrupt that is generated is a general/misc interrupt; we have
>> to check the interrupt cause bit to see that it's something TimeSync
>> related, and only then check that it's indeed a TX timestamp that is
>> ready. And then there's another register with some bits saying which
>> one of the 4 timestamp registers is ready. There are a few levels of
>> indirection, but no polling.

I used the word "poll" after being inspired by the code comments:

/**
 * igc_ptp_tx_work
 * @work: pointer to work struct
 *
 * This work function polls the TSYNCTXCTL valid bit to determine when a
 * timestamp has been taken for the current stored skb.
 */

>> I think your question is more "why is there that workqueue in igc?"
>> / "why don't you retrieve the TX timestamp 'inline' with the
>> interrupt?". If I got that right, then I don't have a good reason,
>> apart from the feeling that reading all those (5-6?) registers may
>> take too long for an interrupt handler. And it's something that's
>> done the same way in most (all?) Intel drivers.

Ok, so basically it is an attempt at making the interrupt handler
threaded, so that it doesn't run in hardirq context?

Note that this decision makes the igc limitation of "single timestampable
skb in flight" even much worse than it needs to be, because it prolongs
the "in flight" period until the system_wq actually gets to run the work
item we create.

> I have a TODO to experiment with removing the workqueue, and retrieving
> the TX timestamp in the same context as the interrupt handler, but other
> things always come up.
> 
> 
> Cheers,
> -- 
> Vinicius


* Re: igc: missing HW timestamps at TX
  2022-08-15  6:47               ` Ferenc Fejes
  2022-08-15 22:04                 ` Vinicius Costa Gomes
@ 2022-08-16  9:33                 ` Vladimir Oltean
  2022-08-17  7:44                   ` Ferenc Fejes
  1 sibling, 1 reply; 21+ messages in thread
From: Vladimir Oltean @ 2022-08-16  9:33 UTC (permalink / raw)
  To: Ferenc Fejes; +Cc: vinicius.gomes, marton12050, netdev, peti.antal99

Hi Ferenc,

On Mon, Aug 15, 2022 at 06:47:31AM +0000, Ferenc Fejes wrote:
> I just played with those a little. It looks like --cpu-mask is the
> one that helps in my case. For example, I checked the CPU core of
> igc_ptp_tx_work:
> 
> # bpftrace -e 'kprobe:igc_ptp_tx_work { printf("%d\n", cpu); exit(); }'
> Attaching 1 probe...
> 0

I think this print is slightly irrelevant in the grand scheme, or at
least not very stable. Because schedule_work() is implemented as
"queue_work(system_wq, work)", and queue_work() is implemented as
"queue_work_on(WORK_CPU_UNBOUND, wq, work)", the work item associated
with igc_ptp_tx_work() is not bound to any requested CPU. So unless
the prints are taken from the actual test rather than just done once
before it, the percpu kthread worker that executes it from within the
pool might vary. In turn, __queue_work() selects the CPU based on the
raw_smp_processor_id() of the caller (in this case, the IRQ handler).
So it will depend on the tsync interrupt affinity, basically.
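
For reference, the chain described above reads roughly like this in
include/linux/workqueue.h:

static inline bool queue_work(struct workqueue_struct *wq,
			      struct work_struct *work)
{
	return queue_work_on(WORK_CPU_UNBOUND, wq, work);
}

static inline bool schedule_work(struct work_struct *work)
{
	return queue_work(system_wq, work);
}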

> 
> Looks like it's running on core 0, so I ran isochron:
> taskset -c 0 isochron ... --cpu-mask $((1 << 0)) - no lost timestamps
> taskset -c 1 isochron ... --cpu-mask $((1 << 0)) - no lost timestamps
> taskset -c 0 isochron ... --cpu-mask $((1 << 1)) - losing timestamps
> taskset -c 1 isochron ... --cpu-mask $((1 << 1)) - losing timestamps
(...)
> Maybe this is what helps in my case? With the funccount tracer I
> checked that when the sender thread and igc_ptp_tx_work run on the
> same core, the worker is called exactly as many times as the number
> of packets I sent.
> 
> However, if the worker runs on a different core, funccount shows some
> random number (less than the packets sent), and in that case I also
> lose timestamps.

Thanks.

Note that if igc_ptp_tx_work runs well on the same CPU (0) as the
isochron sender thread, but *not* that well on the other CPU,
I think a simple explanation (for now) might have to do with dynamic
frequency scaling of the CPUs (CONFIG_CPU_FREQ). If the CPU is kept busy
by the sender thread, the governor will increase the CPU frequency and
the tsync interrupt will be processed quicker, and this will unclog the
"single skb in flight" limitation quicker. If the CPU is mostly idle and
woken up only from time to time by a tsync interrupt, then the "single
skb in flight" limitation will kick in more often, and the isochron
thread will have its TX timestamp requests silently dropped in that
meantime until the idle CPU ramps up to execute its scheduled work item.

To prove my point you can try to compile a kernel with CONFIG_CPU_FREQ=n.
Makes sense?


* Re: igc: missing HW timestamps at TX
  2022-08-16  9:10                     ` Vladimir Oltean
@ 2022-08-16 18:11                       ` Vinicius Costa Gomes
  0 siblings, 0 replies; 21+ messages in thread
From: Vinicius Costa Gomes @ 2022-08-16 18:11 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: Ferenc Fejes, netdev, marton12050, peti.antal99

Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> Hi Vinicius,
>
> On Mon, Aug 15, 2022 at 04:07:13PM -0700, Vinicius Costa Gomes wrote:
>> The interrupt that is generated is a general/misc interrupt; we have
>> to check the interrupt cause bit to see that it's something TimeSync
>> related, and only then check that it's indeed a TX timestamp that is
>> ready. And then there's another register with some bits saying which
>> one of the 4 timestamp registers is ready. There are a few levels of
>> indirection, but no polling.
>
> I used the word "poll" after being inspired by the code comments:
>
> /**
>  * igc_ptp_tx_work
>  * @work: pointer to work struct
>  *
>  * This work function polls the TSYNCTXCTL valid bit to determine when a
>  * timestamp has been taken for the current stored skb.
>  */
>

Ah, I see. That comment is misleading/wrong; it was "inherited" from
igb, which has a model that didn't have the TimeSync interrupt cause,
IIRC. Going to fix that.

>> I think your question is more "why is there that workqueue in igc?"
>> / "why don't you retrieve the TX timestamp 'inline' with the
>> interrupt?". If I got that right, then I don't have a good reason,
>> apart from the feeling that reading all those (5-6?) registers may
>> take too long for an interrupt handler. And it's something that's
>> done the same way in most (all?) Intel drivers.
>
> Ok, so basically it is an attempt of making the interrupt handler threaded,
> which doesn't run in hardirq context?
>

Something like that, I guess. Mixed with "if it ain't broken, don't fix
it", but things are breaking with higher link speeds, different use
cases, more users/interest.

> Note that this decision makes the igc limitation of "single timestampable
> skb in flight" even much worse than it needs to be, because it prolongs
> the "in flight" period until the system_wq actually gets to run the work
> item we create.
>

Yes, no disagreement from my side.

>> I have a TODO to experiment with removing the workqueue, and retrieving
>> the TX timestamp in the same context as the interrupt handler, but other
>> things always come up.
>> 
>> 
>> Cheers,
>> -- 
>> Vinicius


Cheers,
-- 
Vinicius


* Re: igc: missing HW timestamps at TX
  2022-08-16  8:51                     ` Kurt Kanzenbach
@ 2022-08-16 20:45                       ` Vinicius Costa Gomes
  2022-08-17  6:10                         ` Kurt Kanzenbach
  0 siblings, 1 reply; 21+ messages in thread
From: Vinicius Costa Gomes @ 2022-08-16 20:45 UTC (permalink / raw)
  To: Kurt Kanzenbach, Vladimir Oltean
  Cc: Ferenc Fejes, netdev, marton12050, peti.antal99

Kurt Kanzenbach <kurt@linutronix.de> writes:

> Hi Vinicius,
>
> On Mon Aug 15 2022, Vinicius Costa Gomes wrote:
>> I think your question is more "why is there that workqueue in igc?"
>> / "why don't you retrieve the TX timestamp 'inline' with the
>> interrupt?". If I got that right, then I don't have a good reason,
>> apart from the feeling that reading all those (5-6?) registers may
>> take too long for an interrupt handler. And it's something that's
>> done the same way in most (all?) Intel drivers.
>
> We do have one optimization for igb which attempts to read the Tx
> timestamp directly from the ISR. If that's not ready, *only* then do
> we schedule the worker. I assume igb and igc have the same logic for
> retrieving the timestamps here.
>

That seems a sensible approach. And yes, the timestamping logic is the
same.

> The problem with workqueues is that under heavy system load, it might be
> deferred and timestamps will be lost. I guess that workqueue was added
> because of something like this: 1f6e8178d685 ("igb: Prevent dropped Tx
> timestamps via work items and interrupts.").
>
>>
>> I have a TODO to experiment with removing the workqueue, and retrieving
>> the TX timestamp in the same context as the interrupt handler, but other
>> things always come up.
>
> Let me know if you have interest in that igb patch.

That would be great! Thanks.

>
> Thanks,
> Kurt


Cheers,
-- 
Vinicius


* Re: igc: missing HW timestamps at TX
  2022-08-16 20:45                       ` Vinicius Costa Gomes
@ 2022-08-17  6:10                         ` Kurt Kanzenbach
  2022-08-17 19:24                           ` Vinicius Costa Gomes
  0 siblings, 1 reply; 21+ messages in thread
From: Kurt Kanzenbach @ 2022-08-17  6:10 UTC (permalink / raw)
  To: Vinicius Costa Gomes, Vladimir Oltean
  Cc: Ferenc Fejes, netdev, marton12050, peti.antal99,
	Sebastian Andrzej Siewior


On Tue Aug 16 2022, Vinicius Costa Gomes wrote:
> Kurt Kanzenbach <kurt@linutronix.de> writes:
>
>> Hi Vinicius,
>>
>> On Mon Aug 15 2022, Vinicius Costa Gomes wrote:
>>> I think your question is more "why is there that workqueue in
>>> igc?" / "why don't you retrieve the TX timestamp 'inline' with the
>>> interrupt?". If I got that right, then I don't have a good reason,
>>> apart from the feeling that reading all those (5-6?) registers may
>>> take too long for an interrupt handler. And it's something that's
>>> done the same way in most (all?) Intel drivers.
>>
>> We do have one optimization for igb which attempts to read the Tx
>> timestamp directly from the ISR. If that's not ready, *only* then do
>> we schedule the worker. I assume igb and igc have the same logic for
>> retrieving the timestamps here.
>>
>
> That seems a sensible approach. And yes, the timestamping logic is the
> same.
>
>> The problem with workqueues is that under heavy system load, it might be
>> deferred and timestamps will be lost. I guess that workqueue was added
>> because of something like this: 1f6e8178d685 ("igb: Prevent dropped Tx
>> timestamps via work items and interrupts.").
>>
>>>
>>> I have a TODO to experiment with removing the workqueue, and retrieving
>>> the TX timestamp in the same context as the interrupt handler, but other
>>> things always come up.
>>
>> Let me know if you have interest in that igb patch.
>
> That would be great! Thanks.

Sure. See igb patch below.

I'm also wondering whether that delayed work should be replaced
completely by the PTP aux worker, because that one can be prioritized
according to the use case. And I see Vladimir already suggested this.

From c546621323ba325eccfcf6a9891681929d4b9b4f Mon Sep 17 00:00:00 2001
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Tue, 8 Jun 2021 15:38:43 +0200
Subject: [PATCH] igb: Try to read the timestamp from the ISR.

Commit
   1f6e8178d6851 ("igb: Prevent dropped Tx timestamps via work items and interrupts.")

moved the read of the timestamp into a workqueue in order to retry the
read process in case the timestamp is not immediately available after
the interrupt. The workqueue is scheduled immediately after sending an
skb on the 82576 and, for all other card types, once the timestamp
interrupt occurs. Once scheduled, the workqueue will busy poll for the
timestamp until the operation times out.

By deferring the read of the timestamp into a workqueue, the driver can
drop the timestamp of a following packet if the read does not occur
before the following packet is sent. This can happen if the system is
busy and the workqueue is delayed so that it is scheduled after another
skb.

Attempt the read of the timestamp directly in the ISR and schedule a
workqueue in case the timestamp is not yet ready. Guard the read process
with __IGB_PTP_TX_READ_IN_PROGRESS to ensure that only one process
attempts to read the timestamp.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/ethernet/intel/igb/igb.h      | 2 ++
 drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
 drivers/net/ethernet/intel/igb/igb_ptp.c  | 7 +++++--
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 2d3daf022651..4c49550768ee 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -705,6 +705,7 @@ enum e1000_state_t {
 	__IGB_RESETTING,
 	__IGB_DOWN,
 	__IGB_PTP_TX_IN_PROGRESS,
+	__IGB_PTP_TX_READ_IN_PROGRESS,
 };
 
 enum igb_boards {
@@ -750,6 +751,7 @@ void igb_ptp_tx_hang(struct igb_adapter *adapter);
 void igb_ptp_rx_rgtstamp(struct igb_q_vector *q_vector, struct sk_buff *skb);
 int igb_ptp_rx_pktstamp(struct igb_q_vector *q_vector, void *va,
 			ktime_t *timestamp);
+void igb_ptp_tx_work(struct work_struct *work);
 int igb_ptp_set_ts_config(struct net_device *netdev, struct ifreq *ifr);
 int igb_ptp_get_ts_config(struct net_device *netdev, struct ifreq *ifr);
 void igb_set_flag_queue_pairs(struct igb_adapter *, const u32);
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index b88303351484..b74b305b5730 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6756,7 +6756,7 @@ static void igb_tsync_interrupt(struct igb_adapter *adapter)
 
 	if (tsicr & E1000_TSICR_TXTS) {
 		/* retrieve hardware timestamp */
-		schedule_work(&adapter->ptp_tx_work);
+		igb_ptp_tx_work(&adapter->ptp_tx_work);
 		ack |= E1000_TSICR_TXTS;
 	}
 
diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c b/drivers/net/ethernet/intel/igb/igb_ptp.c
index 0011b15e678c..97b444f84b83 100644
--- a/drivers/net/ethernet/intel/igb/igb_ptp.c
+++ b/drivers/net/ethernet/intel/igb/igb_ptp.c
@@ -680,7 +680,7 @@ static int igb_ptp_verify_pin(struct ptp_clock_info *ptp, unsigned int pin,
  * This work function polls the TSYNCTXCTL valid bit to determine when a
  * timestamp has been taken for the current stored skb.
  **/
-static void igb_ptp_tx_work(struct work_struct *work)
+void igb_ptp_tx_work(struct work_struct *work)
 {
 	struct igb_adapter *adapter = container_of(work, struct igb_adapter,
 						   ptp_tx_work);
@@ -705,7 +705,9 @@ static void igb_ptp_tx_work(struct work_struct *work)
 	}
 
 	tsynctxctl = rd32(E1000_TSYNCTXCTL);
-	if (tsynctxctl & E1000_TSYNCTXCTL_VALID)
+	if (tsynctxctl & E1000_TSYNCTXCTL_VALID &&
+	    !test_and_set_bit_lock(__IGB_PTP_TX_READ_IN_PROGRESS,
+				   &adapter->state))
 		igb_ptp_tx_hwtstamp(adapter);
 	else
 		/* reschedule to check later */
@@ -850,6 +852,7 @@ static void igb_ptp_tx_hwtstamp(struct igb_adapter *adapter)
 	 */
 	adapter->ptp_tx_skb = NULL;
 	clear_bit_unlock(__IGB_PTP_TX_IN_PROGRESS, &adapter->state);
+	clear_bit_unlock(__IGB_PTP_TX_READ_IN_PROGRESS, &adapter->state);
 
 	/* Notify the stack and free the skb after we've unlocked */
 	skb_tstamp_tx(skb, &shhwtstamps);
-- 
2.30.2




* Re: igc: missing HW timestamps at TX
  2022-08-16  9:33                 ` Vladimir Oltean
@ 2022-08-17  7:44                   ` Ferenc Fejes
  0 siblings, 0 replies; 21+ messages in thread
From: Ferenc Fejes @ 2022-08-17  7:44 UTC (permalink / raw)
  To: vladimir.oltean; +Cc: vinicius.gomes, marton12050, netdev, peti.antal99

Hi Vladimir!

Thank you for the explanation.

On Tue, 2022-08-16 at 09:33 +0000, Vladimir Oltean wrote:
> Hi Ferenc,
> 
> On Mon, Aug 15, 2022 at 06:47:31AM +0000, Ferenc Fejes wrote:
> > I just played with those a little. It looks like --cpu-mask is the
> > one that helps in my case. For example, I checked the CPU core of
> > igc_ptp_tx_work:
> > 
> > # bpftrace -e 'kprobe:igc_ptp_tx_work { printf("%d\n", cpu);
> > exit(); }'
> > Attaching 1 probe...
> > 0
> 
> I think this print is slightly irrelevant in the grand scheme, or at
> least not very stable. Because schedule_work() is implemented as
> "queue_work(system_wq, work)", and queue_work() is implemented as
> "queue_work_on(WORK_CPU_UNBOUND, wq, work)", the work item associated
> with igc_ptp_tx_work() is not bound to any requested CPU.

I see, good to know. However, when I let bpftrace run during the whole
measurement(s), it always printed the same CPU core. I agree that
might happen by pure chance, but across many measurements I couldn't
find a counterexample, so after that I decided to just run bpftrace at
the beginning of my experiments. Interesting nevertheless...

> So unless the prints are taken from the actual test rather than just
> done once before it, the percpu kthread worker that executes it from
> within the pool might vary. In turn, __queue_work() selects the CPU
> based on the raw_smp_processor_id() of the caller (in this case, the
> IRQ handler). So it will depend on the tsync interrupt affinity,
> basically.
> 
> > 
> > Looks like it's running on core 0, so I ran isochron:
> > taskset -c 0 isochron ... --cpu-mask $((1 << 0)) - no lost timestamps
> > taskset -c 1 isochron ... --cpu-mask $((1 << 0)) - no lost timestamps
> > taskset -c 0 isochron ... --cpu-mask $((1 << 1)) - losing timestamps
> > taskset -c 1 isochron ... --cpu-mask $((1 << 1)) - losing timestamps
> (...)
> > Maybe this is what helps in my case? With the funccount tracer I
> > checked that when the sender thread and igc_ptp_tx_work run on the
> > same core, the worker is called exactly as many times as the
> > number of packets I sent.
> > 
> > However, if the worker runs on a different core, funccount shows
> > some random number (less than the packets sent), and in that case
> > I also lose timestamps.
> 
> Thanks.
> 
> Note that if igc_ptp_tx_work runs well on the same CPU (0) as the
> isochron sender thread, but *not* that well on the other CPU, I think
> a simple explanation (for now) might have to do with dynamic
> frequency scaling of the CPUs (CONFIG_CPU_FREQ). If the CPU is kept
> busy by the sender thread, the governor will increase the CPU
> frequency and the tsync interrupt will be processed quicker, and this
> will unclog the "single skb in flight" limitation quicker. If the CPU
> is mostly idle and woken up only from time to time by a tsync
> interrupt, then the "single skb in flight" limitation will kick in
> more often, and the isochron thread will have its TX timestamp
> requests silently dropped in the meantime until the idle CPU ramps up
> to execute its scheduled work item.

I'm using the performance cpufreq governor, so all of my cores run at
peak (4.2 GHz) frequency. Does that matter? I'll take a look at that
later; after a quick try, some other settings seem to depend on
CPU_FREQ, because neither menuconfig nor ./scripts/config lets me
disable it (it was still enabled after I compiled the kernel).

> 
> To prove my point you can try to compile a kernel with
> CONFIG_CPU_FREQ=n.
> Makes sense?

Thanks,
Ferenc



* Re: igc: missing HW timestamps at TX
  2022-08-17  6:10                         ` Kurt Kanzenbach
@ 2022-08-17 19:24                           ` Vinicius Costa Gomes
  0 siblings, 0 replies; 21+ messages in thread
From: Vinicius Costa Gomes @ 2022-08-17 19:24 UTC (permalink / raw)
  To: Kurt Kanzenbach, Vladimir Oltean
  Cc: Ferenc Fejes, netdev, marton12050, peti.antal99,
	Sebastian Andrzej Siewior

Kurt Kanzenbach <kurt@linutronix.de> writes:

> On Tue Aug 16 2022, Vinicius Costa Gomes wrote:
>> Kurt Kanzenbach <kurt@linutronix.de> writes:
>>
>>> Hi Vinicius,
>>>
>>> On Mon Aug 15 2022, Vinicius Costa Gomes wrote:
>>>> I think your question is more "why is there that workqueue in
>>>> igc?" / "why don't you retrieve the TX timestamp 'inline' with
>>>> the interrupt?". If I got that right, then I don't have a good
>>>> reason, apart from the feeling that reading all those (5-6?)
>>>> registers may take too long for an interrupt handler. And it's
>>>> something that's done the same way in most (all?) Intel drivers.
>>>
>>> We do have one optimization for igb which attempts to read the Tx
>>> timestamp directly from the ISR. If that's not ready, *only* then
>>> do we schedule the worker. I assume igb and igc have the same logic
>>> for retrieving the timestamps here.
>>>
>>
>> That seems a sensible approach. And yes, the timestamping logic is the
>> same.
>>
>>> The problem with workqueues is that under heavy system load, it might be
>>> deferred and timestamps will be lost. I guess that workqueue was added
>>> because of something like this: 1f6e8178d685 ("igb: Prevent dropped Tx
>>> timestamps via work items and interrupts.").
>>>
>>>>
>>>> I have a TODO to experiment with removing the workqueue, and retrieving
>>>> the TX timestamp in the same context as the interrupt handler, but other
>>>> things always come up.
>>>
>>> Let me know if you have interest in that igb patch.
>>
>> That would be great! Thanks.
>
> Sure. See igb patch below.

Thanks!

>
> I'm also wondering whether that delayed work should be replaced
> completely by the PTP aux worker, because that one can be prioritized
> according to the use case. And I see Vladimir already suggested this.

That was the idea.


Cheers,
-- 
Vinicius


Thread overview: 21 messages
2022-07-17 14:42 igc: missing HW timestamps at TX Ferenc Fejes
2022-07-17 14:46 ` Ferenc Fejes
2022-07-18 14:46   ` Vinicius Costa Gomes
2022-07-19  7:40     ` Ferenc Fejes
2022-08-11  8:54       ` Ferenc Fejes
2022-08-11 13:33         ` Vinicius Costa Gomes
2022-08-12 14:13           ` Ferenc Fejes
2022-08-12 20:16             ` Vladimir Oltean
2022-08-15  6:47               ` Ferenc Fejes
2022-08-15 22:04                 ` Vinicius Costa Gomes
2022-08-16  9:33                 ` Vladimir Oltean
2022-08-17  7:44                   ` Ferenc Fejes
2022-08-15 21:39               ` Vinicius Costa Gomes
2022-08-15 22:26                 ` Vladimir Oltean
2022-08-15 23:07                   ` Vinicius Costa Gomes
2022-08-16  8:51                     ` Kurt Kanzenbach
2022-08-16 20:45                       ` Vinicius Costa Gomes
2022-08-17  6:10                         ` Kurt Kanzenbach
2022-08-17 19:24                           ` Vinicius Costa Gomes
2022-08-16  9:10                     ` Vladimir Oltean
2022-08-16 18:11                       ` Vinicius Costa Gomes
