All of lore.kernel.org
 help / color / mirror / Atom feed
From: "N. Benes" <nbenes@eso.org>
To: dev@dpdk.org
Subject: Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds
Date: Fri, 29 May 2020 20:46:00 +0000	[thread overview]
Message-ID: <660c0552-ded2-7454-c6fc-43db9844f06c@eso.org> (raw)
In-Reply-To: <91af9f42-9477-b27f-c5c0-cb0e44a95573@kth.se>

Hi everyone,

Tom Barbette:
> 
> Le 22/05/2020 à 20:43, PATRICK KEROULAS a écrit :
>>>>>> mlx5 part of libibverbs includes a ts-to-ns converter which takes the
>>>>>> instantaneous clock info. It's unused in dpdk so far. I've tested
>>>>>> it in the
>>>>>> device/port init routine and the result looks reliable. Since this
>>>>>> approach
>>>>>> looks very simple, compared to the time sync mechanism, I'm trying to
>>>>>> integrate.
>>>>>>
>>>>>> The conversion should occur in the primary process (testpmd) I
>>>>>> suppose.
>>>>>> 1) The needed clock info derives from ethernet device. Is it
>>>>>> possible to
>>>>>>     access that struct from a rx callback?
>>>>>> 2) how to attach the nanosecond to mbuf so that `pdump` catches it?
>>>>>>     (workaround: copy `mbuf->udata64` in forwarded packets.)
>>>>>> 3) any other idea?
>>>>> The timestamp is carried in mbuf.
>>>>> Then the conversion must be done by the ethdev caller (application or
>>>>> any other upper layer).
>>>> What if the converter function needs a clock_info?
>>>> https://github.com/linux-rdma/rdma-core/blob/7af01c79e00555207dee6132d72e7bfc1bb5485e/providers/mlx5/mlx5dv.h#L1201
>>>>
>>>> I'm affraid this info may change by the time the converter is called
>>>> by upper layer.
>>> Indeed, the clock in the device is not an atomic one :)
>>> We need to adjust the time conversion continuously.
>>> I am not an expert of time synchronization, so I add more people Cc
>>> who could help for having a precise timestamp.
>> Thanks Thomas.
>> Not sure this is a synchronization issue. We have dedicated processes
>> (linuxptp) to keep both NIC and sys clocks in sync with an external
>> clock.
>> It is "just" a matter of unit conversion.
>>
>> If it has to be performed in dpdk-pdump, I would need some help to
>> retrieve mlx5_clock_info from inside a secondary process. Looking at
>> mlx5_read_clock(), this info is extracted from ibv_context which looks
>> reachable in a primary process only (segfault, if I try in pdump).

The normal phc2sys can not only synchronise NIC -> system but also sys
-> NIC and (I believe it does but have not tried) NIC1 -> NIC2.
If I understand your proposal correctly, you want to use a free running
NIC counter and calibrate out the drift afterwards.
It may be easier to adapt phc2sys to use a NIC through DPDK and sync the
NIC's timewheel/VCO in a proven/reliable manner (e.g. low pass filtering
excursions). Then you could directly use the NIC counter value.

> I don't know about the integrated ts-to-ns, but we implemented a
> translation mechanism that mimics what NTP does in Linux to translate a
> given clock (TSC at first) to a wall time. You'll find more info at
> https://orbi.uliege.be/bitstream/2268/226257/1/thesis.pdf chapter
> 3.4.1.  This is an often forgotten matter, as we saw in real switches
> that the time spent in time-related VDSO is enormous.

Do you have measurements of vDSO clock_gettime and how much is
"enormous" to you?
To my knowledge, clock_gettime via vDSO on Linux only takes a few
nanoseconds in the average case. However, it can go up to ~10 or even
~50 microseconds every few (~10) seconds, depending on the number of
CPUs (for example single vs. dual socket, though my hardware for this
test is quite old, Dell R210-II, R610). Presumably this is when the
kernel locks the struct in VVAR to update the TSC drift compensation
parameters.

Linux clock_gettime implementation is here (different versions):
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/vdso/vclock_gettime.c?h=linux-3.10.y#n193
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/entry/vdso/vclock_gettime.c?h=linux-4.19.y#n241
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/lib/vdso/gettimeofday.c#n98

I use busy waiting on clock_gettime in a packet generator application
(so far 10 GbE only) to pace jumbo frames according to a spec
(simulating the traffic pattern of a to-be-developed hardware with
FPGA), and COTS sniffer hardware with absolute timestamping to verify my
generator's performance. I can observe the above 10-50 us artefacts and
sufficiently good/low (for my needs) average execution time of
clock_gettime. The only sad thing is that TAI clock does not go through
vDSO and therefore I cannot use it.

> We wanted to do a very precise capture too, se we made that clock able
> to synchronize itself with the ConnectX 5 internal clock as a base
> instead of TSC. FYI the clock in CX5 si running at 800MHz, so pure
> nanosecond is impossible, but close enough. It is for that purpose that
> I proposed the rte_eth_read_clock() patch in DPDK. We need to be able to
> read the current clock (like rdtsc() instruction for TSC) to compute the
> frequency.

Doesn't this mean that you need to wait for the PCIe op from the NIC?
Is this really faster than a rdtsc, memory/cache read, integer
multiplication and shift?

Cheers,
nicolas


  reply	other threads:[~2020-05-29 20:46 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-19 18:20 [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds PATRICK KEROULAS
2020-05-21 15:33 ` Thomas Monjalon
2020-05-21 19:57   ` PATRICK KEROULAS
2020-05-21 20:09     ` Thomas Monjalon
2020-05-22 18:43       ` PATRICK KEROULAS
2020-05-26  7:44         ` Tom Barbette
2020-05-29 20:46           ` N. Benes [this message]
2020-05-26 16:00         ` Slava Ovsiienko
2020-05-29 20:56           ` PATRICK KEROULAS
2020-05-31 19:47             ` Slava Ovsiienko
2020-06-02 19:18           ` PATRICK KEROULAS
2020-06-03  7:48             ` Slava Ovsiienko
2020-06-05  0:09               ` PATRICK KEROULAS
2020-06-05 16:30                 ` Slava Ovsiienko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=660c0552-ded2-7454-c6fc-43db9844f06c@eso.org \
    --to=nbenes@eso.org \
    --cc=dev@dpdk.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.