dev.dpdk.org archive mirror
* [dpdk-dev] [PATCH v4 0/3] Add rte_eth_read_clock API
@ 2019-05-02 12:11 Tom Barbette
  2019-05-02 12:11 ` [dpdk-dev] [PATCH v4 1/3] rte_ethdev: Add API function to read dev clock Tom Barbette
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Tom Barbette @ 2019-05-02 12:11 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, john.mcnamara, Thomas Monjalon, Ferruh Yigit,
	Andrew Rybchenko, Shahaf Shuler, Yongseok Koh, olivier.matz,
	Tom Barbette

Some NICs can timestamp packets but do not support the full
PTP synchronization process. Hence, the value set in the mbuf
timestamp field is only the raw value of an internal clock.

To make sense of this value, one needs at least to be able to query
the current hardware clock value. This patch series adds a new API to do
so, rte_eth_read_clock. As with the TSC, a frequency can then be
derived by querying the current value of the internal clock multiple
times with a known delay between the queries (example provided in the
API doc).
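
As a rough sketch of that two-sample estimate (not part of the patches;
estimate_clock_hz is a hypothetical helper, and in an application the two
readings would come from rte_eth_read_clock() around an rte_delay_ms() call):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: estimate a counter's frequency in Hz from two raw
 * readings taken delay_ms milliseconds apart.
 */
static uint64_t
estimate_clock_hz(uint64_t start_ticks, uint64_t end_ticks, uint64_t delay_ms)
{
	assert(delay_ms > 0);
	/* Unsigned subtraction stays correct across one wrap, assuming the
	 * counter uses the full 64 bits.
	 */
	uint64_t delta = end_ticks - start_ticks;

	/* delta * 1000 can overflow for very large deltas; fine for the
	 * short delays this sketch is meant for.
	 */
	return delta * 1000 / delay_ms;
}
```

For instance, 100000000 ticks counted over a 100 ms delay gives an estimate
of 1 GHz. A single pair of readings is noisy, so this is only a starting
point.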

This patch series adds read_clock support for MLX5.

An example is provided in the rxtx_callbacks application.
It has been updated to display, on top of the software latency
in cycles, the total latency since the packet was received in hardware.
The API is used to compute a delta in the Tx callback. The raw number of
ticks is converted to cycles using a variation of the technique described above.
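
One possible way to do such a ticks-to-cycles conversion without a division
per packet is a fixed-point multiplier. The sketch below is illustrative
only: FIXED_SHIFT, make_cycles_per_tick_mult and ticks_to_cycles are
hypothetical names, not the ones used in the example application.

```c
#include <assert.h>
#include <stdint.h>

#define FIXED_SHIFT 16 /* hypothetical: 16 fractional bits */

/* Build a fixed-point multiplier for the ratio cycles/ticks, where both
 * counts were measured over the same time interval.
 */
static uint64_t
make_cycles_per_tick_mult(uint64_t cycles_elapsed, uint64_t ticks_elapsed)
{
	assert(ticks_elapsed > 0);
	return (cycles_elapsed << FIXED_SHIFT) / ticks_elapsed;
}

/* Convert device ticks to TSC cycles with one multiply and one shift.
 * The result is truncated, so it can be slightly below the exact value.
 */
static uint64_t
ticks_to_cycles(uint64_t ticks, uint64_t mult)
{
	return (ticks * mult) >> FIXED_SHIFT;
}
```

With 300 cycles measured per 100 ticks, the multiplier is 3 << 16 and 1000
ticks convert to exactly 3000 cycles. Ratios that are not representable in
16 fractional bits lose a little precision to truncation.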

Aside from offloading timestamping, which relieves the
software of a few operations, this gives much more precision
when studying the sources of latency in a system.
E.g., in our 100G CX5 setup, the rxtx_callbacks application shows that
SW latency is around 74 cycles (TSC is 3.2 GHz), but the latency
including NIC processing, PCIe, and queuing is around 196 cycles.

One may think at first that this API overlaps with rte_eth_timesync_read_time.
However, rte_eth_timesync_read_time is clearly identified as part of a set of
functions for PTP synchronization.
The device raw clock is not "synced" in any way. More importantly, the returned
value is not a timespec, but a number of ticks. We could have a cast-based
solution, but on top of being ugly, some people seeing the timespec
type of rte_eth_timesync_read_time could use the value blindly.
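
To illustrate why an explicit conversion is preferable to a cast, here is a
minimal sketch of turning a raw tick count into a struct timespec. It assumes
the frequency and a (base_ticks, base_ns) reference pair were obtained
beforehand; ticks_to_timespec is a hypothetical helper, not part of this
series.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper: convert a raw tick count to a struct timespec, given
 * the counter frequency and a reference pair (base_ticks, base_ns) captured
 * at the same instant.
 */
static struct timespec
ticks_to_timespec(uint64_t ticks, uint64_t base_ticks, uint64_t base_ns,
		  uint64_t freq_hz)
{
	assert(freq_hz > 0);
	uint64_t delta = ticks - base_ticks;
	/* Handle whole seconds and the remainder separately to limit the
	 * risk of overflow in the multiplication.
	 */
	uint64_t ns = base_ns + (delta / freq_hz) * NSEC_PER_SEC
		+ (delta % freq_hz) * NSEC_PER_SEC / freq_hz;
	struct timespec ts;

	ts.tv_sec = (time_t)(ns / NSEC_PER_SEC);
	ts.tv_nsec = (long)(ns % NSEC_PER_SEC);
	return ts;
}
```

The tick count only becomes a time once a frequency and a base are applied,
which is exactly the information a plain cast would hide.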

Change in v2:
  - Rebase on current master

Change in v3:
  - Address comments from Ferruh Yigit

Changes in v4:
  - Address comments from Keith Wiles and Andrew Rybchenko
  - Use "clock" as argument name everywhere.
  - Expand the API description to make clear that read_clock gives an
    amount in ticks, and that it has no unit.

Tom Barbette (3):
  rte_ethdev: Add API function to read dev clock
  mlx5: Implement support for read_clock
  rxtx_callbacks: Add support for HW timestamp

 doc/guides/nics/features.rst                |  1 +
 doc/guides/sample_app_ug/rxtx_callbacks.rst |  9 ++-
 drivers/net/mlx5/mlx5.c                     |  1 +
 drivers/net/mlx5/mlx5.h                     |  1 +
 drivers/net/mlx5/mlx5_ethdev.c              | 30 +++++++
 drivers/net/mlx5/mlx5_glue.c                |  8 ++
 drivers/net/mlx5/mlx5_glue.h                |  2 +
 examples/rxtx_callbacks/Makefile            |  3 +
 examples/rxtx_callbacks/main.c              | 87 ++++++++++++++++++++-
 examples/rxtx_callbacks/meson.build         |  3 +
 lib/librte_ethdev/rte_ethdev.c              | 12 +++
 lib/librte_ethdev/rte_ethdev.h              | 47 +++++++++++
 lib/librte_ethdev/rte_ethdev_core.h         |  6 ++
 lib/librte_ethdev/rte_ethdev_version.map    |  1 +
 lib/librte_mbuf/rte_mbuf.h                  |  2 +
 15 files changed, 208 insertions(+), 5 deletions(-)

-- 
2.17.1



* [dpdk-dev] [PATCH v4 1/3] rte_ethdev: Add API function to read dev clock
  2019-05-02 12:11 [dpdk-dev] [PATCH v4 0/3] Add rte_eth_read_clock API Tom Barbette
@ 2019-05-02 12:11 ` Tom Barbette
  2019-05-08  7:54   ` Andrew Rybchenko
  2019-06-04 13:57   ` Ferruh Yigit
  2019-05-02 12:11 ` [dpdk-dev] [PATCH v4 2/3] mlx5: Implement support for read_clock Tom Barbette
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 10+ messages in thread
From: Tom Barbette @ 2019-05-02 12:11 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, john.mcnamara, Thomas Monjalon, Ferruh Yigit,
	Andrew Rybchenko, Shahaf Shuler, Yongseok Koh, olivier.matz,
	Tom Barbette

Add rte_eth_read_clock to read the raw clock of a device.

The main use is to get the device clock conversion coefficients to be
able to translate the raw clock of the timestamp field of the pkt mbuf
to a local synced time value.

This function was missing to allow users to convert the Rx timestamp field
to real time without the complexity of the rte_timesync* facility. One can
derive the clock frequency by calling read_clock twice and then keep a
common time base.

Signed-off-by: Tom Barbette <barbette@kth.se>
---
 doc/guides/nics/features.rst             |  1 +
 lib/librte_ethdev/rte_ethdev.c           | 12 ++++++
 lib/librte_ethdev/rte_ethdev.h           | 47 ++++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev_core.h      |  6 +++
 lib/librte_ethdev/rte_ethdev_version.map |  1 +
 lib/librte_mbuf/rte_mbuf.h               |  2 +
 6 files changed, 69 insertions(+)

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index c5bf32222..025b7f812 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -602,6 +602,7 @@ Supports Timestamp.
 * **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_TIMESTAMP``.
 * **[provides] mbuf**: ``mbuf.timestamp``.
 * **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa: DEV_RX_OFFLOAD_TIMESTAMP``.
+* **[related] eth_dev_ops**: ``read_clock``.
 
 .. _nic_features_macsec_offload:
 
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index d7cfa3d53..9507a985f 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -4170,6 +4170,18 @@ rte_eth_timesync_write_time(uint16_t port_id, const struct timespec *timestamp)
 								timestamp));
 }
 
+int
+rte_eth_read_clock(uint16_t port_id, uint64_t *clock)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->read_clock, -ENOTSUP);
+	return eth_err(port_id, (*dev->dev_ops->read_clock)(dev, clock));
+}
+
 int
 rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
 {
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index b8d19c69f..7dd1f8ae7 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -3793,6 +3793,53 @@ int rte_eth_timesync_read_time(uint16_t port_id, struct timespec *time);
  */
 int rte_eth_timesync_write_time(uint16_t port_id, const struct timespec *time);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Read the current clock counter of an Ethernet device
+ *
+ * This returns the current raw clock value of an Ethernet device. It is
+ * a raw amount of ticks, with no given time reference.
+ * The value returned here is from the same clock as the one
+ * filling the timestamp field of Rx packets when using hardware timestamp
+ * offload. Therefore it can be used to compute a precise conversion of
+ * the device clock to the real time.
+ *
+ * E.g., a simple heuristic to derive the frequency would be:
+ * uint64_t start, end;
+ * rte_eth_read_clock(port, &start);
+ * rte_delay_ms(100);
+ * rte_eth_read_clock(port, &end);
+ * double freq = (end - start) * 10;
+ *
+ * Compute a common reference with:
+ * uint64_t base_time_sec = current_time();
+ * uint64_t base_clock;
+ * rte_eth_read_clock(port, &base_clock);
+ *
+ * Then, convert the raw mbuf timestamp with:
+ * base_time_sec + (double)(mbuf->timestamp - base_clock) / freq;
+ *
+ * This simple example will not provide very good accuracy. One must
+ * at least measure the frequency multiple times and do a regression.
+ * To avoid deviation from the system time, the common reference can
+ * be recomputed from time to time. The division can also be
+ * replaced by a multiplication and a shift for better performance.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param clock
+ *   Pointer to the uint64_t that holds the raw clock value.
+ *
+ * @return
+ *   - 0: Success.
+ *   - -ENODEV: The port ID is invalid.
+ *   - -ENOTSUP: The function is not supported by the Ethernet driver.
+ */
+int __rte_experimental
+rte_eth_read_clock(uint16_t port_id, uint64_t *clock);
+
 /**
  * Config l2 tunnel ether type of an Ethernet device for filtering specific
  * tunnel packets by ether type.
diff --git a/lib/librte_ethdev/rte_ethdev_core.h b/lib/librte_ethdev/rte_ethdev_core.h
index 8f03f83f6..86806b3eb 100644
--- a/lib/librte_ethdev/rte_ethdev_core.h
+++ b/lib/librte_ethdev/rte_ethdev_core.h
@@ -322,6 +322,10 @@ typedef int (*eth_timesync_write_time)(struct rte_eth_dev *dev,
 				       const struct timespec *timestamp);
 /**< @internal Function used to get time from the device clock */
 
+typedef int (*eth_read_clock)(struct rte_eth_dev *dev,
+				      uint64_t *clock);
+/**< @internal Function used to get the current value of the device clock. */
+
 typedef int (*eth_get_reg_t)(struct rte_eth_dev *dev,
 				struct rte_dev_reg_info *info);
 /**< @internal Retrieve registers  */
@@ -496,6 +500,8 @@ struct eth_dev_ops {
 	eth_timesync_read_time     timesync_read_time; /** Get the device clock time. */
 	eth_timesync_write_time    timesync_write_time; /** Set the device clock time. */
 
+	eth_read_clock             read_clock;
+
 	eth_xstats_get_by_id_t     xstats_get_by_id;
 	/**< Get extended device statistic values by ID. */
 	eth_xstats_get_names_by_id_t xstats_get_names_by_id;
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index afcd25599..df9141825 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -253,6 +253,7 @@ EXPERIMENTAL {
 	rte_eth_dev_rx_intr_ctl_q_get_fd;
 	rte_eth_find_next_of;
 	rte_eth_find_next_sibling;
+	rte_eth_read_clock;
 	rte_eth_switch_domain_alloc;
 	rte_eth_switch_domain_free;
 	rte_flow_conv;
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 68415af02..e530a96c5 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -668,6 +668,8 @@ struct rte_mbuf {
 
 	/** Valid if PKT_RX_TIMESTAMP is set. The unit and time reference
 	 * are not normalized but are always the same for a given port.
+	 * Some devices allow querying the current device clock value with
+	 * rte_eth_read_clock().
 	 */
 	uint64_t timestamp;
 
-- 
2.17.1



* [dpdk-dev] [PATCH v4 2/3] mlx5: Implement support for read_clock
  2019-05-02 12:11 [dpdk-dev] [PATCH v4 0/3] Add rte_eth_read_clock API Tom Barbette
  2019-05-02 12:11 ` [dpdk-dev] [PATCH v4 1/3] rte_ethdev: Add API function to read dev clock Tom Barbette
@ 2019-05-02 12:11 ` Tom Barbette
  2019-05-02 12:11 ` [dpdk-dev] [PATCH v4 3/3] rxtx_callbacks: Add support for HW timestamp Tom Barbette
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Tom Barbette @ 2019-05-02 12:11 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, john.mcnamara, Thomas Monjalon, Ferruh Yigit,
	Andrew Rybchenko, Shahaf Shuler, Yongseok Koh, olivier.matz,
	Tom Barbette

Implement read_clock support for the mlx5 driver. mlx5 supports
hardware timestamp offload, setting the packet timestamp field to the
device clock. rte_eth_read_clock reads the device's current
clock value, so that values can be compared on the same time base.

See rxtx_callbacks for an example.

Signed-off-by: Tom Barbette <barbette@kth.se>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        |  1 +
 drivers/net/mlx5/mlx5.h        |  1 +
 drivers/net/mlx5/mlx5_ethdev.c | 30 ++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_glue.c   |  8 ++++++++
 drivers/net/mlx5/mlx5_glue.h   |  2 ++
 5 files changed, 42 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 46ca08a4d..947943346 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -748,6 +748,7 @@ const struct eth_dev_ops mlx5_dev_ops = {
 	.xstats_get_names = mlx5_xstats_get_names,
 	.fw_version_get = mlx5_fw_version_get,
 	.dev_infos_get = mlx5_dev_infos_get,
+	.read_clock = mlx5_read_clock,
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 0a6d7f1d5..187703b61 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -403,6 +403,7 @@ int mlx5_set_flags(struct rte_eth_dev *dev, unsigned int keep,
 		   unsigned int flags);
 int mlx5_dev_configure(struct rte_eth_dev *dev);
 void mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info);
+int mlx5_read_clock(struct rte_eth_dev *dev, uint64_t *clock);
 int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size);
 const uint32_t *mlx5_dev_supported_ptypes_get(struct rte_eth_dev *dev);
 int mlx5_link_update(struct rte_eth_dev *dev, int wait_to_complete);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 57a64495d..00906f99d 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -564,6 +564,36 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	}
 }
 
+/**
+ * Get device current raw clock counter
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param[out] clock
+ *   Current raw clock counter of the device.
+ *
+ * @return
+ *   0 if the clock has correctly been read
+ *   The value of errno in case of error
+ */
+int
+mlx5_read_clock(struct rte_eth_dev *dev, uint64_t *clock)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct ibv_context *ctx = priv->sh->ctx;
+	struct ibv_values_ex values;
+	int err = 0;
+
+	values.comp_mask = IBV_VALUES_MASK_RAW_CLOCK;
+	err = mlx5_glue->query_rt_values_ex(ctx, &values);
+	if (err != 0) {
+		DRV_LOG(WARNING, "Could not query the clock !");
+		return err;
+	}
+	*clock = values.raw_clock.tv_nsec;
+	return 0;
+}
+
 /**
  * Get firmware version of a device.
  *
diff --git a/drivers/net/mlx5/mlx5_glue.c b/drivers/net/mlx5/mlx5_glue.c
index b32cd09c3..c1c650cff 100644
--- a/drivers/net/mlx5/mlx5_glue.c
+++ b/drivers/net/mlx5/mlx5_glue.c
@@ -87,6 +87,13 @@ mlx5_glue_query_device_ex(struct ibv_context *context,
 	return ibv_query_device_ex(context, input, attr);
 }
 
+static int
+mlx5_glue_query_rt_values_ex(struct ibv_context *context,
+			  struct ibv_values_ex *values)
+{
+	return ibv_query_rt_values_ex(context, values);
+}
+
 static int
 mlx5_glue_query_port(struct ibv_context *context, uint8_t port_num,
 		     struct ibv_port_attr *port_attr)
@@ -834,6 +841,7 @@ const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
 	.close_device = mlx5_glue_close_device,
 	.query_device = mlx5_glue_query_device,
 	.query_device_ex = mlx5_glue_query_device_ex,
+	.query_rt_values_ex = mlx5_glue_query_rt_values_ex,
 	.query_port = mlx5_glue_query_port,
 	.create_comp_channel = mlx5_glue_create_comp_channel,
 	.destroy_comp_channel = mlx5_glue_destroy_comp_channel,
diff --git a/drivers/net/mlx5/mlx5_glue.h b/drivers/net/mlx5/mlx5_glue.h
index 1d06583f4..e76e0b7af 100644
--- a/drivers/net/mlx5/mlx5_glue.h
+++ b/drivers/net/mlx5/mlx5_glue.h
@@ -83,6 +83,8 @@ struct mlx5_glue {
 	int (*query_device_ex)(struct ibv_context *context,
 			       const struct ibv_query_device_ex_input *input,
 			       struct ibv_device_attr_ex *attr);
+	int (*query_rt_values_ex)(struct ibv_context *context,
+			       struct ibv_values_ex *values);
 	int (*query_port)(struct ibv_context *context, uint8_t port_num,
 			  struct ibv_port_attr *port_attr);
 	struct ibv_comp_channel *(*create_comp_channel)
-- 
2.17.1



* [dpdk-dev] [PATCH v4 3/3] rxtx_callbacks: Add support for HW timestamp
  2019-05-02 12:11 [dpdk-dev] [PATCH v4 0/3] Add rte_eth_read_clock API Tom Barbette
  2019-05-02 12:11 ` [dpdk-dev] [PATCH v4 1/3] rte_ethdev: Add API function to read dev clock Tom Barbette
  2019-05-02 12:11 ` [dpdk-dev] [PATCH v4 2/3] mlx5: Implement support for read_clock Tom Barbette
@ 2019-05-02 12:11 ` Tom Barbette
  2019-05-31  7:46   ` Ferruh Yigit
  2019-06-13  5:55   ` Thomas Monjalon
  2019-05-08  7:49 ` [dpdk-dev] [PATCH v4 0/3] Add rte_eth_read_clock API Tom Barbette
  2019-05-31  7:46 ` Ferruh Yigit
  4 siblings, 2 replies; 10+ messages in thread
From: Tom Barbette @ 2019-05-02 12:11 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, john.mcnamara, Thomas Monjalon, Ferruh Yigit,
	Andrew Rybchenko, Shahaf Shuler, Yongseok Koh, olivier.matz,
	Tom Barbette

Use rxtx callback to demonstrate a way to use rte_eth_read_clock to
convert the hardware timestamps to an amount of cycles.

This gives the amount of time elapsed since the packet entered the
device, whereas the regular latency only covers the time since the
packet entered the software stack.

Signed-off-by: Tom Barbette <barbette@kth.se>
---
 doc/guides/sample_app_ug/rxtx_callbacks.rst |  9 ++-
 examples/rxtx_callbacks/Makefile            |  3 +
 examples/rxtx_callbacks/main.c              | 87 ++++++++++++++++++++-
 examples/rxtx_callbacks/meson.build         |  3 +
 4 files changed, 97 insertions(+), 5 deletions(-)

diff --git a/doc/guides/sample_app_ug/rxtx_callbacks.rst b/doc/guides/sample_app_ug/rxtx_callbacks.rst
index 81463d28d..6b0c64461 100644
--- a/doc/guides/sample_app_ug/rxtx_callbacks.rst
+++ b/doc/guides/sample_app_ug/rxtx_callbacks.rst
@@ -13,6 +13,10 @@ In the sample application a user defined callback is applied to all received
 packets to add a timestamp. A separate callback is applied to all packets
 prior to transmission to calculate the elapsed time, in CPU cycles.
 
+If hardware timestamping is supported by the NIC, the sample application will
+also display the average latency since the packet was timestamped in hardware,
+on top of the latency since the packet was received and processed by the RX
+callback.
 
 Compiling the Application
 -------------------------
@@ -36,7 +40,10 @@ To run the example in a ``linux`` environment:
 
 .. code-block:: console
 
-    ./build/rxtx_callbacks -l 1 -n 4
+    ./build/rxtx_callbacks -l 1 -n 4 -- [-t]
+
+Use -t to enable hardware timestamping. If not supported by the NIC, an error
+will be displayed.
 
 Refer to *DPDK Getting Started Guide* for general information on running
 applications and the Environment Abstraction Layer (EAL) options.
diff --git a/examples/rxtx_callbacks/Makefile b/examples/rxtx_callbacks/Makefile
index b937d599b..0a4660681 100644
--- a/examples/rxtx_callbacks/Makefile
+++ b/examples/rxtx_callbacks/Makefile
@@ -50,6 +50,9 @@ include $(RTE_SDK)/mk/rte.vars.mk
 
 CFLAGS += $(WERROR_FLAGS)
 
+# rte_eth_read_clock is experimental
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+
 # workaround for a gcc bug with noreturn attribute
 # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
 ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
diff --git a/examples/rxtx_callbacks/main.c b/examples/rxtx_callbacks/main.c
index 2058be627..55aa82288 100644
--- a/examples/rxtx_callbacks/main.c
+++ b/examples/rxtx_callbacks/main.c
@@ -4,6 +4,7 @@
 
 #include <stdint.h>
 #include <inttypes.h>
+#include <getopt.h>
 #include <rte_eal.h>
 #include <rte_ethdev.h>
 #include <rte_cycles.h>
@@ -17,6 +18,9 @@
 #define MBUF_CACHE_SIZE 250
 #define BURST_SIZE 32
 
+static const char usage[] =
+	"%s EAL_ARGS -- [-t]\n";
+
 static const struct rte_eth_conf port_conf_default = {
 	.rxmode = {
 		.max_rx_pkt_len = ETHER_MAX_LEN,
@@ -25,9 +29,14 @@ static const struct rte_eth_conf port_conf_default = {
 
 static struct {
 	uint64_t total_cycles;
+	uint64_t total_queue_cycles;
 	uint64_t total_pkts;
 } latency_numbers;
 
+int hw_timestamping;
+
+#define TICKS_PER_CYCLE_SHIFT 16
+static uint64_t ticks_per_cycle_mult;
 
 static uint16_t
 add_timestamps(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
@@ -43,22 +52,42 @@ add_timestamps(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
 }
 
 static uint16_t
-calc_latency(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+calc_latency(uint16_t port, uint16_t qidx __rte_unused,
 		struct rte_mbuf **pkts, uint16_t nb_pkts, void *_ __rte_unused)
 {
 	uint64_t cycles = 0;
+	uint64_t queue_ticks = 0;
 	uint64_t now = rte_rdtsc();
+	uint64_t ticks;
 	unsigned i;
 
-	for (i = 0; i < nb_pkts; i++)
+	if (hw_timestamping)
+		rte_eth_read_clock(port, &ticks);
+
+	for (i = 0; i < nb_pkts; i++) {
 		cycles += now - pkts[i]->udata64;
+		if (hw_timestamping)
+			queue_ticks += ticks - pkts[i]->timestamp;
+	}
+
 	latency_numbers.total_cycles += cycles;
+	if (hw_timestamping)
+		latency_numbers.total_queue_cycles += (queue_ticks
+			* ticks_per_cycle_mult) >> TICKS_PER_CYCLE_SHIFT;
+
 	latency_numbers.total_pkts += nb_pkts;
 
 	if (latency_numbers.total_pkts > (100 * 1000 * 1000ULL)) {
 		printf("Latency = %"PRIu64" cycles\n",
 		latency_numbers.total_cycles / latency_numbers.total_pkts);
-		latency_numbers.total_cycles = latency_numbers.total_pkts = 0;
+		if (hw_timestamping) {
+			printf("Latency from HW = %"PRIu64" cycles\n",
+			   latency_numbers.total_queue_cycles
+			   / latency_numbers.total_pkts);
+		}
+		latency_numbers.total_cycles = 0;
+		latency_numbers.total_queue_cycles = 0;
+		latency_numbers.total_pkts = 0;
 	}
 	return nb_pkts;
 }
@@ -77,6 +106,7 @@ port_init(uint16_t port, struct rte_mempool *mbuf_pool)
 	int retval;
 	uint16_t q;
 	struct rte_eth_dev_info dev_info;
+	struct rte_eth_rxconf rxconf;
 	struct rte_eth_txconf txconf;
 
 	if (!rte_eth_dev_is_valid_port(port))
@@ -95,9 +125,20 @@ port_init(uint16_t port, struct rte_mempool *mbuf_pool)
 	if (retval != 0)
 		return retval;
 
+	rxconf = dev_info.default_rxconf;
+
+	if (hw_timestamping) {
+		if (!(dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TIMESTAMP)) {
+			printf("\nERROR: Port %u does not support hardware timestamping\n"
+					, port);
+			return -1;
+		}
+		rxconf.offloads |= DEV_RX_OFFLOAD_TIMESTAMP;
+	}
+
 	for (q = 0; q < rx_rings; q++) {
 		retval = rte_eth_rx_queue_setup(port, q, nb_rxd,
-				rte_eth_dev_socket_id(port), NULL, mbuf_pool);
+			rte_eth_dev_socket_id(port), &rxconf, mbuf_pool);
 		if (retval < 0)
 			return retval;
 	}
@@ -115,6 +156,27 @@ port_init(uint16_t port, struct rte_mempool *mbuf_pool)
 	if (retval < 0)
 		return retval;
 
+	if (hw_timestamping && ticks_per_cycle_mult  == 0) {
+		uint64_t cycles_base = rte_rdtsc();
+		uint64_t ticks_base;
+		retval = rte_eth_read_clock(port, &ticks_base);
+		if (retval != 0)
+			return retval;
+		rte_delay_ms(100);
+		uint64_t cycles = rte_rdtsc();
+		uint64_t ticks;
+		rte_eth_read_clock(port, &ticks);
+		uint64_t c_freq = cycles - cycles_base;
+		uint64_t t_freq = ticks - ticks_base;
+		double freq_mult = (double)c_freq / t_freq;
+		printf("TSC Freq ~= %" PRIu64 "\nHW Freq ~= %" PRIu64 "\nRatio : %f\n",
+				c_freq * 10, t_freq * 10, freq_mult);
+		/* freq_mult is TSC cycles per device tick (> 1 when the TSC is
+		 * faster); store it as a fixed-point multiplier (mult & shift).
+		 */
+		ticks_per_cycle_mult = (1 << TICKS_PER_CYCLE_SHIFT) * freq_mult;
+	}
+
 	struct ether_addr addr;
 
 	rte_eth_macaddr_get(port, &addr);
@@ -177,6 +239,11 @@ main(int argc, char *argv[])
 	struct rte_mempool *mbuf_pool;
 	uint16_t nb_ports;
 	uint16_t portid;
+	struct option lgopts[] = {
+		{ NULL,  0, 0, 0 }
+	};
+	int opt, option_index;
+
 
 	/* init EAL */
 	int ret = rte_eal_init(argc, argv);
@@ -186,6 +253,18 @@ main(int argc, char *argv[])
 	argc -= ret;
 	argv += ret;
 
+	while ((opt = getopt_long(argc, argv, "t", lgopts, &option_index))
+			!= EOF)
+		switch (opt) {
+		case 't':
+			hw_timestamping = 1;
+			break;
+		default:
+			printf(usage, argv[0]);
+			return -1;
+		}
+	optind = 1; /* reset getopt lib */
+
 	nb_ports = rte_eth_dev_count_avail();
 	if (nb_ports < 2 || (nb_ports & 1))
 		rte_exit(EXIT_FAILURE, "Error: number of ports must be even\n");
diff --git a/examples/rxtx_callbacks/meson.build b/examples/rxtx_callbacks/meson.build
index c34e11e36..a7bf12dd3 100644
--- a/examples/rxtx_callbacks/meson.build
+++ b/examples/rxtx_callbacks/meson.build
@@ -6,6 +6,9 @@
 # To build this example as a standalone application with an already-installed
 # DPDK instance, use 'make'
 
+#rte_eth_read_clock is experimental
+allow_experimental_apis = true
+
 sources = files(
 	'main.c'
 )
-- 
2.17.1



* Re: [dpdk-dev] [PATCH v4 0/3] Add rte_eth_read_clock API
  2019-05-02 12:11 [dpdk-dev] [PATCH v4 0/3] Add rte_eth_read_clock API Tom Barbette
                   ` (2 preceding siblings ...)
  2019-05-02 12:11 ` [dpdk-dev] [PATCH v4 3/3] rxtx_callbacks: Add support for HW timestamp Tom Barbette
@ 2019-05-08  7:49 ` Tom Barbette
  2019-05-31  7:46 ` Ferruh Yigit
  4 siblings, 0 replies; 10+ messages in thread
From: Tom Barbette @ 2019-05-08  7:49 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, john.mcnamara, Thomas Monjalon, Ferruh Yigit,
	Andrew Rybchenko, Shahaf Shuler, Yongseok Koh, olivier.matz

Maybe a (last) motivation point.

We just did a 100G link traffic capture with time-stamping of all
packets in HW using a Mellanox CX5. SW time-stamping fails to reveal
queueing delays and, as multi-queue is needed for writing 100G traffic
to multiple NVMe drives, does not allow recovering the original packet
ordering, which is mixed up by multi-queuing.

Here, we timestamped traffic in hardware (FYI, given in ticks of the
internal CX5 clock, not in a unit of time) and, thanks to the new API,
converted it to a real time value (through frequency + base).
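
The API doc in patch 1/3 notes that one should measure the frequency several
times and do a regression. A minimal sketch of such a fit, assuming pairs of
(system time in ns, device ticks) have been sampled beforehand, could look
like this (fit_clock_hz is a hypothetical helper, not code from our capture
tool):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: least-squares slope of device ticks against elapsed
 * nanoseconds over n samples, returned in ticks per second (Hz).
 */
static double
fit_clock_hz(const uint64_t *ns, const uint64_t *ticks, size_t n)
{
	double sx = 0, sy = 0, sxx = 0, sxy = 0;

	assert(n >= 2);
	for (size_t i = 0; i < n; i++) {
		/* Work relative to the first sample to keep values small. */
		double x = (double)(ns[i] - ns[0]);
		double y = (double)(ticks[i] - ticks[0]);

		sx += x;
		sy += y;
		sxx += x * x;
		sxy += x * y;
	}
	double denom = (double)n * sxx - sx * sx;

	assert(denom != 0.0); /* samples must not all share one timestamp */
	/* Slope is in ticks per ns; scale to ticks per second. */
	return ((double)n * sxy - sx * sy) / denom * 1e9;
}
```

The base is then simply one (time, ticks) pair, refreshed from time to time
to limit drift against the system clock.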

But precision is not the only improvement. As DPDK runs in userspace, calling
gettimeofday for millions of packets pretty much kills the capture.
Here we do simple math per packet to convert the packet's timestamp in
ticks to real clock time, which is (very) much cheaper than even a vDSO call.

Tom




* Re: [dpdk-dev] [PATCH v4 1/3] rte_ethdev: Add API function to read dev clock
  2019-05-02 12:11 ` [dpdk-dev] [PATCH v4 1/3] rte_ethdev: Add API function to read dev clock Tom Barbette
@ 2019-05-08  7:54   ` Andrew Rybchenko
  2019-06-04 13:57   ` Ferruh Yigit
  1 sibling, 0 replies; 10+ messages in thread
From: Andrew Rybchenko @ 2019-05-08  7:54 UTC (permalink / raw)
  To: Tom Barbette, dev
  Cc: bruce.richardson, john.mcnamara, Thomas Monjalon, Ferruh Yigit,
	Shahaf Shuler, Yongseok Koh, olivier.matz

On 5/2/19 3:11 PM, Tom Barbette wrote:
> Add rte_eth_read_clock to read the raw clock of a device.
>
> The main use is to get the device clock conversion coefficients to be
> able to translate the raw clock of the timestamp field of the pkt mbuf
> to a local synced time value.
>
> This function was missing to allow users to convert the Rx timestamp field
> to real time without the complexity of the rte_timesync* facility. One can
> derive the clock frequency by calling read_clock twice and then keep a
> common time base.
>
> Signed-off-by: Tom Barbette <barbette@kth.se>

Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>



* Re: [dpdk-dev] [PATCH v4 3/3] rxtx_callbacks: Add support for HW timestamp
  2019-05-02 12:11 ` [dpdk-dev] [PATCH v4 3/3] rxtx_callbacks: Add support for HW timestamp Tom Barbette
@ 2019-05-31  7:46   ` Ferruh Yigit
  2019-06-13  5:55   ` Thomas Monjalon
  1 sibling, 0 replies; 10+ messages in thread
From: Ferruh Yigit @ 2019-05-31  7:46 UTC (permalink / raw)
  To: Tom Barbette, dev
  Cc: bruce.richardson, john.mcnamara, Thomas Monjalon,
	Andrew Rybchenko, Shahaf Shuler, Yongseok Koh, olivier.matz

On 5/2/2019 1:11 PM, Tom Barbette wrote:
> Use rxtx callback to demonstrate a way to use rte_eth_read_clock to
> convert the hardware timestamps to an amount of cycles.
> 
> This gives the amount of time elapsed since the packet entered the
> device, whereas the regular latency only covers the time since the
> packet entered the software stack.
> 
> Signed-off-by: Tom Barbette <barbette@kth.se>

Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>



* Re: [dpdk-dev] [PATCH v4 0/3] Add rte_eth_read_clock API
  2019-05-02 12:11 [dpdk-dev] [PATCH v4 0/3] Add rte_eth_read_clock API Tom Barbette
                   ` (3 preceding siblings ...)
  2019-05-08  7:49 ` [dpdk-dev] [PATCH v4 0/3] Add rte_eth_read_clock API Tom Barbette
@ 2019-05-31  7:46 ` Ferruh Yigit
  4 siblings, 0 replies; 10+ messages in thread
From: Ferruh Yigit @ 2019-05-31  7:46 UTC (permalink / raw)
  To: Tom Barbette, dev
  Cc: bruce.richardson, john.mcnamara, Thomas Monjalon,
	Andrew Rybchenko, Shahaf Shuler, Yongseok Koh, olivier.matz

On 5/2/2019 1:11 PM, Tom Barbette wrote:
> Some NICs allow to timestamp packets, but do not support the full
> PTP synchronization process. Hence, the value set in the mbuf
> timestamp field is only the raw value of an internal clock.
> 
> To make sense of this value, one at least needs to be able to query
> the current hardware clock value. This patch series adds a new API to do
> so, rte_eth_read_clock. As with the TSC, from there
> a frequency can be derived by querying multiple times the current value of the
> internal clock with some known delay between the queries (example
> provided in the API doc).
> 
> This patch series adds support of read_clock for MLX5.
> 
> An example app is provided in the rxtx_callback application.
> It has been updated to display, on top of the software latency
> in cycles, the total latency since the packet was received in hardware.
> The API is used to compute a delta in the Tx callback. The raw amount of
> ticks is converted to cycles using a variation of the technique described above.
> 
> Aside from offloading timestamping, which relieves the
> software from a few operations, this allows to get much more precision
> when studying the source of the latency in a system.
> Eg. in our 100G, CX5 setup the rxtx callback application shows
> SW latency is around 74 cycles (TSC is 3.2Ghz), but the latency
> including NIC processing, PCIe, and queuing is around 196 cycles.
> 
> One may think at first this API is overlapping with rte_eth_timesync_read_time.
> rte_eth_timesync_read_time is clearly identified as part of a set of functions
> to use PTP synchronization.
> The device raw clock is not "sync" in any way. More importantly, the returned
> value is not a timeval, but an amount of ticks. We could have a cast-based
> solution, but on top of being an ugly solution, some people seeing the timeval
> type of rte_eth_timesync_read_time could use it blindly.
> 
> Change in v2:
>   - Rebase on current master
> 
> Change in v3:
>   - Address comments from Ferruh Yigit
> 
> Changes in v4:
>   - Address comments from Keith Wiles and Andrew Rybchenko
>   - Use "clock" as argument name everywhere.
>   - Expand the API description to make clear that read_clock gives an
>     amount in ticks, and that it has no unit.
> 
> Tom Barbette (3):
>   rte_ethdev: Add API function to read dev clock
>   mlx5: Implement support for read_clock
>   rxtx_callbacks: Add support for HW timestamp

Series applied to dpdk-next-net/master, thanks.


* Re: [dpdk-dev] [PATCH v4 1/3] rte_ethdev: Add API function to read dev clock
  2019-05-02 12:11 ` [dpdk-dev] [PATCH v4 1/3] rte_ethdev: Add API function to read dev clock Tom Barbette
  2019-05-08  7:54   ` Andrew Rybchenko
@ 2019-06-04 13:57   ` Ferruh Yigit
  1 sibling, 0 replies; 10+ messages in thread
From: Ferruh Yigit @ 2019-06-04 13:57 UTC (permalink / raw)
  To: Tom Barbette, dev
  Cc: bruce.richardson, john.mcnamara, Thomas Monjalon,
	Andrew Rybchenko, Shahaf Shuler, Yongseok Koh, olivier.matz

On 5/2/2019 1:11 PM, Tom Barbette wrote:
> Add rte_eth_read_clock to read the raw clock of a device.
> 
> The main use is to get the device clock conversion co-efficients to be
> able to translate the raw clock of the timestamp field of the pkt mbuf
> to a local synced time value.
> 
> This function was missing to allow users to convert the Rx timestamp field
> to real time without the complexity of the rte_timesync* facility. One can
> derivate the clock frequency by calling twice read_clock and then keep a
> common time base.
> 
> Signed-off-by: Tom Barbette <barbette@kth.se>

<...>

> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Read the current clock counter of an Ethernet device
> + *
> + * This returns the current raw clock value of an Ethernet device. It is
> + * a raw amount of ticks, with no given time reference.
> + * The value returned here is from the same clock than the one
> + * filling timestamp field of Rx packets when using hardware timestamp
> + * offload. Therefore it can be used to compute a precise conversion of
> + * the device clock to the real time.
> + *
> + * E.g, a simple heuristic to derivate the frequency would be:
> + * uint64_t start, end;
> + * rte_eth_read_clock(port, start);
> + * rte_delay_ms(100);
> + * rte_eth_read_clock(port, end);
> + * double freq = (end - start) * 10;
> + *
> + * Compute a common reference with:
> + * uint64_t base_time_sec = current_time();
> + * uint64_t base_clock;
> + * rte_eth_read_clock(port, base_clock);
> + *
> + * Then, convert the raw mbuf timestamp with:
> + * base_time_sec + (double)(mbuf->timestamp - base_clock) / freq;
> + *
> + * This simple example will not provide a very good accuracy. One must
> + * at least measure multiple times the frequency and do a regression.
> + * To avoid deviation from the system time, the common reference can
> + * be repeated from time to time. The integer division can also be
> + * converted by a multiplication and a shift for better performance.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param time
> + *   Pointer to the uint64_t that holds the raw clock value.
> + *
> + * @return
> + *   - 0: Success.
> + *   - -ENODEV: The port ID is invalid.
> + *   - -ENOTSUP: The function is not supported by the Ethernet driver.
> + */
> +int __rte_experimental
> +rte_eth_read_clock(uint16_t port_id, uint64_t *clock);

This is causing doc build error, since @param should be "clock" instead of "time".
I have fixed this in the next-net repo and force pushed.
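
For reference, the heuristic in the quoted doc comment can be sketched as below. This is a minimal illustration and not the code that was merged. It assumes the two raw readings were obtained with rte_eth_read_clock(port_id, &start) and rte_eth_read_clock(port_id, &end), passing pointers as the prototype requires, with rte_delay_ms() in between. The helper names estimate_freq_hz() and ticks_to_sec() are hypothetical, and the arithmetic is kept free of DPDK calls so it can be checked on its own.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate the device clock frequency in Hz from two raw readings
 * taken delay_ms milliseconds apart (hypothetical helper).
 */
double
estimate_freq_hz(uint64_t start, uint64_t end, uint64_t delay_ms)
{
	assert(delay_ms != 0);
	/* Unsigned subtraction stays correct across one counter wrap. */
	return (double)(end - start) * 1000.0 / (double)delay_ms;
}

/*
 * Convert a raw timestamp to seconds, given a common reference:
 * base_time_sec is the wall time at which base_clock was read.
 */
double
ticks_to_sec(double base_time_sec, uint64_t base_clock,
	     uint64_t timestamp, double freq_hz)
{
	/* Signed delta so timestamps older than the base go backwards. */
	int64_t delta = (int64_t)(timestamp - base_clock);

	return base_time_sec + (double)delta / freq_hz;
}
```

A single 100 ms measurement gives only a rough frequency. As the doc comment notes, repeating the measurement and refreshing the common reference from time to time would be needed for better accuracy.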


* Re: [dpdk-dev] [PATCH v4 3/3] rxtx_callbacks: Add support for HW timestamp
  2019-05-02 12:11 ` [dpdk-dev] [PATCH v4 3/3] rxtx_callbacks: Add support for HW timestamp Tom Barbette
  2019-05-31  7:46   ` Ferruh Yigit
@ 2019-06-13  5:55   ` Thomas Monjalon
  1 sibling, 0 replies; 10+ messages in thread
From: Thomas Monjalon @ 2019-06-13  5:55 UTC (permalink / raw)
  To: Tom Barbette
  Cc: dev, bruce.richardson, john.mcnamara, Ferruh Yigit,
	Andrew Rybchenko, Shahaf Shuler, Yongseok Koh, olivier.matz

02/05/2019 21:11, Tom Barbette:
> +	if (hw_timestamping && ticks_per_cycle_mult  == 0) {
> +		uint64_t cycles_base = rte_rdtsc();
> +		uint64_t ticks_base;
> +		retval = rte_eth_read_clock(port, &ticks_base);
> +		if (retval != 0)
> +			return retval;
> +		rte_delay_ms(100);
> +		uint64_t cycles = rte_rdtsc();
> +		uint64_t ticks;
> +		rte_eth_read_clock(port, &ticks);
> +		uint64_t c_freq = cycles - cycles_base;
> +		uint64_t t_freq = ticks - ticks_base;
> +		double freq_mult = (double)c_freq / t_freq;
> +		printf("TSC Freq ~= %lu\nHW Freq ~= %lu\nRatio : %f\n",
> +				c_freq * 10, t_freq * 10, freq_mult);
> +		/* TSC will be faster than internal ticks so freq_mult is > 0
> +		 * We convert the multiplication to an integer shift & mult
> +		 */
> +		ticks_per_cycle_mult = (1 << TICKS_PER_CYCLE_SHIFT) / freq_mult;
> +	}

I see two issues in this code:
1/ statements are mixed with variable declarations
2/ %lu is used for 64-bit variables, which does not work on 32-bit system.

I am fixing item 2 when merging.
I hope item 1 won't be an issue for some old compilers.
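
To illustrate both items, here is a minimal sketch that declares its variables before any statement and prints 64-bit values with PRIu64 from <inttypes.h>. It is not the merged fix. SCALE_SHIFT and the helper names are hypothetical, and the multiplier here is expressed as TSC cycles per device tick, which is an assumption of this sketch and not necessarily the convention used in the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define SCALE_SHIFT 16 /* hypothetical fixed-point precision */

/*
 * Fixed-point multiplier: cycles per tick, scaled by 2^SCALE_SHIFT.
 * cycles_delta must be below 2^(64 - SCALE_SHIFT) to avoid overflow.
 */
uint64_t
cycles_per_tick_mult(uint64_t cycles_delta, uint64_t ticks_delta)
{
	assert(ticks_delta != 0);
	return (cycles_delta << SCALE_SHIFT) / ticks_delta;
}

/* Convert ticks to cycles with one multiplication and one shift. */
uint64_t
ticks_to_cycles(uint64_t ticks, uint64_t mult)
{
	return (ticks * mult) >> SCALE_SHIFT;
}

/* Format both frequencies portably on 32-bit and 64-bit systems. */
int
format_freqs(char *buf, size_t len, uint64_t c_freq, uint64_t t_freq)
{
	return snprintf(buf, len,
			"TSC Freq ~= %" PRIu64 "\nHW Freq ~= %" PRIu64 "\n",
			c_freq, t_freq);
}
```

The shift trades a division per conversion for a small rounding error. A larger SCALE_SHIFT reduces that error but lowers the largest delta that can be converted without overflow.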




end of thread, other threads:[~2019-06-13  5:55 UTC | newest]

Thread overview: 10+ messages
2019-05-02 12:11 [dpdk-dev] [PATCH v4 0/3] Add rte_eth_read_clock API Tom Barbette
2019-05-02 12:11 ` [dpdk-dev] [PATCH v4 1/3] rte_ethdev: Add API function to read dev clock Tom Barbette
2019-05-08  7:54   ` Andrew Rybchenko
2019-06-04 13:57   ` Ferruh Yigit
2019-05-02 12:11 ` [dpdk-dev] [PATCH v4 2/3] mlx5: Implement support for read_clock Tom Barbette
2019-05-02 12:11 ` [dpdk-dev] [PATCH v4 3/3] rxtx_callbacks: Add support for HW timestamp Tom Barbette
2019-05-31  7:46   ` Ferruh Yigit
2019-06-13  5:55   ` Thomas Monjalon
2019-05-08  7:49 ` [dpdk-dev] [PATCH v4 0/3] Add rte_eth_read_clock API Tom Barbette
2019-05-31  7:46 ` Ferruh Yigit
