netdev.vger.kernel.org archive mirror
* [PATCH net-next V3 0/5] net/mlx4: HW timestamping support
@ 2013-04-23 16:06 Amir Vadai
  2013-04-23 16:06 ` [PATCH net-next V3 1/5] net/mlx4_core: Add timestamping device capability Amir Vadai
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Amir Vadai @ 2013-04-23 16:06 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Richard Cochran, Or Gerlitz, Eugenia Emantayev, Amir Vadai

This V3 series of patches introduces Ethernet HW timestamping support in the
mlx4 driver.

The series is made of five patches:
1. Add timestamping device capability
2. Read HCA frequency and map internal clock
3. Add HW timestamping (TS) support: enable/disable HW timestamping
   for incoming and/or outgoing packets. Add and initialize all structs and
   callbacks needed by the kernel timestamping API.
4. Add needed call to enable software timestamping
5. Add a service task to mlx4_en to run periodic tasks. Currently it only runs
   a watchdog to prevent clock overflow.
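
For a sense of scale on item 5, the sketch below estimates how long a
free-running counter takes to wrap. It assumes the 48-bit counter width and
the 427 MHz core clock mentioned in the code comments of patch 3; the helper
name is hypothetical and is not part of the driver. Whatever period the
watchdog uses, it has to sample the counter more often than this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not driver code: whole seconds until a free-running
 * counter that is `bits` wide wraps when it ticks at `freq_hz`.
 */
static uint64_t counter_wrap_seconds(unsigned int bits, uint64_t freq_hz)
{
	uint64_t span;

	assert(bits < 64 && freq_hz != 0);
	span = (uint64_t)1 << bits;	/* number of distinct counter values */

	return span / freq_hz;		/* rounded down to whole seconds */
}
```

With the assumed values, counter_wrap_seconds(48, 427000000ULL) evaluates to
659191 seconds, which is roughly 7.6 days.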

The patches provide raw HW timestamps. Registration of the driver as a
PHC class device will be done in a second step, once the basic code is
merged.

Changes from V0:
- accept HWTSTAMP_FILTER_PTP_* and HWTSTAMP_FILTER_SOME flags but
  reply with HWTSTAMP_FILTER_ALL
- remove usage of the timecompare API
- call netdev_features_change() after changing dev->features

Changes from V1:
- get_ts_info function provided.
- watchdog function was added to catch overflows
- indentation and style fixes
- en_timestamp.c => en_clock.c
- use the cyclecounter/timecounter utility functions

Changes from V2:
- mac => MAC
- no need for macro CORE_CLOCK_MASK
- changed clocksource shift to 14 to prevent overflows
- added some comments to make calculations clearer
- use NSEC_PER_SEC macro
- initialize overflow_period in jiffies and not seconds
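
The choice of shift 14 can be checked with a little arithmetic. The sketch
below is a userspace illustration, not driver code: khz_to_mult() repeats the
rounding done by the kernel's clocksource_khz2mult(), and the other two
helpers are hypothetical names introduced here. It assumes the 48-bit counter
and the 427 MHz clock quoted in the comments of patch 3.

```c
#include <assert.h>
#include <stdint.h>

/* Same rounding as clocksource_khz2mult():
 * mult = round((10^6 << shift) / khz)
 */
static uint32_t khz_to_mult(uint32_t khz, uint32_t shift)
{
	uint64_t tmp;

	assert(khz != 0 && shift < 32);
	tmp = (uint64_t)1000000 << shift;
	tmp += khz / 2;			/* round to nearest */
	return (uint32_t)(tmp / khz);
}

/* Nonzero if cycles * mult fits in 64 bits for every value of a counter
 * that is `bits` wide.
 */
static int mult_fits_u64(unsigned int bits, uint32_t mult)
{
	uint64_t max_cycles;

	assert(bits < 64 && mult != 0);
	max_cycles = ((uint64_t)1 << bits) - 1;
	return max_cycles <= UINT64_MAX / mult;
}

/* ns = (cycles * mult) >> shift, the conversion a cyclecounter applies */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}
```

For 427000 kHz and shift 14 this gives mult = 38370, and a full 48-bit cycle
count times that multiplier still fits in a u64. With shift 15 the multiplier
doubles to 76740 and the product no longer fits, which matches the
"max_cycles * multiplier < 2^64" comment in patch 3.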

Amir Vadai (3):
  net/mlx4_en: Add HW timestamping (TS) support
  net/mlx4_en: Support software timestamping
  net/mlx4_en: Add a service task

Eugenia Emantayev (2):
  net/mlx4_core: Add timestamping device capability
  net/mlx4_core: Read HCA frequency and map internal clock

 drivers/infiniband/hw/mlx4/cq.c                   |    2 +-
 drivers/net/ethernet/mellanox/mlx4/Makefile       |    2 +-
 drivers/net/ethernet/mellanox/mlx4/cq.c           |   10 +-
 drivers/net/ethernet/mellanox/mlx4/en_clock.c     |  151 +++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx4/en_cq.c        |   10 +-
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c   |   30 ++++
 drivers/net/ethernet/mellanox/mlx4/en_main.c      |    5 +
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c    |   99 ++++++++++++++
 drivers/net/ethernet/mellanox/mlx4/en_resources.c |    3 +
 drivers/net/ethernet/mellanox/mlx4/en_rx.c        |   29 ++++-
 drivers/net/ethernet/mellanox/mlx4/en_tx.c        |   31 ++++-
 drivers/net/ethernet/mellanox/mlx4/fw.c           |   18 +++-
 drivers/net/ethernet/mellanox/mlx4/fw.h           |    1 +
 drivers/net/ethernet/mellanox/mlx4/main.c         |   79 +++++++++++
 drivers/net/ethernet/mellanox/mlx4/mlx4.h         |    6 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h      |   25 ++++-
 include/linux/mlx4/cq.h                           |   16 +++
 include/linux/mlx4/device.h                       |   10 +-
 18 files changed, 507 insertions(+), 20 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx4/en_clock.c

-- 
1.7.8.2


* [PATCH net-next V3 1/5] net/mlx4_core: Add timestamping device capability
  2013-04-23 16:06 [PATCH net-next V3 0/5] net/mlx4: HW timestamping support Amir Vadai
@ 2013-04-23 16:06 ` Amir Vadai
  2013-04-23 16:06 ` [PATCH net-next V3 2/5] net/mlx4_core: Read HCA frequency and map internal clock Amir Vadai
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Amir Vadai @ 2013-04-23 16:06 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Richard Cochran, Or Gerlitz, Eugenia Emantayev, Amir Vadai

From: Eugenia Emantayev <eugenia@mellanox.com>

Add new device capability for timestamping support and query FW to retrieve it.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/fw.c |    7 ++++++-
 include/linux/mlx4/device.h             |    3 ++-
 2 files changed, 8 insertions(+), 2 deletions(-)
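
The capability check in this patch boils down to one bit test. A minimal
standalone sketch follows; the constants are copied from the diff below,
while the macro names and the helper apply_ts_cap() are hypothetical and
exist only for this illustration.

```c
#include <stdint.h>

/* Values taken from the patch */
#define CQ_TS_SUPPORT_BIT	0x80		/* bit tested in the byte at offset 0x3e */
#define DEV_CAP_FLAG2_TS	(1ULL << 5)	/* value of MLX4_DEV_CAP_FLAG2_TS */

/* Hypothetical helper: fold the firmware capability byte into flags2 */
static uint64_t apply_ts_cap(uint64_t flags2, uint8_t field)
{
	if (field & CQ_TS_SUPPORT_BIT)
		flags2 |= DEV_CAP_FLAG2_TS;
	return flags2;
}
```

Only the top bit of the byte matters: any value with 0x80 set turns on the
flag, and any other value leaves flags2 unchanged.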

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index ab470d9..f1e7097 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -130,7 +130,8 @@ static void dump_dev_cap_flags2(struct mlx4_dev *dev, u64 flags)
 		[1] = "RSS Toeplitz Hash Function support",
 		[2] = "RSS XOR Hash Function support",
 		[3] = "Device manage flow steering support",
-		[4] = "Automatic mac reassignment support"
+		[4] = "Automatic MAC reassignment support",
+		[5] = "Time stamping support"
 	};
 	int i;
 
@@ -444,6 +445,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 #define QUERY_DEV_CAP_MAX_MSG_SZ_OFFSET		0x38
 #define QUERY_DEV_CAP_MAX_GID_OFFSET		0x3b
 #define QUERY_DEV_CAP_RATE_SUPPORT_OFFSET	0x3c
+#define QUERY_DEV_CAP_CQ_TS_SUPPORT_OFFSET	0x3e
 #define QUERY_DEV_CAP_MAX_PKEY_OFFSET		0x3f
 #define QUERY_DEV_CAP_EXT_FLAGS_OFFSET		0x40
 #define QUERY_DEV_CAP_FLAGS_OFFSET		0x44
@@ -560,6 +562,9 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	dev_cap->fs_max_num_qp_per_entry = field;
 	MLX4_GET(stat_rate, outbox, QUERY_DEV_CAP_RATE_SUPPORT_OFFSET);
 	dev_cap->stat_rate_support = stat_rate;
+	MLX4_GET(field, outbox, QUERY_DEV_CAP_CQ_TS_SUPPORT_OFFSET);
+	if (field & 0x80)
+		dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_TS;
 	MLX4_GET(ext_flags, outbox, QUERY_DEV_CAP_EXT_FLAGS_OFFSET);
 	MLX4_GET(flags, outbox, QUERY_DEV_CAP_FLAGS_OFFSET);
 	dev_cap->flags = flags | (u64)ext_flags << 32;
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 1bc5a75..86ae260 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -152,7 +152,8 @@ enum {
 	MLX4_DEV_CAP_FLAG2_RSS_TOP		= 1LL <<  1,
 	MLX4_DEV_CAP_FLAG2_RSS_XOR		= 1LL <<  2,
 	MLX4_DEV_CAP_FLAG2_FS_EN		= 1LL <<  3,
-	MLX4_DEV_CAP_FLAGS2_REASSIGN_MAC_EN	= 1LL <<  4
+	MLX4_DEV_CAP_FLAGS2_REASSIGN_MAC_EN	= 1LL <<  4,
+	MLX4_DEV_CAP_FLAG2_TS			= 1LL <<  5
 };
 
 enum {
-- 
1.7.8.2


* [PATCH net-next V3 2/5] net/mlx4_core: Read HCA frequency and map internal clock
  2013-04-23 16:06 [PATCH net-next V3 0/5] net/mlx4: HW timestamping support Amir Vadai
  2013-04-23 16:06 ` [PATCH net-next V3 1/5] net/mlx4_core: Add timestamping device capability Amir Vadai
@ 2013-04-23 16:06 ` Amir Vadai
  2013-04-23 16:06 ` [PATCH net-next V3 3/5] net/mlx4_en: Add HW timestamping (TS) support Amir Vadai
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Amir Vadai @ 2013-04-23 16:06 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Richard Cochran, Or Gerlitz, Eugenia Emantayev, Amir Vadai

From: Eugenia Emantayev <eugenia@mellanox.com>

Read the HCA core clock frequency (QUERY_HCA), read the PCI BAR and offset
of the internal clock (QUERY_FW), and map the internal clock from that BAR.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/fw.c   |   11 ++++++
 drivers/net/ethernet/mellanox/mlx4/fw.h   |    1 +
 drivers/net/ethernet/mellanox/mlx4/main.c |   57 +++++++++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx4/mlx4.h |    6 +++-
 include/linux/mlx4/device.h               |    1 +
 5 files changed, 75 insertions(+), 1 deletions(-)
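
Two small pieces of logic in this patch can be restated in isolation. The
sketch below is a userspace paraphrase with hypothetical helper names:
decode_clock_bar() repeats the `(clock_bar >> 6) * 2` expression from
mlx4_QUERY_FW(), and ts_still_supported() summarizes the three conditions
under which mlx4_init_hca() clears MLX4_DEV_CAP_FLAG2_TS. The doubling
presumably reflects 64-bit BARs occupying two PCI resource slots; that
reading is an assumption and is not stated in the patch.

```c
#include <stdint.h>

/* Keep the top two bits of the raw byte, then double the result */
static uint8_t decode_clock_bar(uint8_t raw)
{
	return (uint8_t)((raw >> 6) * 2);
}

/* Timestamping stays enabled only if QUERY_HCA succeeded, the reported
 * core clock (in MHz) is nonzero, and mapping the clock succeeded.
 * Error arguments follow the kernel convention: 0 means success.
 */
static int ts_still_supported(int query_err, uint16_t core_clock_mhz,
			      int map_err)
{
	if (query_err)
		return 0;
	if (core_clock_mhz == 0)	/* would later divide by zero */
		return 0;
	if (map_err)
		return 0;
	return 1;
}
```

For example, a raw byte of 0x80 decodes to BAR 4, and a device reporting a
core clock of 0 MHz ends up without timestamping support.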

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index f1e7097..6776c25 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -1013,6 +1013,9 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev)
 #define QUERY_FW_COMM_BASE_OFFSET      0x40
 #define QUERY_FW_COMM_BAR_OFFSET       0x48
 
+#define QUERY_FW_CLOCK_OFFSET	       0x50
+#define QUERY_FW_CLOCK_BAR	       0x58
+
 	mailbox = mlx4_alloc_cmd_mailbox(dev);
 	if (IS_ERR(mailbox))
 		return PTR_ERR(mailbox);
@@ -1087,6 +1090,12 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev)
 		 fw->comm_bar, fw->comm_base);
 	mlx4_dbg(dev, "FW size %d KB\n", fw->fw_pages >> 2);
 
+	MLX4_GET(fw->clock_offset, outbox, QUERY_FW_CLOCK_OFFSET);
+	MLX4_GET(fw->clock_bar,    outbox, QUERY_FW_CLOCK_BAR);
+	fw->clock_bar = (fw->clock_bar >> 6) * 2;
+	mlx4_dbg(dev, "Internal clock bar:%d offset:0x%llx\n",
+		 fw->clock_bar, fw->clock_offset);
+
 	/*
 	 * Round up number of system pages needed in case
 	 * MLX4_ICM_PAGE_SIZE < PAGE_SIZE.
@@ -1374,6 +1383,7 @@ int mlx4_QUERY_HCA(struct mlx4_dev *dev,
 	u8 byte_field;
 
 #define QUERY_HCA_GLOBAL_CAPS_OFFSET	0x04
+#define QUERY_HCA_CORE_CLOCK_OFFSET	0x0c
 
 	mailbox = mlx4_alloc_cmd_mailbox(dev);
 	if (IS_ERR(mailbox))
@@ -1388,6 +1398,7 @@ int mlx4_QUERY_HCA(struct mlx4_dev *dev,
 		goto out;
 
 	MLX4_GET(param->global_caps, outbox, QUERY_HCA_GLOBAL_CAPS_OFFSET);
+	MLX4_GET(param->hca_core_clock, outbox, QUERY_HCA_CORE_CLOCK_OFFSET);
 
 	/* QPC/EEC/CQC/EQC/RDMARC attributes */
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.h b/drivers/net/ethernet/mellanox/mlx4/fw.h
index 151c2bb..fdf4166 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.h
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.h
@@ -162,6 +162,7 @@ struct mlx4_init_hca_param {
 	u64 global_caps;
 	u16 log_mc_entry_sz;
 	u16 log_mc_hash_sz;
+	u16 hca_core_clock; /* Internal Clock Frequency (in MHz) */
 	u8  log_num_qps;
 	u8  log_num_srqs;
 	u8  log_num_cqs;
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 16abde2..e81840f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -513,6 +513,8 @@ static int mlx4_slave_cap(struct mlx4_dev *dev)
 
 	mlx4_log_num_mgm_entry_size = hca_param.log_mc_entry_sz;
 
+	dev->caps.hca_core_clock = hca_param.hca_core_clock;
+
 	memset(&dev_cap, 0, sizeof(dev_cap));
 	dev->caps.max_qp_dest_rdma = 1 << hca_param.log_rd_per_qp;
 	err = mlx4_dev_cap(dev, &dev_cap);
@@ -1226,8 +1228,31 @@ static void unmap_bf_area(struct mlx4_dev *dev)
 		io_mapping_free(mlx4_priv(dev)->bf_mapping);
 }
 
+static int map_internal_clock(struct mlx4_dev *dev)
+{
+	struct mlx4_priv *priv = mlx4_priv(dev);
+
+	priv->clock_mapping =
+		ioremap(pci_resource_start(dev->pdev, priv->fw.clock_bar) +
+			priv->fw.clock_offset, MLX4_CLOCK_SIZE);
+
+	if (!priv->clock_mapping)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void unmap_internal_clock(struct mlx4_dev *dev)
+{
+	struct mlx4_priv *priv = mlx4_priv(dev);
+
+	if (priv->clock_mapping)
+		iounmap(priv->clock_mapping);
+}
+
 static void mlx4_close_hca(struct mlx4_dev *dev)
 {
+	unmap_internal_clock(dev);
 	unmap_bf_area(dev);
 	if (mlx4_is_slave(dev))
 		mlx4_slave_exit(dev);
@@ -1445,6 +1470,37 @@ static int mlx4_init_hca(struct mlx4_dev *dev)
 			mlx4_err(dev, "INIT_HCA command failed, aborting.\n");
 			goto err_free_icm;
 		}
+		/*
+		 * If TS is supported by FW
+		 * read HCA frequency by QUERY_HCA command
+		 */
+		if (dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_TS) {
+			memset(&init_hca, 0, sizeof(init_hca));
+			err = mlx4_QUERY_HCA(dev, &init_hca);
+			if (err) {
+				mlx4_err(dev, "QUERY_HCA command failed, disable timestamp.\n");
+				dev->caps.flags2 &= ~MLX4_DEV_CAP_FLAG2_TS;
+			} else {
+				dev->caps.hca_core_clock =
+					init_hca.hca_core_clock;
+			}
+
+			/* In case we got HCA frequency 0 - disable timestamping
+			 * to avoid dividing by zero
+			 */
+			if (!dev->caps.hca_core_clock) {
+				dev->caps.flags2 &= ~MLX4_DEV_CAP_FLAG2_TS;
+				mlx4_err(dev,
+					 "HCA frequency is 0. Timestamping is not supported.\n");
+			} else if (map_internal_clock(dev)) {
+				/*
+				 * Map internal clock,
+				 * in case of failure disable timestamping
+				 */
+				dev->caps.flags2 &= ~MLX4_DEV_CAP_FLAG2_TS;
+				mlx4_err(dev, "Failed to map internal clock. Timestamping is not supported.\n");
+			}
+		}
 	} else {
 		err = mlx4_init_slave(dev);
 		if (err) {
@@ -1478,6 +1534,7 @@ static int mlx4_init_hca(struct mlx4_dev *dev)
 	return 0;
 
 unmap_bf:
+	unmap_internal_clock(dev);
 	unmap_bf_area(dev);
 
 err_close:
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index 252f4ba..0567f01 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -87,7 +87,8 @@ enum {
 	MLX4_HCR_SIZE		= 0x0001c,
 	MLX4_CLR_INT_SIZE	= 0x00008,
 	MLX4_SLAVE_COMM_BASE	= 0x0,
-	MLX4_COMM_PAGESIZE	= 0x1000
+	MLX4_COMM_PAGESIZE	= 0x1000,
+	MLX4_CLOCK_SIZE		= 0x00008
 };
 
 enum {
@@ -403,6 +404,7 @@ struct mlx4_fw {
 	u64			clr_int_base;
 	u64			catas_offset;
 	u64			comm_base;
+	u64			clock_offset;
 	struct mlx4_icm	       *fw_icm;
 	struct mlx4_icm	       *aux_icm;
 	u32			catas_size;
@@ -410,6 +412,7 @@ struct mlx4_fw {
 	u8			clr_int_bar;
 	u8			catas_bar;
 	u8			comm_bar;
+	u8			clock_bar;
 };
 
 struct mlx4_comm {
@@ -826,6 +829,7 @@ struct mlx4_priv {
 	struct list_head	bf_list;
 	struct mutex		bf_mutex;
 	struct io_mapping	*bf_mapping;
+	void __iomem            *clock_mapping;
 	int			reserved_mtts;
 	int			fs_hash_mode;
 	u8 virt2phys_pkey[MLX4_MFUNC_MAX][MLX4_MAX_PORTS][MLX4_MAX_PORT_PKEYS];
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 86ae260..e088290 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -445,6 +445,7 @@ struct mlx4_caps {
 	u8			eqe_factor;
 	u32			userspace_caps; /* userspace must be aware of these */
 	u32			function_caps;  /* VFs must be aware of these */
+	u16			hca_core_clock;
 };
 
 struct mlx4_buf_list {
-- 
1.7.8.2


* [PATCH net-next V3 3/5] net/mlx4_en: Add HW timestamping (TS) support
  2013-04-23 16:06 [PATCH net-next V3 0/5] net/mlx4: HW timestamping support Amir Vadai
  2013-04-23 16:06 ` [PATCH net-next V3 1/5] net/mlx4_core: Add timestamping device capability Amir Vadai
  2013-04-23 16:06 ` [PATCH net-next V3 2/5] net/mlx4_core: Read HCA frequency and map internal clock Amir Vadai
@ 2013-04-23 16:06 ` Amir Vadai
  2013-04-25 19:26   ` Richard Cochran
  2013-04-23 16:06 ` [PATCH net-next V3 4/5] net/mlx4_en: Support software timestamping Amir Vadai
  2013-04-23 16:06 ` [PATCH net-next V3 5/5] net/mlx4_en: Add a service task Amir Vadai
  4 siblings, 1 reply; 13+ messages in thread
From: Amir Vadai @ 2013-04-23 16:06 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Richard Cochran, Or Gerlitz, Eugenia Emantayev, Amir Vadai

This patch allows enabling/disabling HW timestamping for incoming and/or
outgoing packets. It adds and initializes all structs and callbacks
needed by the kernel TS API.
To enable/disable HW timestamping, the SIOCSHWTSTAMP ioctl should be used.
Currently only HWTSTAMP_FILTER_ALL/NONE and HWTSTAMP_TX_ON/OFF are
supported.
When TS is enabled on the receive flow, VLAN stripping is disabled.
The RX/TX flows are changed to honor TS requests and to place HW
timestamps into the relevant structures.
mlx4_ib is updated to compile with the new mlx4_cq_alloc() signature.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 drivers/infiniband/hw/mlx4/cq.c                   |    2 +-
 drivers/net/ethernet/mellanox/mlx4/Makefile       |    2 +-
 drivers/net/ethernet/mellanox/mlx4/cq.c           |   10 +-
 drivers/net/ethernet/mellanox/mlx4/en_clock.c     |  132 +++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx4/en_cq.c        |   10 ++-
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c   |   30 +++++
 drivers/net/ethernet/mellanox/mlx4/en_main.c      |    5 +
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c    |   75 ++++++++++++
 drivers/net/ethernet/mellanox/mlx4/en_resources.c |    3 +
 drivers/net/ethernet/mellanox/mlx4/en_rx.c        |   29 ++++-
 drivers/net/ethernet/mellanox/mlx4/en_tx.c        |   29 ++++-
 drivers/net/ethernet/mellanox/mlx4/main.c         |   22 ++++
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h      |   21 +++-
 include/linux/mlx4/cq.h                           |   16 +++
 include/linux/mlx4/device.h                       |    6 +-
 15 files changed, 375 insertions(+), 17 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx4/en_clock.c

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index ae67df3..73b3a71 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -228,7 +228,7 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector
 		vector = dev->eq_table[vector % ibdev->num_comp_vectors];
 
 	err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, uar,
-			    cq->db.dma, &cq->mcq, vector, 0);
+			    cq->db.dma, &cq->mcq, vector, 0, 0);
 	if (err)
 		goto err_dbmap;
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/Makefile b/drivers/net/ethernet/mellanox/mlx4/Makefile
index 293127d..3e9c70f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx4/Makefile
@@ -6,5 +6,5 @@ mlx4_core-y :=	alloc.o catas.o cmd.o cq.o eq.o fw.o icm.o intf.o main.o mcg.o \
 obj-$(CONFIG_MLX4_EN)               += mlx4_en.o
 
 mlx4_en-y := 	en_main.o en_tx.o en_rx.o en_ethtool.o en_port.o en_cq.o \
-		en_resources.o en_netdev.o en_selftest.o
+		en_resources.o en_netdev.o en_selftest.o en_clock.o
 mlx4_en-$(CONFIG_MLX4_EN_DCB) += en_dcb_nl.o
diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c b/drivers/net/ethernet/mellanox/mlx4/cq.c
index 0706623..004e423 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cq.c
@@ -240,9 +240,10 @@ static void mlx4_cq_free_icm(struct mlx4_dev *dev, int cqn)
 		__mlx4_cq_free_icm(dev, cqn);
 }
 
-int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt,
-		  struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq,
-		  unsigned vector, int collapsed)
+int mlx4_cq_alloc(struct mlx4_dev *dev, int nent,
+		  struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec,
+		  struct mlx4_cq *cq, unsigned vector, int collapsed,
+		  int timestamp_en)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	struct mlx4_cq_table *cq_table = &priv->cq_table;
@@ -276,6 +277,9 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt,
 	memset(cq_context, 0, sizeof *cq_context);
 
 	cq_context->flags	    = cpu_to_be32(!!collapsed << 18);
+	if (timestamp_en)
+		cq_context->flags  |= cpu_to_be32(1 << 19);
+
 	cq_context->logsize_usrpage = cpu_to_be32((ilog2(nent) << 24) | uar->index);
 	cq_context->comp_eqn	    = priv->eq_table.eq[vector].eqn;
 	cq_context->log_page_size   = mtt->page_shift - MLX4_ICM_PAGE_SHIFT;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_clock.c b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
new file mode 100644
index 0000000..501c72f
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
@@ -0,0 +1,132 @@
+/*
+ * Copyright (c) 2012 Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#include <linux/mlx4/device.h>
+
+#include "mlx4_en.h"
+
+int mlx4_en_timestamp_config(struct net_device *dev, int tx_type, int rx_filter)
+{
+	struct mlx4_en_priv *priv = netdev_priv(dev);
+	struct mlx4_en_dev *mdev = priv->mdev;
+	int port_up = 0;
+	int err = 0;
+
+	mutex_lock(&mdev->state_lock);
+	if (priv->port_up) {
+		port_up = 1;
+		mlx4_en_stop_port(dev, 1);
+	}
+
+	mlx4_en_free_resources(priv);
+
+	en_warn(priv, "Changing Time Stamp configuration\n");
+
+	priv->hwtstamp_config.tx_type = tx_type;
+	priv->hwtstamp_config.rx_filter = rx_filter;
+
+	if (rx_filter != HWTSTAMP_FILTER_NONE)
+		dev->features &= ~NETIF_F_HW_VLAN_CTAG_RX;
+	else
+		dev->features |= NETIF_F_HW_VLAN_CTAG_RX;
+
+	err = mlx4_en_alloc_resources(priv);
+	if (err) {
+		en_err(priv, "Failed reallocating port resources\n");
+		goto out;
+	}
+	if (port_up) {
+		err = mlx4_en_start_port(dev);
+		if (err)
+			en_err(priv, "Failed starting port\n");
+	}
+
+out:
+	mutex_unlock(&mdev->state_lock);
+	netdev_features_change(dev);
+	return err;
+}
+
+/* mlx4_en_read_clock - read raw cycle counter (to be used by time counter)
+ */
+static cycle_t mlx4_en_read_clock(const struct cyclecounter *tc)
+{
+	struct mlx4_en_dev *mdev =
+		container_of(tc, struct mlx4_en_dev, cycles);
+	struct mlx4_dev *dev = mdev->dev;
+
+	return mlx4_read_clock(dev) & tc->mask;
+}
+
+u64 mlx4_en_get_cqe_ts(struct mlx4_cqe *cqe)
+{
+	u64 hi, lo;
+	struct mlx4_ts_cqe *ts_cqe = (struct mlx4_ts_cqe *)cqe;
+
+	lo = (u64)be16_to_cpu(ts_cqe->timestamp_lo);
+	hi = ((u64)be32_to_cpu(ts_cqe->timestamp_hi) + !lo) << 16;
+
+	return hi | lo;
+}
+
+void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
+			    struct skb_shared_hwtstamps *hwts,
+			    u64 timestamp)
+{
+	u64 nsec;
+
+	nsec = timecounter_cyc2time(&mdev->clock, timestamp);
+
+	memset(hwts, 0, sizeof(struct skb_shared_hwtstamps));
+	hwts->hwtstamp = ns_to_ktime(nsec);
+}
+
+void mlx4_en_init_timestamp(struct mlx4_en_dev *mdev)
+{
+	struct mlx4_dev *dev = mdev->dev;
+
+	memset(&mdev->cycles, 0, sizeof(mdev->cycles));
+	mdev->cycles.read = mlx4_en_read_clock;
+	mdev->cycles.mask = CLOCKSOURCE_MASK(48);
+	/* Using shift to make calculation more accurate. Since current HW
+	 * clock frequency is 427 MHz, and cycles are given using a 48 bits
+	 * register, the biggest shift when calculating using u64, is 14
+	 * (max_cycles * multiplier < 2^64)
+	 */
+	mdev->cycles.shift = 14;
+	mdev->cycles.mult =
+		clocksource_khz2mult(1000 * dev->caps.hca_core_clock, mdev->cycles.shift);
+
+	timecounter_init(&mdev->clock, &mdev->cycles,
+			 ktime_to_ns(ktime_get_real()));
+}
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
index b8d0854..1e6c594 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
@@ -77,6 +77,7 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
 	struct mlx4_en_dev *mdev = priv->mdev;
 	int err = 0;
 	char name[25];
+	int timestamp_en = 0;
 	struct cpu_rmap *rmap =
 #ifdef CONFIG_RFS_ACCEL
 		priv->dev->rx_cpu_rmap;
@@ -123,8 +124,13 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
 	if (!cq->is_tx)
 		cq->size = priv->rx_ring[cq->ring].actual_size;
 
-	err = mlx4_cq_alloc(mdev->dev, cq->size, &cq->wqres.mtt, &mdev->priv_uar,
-			    cq->wqres.db.dma, &cq->mcq, cq->vector, 0);
+	if ((cq->is_tx && priv->hwtstamp_config.tx_type) ||
+	    (!cq->is_tx && priv->hwtstamp_config.rx_filter))
+		timestamp_en = 1;
+
+	err = mlx4_cq_alloc(mdev->dev, cq->size, &cq->wqres.mtt,
+			    &mdev->priv_uar, cq->wqres.db.dma, &cq->mcq,
+			    cq->vector, 0, timestamp_en);
 	if (err)
 		return err;
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index 00f25b5..bcf4d11 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -1147,6 +1147,35 @@ out:
 	return err;
 }
 
+static int mlx4_en_get_ts_info(struct net_device *dev,
+			       struct ethtool_ts_info *info)
+{
+	struct mlx4_en_priv *priv = netdev_priv(dev);
+	struct mlx4_en_dev *mdev = priv->mdev;
+	int ret;
+
+	ret = ethtool_op_get_ts_info(dev, info);
+	if (ret)
+		return ret;
+
+	if (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_TS) {
+		info->so_timestamping |=
+			SOF_TIMESTAMPING_TX_HARDWARE |
+			SOF_TIMESTAMPING_RX_HARDWARE |
+			SOF_TIMESTAMPING_RAW_HARDWARE;
+
+		info->tx_types =
+			(1 << HWTSTAMP_TX_OFF) |
+			(1 << HWTSTAMP_TX_ON);
+
+		info->rx_filters =
+			(1 << HWTSTAMP_FILTER_NONE) |
+			(1 << HWTSTAMP_FILTER_ALL);
+	}
+
+	return ret;
+}
+
 const struct ethtool_ops mlx4_en_ethtool_ops = {
 	.get_drvinfo = mlx4_en_get_drvinfo,
 	.get_settings = mlx4_en_get_settings,
@@ -1173,6 +1202,7 @@ const struct ethtool_ops mlx4_en_ethtool_ops = {
 	.set_rxfh_indir = mlx4_en_set_rxfh_indir,
 	.get_channels = mlx4_en_get_channels,
 	.set_channels = mlx4_en_set_channels,
+	.get_ts_info = mlx4_en_get_ts_info,
 };
 
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_main.c b/drivers/net/ethernet/mellanox/mlx4/en_main.c
index fc27800..a5c9df07 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_main.c
@@ -300,6 +300,11 @@ static void *mlx4_en_add(struct mlx4_dev *dev)
 		if (mlx4_en_init_netdev(mdev, i, &mdev->profile.prof[i]))
 			mdev->pndev[i] = NULL;
 	}
+
+	/* Initialize time stamp mechanism */
+	if (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_TS)
+		mlx4_en_init_timestamp(mdev);
+
 	return mdev;
 
 err_mr:
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index e7e2784..4cb9f32 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1916,6 +1916,75 @@ static int mlx4_en_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
+static int mlx4_en_hwtstamp_ioctl(struct net_device *dev, struct ifreq *ifr)
+{
+	struct mlx4_en_priv *priv = netdev_priv(dev);
+	struct mlx4_en_dev *mdev = priv->mdev;
+	struct hwtstamp_config config;
+
+	if (copy_from_user(&config, ifr->ifr_data, sizeof(config)))
+		return -EFAULT;
+
+	/* reserved for future extensions */
+	if (config.flags)
+		return -EINVAL;
+
+	/* device doesn't support time stamping */
+	if (!(mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_TS))
+		return -EINVAL;
+
+	/* TX HW timestamp */
+	switch (config.tx_type) {
+	case HWTSTAMP_TX_OFF:
+	case HWTSTAMP_TX_ON:
+		break;
+	default:
+		return -ERANGE;
+	}
+
+	/* RX HW timestamp */
+	switch (config.rx_filter) {
+	case HWTSTAMP_FILTER_NONE:
+		break;
+	case HWTSTAMP_FILTER_ALL:
+	case HWTSTAMP_FILTER_SOME:
+	case HWTSTAMP_FILTER_PTP_V1_L4_EVENT:
+	case HWTSTAMP_FILTER_PTP_V1_L4_SYNC:
+	case HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ:
+	case HWTSTAMP_FILTER_PTP_V2_L4_EVENT:
+	case HWTSTAMP_FILTER_PTP_V2_L4_SYNC:
+	case HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ:
+	case HWTSTAMP_FILTER_PTP_V2_L2_EVENT:
+	case HWTSTAMP_FILTER_PTP_V2_L2_SYNC:
+	case HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ:
+	case HWTSTAMP_FILTER_PTP_V2_EVENT:
+	case HWTSTAMP_FILTER_PTP_V2_SYNC:
+	case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
+		config.rx_filter = HWTSTAMP_FILTER_ALL;
+		break;
+	default:
+		return -ERANGE;
+	}
+
+	if (mlx4_en_timestamp_config(dev, config.tx_type, config.rx_filter)) {
+		config.tx_type = HWTSTAMP_TX_OFF;
+		config.rx_filter = HWTSTAMP_FILTER_NONE;
+	}
+
+	return copy_to_user(ifr->ifr_data, &config,
+			    sizeof(config)) ? -EFAULT : 0;
+}
+
+static int mlx4_en_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
+{
+	switch (cmd) {
+	case SIOCSHWTSTAMP:
+		return mlx4_en_hwtstamp_ioctl(dev, ifr);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
 static int mlx4_en_set_features(struct net_device *netdev,
 		netdev_features_t features)
 {
@@ -1943,6 +2012,7 @@ static const struct net_device_ops mlx4_netdev_ops = {
 	.ndo_set_mac_address	= mlx4_en_set_mac,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_change_mtu		= mlx4_en_change_mtu,
+	.ndo_do_ioctl		= mlx4_en_ioctl,
 	.ndo_tx_timeout		= mlx4_en_tx_timeout,
 	.ndo_vlan_rx_add_vid	= mlx4_en_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid	= mlx4_en_vlan_rx_kill_vid,
@@ -2054,6 +2124,11 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 	spin_lock_init(&priv->filters_lock);
 #endif
 
+	/* Initialize time stamping config */
+	priv->hwtstamp_config.flags = 0;
+	priv->hwtstamp_config.tx_type = HWTSTAMP_TX_OFF;
+	priv->hwtstamp_config.rx_filter = HWTSTAMP_FILTER_NONE;
+
 	/* Allocate page for receive rings */
 	err = mlx4_alloc_hwq_res(mdev->dev, &priv->res,
 				MLX4_EN_PAGE_SIZE, MLX4_EN_PAGE_SIZE);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_resources.c b/drivers/net/ethernet/mellanox/mlx4/en_resources.c
index 10c24c7..91f2b2c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_resources.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_resources.c
@@ -42,6 +42,7 @@ void mlx4_en_fill_qp_context(struct mlx4_en_priv *priv, int size, int stride,
 			     int user_prio, struct mlx4_qp_context *context)
 {
 	struct mlx4_en_dev *mdev = priv->mdev;
+	struct net_device *dev = priv->dev;
 
 	memset(context, 0, sizeof *context);
 	context->flags = cpu_to_be32(7 << 16 | rss << MLX4_RSS_QPC_FLAG_OFFSET);
@@ -65,6 +66,8 @@ void mlx4_en_fill_qp_context(struct mlx4_en_priv *priv, int size, int stride,
 	context->cqn_send = cpu_to_be32(cqn);
 	context->cqn_recv = cpu_to_be32(cqn);
 	context->db_rec_addr = cpu_to_be64(priv->res.db.dma << 2);
+	if (!(dev->features & NETIF_F_HW_VLAN_CTAG_RX))
+		context->param3 |= cpu_to_be32(1 << 30);
 }
 
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 4006f88..02aee1e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -320,6 +320,8 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
 	}
 	ring->buf = ring->wqres.buf.direct.buf;
 
+	ring->hwtstamp_rx_filter = priv->hwtstamp_config.rx_filter;
+
 	return 0;
 
 err_hwq:
@@ -554,6 +556,7 @@ static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
 int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int budget)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
+	struct mlx4_en_dev *mdev = priv->mdev;
 	struct mlx4_cqe *cqe;
 	struct mlx4_en_rx_ring *ring = &priv->rx_ring[cq->ring];
 	struct mlx4_en_rx_alloc *frags;
@@ -565,6 +568,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 	int polled = 0;
 	int ip_summed;
 	int factor = priv->cqe_factor;
+	u64 timestamp;
 
 	if (!priv->port_up)
 		return 0;
@@ -669,8 +673,9 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 					gro_skb->data_len = length;
 					gro_skb->ip_summed = CHECKSUM_UNNECESSARY;
 
-					if (cqe->vlan_my_qpn &
-					    cpu_to_be32(MLX4_CQE_VLAN_PRESENT_MASK)) {
+					if ((cqe->vlan_my_qpn &
+					    cpu_to_be32(MLX4_CQE_VLAN_PRESENT_MASK)) &&
+					    (dev->features & NETIF_F_HW_VLAN_CTAG_RX)) {
 						u16 vid = be16_to_cpu(cqe->sl_vid);
 
 						__vlan_hwaccel_put_tag(gro_skb, htons(ETH_P_8021Q), vid);
@@ -680,8 +685,15 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 						gro_skb->rxhash = be32_to_cpu(cqe->immed_rss_invalid);
 
 					skb_record_rx_queue(gro_skb, cq->ring);
-					napi_gro_frags(&cq->napi);
 
+					if (ring->hwtstamp_rx_filter == HWTSTAMP_FILTER_ALL) {
+						timestamp = mlx4_en_get_cqe_ts(cqe);
+						mlx4_en_fill_hwtstamps(mdev,
+								       skb_hwtstamps(gro_skb),
+								       timestamp);
+					}
+
+					napi_gro_frags(&cq->napi);
 					goto next;
 				}
 
@@ -714,10 +726,17 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 		if (dev->features & NETIF_F_RXHASH)
 			skb->rxhash = be32_to_cpu(cqe->immed_rss_invalid);
 
-		if (be32_to_cpu(cqe->vlan_my_qpn) &
-		    MLX4_CQE_VLAN_PRESENT_MASK)
+		if ((be32_to_cpu(cqe->vlan_my_qpn) &
+		    MLX4_CQE_VLAN_PRESENT_MASK) &&
+		    (dev->features & NETIF_F_HW_VLAN_CTAG_RX))
 			__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), be16_to_cpu(cqe->sl_vid));
 
+		if (ring->hwtstamp_rx_filter == HWTSTAMP_FILTER_ALL) {
+			timestamp = mlx4_en_get_cqe_ts(cqe);
+			mlx4_en_fill_hwtstamps(mdev, skb_hwtstamps(skb),
+					       timestamp);
+		}
+
 		/* Push it up the stack */
 		netif_receive_skb(skb);
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 49308cc..b0a2d2b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -118,6 +118,8 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
 	} else
 		ring->bf_enabled = true;
 
+	ring->hwtstamp_tx_type = priv->hwtstamp_config.tx_type;
+
 	return 0;
 
 err_map:
@@ -192,8 +194,9 @@ void mlx4_en_deactivate_tx_ring(struct mlx4_en_priv *priv,
 
 static u32 mlx4_en_free_tx_desc(struct mlx4_en_priv *priv,
 				struct mlx4_en_tx_ring *ring,
-				int index, u8 owner)
+				int index, u8 owner, u64 timestamp)
 {
+	struct mlx4_en_dev *mdev = priv->mdev;
 	struct mlx4_en_tx_info *tx_info = &ring->tx_info[index];
 	struct mlx4_en_tx_desc *tx_desc = ring->buf + index * TXBB_SIZE;
 	struct mlx4_wqe_data_seg *data = (void *) tx_desc + tx_info->data_offset;
@@ -204,6 +207,12 @@ static u32 mlx4_en_free_tx_desc(struct mlx4_en_priv *priv,
 	int i;
 	__be32 *ptr = (__be32 *)tx_desc;
 	__be32 stamp = cpu_to_be32(STAMP_VAL | (!!owner << STAMP_SHIFT));
+	struct skb_shared_hwtstamps hwts;
+
+	if (timestamp) {
+		mlx4_en_fill_hwtstamps(mdev, &hwts, timestamp);
+		skb_tstamp_tx(skb, &hwts);
+	}
 
 	/* Optimize the common case when there are no wraparounds */
 	if (likely((void *) tx_desc + tx_info->nr_txbb * TXBB_SIZE <= end)) {
@@ -289,7 +298,7 @@ int mlx4_en_free_tx_buf(struct net_device *dev, struct mlx4_en_tx_ring *ring)
 	while (ring->cons != ring->prod) {
 		ring->last_nr_txbb = mlx4_en_free_tx_desc(priv, ring,
 						ring->cons & ring->size_mask,
-						!!(ring->cons & ring->size));
+						!!(ring->cons & ring->size), 0);
 		ring->cons += ring->last_nr_txbb;
 		cnt++;
 	}
@@ -318,6 +327,7 @@ static void mlx4_en_process_tx_cq(struct net_device *dev, struct mlx4_en_cq *cq)
 	u32 packets = 0;
 	u32 bytes = 0;
 	int factor = priv->cqe_factor;
+	u64 timestamp = 0;
 
 	if (!priv->port_up)
 		return;
@@ -341,11 +351,14 @@ static void mlx4_en_process_tx_cq(struct net_device *dev, struct mlx4_en_cq *cq)
 		do {
 			txbbs_skipped += ring->last_nr_txbb;
 			ring_index = (ring_index + ring->last_nr_txbb) & size_mask;
+			if (ring->tx_info[ring_index].ts_requested)
+				timestamp = mlx4_en_get_cqe_ts(cqe);
+
 			/* free next descriptor */
 			ring->last_nr_txbb = mlx4_en_free_tx_desc(
 					priv, ring, ring_index,
 					!!((ring->cons + txbbs_skipped) &
-							ring->size));
+					ring->size), timestamp);
 			packets++;
 			bytes += ring->tx_info[ring_index].nr_bytes;
 		} while (ring_index != new_index);
@@ -629,6 +642,16 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 	tx_info->skb = skb;
 	tx_info->nr_txbb = nr_txbb;
 
+	/*
+	 * For timestamping add flag to skb_shinfo and
+	 * set flag for further reference
+	 */
+	if (ring->hwtstamp_tx_type == HWTSTAMP_TX_ON &&
+	    skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) {
+		skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
+		tx_info->ts_requested = 1;
+	}
+
 	/* Prepare ctrl segement apart opcode+ownership, which depends on
 	 * whether LSO is used */
 	tx_desc->ctrl.vlan_tag = cpu_to_be16(vlan_tag);
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index e81840f..0d32a82 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1228,6 +1228,28 @@ static void unmap_bf_area(struct mlx4_dev *dev)
 		io_mapping_free(mlx4_priv(dev)->bf_mapping);
 }
 
+cycle_t mlx4_read_clock(struct mlx4_dev *dev)
+{
+	u32 clockhi, clocklo, clockhi1;
+	cycle_t cycles;
+	int i;
+	struct mlx4_priv *priv = mlx4_priv(dev);
+
+	for (i = 0; i < 10; i++) {
+		clockhi = swab32(readl(priv->clock_mapping));
+		clocklo = swab32(readl(priv->clock_mapping + 4));
+		clockhi1 = swab32(readl(priv->clock_mapping));
+		if (clockhi == clockhi1)
+			break;
+	}
+
+	cycles = (u64) clockhi << 32 | (u64) clocklo;
+
+	return cycles;
+}
+EXPORT_SYMBOL_GPL(mlx4_read_clock);
+
+
 static int map_internal_clock(struct mlx4_dev *dev)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index d4cb5d3..85b0754 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -40,6 +40,7 @@
 #include <linux/mutex.h>
 #include <linux/netdevice.h>
 #include <linux/if_vlan.h>
+#include <linux/net_tstamp.h>
 #ifdef CONFIG_MLX4_EN_DCB
 #include <linux/dcbnl.h>
 #endif
@@ -207,6 +208,7 @@ struct mlx4_en_tx_info {
 	u8 linear;
 	u8 data_offset;
 	u8 inl;
+	u8 ts_requested;
 };
 
 
@@ -262,6 +264,7 @@ struct mlx4_en_tx_ring {
 	struct mlx4_bf bf;
 	bool bf_enabled;
 	struct netdev_queue *tx_queue;
+	int hwtstamp_tx_type;
 };
 
 struct mlx4_en_rx_desc {
@@ -288,6 +291,7 @@ struct mlx4_en_rx_ring {
 	unsigned long packets;
 	unsigned long csum_ok;
 	unsigned long csum_none;
+	int hwtstamp_rx_filter;
 };
 
 struct mlx4_en_cq {
@@ -348,6 +352,9 @@ struct mlx4_en_dev {
 	u32                     priv_pdn;
 	spinlock_t              uar_lock;
 	u8			mac_removed[MLX4_MAX_PORTS + 1];
+	struct cyclecounter	cycles;
+	struct timecounter	clock;
+	unsigned long		last_overflow_check;
 };
 
 
@@ -525,6 +532,7 @@ struct mlx4_en_priv {
 	struct device *ddev;
 	int base_tx_qpn;
 	struct hlist_head mac_hash[MLX4_EN_MAC_HASH_SIZE];
+	struct hwtstamp_config hwtstamp_config;
 
 #ifdef CONFIG_MLX4_EN_DCB
 	struct ieee_ets ets;
@@ -639,7 +647,18 @@ void mlx4_en_ex_selftest(struct net_device *dev, u32 *flags, u64 *buf);
 u64 mlx4_en_mac_to_u64(u8 *addr);
 
 /*
- * Globals
+ * Functions for time stamping
+ */
+u64 mlx4_en_get_cqe_ts(struct mlx4_cqe *cqe);
+void mlx4_en_fill_hwtstamps(struct mlx4_en_dev *mdev,
+			    struct skb_shared_hwtstamps *hwts,
+			    u64 timestamp);
+void mlx4_en_init_timestamp(struct mlx4_en_dev *mdev);
+int mlx4_en_timestamp_config(struct net_device *dev,
+			     int tx_type,
+			     int rx_filter);
+
+/* Globals
  */
 extern const struct ethtool_ops mlx4_en_ethtool_ops;
 
diff --git a/include/linux/mlx4/cq.h b/include/linux/mlx4/cq.h
index 6f65b2c..98fa492 100644
--- a/include/linux/mlx4/cq.h
+++ b/include/linux/mlx4/cq.h
@@ -64,6 +64,22 @@ struct mlx4_err_cqe {
 	u8			owner_sr_opcode;
 };
 
+struct mlx4_ts_cqe {
+	__be32			vlan_my_qpn;
+	__be32			immed_rss_invalid;
+	__be32			g_mlpath_rqpn;
+	__be32			timestamp_hi;
+	__be16			status;
+	u8			ipv6_ext_mask;
+	u8			badfcs_enc;
+	__be32			byte_cnt;
+	__be16			wqe_index;
+	__be16			checksum;
+	u8			reserved;
+	__be16			timestamp_lo;
+	u8			owner_sr_opcode;
+} __packed;
+
 enum {
 	MLX4_CQE_VLAN_PRESENT_MASK	= 1 << 29,
 	MLX4_CQE_QPN_MASK		= 0xffffff,
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index e088290..2fbc146 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -40,6 +40,8 @@
 
 #include <linux/atomic.h>
 
+#include <linux/clocksource.h>
+
 #define MAX_MSIX_P_PORT		17
 #define MAX_MSIX		64
 #define MSIX_LEGACY_SZ		4
@@ -840,7 +842,7 @@ void mlx4_free_hwq_res(struct mlx4_dev *mdev, struct mlx4_hwq_resources *wqres,
 
 int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt,
 		  struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq,
-		  unsigned vector, int collapsed);
+		  unsigned vector, int collapsed, int timestamp_en);
 void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq);
 
 int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base);
@@ -1031,4 +1033,6 @@ int set_and_calc_slave_port_state(struct mlx4_dev *dev, int slave, u8 port, int
 void mlx4_put_slave_node_guid(struct mlx4_dev *dev, int slave, __be64 guid);
 __be64 mlx4_get_slave_node_guid(struct mlx4_dev *dev, int slave);
 
+cycle_t mlx4_read_clock(struct mlx4_dev *dev);
+
 #endif /* MLX4_DEVICE_H */
-- 
1.7.8.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next V3 4/5] net/mlx4_en: Support software timestamping
  2013-04-23 16:06 [PATCH net-next V3 0/5] net/mlx4: HW timestamping support Amir Vadai
                   ` (2 preceding siblings ...)
  2013-04-23 16:06 ` [PATCH net-next V3 3/5] net/mlx4_en: Add HW timestamping (TS) support Amir Vadai
@ 2013-04-23 16:06 ` Amir Vadai
  2013-04-23 16:06 ` [PATCH net-next V3 5/5] net/mlx4_en: Add a service task Amir Vadai
  4 siblings, 0 replies; 13+ messages in thread
From: Amir Vadai @ 2013-04-23 16:06 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Richard Cochran, Or Gerlitz, Eugenia Emantayev, Amir Vadai

Kernel software timestamping requires the driver to call skb_tx_timestamp()
just before handing the skb to the HW MAC layer. This patch adds that call.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
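The ordering that the commit message describes can be modelled in plain C.
This is only a sketch: struct model_pkt and the model_* functions are
hypothetical stand-ins, not kernel API, and the real skb_tx_timestamp()
does more work than the helper shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an skb plus its timestamping state. */
struct model_pkt {
	bool sw_ts_requested;  /* a software TX timestamp was asked for */
	bool sw_ts_taken;      /* set once the timestamp was recorded */
	bool stamped_first;    /* true if stamped before the HW handoff */
	bool handed_to_hw;     /* set once the doorbell was rung */
	uint64_t sw_ts_ns;
};

/* Models the timestamp hook: record the time if one was requested. */
static void model_tx_timestamp(struct model_pkt *pkt, uint64_t now_ns)
{
	if (!pkt->sw_ts_requested)
		return;
	pkt->sw_ts_ns = now_ns;
	pkt->sw_ts_taken = true;
	pkt->stamped_first = !pkt->handed_to_hw;
}

/* Models handing the descriptor to the hardware. */
static void model_ring_doorbell(struct model_pkt *pkt)
{
	pkt->handed_to_hw = true;
}

/* The descriptor is fully built first, then stamped, then handed over. */
void model_xmit(struct model_pkt *pkt, uint64_t now_ns)
{
	model_tx_timestamp(pkt, now_ns);
	model_ring_doorbell(pkt);
}
```

Taking the timestamp as the last step before the handoff keeps the recorded
time close to the moment the hardware takes ownership of the packet.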
 drivers/net/ethernet/mellanox/mlx4/en_tx.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index b0a2d2b..4e6877a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -752,6 +752,8 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (bounce)
 		tx_desc = mlx4_en_bounce_to_desc(priv, ring, index, desc_size);
 
+	skb_tx_timestamp(skb);
+
 	if (ring->bf_enabled && desc_size <= MAX_BF && !bounce && !vlan_tx_tag_present(skb)) {
 		*(__be32 *) (&tx_desc->ctrl.vlan_tag) |= cpu_to_be32(ring->doorbell_qpn);
 		op_own |= htonl((bf_index & 0xffff) << 8);
-- 
1.7.8.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next V3 5/5] net/mlx4_en: Add a service task
  2013-04-23 16:06 [PATCH net-next V3 0/5] net/mlx4: HW timestamping support Amir Vadai
                   ` (3 preceding siblings ...)
  2013-04-23 16:06 ` [PATCH net-next V3 4/5] net/mlx4_en: Support software timestamping Amir Vadai
@ 2013-04-23 16:06 ` Amir Vadai
  2013-04-26  0:13   ` Eric Dumazet
  4 siblings, 1 reply; 13+ messages in thread
From: Amir Vadai @ 2013-04-23 16:06 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Richard Cochran, Or Gerlitz, Eugenia Emantayev, Amir Vadai

Add a service task to run tasks that need to be executed periodically.
Currently the only task is a watchdog that catches NIC clock overflow, to keep
timestamping accurate.
The statistics task will be moved into this framework in a later patch.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
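As a worked example of the overflow_period arithmetic in this patch, the
sketch below repeats the same calculation in user-space C. The expression
yields jiffies: the half-wrap time in whole seconds, multiplied by HZ. The
48-bit mask, the 1 GHz clock (mult = 1 << 14 with shift = 14, so one cycle is
1 ns) and HZ = 250 are assumptions chosen for illustration, and the model_*
names are hypothetical.

```c
#include <stdint.h>

#define MODEL_NSEC_PER_SEC 1000000000ULL

/* Same form as the kernel's cyclecounter conversion: (cycles * mult) >> shift. */
static uint64_t model_cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}

/* Half of the counter wrap-around time, expressed in jiffies. */
uint64_t model_overflow_period(uint64_t mask, uint32_t mult, uint32_t shift,
			       uint64_t hz)
{
	uint64_t wrap_ns = model_cyc2ns(mask, mult, shift);

	/* Integer division truncates to whole seconds before scaling by hz. */
	return (wrap_ns / MODEL_NSEC_PER_SEC / 2) * hz;
}
```

With mask = 2^48 - 1, mult = 1 << 14, shift = 14 and hz = 250, the counter
wraps after 281474 whole seconds, half of that is 140737 seconds, and the
result is 35184250 jiffies.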
 drivers/net/ethernet/mellanox/mlx4/en_clock.c  |   19 +++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |   24 ++++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |    4 ++++
 3 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_clock.c b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
index 501c72f..2f18121 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
@@ -129,4 +129,23 @@ void mlx4_en_init_timestamp(struct mlx4_en_dev *mdev)
 
 	timecounter_init(&mdev->clock, &mdev->cycles,
 			 ktime_to_ns(ktime_get_real()));
+
+	/* Calculate period in seconds to call the overflow watchdog - to make
+	 * sure counter is checked at least once every wrap around.
+	 */
+	mdev->overflow_period =
+		(cyclecounter_cyc2ns(&mdev->cycles,
+				    mdev->cycles.mask) / NSEC_PER_SEC / 2)
+		* HZ;
+}
+
+void mlx4_en_ptp_overflow_check(struct mlx4_en_dev *mdev)
+{
+	bool timeout = time_is_before_jiffies(mdev->last_overflow_check +
+					      mdev->overflow_period);
+
+	if (timeout) {
+		timecounter_read(&mdev->clock);
+		mdev->last_overflow_check = jiffies;
+	}
 }
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 4cb9f32..f4f88b8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1361,6 +1361,26 @@ static void mlx4_en_do_get_stats(struct work_struct *work)
 	mutex_unlock(&mdev->state_lock);
 }
 
+/* mlx4_en_service_task - Run service task for tasks that needed to be done
+ * periodically
+ */
+static void mlx4_en_service_task(struct work_struct *work)
+{
+	struct delayed_work *delay = to_delayed_work(work);
+	struct mlx4_en_priv *priv = container_of(delay, struct mlx4_en_priv,
+						 service_task);
+	struct mlx4_en_dev *mdev = priv->mdev;
+
+	mutex_lock(&mdev->state_lock);
+	if (mdev->device_up) {
+		mlx4_en_ptp_overflow_check(mdev);
+
+		queue_delayed_work(mdev->workqueue, &priv->service_task,
+				   SERVICE_TASK_DELAY);
+	}
+	mutex_unlock(&mdev->state_lock);
+}
+
 static void mlx4_en_linkstate(struct work_struct *work)
 {
 	struct mlx4_en_priv *priv = container_of(work, struct mlx4_en_priv,
@@ -1865,6 +1885,7 @@ void mlx4_en_destroy_netdev(struct net_device *dev)
 		mlx4_free_hwq_res(mdev->dev, &priv->res, MLX4_EN_PAGE_SIZE);
 
 	cancel_delayed_work(&priv->stats_task);
+	cancel_delayed_work(&priv->service_task);
 	/* flush any pending task for this netdev */
 	flush_workqueue(mdev->workqueue);
 
@@ -2084,6 +2105,7 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 	INIT_WORK(&priv->watchdog_task, mlx4_en_restart);
 	INIT_WORK(&priv->linkstate_task, mlx4_en_linkstate);
 	INIT_DELAYED_WORK(&priv->stats_task, mlx4_en_do_get_stats);
+	INIT_DELAYED_WORK(&priv->service_task, mlx4_en_service_task);
 #ifdef CONFIG_MLX4_EN_DCB
 	if (!mlx4_is_slave(priv->mdev->dev)) {
 		if (mdev->dev->caps.flags & MLX4_DEV_CAP_FLAG_SET_ETH_SCHED) {
@@ -2206,6 +2228,8 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 	}
 	mlx4_en_set_default_moderation(priv);
 	queue_delayed_work(mdev->workqueue, &priv->stats_task, STATS_DELAY);
+	queue_delayed_work(mdev->workqueue, &priv->service_task,
+			   SERVICE_TASK_DELAY);
 	return 0;
 
 out:
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 85b0754..b1d7657 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -78,6 +78,7 @@
 #define STAMP_SHIFT		31
 #define STAMP_VAL		0x7fffffff
 #define STATS_DELAY		(HZ / 4)
+#define SERVICE_TASK_DELAY	(HZ / 4)
 #define MAX_NUM_OF_FS_RULES	256
 
 #define MLX4_EN_FILTER_HASH_SHIFT 4
@@ -355,6 +356,7 @@ struct mlx4_en_dev {
 	struct cyclecounter	cycles;
 	struct timecounter	clock;
 	unsigned long		last_overflow_check;
+	unsigned long		overflow_period;
 };
 
 
@@ -519,6 +521,7 @@ struct mlx4_en_priv {
 	struct work_struct watchdog_task;
 	struct work_struct linkstate_task;
 	struct delayed_work stats_task;
+	struct delayed_work service_task;
 	struct mlx4_en_perf_stats pstats;
 	struct mlx4_en_pkt_stats pkstats;
 	struct mlx4_en_port_stats port_stats;
@@ -645,6 +648,7 @@ void mlx4_en_cleanup_filters(struct mlx4_en_priv *priv,
 #define MLX4_EN_NUM_SELF_TEST	5
 void mlx4_en_ex_selftest(struct net_device *dev, u32 *flags, u64 *buf);
 u64 mlx4_en_mac_to_u64(u8 *addr);
+void mlx4_en_ptp_overflow_check(struct mlx4_en_dev *mdev);
 
 /*
  * Functions for time stamping
-- 
1.7.8.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next V3 3/5] net/mlx4_en: Add HW timestamping (TS) support
  2013-04-23 16:06 ` [PATCH net-next V3 3/5] net/mlx4_en: Add HW timestamping (TS) support Amir Vadai
@ 2013-04-25 19:26   ` Richard Cochran
  2013-04-28  6:33     ` Amir Vadai
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Cochran @ 2013-04-25 19:26 UTC (permalink / raw)
  To: Amir Vadai; +Cc: David S. Miller, netdev, Or Gerlitz, Eugenia Emantayev

On Tue, Apr 23, 2013 at 07:06:49PM +0300, Amir Vadai wrote:

> +u64 mlx4_en_get_cqe_ts(struct mlx4_cqe *cqe)
> +{
> +	u64 hi, lo;
> +	struct mlx4_ts_cqe *ts_cqe = (struct mlx4_ts_cqe *)cqe;
> +
> +	lo = (u64)be16_to_cpu(ts_cqe->timestamp_lo);
> +	hi = ((u64)be32_to_cpu(ts_cqe->timestamp_hi) + !lo) << 16;
                                                     ^^^^^
That looks a bit strange. Can you explain?

> +
> +	return hi | lo;
> +}

Thanks,
Richard

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next V3 5/5] net/mlx4_en: Add a service task
  2013-04-23 16:06 ` [PATCH net-next V3 5/5] net/mlx4_en: Add a service task Amir Vadai
@ 2013-04-26  0:13   ` Eric Dumazet
  2013-04-26  2:24     ` Or Gerlitz
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2013-04-26  0:13 UTC (permalink / raw)
  To: Amir Vadai
  Cc: David S. Miller, netdev, Richard Cochran, Or Gerlitz, Eugenia Emantayev

On Tue, 2013-04-23 at 19:06 +0300, Amir Vadai wrote:
> Add a service task to run tasks that needed to be executed periodically.
> Currently the only task is a watchdog to catch NIC clock overflow, to make
> timestamping accurate.
> Will move the statistics task into this framework in a later patch.
> 
> Signed-off-by: Amir Vadai <amirv@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlx4/en_clock.c  |   19 +++++++++++++++++++
>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c |   24 ++++++++++++++++++++++++
>  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |    4 ++++
>  3 files changed, 47 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_clock.c b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
> index 501c72f..2f18121 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_clock.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
> @@ -129,4 +129,23 @@ void mlx4_en_init_timestamp(struct mlx4_en_dev *mdev)
>  
>  	timecounter_init(&mdev->clock, &mdev->cycles,
>  			 ktime_to_ns(ktime_get_real()));
> +
> +	/* Calculate period in seconds to call the overflow watchdog - to make
> +	 * sure counter is checked at least once every wrap around.
> +	 */
> +	mdev->overflow_period =
> +		(cyclecounter_cyc2ns(&mdev->cycles,
> +				    mdev->cycles.mask) / NSEC_PER_SEC / 2)
> +		* HZ;
> +}
> +
> +void mlx4_en_ptp_overflow_check(struct mlx4_en_dev *mdev)
> +{
> +	bool timeout = time_is_before_jiffies(mdev->last_overflow_check +
> +					      mdev->overflow_period);
> +
> +	if (timeout) {
> +		timecounter_read(&mdev->clock);
> +		mdev->last_overflow_check = jiffies;
> +	}
>  }
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 4cb9f32..f4f88b8 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -1361,6 +1361,26 @@ static void mlx4_en_do_get_stats(struct work_struct *work)
>  	mutex_unlock(&mdev->state_lock);
>  }
>  
> +/* mlx4_en_service_task - Run service task for tasks that needed to be done
> + * periodically
> + */
> +static void mlx4_en_service_task(struct work_struct *work)
> +{
> +	struct delayed_work *delay = to_delayed_work(work);
> +	struct mlx4_en_priv *priv = container_of(delay, struct mlx4_en_priv,
> +						 service_task);
> +	struct mlx4_en_dev *mdev = priv->mdev;
> +
> +	mutex_lock(&mdev->state_lock);
> +	if (mdev->device_up) {
> +		mlx4_en_ptp_overflow_check(mdev);
> +
> +		queue_delayed_work(mdev->workqueue, &priv->service_task,
> +				   SERVICE_TASK_DELAY);
> +	}
> +	mutex_unlock(&mdev->state_lock);
> +}
> +

What if mlx4_en_init_timestamp() was not called ?

if (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_TS)
	mlx4_en_init_timestamp(mdev);

Answer : a NULL deref

[   67.414454] BUG: unable to handle kernel NULL pointer dereference at           (null)
[   67.422313] IP: [<ffffffff810c9db7>] timecounter_read+0x17/0x50
[   67.428255] PGD c5faa3067 PUD c6002f067 PMD 0 
[   67.432747] Oops: 0000 [#1] SMP 
[   67.436007] Modules linked in: msr cpuid genrtc mlx4_en ib_uverbs mlx4_ib ib_sa ib_mad ib_core mlx4_core libcrc32c mdio ipv6
[   67.448615] CPU 0 
[   67.450457] Pid: 1321, comm: kworker/u:7 Not tainted 3.9.0-smp-DEV #328 Intel TBG,ICH10/I
[   67.459770] RIP: 0010:[<ffffffff810c9db7>]  [<ffffffff810c9db7>] timecounter_read+0x17/0x50
[   67.468130] RSP: 0018:ffff880660635d88  EFLAGS: 00010296
[   67.473438] RAX: 0000000000000000 RBX: ffff8806604ce368 RCX: ffff88064f8e1d58
[   67.480568] RDX: 00000000fffc72a1 RSI: ffffffff81b18510 RDI: 0000000000000000
[   67.487695] RBP: ffff880660635d98 R08: eac0000000000000 R09: dfe649cdda2e1d58
[   67.494824] R10: dfe649cdda2e1d58 R11: 0000000000000000 R12: ffff88064f8e1d58
[   67.501953] R13: ffff8806604ce200 R14: 0000000000000000 R15: ffff8806604ce005
[   67.509085] FS:  0000000000000000(0000) GS:ffff88067fc00000(0000) knlGS:0000000000000000
[   67.517168] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   67.522910] CR2: 0000000000000000 CR3: 0000000c6074d000 CR4: 00000000000007f0
[   67.530037] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   67.537166] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   67.544298] Process kworker/u:7 (pid: 1321, threadinfo ffff880660634000, task ffff880660d4d820)
[   67.552986] Stack:
[   67.554997]  ffff88067fc127c0 ffff8806604ce200 ffff880660635db8 ffffffffa01e1274
[   67.562457]  ffff8806604ce210 ffff8806604ce210 ffff880660635de8 ffffffffa01dc78e
[   67.569917]  ffffffff81b18500 ffff880c612ec540 ffffffff81b18500 ffff8806604ce000
[   67.577377] Call Trace:
[   67.579826]  [<ffffffffa01e1274>] mlx4_en_ptp_overflow_check+0x44/0x60 [mlx4_en]
[   67.587223]  [<ffffffffa01dc78e>] mlx4_en_service_task+0x3e/0x70 [mlx4_en]
[   67.594099]  [<ffffffff810a1df5>] process_one_work+0x175/0x3f0
[   67.599926]  [<ffffffff810a3208>] worker_thread+0x118/0x370
[   67.605494]  [<ffffffff810a30f0>] ? manage_workers+0x390/0x390
[   67.611325]  [<ffffffff810a91d0>] kthread+0xc0/0xd0
[   67.616199]  [<ffffffff810a9110>] ? flush_kthread_worker+0x80/0x80
[   67.622374]  [<ffffffff815c699c>] ret_from_fork+0x7c/0xb0
[   67.627768]  [<ffffffff810a9110>] ? flush_kthread_worker+0x80/0x80
[   67.633941] Code: 8b 65 f8 48 8b 5d f0 c9 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 07 48 89 c7 <ff> 10 48 8b 0b 48 89 c2 48 2b 53 08 48 23 51 08 8b 71 10 8b 49 
[   67.653910] RIP  [<ffffffff810c9db7>] timecounter_read+0x17/0x50
[   67.659931]  RSP <ffff880660635d88>
[   67.663418] CR2: 0000000000000000
[   67.666757] ---[ end trace 5a2c9af6569fc2bb ]---
[   67.671388] Kernel panic - not syncing: Fatal exception

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next V3 5/5] net/mlx4_en: Add a service task
  2013-04-26  0:13   ` Eric Dumazet
@ 2013-04-26  2:24     ` Or Gerlitz
  2013-04-26  2:39       ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Or Gerlitz @ 2013-04-26  2:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Amir Vadai, David S. Miller, netdev, Richard Cochran, Or Gerlitz,
	Eugenia Emantayev

On Fri, Apr 26, 2013 at 3:13 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2013-04-23 at 19:06 +0300, Amir Vadai wrote:
> > Add a service task to run tasks that needed to be executed periodically.
> > Currently the only task is a watchdog to catch NIC clock overflow, to
> > make timestamping accurate.
> > Will move the statistics task into this framework in a later patch.
> >
> > Signed-off-by: Amir Vadai <amirv@mellanox.com>
> > ---
> >  drivers/net/ethernet/mellanox/mlx4/en_clock.c  |   19
> > +++++++++++++++++++
> >  drivers/net/ethernet/mellanox/mlx4/en_netdev.c |   24
> > ++++++++++++++++++++++++
> >  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |    4 ++++
> >  3 files changed, 47 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_clock.c
> > b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
> > index 501c72f..2f18121 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/en_clock.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
> > @@ -129,4 +129,23 @@ void mlx4_en_init_timestamp(struct mlx4_en_dev
> > *mdev)
> >
> >       timecounter_init(&mdev->clock, &mdev->cycles,
> >                        ktime_to_ns(ktime_get_real()));
> > +
> > +     /* Calculate period in seconds to call the overflow watchdog - to
> > make
> > +      * sure counter is checked at least once every wrap around.
> > +      */
> > +     mdev->overflow_period =
> > +             (cyclecounter_cyc2ns(&mdev->cycles,
> > +                                 mdev->cycles.mask) / NSEC_PER_SEC / 2)
> > +             * HZ;
> > +}
> > +
> > +void mlx4_en_ptp_overflow_check(struct mlx4_en_dev *mdev)
> > +{
> > +     bool timeout = time_is_before_jiffies(mdev->last_overflow_check +
> > +                                           mdev->overflow_period);
> > +
> > +     if (timeout) {
> > +             timecounter_read(&mdev->clock);
> > +             mdev->last_overflow_check = jiffies;
> > +     }
> >  }
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> > b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> > index 4cb9f32..f4f88b8 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> > @@ -1361,6 +1361,26 @@ static void mlx4_en_do_get_stats(struct
> > work_struct *work)
> >       mutex_unlock(&mdev->state_lock);
> >  }
> >
> > +/* mlx4_en_service_task - Run service task for tasks that needed to be
> > done
> > + * periodically
> > + */
> > +static void mlx4_en_service_task(struct work_struct *work)
> > +{
> > +     struct delayed_work *delay = to_delayed_work(work);
> > +     struct mlx4_en_priv *priv = container_of(delay, struct
> > mlx4_en_priv,
> > +                                              service_task);
> > +     struct mlx4_en_dev *mdev = priv->mdev;
> > +
> > +     mutex_lock(&mdev->state_lock);
> > +     if (mdev->device_up) {
> > +             mlx4_en_ptp_overflow_check(mdev);
> > +
> > +             queue_delayed_work(mdev->workqueue, &priv->service_task,
> > +                                SERVICE_TASK_DELAY);
> > +     }
> > +     mutex_unlock(&mdev->state_lock);
> > +}
> > +
>
> What if mlx4_en_init_timestamp() was not called ?
>
> if (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_TS)
>         mlx4_en_init_timestamp(mdev);
>
> Answer : a NULL deref

Hi Eric,

This is fixed in "[PATCH V1 net-next 2/8] net/mlx4_en: Disable HW
clock overflow check when no HW support"
http://marc.info/?l=linux-netdev&m=136690338314540&w=2
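
The referenced patch is not reproduced in this thread, so the following is
not its code. It is only a minimal user-space sketch of the kind of guard
that avoids the crash: timecounter_read() calls through tc->cc->read, so a
timecounter that was never initialized has a NULL cc pointer. All model_*
names, including MODEL_CAP_FLAG2_TS, are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_CAP_FLAG2_TS (1u << 0) /* stand-in for the TS capability bit */

struct model_cyclecounter {
	uint64_t (*read)(const struct model_cyclecounter *cc);
};

struct model_timecounter {
	const struct model_cyclecounter *cc; /* NULL until initialized */
	uint64_t nsec;
};

struct model_dev {
	uint32_t caps_flags2;
	struct model_timecounter clock;
	unsigned int checks_run;
};

/* Example reader used to exercise the model; returns a fixed value. */
uint64_t model_fixed_read(const struct model_cyclecounter *cc)
{
	(void)cc;
	return 42;
}

/* Like the real helper, this dereferences tc->cc unconditionally. */
static uint64_t model_timecounter_read(struct model_timecounter *tc)
{
	return tc->cc->read(tc->cc);
}

/* Run the overflow check only when the device has timestamping support. */
bool model_service_overflow_check(struct model_dev *mdev)
{
	if (!(mdev->caps_flags2 & MODEL_CAP_FLAG2_TS))
		return false; /* clock was never initialized, skip it */

	mdev->clock.nsec = model_timecounter_read(&mdev->clock);
	mdev->checks_run++;
	return true;
}
```

The guard tests the same capability bit that gates the initialization, so
the check runs only on devices whose timecounter was set up.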

Or.


>
> [   67.414454] BUG: unable to handle kernel NULL pointer dereference at
> (null)
> [   67.422313] IP: [<ffffffff810c9db7>] timecounter_read+0x17/0x50
> [   67.428255] PGD c5faa3067 PUD c6002f067 PMD 0
> [   67.432747] Oops: 0000 [#1] SMP
> [   67.436007] Modules linked in: msr cpuid genrtc mlx4_en ib_uverbs
> mlx4_ib ib_sa ib_mad ib_core mlx4_core libcrc32c mdio ipv6
> [   67.448615] CPU 0
> [   67.450457] Pid: 1321, comm: kworker/u:7 Not tainted 3.9.0-smp-DEV #328
> Intel TBG,ICH10/I
> [   67.459770] RIP: 0010:[<ffffffff810c9db7>]  [<ffffffff810c9db7>]
> timecounter_read+0x17/0x50
> [   67.468130] RSP: 0018:ffff880660635d88  EFLAGS: 00010296
> [   67.473438] RAX: 0000000000000000 RBX: ffff8806604ce368 RCX:
> ffff88064f8e1d58
> [   67.480568] RDX: 00000000fffc72a1 RSI: ffffffff81b18510 RDI:
> 0000000000000000
> [   67.487695] RBP: ffff880660635d98 R08: eac0000000000000 R09:
> dfe649cdda2e1d58
> [   67.494824] R10: dfe649cdda2e1d58 R11: 0000000000000000 R12:
> ffff88064f8e1d58
> [   67.501953] R13: ffff8806604ce200 R14: 0000000000000000 R15:
> ffff8806604ce005
> [   67.509085] FS:  0000000000000000(0000) GS:ffff88067fc00000(0000)
> knlGS:0000000000000000
> [   67.517168] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [   67.522910] CR2: 0000000000000000 CR3: 0000000c6074d000 CR4:
> 00000000000007f0
> [   67.530037] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [   67.537166] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [   67.544298] Process kworker/u:7 (pid: 1321, threadinfo
> ffff880660634000, task ffff880660d4d820)
> [   67.552986] Stack:
> [   67.554997]  ffff88067fc127c0 ffff8806604ce200 ffff880660635db8
> ffffffffa01e1274
> [   67.562457]  ffff8806604ce210 ffff8806604ce210 ffff880660635de8
> ffffffffa01dc78e
> [   67.569917]  ffffffff81b18500 ffff880c612ec540 ffffffff81b18500
> ffff8806604ce000
> [   67.577377] Call Trace:
> [   67.579826]  [<ffffffffa01e1274>] mlx4_en_ptp_overflow_check+0x44/0x60
> [mlx4_en]
> [   67.587223]  [<ffffffffa01dc78e>] mlx4_en_service_task+0x3e/0x70
> [mlx4_en]
> [   67.594099]  [<ffffffff810a1df5>] process_one_work+0x175/0x3f0
> [   67.599926]  [<ffffffff810a3208>] worker_thread+0x118/0x370
> [   67.605494]  [<ffffffff810a30f0>] ? manage_workers+0x390/0x390
> [   67.611325]  [<ffffffff810a91d0>] kthread+0xc0/0xd0
> [   67.616199]  [<ffffffff810a9110>] ? flush_kthread_worker+0x80/0x80
> [   67.622374]  [<ffffffff815c699c>] ret_from_fork+0x7c/0xb0
> [   67.627768]  [<ffffffff810a9110>] ? flush_kthread_worker+0x80/0x80
> [   67.633941] Code: 8b 65 f8 48 8b 5d f0 c9 c3 66 66 2e 0f 1f 84 00 00 00
> 00 00 66 66 66 66 90 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 07 48 89 c7
> <ff> 10 48 8b 0b 48 89 c2 48 2b 53 08 48 23 51 08 8b 71 10 8b 49
> [   67.653910] RIP  [<ffffffff810c9db7>] timecounter_read+0x17/0x50
> [   67.659931]  RSP <ffff880660635d88>
> [   67.663418] CR2: 0000000000000000
> [   67.666757] ---[ end trace 5a2c9af6569fc2bb ]---
> [   67.671388] Kernel panic - not syncing: Fatal exception

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next V3 5/5] net/mlx4_en: Add a service task
  2013-04-26  2:24     ` Or Gerlitz
@ 2013-04-26  2:39       ` Eric Dumazet
  0 siblings, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2013-04-26  2:39 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Amir Vadai, David S. Miller, netdev, Richard Cochran, Or Gerlitz,
	Eugenia Emantayev

On Fri, 2013-04-26 at 05:24 +0300, Or Gerlitz wrote:

> Hi Eric,
> 
> This is fixed in "[PATCH V1 net-next 2/8] net/mlx4_en: Disable HW
> clock overflow check when no HW support"
> http://marc.info/?l=linux-netdev&m=136690338314540&w=2
> 

OK thanks, I missed it because the patch series had the following label:

"[PATCH V1 net-next 0/8] net/mlx4: Add support to SRIOV VF management
ndo calls"

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next V3 3/5] net/mlx4_en: Add HW timestamping (TS) support
  2013-04-25 19:26   ` Richard Cochran
@ 2013-04-28  6:33     ` Amir Vadai
  2013-04-28  7:46       ` Richard Cochran
  0 siblings, 1 reply; 13+ messages in thread
From: Amir Vadai @ 2013-04-28  6:33 UTC (permalink / raw)
  To: Richard Cochran; +Cc: David S. Miller, netdev, Or Gerlitz, Eugenia Emantayev

On 25/04/2013 22:26, Richard Cochran wrote:
> On Tue, Apr 23, 2013 at 07:06:49PM +0300, Amir Vadai wrote:
> 
>> +u64 mlx4_en_get_cqe_ts(struct mlx4_cqe *cqe)
>> +{
>> +	u64 hi, lo;
>> +	struct mlx4_ts_cqe *ts_cqe = (struct mlx4_ts_cqe *)cqe;
>> +
>> +	lo = (u64)be16_to_cpu(ts_cqe->timestamp_lo);
>> +	hi = ((u64)be32_to_cpu(ts_cqe->timestamp_hi) + !lo) << 16;
>                                                      ^^^^^
> That looks a bit strange. Can you explain?

We need to increment the high order 32bit by one when the low order
16bit value is zero, due to HW limitation.

> 
>> +
>> +	return hi | lo;
>> +}
> 
> Thanks,
> Richard
> 

Amir.

* Re: [PATCH net-next V3 3/5] net/mlx4_en: Add HW timestamping (TS) support
  2013-04-28  6:33     ` Amir Vadai
@ 2013-04-28  7:46       ` Richard Cochran
  2013-04-28  8:00         ` Amir Vadai
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Cochran @ 2013-04-28  7:46 UTC (permalink / raw)
  To: Amir Vadai; +Cc: David S. Miller, netdev, Or Gerlitz, Eugenia Emantayev

On Sun, Apr 28, 2013 at 09:33:12AM +0300, Amir Vadai wrote:
> On 25/04/2013 22:26, Richard Cochran wrote:
> > On Tue, Apr 23, 2013 at 07:06:49PM +0300, Amir Vadai wrote:
> > 
> >> +u64 mlx4_en_get_cqe_ts(struct mlx4_cqe *cqe)
> >> +{
> >> +	u64 hi, lo;
> >> +	struct mlx4_ts_cqe *ts_cqe = (struct mlx4_ts_cqe *)cqe;
> >> +
> >> +	lo = (u64)be16_to_cpu(ts_cqe->timestamp_lo);
> >> +	hi = ((u64)be32_to_cpu(ts_cqe->timestamp_hi) + !lo) << 16;
> >                                                      ^^^^^
> > That looks a bit strange. Can you explain?
> 
> We need to increment the high order 32bit by one when the low order
> 16bit value is zero, due to HW limitation.

So 'hi' increases by one, when 'lo' goes from 0x0000 to 0x0001?

Thanks,
Richard

* Re: [PATCH net-next V3 3/5] net/mlx4_en: Add HW timestamping (TS) support
  2013-04-28  7:46       ` Richard Cochran
@ 2013-04-28  8:00         ` Amir Vadai
  0 siblings, 0 replies; 13+ messages in thread
From: Amir Vadai @ 2013-04-28  8:00 UTC (permalink / raw)
  To: Richard Cochran; +Cc: David S. Miller, netdev, Or Gerlitz, Eugenia Emantayev

On 28/04/2013 10:46, Richard Cochran wrote:
> On Sun, Apr 28, 2013 at 09:33:12AM +0300, Amir Vadai wrote:
>> On 25/04/2013 22:26, Richard Cochran wrote:
>>> On Tue, Apr 23, 2013 at 07:06:49PM +0300, Amir Vadai wrote:
>>>
>>>> +u64 mlx4_en_get_cqe_ts(struct mlx4_cqe *cqe)
>>>> +{
>>>> +	u64 hi, lo;
>>>> +	struct mlx4_ts_cqe *ts_cqe = (struct mlx4_ts_cqe *)cqe;
>>>> +
>>>> +	lo = (u64)be16_to_cpu(ts_cqe->timestamp_lo);
>>>> +	hi = ((u64)be32_to_cpu(ts_cqe->timestamp_hi) + !lo) << 16;
>>>                                                      ^^^^^
>>> That looks a bit strange. Can you explain?
>>
>> We need to increment the high order 32bit by one when the low order
>> 16bit value is zero, due to HW limitation.
> 
> So 'hi' increases by one, when 'lo' goes from 0x0000 to 0x0001?

Actually, 'hi' is supposed to increase by one when 'lo' wraps from
0xffff to 0x0000, but the HW increments it only when 'lo' goes from
0x0000 to 0x0001.
So the SW fills this gap and adds one to 'hi' whenever 'lo' reads
0x0000; on the next tick the HW increments 'hi' itself.
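
For illustration only, here is a minimal userspace sketch of the
carry workaround discussed above. It is not the driver code: the struct
and function names are hypothetical, and the fields are assumed to be
already in host byte order (the patch itself uses be32_to_cpu() and
be16_to_cpu() on the raw CQE fields).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the timestamp fields of struct mlx4_ts_cqe,
 * assumed to be already converted to host byte order. */
struct ts_fields {
	uint32_t hi;	/* high-order 32 bits as reported by HW */
	uint16_t lo;	/* low-order 16 bits as reported by HW */
};

/* Combine the two fields into one value. When lo == 0 the HW has not
 * yet carried into hi, so the missing carry is added in software. */
static uint64_t combine_cqe_ts(struct ts_fields f)
{
	uint64_t lo = f.lo;
	uint64_t hi = ((uint64_t)f.hi + !lo) << 16;

	return hi | lo;
}

/* Walk the counter across a wrap of lo, as described in the thread. */
static void combine_cqe_ts_selfcheck(void)
{
	/* lo = 0xffff, hi = 5: no correction needed */
	assert(combine_cqe_ts((struct ts_fields){ 5, 0xffff }) == 0x5ffffULL);
	/* lo wrapped to 0 but HW still reports hi = 5: SW adds the carry */
	assert(combine_cqe_ts((struct ts_fields){ 5, 0x0000 }) == 0x60000ULL);
	/* lo = 1: HW has now incremented hi to 6 by itself */
	assert(combine_cqe_ts((struct ts_fields){ 6, 0x0001 }) == 0x60001ULL);
}
```

With the correction in place, the three consecutive readings above come
out strictly increasing; without the "+ !lo" term the middle one would
be 0x50000 and appear to jump backwards.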

> 
> Thanks,
> Richard
> 

Amir

end of thread, other threads:[~2013-04-28  8:00 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
2013-04-23 16:06 [PATCH net-next V3 0/5] net/mlx4: HW timestamping support Amir Vadai
2013-04-23 16:06 ` [PATCH net-next V3 1/5] net/mlx4_core: Add timestamping device capability Amir Vadai
2013-04-23 16:06 ` [PATCH net-next V3 2/5] net/mlx4_core: Read HCA frequency and map internal clock Amir Vadai
2013-04-23 16:06 ` [PATCH net-next V3 3/5] net/mlx4_en: Add HW timestamping (TS) support Amir Vadai
2013-04-25 19:26   ` Richard Cochran
2013-04-28  6:33     ` Amir Vadai
2013-04-28  7:46       ` Richard Cochran
2013-04-28  8:00         ` Amir Vadai
2013-04-23 16:06 ` [PATCH net-next V3 4/5] net/mlx4_en: Support software timestamping Amir Vadai
2013-04-23 16:06 ` [PATCH net-next V3 5/5] net/mlx4_en: Add a service task Amir Vadai
2013-04-26  0:13   ` Eric Dumazet
2013-04-26  2:24     ` Or Gerlitz
2013-04-26  2:39       ` Eric Dumazet
