netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch net-next 0/4] Introduce Mellanox Technologies Switch ASICs switchdev drivers
@ 2015-07-23 15:43 Jiri Pirko
  2015-07-23 15:43 ` [patch net-next 1/4] mlxsw: Introduce Mellanox switch driver core Jiri Pirko
                   ` (6 more replies)
  0 siblings, 7 replies; 29+ messages in thread
From: Jiri Pirko @ 2015-07-23 15:43 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, ogerlitz, sfeldma, roopa, f.fainelli,
	tgraf, ast, jhs, daniel, john.fastabend, simon.horman, linville,
	andy, shm, nhorman

This patchset introduces Mellanox Technologies Switch driver infrastructure
and support for SwitchX-2 ASIC.

The driver is divided into 3 logical parts:
1) Bus - implements switch bus interface. Currently only PCI bus is
   implemented, but more buses will be added in the future. Namely I2C
   and SGMII.
   (patch #2)
2) Driver - implemements of ASIC-specific functions.
   Currently SwitchX-2 ASIC is supported, but a plan exists to introduce
   support for Spectrum ASIC in the near future.
   (patch #4)
3) Core - infrastructure that glues buses and drivers together.
   It implements register access logic (EMADs) and takes care of RX traps
   and events.
   (patch #1 and #3)

Ido Schimmel (1):
  mlxsw: Add interface to access registers and process events

Jiri Pirko (3):
  mlxsw: Introduce Mellanox switch driver core
  mlxsw: Add PCI bus implementation
  mlxsw: Introduce Mellanox SwitchX-2 ASIC support

 MAINTAINERS                                    |    9 +
 drivers/net/ethernet/mellanox/Kconfig          |    1 +
 drivers/net/ethernet/mellanox/Makefile         |    1 +
 drivers/net/ethernet/mellanox/mlxsw/Kconfig    |   32 +
 drivers/net/ethernet/mellanox/mlxsw/Makefile   |    6 +
 drivers/net/ethernet/mellanox/mlxsw/cmd.h      | 1090 +++++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/core.c     | 1287 +++++++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/core.h     |  203 +++
 drivers/net/ethernet/mellanox/mlxsw/emad.h     |  127 ++
 drivers/net/ethernet/mellanox/mlxsw/item.h     |  405 ++++++
 drivers/net/ethernet/mellanox/mlxsw/pci.c      | 1789 ++++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/pci.h      |  221 +++
 drivers/net/ethernet/mellanox/mlxsw/port.h     |   73 +
 drivers/net/ethernet/mellanox/mlxsw/reg.h      | 1289 +++++++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/switchx2.c | 1541 ++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/trap.h     |   68 +
 drivers/net/ethernet/mellanox/mlxsw/txheader.h |   80 ++
 17 files changed, 8222 insertions(+)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/Kconfig
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/Makefile
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/cmd.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/core.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/core.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/emad.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/item.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/pci.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/pci.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/port.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/reg.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/switchx2.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/trap.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/txheader.h

-- 
1.9.3

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [patch net-next 1/4] mlxsw: Introduce Mellanox switch driver core
  2015-07-23 15:43 [patch net-next 0/4] Introduce Mellanox Technologies Switch ASICs switchdev drivers Jiri Pirko
@ 2015-07-23 15:43 ` Jiri Pirko
  2015-07-23 15:43 ` [patch net-next 2/4] mlxsw: Add PCI bus implementation Jiri Pirko
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Jiri Pirko @ 2015-07-23 15:43 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, ogerlitz, sfeldma, roopa, f.fainelli,
	tgraf, ast, jhs, daniel, john.fastabend, simon.horman, linville,
	andy, shm, nhorman, Jiri Pirko

From: Jiri Pirko <jiri@mellanox.com>

Add core components of Mellanox switch driver infrastructure.
Core infrastructure is designed so that it can be used by multiple
bus drivers (PCI now, I2C and SGMII are planned to be implemented
in the future). Multiple switch kind drivers can be registered as well.
This core serves as a glue between buses and drivers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Elad Raz <eladr@mellanox.com>
---
 MAINTAINERS                                  |    9 +
 drivers/net/ethernet/mellanox/Kconfig        |    1 +
 drivers/net/ethernet/mellanox/Makefile       |    1 +
 drivers/net/ethernet/mellanox/mlxsw/Kconfig  |   11 +
 drivers/net/ethernet/mellanox/mlxsw/Makefile |    2 +
 drivers/net/ethernet/mellanox/mlxsw/cmd.h    | 1090 ++++++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/core.c   |  551 +++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/core.h   |  180 +++++
 drivers/net/ethernet/mellanox/mlxsw/item.h   |  405 ++++++++++
 drivers/net/ethernet/mellanox/mlxsw/port.h   |   50 ++
 drivers/net/ethernet/mellanox/mlxsw/trap.h   |   68 ++
 11 files changed, 2368 insertions(+)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/Kconfig
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/Makefile
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/cmd.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/core.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/core.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/item.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/port.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/trap.h

diff --git a/MAINTAINERS b/MAINTAINERS
index a226416..da41267 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6645,6 +6645,15 @@ W:	http://www.mellanox.com
 Q:	http://patchwork.ozlabs.org/project/netdev/list/
 F:	drivers/net/ethernet/mellanox/mlx4/en_*
 
+MELLANOX ETHERNET SWITCH DRIVERS
+M:	Jiri Pirko <jiri@mellanox.com>
+M:	Ido Schimmel <idosch@mellanox.com>
+L:	netdev@vger.kernel.org
+S:	Supported
+W:	http://www.mellanox.com
+Q:	http://patchwork.ozlabs.org/project/netdev/list/
+F:	drivers/net/ethernet/mellanox/mlxsw/
+
 MEMORY MANAGEMENT
 L:	linux-mm@kvack.org
 W:	http://www.linux-mm.org
diff --git a/drivers/net/ethernet/mellanox/Kconfig b/drivers/net/ethernet/mellanox/Kconfig
index 52a6665..d547010 100644
--- a/drivers/net/ethernet/mellanox/Kconfig
+++ b/drivers/net/ethernet/mellanox/Kconfig
@@ -18,5 +18,6 @@ if NET_VENDOR_MELLANOX
 
 source "drivers/net/ethernet/mellanox/mlx4/Kconfig"
 source "drivers/net/ethernet/mellanox/mlx5/core/Kconfig"
+source "drivers/net/ethernet/mellanox/mlxsw/Kconfig"
 
 endif # NET_VENDOR_MELLANOX
diff --git a/drivers/net/ethernet/mellanox/Makefile b/drivers/net/ethernet/mellanox/Makefile
index 38fe32ef..2e2a5ec 100644
--- a/drivers/net/ethernet/mellanox/Makefile
+++ b/drivers/net/ethernet/mellanox/Makefile
@@ -4,3 +4,4 @@
 
 obj-$(CONFIG_MLX4_CORE) += mlx4/
 obj-$(CONFIG_MLX5_CORE) += mlx5/core/
+obj-$(CONFIG_MLXSW_CORE) += mlxsw/
diff --git a/drivers/net/ethernet/mellanox/mlxsw/Kconfig b/drivers/net/ethernet/mellanox/mlxsw/Kconfig
new file mode 100644
index 0000000..46268f2
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/Kconfig
@@ -0,0 +1,11 @@
+#
+# Mellanox switch drivers configuration
+#
+
+config MLXSW_CORE
+	tristate "Mellanox Technologies Switch ASICs support"
+	---help---
+	  This driver supports Mellanox Technologies Switch ASICs family.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called mlxsw_core.
diff --git a/drivers/net/ethernet/mellanox/mlxsw/Makefile b/drivers/net/ethernet/mellanox/mlxsw/Makefile
new file mode 100644
index 0000000..271de28
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_MLXSW_CORE)	+= mlxsw_core.o
+mlxsw_core-objs			:= core.o
diff --git a/drivers/net/ethernet/mellanox/mlxsw/cmd.h b/drivers/net/ethernet/mellanox/mlxsw/cmd.h
new file mode 100644
index 0000000..770db17
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/cmd.h
@@ -0,0 +1,1090 @@
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/cmd.h
+ * Copyright (c) 2015 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2015 Jiri Pirko <jiri@mellanox.com>
+ * Copyright (c) 2015 Ido Schimmel <idosch@mellanox.com>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _MLXSW_CMD_H
+#define _MLXSW_CMD_H
+
+#include "item.h"
+
+#define MLXSW_CMD_MBOX_SIZE	4096
+
+static inline char *mlxsw_cmd_mbox_alloc(void)
+{
+	return kzalloc(MLXSW_CMD_MBOX_SIZE, GFP_KERNEL);
+}
+
+static inline void mlxsw_cmd_mbox_free(char *mbox)
+{
+	kfree(mbox);
+}
+
+static inline void mlxsw_cmd_mbox_zero(char *mbox)
+{
+	memset(mbox, 0, MLXSW_CMD_MBOX_SIZE);
+}
+
+struct mlxsw_core;
+
+int mlxsw_cmd_exec(struct mlxsw_core *mlxsw_core, u16 opcode, u8 opcode_mod,
+		   u32 in_mod, bool out_mbox_direct,
+		   char *in_mbox, size_t in_mbox_size,
+		   char *out_mbox, size_t out_mbox_size);
+
+static inline int mlxsw_cmd_exec_in(struct mlxsw_core *mlxsw_core, u16 opcode,
+				    u8 opcode_mod, u32 in_mod, char *in_mbox,
+				    size_t in_mbox_size)
+{
+	return mlxsw_cmd_exec(mlxsw_core, opcode, opcode_mod, in_mod, false,
+			      in_mbox, in_mbox_size, NULL, 0);
+}
+
+static inline int mlxsw_cmd_exec_out(struct mlxsw_core *mlxsw_core, u16 opcode,
+				     u8 opcode_mod, u32 in_mod,
+				     bool out_mbox_direct,
+				     char *out_mbox, size_t out_mbox_size)
+{
+	return mlxsw_cmd_exec(mlxsw_core, opcode, opcode_mod, in_mod,
+			      out_mbox_direct, NULL, 0,
+			      out_mbox, out_mbox_size);
+}
+
+static inline int mlxsw_cmd_exec_none(struct mlxsw_core *mlxsw_core, u16 opcode,
+				      u8 opcode_mod, u32 in_mod)
+{
+	return mlxsw_cmd_exec(mlxsw_core, opcode, opcode_mod, in_mod, false,
+			      NULL, 0, NULL, 0);
+}
+
+enum mlxsw_cmd_opcode {
+	MLXSW_CMD_OPCODE_QUERY_FW		= 0x004,
+	MLXSW_CMD_OPCODE_QUERY_BOARDINFO	= 0x006,
+	MLXSW_CMD_OPCODE_QUERY_AQ_CAP		= 0x003,
+	MLXSW_CMD_OPCODE_MAP_FA			= 0xFFF,
+	MLXSW_CMD_OPCODE_UNMAP_FA		= 0xFFE,
+	MLXSW_CMD_OPCODE_CONFIG_PROFILE		= 0x100,
+	MLXSW_CMD_OPCODE_ACCESS_REG		= 0x040,
+	MLXSW_CMD_OPCODE_SW2HW_DQ		= 0x201,
+	MLXSW_CMD_OPCODE_HW2SW_DQ		= 0x202,
+	MLXSW_CMD_OPCODE_2ERR_DQ		= 0x01E,
+	MLXSW_CMD_OPCODE_QUERY_DQ		= 0x022,
+	MLXSW_CMD_OPCODE_SW2HW_CQ		= 0x016,
+	MLXSW_CMD_OPCODE_HW2SW_CQ		= 0x017,
+	MLXSW_CMD_OPCODE_QUERY_CQ		= 0x018,
+	MLXSW_CMD_OPCODE_SW2HW_EQ		= 0x013,
+	MLXSW_CMD_OPCODE_HW2SW_EQ		= 0x014,
+	MLXSW_CMD_OPCODE_QUERY_EQ		= 0x015,
+};
+
+static inline const char *mlxsw_cmd_opcode_str(u16 opcode)
+{
+	switch (opcode) {
+	case MLXSW_CMD_OPCODE_QUERY_FW:
+		return "QUERY_FW";
+	case MLXSW_CMD_OPCODE_QUERY_BOARDINFO:
+		return "QUERY_BOARDINFO";
+	case MLXSW_CMD_OPCODE_QUERY_AQ_CAP:
+		return "QUERY_AQ_CAP";
+	case MLXSW_CMD_OPCODE_MAP_FA:
+		return "MAP_FA";
+	case MLXSW_CMD_OPCODE_UNMAP_FA:
+		return "UNMAP_FA";
+	case MLXSW_CMD_OPCODE_CONFIG_PROFILE:
+		return "CONFIG_PROFILE";
+	case MLXSW_CMD_OPCODE_ACCESS_REG:
+		return "ACCESS_REG";
+	case MLXSW_CMD_OPCODE_SW2HW_DQ:
+		return "SW2HW_DQ";
+	case MLXSW_CMD_OPCODE_HW2SW_DQ:
+		return "HW2SW_DQ";
+	case MLXSW_CMD_OPCODE_2ERR_DQ:
+		return "2ERR_DQ";
+	case MLXSW_CMD_OPCODE_QUERY_DQ:
+		return "QUERY_DQ";
+	case MLXSW_CMD_OPCODE_SW2HW_CQ:
+		return "SW2HW_CQ";
+	case MLXSW_CMD_OPCODE_HW2SW_CQ:
+		return "HW2SW_CQ";
+	case MLXSW_CMD_OPCODE_QUERY_CQ:
+		return "QUERY_CQ";
+	case MLXSW_CMD_OPCODE_SW2HW_EQ:
+		return "SW2HW_EQ";
+	case MLXSW_CMD_OPCODE_HW2SW_EQ:
+		return "HW2SW_EQ";
+	case MLXSW_CMD_OPCODE_QUERY_EQ:
+		return "QUERY_EQ";
+	default:
+		return "*UNKNOWN*";
+	}
+}
+
+enum mlxsw_cmd_status {
+	/* Command execution succeeded. */
+	MLXSW_CMD_STATUS_OK		= 0x00,
+	/* Internal error (e.g. bus error) occurred while processing command. */
+	MLXSW_CMD_STATUS_INTERNAL_ERR	= 0x01,
+	/* Operation/command not supported or opcode modifier not supported. */
+	MLXSW_CMD_STATUS_BAD_OP		= 0x02,
+	/* Parameter not supported, parameter out of range. */
+	MLXSW_CMD_STATUS_BAD_PARAM	= 0x03,
+	/* System was not enabled or bad system state. */
+	MLXSW_CMD_STATUS_BAD_SYS_STATE	= 0x04,
+	/* Attempt to access reserved or unallocated resource, or resource in
+	 * inappropriate ownership.
+	 */
+	MLXSW_CMD_STATUS_BAD_RESOURCE	= 0x05,
+	/* Requested resource is currently executing a command. */
+	MLXSW_CMD_STATUS_RESOURCE_BUSY	= 0x06,
+	/* Required capability exceeds device limits. */
+	MLXSW_CMD_STATUS_EXCEED_LIM	= 0x08,
+	/* Resource is not in the appropriate state or ownership. */
+	MLXSW_CMD_STATUS_BAD_RES_STATE	= 0x09,
+	/* Index out of range (might be beyond table size or attempt to
+	 * access a reserved resource).
+	 */
+	MLXSW_CMD_STATUS_BAD_INDEX	= 0x0A,
+	/* NVMEM checksum/CRC failed. */
+	MLXSW_CMD_STATUS_BAD_NVMEM	= 0x0B,
+	/* Bad management packet (silently discarded). */
+	MLXSW_CMD_STATUS_BAD_PKT	= 0x30,
+};
+
+static inline const char *mlxsw_cmd_status_str(u8 status)
+{
+	switch (status) {
+	case MLXSW_CMD_STATUS_OK:
+		return "OK";
+	case MLXSW_CMD_STATUS_INTERNAL_ERR:
+		return "INTERNAL_ERR";
+	case MLXSW_CMD_STATUS_BAD_OP:
+		return "BAD_OP";
+	case MLXSW_CMD_STATUS_BAD_PARAM:
+		return "BAD_PARAM";
+	case MLXSW_CMD_STATUS_BAD_SYS_STATE:
+		return "BAD_SYS_STATE";
+	case MLXSW_CMD_STATUS_BAD_RESOURCE:
+		return "BAD_RESOURCE";
+	case MLXSW_CMD_STATUS_RESOURCE_BUSY:
+		return "RESOURCE_BUSY";
+	case MLXSW_CMD_STATUS_EXCEED_LIM:
+		return "EXCEED_LIM";
+	case MLXSW_CMD_STATUS_BAD_RES_STATE:
+		return "BAD_RES_STATE";
+	case MLXSW_CMD_STATUS_BAD_INDEX:
+		return "BAD_INDEX";
+	case MLXSW_CMD_STATUS_BAD_NVMEM:
+		return "BAD_NVMEM";
+	case MLXSW_CMD_STATUS_BAD_PKT:
+		return "BAD_PKT";
+	default:
+		return "*UNKNOWN*";
+	}
+}
+
+/* QUERY_FW - Query Firmware
+ * -------------------------
+ * OpMod == 0, INMmod == 0
+ * -----------------------
+ * The QUERY_FW command retrieves information related to firmware, command
+ * interface version and the amount of resources that should be allocated to
+ * the firmware.
+ */
+
+static inline int mlxsw_cmd_query_fw(struct mlxsw_core *mlxsw_core,
+				     char *out_mbox)
+{
+	return mlxsw_cmd_exec_out(mlxsw_core, MLXSW_CMD_OPCODE_QUERY_FW,
+				  0, 0, false, out_mbox, MLXSW_CMD_MBOX_SIZE);
+}
+
+/* cmd_mbox_query_fw_fw_pages
+ * Amount of physical memory to be allocatedfor firmware usage in 4KB pages.
+ */
+MLXSW_ITEM32(cmd_mbox, query_fw, fw_pages, 0x00, 16, 16);
+
+/* cmd_mbox_query_fw_fw_rev_major
+ * Firmware Revision - Major
+ */
+MLXSW_ITEM32(cmd_mbox, query_fw, fw_rev_major, 0x00, 0, 16);
+
+/* cmd_mbox_query_fw_fw_rev_subminor
+ * Firmware Sub-minor version (Patch level)
+ */
+MLXSW_ITEM32(cmd_mbox, query_fw, fw_rev_subminor, 0x04, 16, 16);
+
+/* cmd_mbox_query_fw_fw_rev_minor
+ * Firmware Revision - Minor
+ */
+MLXSW_ITEM32(cmd_mbox, query_fw, fw_rev_minor, 0x04, 0, 16);
+
+/* cmd_mbox_query_fw_core_clk
+ * Internal Clock Frequency (in MHz)
+ */
+MLXSW_ITEM32(cmd_mbox, query_fw, core_clk, 0x08, 16, 16);
+
+/* cmd_mbox_query_fw_cmd_interface_rev
+ * Command Interface Interpreter Revision ID. This number is bumped up
+ * every time a non-backward-compatible change is done for the command
+ * interface. The current cmd_interface_rev is 1.
+ */
+MLXSW_ITEM32(cmd_mbox, query_fw, cmd_interface_rev, 0x08, 0, 16);
+
+/* cmd_mbox_query_fw_dt
+ * If set, Debug Trace is supported
+ */
+MLXSW_ITEM32(cmd_mbox, query_fw, dt, 0x0C, 31, 1);
+
+/* cmd_mbox_query_fw_api_version
+ * Indicates the version of the API, to enable software querying
+ * for compatibility. The current api_version is 1.
+ */
+MLXSW_ITEM32(cmd_mbox, query_fw, api_version, 0x0C, 0, 16);
+
+/* cmd_mbox_query_fw_fw_hour
+ * Firmware timestamp - hour
+ */
+MLXSW_ITEM32(cmd_mbox, query_fw, fw_hour, 0x10, 24, 8);
+
+/* cmd_mbox_query_fw_fw_minutes
+ * Firmware timestamp - minutes
+ */
+MLXSW_ITEM32(cmd_mbox, query_fw, fw_minutes, 0x10, 16, 8);
+
+/* cmd_mbox_query_fw_fw_seconds
+ * Firmware timestamp - seconds
+ */
+MLXSW_ITEM32(cmd_mbox, query_fw, fw_seconds, 0x10, 8, 8);
+
+/* cmd_mbox_query_fw_fw_year
+ * Firmware timestamp - year
+ */
+MLXSW_ITEM32(cmd_mbox, query_fw, fw_year, 0x14, 16, 16);
+
+/* cmd_mbox_query_fw_fw_month
+ * Firmware timestamp - month
+ */
+MLXSW_ITEM32(cmd_mbox, query_fw, fw_month, 0x14, 8, 8);
+
+/* cmd_mbox_query_fw_fw_day
+ * Firmware timestamp - day
+ */
+MLXSW_ITEM32(cmd_mbox, query_fw, fw_day, 0x14, 0, 8);
+
+/* cmd_mbox_query_fw_clr_int_base_offset
+ * Clear Interrupt register's offset from clr_int_bar register
+ * in PCI address space.
+ */
+MLXSW_ITEM64(cmd_mbox, query_fw, clr_int_base_offset, 0x20, 0, 64);
+
+/* cmd_mbox_query_fw_clr_int_bar
+ * PCI base address register (BAR) where clr_int register is located.
+ * 00 - BAR 0-1 (64 bit BAR)
+ */
+MLXSW_ITEM32(cmd_mbox, query_fw, clr_int_bar, 0x28, 30, 2);
+
+/* cmd_mbox_query_fw_error_buf_offset
+ * Read Only buffer for internal error reports of offset
+ * from error_buf_bar register in PCI address space).
+ */
+MLXSW_ITEM64(cmd_mbox, query_fw, error_buf_offset, 0x30, 0, 64);
+
+/* cmd_mbox_query_fw_error_buf_size
+ * Internal error buffer size in DWORDs
+ */
+MLXSW_ITEM32(cmd_mbox, query_fw, error_buf_size, 0x38, 0, 32);
+
+/* cmd_mbox_query_fw_error_int_bar
+ * PCI base address register (BAR) where error buffer
+ * register is located.
+ * 00 - BAR 0-1 (64 bit BAR)
+ */
+MLXSW_ITEM32(cmd_mbox, query_fw, error_int_bar, 0x3C, 30, 2);
+
+/* cmd_mbox_query_fw_doorbell_page_offset
+ * Offset of the doorbell page
+ */
+MLXSW_ITEM64(cmd_mbox, query_fw, doorbell_page_offset, 0x40, 0, 64);
+
+/* cmd_mbox_query_fw_doorbell_page_bar
+ * PCI base address register (BAR) of the doorbell page
+ * 00 - BAR 0-1 (64 bit BAR)
+ */
+MLXSW_ITEM32(cmd_mbox, query_fw, doorbell_page_bar, 0x48, 30, 2);
+
+/* QUERY_BOARDINFO - Query Board Information
+ * -----------------------------------------
+ * OpMod == 0 (N/A), INMmod == 0 (N/A)
+ * -----------------------------------
+ * The QUERY_BOARDINFO command retrieves adapter specific parameters.
+ */
+
+static inline int mlxsw_cmd_boardinfo(struct mlxsw_core *mlxsw_core,
+				      char *out_mbox)
+{
+	return mlxsw_cmd_exec_out(mlxsw_core, MLXSW_CMD_OPCODE_QUERY_BOARDINFO,
+				  0, 0, false, out_mbox, MLXSW_CMD_MBOX_SIZE);
+}
+
+/* cmd_mbox_boardinfo_intapin
+ * When PCIe interrupt messages are being used, this value is used for clearing
+ * an interrupt. When using MSI-X, this register is not used.
+ */
+MLXSW_ITEM32(cmd_mbox, boardinfo, intapin, 0x10, 24, 8);
+
+/* cmd_mbox_boardinfo_vsd_vendor_id
+ * PCISIG Vendor ID (www.pcisig.com/membership/vid_search) of the vendor
+ * specifying/formatting the VSD. The vsd_vendor_id identifies the management
+ * domain of the VSD/PSID data. Different vendors may choose different VSD/PSID
+ * format and encoding as long as they use their assigned vsd_vendor_id.
+ */
+MLXSW_ITEM32(cmd_mbox, boardinfo, vsd_vendor_id, 0x1C, 0, 16);
+
+/* cmd_mbox_boardinfo_vsd
+ * Vendor Specific Data. The VSD string that is burnt to the Flash
+ * with the firmware.
+ */
+#define MLXSW_CMD_BOARDINFO_VSD_LEN 208
+MLXSW_ITEM_BUF(cmd_mbox, boardinfo, vsd, 0x20, MLXSW_CMD_BOARDINFO_VSD_LEN);
+
+/* cmd_mbox_boardinfo_psid
+ * The PSID field is a 16-ascii (byte) character string which acts as
+ * the board ID. The PSID format is used in conjunction with
+ * Mellanox vsd_vendor_id (15B3h).
+ */
+#define MLXSW_CMD_BOARDINFO_PSID_LEN 16
+MLXSW_ITEM_BUF(cmd_mbox, boardinfo, psid, 0xF0, MLXSW_CMD_BOARDINFO_PSID_LEN);
+
+/* QUERY_AQ_CAP - Query Asynchronous Queues Capabilities
+ * -----------------------------------------------------
+ * OpMod == 0 (N/A), INMmod == 0 (N/A)
+ * -----------------------------------
+ * The QUERY_AQ_CAP command returns the device asynchronous queues
+ * capabilities supported.
+ */
+
+static inline int mlxsw_cmd_query_aq_cap(struct mlxsw_core *mlxsw_core,
+					 char *out_mbox)
+{
+	return mlxsw_cmd_exec_out(mlxsw_core, MLXSW_CMD_OPCODE_QUERY_AQ_CAP,
+				  0, 0, false, out_mbox, MLXSW_CMD_MBOX_SIZE);
+}
+
+/* cmd_mbox_query_aq_cap_log_max_sdq_sz
+ * Log (base 2) of max WQEs allowed on SDQ.
+ */
+MLXSW_ITEM32(cmd_mbox, query_aq_cap, log_max_sdq_sz, 0x00, 24, 8);
+
+/* cmd_mbox_query_aq_cap_max_num_sdqs
+ * Maximum number of SDQs.
+ */
+MLXSW_ITEM32(cmd_mbox, query_aq_cap, max_num_sdqs, 0x00, 0, 8);
+
+/* cmd_mbox_query_aq_cap_log_max_rdq_sz
+ * Log (base 2) of max WQEs allowed on RDQ.
+ */
+MLXSW_ITEM32(cmd_mbox, query_aq_cap, log_max_rdq_sz, 0x04, 24, 8);
+
+/* cmd_mbox_query_aq_cap_max_num_rdqs
+ * Maximum number of RDQs.
+ */
+MLXSW_ITEM32(cmd_mbox, query_aq_cap, max_num_rdqs, 0x04, 0, 8);
+
+/* cmd_mbox_query_aq_cap_log_max_cq_sz
+ * Log (base 2) of max CQEs allowed on CQ.
+ */
+MLXSW_ITEM32(cmd_mbox, query_aq_cap, log_max_cq_sz, 0x08, 24, 8);
+
+/* cmd_mbox_query_aq_cap_max_num_cqs
+ * Maximum number of CQs.
+ */
+MLXSW_ITEM32(cmd_mbox, query_aq_cap, max_num_cqs, 0x08, 0, 8);
+
+/* cmd_mbox_query_aq_cap_log_max_eq_sz
+ * Log (base 2) of max EQEs allowed on EQ.
+ */
+MLXSW_ITEM32(cmd_mbox, query_aq_cap, log_max_eq_sz, 0x0C, 24, 8);
+
+/* cmd_mbox_query_aq_cap_max_num_eqs
+ * Maximum number of EQs.
+ */
+MLXSW_ITEM32(cmd_mbox, query_aq_cap, max_num_eqs, 0x0C, 0, 8);
+
+/* cmd_mbox_query_aq_cap_max_sg_sq
+ * The maximum S/G list elements in an DSQ. DSQ must not contain
+ * more S/G entries than indicated here.
+ */
+MLXSW_ITEM32(cmd_mbox, query_aq_cap, max_sg_sq, 0x10, 8, 8);
+
+/* cmd_mbox_query_aq_cap_
+ * The maximum S/G list elements in an DRQ. DRQ must not contain
+ * more S/G entries than indicated here.
+ */
+MLXSW_ITEM32(cmd_mbox, query_aq_cap, max_sg_rq, 0x10, 0, 8);
+
+/* MAP_FA - Map Firmware Area
+ * --------------------------
+ * OpMod == 0 (N/A), INMmod == Number of VPM entries
+ * -------------------------------------------------
+ * The MAP_FA command passes physical pages to the switch. These pages
+ * are used to store the device firmware. MAP_FA can be executed multiple
+ * times until all the firmware area is mapped (the size that should be
+ * mapped is retrieved through the QUERY_FW command). All required pages
+ * must be mapped to finish the initialization phase. Physical memory
+ * passed in this command must be pinned.
+ */
+
+static inline int mlxsw_cmd_map_fa(struct mlxsw_core *mlxsw_core,
+				   char *in_mbox, u32 vpm_entries_count)
+{
+	return mlxsw_cmd_exec_in(mlxsw_core, MLXSW_CMD_OPCODE_MAP_FA,
+				 0, vpm_entries_count,
+				 in_mbox, MLXSW_CMD_MBOX_SIZE);
+}
+
+/* cmd_mbox_map_fa_pa
+ * Physical Address.
+ */
+MLXSW_ITEM64_INDEXED(cmd_mbox, map_fa, pa, 0x00, 12, 52, 0x08, 0x00, true);
+
+/* cmd_mbox_map_fa_log2size
+ * Log (base 2) of the size in 4KB pages of the physical and contiguous memory
+ * that starts at PA_L/H.
+ */
+MLXSW_ITEM32_INDEXED(cmd_mbox, map_fa, log2size, 0x00, 0, 5, 0x08, 0x04, false);
+
+/* UNMAP_FA - Unmap Firmware Area
+ * ------------------------------
+ * OpMod == 0 (N/A), INMmod == 0 (N/A)
+ * -----------------------------------
+ * The UNMAP_FA command unload the firmware and unmaps all the
+ * firmware area. After this command is completed the device will not access
+ * the pages that were mapped to the firmware area. After executing UNMAP_FA
+ * command, software reset must be done prior to execution of MAP_FW command.
+ */
+
+static inline int mlxsw_cmd_unmap_fa(struct mlxsw_core *mlxsw_core)
+{
+	return mlxsw_cmd_exec_none(mlxsw_core, MLXSW_CMD_OPCODE_UNMAP_FA, 0, 0);
+}
+
+/* CONFIG_PROFILE (Set) - Configure Switch Profile
+ * ------------------------------
+ * OpMod == 1 (Set), INMmod == 0 (N/A)
+ * -----------------------------------
+ * The CONFIG_PROFILE command sets the switch profile. The command can be
+ * executed on the device only once at startup in order to allocate and
+ * configure all switch resources and prepare it for operational mode.
+ * It is not possible to change the device profile after the chip is
+ * in operational mode.
+ * Failure of the CONFIG_PROFILE command leaves the hardware in an indeterminate
+ * state therefore it is required to perform software reset to the device
+ * following an unsuccessful completion of the command. It is required
+ * to perform software reset to the device to change an existing profile.
+ */
+
+static inline int mlxsw_cmd_config_profile_set(struct mlxsw_core *mlxsw_core,
+					       char *in_mbox)
+{
+	return mlxsw_cmd_exec_in(mlxsw_core, MLXSW_CMD_OPCODE_CONFIG_PROFILE,
+				 1, 0, in_mbox, MLXSW_CMD_MBOX_SIZE);
+}
+
+/* cmd_mbox_config_profile_set_max_vepa_channels
+ * Capability bit. Setting a bit to 1 configures the profile
+ * according to the mailbox contents.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, set_max_vepa_channels, 0x0C, 0, 1);
+
+/* cmd_mbox_config_profile_set_max_lag
+ * Capability bit. Setting a bit to 1 configures the profile
+ * according to the mailbox contents.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, set_max_lag, 0x0C, 1, 1);
+
+/* cmd_mbox_config_profile_set_max_port_per_lag
+ * Capability bit. Setting a bit to 1 configures the profile
+ * according to the mailbox contents.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, set_max_port_per_lag, 0x0C, 2, 1);
+
+/* cmd_mbox_config_profile_set_max_mid
+ * Capability bit. Setting a bit to 1 configures the profile
+ * according to the mailbox contents.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, set_max_mid, 0x0C, 3, 1);
+
+/* cmd_mbox_config_profile_set_max_pgt
+ * Capability bit. Setting a bit to 1 configures the profile
+ * according to the mailbox contents.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, set_max_pgt, 0x0C, 4, 1);
+
+/* cmd_mbox_config_profile_set_max_system_port
+ * Capability bit. Setting a bit to 1 configures the profile
+ * according to the mailbox contents.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, set_max_system_port, 0x0C, 5, 1);
+
+/* cmd_mbox_config_profile_set_max_vlan_groups
+ * Capability bit. Setting a bit to 1 configures the profile
+ * according to the mailbox contents.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, set_max_vlan_groups, 0x0C, 6, 1);
+
+/* cmd_mbox_config_profile_set_max_regions
+ * Capability bit. Setting a bit to 1 configures the profile
+ * according to the mailbox contents.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, set_max_regions, 0x0C, 7, 1);
+
+/* cmd_mbox_config_profile_set_fid_based
+ * Capability bit. Setting a bit to 1 configures the profile
+ * according to the mailbox contents.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, set_flood_mode, 0x0C, 8, 1);
+
+/* cmd_mbox_config_profile_set_max_flood_tables
+ * Capability bit. Setting a bit to 1 configures the profile
+ * according to the mailbox contents.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, set_flood_tables, 0x0C, 9, 1);
+
+/* cmd_mbox_config_profile_set_max_ib_mc
+ * Capability bit. Setting a bit to 1 configures the profile
+ * according to the mailbox contents.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, set_max_ib_mc, 0x0C, 12, 1);
+
+/* cmd_mbox_config_profile_set_max_pkey
+ * Capability bit. Setting a bit to 1 configures the profile
+ * according to the mailbox contents.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, set_max_pkey, 0x0C, 13, 1);
+
+/* cmd_mbox_config_profile_set_adaptive_routing_group_cap
+ * Capability bit. Setting a bit to 1 configures the profile
+ * according to the mailbox contents.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile,
+	     set_adaptive_routing_group_cap, 0x0C, 14, 1);
+
+/* cmd_mbox_config_profile_set_ar_sec
+ * Capability bit. Setting a bit to 1 configures the profile
+ * according to the mailbox contents.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, set_ar_sec, 0x0C, 15, 1);
+
+/* cmd_mbox_config_profile_max_vepa_channels
+ * Maximum number of VEPA channels per port (0 through 16)
+ * 0 - multi-channel VEPA is disabled
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, max_vepa_channels, 0x10, 0, 8);
+
+/* cmd_mbox_config_profile_max_lag
+ * Maximum number of LAG IDs requested.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, max_lag, 0x14, 0, 16);
+
+/* cmd_mbox_config_profile_max_port_per_lag
+ * Maximum number of ports per LAG requested.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, max_port_per_lag, 0x18, 0, 16);
+
+/* cmd_mbox_config_profile_max_mid
+ * Maximum Multicast IDs.
+ * Multicast IDs are allocated from 0 to max_mid-1
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, max_mid, 0x1C, 0, 16);
+
+/* cmd_mbox_config_profile_max_pgt
+ * Maximum records in the Port Group Table per Switch Partition.
+ * Port Group Table indexes are from 0 to max_pgt-1
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, max_pgt, 0x20, 0, 16);
+
+/* cmd_mbox_config_profile_max_system_port
+ * The maximum number of system ports that can be allocated.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, max_system_port, 0x24, 0, 16);
+
+/* cmd_mbox_config_profile_max_vlan_groups
+ * Maximum number VLAN Groups for VLAN binding.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, max_vlan_groups, 0x28, 0, 12);
+
+/* cmd_mbox_config_profile_max_regions
+ * Maximum number of TCAM Regions.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, max_regions, 0x2C, 0, 16);
+
+/* cmd_mbox_config_profile_max_flood_tables
+ * Maximum number of Flooding Tables. Flooding Tables are associated to
+ * the different packet types for the different switch partitions.
+ * Note that the table size depends on the fid_based mode.
+ * In SwitchX silicon, tables are split equally between the switch
+ * partitions. e.g. for 2 swids and 8 tables, the first 4 are associated
+ * with swid-1 and the last 4 are associated with swid-2.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, max_flood_tables, 0x30, 16, 4);
+
+/* cmd_mbox_config_profile_max_vid_flood_tables
+ * Maximum number of per-vid flooding tables. Flooding tables are associated
+ * to the different packet types for the different switch partitions.
+ * Table size is 4K entries covering all VID space.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, max_vid_flood_tables, 0x30, 8, 4);
+
+/* cmd_mbox_config_profile_fid_based
+ * FID Based Flood Mode
+ * 00 Do not use FID to offset the index into the Port Group Table/Multicast ID
+ * 01 Use FID to offset the index to the Port Group Table (pgi)
+ * 10 Use FID to offset the index to the Port Group Table (pgi) and
+ * the Multicast ID
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, flood_mode, 0x30, 0, 2);
+
+/* cmd_mbox_config_profile_max_ib_mc
+ * Maximum number of multicast FDB records for InfiniBand
+ * FDB (in 512 chunks) per InfiniBand switch partition.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, max_ib_mc, 0x40, 0, 15);
+
+/* cmd_mbox_config_profile_max_pkey
+ * Maximum per port PKEY table size (for PKEY enforcement)
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, max_pkey, 0x44, 0, 15);
+
+/* cmd_mbox_config_profile_ar_sec
+ * Primary/secondary capability
+ * Describes the number of adaptive routing sub-groups
+ * 0 - disable primary/secondary (single group)
+ * 1 - enable primary/secondary (2 sub-groups)
+ * 2 - 3 sub-groups: Not supported in SwitchX, SwitchX-2
+ * 3 - 4 sub-groups: Not supported in SwitchX, SwitchX-2
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, ar_sec, 0x4C, 24, 2);
+
+/* cmd_mbox_config_profile_adaptive_routing_group_cap
+ * Adaptive Routing Group Capability. Indicates the number of AR groups
+ * supported. Note that when Primary/secondary is enabled, each
+ * primary/secondary couple consumes 2 adaptive routing entries.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, adaptive_routing_group_cap, 0x4C, 0, 16);
+
+/* cmd_mbox_config_profile_arn
+ * Adaptive Routing Notification Enable
+ * Not supported in SwitchX, SwitchX-2
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, arn, 0x50, 31, 1);
+
+/* cmd_mbox_config_profile_swid_config_mask
+ * Modify Switch Partition Configuration mask. When set, the configu-
+ * ration value for the Switch Partition are taken from the mailbox.
+ * When clear, the current configuration values are used.
+ * Bit 0 - set type
+ * Bit 1 - properties
+ * Other - reserved
+ */
+MLXSW_ITEM32_INDEXED(cmd_mbox, config_profile, swid_config_mask,
+		     0x60, 24, 8, 0x08, 0x00, false);
+
+/* cmd_mbox_config_profile_swid_config_type
+ * Switch Partition type.
+ * 0000 - disabled (Switch Partition does not exist)
+ * 0001 - InfiniBand
+ * 0010 - Ethernet
+ * 1000 - router port (SwitchX-2 only)
+ * Other - reserved
+ */
+MLXSW_ITEM32_INDEXED(cmd_mbox, config_profile, swid_config_type,
+		     0x60, 20, 4, 0x08, 0x00, false);
+
+/* cmd_mbox_config_profile_swid_config_properties
+ * Switch Partition properties.
+ */
+MLXSW_ITEM32_INDEXED(cmd_mbox, config_profile, swid_config_properties,
+		     0x60, 0, 8, 0x08, 0x00, false);
+
+/* ACCESS_REG - Access EMAD Supported Register
+ * ----------------------------------
+ * OpMod == 0 (N/A), INMmod == 0 (N/A)
+ * -------------------------------------
+ * The ACCESS_REG command supports accessing device registers. This access
+ * is mainly used for bootstrapping.
+ */
+
+static inline int mlxsw_cmd_access_reg(struct mlxsw_core *mlxsw_core,
+				       char *in_mbox, char *out_mbox)
+{
+	return mlxsw_cmd_exec(mlxsw_core, MLXSW_CMD_OPCODE_ACCESS_REG,
+			      0, 0, false, in_mbox, MLXSW_CMD_MBOX_SIZE,
+			      out_mbox, MLXSW_CMD_MBOX_SIZE);
+}
+
+/* SW2HW_DQ - Software to Hardware DQ
+ * ----------------------------------
+ * OpMod == 0 (send DQ) / OpMod == 1 (receive DQ)
+ * INMmod == DQ number
+ * ----------------------------------------------
+ * The SW2HW_DQ command transitions a descriptor queue from software to
+ * hardware ownership. The command enables posting WQEs and ringing DoorBells
+ * on the descriptor queue.
+ */
+
+static inline int __mlxsw_cmd_sw2hw_dq(struct mlxsw_core *mlxsw_core,
+				       char *in_mbox, u32 dq_number,
+				       u8 opcode_mod)
+{
+	return mlxsw_cmd_exec_in(mlxsw_core, MLXSW_CMD_OPCODE_SW2HW_DQ,
+				 opcode_mod, dq_number,
+				 in_mbox, MLXSW_CMD_MBOX_SIZE);
+}
+
+enum {
+	MLXSW_CMD_OPCODE_MOD_SDQ = 0,
+	MLXSW_CMD_OPCODE_MOD_RDQ = 1,
+};
+
+static inline int mlxsw_cmd_sw2hw_sdq(struct mlxsw_core *mlxsw_core,
+				      char *in_mbox, u32 dq_number)
+{
+	return __mlxsw_cmd_sw2hw_dq(mlxsw_core, in_mbox, dq_number,
+				    MLXSW_CMD_OPCODE_MOD_SDQ);
+}
+
+static inline int mlxsw_cmd_sw2hw_rdq(struct mlxsw_core *mlxsw_core,
+				      char *in_mbox, u32 dq_number)
+{
+	return __mlxsw_cmd_sw2hw_dq(mlxsw_core, in_mbox, dq_number,
+				    MLXSW_CMD_OPCODE_MOD_RDQ);
+}
+
+/* cmd_mbox_sw2hw_dq_cq
+ * Number of the CQ that this Descriptor Queue reports completions to.
+ */
+MLXSW_ITEM32(cmd_mbox, sw2hw_dq, cq, 0x00, 24, 8);
+
+/* cmd_mbox_sw2hw_dq_sdq_tclass
+ * SDQ: CPU Egress TClass
+ * RDQ: Reserved
+ */
+MLXSW_ITEM32(cmd_mbox, sw2hw_dq, sdq_tclass, 0x00, 16, 6);
+
+/* cmd_mbox_sw2hw_dq_log2_dq_sz
+ * Log (base 2) of the Descriptor Queue size in 4KB pages.
+ */
+MLXSW_ITEM32(cmd_mbox, sw2hw_dq, log2_dq_sz, 0x00, 0, 6);
+
+/* cmd_mbox_sw2hw_dq_pa
+ * Physical Address.
+ */
+MLXSW_ITEM64_INDEXED(cmd_mbox, sw2hw_dq, pa, 0x10, 12, 52, 0x08, 0x00, true);
+
+/* HW2SW_DQ - Hardware to Software DQ
+ * ----------------------------------
+ * OpMod == 0 (send DQ) / OpMod == 1 (receive DQ)
+ * INMmod == DQ number
+ * ----------------------------------------------
+ * The HW2SW_DQ command transitions a descriptor queue from hardware to
+ * software ownership. Incoming packets on the DQ are silently discarded,
+ * SW should not post descriptors on nonoperational DQs.
+ */
+
+static inline int __mlxsw_cmd_hw2sw_dq(struct mlxsw_core *mlxsw_core,
+				       u32 dq_number, u8 opcode_mod)
+{
+	return mlxsw_cmd_exec_none(mlxsw_core, MLXSW_CMD_OPCODE_HW2SW_DQ,
+				   opcode_mod, dq_number);
+}
+
+static inline int mlxsw_cmd_hw2sw_sdq(struct mlxsw_core *mlxsw_core,
+				      u32 dq_number)
+{
+	return __mlxsw_cmd_hw2sw_dq(mlxsw_core, dq_number,
+				    MLXSW_CMD_OPCODE_MOD_SDQ);
+}
+
+static inline int mlxsw_cmd_hw2sw_rdq(struct mlxsw_core *mlxsw_core,
+				      u32 dq_number)
+{
+	return __mlxsw_cmd_hw2sw_dq(mlxsw_core, dq_number,
+				    MLXSW_CMD_OPCODE_MOD_RDQ);
+}
+
+/* 2ERR_DQ - To Error DQ
+ * ---------------------
+ * OpMod == 0 (send DQ) / OpMod == 1 (receive DQ)
+ * INMmod == DQ number
+ * ----------------------------------------------
+ * The 2ERR_DQ command transitions the DQ into the error state from the state
+ * in which it has been. While the command is executed, some in-process
+ * descriptors may complete. Once the DQ transitions into the error state,
+ * if there are posted descriptors on the RDQ/SDQ, the hardware writes
+ * a completion with error (flushed) for all descriptors posted in the RDQ/SDQ.
+ * When the command is completed successfully, the DQ is already in
+ * the error state.
+ */
+
+static inline int __mlxsw_cmd_2err_dq(struct mlxsw_core *mlxsw_core,
+				      u32 dq_number, u8 opcode_mod)
+{
+	return mlxsw_cmd_exec_none(mlxsw_core, MLXSW_CMD_OPCODE_2ERR_DQ,
+				   opcode_mod, dq_number);
+}
+
+static inline int mlxsw_cmd_2err_sdq(struct mlxsw_core *mlxsw_core,
+				     u32 dq_number)
+{
+	return __mlxsw_cmd_2err_dq(mlxsw_core, dq_number,
+				   MLXSW_CMD_OPCODE_MOD_SDQ);
+}
+
+static inline int mlxsw_cmd_2err_rdq(struct mlxsw_core *mlxsw_core,
+				     u32 dq_number)
+{
+	return __mlxsw_cmd_2err_dq(mlxsw_core, dq_number,
+				   MLXSW_CMD_OPCODE_MOD_RDQ);
+}
+
+/* QUERY_DQ - Query DQ
+ * ---------------------
+ * OpMod == 0 (send DQ) / OpMod == 1 (receive DQ)
+ * INMmod == DQ number
+ * ----------------------------------------------
+ * The QUERY_DQ command retrieves a snapshot of DQ parameters from the hardware.
+ *
+ * Note: Output mailbox has the same format as SW2HW_DQ.
+ */
+
+static inline int __mlxsw_cmd_query_dq(struct mlxsw_core *mlxsw_core,
+				       char *out_mbox, u32 dq_number,
+				       u8 opcode_mod)
+{
+	return mlxsw_cmd_exec_out(mlxsw_core, MLXSW_CMD_OPCODE_2ERR_DQ,
+				  opcode_mod, dq_number, false,
+				  out_mbox, MLXSW_CMD_MBOX_SIZE);
+}
+
+static inline int mlxsw_cmd_query_sdq(struct mlxsw_core *mlxsw_core,
+				      char *out_mbox, u32 dq_number)
+{
+	return __mlxsw_cmd_query_dq(mlxsw_core, out_mbox, dq_number,
+				    MLXSW_CMD_OPCODE_MOD_SDQ);
+}
+
+static inline int mlxsw_cmd_query_rdq(struct mlxsw_core *mlxsw_core,
+				      char *out_mbox, u32 dq_number)
+{
+	return __mlxsw_cmd_query_dq(mlxsw_core, out_mbox, dq_number,
+				    MLXSW_CMD_OPCODE_MOD_RDQ);
+}
+
+/* SW2HW_CQ - Software to Hardware CQ
+ * ----------------------------------
+ * OpMod == 0 (N/A), INMmod == CQ number
+ * -------------------------------------
+ * The SW2HW_CQ command transfers ownership of a CQ context entry from software
+ * to hardware. The command takes the CQ context entry from the input mailbox
+ * and stores it in the CQC in the ownership of the hardware. The command fails
+ * if the requested CQC entry is already in the ownership of the hardware.
+ */
+
+static inline int mlxsw_cmd_sw2hw_cq(struct mlxsw_core *mlxsw_core,
+				     char *in_mbox, u32 cq_number)
+{
+	return mlxsw_cmd_exec_in(mlxsw_core, MLXSW_CMD_OPCODE_SW2HW_CQ,
+				 0, cq_number, in_mbox, MLXSW_CMD_MBOX_SIZE);
+}
+
+/* cmd_mbox_sw2hw_cq_cv
+ * CQE Version.
+ * 0 - CQE Version 0, 1 - CQE Version 1
+ */
+MLXSW_ITEM32(cmd_mbox, sw2hw_cq, cv, 0x00, 28, 4);
+
+/* cmd_mbox_sw2hw_cq_c_eqn
+ * Event Queue this CQ reports completion events to.
+ */
+MLXSW_ITEM32(cmd_mbox, sw2hw_cq, c_eqn, 0x00, 24, 1);
+
+/* cmd_mbox_sw2hw_cq_oi
+ * When set, overrun ignore is enabled. When set, updates of
+ * CQ consumer counter (poll for completion) or Request completion
+ * notifications (Arm CQ) DoorBells should not be rung on that CQ.
+ */
+MLXSW_ITEM32(cmd_mbox, sw2hw_cq, oi, 0x00, 12, 1);
+
+/* cmd_mbox_sw2hw_cq_st
+ * Event delivery state machine
+ * 0x0 - FIRED
+ * 0x1 - ARMED (Request for Notification)
+ */
+MLXSW_ITEM32(cmd_mbox, sw2hw_cq, st, 0x00, 8, 1);
+
+/* cmd_mbox_sw2hw_cq_log_cq_size
+ * Log (base 2) of the CQ size (in entries).
+ */
+MLXSW_ITEM32(cmd_mbox, sw2hw_cq, log_cq_size, 0x00, 0, 4);
+
+/* cmd_mbox_sw2hw_cq_producer_counter
+ * Producer Counter. The counter is incremented for each CQE that is
+ * written by the HW to the CQ.
+ * Maintained by HW (valid for the QUERY_CQ command only)
+ */
+MLXSW_ITEM32(cmd_mbox, sw2hw_cq, producer_counter, 0x04, 0, 16);
+
+/* cmd_mbox_sw2hw_cq_pa
+ * Physical Address.
+ */
+MLXSW_ITEM64_INDEXED(cmd_mbox, sw2hw_cq, pa, 0x10, 11, 53, 0x08, 0x00, true);
+
+/* HW2SW_CQ - Hardware to Software CQ
+ * ----------------------------------
+ * OpMod == 0 (N/A), INMmod == CQ number
+ * -------------------------------------
+ * The HW2SW_CQ command transfers ownership of a CQ context entry from hardware
+ * to software. The CQC entry is invalidated as a result of this command.
+ */
+
+static inline int mlxsw_cmd_hw2sw_cq(struct mlxsw_core *mlxsw_core,
+				     u32 cq_number)
+{
+	return mlxsw_cmd_exec_none(mlxsw_core, MLXSW_CMD_OPCODE_HW2SW_CQ,
+				   0, cq_number);
+}
+
+/* QUERY_CQ - Query CQ
+ * ----------------------------------
+ * OpMod == 0 (N/A), INMmod == CQ number
+ * -------------------------------------
+ * The QUERY_CQ command retrieves a snapshot of the current CQ context entry.
+ * The command stores the snapshot in the output mailbox in the software format.
+ * Note that the CQ context state and values are not affected by the QUERY_CQ
+ * command. The QUERY_CQ command is for debug purposes only.
+ *
+ * Note: Output mailbox has the same format as SW2HW_CQ.
+ */
+
+static inline int mlxsw_cmd_query_cq(struct mlxsw_core *mlxsw_core,
+				     char *out_mbox, u32 cq_number)
+{
+	return mlxsw_cmd_exec_out(mlxsw_core, MLXSW_CMD_OPCODE_QUERY_CQ,
+				  0, cq_number, false,
+				  out_mbox, MLXSW_CMD_MBOX_SIZE);
+}
+
+/* SW2HW_EQ - Software to Hardware EQ
+ * ----------------------------------
+ * OpMod == 0 (N/A), INMmod == EQ number
+ * -------------------------------------
+ * The SW2HW_EQ command transfers ownership of an EQ context entry from software
+ * to hardware. The command takes the EQ context entry from the input mailbox
+ * and stores it in the EQC in the ownership of the hardware. The command fails
+ * if the requested EQC entry is already in the ownership of the hardware.
+ */
+
+static inline int mlxsw_cmd_sw2hw_eq(struct mlxsw_core *mlxsw_core,
+				     char *in_mbox, u32 eq_number)
+{
+	return mlxsw_cmd_exec_in(mlxsw_core, MLXSW_CMD_OPCODE_SW2HW_EQ,
+				 0, eq_number, in_mbox, MLXSW_CMD_MBOX_SIZE);
+}
+
+/* cmd_mbox_sw2hw_eq_int_msix
+ * When set, MSI-X cycles will be generated by this EQ.
+ * When cleared, an interrupt will be generated by this EQ.
+ */
+MLXSW_ITEM32(cmd_mbox, sw2hw_eq, int_msix, 0x00, 24, 1);
+
+/* cmd_mbox_sw2hw_eq_int_oi
+ * When set, overrun ignore is enabled.
+ */
+MLXSW_ITEM32(cmd_mbox, sw2hw_eq, oi, 0x00, 12, 1);
+
+/* cmd_mbox_sw2hw_eq_int_st
+ * Event delivery state machine
+ * 0x0 - FIRED
+ * 0x1 - ARMED (Request for Notification)
+ * 0x11 - Always ARMED
+ * other - reserved
+ */
+MLXSW_ITEM32(cmd_mbox, sw2hw_eq, st, 0x00, 8, 2);
+
+/* cmd_mbox_sw2hw_eq_int_log_eq_size
+ * Log (base 2) of the EQ size (in entries).
+ */
+MLXSW_ITEM32(cmd_mbox, sw2hw_eq, log_eq_size, 0x00, 0, 4);
+
+/* cmd_mbox_sw2hw_eq_int_producer_counter
+ * Producer Counter. The counter is incremented for each EQE that is written
+ * by the HW to the EQ.
+ * Maintained by HW (valid for the QUERY_EQ command only)
+ */
+MLXSW_ITEM32(cmd_mbox, sw2hw_eq, producer_counter, 0x04, 0, 16);
+
+/* cmd_mbox_sw2hw_eq_int_pa
+ * Physical Address.
+ */
+MLXSW_ITEM64_INDEXED(cmd_mbox, sw2hw_eq, pa, 0x10, 11, 53, 0x08, 0x00, true);
+
+/* HW2SW_EQ - Hardware to Software EQ
+ * ----------------------------------
+ * OpMod == 0 (N/A), INMmod == EQ number
+ * -------------------------------------
+ */
+
+static inline int mlxsw_cmd_hw2sw_eq(struct mlxsw_core *mlxsw_core,
+				     u32 eq_number)
+{
+	return mlxsw_cmd_exec_none(mlxsw_core, MLXSW_CMD_OPCODE_HW2SW_EQ,
+				   0, eq_number);
+}
+
+/* QUERY_EQ - Query EQ
+ * ----------------------------------
+ * OpMod == 0 (N/A), INMmod == EQ number
+ * -------------------------------------
+ *
+ * Note: Output mailbox has the same format as SW2HW_EQ.
+ */
+
+static inline int mlxsw_cmd_query_eq(struct mlxsw_core *mlxsw_core,
+				     char *out_mbox, u32 eq_number)
+{
+	return mlxsw_cmd_exec_out(mlxsw_core, MLXSW_CMD_OPCODE_QUERY_EQ,
+				  0, eq_number, false,
+				  out_mbox, MLXSW_CMD_MBOX_SIZE);
+}
+
+#endif
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
new file mode 100644
index 0000000..211ec9b
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -0,0 +1,551 @@
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/core.c
+ * Copyright (c) 2015 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2015 Jiri Pirko <jiri@mellanox.com>
+ * Copyright (c) 2015 Ido Schimmel <idosch@mellanox.com>
+ * Copyright (c) 2015 Elad Raz <eladr@mellanox.com>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/export.h>
+#include <linux/err.h>
+#include <linux/if_link.h>
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+#include <linux/u64_stats_sync.h>
+#include <linux/netdevice.h>
+
+#include "core.h"
+#include "item.h"
+#include "cmd.h"
+#include "port.h"
+#include "trap.h"
+
+static LIST_HEAD(mlxsw_core_driver_list);
+static DEFINE_SPINLOCK(mlxsw_core_driver_list_lock);
+
+static const char mlxsw_core_driver_name[] = "mlxsw_core";
+
+static struct dentry *mlxsw_core_dbg_root;
+
+struct mlxsw_core_pcpu_stats {
+	u64			trap_rx_packets[MLXSW_TRAP_ID_MAX];
+	u64			trap_rx_bytes[MLXSW_TRAP_ID_MAX];
+	u64			port_rx_packets[MLXSW_PORT_MAX_PORTS];
+	u64			port_rx_bytes[MLXSW_PORT_MAX_PORTS];
+	struct u64_stats_sync	syncp;
+	u32			trap_rx_dropped[MLXSW_TRAP_ID_MAX];
+	u32			port_rx_dropped[MLXSW_PORT_MAX_PORTS];
+	u32			trap_rx_invalid;
+	u32			port_rx_invalid;
+};
+
+struct mlxsw_core {
+	struct mlxsw_driver *driver;
+	const struct mlxsw_bus *bus;
+	void *bus_priv;
+	const struct mlxsw_bus_info *bus_info;
+	struct list_head rx_listener_list;
+	struct mlxsw_core_pcpu_stats __percpu *pcpu_stats;
+	struct dentry *dbg_dir;
+	struct {
+		struct debugfs_blob_wrapper vsd_blob;
+		struct debugfs_blob_wrapper psid_blob;
+	} dbg;
+	unsigned long driver_priv[0];
+	/* driver_priv has to be always the last item */
+};
+
+struct mlxsw_rx_listener_item {
+	struct list_head list;
+	struct mlxsw_rx_listener rxl;
+	void *priv;
+};
+
+/*****************
+ * Core functions
+ *****************/
+
+static int mlxsw_core_rx_stats_dbg_read(struct seq_file *file, void *data)
+{
+	struct mlxsw_core *mlxsw_core = file->private;
+	struct mlxsw_core_pcpu_stats *p;
+	u64 rx_packets, rx_bytes;
+	u64 tmp_rx_packets, tmp_rx_bytes;
+	u32 rx_dropped, rx_invalid;
+	unsigned int start;
+	int i;
+	int j;
+	static const char hdr[] =
+		"     NUM   RX_PACKETS     RX_BYTES RX_DROPPED\n";
+
+	seq_printf(file, hdr);
+	for (i = 0; i < MLXSW_TRAP_ID_MAX; i++) {
+		rx_packets = 0;
+		rx_bytes = 0;
+		rx_dropped = 0;
+		for_each_possible_cpu(j) {
+			p = per_cpu_ptr(mlxsw_core->pcpu_stats, j);
+			do {
+				start = u64_stats_fetch_begin(&p->syncp);
+				tmp_rx_packets = p->trap_rx_packets[i];
+				tmp_rx_bytes = p->trap_rx_bytes[i];
+			} while (u64_stats_fetch_retry(&p->syncp, start));
+
+			rx_packets += tmp_rx_packets;
+			rx_bytes += tmp_rx_bytes;
+			rx_dropped += p->trap_rx_dropped[i];
+		}
+		seq_printf(file, "trap %3d %12llu %12llu %10u\n",
+			   i, rx_packets, rx_bytes, rx_dropped);
+	}
+	rx_invalid = 0;
+	for_each_possible_cpu(j) {
+		p = per_cpu_ptr(mlxsw_core->pcpu_stats, j);
+		rx_invalid += p->trap_rx_invalid;
+	}
+	seq_printf(file, "trap INV                           %10u\n",
+		   rx_invalid);
+
+	for (i = 0; i < MLXSW_PORT_MAX_PORTS; i++) {
+		rx_packets = 0;
+		rx_bytes = 0;
+		rx_dropped = 0;
+		for_each_possible_cpu(j) {
+			p = per_cpu_ptr(mlxsw_core->pcpu_stats, j);
+			do {
+				start = u64_stats_fetch_begin(&p->syncp);
+				tmp_rx_packets = p->port_rx_packets[i];
+				tmp_rx_bytes = p->port_rx_bytes[i];
+			} while (u64_stats_fetch_retry(&p->syncp, start));
+
+			rx_packets += tmp_rx_packets;
+			rx_bytes += tmp_rx_bytes;
+			rx_dropped += p->port_rx_dropped[i];
+		}
+		seq_printf(file, "port %3d %12llu %12llu %10u\n",
+			   i, rx_packets, rx_bytes, rx_dropped);
+	}
+	rx_invalid = 0;
+	for_each_possible_cpu(j) {
+		p = per_cpu_ptr(mlxsw_core->pcpu_stats, j);
+		rx_invalid += p->port_rx_invalid;
+	}
+	seq_printf(file, "port INV                           %10u\n",
+		   rx_invalid);
+	return 0;
+}
+
+static int mlxsw_core_rx_stats_dbg_open(struct inode *inode, struct file *f)
+{
+	struct mlxsw_core *mlxsw_core = inode->i_private;
+
+	return single_open(f, mlxsw_core_rx_stats_dbg_read, mlxsw_core);
+}
+
+static const struct file_operations mlxsw_core_rx_stats_dbg_ops = {
+	.owner = THIS_MODULE,
+	.open = mlxsw_core_rx_stats_dbg_open,
+	.release = single_release,
+	.read = seq_read,
+	.llseek = seq_lseek
+};
+
+static void mlxsw_core_buf_dump_dbg(struct mlxsw_core *mlxsw_core,
+				    const char *buf, size_t size)
+{
+	__be32 *m = (__be32 *) buf;
+	int i;
+	int count = size / sizeof(__be32);
+
+	for (i = count - 1; i >= 0; i--)
+		if (m[i])
+			break;
+	i++;
+	count = i ? i : 1;
+	for (i = 0; i < count; i += 4)
+		dev_dbg(mlxsw_core->bus_info->dev, "%04x - %08x %08x %08x %08x\n",
+			i * 4, be32_to_cpu(m[i]), be32_to_cpu(m[i + 1]),
+			be32_to_cpu(m[i + 2]), be32_to_cpu(m[i + 3]));
+}
+
+int mlxsw_core_driver_register(struct mlxsw_driver *mlxsw_driver)
+{
+	spin_lock(&mlxsw_core_driver_list_lock);
+	list_add_tail(&mlxsw_driver->list, &mlxsw_core_driver_list);
+	spin_unlock(&mlxsw_core_driver_list_lock);
+	return 0;
+}
+EXPORT_SYMBOL(mlxsw_core_driver_register);
+
+void mlxsw_core_driver_unregister(struct mlxsw_driver *mlxsw_driver)
+{
+	spin_lock(&mlxsw_core_driver_list_lock);
+	list_del(&mlxsw_driver->list);
+	spin_unlock(&mlxsw_core_driver_list_lock);
+}
+EXPORT_SYMBOL(mlxsw_core_driver_unregister);
+
+static struct mlxsw_driver *__driver_find(const char *kind)
+{
+	struct mlxsw_driver *mlxsw_driver;
+
+	list_for_each_entry(mlxsw_driver, &mlxsw_core_driver_list, list) {
+		if (strcmp(mlxsw_driver->kind, kind) == 0)
+			return mlxsw_driver;
+	}
+	return NULL;
+}
+
+static struct mlxsw_driver *mlxsw_core_driver_get(const char *kind)
+{
+	struct mlxsw_driver *mlxsw_driver;
+
+	spin_lock(&mlxsw_core_driver_list_lock);
+	mlxsw_driver = __driver_find(kind);
+	if (!mlxsw_driver) {
+		spin_unlock(&mlxsw_core_driver_list_lock);
+		request_module(MLXSW_MODULE_ALIAS_PREFIX "%s", kind);
+		spin_lock(&mlxsw_core_driver_list_lock);
+		mlxsw_driver = __driver_find(kind);
+	}
+	if (mlxsw_driver) {
+		if (!try_module_get(mlxsw_driver->owner))
+			mlxsw_driver = NULL;
+	}
+
+	spin_unlock(&mlxsw_core_driver_list_lock);
+	return mlxsw_driver;
+}
+
+static void mlxsw_core_driver_put(const char *kind)
+{
+	struct mlxsw_driver *mlxsw_driver;
+
+	spin_lock(&mlxsw_core_driver_list_lock);
+	mlxsw_driver = __driver_find(kind);
+	spin_unlock(&mlxsw_core_driver_list_lock);
+	if (!mlxsw_driver)
+		return;
+	module_put(mlxsw_driver->owner);
+}
+
+static int mlxsw_core_debugfs_init(struct mlxsw_core *mlxsw_core)
+{
+	const struct mlxsw_bus_info *bus_info = mlxsw_core->bus_info;
+
+	mlxsw_core->dbg_dir = debugfs_create_dir(bus_info->device_name,
+						 mlxsw_core_dbg_root);
+	if (!mlxsw_core->dbg_dir)
+		return -ENOMEM;
+	debugfs_create_file("rx_stats", S_IRUGO, mlxsw_core->dbg_dir,
+			    mlxsw_core, &mlxsw_core_rx_stats_dbg_ops);
+	mlxsw_core->dbg.vsd_blob.data = (void *) &bus_info->vsd;
+	mlxsw_core->dbg.vsd_blob.size = sizeof(bus_info->vsd);
+	debugfs_create_blob("vsd", S_IRUGO, mlxsw_core->dbg_dir,
+			    &mlxsw_core->dbg.vsd_blob);
+	mlxsw_core->dbg.psid_blob.data = (void *) &bus_info->psid;
+	mlxsw_core->dbg.psid_blob.size = sizeof(bus_info->psid);
+	debugfs_create_blob("psid", S_IRUGO, mlxsw_core->dbg_dir,
+			    &mlxsw_core->dbg.psid_blob);
+	return 0;
+}
+
+static void mlxsw_core_debugfs_fini(struct mlxsw_core *mlxsw_core)
+{
+	debugfs_remove_recursive(mlxsw_core->dbg_dir);
+}
+
+int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
+				   const struct mlxsw_bus *mlxsw_bus,
+				   void *bus_priv)
+{
+	const char *device_kind = mlxsw_bus_info->device_kind;
+	struct mlxsw_core *mlxsw_core;
+	struct mlxsw_driver *mlxsw_driver;
+	size_t alloc_size;
+	int err;
+
+	mlxsw_driver = mlxsw_core_driver_get(device_kind);
+	if (!mlxsw_driver)
+		return -EINVAL;
+	alloc_size = sizeof(*mlxsw_core) + mlxsw_driver->priv_size;
+	mlxsw_core = kzalloc(alloc_size, GFP_KERNEL);
+	if (!mlxsw_core) {
+		err = -ENOMEM;
+		goto err_core_alloc;
+	}
+
+	INIT_LIST_HEAD(&mlxsw_core->rx_listener_list);
+	mlxsw_core->driver = mlxsw_driver;
+	mlxsw_core->bus = mlxsw_bus;
+	mlxsw_core->bus_priv = bus_priv;
+	mlxsw_core->bus_info = mlxsw_bus_info;
+
+	mlxsw_core->pcpu_stats =
+		netdev_alloc_pcpu_stats(struct mlxsw_core_pcpu_stats);
+	if (!mlxsw_core->pcpu_stats) {
+		err = -ENOMEM;
+		goto err_alloc_stats;
+	}
+	err = mlxsw_bus->init(bus_priv, mlxsw_core, mlxsw_driver->profile);
+	if (err)
+		goto err_bus_init;
+
+	err = mlxsw_driver->init(mlxsw_core->driver_priv, mlxsw_core,
+				 mlxsw_bus_info);
+	if (err)
+		goto err_driver_init;
+
+	err = mlxsw_core_debugfs_init(mlxsw_core);
+	if (err)
+		goto err_debugfs_init;
+
+	return 0;
+
+err_debugfs_init:
+	mlxsw_core->driver->fini(mlxsw_core->driver_priv);
+err_driver_init:
+	mlxsw_bus->fini(bus_priv);
+err_bus_init:
+	free_percpu(mlxsw_core->pcpu_stats);
+err_alloc_stats:
+	kfree(mlxsw_core);
+err_core_alloc:
+	mlxsw_core_driver_put(device_kind);
+	return err;
+}
+EXPORT_SYMBOL(mlxsw_core_bus_device_register);
+
+void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core)
+{
+	const char *device_kind = mlxsw_core->bus_info->device_kind;
+
+	mlxsw_core_debugfs_fini(mlxsw_core);
+	mlxsw_core->driver->fini(mlxsw_core->driver_priv);
+	mlxsw_core->bus->fini(mlxsw_core->bus_priv);
+	free_percpu(mlxsw_core->pcpu_stats);
+	kfree(mlxsw_core);
+	mlxsw_core_driver_put(device_kind);
+}
+EXPORT_SYMBOL(mlxsw_core_bus_device_unregister);
+
+static struct mlxsw_core *__mlxsw_core_get(void *driver_priv)
+{
+	return container_of(driver_priv, struct mlxsw_core, driver_priv);
+}
+
+int mlxsw_core_skb_transmit(void *driver_priv, struct sk_buff *skb,
+			    const struct mlxsw_tx_info *tx_info)
+{
+	struct mlxsw_core *mlxsw_core = __mlxsw_core_get(driver_priv);
+
+	return mlxsw_core->bus->skb_transmit(mlxsw_core->bus_priv, skb,
+					     tx_info);
+}
+EXPORT_SYMBOL(mlxsw_core_skb_transmit);
+
+static bool __is_rx_listener_equal(const struct mlxsw_rx_listener *rxl_a,
+				   const struct mlxsw_rx_listener *rxl_b)
+{
+	return (rxl_a->func == rxl_b->func &&
+		rxl_a->local_port == rxl_b->local_port &&
+		rxl_a->trap_id == rxl_b->trap_id);
+}
+
+static struct mlxsw_rx_listener_item *
+__find_rx_listener_item(struct mlxsw_core *mlxsw_core,
+			const struct mlxsw_rx_listener *rxl,
+			void *priv)
+{
+	struct mlxsw_rx_listener_item *rxl_item;
+
+	list_for_each_entry(rxl_item, &mlxsw_core->rx_listener_list, list) {
+		if (__is_rx_listener_equal(&rxl_item->rxl, rxl) &&
+		    rxl_item->priv == priv)
+			return rxl_item;
+	}
+	return NULL;
+}
+
+int mlxsw_core_rx_listener_register(struct mlxsw_core *mlxsw_core,
+				    const struct mlxsw_rx_listener *rxl,
+				    void *priv)
+{
+	struct mlxsw_rx_listener_item *rxl_item;
+
+	rxl_item = __find_rx_listener_item(mlxsw_core, rxl, priv);
+	if (rxl_item)
+		return -EEXIST;
+	rxl_item = kmalloc(sizeof(*rxl_item), GFP_KERNEL);
+	if (!rxl_item)
+		return -ENOMEM;
+	rxl_item->rxl = *rxl;
+	rxl_item->priv = priv;
+
+	list_add_rcu(&rxl_item->list, &mlxsw_core->rx_listener_list);
+	return 0;
+}
+EXPORT_SYMBOL(mlxsw_core_rx_listener_register);
+
+void mlxsw_core_rx_listener_unregister(struct mlxsw_core *mlxsw_core,
+				       const struct mlxsw_rx_listener *rxl,
+				       void *priv)
+{
+	struct mlxsw_rx_listener_item *rxl_item;
+
+	rxl_item = __find_rx_listener_item(mlxsw_core, rxl, priv);
+	if (!rxl_item)
+		return;
+	list_del_rcu(&rxl_item->list);
+	synchronize_rcu();
+	kfree(rxl_item);
+}
+EXPORT_SYMBOL(mlxsw_core_rx_listener_unregister);
+
+void mlxsw_core_skb_receive(struct mlxsw_core *mlxsw_core, struct sk_buff *skb,
+			    struct mlxsw_rx_info *rx_info)
+{
+	struct mlxsw_rx_listener_item *rxl_item;
+	const struct mlxsw_rx_listener *rxl;
+	struct mlxsw_core_pcpu_stats *pcpu_stats;
+	u8 local_port = rx_info->sys_port;
+	bool found = false;
+
+	dev_dbg_ratelimited(mlxsw_core->bus_info->dev, "%s: sys_port = %d, trap_id = 0x%x\n",
+			    __func__, rx_info->sys_port, rx_info->trap_id);
+
+	if ((rx_info->trap_id >= MLXSW_TRAP_ID_MAX) ||
+	    (local_port >= MLXSW_PORT_MAX_PORTS))
+		goto drop;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(rxl_item, &mlxsw_core->rx_listener_list, list) {
+		rxl = &rxl_item->rxl;
+		if (((rxl->local_port == MLXSW_PORT_MAX_PORTS) ||
+		     (rxl->local_port == local_port)) &&
+		    ((rxl->trap_id == MLXSW_TRAP_ID_DONT_CARE) ||
+		     (rxl->trap_id == rx_info->trap_id))) {
+			found = true;
+			break;
+		}
+	}
+	rcu_read_unlock();
+	if (!found)
+		goto drop;
+
+	pcpu_stats = this_cpu_ptr(mlxsw_core->pcpu_stats);
+	u64_stats_update_begin(&pcpu_stats->syncp);
+	pcpu_stats->port_rx_packets[local_port]++;
+	pcpu_stats->port_rx_bytes[local_port] += skb->len;
+	pcpu_stats->trap_rx_packets[rx_info->trap_id]++;
+	pcpu_stats->trap_rx_bytes[rx_info->trap_id] += skb->len;
+	u64_stats_update_end(&pcpu_stats->syncp);
+
+	rxl->func(skb, local_port, rx_info->trap_id, rxl_item->priv);
+	return;
+
+drop:
+	if (rx_info->trap_id >= MLXSW_TRAP_ID_MAX)
+		this_cpu_inc(mlxsw_core->pcpu_stats->trap_rx_invalid);
+	else
+		this_cpu_inc(mlxsw_core->pcpu_stats->trap_rx_dropped[rx_info->trap_id]);
+	if (local_port >= MLXSW_PORT_MAX_PORTS)
+		this_cpu_inc(mlxsw_core->pcpu_stats->port_rx_invalid);
+	else
+		this_cpu_inc(mlxsw_core->pcpu_stats->port_rx_dropped[local_port]);
+	dev_kfree_skb(skb);
+}
+EXPORT_SYMBOL(mlxsw_core_skb_receive);
+
+int mlxsw_cmd_exec(struct mlxsw_core *mlxsw_core, u16 opcode, u8 opcode_mod,
+		   u32 in_mod, bool out_mbox_direct,
+		   char *in_mbox, size_t in_mbox_size,
+		   char *out_mbox, size_t out_mbox_size)
+{
+	u8 status;
+	int err;
+
+	BUG_ON(in_mbox_size % sizeof(u32) || out_mbox_size % sizeof(u32));
+	if (!mlxsw_core->bus->cmd_exec)
+		return -EOPNOTSUPP;
+
+	dev_dbg(mlxsw_core->bus_info->dev, "Cmd exec (opcode=%x(%s),opcode_mod=%x,in_mod=%x)\n",
+		opcode, mlxsw_cmd_opcode_str(opcode), opcode_mod, in_mod);
+	if (in_mbox) {
+		dev_dbg(mlxsw_core->bus_info->dev, "Input mailbox:\n");
+		mlxsw_core_buf_dump_dbg(mlxsw_core, in_mbox, in_mbox_size);
+	}
+
+	err = mlxsw_core->bus->cmd_exec(mlxsw_core->bus_priv, opcode,
+					opcode_mod, in_mod, out_mbox_direct,
+					in_mbox, in_mbox_size,
+					out_mbox, out_mbox_size, &status);
+
+	if (err == -EIO && status != MLXSW_CMD_STATUS_OK) {
+		dev_err(mlxsw_core->bus_info->dev, "Cmd exec failed (opcode=%x(%s),opcode_mod=%x,in_mod=%x,status=%x(%s))\n",
+			opcode, mlxsw_cmd_opcode_str(opcode), opcode_mod,
+			in_mod, status, mlxsw_cmd_status_str(status));
+	} else if (err == -ETIMEDOUT) {
+		dev_err(mlxsw_core->bus_info->dev, "Cmd exec timed-out (opcode=%x(%s),opcode_mod=%x,in_mod=%x)\n",
+			opcode, mlxsw_cmd_opcode_str(opcode), opcode_mod,
+			in_mod);
+	}
+
+	if (!err && out_mbox) {
+		dev_dbg(mlxsw_core->bus_info->dev, "Output mailbox:\n");
+		mlxsw_core_buf_dump_dbg(mlxsw_core, out_mbox, out_mbox_size);
+	}
+	return err;
+}
+EXPORT_SYMBOL(mlxsw_cmd_exec);
+
+static int __init mlxsw_core_module_init(void)
+{
+	mlxsw_core_dbg_root = debugfs_create_dir(mlxsw_core_driver_name, NULL);
+	if (!mlxsw_core_dbg_root)
+		return -ENOMEM;
+	return 0;
+}
+
+static void __exit mlxsw_core_module_exit(void)
+{
+	debugfs_remove_recursive(mlxsw_core_dbg_root);
+}
+
+module_init(mlxsw_core_module_init);
+module_exit(mlxsw_core_module_exit);
+
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_AUTHOR("Jiri Pirko <jiri@mellanox.com>");
+MODULE_DESCRIPTION("Mellanox switch device core driver");
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
new file mode 100644
index 0000000..8033408
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -0,0 +1,180 @@
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/core.h
+ * Copyright (c) 2015 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2015 Jiri Pirko <jiri@mellanox.com>
+ * Copyright (c) 2015 Ido Schimmel <idosch@mellanox.com>
+ * Copyright (c) 2015 Elad Raz <eladr@mellanox.com>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _MLXSW_CORE_H
+#define _MLXSW_CORE_H
+
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/slab.h>
+#include <linux/gfp.h>
+#include <linux/types.h>
+#include <linux/skbuff.h>
+
+#include "cmd.h"
+
+#define MLXSW_MODULE_ALIAS_PREFIX "mlxsw-driver-"
+#define MODULE_MLXSW_DRIVER_ALIAS(kind)	\
+	MODULE_ALIAS(MLXSW_MODULE_ALIAS_PREFIX kind)
+
+struct mlxsw_core;
+struct mlxsw_driver;
+struct mlxsw_bus;
+struct mlxsw_bus_info;
+
+int mlxsw_core_driver_register(struct mlxsw_driver *mlxsw_driver);
+void mlxsw_core_driver_unregister(struct mlxsw_driver *mlxsw_driver);
+
+int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
+				   const struct mlxsw_bus *mlxsw_bus,
+				   void *bus_priv);
+void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core);
+
+struct mlxsw_tx_info {
+	u8 local_port;
+	bool is_emad;
+};
+
+int mlxsw_core_skb_transmit(void *driver_priv, struct sk_buff *skb,
+			    const struct mlxsw_tx_info *tx_info);
+
+struct mlxsw_rx_listener {
+	void (*func)(struct sk_buff *skb, u8 local_port,
+		     u16 trap_id, void *priv);
+	u8 local_port;
+	u16 trap_id;
+};
+
+int mlxsw_core_rx_listener_register(struct mlxsw_core *mlxsw_core,
+				    const struct mlxsw_rx_listener *rxl,
+				    void *priv);
+void mlxsw_core_rx_listener_unregister(struct mlxsw_core *mlxsw_core,
+				       const struct mlxsw_rx_listener *rxl,
+				       void *priv);
+
+struct mlxsw_rx_info {
+	u16 sys_port;
+	int trap_id;
+};
+
+void mlxsw_core_skb_receive(struct mlxsw_core *mlxsw_core, struct sk_buff *skb,
+			    struct mlxsw_rx_info *rx_info);
+
+#define MLXSW_CONFIG_PROFILE_SWID_COUNT 8
+
+struct mlxsw_swid_config {
+	u8	used_type:1,
+		used_properties:1;
+	u8	type;
+	u8	properties;
+};
+
+struct mlxsw_config_profile {
+	u16	used_max_vepa_channels:1,
+		used_max_lag:1,
+		used_max_port_per_lag:1,
+		used_max_mid:1,
+		used_max_pgt:1,
+		used_max_system_port:1,
+		used_max_vlan_groups:1,
+		used_max_regions:1,
+		used_flood_tables:1,
+		used_flood_mode:1,
+		used_max_ib_mc:1,
+		used_max_pkey:1,
+		used_ar_sec:1,
+		used_adaptive_routing_group_cap:1;
+	u8	max_vepa_channels;
+	u16	max_lag;
+	u16	max_port_per_lag;
+	u16	max_mid;
+	u16	max_pgt;
+	u16	max_system_port;
+	u16	max_vlan_groups;
+	u16	max_regions;
+	u8	max_flood_tables;
+	u8	max_vid_flood_tables;
+	u8	flood_mode;
+	u16	max_ib_mc;
+	u16	max_pkey;
+	u8	ar_sec;
+	u16	adaptive_routing_group_cap;
+	u8	arn;
+	struct mlxsw_swid_config swid_config[MLXSW_CONFIG_PROFILE_SWID_COUNT];
+};
+
+struct mlxsw_driver {
+	struct list_head list;
+	const char *kind;
+	struct module *owner;
+	size_t priv_size;
+	int (*init)(void *driver_priv, struct mlxsw_core *mlxsw_core,
+		    const struct mlxsw_bus_info *mlxsw_bus_info);
+	void (*fini)(void *driver_priv);
+	void (*txhdr_construct)(struct sk_buff *skb,
+				const struct mlxsw_tx_info *tx_info);
+	u8 txhdr_len;
+	const struct mlxsw_config_profile *profile;
+};
+
+struct mlxsw_bus {
+	const char *kind;
+	int (*init)(void *bus_priv, struct mlxsw_core *mlxsw_core,
+		    const struct mlxsw_config_profile *profile);
+	void (*fini)(void *bus_priv);
+	int (*skb_transmit)(void *bus_priv, struct sk_buff *skb,
+			    const struct mlxsw_tx_info *tx_info);
+	int (*cmd_exec)(void *bus_priv, u16 opcode, u8 opcode_mod,
+			u32 in_mod, bool out_mbox_direct,
+			char *in_mbox, size_t in_mbox_size,
+			char *out_mbox, size_t out_mbox_size,
+			u8 *p_status);
+};
+
+struct mlxsw_bus_info {
+	const char *device_kind;
+	const char *device_name;
+	struct device *dev;
+	struct {
+		u16 major;
+		u16 minor;
+		u16 subminor;
+	} fw_rev;
+	u8 vsd[MLXSW_CMD_BOARDINFO_VSD_LEN];
+	u8 psid[MLXSW_CMD_BOARDINFO_PSID_LEN];
+};
+
+#endif
diff --git a/drivers/net/ethernet/mellanox/mlxsw/item.h b/drivers/net/ethernet/mellanox/mlxsw/item.h
new file mode 100644
index 0000000..4d0ac88
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/item.h
@@ -0,0 +1,405 @@
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/item.h
+ * Copyright (c) 2015 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2015 Jiri Pirko <jiri@mellanox.com>
+ * Copyright (c) 2015 Ido Schimmel <idosch@mellanox.com>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _MLXSW_ITEM_H
+#define _MLXSW_ITEM_H
+
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/bitops.h>
+
+struct mlxsw_item {
+	unsigned short	offset;		/* bytes in container */
+	unsigned short	step;		/* step in bytes for indexed items */
+	unsigned short	in_step_offset; /* offset within one step */
+	unsigned char	shift;		/* shift in bits */
+	unsigned char	element_size;	/* size of element in bit array */
+	bool		no_real_shift;
+	union {
+		unsigned char	bits;
+		unsigned short	bytes;
+	} size;
+	const char	*name;
+};
+
+static inline unsigned int
+__mlxsw_item_offset(struct mlxsw_item *item, unsigned short index,
+		    size_t typesize)
+{
+	BUG_ON(index && !item->step);
+	if (item->offset % typesize != 0 ||
+	    item->step % typesize != 0 ||
+	    item->in_step_offset % typesize != 0) {
+		pr_err("mlxsw: item bug (name=%s,offset=%x,step=%x,in_step_offset=%x,typesize=%lx)\n",
+		       item->name, item->offset, item->step,
+		       item->in_step_offset, typesize);
+		BUG();
+	}
+
+	return ((item->offset + item->step * index + item->in_step_offset) /
+		typesize);
+}
+
+static inline u16 __mlxsw_item_get16(char *buf, struct mlxsw_item *item,
+				     unsigned short index)
+{
+	unsigned int offset = __mlxsw_item_offset(item, index, sizeof(u16));
+	__be16 *b = (__be16 *) buf;
+	u16 tmp;
+
+	tmp = be16_to_cpu(b[offset]);
+	tmp >>= item->shift;
+	tmp &= GENMASK(item->size.bits - 1, 0);
+	if (item->no_real_shift)
+		tmp <<= item->shift;
+	return tmp;
+}
+
+static inline void __mlxsw_item_set16(char *buf, struct mlxsw_item *item,
+				      unsigned short index, u16 val)
+{
+	unsigned int offset = __mlxsw_item_offset(item, index,
+						  sizeof(u16));
+	__be16 *b = (__be16 *) buf;
+	u16 mask = GENMASK(item->size.bits - 1, 0) << item->shift;
+	u16 tmp;
+
+	if (!item->no_real_shift)
+		val <<= item->shift;
+	val &= mask;
+	tmp = be16_to_cpu(b[offset]);
+	tmp &= ~mask;
+	tmp |= val;
+	b[offset] = cpu_to_be16(tmp);
+}
+
+static inline u32 __mlxsw_item_get32(char *buf, struct mlxsw_item *item,
+				     unsigned short index)
+{
+	unsigned int offset = __mlxsw_item_offset(item, index, sizeof(u32));
+	__be32 *b = (__be32 *) buf;
+	u32 tmp;
+
+	tmp = be32_to_cpu(b[offset]);
+	tmp >>= item->shift;
+	tmp &= GENMASK(item->size.bits - 1, 0);
+	if (item->no_real_shift)
+		tmp <<= item->shift;
+	return tmp;
+}
+
+static inline void __mlxsw_item_set32(char *buf, struct mlxsw_item *item,
+				      unsigned short index, u32 val)
+{
+	unsigned int offset = __mlxsw_item_offset(item, index,
+						  sizeof(u32));
+	__be32 *b = (__be32 *) buf;
+	u32 mask = GENMASK(item->size.bits - 1, 0) << item->shift;
+	u32 tmp;
+
+	if (!item->no_real_shift)
+		val <<= item->shift;
+	val &= mask;
+	tmp = be32_to_cpu(b[offset]);
+	tmp &= ~mask;
+	tmp |= val;
+	b[offset] = cpu_to_be32(tmp);
+}
+
+static inline u64 __mlxsw_item_get64(char *buf, struct mlxsw_item *item,
+				     unsigned short index)
+{
+	unsigned int offset = __mlxsw_item_offset(item, index, sizeof(u64));
+	__be64 *b = (__be64 *) buf;
+	u64 tmp;
+
+	tmp = be64_to_cpu(b[offset]);
+	tmp >>= item->shift;
+	tmp &= GENMASK_ULL(item->size.bits - 1, 0);
+	if (item->no_real_shift)
+		tmp <<= item->shift;
+	return tmp;
+}
+
+static inline void __mlxsw_item_set64(char *buf, struct mlxsw_item *item,
+				      unsigned short index, u64 val)
+{
+	unsigned int offset = __mlxsw_item_offset(item, index, sizeof(u64));
+	__be64 *b = (__be64 *) buf;
+	u64 mask = GENMASK_ULL(item->size.bits - 1, 0) << item->shift;
+	u64 tmp;
+
+	if (!item->no_real_shift)
+		val <<= item->shift;
+	val &= mask;
+	tmp = be64_to_cpu(b[offset]);
+	tmp &= ~mask;
+	tmp |= val;
+	b[offset] = cpu_to_be64(tmp);
+}
+
+static inline void __mlxsw_item_memcpy_from(char *buf, char *dst,
+					    struct mlxsw_item *item)
+{
+	memcpy(dst, &buf[item->offset], item->size.bytes);
+}
+
+static inline void __mlxsw_item_memcpy_to(char *buf, char *src,
+					  struct mlxsw_item *item)
+{
+	memcpy(&buf[item->offset], src, item->size.bytes);
+}
+
+static inline u16
+__mlxsw_item_bit_array_offset(struct mlxsw_item *item, u16 index, u8 *shift)
+{
+	u16 max_index, be_index;
+	u16 offset;		/* byte offset inside the array */
+
+	BUG_ON(index && !item->element_size);
+	if (item->offset % sizeof(u32) != 0 ||
+	    BITS_PER_BYTE % item->element_size != 0) {
+		pr_err("mlxsw: item bug (name=%s,offset=%x,element_size=%x)\n",
+		       item->name, item->offset, item->element_size);
+		BUG();
+	}
+
+	max_index = (item->size.bytes << 3) / item->element_size - 1;
+	be_index = max_index - index;
+	offset = be_index * item->element_size >> 3;
+	*shift = index % (BITS_PER_BYTE / item->element_size) << 1;
+
+	return item->offset + offset;
+}
+
+static inline u8 __mlxsw_item_bit_array_get(char *buf, struct mlxsw_item *item,
+					    u16 index)
+{
+	u8 shift, tmp;
+	u16 offset = __mlxsw_item_bit_array_offset(item, index, &shift);
+
+	tmp = buf[offset];
+	tmp >>= shift;
+	tmp &= GENMASK(item->element_size - 1, 0);
+	return tmp;
+}
+
+static inline void __mlxsw_item_bit_array_set(char *buf, struct mlxsw_item *item,
+					      u16 index, u8 val)
+{
+	u8 shift, tmp;
+	u16 offset = __mlxsw_item_bit_array_offset(item, index, &shift);
+	u8 mask = GENMASK(item->element_size - 1, 0) << shift;
+
+	val <<= shift;
+	val &= mask;
+	tmp = buf[offset];
+	tmp &= ~mask;
+	tmp |= val;
+	buf[offset] = tmp;
+}
+
+#define __ITEM_NAME(_type, _cname, _iname)					\
+	mlxsw_##_type##_##_cname##_##_iname##_item
+
+/* _type: cmd_mbox, reg, etc.
+ * _cname: containter name (e.g. command name, register name)
+ * _iname: item name within the container
+ */
+
+#define MLXSW_ITEM16(_type, _cname, _iname, _offset, _shift, _sizebits)		\
+static struct mlxsw_item __ITEM_NAME(_type, _cname, _iname) = {			\
+	.offset = _offset,							\
+	.shift = _shift,							\
+	.size = {.bits = _sizebits,},						\
+	.name = #_type "_" #_cname "_" #_iname,					\
+};										\
+static inline u16 mlxsw_##_type##_##_cname##_##_iname##_get(char *buf)		\
+{										\
+	return __mlxsw_item_get16(buf, &__ITEM_NAME(_type, _cname, _iname), 0);	\
+}										\
+static inline void mlxsw_##_type##_##_cname##_##_iname##_set(char *buf, u16 val)\
+{										\
+	__mlxsw_item_set16(buf, &__ITEM_NAME(_type, _cname, _iname), 0, val);	\
+}
+
+#define MLXSW_ITEM16_INDEXED(_type, _cname, _iname, _offset, _shift, _sizebits,	\
+			     _step, _instepoffset, _norealshift)		\
+static struct mlxsw_item __ITEM_NAME(_type, _cname, _iname) = {			\
+	.offset = _offset,							\
+	.step = _step,								\
+	.in_step_offset = _instepoffset,					\
+	.shift = _shift,							\
+	.no_real_shift = _norealshift,						\
+	.size = {.bits = _sizebits,},						\
+	.name = #_type "_" #_cname "_" #_iname,					\
+};										\
+static inline u16								\
+mlxsw_##_type##_##_cname##_##_iname##_get(char *buf, unsigned short index)	\
+{										\
+	return __mlxsw_item_get16(buf, &__ITEM_NAME(_type, _cname, _iname),	\
+				  index);					\
+}										\
+static inline void								\
+mlxsw_##_type##_##_cname##_##_iname##_set(char *buf, unsigned short index,	\
+					  u16 val)				\
+{										\
+	__mlxsw_item_set16(buf, &__ITEM_NAME(_type, _cname, _iname),		\
+			   index, val);						\
+}
+
+#define MLXSW_ITEM32(_type, _cname, _iname, _offset, _shift, _sizebits)		\
+static struct mlxsw_item __ITEM_NAME(_type, _cname, _iname) = {			\
+	.offset = _offset,							\
+	.shift = _shift,							\
+	.size = {.bits = _sizebits,},						\
+	.name = #_type "_" #_cname "_" #_iname,					\
+};										\
+static inline u32 mlxsw_##_type##_##_cname##_##_iname##_get(char *buf)		\
+{										\
+	return __mlxsw_item_get32(buf, &__ITEM_NAME(_type, _cname, _iname), 0);	\
+}										\
+static inline void mlxsw_##_type##_##_cname##_##_iname##_set(char *buf, u32 val)\
+{										\
+	__mlxsw_item_set32(buf, &__ITEM_NAME(_type, _cname, _iname), 0, val);	\
+}
+
+#define MLXSW_ITEM32_INDEXED(_type, _cname, _iname, _offset, _shift, _sizebits,	\
+			     _step, _instepoffset, _norealshift)		\
+static struct mlxsw_item __ITEM_NAME(_type, _cname, _iname) = {			\
+	.offset = _offset,							\
+	.step = _step,								\
+	.in_step_offset = _instepoffset,					\
+	.shift = _shift,							\
+	.no_real_shift = _norealshift,						\
+	.size = {.bits = _sizebits,},						\
+	.name = #_type "_" #_cname "_" #_iname,					\
+};										\
+static inline u32								\
+mlxsw_##_type##_##_cname##_##_iname##_get(char *buf, unsigned short index)	\
+{										\
+	return __mlxsw_item_get32(buf, &__ITEM_NAME(_type, _cname, _iname),	\
+				  index);					\
+}										\
+static inline void								\
+mlxsw_##_type##_##_cname##_##_iname##_set(char *buf, unsigned short index,	\
+					  u32 val)				\
+{										\
+	__mlxsw_item_set32(buf, &__ITEM_NAME(_type, _cname, _iname),		\
+			   index, val);						\
+}
+
+#define MLXSW_ITEM64(_type, _cname, _iname, _offset, _shift, _sizebits)		\
+static struct mlxsw_item __ITEM_NAME(_type, _cname, _iname) = {			\
+	.offset = _offset,							\
+	.shift = _shift,							\
+	.size = {.bits = _sizebits,},						\
+	.name = #_type "_" #_cname "_" #_iname,					\
+};										\
+static inline u64 mlxsw_##_type##_##_cname##_##_iname##_get(char *buf)		\
+{										\
+	return __mlxsw_item_get64(buf, &__ITEM_NAME(_type, _cname, _iname), 0);	\
+}										\
+static inline void mlxsw_##_type##_##_cname##_##_iname##_set(char *buf, u64 val)\
+{										\
+	__mlxsw_item_set64(buf, &__ITEM_NAME(_type, _cname, _iname), 0,	val);	\
+}
+
+#define MLXSW_ITEM64_INDEXED(_type, _cname, _iname, _offset, _shift,		\
+			     _sizebits, _step, _instepoffset, _norealshift)	\
+static struct mlxsw_item __ITEM_NAME(_type, _cname, _iname) = {			\
+	.offset = _offset,							\
+	.step = _step,								\
+	.in_step_offset = _instepoffset,					\
+	.shift = _shift,							\
+	.no_real_shift = _norealshift,						\
+	.size = {.bits = _sizebits,},						\
+	.name = #_type "_" #_cname "_" #_iname,					\
+};										\
+static inline u64								\
+mlxsw_##_type##_##_cname##_##_iname##_get(char *buf, unsigned short index)	\
+{										\
+	return __mlxsw_item_get64(buf, &__ITEM_NAME(_type, _cname, _iname),	\
+				  index);					\
+}										\
+static inline void								\
+mlxsw_##_type##_##_cname##_##_iname##_set(char *buf, unsigned short index,	\
+					  u64 val)				\
+{										\
+	__mlxsw_item_set64(buf, &__ITEM_NAME(_type, _cname, _iname),		\
+			   index, val);						\
+}
+
+#define MLXSW_ITEM_BUF(_type, _cname, _iname, _offset, _sizebytes)		\
+static struct mlxsw_item __ITEM_NAME(_type, _cname, _iname) = {			\
+	.offset = _offset,							\
+	.size = {.bytes = _sizebytes,},						\
+	.name = #_type "_" #_cname "_" #_iname,					\
+};										\
+static inline void								\
+mlxsw_##_type##_##_cname##_##_iname##_memcpy_from(char *buf, char *dst)		\
+{										\
+	__mlxsw_item_memcpy_from(buf, dst, &__ITEM_NAME(_type, _cname, _iname));\
+}										\
+static inline void								\
+mlxsw_##_type##_##_cname##_##_iname##_memcpy_to(char *buf, char *src)		\
+{										\
+	__mlxsw_item_memcpy_to(buf, src, &__ITEM_NAME(_type, _cname, _iname));	\
+}
+
+#define MLXSW_ITEM_BIT_ARRAY(_type, _cname, _iname, _offset, _sizebytes,	\
+			     _element_size)					\
+static struct mlxsw_item __ITEM_NAME(_type, _cname, _iname) = {			\
+	.offset = _offset,							\
+	.element_size = _element_size,						\
+	.size = {.bytes = _sizebytes,},						\
+	.name = #_type "_" #_cname "_" #_iname,					\
+};										\
+static inline u8								\
+mlxsw_##_type##_##_cname##_##_iname##_get(char *buf, u16 index)			\
+{										\
+	return __mlxsw_item_bit_array_get(buf,					\
+					  &__ITEM_NAME(_type, _cname, _iname),	\
+					  index);				\
+}										\
+static inline void								\
+mlxsw_##_type##_##_cname##_##_iname##_set(char *buf, u16 index, u8 val)		\
+{										\
+	return __mlxsw_item_bit_array_set(buf,					\
+					  &__ITEM_NAME(_type, _cname, _iname),	\
+					  index, val);				\
+}										\
+
+#endif
diff --git a/drivers/net/ethernet/mellanox/mlxsw/port.h b/drivers/net/ethernet/mellanox/mlxsw/port.h
new file mode 100644
index 0000000..7d66c92
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/port.h
@@ -0,0 +1,50 @@
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/port.h
+ * Copyright (c) 2015 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2015 Elad Raz <eladr@mellanox.com>
+ * Copyright (c) 2015 Jiri Pirko <jiri@mellanox.com>
+ * Copyright (c) 2015 Ido Schimmel <idosch@mellanox.com>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+#ifndef _MLXSW_PORT_H
+#define _MLXSW_PORT_H
+
+#include <linux/types.h>
+
+#define MLXSW_PORT_MAX_MTU		10000
+
+#define MLXSW_PORT_DEFAULT_VID		1
+
+#define MLXSW_PORT_MAX_PHY_PORTS	0x40
+#define MLXSW_PORT_MAX_PORTS		MLXSW_PORT_MAX_PHY_PORTS
+
+#define MLXSW_PORT_CPU_PORT		0x0
+
+#endif /* _MLXSW_PORT_H */
diff --git a/drivers/net/ethernet/mellanox/mlxsw/trap.h b/drivers/net/ethernet/mellanox/mlxsw/trap.h
new file mode 100644
index 0000000..bff007d
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/trap.h
@@ -0,0 +1,68 @@
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/trap.h
+ * Copyright (c) 2015 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2015 Elad Raz <eladr@mellanox.com>
+ * Copyright (c) 2015 Jiri Pirko <jiri@mellanox.com>
+ * Copyright (c) 2015 Ido Schimmel <idosch@mellanox.com>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+#ifndef _MLXSW_TRAP_H
+#define _MLXSW_TRAP_H
+
+enum {
+	/* Ethernet EMAD and FDB miss */
+	MLXSW_TRAP_ID_FDB_MC = 0x01,
+	MLXSW_TRAP_ID_ETHEMAD = 0x05,
+	/* L2 traps for specific packet types */
+	MLXSW_TRAP_ID_STP = 0x10,
+	MLXSW_TRAP_ID_LACP = 0x11,
+	MLXSW_TRAP_ID_EAPOL = 0x12,
+	MLXSW_TRAP_ID_LLDP = 0x13,
+	MLXSW_TRAP_ID_MMRP = 0x14,
+	MLXSW_TRAP_ID_MVRP = 0x15,
+	MLXSW_TRAP_ID_RPVST = 0x16,
+	MLXSW_TRAP_ID_DHCP = 0x19,
+	MLXSW_TRAP_ID_IGMP_QUERY = 0x30,
+	MLXSW_TRAP_ID_IGMP_V1_REPORT = 0x31,
+	MLXSW_TRAP_ID_IGMP_V2_REPORT = 0x32,
+	MLXSW_TRAP_ID_IGMP_V2_LEAVE = 0x33,
+	MLXSW_TRAP_ID_IGMP_V3_REPORT = 0x34,
+
+	MLXSW_TRAP_ID_MAX = 0x1FF
+};
+
+enum mlxsw_event_trap_id {
+	/* Port Up/Down event generated by hardware */
+	MLXSW_TRAP_ID_PUDE = 0x8,
+};
+
+#define MLXSW_TRAP_ID_DONT_CARE (MLXSW_TRAP_ID_MAX)
+
+#endif /* _MLXSW_TRAP_H */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [patch net-next 2/4] mlxsw: Add PCI bus implementation
  2015-07-23 15:43 [patch net-next 0/4] Introduce Mellanox Technologies Switch ASICs switchdev drivers Jiri Pirko
  2015-07-23 15:43 ` [patch net-next 1/4] mlxsw: Introduce Mellanox switch driver core Jiri Pirko
@ 2015-07-23 15:43 ` Jiri Pirko
  2015-07-24  4:52   ` Scott Feldman
  2015-07-26  5:15   ` Scott Feldman
  2015-07-23 15:43 ` [patch net-next 3/4] mlxsw: Add interface to access registers and process events Jiri Pirko
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 29+ messages in thread
From: Jiri Pirko @ 2015-07-23 15:43 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, ogerlitz, sfeldma, roopa, f.fainelli,
	tgraf, ast, jhs, daniel, john.fastabend, simon.horman, linville,
	andy, shm, nhorman, Jiri Pirko

From: Jiri Pirko <jiri@mellanox.com>

Add PCI bus implementation for Mellanox Technologies Switch ASICs. This
includes firmware initialization, async queues manipulation and command
interface implementation.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Elad Raz <eladr@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/Kconfig  |   10 +
 drivers/net/ethernet/mellanox/mlxsw/Makefile |    2 +
 drivers/net/ethernet/mellanox/mlxsw/pci.c    | 1786 ++++++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/pci.h    |  220 ++++
 4 files changed, 2018 insertions(+)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/pci.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/pci.h

diff --git a/drivers/net/ethernet/mellanox/mlxsw/Kconfig b/drivers/net/ethernet/mellanox/mlxsw/Kconfig
index 46268f2..1385f2c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlxsw/Kconfig
@@ -9,3 +9,13 @@ config MLXSW_CORE
 
 	  To compile this driver as a module, choose M here: the
 	  module will be called mlxsw_core.
+
+config MLXSW_PCI
+	tristate "PCI bus implementation for Mellanox Technologies Switch ASICs"
+	depends on PCI && MLXSW_CORE
+	default m
+	---help---
+	  This is PCI bus implementation for Mellanox Technologies Switch ASICs.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called mlxsw_pci.
diff --git a/drivers/net/ethernet/mellanox/mlxsw/Makefile b/drivers/net/ethernet/mellanox/mlxsw/Makefile
index 271de28..94841c3 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/Makefile
+++ b/drivers/net/ethernet/mellanox/mlxsw/Makefile
@@ -1,2 +1,4 @@
 obj-$(CONFIG_MLXSW_CORE)	+= mlxsw_core.o
 mlxsw_core-objs			:= core.o
+obj-$(CONFIG_MLXSW_PCI)		+= mlxsw_pci.o
+mlxsw_pci-objs			:= pci.o
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
new file mode 100644
index 0000000..8371e08
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -0,0 +1,1786 @@
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/pci.c
+ * Copyright (c) 2015 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2015 Jiri Pirko <jiri@mellanox.com>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/export.h>
+#include <linux/err.h>
+#include <linux/device.h>
+#include <linux/pci.h>
+#include <linux/interrupt.h>
+#include <linux/wait.h>
+#include <linux/types.h>
+#include <linux/skbuff.h>
+#include <linux/if_vlan.h>
+#include <linux/log2.h>
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+
+#include "pci.h"
+#include "core.h"
+#include "cmd.h"
+#include "port.h"
+
+static const char mlxsw_pci_driver_name[] = "mlxsw_pci";
+
+static const struct pci_device_id mlxsw_pci_id_table[] = {
+	{0, }
+};
+
+static struct dentry *mlxsw_pci_dbg_root;
+
+static const char *mlxsw_pci_device_kind_get(const struct pci_device_id *id)
+{
+	switch (id->device) {
+	default:
+		BUG();
+	}
+}
+
+#define mlxsw_pci_write32(mlxsw_pci, reg, val) \
+	iowrite32be(val, (mlxsw_pci)->hw_addr + (MLXSW_PCI_ ## reg))
+#define mlxsw_pci_read32(mlxsw_pci, reg) \
+	ioread32be((mlxsw_pci)->hw_addr + (MLXSW_PCI_ ## reg))
+
+enum mlxsw_pci_queue_type {
+	MLXSW_PCI_QUEUE_TYPE_SDQ,
+	MLXSW_PCI_QUEUE_TYPE_RDQ,
+	MLXSW_PCI_QUEUE_TYPE_CQ,
+	MLXSW_PCI_QUEUE_TYPE_EQ,
+};
+
+static const char *mlxsw_pci_queue_type_str(enum mlxsw_pci_queue_type q_type)
+{
+	switch (q_type) {
+	case MLXSW_PCI_QUEUE_TYPE_SDQ:
+		return "sdq";
+	case MLXSW_PCI_QUEUE_TYPE_RDQ:
+		return "rdq";
+	case MLXSW_PCI_QUEUE_TYPE_CQ:
+		return "cq";
+	case MLXSW_PCI_QUEUE_TYPE_EQ:
+		return "eq";
+	}
+	BUG();
+}
+
+#define MLXSW_PCI_QUEUE_TYPE_COUNT	4
+
+static const u16 mlxsw_pci_doorbell_type_offset[] = {
+	MLXSW_PCI_DOORBELL_SDQ_OFFSET,	/* for type MLXSW_PCI_QUEUE_TYPE_SDQ */
+	MLXSW_PCI_DOORBELL_RDQ_OFFSET,	/* for type MLXSW_PCI_QUEUE_TYPE_RDQ */
+	MLXSW_PCI_DOORBELL_CQ_OFFSET,	/* for type MLXSW_PCI_QUEUE_TYPE_CQ */
+	MLXSW_PCI_DOORBELL_EQ_OFFSET,	/* for type MLXSW_PCI_QUEUE_TYPE_EQ */
+};
+
+static const u16 mlxsw_pci_doorbell_arm_type_offset[] = {
+	0, /* unused */
+	0, /* unused */
+	MLXSW_PCI_DOORBELL_ARM_CQ_OFFSET, /* for type MLXSW_PCI_QUEUE_TYPE_CQ */
+	MLXSW_PCI_DOORBELL_ARM_EQ_OFFSET, /* for type MLXSW_PCI_QUEUE_TYPE_EQ */
+};
+
+struct mlxsw_pci_mem_item {
+	char *buf;
+	dma_addr_t mapaddr;
+	size_t size;
+};
+
+struct mlxsw_pci_queue_elem_info {
+	char *elem; /* pointer to actual dma mapped element mem chunk */
+	union {
+		struct {
+			struct sk_buff *skb;
+		} sdq;
+		struct {
+			struct sk_buff *skb;
+		} rdq;
+	} u;
+};
+
+struct mlxsw_pci_queue {
+	spinlock_t lock; /* for queue accesses */
+	struct mlxsw_pci_mem_item mem_item;
+	struct mlxsw_pci_queue_elem_info *elem_info;
+	u16 producer_counter;
+	u16 consumer_counter;
+	u16 count; /* number of elements in queue */
+	u8 num; /* queue number */
+	u8 elem_size; /* size of one element */
+	enum mlxsw_pci_queue_type type;
+	struct tasklet_struct tasklet; /* queue processing tasklet */
+	struct mlxsw_pci *pci;
+	union {
+		struct {
+			u32 comp_sdq_count;
+			u32 comp_rdq_count;
+		} cq;
+		struct {
+			u32 ev_cmd_count;
+			u32 ev_comp_count;
+			u32 ev_other_count;
+		} eq;
+	} u;
+};
+
+struct mlxsw_pci_queue_type_group {
+	struct mlxsw_pci_queue *q;
+	u8 count; /* number of queues in group */
+};
+
+struct mlxsw_pci {
+	struct pci_dev *pdev;
+	u8 __iomem *hw_addr;
+	struct mlxsw_pci_queue_type_group queues[MLXSW_PCI_QUEUE_TYPE_COUNT];
+	u32 doorbell_offset;
+	struct msix_entry msix_entry;
+	struct mlxsw_core *core;
+	struct {
+		u16 num_pages;
+		struct mlxsw_pci_mem_item *items;
+	} fw_area;
+	struct {
+		struct mutex lock; /* Lock access to command registers */
+		bool nopoll;
+		wait_queue_head_t wait;
+		bool wait_done;
+		struct {
+			u8 status;
+			u64 out_param;
+		} comp;
+	} cmd;
+	struct mlxsw_bus_info bus_info;
+	struct dentry *dbg_dir;
+};
+
+static void mlxsw_pci_queue_tasklet_schedule(struct mlxsw_pci_queue *q)
+{
+	tasklet_schedule(&q->tasklet);
+}
+
+static char *__mlxsw_pci_queue_elem_get(struct mlxsw_pci_queue *q,
+					size_t elem_size, int elem_index)
+{
+	return q->mem_item.buf + (elem_size * elem_index);
+}
+
+static struct mlxsw_pci_queue_elem_info *
+mlxsw_pci_queue_elem_info_get(struct mlxsw_pci_queue *q, int elem_index)
+{
+	return &q->elem_info[elem_index];
+}
+
+static struct mlxsw_pci_queue_elem_info *
+mlxsw_pci_queue_elem_info_producer_get(struct mlxsw_pci_queue *q)
+{
+	int index = q->producer_counter & (q->count - 1);
+
+	if ((q->producer_counter - q->consumer_counter) == q->count)
+		return NULL;
+	return mlxsw_pci_queue_elem_info_get(q, index);
+}
+
+static struct mlxsw_pci_queue_elem_info *
+mlxsw_pci_queue_elem_info_consumer_get(struct mlxsw_pci_queue *q)
+{
+	int index = q->consumer_counter & (q->count - 1);
+
+	return mlxsw_pci_queue_elem_info_get(q, index);
+}
+
+static char *mlxsw_pci_queue_elem_get(struct mlxsw_pci_queue *q, int elem_index)
+{
+	return mlxsw_pci_queue_elem_info_get(q, elem_index)->elem;
+}
+
+static bool mlxsw_pci_elem_hw_owned(struct mlxsw_pci_queue *q, bool owner_bit)
+{
+	return owner_bit != !!(q->consumer_counter & q->count);
+}
+
+static char *mlxsw_pci_queue_sw_elem_get(struct mlxsw_pci_queue *q,
+					 u32 (*get_elem_owner_func)(char *))
+{
+	struct mlxsw_pci_queue_elem_info *elem_info;
+	char *elem;
+	bool owner_bit;
+
+	elem_info = mlxsw_pci_queue_elem_info_consumer_get(q);
+	elem = elem_info->elem;
+	owner_bit = get_elem_owner_func(elem);
+	if (mlxsw_pci_elem_hw_owned(q, owner_bit))
+		return NULL;
+	q->consumer_counter++;
+	rmb(); /* make sure we read owned bit before the rest of elem */
+	return elem;
+}
+
+static struct mlxsw_pci_queue_type_group *
+mlxsw_pci_queue_type_group_get(struct mlxsw_pci *mlxsw_pci,
+			       enum mlxsw_pci_queue_type q_type)
+{
+	return &mlxsw_pci->queues[q_type];
+}
+
+static u8 __mlxsw_pci_queue_count(struct mlxsw_pci *mlxsw_pci,
+				  enum mlxsw_pci_queue_type q_type)
+{
+	struct mlxsw_pci_queue_type_group *queue_group;
+
+	queue_group = mlxsw_pci_queue_type_group_get(mlxsw_pci, q_type);
+	return queue_group->count;
+}
+
+static u8 mlxsw_pci_sdq_count(struct mlxsw_pci *mlxsw_pci)
+{
+	return __mlxsw_pci_queue_count(mlxsw_pci, MLXSW_PCI_QUEUE_TYPE_SDQ);
+}
+
+static u8 mlxsw_pci_rdq_count(struct mlxsw_pci *mlxsw_pci)
+{
+	return __mlxsw_pci_queue_count(mlxsw_pci, MLXSW_PCI_QUEUE_TYPE_RDQ);
+}
+
+static u8 mlxsw_pci_cq_count(struct mlxsw_pci *mlxsw_pci)
+{
+	return __mlxsw_pci_queue_count(mlxsw_pci, MLXSW_PCI_QUEUE_TYPE_CQ);
+}
+
+static u8 mlxsw_pci_eq_count(struct mlxsw_pci *mlxsw_pci)
+{
+	return __mlxsw_pci_queue_count(mlxsw_pci, MLXSW_PCI_QUEUE_TYPE_EQ);
+}
+
+static struct mlxsw_pci_queue *
+__mlxsw_pci_queue_get(struct mlxsw_pci *mlxsw_pci,
+		      enum mlxsw_pci_queue_type q_type, u8 q_num)
+{
+	return &mlxsw_pci->queues[q_type].q[q_num];
+}
+
+static struct mlxsw_pci_queue *mlxsw_pci_sdq_get(struct mlxsw_pci *mlxsw_pci,
+						 u8 q_num)
+{
+	return __mlxsw_pci_queue_get(mlxsw_pci,
+				     MLXSW_PCI_QUEUE_TYPE_SDQ, q_num);
+}
+
+static struct mlxsw_pci_queue *mlxsw_pci_rdq_get(struct mlxsw_pci *mlxsw_pci,
+						 u8 q_num)
+{
+	return __mlxsw_pci_queue_get(mlxsw_pci,
+				     MLXSW_PCI_QUEUE_TYPE_RDQ, q_num);
+}
+
+static struct mlxsw_pci_queue *mlxsw_pci_cq_get(struct mlxsw_pci *mlxsw_pci,
+						u8 q_num)
+{
+	return __mlxsw_pci_queue_get(mlxsw_pci, MLXSW_PCI_QUEUE_TYPE_CQ, q_num);
+}
+
+static struct mlxsw_pci_queue *mlxsw_pci_eq_get(struct mlxsw_pci *mlxsw_pci,
+						u8 q_num)
+{
+	return __mlxsw_pci_queue_get(mlxsw_pci, MLXSW_PCI_QUEUE_TYPE_EQ, q_num);
+}
+
+static void __mlxsw_pci_queue_doorbell_set(struct mlxsw_pci *mlxsw_pci,
+					   struct mlxsw_pci_queue *q,
+					   u16 val)
+{
+	mlxsw_pci_write32(mlxsw_pci,
+			  DOORBELL(mlxsw_pci->doorbell_offset,
+				   mlxsw_pci_doorbell_type_offset[q->type],
+				   q->num), val);
+}
+
+static void __mlxsw_pci_queue_doorbell_arm_set(struct mlxsw_pci *mlxsw_pci,
+					       struct mlxsw_pci_queue *q,
+					       u16 val)
+{
+	mlxsw_pci_write32(mlxsw_pci,
+			  DOORBELL(mlxsw_pci->doorbell_offset,
+				   mlxsw_pci_doorbell_arm_type_offset[q->type],
+				   q->num), val);
+}
+
+static void mlxsw_pci_queue_doorbell_producer_ring(struct mlxsw_pci *mlxsw_pci,
+						   struct mlxsw_pci_queue *q)
+{
+	wmb(); /* ensure all writes are done before we ring a bell */
+	__mlxsw_pci_queue_doorbell_set(mlxsw_pci, q, q->producer_counter);
+}
+
+static void mlxsw_pci_queue_doorbell_consumer_ring(struct mlxsw_pci *mlxsw_pci,
+						   struct mlxsw_pci_queue *q)
+{
+	wmb(); /* ensure all writes are done before we ring a bell */
+	__mlxsw_pci_queue_doorbell_set(mlxsw_pci, q,
+				       q->consumer_counter + q->count);
+}
+
+static void
+mlxsw_pci_queue_doorbell_arm_consumer_ring(struct mlxsw_pci *mlxsw_pci,
+					   struct mlxsw_pci_queue *q)
+{
+	wmb(); /* ensure all writes are done before we ring a bell */
+	__mlxsw_pci_queue_doorbell_arm_set(mlxsw_pci, q, q->consumer_counter);
+}
+
+static dma_addr_t __mlxsw_pci_queue_page_get(struct mlxsw_pci_queue *q,
+					     int page_index)
+{
+	return q->mem_item.mapaddr + MLXSW_PCI_PAGE_SIZE * page_index;
+}
+
+static int mlxsw_pci_sdq_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
+			      struct mlxsw_pci_queue *q)
+{
+	int i;
+	int err;
+
+	q->producer_counter = 0;
+	q->consumer_counter = 0;
+
+	/* Set CQ of same number of this SDQ. */
+	mlxsw_cmd_mbox_sw2hw_dq_cq_set(mbox, q->num);
+	mlxsw_cmd_mbox_sw2hw_dq_sdq_tclass_set(mbox, 7);
+	mlxsw_cmd_mbox_sw2hw_dq_log2_dq_sz_set(mbox, 3); /* 8 pages */
+	for (i = 0; i < MLXSW_PCI_AQ_PAGES; i++) {
+		dma_addr_t mapaddr = __mlxsw_pci_queue_page_get(q, i);
+
+		mlxsw_cmd_mbox_sw2hw_dq_pa_set(mbox, i, mapaddr);
+	}
+
+	err = mlxsw_cmd_sw2hw_sdq(mlxsw_pci->core, mbox, q->num);
+	if (err)
+		return err;
+	mlxsw_pci_queue_doorbell_producer_ring(mlxsw_pci, q);
+	return 0;
+}
+
+static void mlxsw_pci_sdq_fini(struct mlxsw_pci *mlxsw_pci,
+			       struct mlxsw_pci_queue *q)
+{
+	mlxsw_cmd_hw2sw_sdq(mlxsw_pci->core, q->num);
+}
+
+static int mlxsw_pci_sdq_dbg_read(struct seq_file *file, void *data)
+{
+	struct mlxsw_pci *mlxsw_pci = dev_get_drvdata(file->private);
+	struct mlxsw_pci_queue *q;
+	int i;
+	static const char hdr[] =
+		"NUM PROD_COUNT CONS_COUNT COUNT\n";
+
+	seq_printf(file, hdr);
+	for (i = 0; i < mlxsw_pci_sdq_count(mlxsw_pci); i++) {
+		q = mlxsw_pci_sdq_get(mlxsw_pci, i);
+		spin_lock_bh(&q->lock);
+		seq_printf(file, "%3d %10d %10d %5d\n",
+			   i, q->producer_counter, q->consumer_counter,
+			   q->count);
+		spin_unlock_bh(&q->lock);
+	}
+	return 0;
+}
+
+static int mlxsw_pci_wqe_frag_map(struct mlxsw_pci *mlxsw_pci, char *wqe,
+				  int index, char *frag_data, size_t frag_len,
+				  int direction)
+{
+	struct pci_dev *pdev = mlxsw_pci->pdev;
+	dma_addr_t mapaddr;
+
+	mapaddr = pci_map_single(pdev, frag_data, frag_len, direction);
+	if (unlikely(pci_dma_mapping_error(pdev, mapaddr))) {
+		if (net_ratelimit())
+			dev_err(&pdev->dev, "failed to dma map tx frag\n");
+		return -EIO;
+	}
+	mlxsw_pci_wqe_address_set(wqe, index, mapaddr);
+	mlxsw_pci_wqe_byte_count_set(wqe, index, frag_len);
+	return 0;
+}
+
+static void mlxsw_pci_wqe_frag_unmap(struct mlxsw_pci *mlxsw_pci, char *wqe,
+				     int index, int direction)
+{
+	struct pci_dev *pdev = mlxsw_pci->pdev;
+	size_t frag_len = mlxsw_pci_wqe_byte_count_get(wqe, index);
+	dma_addr_t mapaddr = mlxsw_pci_wqe_address_get(wqe, index);
+
+	if (!frag_len)
+		return;
+	pci_unmap_single(pdev, mapaddr, frag_len, direction);
+}
+
+static int mlxsw_pci_rdq_skb_alloc(struct mlxsw_pci *mlxsw_pci,
+				   struct mlxsw_pci_queue_elem_info *elem_info)
+{
+	size_t buf_len = MLXSW_PORT_MAX_MTU;
+	char *wqe = elem_info->elem;
+	struct sk_buff *skb;
+	int err;
+
+	elem_info->u.rdq.skb = NULL;
+	skb = netdev_alloc_skb_ip_align(NULL, buf_len);
+	if (!skb)
+		return -ENOMEM;
+
+	/* Assume that wqe was previously zeroed. */
+
+	err = mlxsw_pci_wqe_frag_map(mlxsw_pci, wqe, 0, skb->data,
+				     buf_len, DMA_FROM_DEVICE);
+	if (err)
+		goto err_frag_map;
+
+	elem_info->u.rdq.skb = skb;
+	return 0;
+
+err_frag_map:
+	dev_kfree_skb_any(skb);
+	return err;
+}
+
+static void mlxsw_pci_rdq_skb_free(struct mlxsw_pci *mlxsw_pci,
+				   struct mlxsw_pci_queue_elem_info *elem_info)
+{
+	struct sk_buff *skb;
+	char *wqe;
+
+	skb = elem_info->u.rdq.skb;
+	wqe = elem_info->elem;
+
+	mlxsw_pci_wqe_frag_unmap(mlxsw_pci, wqe, 0, DMA_FROM_DEVICE);
+	dev_kfree_skb_any(skb);
+}
+
+static int mlxsw_pci_rdq_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
+			      struct mlxsw_pci_queue *q)
+{
+	struct mlxsw_pci_queue_elem_info *elem_info;
+	int i;
+	int err;
+
+	q->producer_counter = 0;
+	q->consumer_counter = 0;
+
+	/* Set CQ of same number of this RDQ with base
+	 * above MLXSW_PCI_SDQS_MAX as the lower ones are assigned to SDQs.
+	 */
+	mlxsw_cmd_mbox_sw2hw_dq_cq_set(mbox, q->num + MLXSW_PCI_SDQS_COUNT);
+	mlxsw_cmd_mbox_sw2hw_dq_log2_dq_sz_set(mbox, 3); /* 8 pages */
+	for (i = 0; i < MLXSW_PCI_AQ_PAGES; i++) {
+		dma_addr_t mapaddr = __mlxsw_pci_queue_page_get(q, i);
+
+		mlxsw_cmd_mbox_sw2hw_dq_pa_set(mbox, i, mapaddr);
+	}
+
+	err = mlxsw_cmd_sw2hw_rdq(mlxsw_pci->core, mbox, q->num);
+	if (err)
+		return err;
+
+	mlxsw_pci_queue_doorbell_producer_ring(mlxsw_pci, q);
+
+	for (i = 0; i < q->count; i++) {
+		elem_info = mlxsw_pci_queue_elem_info_producer_get(q);
+		BUG_ON(!elem_info);
+		err = mlxsw_pci_rdq_skb_alloc(mlxsw_pci, elem_info);
+		if (err)
+			goto rollback;
+		/* Everything is set up, ring doorbell to pass elem to HW */
+		q->producer_counter++;
+		mlxsw_pci_queue_doorbell_producer_ring(mlxsw_pci, q);
+	}
+
+	return 0;
+
+rollback:
+	for (i--; i >= 0; i--) {
+		elem_info = mlxsw_pci_queue_elem_info_get(q, i);
+		mlxsw_pci_rdq_skb_free(mlxsw_pci, elem_info);
+	}
+	mlxsw_cmd_hw2sw_rdq(mlxsw_pci->core, q->num);
+
+	return err;
+}
+
+static void mlxsw_pci_rdq_fini(struct mlxsw_pci *mlxsw_pci,
+			       struct mlxsw_pci_queue *q)
+{
+	struct mlxsw_pci_queue_elem_info *elem_info;
+	int i;
+
+	mlxsw_cmd_hw2sw_rdq(mlxsw_pci->core, q->num);
+	for (i = 0; i < q->count; i++) {
+		elem_info = mlxsw_pci_queue_elem_info_get(q, i);
+		mlxsw_pci_rdq_skb_free(mlxsw_pci, elem_info);
+	}
+}
+
+static int mlxsw_pci_rdq_dbg_read(struct seq_file *file, void *data)
+{
+	struct mlxsw_pci *mlxsw_pci = dev_get_drvdata(file->private);
+	struct mlxsw_pci_queue *q;
+	int i;
+	static const char hdr[] =
+		"NUM PROD_COUNT CONS_COUNT COUNT\n";
+
+	seq_printf(file, hdr);
+	for (i = 0; i < mlxsw_pci_rdq_count(mlxsw_pci); i++) {
+		q = mlxsw_pci_rdq_get(mlxsw_pci, i);
+		spin_lock_bh(&q->lock);
+		seq_printf(file, "%3d %10d %10d %5d\n",
+			   i, q->producer_counter, q->consumer_counter,
+			   q->count);
+		spin_unlock_bh(&q->lock);
+	}
+	return 0;
+}
+
+static int mlxsw_pci_cq_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
+			     struct mlxsw_pci_queue *q)
+{
+	int i;
+	int err;
+
+	q->consumer_counter = 0;
+
+	for (i = 0; i < q->count; i++) {
+		char *elem = mlxsw_pci_queue_elem_get(q, i);
+
+		mlxsw_pci_cqe_owner_set(elem, 1);
+	}
+
+	mlxsw_cmd_mbox_sw2hw_cq_cv_set(mbox, 0); /* CQE ver 0 */
+	mlxsw_cmd_mbox_sw2hw_cq_c_eqn_set(mbox, MLXSW_PCI_EQ_COMP_NUM);
+	mlxsw_cmd_mbox_sw2hw_cq_oi_set(mbox, 0);
+	mlxsw_cmd_mbox_sw2hw_cq_st_set(mbox, 0);
+	mlxsw_cmd_mbox_sw2hw_cq_log_cq_size_set(mbox, ilog2(q->count));
+	for (i = 0; i < MLXSW_PCI_AQ_PAGES; i++) {
+		dma_addr_t mapaddr = __mlxsw_pci_queue_page_get(q, i);
+
+		mlxsw_cmd_mbox_sw2hw_cq_pa_set(mbox, i, mapaddr);
+	}
+	err = mlxsw_cmd_sw2hw_cq(mlxsw_pci->core, mbox, q->num);
+	if (err)
+		return err;
+	mlxsw_pci_queue_doorbell_consumer_ring(mlxsw_pci, q);
+	mlxsw_pci_queue_doorbell_arm_consumer_ring(mlxsw_pci, q);
+	return 0;
+}
+
+static void mlxsw_pci_cq_fini(struct mlxsw_pci *mlxsw_pci,
+			      struct mlxsw_pci_queue *q)
+{
+	mlxsw_cmd_hw2sw_cq(mlxsw_pci->core, q->num);
+}
+
+static int mlxsw_pci_cq_dbg_read(struct seq_file *file, void *data)
+{
+	struct mlxsw_pci *mlxsw_pci = dev_get_drvdata(file->private);
+
+	struct mlxsw_pci_queue *q;
+	int i;
+	static const char hdr[] =
+		"NUM CONS_INDEX  SDQ_COUNT  RDQ_COUNT COUNT\n";
+
+	seq_printf(file, hdr);
+	for (i = 0; i < mlxsw_pci_cq_count(mlxsw_pci); i++) {
+		q = mlxsw_pci_cq_get(mlxsw_pci, i);
+		spin_lock_bh(&q->lock);
+		seq_printf(file, "%3d %10d %10d %10d %5d\n",
+			   i, q->consumer_counter, q->u.cq.comp_sdq_count,
+			   q->u.cq.comp_rdq_count, q->count);
+		spin_unlock_bh(&q->lock);
+	}
+	return 0;
+}
+
+static void mlxsw_pci_cqe_sdq_handle(struct mlxsw_pci *mlxsw_pci,
+				     struct mlxsw_pci_queue *q,
+				     u16 consumer_counter_limit,
+				     char *cqe)
+{
+	struct pci_dev *pdev = mlxsw_pci->pdev;
+	struct mlxsw_pci_queue_elem_info *elem_info;
+	char *wqe;
+	struct sk_buff *skb;
+	int i;
+
+	spin_lock(&q->lock);
+	elem_info = mlxsw_pci_queue_elem_info_consumer_get(q);
+	skb = elem_info->u.sdq.skb;
+	wqe = elem_info->elem;
+	for (i = 0; i < MLXSW_PCI_WQE_SG_ENTRIES; i++)
+		mlxsw_pci_wqe_frag_unmap(mlxsw_pci, wqe, i, DMA_TO_DEVICE);
+	dev_kfree_skb_any(skb);
+	elem_info->u.sdq.skb = NULL;
+
+	if (q->consumer_counter++ != consumer_counter_limit)
+		dev_dbg_ratelimited(&pdev->dev, "Consumer counter does not match limit in SDQ\n");
+	spin_unlock(&q->lock);
+}
+
+static void mlxsw_pci_cqe_rdq_handle(struct mlxsw_pci *mlxsw_pci,
+				     struct mlxsw_pci_queue *q,
+				     u16 consumer_counter_limit,
+				     char *cqe)
+{
+	struct pci_dev *pdev = mlxsw_pci->pdev;
+	struct mlxsw_pci_queue_elem_info *elem_info;
+	char *wqe;
+	struct sk_buff *skb;
+	struct mlxsw_rx_info rx_info;
+	int err;
+
+	elem_info = mlxsw_pci_queue_elem_info_consumer_get(q);
+	skb = elem_info->u.sdq.skb;
+	if (!skb)
+		return;
+	wqe = elem_info->elem;
+	mlxsw_pci_wqe_frag_unmap(mlxsw_pci, wqe, 0, DMA_FROM_DEVICE);
+
+	if (q->consumer_counter++ != consumer_counter_limit)
+		dev_dbg_ratelimited(&pdev->dev, "Consumer counter does not match limit in RDQ\n");
+
+	/* We do not support lag now */
+	if (mlxsw_pci_cqe_lag_get(cqe))
+		goto drop;
+
+	rx_info.sys_port = mlxsw_pci_cqe_system_port_get(cqe);
+	rx_info.trap_id = mlxsw_pci_cqe_trap_id_get(cqe);
+
+	skb_put(skb, mlxsw_pci_cqe_byte_count_get(cqe));
+	mlxsw_core_skb_receive(mlxsw_pci->core, skb, &rx_info);
+
+put_new_skb:
+	memset(wqe, 0, q->elem_size);
+	err = mlxsw_pci_rdq_skb_alloc(mlxsw_pci, elem_info);
+	if (err && net_ratelimit())
+		dev_dbg(&pdev->dev, "Failed to alloc skb for RDQ\n");
+	/* Everything is set up, ring doorbell to pass elem to HW */
+	q->producer_counter++;
+	mlxsw_pci_queue_doorbell_producer_ring(mlxsw_pci, q);
+	return;
+
+drop:
+	dev_kfree_skb_any(skb);
+	goto put_new_skb;
+}
+
+static char *mlxsw_pci_cq_sw_cqe_get(struct mlxsw_pci_queue *q)
+{
+	return mlxsw_pci_queue_sw_elem_get(q, mlxsw_pci_cqe_owner_get);
+}
+
+static void mlxsw_pci_cq_tasklet(unsigned long data)
+{
+	struct mlxsw_pci_queue *q = (struct mlxsw_pci_queue *) data;
+	struct mlxsw_pci *mlxsw_pci = q->pci;
+	char *cqe;
+	int items = 0;
+	int credits = q->count >> 1;
+
+	while ((cqe = mlxsw_pci_cq_sw_cqe_get(q))) {
+		u16 wqe_counter = mlxsw_pci_cqe_wqe_counter_get(cqe);
+		u8 sendq = mlxsw_pci_cqe_sr_get(cqe);
+		u8 dqn = mlxsw_pci_cqe_dqn_get(cqe);
+
+		if (sendq) {
+			struct mlxsw_pci_queue *sdq;
+
+			sdq = mlxsw_pci_sdq_get(mlxsw_pci, dqn);
+			mlxsw_pci_cqe_sdq_handle(mlxsw_pci, sdq,
+						 wqe_counter, cqe);
+			q->u.cq.comp_sdq_count++;
+		} else {
+			struct mlxsw_pci_queue *rdq;
+
+			rdq = mlxsw_pci_rdq_get(mlxsw_pci, dqn);
+			mlxsw_pci_cqe_rdq_handle(mlxsw_pci, rdq,
+						 wqe_counter, cqe);
+			q->u.cq.comp_rdq_count++;
+		}
+		if (++items == credits)
+			break;
+	}
+	if (items) {
+		mlxsw_pci_queue_doorbell_consumer_ring(mlxsw_pci, q);
+		mlxsw_pci_queue_doorbell_arm_consumer_ring(mlxsw_pci, q);
+	}
+}
+
+static int mlxsw_pci_eq_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
+			     struct mlxsw_pci_queue *q)
+{
+	int i;
+	int err;
+
+	q->consumer_counter = 0;
+
+	for (i = 0; i < q->count; i++) {
+		char *elem = mlxsw_pci_queue_elem_get(q, i);
+
+		mlxsw_pci_eqe_owner_set(elem, 1);
+	}
+
+	mlxsw_cmd_mbox_sw2hw_eq_int_msix_set(mbox, 1); /* MSI-X used */
+	mlxsw_cmd_mbox_sw2hw_eq_oi_set(mbox, 0);
+	mlxsw_cmd_mbox_sw2hw_eq_st_set(mbox, 1); /* armed */
+	mlxsw_cmd_mbox_sw2hw_eq_log_eq_size_set(mbox, ilog2(q->count));
+	for (i = 0; i < MLXSW_PCI_AQ_PAGES; i++) {
+		dma_addr_t mapaddr = __mlxsw_pci_queue_page_get(q, i);
+
+		mlxsw_cmd_mbox_sw2hw_eq_pa_set(mbox, i, mapaddr);
+	}
+	err = mlxsw_cmd_sw2hw_eq(mlxsw_pci->core, mbox, q->num);
+	if (err)
+		return err;
+	mlxsw_pci_queue_doorbell_consumer_ring(mlxsw_pci, q);
+	mlxsw_pci_queue_doorbell_arm_consumer_ring(mlxsw_pci, q);
+	return 0;
+}
+
+static void mlxsw_pci_eq_fini(struct mlxsw_pci *mlxsw_pci,
+			      struct mlxsw_pci_queue *q)
+{
+	mlxsw_cmd_hw2sw_eq(mlxsw_pci->core, q->num);
+}
+
+static int mlxsw_pci_eq_dbg_read(struct seq_file *file, void *data)
+{
+	struct mlxsw_pci *mlxsw_pci = dev_get_drvdata(file->private);
+	struct mlxsw_pci_queue *q;
+	int i;
+	static const char hdr[] =
+		"NUM CONS_COUNT     EV_CMD    EV_COMP   EV_OTHER COUNT\n";
+
+	seq_printf(file, hdr);
+	for (i = 0; i < mlxsw_pci_eq_count(mlxsw_pci); i++) {
+		q = mlxsw_pci_eq_get(mlxsw_pci, i);
+		spin_lock_bh(&q->lock);
+		seq_printf(file, "%3d %10d %10d %10d %10d %5d\n",
+			   i, q->consumer_counter, q->u.eq.ev_cmd_count,
+			   q->u.eq.ev_comp_count, q->u.eq.ev_other_count,
+			   q->count);
+		spin_unlock_bh(&q->lock);
+	}
+	return 0;
+}
+
+static void mlxsw_pci_eq_cmd_event(struct mlxsw_pci *mlxsw_pci, char *eqe)
+{
+	mlxsw_pci->cmd.comp.status = mlxsw_pci_eqe_cmd_status_get(eqe);
+	mlxsw_pci->cmd.comp.out_param =
+		((u64) mlxsw_pci_eqe_cmd_out_param_h_get(eqe)) << 32 |
+		mlxsw_pci_eqe_cmd_out_param_l_get(eqe);
+	mlxsw_pci->cmd.wait_done = true;
+	wake_up(&mlxsw_pci->cmd.wait);
+}
+
+static char *mlxsw_pci_eq_sw_eqe_get(struct mlxsw_pci_queue *q)
+{
+	return mlxsw_pci_queue_sw_elem_get(q, mlxsw_pci_eqe_owner_get);
+}
+
+static void mlxsw_pci_eq_tasklet(unsigned long data)
+{
+	struct mlxsw_pci_queue *q = (struct mlxsw_pci_queue *) data;
+	struct mlxsw_pci *mlxsw_pci = q->pci;
+	unsigned long active_cqns[BITS_TO_LONGS(MLXSW_PCI_CQS_COUNT)];
+	char *eqe;
+	u8 cqn;
+	bool cq_handle = false;
+	int items = 0;
+	int credits = q->count >> 1;
+
+	memset(&active_cqns, 0, sizeof(active_cqns));
+
+	while ((eqe = mlxsw_pci_eq_sw_eqe_get(q))) {
+		u8 event_type = mlxsw_pci_eqe_event_type_get(eqe);
+
+		switch (event_type) {
+		case MLXSW_PCI_EQE_EVENT_TYPE_CMD:
+			mlxsw_pci_eq_cmd_event(mlxsw_pci, eqe);
+			q->u.eq.ev_cmd_count++;
+			break;
+		case MLXSW_PCI_EQE_EVENT_TYPE_COMP:
+			cqn = mlxsw_pci_eqe_cqn_get(eqe);
+			set_bit(cqn, active_cqns);
+			cq_handle = true;
+			q->u.eq.ev_comp_count++;
+			break;
+		default:
+			q->u.eq.ev_other_count++;
+		}
+		if (++items == credits)
+			break;
+	}
+	if (items) {
+		mlxsw_pci_queue_doorbell_consumer_ring(mlxsw_pci, q);
+		mlxsw_pci_queue_doorbell_arm_consumer_ring(mlxsw_pci, q);
+	}
+
+	if (!cq_handle)
+		return;
+	for_each_set_bit(cqn, active_cqns, MLXSW_PCI_CQS_COUNT) {
+		q = mlxsw_pci_cq_get(mlxsw_pci, cqn);
+		mlxsw_pci_queue_tasklet_schedule(q);
+	}
+}
+
+struct mlxsw_pci_queue_ops {
+	const char *name;
+	enum mlxsw_pci_queue_type type;
+	int (*init)(struct mlxsw_pci *mlxsw_pci, char *mbox,
+		    struct mlxsw_pci_queue *q);
+	void (*fini)(struct mlxsw_pci *mlxsw_pci,
+		     struct mlxsw_pci_queue *q);
+	void (*tasklet)(unsigned long data);
+	int (*dbg_read)(struct seq_file *s, void *data);
+	u16 elem_count;
+	u8 elem_size;
+};
+
+static const struct mlxsw_pci_queue_ops mlxsw_pci_sdq_ops = {
+	.type		= MLXSW_PCI_QUEUE_TYPE_SDQ,
+	.init		= mlxsw_pci_sdq_init,
+	.fini		= mlxsw_pci_sdq_fini,
+	.dbg_read	= mlxsw_pci_sdq_dbg_read,
+	.elem_count	= MLXSW_PCI_WQE_COUNT,
+	.elem_size	= MLXSW_PCI_WQE_SIZE,
+};
+
+static const struct mlxsw_pci_queue_ops mlxsw_pci_rdq_ops = {
+	.type		= MLXSW_PCI_QUEUE_TYPE_RDQ,
+	.init		= mlxsw_pci_rdq_init,
+	.fini		= mlxsw_pci_rdq_fini,
+	.dbg_read	= mlxsw_pci_rdq_dbg_read,
+	.elem_count	= MLXSW_PCI_WQE_COUNT,
+	.elem_size	= MLXSW_PCI_WQE_SIZE
+};
+
+static const struct mlxsw_pci_queue_ops mlxsw_pci_cq_ops = {
+	.type		= MLXSW_PCI_QUEUE_TYPE_CQ,
+	.init		= mlxsw_pci_cq_init,
+	.fini		= mlxsw_pci_cq_fini,
+	.tasklet	= mlxsw_pci_cq_tasklet,
+	.dbg_read	= mlxsw_pci_cq_dbg_read,
+	.elem_count	= MLXSW_PCI_CQE_COUNT,
+	.elem_size	= MLXSW_PCI_CQE_SIZE
+};
+
+static const struct mlxsw_pci_queue_ops mlxsw_pci_eq_ops = {
+	.type		= MLXSW_PCI_QUEUE_TYPE_EQ,
+	.init		= mlxsw_pci_eq_init,
+	.fini		= mlxsw_pci_eq_fini,
+	.tasklet	= mlxsw_pci_eq_tasklet,
+	.dbg_read	= mlxsw_pci_eq_dbg_read,
+	.elem_count	= MLXSW_PCI_EQE_COUNT,
+	.elem_size	= MLXSW_PCI_EQE_SIZE
+};
+
+static int mlxsw_pci_queue_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
+				const struct mlxsw_pci_queue_ops *q_ops,
+				struct mlxsw_pci_queue *q, u8 q_num)
+{
+	struct mlxsw_pci_mem_item *mem_item = &q->mem_item;
+	int i;
+	int err;
+
+	spin_lock_init(&q->lock);
+	q->num = q_num;
+	q->count = q_ops->elem_count;
+	q->elem_size = q_ops->elem_size;
+	q->type = q_ops->type;
+	q->pci = mlxsw_pci;
+
+	if (q_ops->tasklet)
+		tasklet_init(&q->tasklet, q_ops->tasklet, (unsigned long) q);
+
+	mem_item->size = MLXSW_PCI_AQ_SIZE;
+	mem_item->buf = pci_alloc_consistent(mlxsw_pci->pdev,
+					     mem_item->size,
+					     &mem_item->mapaddr);
+	if (!mem_item->buf)
+		return -ENOMEM;
+	memset(mem_item->buf, 0, mem_item->size);
+
+	q->elem_info = kcalloc(q->count, sizeof(*q->elem_info), GFP_KERNEL);
+	if (!q->elem_info) {
+		err = -ENOMEM;
+		goto err_elem_info_alloc;
+	}
+
+	/* Initialize dma mapped elements info elem_info for
+	 * future easy access.
+	 */
+	for (i = 0; i < q->count; i++) {
+		struct mlxsw_pci_queue_elem_info *elem_info;
+
+		elem_info = mlxsw_pci_queue_elem_info_get(q, i);
+		elem_info->elem =
+			__mlxsw_pci_queue_elem_get(q, q_ops->elem_size, i);
+	}
+
+	mlxsw_cmd_mbox_zero(mbox);
+	err = q_ops->init(mlxsw_pci, mbox, q);
+	if (err)
+		goto err_q_ops_init;
+	return 0;
+
+err_q_ops_init:
+	kfree(q->elem_info);
+err_elem_info_alloc:
+	pci_free_consistent(mlxsw_pci->pdev, mem_item->size,
+			    mem_item->buf, mem_item->mapaddr);
+	return err;
+}
+
+static void mlxsw_pci_queue_fini(struct mlxsw_pci *mlxsw_pci,
+				 const struct mlxsw_pci_queue_ops *q_ops,
+				 struct mlxsw_pci_queue *q)
+{
+	struct mlxsw_pci_mem_item *mem_item = &q->mem_item;
+
+	q_ops->fini(mlxsw_pci, q);
+	kfree(q->elem_info);
+	pci_free_consistent(mlxsw_pci->pdev, mem_item->size,
+			    mem_item->buf, mem_item->mapaddr);
+}
+
+static int mlxsw_pci_queue_group_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
+				      const struct mlxsw_pci_queue_ops *q_ops,
+				      u8 num_qs)
+{
+	struct pci_dev *pdev = mlxsw_pci->pdev;
+	struct mlxsw_pci_queue_type_group *queue_group;
+	char tmp[16];
+	int i;
+	int err;
+
+	queue_group = mlxsw_pci_queue_type_group_get(mlxsw_pci, q_ops->type);
+	queue_group->q = kcalloc(num_qs, sizeof(*queue_group->q), GFP_KERNEL);
+	if (!queue_group->q)
+		return -ENOMEM;
+
+	for (i = 0; i < num_qs; i++) {
+		err = mlxsw_pci_queue_init(mlxsw_pci, mbox, q_ops,
+					   &queue_group->q[i], i);
+		if (err)
+			goto err_queue_init;
+	}
+	queue_group->count = num_qs;
+
+	sprintf(tmp, "%s_stats", mlxsw_pci_queue_type_str(q_ops->type));
+	debugfs_create_devm_seqfile(&pdev->dev, tmp, mlxsw_pci->dbg_dir,
+				    q_ops->dbg_read);
+
+	return 0;
+
+err_queue_init:
+	for (i--; i >= 0; i--)
+		mlxsw_pci_queue_fini(mlxsw_pci, q_ops, &queue_group->q[i]);
+	kfree(queue_group->q);
+	return err;
+}
+
+static void mlxsw_pci_queue_group_fini(struct mlxsw_pci *mlxsw_pci,
+				       const struct mlxsw_pci_queue_ops *q_ops)
+{
+	struct mlxsw_pci_queue_type_group *queue_group;
+	int i;
+
+	queue_group = mlxsw_pci_queue_type_group_get(mlxsw_pci, q_ops->type);
+	for (i = 0; i < queue_group->count; i++)
+		mlxsw_pci_queue_fini(mlxsw_pci, q_ops, &queue_group->q[i]);
+	kfree(queue_group->q);
+}
+
+static int mlxsw_pci_aqs_init(struct mlxsw_pci *mlxsw_pci, char *mbox)
+{
+	struct pci_dev *pdev = mlxsw_pci->pdev;
+	u8 num_sdqs;
+	u8 sdq_log2sz;
+	u8 num_rdqs;
+	u8 rdq_log2sz;
+	u8 num_cqs;
+	u8 cq_log2sz;
+	u8 num_eqs;
+	u8 eq_log2sz;
+	int err;
+
+	mlxsw_cmd_mbox_zero(mbox);
+	err = mlxsw_cmd_query_aq_cap(mlxsw_pci->core, mbox);
+	if (err)
+		return err;
+
+	num_sdqs = mlxsw_cmd_mbox_query_aq_cap_max_num_sdqs_get(mbox);
+	sdq_log2sz = mlxsw_cmd_mbox_query_aq_cap_log_max_sdq_sz_get(mbox);
+	num_rdqs = mlxsw_cmd_mbox_query_aq_cap_max_num_rdqs_get(mbox);
+	rdq_log2sz = mlxsw_cmd_mbox_query_aq_cap_log_max_rdq_sz_get(mbox);
+	num_cqs = mlxsw_cmd_mbox_query_aq_cap_max_num_cqs_get(mbox);
+	cq_log2sz = mlxsw_cmd_mbox_query_aq_cap_log_max_cq_sz_get(mbox);
+	num_eqs = mlxsw_cmd_mbox_query_aq_cap_max_num_eqs_get(mbox);
+	eq_log2sz = mlxsw_cmd_mbox_query_aq_cap_log_max_eq_sz_get(mbox);
+
+	if ((num_sdqs != MLXSW_PCI_SDQS_COUNT) ||
+	    (num_rdqs != MLXSW_PCI_RDQS_COUNT) ||
+	    (num_cqs != MLXSW_PCI_CQS_COUNT) ||
+	    (num_eqs != MLXSW_PCI_EQS_COUNT)) {
+		dev_err(&pdev->dev, "Unsupported number of queues\n");
+		return -EINVAL;
+	}
+
+	if ((1 << sdq_log2sz != MLXSW_PCI_WQE_COUNT) ||
+	    (1 << rdq_log2sz != MLXSW_PCI_WQE_COUNT) ||
+	    (1 << cq_log2sz != MLXSW_PCI_CQE_COUNT) ||
+	    (1 << eq_log2sz != MLXSW_PCI_EQE_COUNT)) {
+		dev_err(&pdev->dev, "Unsupported number of async queue descriptors\n");
+		return -EINVAL;
+	}
+
+	err = mlxsw_pci_queue_group_init(mlxsw_pci, mbox, &mlxsw_pci_eq_ops,
+					 num_eqs);
+	if (err) {
+		dev_err(&pdev->dev, "Failed to initialize event queues\n");
+		return err;
+	}
+
+	err = mlxsw_pci_queue_group_init(mlxsw_pci, mbox, &mlxsw_pci_cq_ops,
+					 num_cqs);
+	if (err) {
+		dev_err(&pdev->dev, "Failed to initialize completion queues\n");
+		goto err_cqs_init;
+	}
+
+	err = mlxsw_pci_queue_group_init(mlxsw_pci, mbox, &mlxsw_pci_sdq_ops,
+					 num_sdqs);
+	if (err) {
+		dev_err(&pdev->dev, "Failed to initialize send descriptor queues\n");
+		goto err_sdqs_init;
+	}
+
+	err = mlxsw_pci_queue_group_init(mlxsw_pci, mbox, &mlxsw_pci_rdq_ops,
+					 num_rdqs);
+	if (err) {
+		dev_err(&pdev->dev, "Failed to initialize receive descriptor queues\n");
+		goto err_rdqs_init;
+	}
+
+	/* We have to poll in command interface until queues are initialized */
+	mlxsw_pci->cmd.nopoll = true;
+	return 0;
+
+err_rdqs_init:
+	mlxsw_pci_queue_group_fini(mlxsw_pci, &mlxsw_pci_sdq_ops);
+err_sdqs_init:
+	mlxsw_pci_queue_group_fini(mlxsw_pci, &mlxsw_pci_cq_ops);
+err_cqs_init:
+	mlxsw_pci_queue_group_fini(mlxsw_pci, &mlxsw_pci_eq_ops);
+	return err;
+}
+
+static void mlxsw_pci_aqs_fini(struct mlxsw_pci *mlxsw_pci)
+{
+	mlxsw_pci->cmd.nopoll = false;
+	mlxsw_pci_queue_group_fini(mlxsw_pci, &mlxsw_pci_rdq_ops);
+	mlxsw_pci_queue_group_fini(mlxsw_pci, &mlxsw_pci_sdq_ops);
+	mlxsw_pci_queue_group_fini(mlxsw_pci, &mlxsw_pci_cq_ops);
+	mlxsw_pci_queue_group_fini(mlxsw_pci, &mlxsw_pci_eq_ops);
+}
+
+static void
+mlxsw_pci_config_profile_swid_config(struct mlxsw_pci *mlxsw_pci,
+				     char *mbox, int index,
+				     const struct mlxsw_swid_config *swid)
+{
+	u8 mask = 0;
+
+	if (swid->used_type) {
+		mlxsw_cmd_mbox_config_profile_swid_config_type_set(
+			mbox, index, swid->type);
+		mask |= 1;
+	}
+	if (swid->used_properties) {
+		mlxsw_cmd_mbox_config_profile_swid_config_properties_set(
+			mbox, index, swid->properties);
+		mask |= 2;
+	}
+	mlxsw_cmd_mbox_config_profile_swid_config_mask_set(mbox, index, mask);
+}
+
+static int mlxsw_pci_config_profile(struct mlxsw_pci *mlxsw_pci, char *mbox,
+				    const struct mlxsw_config_profile *profile)
+{
+	int i;
+
+	mlxsw_cmd_mbox_zero(mbox);
+
+	if (profile->used_max_vepa_channels) {
+		mlxsw_cmd_mbox_config_profile_set_max_vepa_channels_set(
+			mbox, 1);
+		mlxsw_cmd_mbox_config_profile_max_vepa_channels_set(
+			mbox, profile->max_vepa_channels);
+	}
+	if (profile->used_max_lag) {
+		mlxsw_cmd_mbox_config_profile_set_max_lag_set(
+			mbox, 1);
+		mlxsw_cmd_mbox_config_profile_max_lag_set(
+			mbox, profile->max_lag);
+	}
+	if (profile->used_max_port_per_lag) {
+		mlxsw_cmd_mbox_config_profile_set_max_port_per_lag_set(
+			mbox, 1);
+		mlxsw_cmd_mbox_config_profile_max_port_per_lag_set(
+			mbox, profile->max_port_per_lag);
+	}
+	if (profile->used_max_mid) {
+		mlxsw_cmd_mbox_config_profile_set_max_mid_set(
+			mbox, 1);
+		mlxsw_cmd_mbox_config_profile_max_mid_set(
+			mbox, profile->max_mid);
+	}
+	if (profile->used_max_pgt) {
+		mlxsw_cmd_mbox_config_profile_set_max_pgt_set(
+			mbox, 1);
+		mlxsw_cmd_mbox_config_profile_max_pgt_set(
+			mbox, profile->max_pgt);
+	}
+	if (profile->used_max_system_port) {
+		mlxsw_cmd_mbox_config_profile_set_max_system_port_set(
+			mbox, 1);
+		mlxsw_cmd_mbox_config_profile_max_system_port_set(
+			mbox, profile->max_system_port);
+	}
+	if (profile->used_max_vlan_groups) {
+		mlxsw_cmd_mbox_config_profile_set_max_vlan_groups_set(
+			mbox, 1);
+		mlxsw_cmd_mbox_config_profile_max_vlan_groups_set(
+			mbox, profile->max_vlan_groups);
+	}
+	if (profile->used_max_regions) {
+		mlxsw_cmd_mbox_config_profile_set_max_regions_set(
+			mbox, 1);
+		mlxsw_cmd_mbox_config_profile_max_regions_set(
+			mbox, profile->max_regions);
+	}
+	if (profile->used_flood_tables) {
+		mlxsw_cmd_mbox_config_profile_set_flood_tables_set(
+			mbox, 1);
+		mlxsw_cmd_mbox_config_profile_max_flood_tables_set(
+			mbox, profile->max_flood_tables);
+		mlxsw_cmd_mbox_config_profile_max_vid_flood_tables_set(
+			mbox, profile->max_vid_flood_tables);
+	}
+	if (profile->used_flood_mode) {
+		mlxsw_cmd_mbox_config_profile_set_flood_mode_set(
+			mbox, 1);
+		mlxsw_cmd_mbox_config_profile_flood_mode_set(
+			mbox, profile->flood_mode);
+	}
+	if (profile->used_max_ib_mc) {
+		mlxsw_cmd_mbox_config_profile_set_max_ib_mc_set(
+			mbox, 1);
+		mlxsw_cmd_mbox_config_profile_max_ib_mc_set(
+			mbox, profile->max_ib_mc);
+	}
+	if (profile->used_max_pkey) {
+		mlxsw_cmd_mbox_config_profile_set_max_pkey_set(
+			mbox, 1);
+		mlxsw_cmd_mbox_config_profile_max_pkey_set(
+			mbox, profile->max_pkey);
+	}
+	if (profile->used_ar_sec) {
+		mlxsw_cmd_mbox_config_profile_set_ar_sec_set(
+			mbox, 1);
+		mlxsw_cmd_mbox_config_profile_ar_sec_set(
+			mbox, profile->ar_sec);
+	}
+	if (profile->used_adaptive_routing_group_cap) {
+		mlxsw_cmd_mbox_config_profile_set_adaptive_routing_group_cap_set(
+			mbox, 1);
+		mlxsw_cmd_mbox_config_profile_adaptive_routing_group_cap_set(
+			mbox, profile->adaptive_routing_group_cap);
+	}
+
+	for (i = 0; i < MLXSW_CONFIG_PROFILE_SWID_COUNT; i++)
+		mlxsw_pci_config_profile_swid_config(mlxsw_pci, mbox, i,
+						     &profile->swid_config[i]);
+
+	return mlxsw_cmd_config_profile_set(mlxsw_pci->core, mbox);
+}
+
+static int mlxsw_pci_boardinfo(struct mlxsw_pci *mlxsw_pci, char *mbox)
+{
+	struct mlxsw_bus_info *bus_info = &mlxsw_pci->bus_info;
+	int err;
+
+	mlxsw_cmd_mbox_zero(mbox);
+	err = mlxsw_cmd_boardinfo(mlxsw_pci->core, mbox);
+	if (err)
+		return err;
+	mlxsw_cmd_mbox_boardinfo_vsd_memcpy_from(mbox, bus_info->vsd);
+	mlxsw_cmd_mbox_boardinfo_psid_memcpy_from(mbox, bus_info->psid);
+	return 0;
+}
+
+static int mlxsw_pci_fw_area_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
+				  u16 num_pages)
+{
+	struct mlxsw_pci_mem_item *mem_item;
+	int i;
+	int err;
+
+	mlxsw_pci->fw_area.items = kcalloc(num_pages, sizeof(*mem_item),
+					   GFP_KERNEL);
+	if (!mlxsw_pci->fw_area.items)
+		return -ENOMEM;
+	mlxsw_pci->fw_area.num_pages = num_pages;
+
+	mlxsw_cmd_mbox_zero(mbox);
+	for (i = 0; i < num_pages; i++) {
+		mem_item = &mlxsw_pci->fw_area.items[i];
+
+		mem_item->size = MLXSW_PCI_PAGE_SIZE;
+		mem_item->buf = pci_alloc_consistent(mlxsw_pci->pdev,
+						     mem_item->size,
+						     &mem_item->mapaddr);
+		if (!mem_item->buf)
+			goto err_alloc;
+		mlxsw_cmd_mbox_map_fa_pa_set(mbox, i, mem_item->mapaddr);
+		mlxsw_cmd_mbox_map_fa_log2size_set(mbox, i, 0); /* 1 page */
+	}
+
+	err = mlxsw_cmd_map_fa(mlxsw_pci->core, mbox, num_pages);
+	if (err)
+		goto err_cmd_map_fa;
+
+	return 0;
+
+err_cmd_map_fa:
+err_alloc:
+	for (i--; i >= 0; i--) {
+		mem_item = &mlxsw_pci->fw_area.items[i];
+
+		pci_free_consistent(mlxsw_pci->pdev, mem_item->size,
+				    mem_item->buf, mem_item->mapaddr);
+	}
+	kfree(mlxsw_pci->fw_area.items);
+	return err;
+}
+
+static void mlxsw_pci_fw_area_fini(struct mlxsw_pci *mlxsw_pci)
+{
+	struct mlxsw_pci_mem_item *mem_item;
+	int i;
+
+	mlxsw_cmd_unmap_fa(mlxsw_pci->core);
+
+	for (i = 0; i < mlxsw_pci->fw_area.num_pages; i++) {
+		mem_item = &mlxsw_pci->fw_area.items[i];
+
+		pci_free_consistent(mlxsw_pci->pdev, mem_item->size,
+				    mem_item->buf, mem_item->mapaddr);
+	}
+	kfree(mlxsw_pci->fw_area.items);
+}
+
+static irqreturn_t mlxsw_pci_eq_irq_handler(int irq, void *dev_id)
+{
+	struct mlxsw_pci *mlxsw_pci = dev_id;
+	struct mlxsw_pci_queue *q;
+	int i;
+
+	for (i = 0; i < MLXSW_PCI_EQS_COUNT; i++) {
+		q = mlxsw_pci_eq_get(mlxsw_pci, i);
+		mlxsw_pci_queue_tasklet_schedule(q);
+	}
+	return IRQ_HANDLED;
+}
+
+static int mlxsw_pci_init(void *bus_priv, struct mlxsw_core *mlxsw_core,
+			  const struct mlxsw_config_profile *profile)
+{
+	struct mlxsw_pci *mlxsw_pci = bus_priv;
+	struct pci_dev *pdev = mlxsw_pci->pdev;
+	char *mbox;
+	u16 num_pages;
+	int err;
+
+	mutex_init(&mlxsw_pci->cmd.lock);
+	init_waitqueue_head(&mlxsw_pci->cmd.wait);
+
+	mlxsw_pci->core = mlxsw_core;
+
+	mbox = mlxsw_cmd_mbox_alloc();
+	if (!mbox)
+		return -ENOMEM;
+	err = mlxsw_cmd_query_fw(mlxsw_core, mbox);
+	if (err)
+		goto err_query_fw;
+
+	mlxsw_pci->bus_info.fw_rev.major =
+		mlxsw_cmd_mbox_query_fw_fw_rev_major_get(mbox);
+	mlxsw_pci->bus_info.fw_rev.minor =
+		mlxsw_cmd_mbox_query_fw_fw_rev_minor_get(mbox);
+	mlxsw_pci->bus_info.fw_rev.subminor =
+		mlxsw_cmd_mbox_query_fw_fw_rev_subminor_get(mbox);
+
+	if (mlxsw_cmd_mbox_query_fw_cmd_interface_rev_get(mbox) != 1) {
+		dev_err(&pdev->dev, "Unsupported cmd interface revision ID queried from hw\n");
+		err = -EINVAL;
+		goto err_iface_rev;
+	}
+	if (mlxsw_cmd_mbox_query_fw_doorbell_page_bar_get(mbox) != 0) {
+		dev_err(&pdev->dev, "Unsupported doorbell page bar queried from hw\n");
+		err = -EINVAL;
+		goto err_doorbell_page_bar;
+	}
+
+	mlxsw_pci->doorbell_offset =
+		mlxsw_cmd_mbox_query_fw_doorbell_page_offset_get(mbox);
+
+	num_pages = mlxsw_cmd_mbox_query_fw_fw_pages_get(mbox);
+	err = mlxsw_pci_fw_area_init(mlxsw_pci, mbox, num_pages);
+	if (err)
+		goto err_fw_area_init;
+
+	err = mlxsw_pci_boardinfo(mlxsw_pci, mbox);
+	if (err)
+		goto err_boardinfo;
+
+	err = mlxsw_pci_config_profile(mlxsw_pci, mbox, profile);
+	if (err)
+		goto err_config_profile;
+
+	err = mlxsw_pci_aqs_init(mlxsw_pci, mbox);
+	if (err)
+		goto err_aqs_init;
+
+	err = request_irq(mlxsw_pci->msix_entry.vector,
+			  mlxsw_pci_eq_irq_handler, 0,
+			  mlxsw_pci_driver_name, mlxsw_pci);
+	if (err) {
+		dev_err(&pdev->dev, "IRQ request failed\n");
+		goto err_request_eq_irq;
+	}
+
+	goto mbox_put;
+
+err_request_eq_irq:
+	mlxsw_pci_aqs_fini(mlxsw_pci);
+err_aqs_init:
+err_config_profile:
+err_boardinfo:
+	mlxsw_pci_fw_area_fini(mlxsw_pci);
+err_fw_area_init:
+err_doorbell_page_bar:
+err_iface_rev:
+err_query_fw:
+mbox_put:
+	mlxsw_cmd_mbox_free(mbox);
+	return err;
+}
+
+static void mlxsw_pci_fini(void *bus_priv)
+{
+	struct mlxsw_pci *mlxsw_pci = bus_priv;
+
+	free_irq(mlxsw_pci->msix_entry.vector, mlxsw_pci);
+	mlxsw_pci_aqs_fini(mlxsw_pci);
+	mlxsw_pci_fw_area_fini(mlxsw_pci);
+}
+
+static struct mlxsw_pci_queue *
+mlxsw_pci_sdq_pick(struct mlxsw_pci *mlxsw_pci,
+		   const struct mlxsw_tx_info *tx_info)
+{
+	u8 sdqn = tx_info->local_port % mlxsw_pci_sdq_count(mlxsw_pci);
+
+	return mlxsw_pci_sdq_get(mlxsw_pci, sdqn);
+}
+
+static int mlxsw_pci_skb_transmit(void *bus_priv, struct sk_buff *skb,
+				  const struct mlxsw_tx_info *tx_info)
+{
+	struct mlxsw_pci *mlxsw_pci = bus_priv;
+	struct mlxsw_pci_queue *q;
+	struct mlxsw_pci_queue_elem_info *elem_info;
+	char *wqe;
+	int i;
+	int err;
+
+	if (skb_shinfo(skb)->nr_frags > MLXSW_PCI_WQE_SG_ENTRIES - 1)
+		return -EINVAL;
+
+	q = mlxsw_pci_sdq_pick(mlxsw_pci, tx_info);
+	spin_lock_bh(&q->lock);
+	elem_info = mlxsw_pci_queue_elem_info_producer_get(q);
+	if (!elem_info) {
+		/* queue is full */
+		err = -EAGAIN;
+		goto unlock;
+	}
+	elem_info->u.sdq.skb = skb;
+
+	wqe = elem_info->elem;
+	mlxsw_pci_wqe_c_set(wqe, 1); /* always report completion */
+	mlxsw_pci_wqe_lp_set(wqe, !!tx_info->is_emad);
+	mlxsw_pci_wqe_type_set(wqe, MLXSW_PCI_WQE_TYPE_ETHERNET);
+
+	err = mlxsw_pci_wqe_frag_map(mlxsw_pci, wqe, 0, skb->data,
+				     skb_headlen(skb), DMA_TO_DEVICE);
+	if (err)
+		goto unlock;
+
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+		err = mlxsw_pci_wqe_frag_map(mlxsw_pci, wqe, i + 1,
+					     skb_frag_address(frag),
+					     skb_frag_size(frag),
+					     DMA_TO_DEVICE);
+		if (err)
+			goto unmap_frags;
+	}
+
+	/* Set unused sq entries byte count to zero. */
+	for (i++; i < MLXSW_PCI_WQE_SG_ENTRIES; i++)
+		mlxsw_pci_wqe_byte_count_set(wqe, i, 0);
+
+	/* Everything is set up, ring producer doorbell to get HW going */
+	q->producer_counter++;
+	mlxsw_pci_queue_doorbell_producer_ring(mlxsw_pci, q);
+
+	goto unlock;
+
+unmap_frags:
+	for (; i >= 0; i--)
+		mlxsw_pci_wqe_frag_unmap(mlxsw_pci, wqe, i, DMA_TO_DEVICE);
+unlock:
+	spin_unlock_bh(&q->lock);
+	return err;
+}
+
+static int mlxsw_pci_cmd_exec(void *bus_priv, u16 opcode, u8 opcode_mod,
+			      u32 in_mod, bool out_mbox_direct,
+			      char *in_mbox, size_t in_mbox_size,
+			      char *out_mbox, size_t out_mbox_size,
+			      u8 *p_status)
+{
+	struct mlxsw_pci *mlxsw_pci = bus_priv;
+	dma_addr_t in_mapaddr = 0;
+	dma_addr_t out_mapaddr = 0;
+	bool evreq = mlxsw_pci->cmd.nopoll;
+	unsigned long timeout = msecs_to_jiffies(MLXSW_PCI_CIR_TIMEOUT_MSECS);
+	bool *p_wait_done = &mlxsw_pci->cmd.wait_done;
+	int err;
+
+	*p_status = MLXSW_CMD_STATUS_OK;
+
+	err = mutex_lock_interruptible(&mlxsw_pci->cmd.lock);
+	if (err)
+		return err;
+
+	if (in_mbox) {
+		in_mapaddr = pci_map_single(mlxsw_pci->pdev, in_mbox,
+					    in_mbox_size, PCI_DMA_TODEVICE);
+		if (unlikely(pci_dma_mapping_error(mlxsw_pci->pdev,
+						   in_mapaddr))) {
+			err = -EIO;
+			goto err_in_mbox_map;
+		}
+	}
+	mlxsw_pci_write32(mlxsw_pci, CIR_IN_PARAM_HI, in_mapaddr >> 32);
+	mlxsw_pci_write32(mlxsw_pci, CIR_IN_PARAM_LO, in_mapaddr);
+
+	if (out_mbox) {
+		out_mapaddr = pci_map_single(mlxsw_pci->pdev, out_mbox,
+					     out_mbox_size, PCI_DMA_FROMDEVICE);
+		if (unlikely(pci_dma_mapping_error(mlxsw_pci->pdev,
+						   out_mapaddr))) {
+			err = -EIO;
+			goto err_out_mbox_map;
+		}
+	}
+	mlxsw_pci_write32(mlxsw_pci, CIR_OUT_PARAM_HI, out_mapaddr >> 32);
+	mlxsw_pci_write32(mlxsw_pci, CIR_OUT_PARAM_LO, out_mapaddr);
+
+	mlxsw_pci_write32(mlxsw_pci, CIR_IN_MODIFIER, in_mod);
+	mlxsw_pci_write32(mlxsw_pci, CIR_TOKEN, 0);
+
+	*p_wait_done = false;
+
+	wmb(); /* all needs to be written before we write control register */
+	mlxsw_pci_write32(mlxsw_pci, CIR_CTRL,
+			  MLXSW_PCI_CIR_CTRL_GO_BIT |
+			  (evreq ? MLXSW_PCI_CIR_CTRL_EVREQ_BIT : 0) |
+			  (opcode_mod << MLXSW_PCI_CIR_CTRL_OPCODE_MOD_SHIFT) |
+			  opcode);
+
+	if (!evreq) {
+		unsigned long end;
+
+		end = jiffies + timeout;
+		do {
+			u32 ctrl = mlxsw_pci_read32(mlxsw_pci, CIR_CTRL);
+
+			if (!(ctrl & MLXSW_PCI_CIR_CTRL_GO_BIT)) {
+				*p_wait_done = true;
+				*p_status = ctrl >> MLXSW_PCI_CIR_CTRL_STATUS_SHIFT;
+				break;
+			}
+			cond_resched();
+		} while (time_before(jiffies, end));
+	} else {
+		wait_event_timeout(mlxsw_pci->cmd.wait, *p_wait_done, timeout);
+		*p_status = mlxsw_pci->cmd.comp.status;
+	}
+
+	err = 0;
+	if (*p_wait_done) {
+		if (*p_status)
+			err = -EIO;
+	} else {
+		err = -ETIMEDOUT;
+	}
+
+	if (!err && out_mbox && out_mbox_direct) {
+		/* Some commands does not use output param as address to mailbox
+		 * but they store output directly into registers. In that case,
+		 * copy registers into mbox buffer.
+		 */
+		__be32 tmp;
+
+		if (!evreq) {
+			tmp = cpu_to_be32(mlxsw_pci_read32(mlxsw_pci,
+							   CIR_OUT_PARAM_HI));
+			memcpy(out_mbox, &tmp, sizeof(tmp));
+			tmp = cpu_to_be32(mlxsw_pci_read32(mlxsw_pci,
+							   CIR_OUT_PARAM_LO));
+			memcpy(out_mbox + sizeof(tmp), &tmp, sizeof(tmp));
+		}
+	}
+
+	if (out_mapaddr)
+		pci_unmap_single(mlxsw_pci->pdev, out_mapaddr, out_mbox_size,
+				 PCI_DMA_FROMDEVICE);
+
+	/* fall through */
+
+err_out_mbox_map:
+	if (in_mapaddr)
+		pci_unmap_single(mlxsw_pci->pdev, in_mapaddr, in_mbox_size,
+				 PCI_DMA_TODEVICE);
+err_in_mbox_map:
+	mutex_unlock(&mlxsw_pci->cmd.lock);
+
+	return err;
+}
+
+static const struct mlxsw_bus mlxsw_pci_bus = {
+	.kind		= "pci",
+	.init		= mlxsw_pci_init,
+	.fini		= mlxsw_pci_fini,
+	.skb_transmit	= mlxsw_pci_skb_transmit,
+	.cmd_exec	= mlxsw_pci_cmd_exec,
+};
+
+static int mlxsw_pci_sw_reset(struct mlxsw_pci *mlxsw_pci)
+{
+	mlxsw_pci_write32(mlxsw_pci, SW_RESET, MLXSW_PCI_SW_RESET_RST_BIT);
+	/* Current firware does not let us know when the reset is done.
+	 * So we just wait here for constant time and hope for the best.
+	 */
+	msleep(MLXSW_PCI_SW_RESET_TIMEOUT_MSECS);
+	return 0;
+}
+
+static int mlxsw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+	struct mlxsw_pci *mlxsw_pci;
+	int err;
+
+	mlxsw_pci = kzalloc(sizeof(*mlxsw_pci), GFP_KERNEL);
+	if (!mlxsw_pci)
+		return -ENOMEM;
+
+	err = pci_enable_device(pdev);
+	if (err) {
+		dev_err(&pdev->dev, "pci_enable_device failed\n");
+		goto err_pci_enable_device;
+	}
+
+	err = pci_request_regions(pdev, mlxsw_pci_driver_name);
+	if (err) {
+		dev_err(&pdev->dev, "pci_request_regions failed\n");
+		goto err_pci_request_regions;
+	}
+
+	err = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+	if (!err) {
+		err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+		if (err) {
+			dev_err(&pdev->dev, "pci_set_consistent_dma_mask failed\n");
+			goto err_pci_set_dma_mask;
+		}
+	} else {
+		err = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
+		if (err) {
+			dev_err(&pdev->dev, "pci_set_dma_mask failed\n");
+			goto err_pci_set_dma_mask;
+		}
+	}
+
+	if (pci_resource_len(pdev, 0) < MLXSW_PCI_BAR0_SIZE) {
+		dev_err(&pdev->dev, "invalid PCI region size\n");
+		err = -EINVAL;
+		goto err_pci_resource_len_check;
+	}
+
+	mlxsw_pci->hw_addr = ioremap(pci_resource_start(pdev, 0),
+				     pci_resource_len(pdev, 0));
+	if (!mlxsw_pci->hw_addr) {
+		dev_err(&pdev->dev, "ioremap failed\n");
+		err = -EIO;
+		goto err_ioremap;
+	}
+	pci_set_master(pdev);
+
+	mlxsw_pci->pdev = pdev;
+	pci_set_drvdata(pdev, mlxsw_pci);
+
+	err = mlxsw_pci_sw_reset(mlxsw_pci);
+	if (err) {
+		dev_err(&pdev->dev, "Software reset failed\n");
+		goto err_sw_reset;
+	}
+
+	err = pci_enable_msix_exact(pdev, &mlxsw_pci->msix_entry, 1);
+	if (err) {
+		dev_err(&pdev->dev, "MSI-X init failed\n");
+		goto err_msix_init;
+	}
+
+	mlxsw_pci->bus_info.device_kind = mlxsw_pci_device_kind_get(id);
+	mlxsw_pci->bus_info.device_name = pci_name(mlxsw_pci->pdev);
+	mlxsw_pci->bus_info.dev = &pdev->dev;
+
+	mlxsw_pci->dbg_dir = debugfs_create_dir(mlxsw_pci->bus_info.device_name,
+						mlxsw_pci_dbg_root);
+	if (!mlxsw_pci->dbg_dir) {
+		dev_err(&pdev->dev, "Failed to create debugfs dir\n");
+		goto err_dbg_create_dir;
+	}
+
+	err = mlxsw_core_bus_device_register(&mlxsw_pci->bus_info,
+					     &mlxsw_pci_bus, mlxsw_pci);
+	if (err) {
+		dev_err(&pdev->dev, "cannot register bus device\n");
+		goto err_bus_device_register;
+	}
+
+	return 0;
+
+err_bus_device_register:
+	debugfs_remove_recursive(mlxsw_pci->dbg_dir);
+err_dbg_create_dir:
+	pci_disable_msix(mlxsw_pci->pdev);
+err_msix_init:
+err_sw_reset:
+	iounmap(mlxsw_pci->hw_addr);
+err_ioremap:
+err_pci_resource_len_check:
+err_pci_set_dma_mask:
+	pci_release_regions(pdev);
+err_pci_request_regions:
+	pci_disable_device(pdev);
+err_pci_enable_device:
+	kfree(mlxsw_pci);
+	return err;
+}
+
+static void mlxsw_pci_remove(struct pci_dev *pdev)
+{
+	struct mlxsw_pci *mlxsw_pci = pci_get_drvdata(pdev);
+
+	mlxsw_core_bus_device_unregister(mlxsw_pci->core);
+	debugfs_remove_recursive(mlxsw_pci->dbg_dir);
+	pci_disable_msix(mlxsw_pci->pdev);
+	iounmap(mlxsw_pci->hw_addr);
+	pci_release_regions(mlxsw_pci->pdev);
+	pci_disable_device(mlxsw_pci->pdev);
+	kfree(mlxsw_pci);
+}
+
+static struct pci_driver mlxsw_pci_driver = {
+	.name		= mlxsw_pci_driver_name,
+	.id_table	= mlxsw_pci_id_table,
+	.probe		= mlxsw_pci_probe,
+	.remove		= mlxsw_pci_remove,
+};
+
+static int __init mlxsw_pci_module_init(void)
+{
+	int err;
+
+	mlxsw_pci_dbg_root = debugfs_create_dir(mlxsw_pci_driver_name, NULL);
+	if (!mlxsw_pci_dbg_root)
+		return -ENOMEM;
+	err = pci_register_driver(&mlxsw_pci_driver);
+	if (err)
+		goto err_register_driver;
+	return 0;
+
+err_register_driver:
+	debugfs_remove_recursive(mlxsw_pci_dbg_root);
+	return err;
+}
+
+static void __exit mlxsw_pci_module_exit(void)
+{
+	pci_unregister_driver(&mlxsw_pci_driver);
+	debugfs_remove_recursive(mlxsw_pci_dbg_root);
+}
+
+module_init(mlxsw_pci_module_init);
+module_exit(mlxsw_pci_module_exit);
+
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_AUTHOR("Jiri Pirko <jiri@mellanox.com>");
+MODULE_DESCRIPTION("Mellanox switch PCI interface driver");
+MODULE_DEVICE_TABLE(pci, mlxsw_pci_id_table);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.h b/drivers/net/ethernet/mellanox/mlxsw/pci.h
new file mode 100644
index 0000000..6176a93
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.h
@@ -0,0 +1,220 @@
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/pci.h
+ * Copyright (c) 2015 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2015 Jiri Pirko <jiri@mellanox.com>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _MLXSW_PCI_H
+#define _MLXSW_PCI_H
+
+#include <linux/bitops.h>
+
+#include "item.h"
+
+#define MLXSW_PCI_BAR0_SIZE		(1024 * 1024) /* 1MB */
+#define MLXSW_PCI_PAGE_SIZE		4096
+
+#define MLXSW_PCI_CIR_BASE			0x71000
+#define MLXSW_PCI_CIR_IN_PARAM_HI		MLXSW_PCI_CIR_BASE
+#define MLXSW_PCI_CIR_IN_PARAM_LO		(MLXSW_PCI_CIR_BASE + 0x04)
+#define MLXSW_PCI_CIR_IN_MODIFIER		(MLXSW_PCI_CIR_BASE + 0x08)
+#define MLXSW_PCI_CIR_OUT_PARAM_HI		(MLXSW_PCI_CIR_BASE + 0x0C)
+#define MLXSW_PCI_CIR_OUT_PARAM_LO		(MLXSW_PCI_CIR_BASE + 0x10)
+#define MLXSW_PCI_CIR_TOKEN			(MLXSW_PCI_CIR_BASE + 0x14)
+#define MLXSW_PCI_CIR_CTRL			(MLXSW_PCI_CIR_BASE + 0x18)
+#define MLXSW_PCI_CIR_CTRL_GO_BIT		BIT(23)
+#define MLXSW_PCI_CIR_CTRL_EVREQ_BIT		BIT(22)
+#define MLXSW_PCI_CIR_CTRL_OPCODE_MOD_SHIFT	12
+#define MLXSW_PCI_CIR_CTRL_STATUS_SHIFT		24
+#define MLXSW_PCI_CIR_TIMEOUT_MSECS		1000
+
+#define MLXSW_PCI_SW_RESET			0xF0010
+#define MLXSW_PCI_SW_RESET_RST_BIT		BIT(0)
+#define MLXSW_PCI_SW_RESET_TIMEOUT_MSECS	5000
+
+#define MLXSW_PCI_DOORBELL_SDQ_OFFSET		0x000
+#define MLXSW_PCI_DOORBELL_RDQ_OFFSET		0x200
+#define MLXSW_PCI_DOORBELL_CQ_OFFSET		0x400
+#define MLXSW_PCI_DOORBELL_EQ_OFFSET		0x600
+#define MLXSW_PCI_DOORBELL_ARM_CQ_OFFSET	0x800
+#define MLXSW_PCI_DOORBELL_ARM_EQ_OFFSET	0xA00
+
+#define MLXSW_PCI_DOORBELL(offset, type_offset, num)	\
+	((offset) + (type_offset) + (num) * 4)
+
+#define MLXSW_PCI_RDQS_COUNT	24
+#define MLXSW_PCI_SDQS_COUNT	24
+#define MLXSW_PCI_CQS_COUNT	(MLXSW_PCI_RDQS_COUNT + MLXSW_PCI_SDQS_COUNT)
+#define MLXSW_PCI_EQS_COUNT	2
+#define MLXSW_PCI_EQ_ASYNC_NUM	0
+#define MLXSW_PCI_EQ_COMP_NUM	1
+
+#define MLXSW_PCI_AQ_PAGES	8
+#define MLXSW_PCI_AQ_SIZE	(MLXSW_PCI_PAGE_SIZE * MLXSW_PCI_AQ_PAGES)
+#define MLXSW_PCI_WQE_SIZE	32 /* 32 bytes per element */
+#define MLXSW_PCI_CQE_SIZE	16 /* 16 bytes per element */
+#define MLXSW_PCI_EQE_SIZE	16 /* 16 bytes per element */
+#define MLXSW_PCI_WQE_COUNT	(MLXSW_PCI_AQ_SIZE / MLXSW_PCI_WQE_SIZE)
+#define MLXSW_PCI_CQE_COUNT	(MLXSW_PCI_AQ_SIZE / MLXSW_PCI_CQE_SIZE)
+#define MLXSW_PCI_EQE_COUNT	(MLXSW_PCI_AQ_SIZE / MLXSW_PCI_EQE_SIZE)
+#define MLXSW_PCI_EQE_UPDATE_COUNT	0x80
+
+#define MLXSW_PCI_WQE_SG_ENTRIES	3
+#define MLXSW_PCI_WQE_TYPE_ETHERNET	0xA
+
+/* pci_wqe_c
+ * If set it indicates that a completion should be reported upon
+ * execution of this descriptor.
+ */
+MLXSW_ITEM32(pci, wqe, c, 0x00, 31, 1);
+
+/* pci_wqe_lp
+ * Local Processing, set if packet should be processed by the local
+ * switch hardware:
+ * For Ethernet EMAD (Direct Route and non Direct Route) -
+ * must be set if packet destination is local device
+ * For InfiniBand CTL - must be set if packet destination is local device
+ * Otherwise it must be clear
+ * Local Process packets must not exceed the size of 2K (including payload
+ * and headers).
+ */
+MLXSW_ITEM32(pci, wqe, lp, 0x00, 30, 1);
+
+/* pci_wqe_type
+ * Packet type.
+ */
+MLXSW_ITEM32(pci, wqe, type, 0x00, 23, 4);
+
+/* pci_wqe_byte_count
+ * Size of i-th scatter/gather entry, 0 if entry is unused.
+ */
+MLXSW_ITEM16_INDEXED(pci, wqe, byte_count, 0x02, 0, 14, 0x02, 0x00, false);
+
+/* pci_wqe_address
+ * Physical address of i-th scatter/gather entry.
+ * Gather Entries must be 2Byte aligned.
+ */
+MLXSW_ITEM64_INDEXED(pci, wqe, address, 0x08, 0, 64, 0x8, 0x0, false);
+
+/* pci_cqe_lag
+ * Packet arrives from a port which is a LAG
+ */
+MLXSW_ITEM32(pci, cqe, lag, 0x00, 23, 1);
+
+/* pci_cqe_system_port
+ * When lag=0: System port on which the packet was received
+ * When lag=1:
+ * bits [15:4] LAG ID on which the packet was received
+ * bits [3:0] sub_port on which the packet was received
+ */
+MLXSW_ITEM32(pci, cqe, system_port, 0x00, 0, 16);
+
+/* pci_cqe_wqe_counter
+ * WQE count of the WQEs completed on the associated dqn
+ */
+MLXSW_ITEM32(pci, cqe, wqe_counter, 0x04, 16, 16);
+
+/* pci_cqe_byte_count
+ * Byte count of received packets including additional two
+ * Reserved Bytes that are append to the end of the frame.
+ * Reserved for Send CQE.
+ */
+MLXSW_ITEM32(pci, cqe, byte_count, 0x04, 0, 14);
+
+/* pci_cqe_trap_id
+ * Trap ID that captured the packet.
+ */
+MLXSW_ITEM32(pci, cqe, trap_id, 0x08, 0, 8);
+
+/* pci_cqe_e
+ * CQE with Error.
+ */
+MLXSW_ITEM32(pci, cqe, e, 0x0C, 7, 1);
+
+/* pci_cqe_sr
+ * 1 - Send Queue
+ * 0 - Receive Queue
+ */
+MLXSW_ITEM32(pci, cqe, sr, 0x0C, 6, 1);
+
+/* pci_cqe_dqn
+ * Descriptor Queue (DQ) Number.
+ */
+MLXSW_ITEM32(pci, cqe, dqn, 0x0C, 1, 5);
+
+/* pci_cqe_owner
+ * Ownership bit.
+ */
+MLXSW_ITEM32(pci, cqe, owner, 0x0C, 0, 1);
+
+/* pci_eqe_event_type
+ * Event type.
+ */
+MLXSW_ITEM32(pci, eqe, event_type, 0x0C, 24, 8);
+#define MLXSW_PCI_EQE_EVENT_TYPE_COMP	0x00
+#define MLXSW_PCI_EQE_EVENT_TYPE_CMD	0x0A
+
+/* pci_eqe_event_sub_type
+ * Event type.
+ */
+MLXSW_ITEM32(pci, eqe, event_sub_type, 0x0C, 16, 8);
+
+/* pci_eqe_cqn
+ * Completion Queue that triggeret this EQE.
+ */
+MLXSW_ITEM32(pci, eqe, cqn, 0x0C, 8, 7);
+
+/* pci_eqe_owner
+ * Ownership bit.
+ */
+MLXSW_ITEM32(pci, eqe, owner, 0x0C, 0, 1);
+
+/* pci_eqe_cmd_token
+ * Command completion event - token
+ */
+MLXSW_ITEM32(pci, eqe, cmd_token, 0x08, 16, 16);
+
+/* pci_eqe_cmd_status
+ * Command completion event - status
+ */
+MLXSW_ITEM32(pci, eqe, cmd_status, 0x08, 0, 8);
+
+/* pci_eqe_cmd_out_param_h
+ * Command completion event - output parameter - higher part
+ */
+MLXSW_ITEM32(pci, eqe, cmd_out_param_h, 0x0C, 0, 32);
+
+/* pci_eqe_cmd_out_param_l
+ * Command completion event - output parameter - lower part
+ */
+MLXSW_ITEM32(pci, eqe, cmd_out_param_l, 0x10, 0, 32);
+
+#endif
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [patch net-next 3/4] mlxsw: Add interface to access registers and process events
  2015-07-23 15:43 [patch net-next 0/4] Introduce Mellanox Technologies Switch ASICs switchdev drivers Jiri Pirko
  2015-07-23 15:43 ` [patch net-next 1/4] mlxsw: Introduce Mellanox switch driver core Jiri Pirko
  2015-07-23 15:43 ` [patch net-next 2/4] mlxsw: Add PCI bus implementation Jiri Pirko
@ 2015-07-23 15:43 ` Jiri Pirko
  2015-07-23 21:12   ` Andy Gospodarek
  2015-07-24  5:13   ` Scott Feldman
  2015-07-23 15:43 ` [patch net-next 4/4] mlxsw: Introduce Mellanox SwitchX-2 ASIC support Jiri Pirko
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 29+ messages in thread
From: Jiri Pirko @ 2015-07-23 15:43 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, ogerlitz, sfeldma, roopa, f.fainelli,
	tgraf, ast, jhs, daniel, john.fastabend, simon.horman, linville,
	andy, shm, nhorman, Jiri Pirko

From: Ido Schimmel <idosch@mellanox.com>

Add the ability to construct mailbox-style register access messages
called EMADs with provisions to construct and parse the registers payload.
Implement EMAD transaction layer which is responsible for the reliable
transmission of EMADs.
Also, add an infrastructure used by the switch driver to register for
particular events generated by the device.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Elad Raz <eladr@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/core.c |  736 ++++++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/core.h |   21 +
 drivers/net/ethernet/mellanox/mlxsw/emad.h |  127 +++
 drivers/net/ethernet/mellanox/mlxsw/port.h |   19 +
 drivers/net/ethernet/mellanox/mlxsw/reg.h  | 1289 ++++++++++++++++++++++++++++
 5 files changed, 2192 insertions(+)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/emad.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/reg.h

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 211ec9b..bd0f692 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -44,12 +44,27 @@
 #include <linux/seq_file.h>
 #include <linux/u64_stats_sync.h>
 #include <linux/netdevice.h>
+#include <linux/wait.h>
+#include <linux/skbuff.h>
+#include <linux/etherdevice.h>
+#include <linux/types.h>
+#include <linux/wait.h>
+#include <linux/string.h>
+#include <linux/gfp.h>
+#include <linux/random.h>
+#include <linux/jiffies.h>
+#include <linux/mutex.h>
+#include <linux/rcupdate.h>
+#include <linux/slab.h>
+#include <asm/byteorder.h>
 
 #include "core.h"
 #include "item.h"
 #include "cmd.h"
 #include "port.h"
 #include "trap.h"
+#include "emad.h"
+#include "reg.h"
 
 static LIST_HEAD(mlxsw_core_driver_list);
 static DEFINE_SPINLOCK(mlxsw_core_driver_list_lock);
@@ -76,6 +91,15 @@ struct mlxsw_core {
 	void *bus_priv;
 	const struct mlxsw_bus_info *bus_info;
 	struct list_head rx_listener_list;
+	struct list_head event_listener_list;
+	struct {
+		struct sk_buff *resp_skb;
+		u64 tid;
+		wait_queue_head_t wait;
+		bool trans_active;
+		struct mutex lock; /* One EMAD transaction at a time. */
+		bool use_emad;
+	} emad;
 	struct mlxsw_core_pcpu_stats __percpu *pcpu_stats;
 	struct dentry *dbg_dir;
 	struct {
@@ -92,6 +116,473 @@ struct mlxsw_rx_listener_item {
 	void *priv;
 };
 
+struct mlxsw_event_listener_item {
+	struct list_head list;
+	struct mlxsw_event_listener el;
+	void *priv;
+};
+
+/******************
+ * EMAD processing
+ ******************/
+
+/* emad_eth_hdr_dmac
+ * Destination MAC in EMAD's Ethernet header.
+ * Must be set to 01:02:c9:00:00:01
+ */
+MLXSW_ITEM_BUF(emad, eth_hdr, dmac, 0x00, 6);
+
+/* emad_eth_hdr_smac
+ * Source MAC in EMAD's Ethernet header.
+ * Must be set to 00:02:c9:01:02:03
+ */
+MLXSW_ITEM_BUF(emad, eth_hdr, smac, 0x06, 6);
+
+/* emad_eth_hdr_ethertype
+ * Ethertype in EMAD's Ethernet header.
+ * Must be set to 0x8932
+ */
+MLXSW_ITEM32(emad, eth_hdr, ethertype, 0x0C, 16, 16);
+
+/* emad_eth_hdr_mlx_proto
+ * Mellanox protocol.
+ * Must be set to 0x0.
+ */
+MLXSW_ITEM32(emad, eth_hdr, mlx_proto, 0x0C, 8, 8);
+
+/* emad_eth_hdr_ver
+ * Mellanox protocol version.
+ * Must be set to 0x0.
+ */
+MLXSW_ITEM32(emad, eth_hdr, ver, 0x0C, 4, 4);
+
+/* emad_op_tlv_type
+ * Type of the TLV.
+ * Must be set to 0x1 (operation TLV).
+ */
+MLXSW_ITEM32(emad, op_tlv, type, 0x00, 27, 5);
+
+/* emad_op_tlv_len
+ * Length of the operation TLV in u32.
+ * Must be set to 0x4.
+ */
+MLXSW_ITEM32(emad, op_tlv, len, 0x00, 16, 11);
+
+/* emad_op_tlv_dr
+ * Direct route bit. Setting to 1 indicates the EMAD is a direct route
+ * EMAD. DR TLV must follow.
+ *
+ * Note: Currently not supported and must not be set.
+ */
+MLXSW_ITEM32(emad, op_tlv, dr, 0x00, 15, 1);
+
+/* emad_op_tlv_status
+ * Returned status in case of EMAD response. Must be set to 0 in case
+ * of EMAD request.
+ * 0x0 - success
+ * 0x1 - device is busy. Requester should retry
+ * 0x2 - Mellanox protocol version not supported
+ * 0x3 - unknown TLV
+ * 0x4 - register not supported
+ * 0x5 - operation class not supported
+ * 0x6 - EMAD method not supported
+ * 0x7 - bad parameter (e.g. port out of range)
+ * 0x8 - resource not available
+ * 0x9 - message receipt acknowledgment. Requester should retry
+ * 0x70 - internal error
+ */
+MLXSW_ITEM32(emad, op_tlv, status, 0x00, 8, 7);
+
+/* emad_op_tlv_register_id
+ * Register ID of register within register TLV.
+ */
+MLXSW_ITEM32(emad, op_tlv, register_id, 0x04, 16, 16);
+
+/* emad_op_tlv_r
+ * Response bit. Setting to 1 indicates Response, otherwise request.
+ */
+MLXSW_ITEM32(emad, op_tlv, r, 0x04, 15, 1);
+
+/* emad_op_tlv_method
+ * EMAD method type.
+ * 0x1 - query
+ * 0x2 - write
+ * 0x3 - send (currently not supported)
+ * 0x4 - event
+ */
+MLXSW_ITEM32(emad, op_tlv, method, 0x04, 8, 7);
+
+/* emad_op_tlv_class
+ * EMAD operation class. Must be set to 0x1 (REG_ACCESS).
+ */
+MLXSW_ITEM32(emad, op_tlv, class, 0x04, 0, 8);
+
+/* emad_op_tlv_tid
+ * EMAD transaction ID. Used for pairing request and response EMADs.
+ */
+MLXSW_ITEM64(emad, op_tlv, tid, 0x08, 0, 64);
+
+/* emad_reg_tlv_type
+ * Type of the TLV.
+ * Must be set to 0x3 (register TLV).
+ */
+MLXSW_ITEM32(emad, reg_tlv, type, 0x00, 27, 5);
+
+/* emad_reg_tlv_len
+ * Length of the operation TLV in u32.
+ */
+MLXSW_ITEM32(emad, reg_tlv, len, 0x00, 16, 11);
+
+/* emad_end_tlv_type
+ * Type of the TLV.
+ * Must be set to 0x0 (end TLV).
+ */
+MLXSW_ITEM32(emad, end_tlv, type, 0x00, 27, 5);
+
+/* emad_end_tlv_len
+ * Length of the end TLV in u32.
+ * Must be set to 1.
+ */
+MLXSW_ITEM32(emad, end_tlv, len, 0x00, 16, 11);
+
+enum mlxsw_core_reg_access_type {
+	MLXSW_CORE_REG_ACCESS_TYPE_QUERY,
+	MLXSW_CORE_REG_ACCESS_TYPE_WRITE,
+};
+
+static inline const char *
+mlxsw_core_reg_access_type_str(enum mlxsw_core_reg_access_type type)
+{
+	switch (type) {
+	case MLXSW_CORE_REG_ACCESS_TYPE_QUERY:
+		return "query";
+	case MLXSW_CORE_REG_ACCESS_TYPE_WRITE:
+		return "write";
+	}
+	BUG();
+}
+
+static void mlxsw_emad_pack_end_tlv(char *end_tlv)
+{
+	mlxsw_emad_end_tlv_type_set(end_tlv, MLXSW_EMAD_TLV_TYPE_END);
+	mlxsw_emad_end_tlv_len_set(end_tlv, MLXSW_EMAD_END_TLV_LEN);
+}
+
+static void mlxsw_emad_pack_reg_tlv(char *reg_tlv,
+				    const struct mlxsw_reg_info *reg,
+				    char *payload)
+{
+	mlxsw_emad_reg_tlv_type_set(reg_tlv, MLXSW_EMAD_TLV_TYPE_REG);
+	mlxsw_emad_reg_tlv_len_set(reg_tlv, reg->len / sizeof(u32) + 1);
+	memcpy(reg_tlv + sizeof(u32), payload, reg->len);
+}
+
+static void mlxsw_emad_pack_op_tlv(char *op_tlv,
+				   const struct mlxsw_reg_info *reg,
+				   enum mlxsw_core_reg_access_type type,
+				   struct mlxsw_core *mlxsw_core)
+{
+	mlxsw_emad_op_tlv_type_set(op_tlv, MLXSW_EMAD_TLV_TYPE_OP);
+	mlxsw_emad_op_tlv_len_set(op_tlv, MLXSW_EMAD_OP_TLV_LEN);
+	mlxsw_emad_op_tlv_dr_set(op_tlv, 0);
+	mlxsw_emad_op_tlv_status_set(op_tlv, 0);
+	mlxsw_emad_op_tlv_register_id_set(op_tlv, reg->id);
+	mlxsw_emad_op_tlv_r_set(op_tlv, MLXSW_EMAD_OP_TLV_REQUEST);
+	if (MLXSW_CORE_REG_ACCESS_TYPE_QUERY == type)
+		mlxsw_emad_op_tlv_method_set(op_tlv,
+					     MLXSW_EMAD_OP_TLV_METHOD_QUERY);
+	else
+		mlxsw_emad_op_tlv_method_set(op_tlv,
+					     MLXSW_EMAD_OP_TLV_METHOD_WRITE);
+	mlxsw_emad_op_tlv_class_set(op_tlv,
+				    MLXSW_EMAD_OP_TLV_CLASS_REG_ACCESS);
+	mlxsw_emad_op_tlv_tid_set(op_tlv, mlxsw_core->emad.tid);
+}
+
+static int mlxsw_emad_construct_eth_hdr(struct sk_buff *skb)
+{
+	char *eth_hdr = skb_push(skb, MLXSW_EMAD_ETH_HDR_LEN);
+
+	mlxsw_emad_eth_hdr_dmac_memcpy_to(eth_hdr, MLXSW_EMAD_EH_DMAC);
+	mlxsw_emad_eth_hdr_smac_memcpy_to(eth_hdr, MLXSW_EMAD_EH_SMAC);
+	mlxsw_emad_eth_hdr_ethertype_set(eth_hdr, MLXSW_EMAD_EH_ETHERTYPE);
+	mlxsw_emad_eth_hdr_mlx_proto_set(eth_hdr, MLXSW_EMAD_EH_MLX_PROTO);
+	mlxsw_emad_eth_hdr_ver_set(eth_hdr, MLXSW_EMAD_EH_PROTO_VERSION);
+
+	skb_reset_mac_header(skb);
+
+	return 0;
+}
+
+static void mlxsw_emad_construct(struct sk_buff *skb,
+				 const struct mlxsw_reg_info *reg,
+				 char *payload,
+				 enum mlxsw_core_reg_access_type type,
+				 struct mlxsw_core *mlxsw_core)
+{
+	char *buf;
+
+	buf = skb_push(skb, MLXSW_EMAD_END_TLV_LEN * sizeof(u32));
+	mlxsw_emad_pack_end_tlv(buf);
+
+	buf = skb_push(skb, reg->len + sizeof(u32));
+	mlxsw_emad_pack_reg_tlv(buf, reg, payload);
+
+	buf = skb_push(skb, MLXSW_EMAD_OP_TLV_LEN * sizeof(u32));
+	mlxsw_emad_pack_op_tlv(buf, reg, type, mlxsw_core);
+
+	mlxsw_emad_construct_eth_hdr(skb);
+}
+
+static char *mlxsw_emad_op_tlv(const struct sk_buff *skb)
+{
+	return ((char *) (skb->data + MLXSW_EMAD_ETH_HDR_LEN));
+}
+
+static char *mlxsw_emad_reg_tlv(const struct sk_buff *skb)
+{
+	return ((char *) (skb->data + MLXSW_EMAD_ETH_HDR_LEN +
+				      MLXSW_EMAD_OP_TLV_LEN * sizeof(u32)));
+}
+
+static char *mlxsw_emad_reg_payload(const char *op_tlv)
+{
+	return ((char *) (op_tlv + (MLXSW_EMAD_OP_TLV_LEN + 1) * sizeof(u32)));
+}
+
+static u64 mlxsw_emad_get_tid(const struct sk_buff *skb)
+{
+	char *op_tlv;
+
+	op_tlv = mlxsw_emad_op_tlv(skb);
+	return mlxsw_emad_op_tlv_tid_get(op_tlv);
+}
+
+static bool mlxsw_emad_is_resp(const struct sk_buff *skb)
+{
+	char *op_tlv;
+
+	op_tlv = mlxsw_emad_op_tlv(skb);
+	return (MLXSW_EMAD_OP_TLV_RESPONSE == mlxsw_emad_op_tlv_r_get(op_tlv));
+}
+
+#define MLXSW_EMAD_TIMEOUT_MS 200
+
+static int __mlxsw_emad_transmit(struct mlxsw_core *mlxsw_core,
+				 struct sk_buff *skb,
+				 const struct mlxsw_tx_info *tx_info)
+{
+	int err;
+	int ret;
+
+	err = mlxsw_core_skb_transmit(mlxsw_core->driver_priv, skb, tx_info);
+	if (err) {
+		dev_warn(mlxsw_core->bus_info->dev, "Failed to transmit EMAD (tid=%llx)\n",
+			 mlxsw_core->emad.tid);
+		dev_kfree_skb(skb);
+		return err;
+	}
+
+	mlxsw_core->emad.trans_active = true;
+	ret = wait_event_timeout(mlxsw_core->emad.wait,
+				 !(mlxsw_core->emad.trans_active),
+				 msecs_to_jiffies(MLXSW_EMAD_TIMEOUT_MS));
+	if (!ret) {
+		dev_warn(mlxsw_core->bus_info->dev, "EMAD timed-out (tid=%llx)\n",
+			 mlxsw_core->emad.tid);
+		mlxsw_core->emad.trans_active = false;
+		return -EIO;
+	}
+
+	return 0;
+}
+
+static int mlxsw_emad_process_status(struct mlxsw_core *mlxsw_core,
+				     char *op_tlv)
+{
+	enum mlxsw_emad_op_tlv_status status;
+	u64 tid;
+
+	status = mlxsw_emad_op_tlv_status_get(op_tlv);
+	tid = mlxsw_emad_op_tlv_tid_get(op_tlv);
+
+	switch (status) {
+	case MLXSW_EMAD_OP_TLV_STATUS_SUCCESS:
+		return 0;
+	case MLXSW_EMAD_OP_TLV_STATUS_BUSY:
+	case MLXSW_EMAD_OP_TLV_STATUS_MESSAGE_RECEIPT_ACK:
+		dev_warn(mlxsw_core->bus_info->dev, "Reg access status again (tid=%llx,status=%x(%s))\n",
+			 tid, status, mlxsw_emad_op_tlv_status_str(status));
+		return -EAGAIN;
+	case MLXSW_EMAD_OP_TLV_STATUS_VERSION_NOT_SUPPORTED:
+	case MLXSW_EMAD_OP_TLV_STATUS_UNKNOWN_TLV:
+	case MLXSW_EMAD_OP_TLV_STATUS_REGISTER_NOT_SUPPORTED:
+	case MLXSW_EMAD_OP_TLV_STATUS_CLASS_NOT_SUPPORTED:
+	case MLXSW_EMAD_OP_TLV_STATUS_METHOD_NOT_SUPPORTED:
+	case MLXSW_EMAD_OP_TLV_STATUS_BAD_PARAMETER:
+	case MLXSW_EMAD_OP_TLV_STATUS_RESOURCE_NOT_AVAILABLE:
+	case MLXSW_EMAD_OP_TLV_STATUS_INTERNAL_ERROR:
+	default:
+		dev_err(mlxsw_core->bus_info->dev, "Reg access status failed (tid=%llx,status=%x(%s))\n",
+			tid, status, mlxsw_emad_op_tlv_status_str(status));
+		return -EIO;
+	}
+}
+
+static int mlxsw_emad_process_status_skb(struct mlxsw_core *mlxsw_core,
+					 struct sk_buff *skb)
+{
+	return mlxsw_emad_process_status(mlxsw_core, mlxsw_emad_op_tlv(skb));
+}
+
+static int mlxsw_emad_transmit(struct mlxsw_core *mlxsw_core,
+			       struct sk_buff *skb,
+			       const struct mlxsw_tx_info *tx_info)
+{
+	struct sk_buff *trans_skb;
+	int n_retry;
+	int err;
+
+	n_retry = 0;
+retry:
+	/* We copy the EMAD to a new skb, since we might need
+	 * to retransmit it in case of failure.
+	 */
+	trans_skb = skb_copy(skb, GFP_KERNEL);
+	if (!trans_skb) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = __mlxsw_emad_transmit(mlxsw_core, trans_skb, tx_info);
+	if (!err) {
+		struct sk_buff *resp_skb = mlxsw_core->emad.resp_skb;
+
+		err = mlxsw_emad_process_status_skb(mlxsw_core, resp_skb);
+		if (err)
+			dev_kfree_skb(resp_skb);
+		if (!err || err != -EAGAIN)
+			goto out;
+	}
+	if (n_retry++ < MLXSW_EMAD_MAX_RETRY)
+		goto retry;
+
+out:
+	dev_kfree_skb(skb);
+	mlxsw_core->emad.tid++;
+	return err;
+}
+
+static void mlxsw_emad_rx_listener_func(struct sk_buff *skb, u8 local_port,
+					u16 trap_id, void *priv)
+{
+	struct mlxsw_core *mlxsw_core = priv;
+
+	if (mlxsw_emad_is_resp(skb) &&
+	    mlxsw_core->emad.trans_active &&
+	    mlxsw_emad_get_tid(skb) == mlxsw_core->emad.tid) {
+		mlxsw_core->emad.resp_skb = skb;
+		mlxsw_core->emad.trans_active = false;
+		wake_up(&mlxsw_core->emad.wait);
+	} else {
+		dev_kfree_skb(skb);
+	}
+}
+
+static const struct mlxsw_rx_listener mlxsw_emad_rx_listener = {
+	.func = mlxsw_emad_rx_listener_func,
+	.local_port = MLXSW_PORT_MAX_PORTS,
+	.trap_id = MLXSW_TRAP_ID_ETHEMAD,
+};
+
+static int mlxsw_emad_traps_set(struct mlxsw_core *mlxsw_core)
+{
+	char htgt_pl[MLXSW_REG_HTGT_LEN];
+	char hpkt_pl[MLXSW_REG_HPKT_LEN];
+	int err;
+
+	mlxsw_reg_htgt_pack(htgt_pl, MLXSW_REG_HTGT_TRAP_GROUP_EMAD);
+	err = mlxsw_reg_write(mlxsw_core, MLXSW_REG(htgt), htgt_pl);
+	if (err)
+		return err;
+
+	mlxsw_reg_hpkt_pack(hpkt_pl, MLXSW_REG_HPKT_ACTION_TRAP_TO_CPU,
+			    MLXSW_REG_HTGT_TRAP_GROUP_EMAD,
+			    MLXSW_TRAP_ID_ETHEMAD);
+	return mlxsw_reg_write(mlxsw_core, MLXSW_REG(hpkt), hpkt_pl);
+}
+
+static int mlxsw_emad_init(struct mlxsw_core *mlxsw_core)
+{
+	int err;
+
+	/* Set the upper 32 bits of the transaction ID field to a random
+	 * number. This allows us to discard EMADs addressed to other
+	 * devices.
+	 */
+	get_random_bytes(&mlxsw_core->emad.tid, 4);
+	mlxsw_core->emad.tid = mlxsw_core->emad.tid << 32;
+
+	init_waitqueue_head(&mlxsw_core->emad.wait);
+	mlxsw_core->emad.trans_active = false;
+	mutex_init(&mlxsw_core->emad.lock);
+
+	err = mlxsw_core_rx_listener_register(mlxsw_core,
+					      &mlxsw_emad_rx_listener,
+					      mlxsw_core);
+	if (err)
+		return err;
+
+	err = mlxsw_emad_traps_set(mlxsw_core);
+	if (err)
+		goto err_emad_trap_set;
+
+	mlxsw_core->emad.use_emad = true;
+
+	return 0;
+
+err_emad_trap_set:
+	mlxsw_core_rx_listener_unregister(mlxsw_core,
+					  &mlxsw_emad_rx_listener,
+					  mlxsw_core);
+	return err;
+}
+
+static void mlxsw_emad_fini(struct mlxsw_core *mlxsw_core)
+{
+	char hpkt_pl[MLXSW_REG_HPKT_LEN];
+
+	mlxsw_reg_hpkt_pack(hpkt_pl, MLXSW_REG_HPKT_ACTION_DISCARD,
+			    MLXSW_REG_HTGT_TRAP_GROUP_EMAD,
+			    MLXSW_TRAP_ID_ETHEMAD);
+	mlxsw_reg_write(mlxsw_core, MLXSW_REG(hpkt), hpkt_pl);
+
+	mlxsw_core_rx_listener_unregister(mlxsw_core,
+					  &mlxsw_emad_rx_listener,
+					  mlxsw_core);
+}
+
+static struct sk_buff *mlxsw_emad_alloc(const struct mlxsw_core *mlxsw_core,
+					u16 reg_len)
+{
+	struct sk_buff *skb;
+	u16 emad_len;
+
+	emad_len = (reg_len + sizeof(u32) + MLXSW_EMAD_ETH_HDR_LEN +
+		    (MLXSW_EMAD_OP_TLV_LEN + MLXSW_EMAD_END_TLV_LEN) *
+		    sizeof(u32) + mlxsw_core->driver->txhdr_len);
+	if (emad_len > MLXSW_EMAD_MAX_FRAME_LEN)
+		return NULL;
+
+	skb = netdev_alloc_skb(NULL, emad_len);
+	if (!skb)
+		return NULL;
+	memset(skb->data, 0, emad_len);
+	skb_reserve(skb, emad_len);
+
+	return skb;
+}
+
 /*****************
  * Core functions
  *****************/
@@ -307,6 +798,7 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 	}
 
 	INIT_LIST_HEAD(&mlxsw_core->rx_listener_list);
+	INIT_LIST_HEAD(&mlxsw_core->event_listener_list);
 	mlxsw_core->driver = mlxsw_driver;
 	mlxsw_core->bus = mlxsw_bus;
 	mlxsw_core->bus_priv = bus_priv;
@@ -318,10 +810,15 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 		err = -ENOMEM;
 		goto err_alloc_stats;
 	}
+
 	err = mlxsw_bus->init(bus_priv, mlxsw_core, mlxsw_driver->profile);
 	if (err)
 		goto err_bus_init;
 
+	err = mlxsw_emad_init(mlxsw_core);
+	if (err)
+		goto err_emad_init;
+
 	err = mlxsw_driver->init(mlxsw_core->driver_priv, mlxsw_core,
 				 mlxsw_bus_info);
 	if (err)
@@ -336,6 +833,8 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 err_debugfs_init:
 	mlxsw_core->driver->fini(mlxsw_core->driver_priv);
 err_driver_init:
+	mlxsw_emad_fini(mlxsw_core);
+err_emad_init:
 	mlxsw_bus->fini(bus_priv);
 err_bus_init:
 	free_percpu(mlxsw_core->pcpu_stats);
@@ -353,6 +852,7 @@ void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core)
 
 	mlxsw_core_debugfs_fini(mlxsw_core);
 	mlxsw_core->driver->fini(mlxsw_core->driver_priv);
+	mlxsw_emad_fini(mlxsw_core);
 	mlxsw_core->bus->fini(mlxsw_core->bus_priv);
 	free_percpu(mlxsw_core->pcpu_stats);
 	kfree(mlxsw_core);
@@ -433,6 +933,242 @@ void mlxsw_core_rx_listener_unregister(struct mlxsw_core *mlxsw_core,
 }
 EXPORT_SYMBOL(mlxsw_core_rx_listener_unregister);
 
+static void mlxsw_core_event_listener_func(struct sk_buff *skb, u8 local_port,
+					   u16 trap_id, void *priv)
+{
+	struct mlxsw_event_listener_item *event_listener_item = priv;
+	struct mlxsw_reg_info reg;
+	char *payload;
+	char *op_tlv = mlxsw_emad_op_tlv(skb);
+	char *reg_tlv = mlxsw_emad_reg_tlv(skb);
+
+	reg.id = mlxsw_emad_op_tlv_register_id_get(op_tlv);
+	reg.len = (mlxsw_emad_reg_tlv_len_get(reg_tlv) - 1) * sizeof(u32);
+	payload = mlxsw_emad_reg_payload(op_tlv);
+	event_listener_item->el.func(&reg, payload, event_listener_item->priv);
+	dev_kfree_skb(skb);
+}
+
+static bool __is_event_listener_equal(const struct mlxsw_event_listener *el_a,
+				      const struct mlxsw_event_listener *el_b)
+{
+	return (el_a->func == el_b->func &&
+		el_a->trap_id == el_b->trap_id);
+}
+
+static struct mlxsw_event_listener_item *
+__find_event_listener_item(struct mlxsw_core *mlxsw_core,
+			   const struct mlxsw_event_listener *el,
+			   void *priv)
+{
+	struct mlxsw_event_listener_item *el_item;
+
+	list_for_each_entry(el_item, &mlxsw_core->event_listener_list, list) {
+		if (__is_event_listener_equal(&el_item->el, el) &&
+		    el_item->priv == priv)
+			return el_item;
+	}
+	return NULL;
+}
+
+int mlxsw_core_event_listener_register(struct mlxsw_core *mlxsw_core,
+				       const struct mlxsw_event_listener *el,
+				       void *priv)
+{
+	int err;
+	struct mlxsw_event_listener_item *el_item;
+	const struct mlxsw_rx_listener rxl = {
+		.func = mlxsw_core_event_listener_func,
+		.local_port = MLXSW_PORT_MAX_PORTS,
+		.trap_id = el->trap_id,
+	};
+
+	el_item = __find_event_listener_item(mlxsw_core, el, priv);
+	if (el_item)
+		return -EEXIST;
+	el_item = kmalloc(sizeof(*el_item), GFP_KERNEL);
+	if (!el_item)
+		return -ENOMEM;
+	el_item->el = *el;
+	el_item->priv = priv;
+
+	err = mlxsw_core_rx_listener_register(mlxsw_core, &rxl, el_item);
+	if (err)
+		goto err_rx_listener_register;
+
+	/* No reason to save item if we did not manage to register an RX
+	 * listener for it.
+	 */
+	list_add_rcu(&el_item->list, &mlxsw_core->event_listener_list);
+
+	return 0;
+
+err_rx_listener_register:
+	kfree(el_item);
+	return err;
+}
+EXPORT_SYMBOL(mlxsw_core_event_listener_register);
+
+void mlxsw_core_event_listener_unregister(struct mlxsw_core *mlxsw_core,
+					  const struct mlxsw_event_listener *el,
+					  void *priv)
+{
+	struct mlxsw_event_listener_item *el_item;
+	const struct mlxsw_rx_listener rxl = {
+		.func = mlxsw_core_event_listener_func,
+		.local_port = MLXSW_PORT_MAX_PORTS,
+		.trap_id = el->trap_id,
+	};
+
+	el_item = __find_event_listener_item(mlxsw_core, el, priv);
+	if (!el_item)
+		return;
+	mlxsw_core_rx_listener_unregister(mlxsw_core, &rxl, el_item);
+	list_del(&el_item->list);
+	kfree(el_item);
+}
+EXPORT_SYMBOL(mlxsw_core_event_listener_unregister);
+
+static int mlxsw_core_reg_access_emad(struct mlxsw_core *mlxsw_core,
+				      const struct mlxsw_reg_info *reg,
+				      char *payload,
+				      enum mlxsw_core_reg_access_type type)
+{
+	int err;
+	char *op_tlv;
+	struct sk_buff *skb;
+	struct mlxsw_tx_info tx_info = {
+		.local_port = MLXSW_PORT_CPU_PORT,
+		.is_emad = true,
+	};
+
+	skb = mlxsw_emad_alloc(mlxsw_core, reg->len);
+	if (!skb)
+		return -ENOMEM;
+
+	mlxsw_emad_construct(skb, reg, payload, type, mlxsw_core);
+	mlxsw_core->driver->txhdr_construct(skb, &tx_info);
+
+	dev_dbg(mlxsw_core->bus_info->dev, "EMAD send (tid=%llx)\n",
+		mlxsw_core->emad.tid);
+	mlxsw_core_buf_dump_dbg(mlxsw_core, skb->data, skb->len);
+
+	err = mlxsw_emad_transmit(mlxsw_core, skb, &tx_info);
+	if (!err) {
+		op_tlv = mlxsw_emad_op_tlv(mlxsw_core->emad.resp_skb);
+		memcpy(payload, mlxsw_emad_reg_payload(op_tlv),
+		       reg->len);
+
+		dev_dbg(mlxsw_core->bus_info->dev, "EMAD recv (tid=%llx)\n",
+			mlxsw_core->emad.tid - 1);
+		mlxsw_core_buf_dump_dbg(mlxsw_core,
+					mlxsw_core->emad.resp_skb->data,
+					skb->len);
+
+		dev_kfree_skb(mlxsw_core->emad.resp_skb);
+	}
+
+	return err;
+}
+
+static int mlxsw_core_reg_access_cmd(struct mlxsw_core *mlxsw_core,
+				     const struct mlxsw_reg_info *reg,
+				     char *payload,
+				     enum mlxsw_core_reg_access_type type)
+{
+	int err, n_retry;
+	char *in_mbox, *out_mbox, *tmp;
+
+	in_mbox = mlxsw_cmd_mbox_alloc();
+	if (!in_mbox)
+		return -ENOMEM;
+
+	out_mbox = mlxsw_cmd_mbox_alloc();
+	if (!out_mbox) {
+		err = -ENOMEM;
+		goto free_in_mbox;
+	}
+
+	mlxsw_emad_pack_op_tlv(in_mbox, reg, type, mlxsw_core);
+	tmp = in_mbox + MLXSW_EMAD_OP_TLV_LEN * sizeof(u32);
+	mlxsw_emad_pack_reg_tlv(tmp, reg, payload);
+
+	n_retry = 0;
+retry:
+	err = mlxsw_cmd_access_reg(mlxsw_core, in_mbox, out_mbox);
+	if (!err) {
+		err = mlxsw_emad_process_status(mlxsw_core, out_mbox);
+		if (err == -EAGAIN && n_retry++ < MLXSW_EMAD_MAX_RETRY)
+			goto retry;
+	}
+
+	if (!err)
+		memcpy(payload, mlxsw_emad_reg_payload(out_mbox),
+		       reg->len);
+
+	mlxsw_core->emad.tid++;
+	mlxsw_cmd_mbox_free(out_mbox);
+free_in_mbox:
+	mlxsw_cmd_mbox_free(in_mbox);
+	return err;
+}
+
+static int mlxsw_core_reg_access(struct mlxsw_core *mlxsw_core,
+				 const struct mlxsw_reg_info *reg,
+				 char *payload,
+				 enum mlxsw_core_reg_access_type type)
+{
+	u64 cur_tid;
+	int err;
+
+	if (mutex_lock_interruptible(&mlxsw_core->emad.lock)) {
+		dev_err(mlxsw_core->bus_info->dev, "Reg access interrupted (reg_id=%x(%s),type=%s)\n",
+			reg->id, mlxsw_reg_id_str(reg->id),
+			mlxsw_core_reg_access_type_str(type));
+		return -EINTR;
+	}
+
+	cur_tid = mlxsw_core->emad.tid;
+	dev_dbg(mlxsw_core->bus_info->dev, "Reg access (tid=%llx,reg_id=%x(%s),type=%s)\n",
+		cur_tid, reg->id, mlxsw_reg_id_str(reg->id),
+		mlxsw_core_reg_access_type_str(type));
+
+	/* During initialization EMAD interface is not available to us,
+	 * so we default to command interface. We switch to EMAD interface
+	 * after setting the appropriate traps.
+	 */
+	if (!mlxsw_core->emad.use_emad)
+		err = mlxsw_core_reg_access_cmd(mlxsw_core, reg,
+						payload, type);
+	else
+		err = mlxsw_core_reg_access_emad(mlxsw_core, reg,
+						 payload, type);
+
+	if (err)
+		dev_err(mlxsw_core->bus_info->dev, "Reg access failed (tid=%llx,reg_id=%x(%s),type=%s)\n",
+			cur_tid, reg->id, mlxsw_reg_id_str(reg->id),
+			mlxsw_core_reg_access_type_str(type));
+
+	mutex_unlock(&mlxsw_core->emad.lock);
+	return err;
+}
+
+int mlxsw_reg_query(struct mlxsw_core *mlxsw_core,
+		    const struct mlxsw_reg_info *reg, char *payload)
+{
+	return mlxsw_core_reg_access(mlxsw_core, reg, payload,
+				     MLXSW_CORE_REG_ACCESS_TYPE_QUERY);
+}
+EXPORT_SYMBOL(mlxsw_reg_query);
+
+int mlxsw_reg_write(struct mlxsw_core *mlxsw_core,
+		    const struct mlxsw_reg_info *reg, char *payload)
+{
+	return mlxsw_core_reg_access(mlxsw_core, reg, payload,
+				     MLXSW_CORE_REG_ACCESS_TYPE_WRITE);
+}
+EXPORT_SYMBOL(mlxsw_reg_write);
+
 void mlxsw_core_skb_receive(struct mlxsw_core *mlxsw_core, struct sk_buff *skb,
 			    struct mlxsw_rx_info *rx_info)
 {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index 8033408..b6c11d2 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -44,6 +44,9 @@
 #include <linux/types.h>
 #include <linux/skbuff.h>
 
+#include "trap.h"
+#include "reg.h"
+
 #include "cmd.h"
 
 #define MLXSW_MODULE_ALIAS_PREFIX "mlxsw-driver-"
@@ -78,6 +81,12 @@ struct mlxsw_rx_listener {
 	u16 trap_id;
 };
 
+struct mlxsw_event_listener {
+	void (*func)(const struct mlxsw_reg_info *reg,
+		     char *payload, void *priv);
+	enum mlxsw_event_trap_id trap_id;
+};
+
 int mlxsw_core_rx_listener_register(struct mlxsw_core *mlxsw_core,
 				    const struct mlxsw_rx_listener *rxl,
 				    void *priv);
@@ -85,6 +94,18 @@ void mlxsw_core_rx_listener_unregister(struct mlxsw_core *mlxsw_core,
 				       const struct mlxsw_rx_listener *rxl,
 				       void *priv);
 
+int mlxsw_core_event_listener_register(struct mlxsw_core *mlxsw_core,
+				       const struct mlxsw_event_listener *el,
+				       void *priv);
+void mlxsw_core_event_listener_unregister(struct mlxsw_core *mlxsw_core,
+					  const struct mlxsw_event_listener *el,
+					  void *priv);
+
+int mlxsw_reg_query(struct mlxsw_core *mlxsw_core,
+		    const struct mlxsw_reg_info *reg, char *payload);
+int mlxsw_reg_write(struct mlxsw_core *mlxsw_core,
+		    const struct mlxsw_reg_info *reg, char *payload);
+
 struct mlxsw_rx_info {
 	u16 sys_port;
 	int trap_id;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/emad.h b/drivers/net/ethernet/mellanox/mlxsw/emad.h
new file mode 100644
index 0000000..97b6bb5
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/emad.h
@@ -0,0 +1,127 @@
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/emad.h
+ * Copyright (c) 2015 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2015 Ido Schimmel <idosch@mellanox.com>
+ * Copyright (c) 2015 Jiri Pirko <jiri@mellanox.com>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _MLXSW_EMAD_H
+#define _MLXSW_EMAD_H
+
+#define MLXSW_EMAD_MAX_FRAME_LEN 1518	/* Length in u8 */
+#define MLXSW_EMAD_MAX_RETRY 5
+
+/* EMAD Ethernet header */
+#define MLXSW_EMAD_ETH_HDR_LEN 0x10	/* Length in u8 */
+#define MLXSW_EMAD_EH_DMAC "\x01\x02\xc9\x00\x00\x01"
+#define MLXSW_EMAD_EH_SMAC "\x00\x02\xc9\x01\x02\x03"
+#define MLXSW_EMAD_EH_ETHERTYPE 0x8932
+#define MLXSW_EMAD_EH_MLX_PROTO 0
+#define MLXSW_EMAD_EH_PROTO_VERSION 0
+
+/* EMAD TLV Types */
+enum {
+	MLXSW_EMAD_TLV_TYPE_END,
+	MLXSW_EMAD_TLV_TYPE_OP,
+	MLXSW_EMAD_TLV_TYPE_DR,
+	MLXSW_EMAD_TLV_TYPE_REG,
+	MLXSW_EMAD_TLV_TYPE_USERDATA,
+	MLXSW_EMAD_TLV_TYPE_OOBETH,
+};
+
+/* OP TLV */
+#define MLXSW_EMAD_OP_TLV_LEN 4		/* Length in u32 */
+
+enum {
+	MLXSW_EMAD_OP_TLV_CLASS_REG_ACCESS = 1,
+	MLXSW_EMAD_OP_TLV_CLASS_IPC = 2,
+};
+
+enum mlxsw_emad_op_tlv_status {
+	MLXSW_EMAD_OP_TLV_STATUS_SUCCESS,
+	MLXSW_EMAD_OP_TLV_STATUS_BUSY,
+	MLXSW_EMAD_OP_TLV_STATUS_VERSION_NOT_SUPPORTED,
+	MLXSW_EMAD_OP_TLV_STATUS_UNKNOWN_TLV,
+	MLXSW_EMAD_OP_TLV_STATUS_REGISTER_NOT_SUPPORTED,
+	MLXSW_EMAD_OP_TLV_STATUS_CLASS_NOT_SUPPORTED,
+	MLXSW_EMAD_OP_TLV_STATUS_METHOD_NOT_SUPPORTED,
+	MLXSW_EMAD_OP_TLV_STATUS_BAD_PARAMETER,
+	MLXSW_EMAD_OP_TLV_STATUS_RESOURCE_NOT_AVAILABLE,
+	MLXSW_EMAD_OP_TLV_STATUS_MESSAGE_RECEIPT_ACK,
+	MLXSW_EMAD_OP_TLV_STATUS_INTERNAL_ERROR = 0x70,
+};
+
+static inline char *mlxsw_emad_op_tlv_status_str(u8 status)
+{
+	switch (status) {
+	case MLXSW_EMAD_OP_TLV_STATUS_SUCCESS:
+		return "operation performed";
+	case MLXSW_EMAD_OP_TLV_STATUS_BUSY:
+		return "device is busy";
+	case MLXSW_EMAD_OP_TLV_STATUS_VERSION_NOT_SUPPORTED:
+		return "version not supported";
+	case MLXSW_EMAD_OP_TLV_STATUS_UNKNOWN_TLV:
+		return "unknown TLV";
+	case MLXSW_EMAD_OP_TLV_STATUS_REGISTER_NOT_SUPPORTED:
+		return "register not supported";
+	case MLXSW_EMAD_OP_TLV_STATUS_CLASS_NOT_SUPPORTED:
+		return "class not supported";
+	case MLXSW_EMAD_OP_TLV_STATUS_METHOD_NOT_SUPPORTED:
+		return "method not supported";
+	case MLXSW_EMAD_OP_TLV_STATUS_BAD_PARAMETER:
+		return "bad parameter";
+	case MLXSW_EMAD_OP_TLV_STATUS_RESOURCE_NOT_AVAILABLE:
+		return "resource not available";
+	case MLXSW_EMAD_OP_TLV_STATUS_MESSAGE_RECEIPT_ACK:
+		return "acknowledged. retransmit";
+	case MLXSW_EMAD_OP_TLV_STATUS_INTERNAL_ERROR:
+		return "internal error";
+	default:
+		return "*UNKNOWN*";
+	}
+}
+
+enum {
+	MLXSW_EMAD_OP_TLV_REQUEST,
+	MLXSW_EMAD_OP_TLV_RESPONSE
+};
+
+enum {
+	MLXSW_EMAD_OP_TLV_METHOD_QUERY = 1,
+	MLXSW_EMAD_OP_TLV_METHOD_WRITE = 2,
+	MLXSW_EMAD_OP_TLV_METHOD_SEND = 3,
+	MLXSW_EMAD_OP_TLV_METHOD_EVENT = 5,
+};
+
+/* END TLV */
+#define MLXSW_EMAD_END_TLV_LEN 1	/* Length in u32 */
+
+#endif
diff --git a/drivers/net/ethernet/mellanox/mlxsw/port.h b/drivers/net/ethernet/mellanox/mlxsw/port.h
index 7d66c92..e626231 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/port.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/port.h
@@ -42,9 +42,28 @@
 
 #define MLXSW_PORT_DEFAULT_VID		1
 
+#define MLXSW_PORT_SWID_DISABLED_PORT	255
+#define MLXSW_PORT_SWID_ALL_SWIDS	254
+#define MLXSW_PORT_SWID_TYPE_ETH	2
+
+#define MLXSW_PORT_MID			0xd000
+
 #define MLXSW_PORT_MAX_PHY_PORTS	0x40
 #define MLXSW_PORT_MAX_PORTS		MLXSW_PORT_MAX_PHY_PORTS
 
 #define MLXSW_PORT_CPU_PORT		0x0
 
+enum mlxsw_port_admin_status {
+	MLXSW_PORT_ADMIN_STATUS_UP = 1,
+	MLXSW_PORT_ADMIN_STATUS_DOWN = 2,
+	MLXSW_PORT_ADMIN_STATUS_UP_ONCE = 3,
+	MLXSW_PORT_ADMIN_STATUS_DISABLED = 4,
+};
+
+enum mlxsw_reg_pude_oper_status {
+	MLXSW_PORT_OPER_STATUS_UP = 1,
+	MLXSW_PORT_OPER_STATUS_DOWN = 2,
+	MLXSW_PORT_OPER_STATUS_FAILURE = 4,	/* Can be set to up again. */
+};
+
 #endif /* _MLXSW_PORT_H */
diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
new file mode 100644
index 0000000..b5a72f8
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -0,0 +1,1289 @@
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/reg.h
+ * Copyright (c) 2015 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2015 Ido Schimmel <idosch@mellanox.com>
+ * Copyright (c) 2015 Elad Raz <eladr@mellanox.com>
+ * Copyright (c) 2015 Jiri Pirko <jiri@mellanox.com>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _MLXSW_REG_H
+#define _MLXSW_REG_H
+
+#include <linux/string.h>
+#include <linux/bitops.h>
+#include <linux/if_vlan.h>
+
+#include "item.h"
+#include "port.h"
+
+struct mlxsw_reg_info {
+	u16 id;
+	u16 len; /* In u8 */
+};
+
+#define MLXSW_REG(type) (&mlxsw_reg_##type)
+#define MLXSW_REG_LEN(type) MLXSW_REG(type)->len
+#define MLXSW_REG_ZERO(type, payload) memset(payload, 0, MLXSW_REG(type)->len)
+
+/* SGCR - Switch General Configuration Register
+ * --------------------------------------------
+ * This register is used for configuration of the switch capabilities.
+ */
+#define MLXSW_REG_SGCR_ID 0x2000
+#define MLXSW_REG_SGCR_LEN 0x10
+
+static const struct mlxsw_reg_info mlxsw_reg_sgcr = {
+	.id = MLXSW_REG_SGCR_ID,
+	.len = MLXSW_REG_SGCR_LEN,
+};
+
+/* reg_sgcr_llb
+ * Link Local Broadcast (Default=0)
+ * When set, all Link Local packets (224.0.0.X) will be treated as broadcast
+ * packets and ignore the IGMP snooping entries.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, sgcr, llb, 0x04, 0, 1);
+
+static inline void mlxsw_reg_sgcr_pack(char *payload, bool llb)
+{
+	MLXSW_REG_ZERO(sgcr, payload);
+	mlxsw_reg_sgcr_llb_set(payload, !!llb);
+}
+
+/* SPAD - Switch Physical Address Register
+ * ---------------------------------------
+ * The SPAD register configures the switch physical MAC address.
+ */
+#define MLXSW_REG_SPAD_ID 0x2002
+#define MLXSW_REG_SPAD_LEN 0x10
+
+static const struct mlxsw_reg_info mlxsw_reg_spad = {
+	.id = MLXSW_REG_SPAD_ID,
+	.len = MLXSW_REG_SPAD_LEN,
+};
+
+/* reg_spad_base_mac
+ * Base MAC address for the switch partitions.
+ * Per switch partition MAC address is equal to:
+ * base_mac + swid
+ * Access: RW
+ */
+MLXSW_ITEM_BUF(reg, spad, base_mac, 0x02, 6);
+
+/* SMID - Switch Multicast ID
+ * --------------------------
+ * In multi-chip configuration, each device should maintain mapping between
+ * Multicast ID (MID) into a list of local ports. This mapping is used in all
+ * the devices other than the ingress device, and is implemented as part of the
+ * FDB. The MID record maps from a MID, which is a unique identi- fier of the
+ * multicast group within the stacking domain, into a list of local ports into
+ * which the packet is replicated.
+ */
+#define MLXSW_REG_SMID_ID 0x2007
+#define MLXSW_REG_SMID_LEN 0x420
+
+static const struct mlxsw_reg_info mlxsw_reg_smid = {
+	.id = MLXSW_REG_SMID_ID,
+	.len = MLXSW_REG_SMID_LEN,
+};
+
+/* reg_smid_swid
+ * Switch partition ID.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, smid, swid, 0x00, 24, 8);
+
+/* reg_smid_mid
+ * Multicast identifier - global identifier that represents the multicast group
+ * across all devices
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, smid, mid, 0x00, 0, 16);
+
+/* reg_smid_port
+ * Local port memebership (1 bit per port).
+ * Access: RW
+ */
+MLXSW_ITEM_BIT_ARRAY(reg, smid, port, 0x20, 0x20, 1);
+
+/* reg_smid_port_mask
+ * Local port mask (1 bit per port).
+ * Access: W
+ */
+MLXSW_ITEM_BIT_ARRAY(reg, smid, port_mask, 0x220, 0x20, 1);
+
+static inline void mlxsw_reg_smid_pack(char *payload, u16 mid)
+{
+	MLXSW_REG_ZERO(smid, payload);
+	mlxsw_reg_smid_swid_set(payload, 0);
+	mlxsw_reg_smid_mid_set(payload, mid);
+	mlxsw_reg_smid_port_set(payload, MLXSW_PORT_CPU_PORT, 1);
+	mlxsw_reg_smid_port_mask_set(payload, MLXSW_PORT_CPU_PORT, 1);
+}
+
+/* SPMS - Switch Port MSTP/RSTP State Register
+ * -------------------------------------------
+ * Configures the spanning tree state of a physical port.
+ */
+#define MLXSW_REG_SPMS_ID 0x200d
+#define MLXSW_REG_SPMS_LEN 0x404
+
+static const struct mlxsw_reg_info mlxsw_reg_spms = {
+	.id = MLXSW_REG_SPMS_ID,
+	.len = MLXSW_REG_SPMS_LEN,
+};
+
+/* reg_spms_local_port
+ * Local port number.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, spms, local_port, 0x00, 16, 8);
+
+enum mlxsw_reg_spms_state {
+	MLXSW_REG_SPMS_STATE_NO_CHANGE,
+	MLXSW_REG_SPMS_STATE_DISCARDING,
+	MLXSW_REG_SPMS_STATE_LEARNING,
+	MLXSW_REG_SPMS_STATE_FORWARDING,
+};
+
+/* reg_spms_state
+ * Spanning tree state of each VLAN ID (VID) of the local port.
+ * 0 - Do not change spanning tree state (used only when writing).
+ * 1 - Discarding. No learning or forwarding to/from this port (default).
+ * 2 - Learning. Port is learning, but not forwarding.
+ * 3 - Forwarding. Port is learning and forwarding.
+ * Access: RW
+ */
+MLXSW_ITEM_BIT_ARRAY(reg, spms, state, 0x04, 0x400, 2);
+
+static inline void mlxsw_reg_spms_pack(char *payload, u8 local_port, u16 vid,
+				       enum mlxsw_reg_spms_state state)
+{
+	MLXSW_REG_ZERO(spms, payload);
+	mlxsw_reg_spms_local_port_set(payload, local_port);
+	mlxsw_reg_spms_state_set(payload, vid, state);
+}
+
+/* SFGC - Switch Flooding Group Configuration
+ * ------------------------------------------
+ * The following register controls the association of flooding tables and MIDs
+ * to packet types used for flooding.
+ */
+#define MLXSW_REG_SFGC_ID  0x2011
+#define MLXSW_REG_SFGC_LEN 0x10
+
+static const struct mlxsw_reg_info mlxsw_reg_sfgc = {
+	.id = MLXSW_REG_SFGC_ID,
+	.len = MLXSW_REG_SFGC_LEN,
+};
+
+enum mlxsw_reg_sfgc_type {
+	MLXSW_REG_SFGC_TYPE_BROADCAST = 0,
+	MLXSW_REG_SFGC_TYPE_UNKNOWN_UNICAST = 1,
+	MLXSW_REG_SFGC_TYPE_UNREGISTERED_MULTICAST_IPV4 = 2,
+	MLXSW_REG_SFGC_TYPE_UNREGISTERED_MULTICAST_IPV6 = 3,
+	MLXSW_REG_SFGC_TYPE_UNREGISTERED_MULTICAST_NON_IP = 5,
+	MLXSW_REG_SFGC_TYPE_IPV4_LINK_LOCAL = 6,
+	MLXSW_REG_SFGC_TYPE_IPV6_ALL_HOST = 7,
+};
+
+/* reg_sfgc_type
+ * The traffic type to reach the flooding table.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, sfgc, type, 0x00, 0, 4);
+
+enum mlxsw_reg_sfgc_bridge_type {
+	MLXSW_REG_SFGC_BRIDGE_TYPE_1Q_FID = 0,
+	MLXSW_REG_SFGC_BRIDGE_TYPE_VFID = 1,
+};
+
+/* reg_sfgc_bridge_type
+ * Access: Index
+ *
+ * Note: SwitchX-2 only supports 802.1Q mode.
+ */
+MLXSW_ITEM32(reg, sfgc, bridge_type, 0x04, 24, 3);
+
+enum mlxsw_flood_table_type {
+	MLXSW_REG_SFGC_TABLE_TYPE_VID = 1,
+	MLXSW_REG_SFGC_TABLE_TYPE_SINGLE = 2,
+	MLXSW_REG_SFGC_TABLE_TYPE_ANY = 0,
+	MLXSW_REG_SFGC_TABLE_TYPE_FID_OFFEST = 3,
+	MLXSW_REG_SFGC_TABLE_TYPE_FID = 4,
+};
+
+/* reg_sfgc_table_type
+ * See mlxsw_flood_table_type
+ * Access: RW
+ *
+ * Note: FID offset and FID types are not supported in SwitchX-2.
+ */
+MLXSW_ITEM32(reg, sfgc, table_type, 0x04, 16, 3);
+
+/* reg_sfgc_flood_table
+ * Flooding table index to associate with the specific type on the specific
+ * switch partition.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, sfgc, flood_table, 0x04, 0, 6);
+
+/* reg_sfgc_mid
+ * The multicast ID for the swid. Not supported for Spectrum
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, sfgc, mid, 0x08, 0, 16);
+
+/* reg_sfgc_counter_set_type
+ * Counter Set Type for flow counters.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, sfgc, counter_set_type, 0x0C, 24, 8);
+
+/* reg_sfgc_counter_index
+ * Counter Index for flow counters.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, sfgc, counter_index, 0x0C, 0, 24);
+
+static inline void
+mlxsw_reg_sfgc_pack(char *payload, enum mlxsw_reg_sfgc_type type,
+		    enum mlxsw_reg_sfgc_bridge_type bridge_type,
+		    enum mlxsw_flood_table_type table_type,
+		    unsigned int flood_table)
+{
+	MLXSW_REG_ZERO(sfgc, payload);
+	mlxsw_reg_sfgc_type_set(payload, type);
+	mlxsw_reg_sfgc_bridge_type_set(payload, bridge_type);
+	mlxsw_reg_sfgc_table_type_set(payload, table_type);
+	mlxsw_reg_sfgc_flood_table_set(payload, flood_table);
+	mlxsw_reg_sfgc_mid_set(payload, MLXSW_PORT_MID);
+}
+
+/* SFTR - Switch Flooding Table Register
+ * -------------------------------------
+ * The switch flooding table is used for flooding packet replication. The table
+ * defines a bit mask of ports for packet replication.
+ */
+#define MLXSW_REG_SFTR_ID 0x2012
+#define MLXSW_REG_SFTR_LEN 0x420
+
+static const struct mlxsw_reg_info mlxsw_reg_sftr = {
+	.id = MLXSW_REG_SFTR_ID,
+	.len = MLXSW_REG_SFTR_LEN,
+};
+
+/* reg_sftr_swid
+ * Switch partition ID with which to associate the port.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, sftr, swid, 0x00, 24, 8);
+
+/* reg_sftr_flood_table
+ * Flooding table index to associate with the specific type on the specific
+ * switch partition.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, sftr, flood_table, 0x00, 16, 6);
+
+/* reg_sftr_index
+ * Index. Used as an index into the Flooding Table in case the table is
+ * configured to use VID / FID or FID Offset.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, sftr, index, 0x00, 0, 16);
+
+/* reg_sftr_table_type
+ * See mlxsw_flood_table_type
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, sftr, table_type, 0x04, 16, 3);
+
+/* reg_sftr_range
+ * Range of entries to update
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, sftr, range, 0x04, 0, 16);
+
+/* reg_sftr_port
+ * Local port membership (1 bit per port).
+ * Access: RW
+ */
+MLXSW_ITEM_BIT_ARRAY(reg, sftr, port, 0x20, 0x20, 1);
+
+/* reg_sftr_cpu_port_mask
+ * CPU port mask (1 bit per port).
+ * Access: W
+ */
+MLXSW_ITEM_BIT_ARRAY(reg, sftr, port_mask, 0x220, 0x20, 1);
+
+static inline void mlxsw_reg_sftr_pack(char *payload,
+				       unsigned int flood_table,
+				       unsigned int index,
+				       enum mlxsw_flood_table_type table_type,
+				       unsigned int range)
+{
+	MLXSW_REG_ZERO(sftr, payload);
+	mlxsw_reg_sftr_swid_set(payload, 0);
+	mlxsw_reg_sftr_flood_table_set(payload, flood_table);
+	mlxsw_reg_sftr_index_set(payload, index);
+	mlxsw_reg_sftr_table_type_set(payload, table_type);
+	mlxsw_reg_sftr_range_set(payload, range);
+	mlxsw_reg_sftr_port_set(payload, MLXSW_PORT_CPU_PORT, 1);
+	mlxsw_reg_sftr_port_mask_set(payload, MLXSW_PORT_CPU_PORT, 1);
+}
+
+/* SPMLR - Switch Port MAC Learning Register
+ * -----------------------------------------
+ * Controls the Switch MAC learning policy per port.
+ */
+#define MLXSW_REG_SPMLR_ID 0x2018
+#define MLXSW_REG_SPMLR_LEN 0x8
+
+static const struct mlxsw_reg_info mlxsw_reg_spmlr = {
+	.id = MLXSW_REG_SPMLR_ID,
+	.len = MLXSW_REG_SPMLR_LEN,
+};
+
+/* reg_spmlr_local_port
+ * Local port number.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, spmlr, local_port, 0x00, 16, 8);
+
+/* reg_spmlr_sub_port
+ * Virtual port within the physical port.
+ * Should be set to 0 when virtual ports are not enabled on the port.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, spmlr, sub_port, 0x00, 8, 8);
+
+enum mlxsw_reg_spmlr_learn_mode {
+	MLXSW_REG_SPMLR_LEARN_MODE_DISABLE = 0,
+	MLXSW_REG_SPMLR_LEARN_MODE_ENABLE = 2,
+	MLXSW_REG_SPMLR_LEARN_MODE_SEC = 3,
+};
+
+/* reg_spmlr_learn_mode
+ * Learning mode on the port.
+ * 0 - Learning disabled.
+ * 2 - Learning enabled.
+ * 3 - Security mode.
+ *
+ * In security mode the switch does not learn MACs on the port, but uses the
+ * SMAC to see if it exists on another ingress port. If so, the packet is
+ * classified as a bad packet and is discarded unless the software registers
+ * to receive port security error packets usign HPKT.
+ */
+MLXSW_ITEM32(reg, spmlr, learn_mode, 0x04, 30, 2);
+
+static inline void mlxsw_reg_spmlr_pack(char *payload, u8 local_port,
+					enum mlxsw_reg_spmlr_learn_mode mode)
+{
+	MLXSW_REG_ZERO(spmlr, payload);
+	mlxsw_reg_spmlr_local_port_set(payload, local_port);
+	mlxsw_reg_spmlr_sub_port_set(payload, 0);
+	mlxsw_reg_spmlr_learn_mode_set(payload, mode);
+}
+
+/* PMLP - Ports Module to Local Port Register
+ * ------------------------------------------
+ * Configures the assignment of modules to local ports.
+ */
+#define MLXSW_REG_PMLP_ID 0x5002
+#define MLXSW_REG_PMLP_LEN 0x40
+
+static const struct mlxsw_reg_info mlxsw_reg_pmlp = {
+	.id = MLXSW_REG_PMLP_ID,
+	.len = MLXSW_REG_PMLP_LEN,
+};
+
+/* reg_pmlp_rxtx
+ * 0 - Tx value is used for both Tx and Rx.
+ * 1 - Rx value is taken from a separte field.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, pmlp, rxtx, 0x00, 31, 1);
+
+/* reg_pmlp_local_port
+ * Local port number.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, pmlp, local_port, 0x00, 16, 8);
+
+/* reg_pmlp_width
+ * 0 - Unmap local port.
+ * 1 - Lane 0 is used.
+ * 2 - Lanes 0 and 1 are used.
+ * 4 - Lanes 0, 1, 2 and 3 are used.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, pmlp, width, 0x00, 0, 8);
+
+/* reg_pmlp_module
+ * Module number.
+ * Access: RW
+ */
+MLXSW_ITEM32_INDEXED(reg, pmlp, module, 0x04, 0, 8, 0x04, 0, false);
+
+/* reg_pmlp_tx_lane
+ * Tx Lane. When rxtx field is cleared, this field is used for Rx as well.
+ * Access: RW
+ */
+MLXSW_ITEM32_INDEXED(reg, pmlp, tx_lane, 0x04, 16, 2, 0x04, 16, false);
+
+/* reg_pmlp_rx_lane
+ * Rx Lane. When rxtx field is cleared, this field is ignored and Rx lane is
+ * equal to Tx lane.
+ * Access: RW
+ */
+MLXSW_ITEM32_INDEXED(reg, pmlp, rx_lane, 0x04, 24, 2, 0x04, 24, false);
+
+static inline void mlxsw_reg_pmlp_pack(char *payload, u8 local_port)
+{
+	MLXSW_REG_ZERO(pmlp, payload);
+	mlxsw_reg_pmlp_local_port_set(payload, local_port);
+}
+
+/* PMTU - Port MTU Register
+ * ------------------------
+ * Configures and reports the port MTU.
+ */
+#define MLXSW_REG_PMTU_ID 0x5003
+#define MLXSW_REG_PMTU_LEN 0x10
+
+static const struct mlxsw_reg_info mlxsw_reg_pmtu = {
+	.id = MLXSW_REG_PMTU_ID,
+	.len = MLXSW_REG_PMTU_LEN,
+};
+
+/* reg_pmtu_local_port
+ * Local port number.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, pmtu, local_port, 0x00, 16, 8);
+
+/* reg_pmtu_max_mtu
+ * Maximum MTU.
+ * When port type (e.g. Ethernet) is configured, the relevant MTU is
+ * reported, otherwise the minimum between the max_mtu of the different
+ * types is reported.
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, pmtu, max_mtu, 0x04, 16, 16);
+
+/* reg_pmtu_admin_mtu
+ * MTU value to set port to. Must be smaller or equal to max_mtu.
+ * Note: If port type is Infiniband, then port must be disabled, when its
+ * MTU is set.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, pmtu, admin_mtu, 0x08, 16, 16);
+
+/* reg_pmtu_oper_mtu
+ * The actual MTU configured on the port. Packets exceeding this size
+ * will be dropped.
+ * Note: In Ethernet and FC oper_mtu == admin_mtu, however, in Infiniband
+ * oper_mtu might be smaller than admin_mtu.
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, pmtu, oper_mtu, 0x0C, 16, 16);
+
+static inline void mlxsw_reg_pmtu_pack(char *payload, u8 local_port,
+				       u16 new_mtu)
+{
+	MLXSW_REG_ZERO(pmtu, payload);
+	mlxsw_reg_pmtu_local_port_set(payload, local_port);
+	mlxsw_reg_pmtu_max_mtu_set(payload, 0);
+	mlxsw_reg_pmtu_admin_mtu_set(payload, new_mtu);
+	mlxsw_reg_pmtu_oper_mtu_set(payload, 0);
+}
+
+/* PTYS - Port Type and Speed Register
+ * -----------------------------------
+ * Configures and reports the port speed type.
+ *
+ * Note: When set while the link is up, the changes will not take effect
+ * until the port transitions from down to up state.
+ */
+#define MLXSW_REG_PTYS_ID 0x5004
+#define MLXSW_REG_PTYS_LEN 0x40
+
+static const struct mlxsw_reg_info mlxsw_reg_ptys = {
+	.id = MLXSW_REG_PTYS_ID,
+	.len = MLXSW_REG_PTYS_LEN,
+};
+
+/* reg_ptys_local_port
+ * Local port number.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, ptys, local_port, 0x00, 16, 8);
+
+#define MLXSW_REG_PTYS_PROTO_MASK_ETH	BIT(2)
+
+/* reg_ptys_proto_mask
+ * Protocol mask. Indicates which protocol is used.
+ * 0 - Infiniband.
+ * 1 - Fibre Channel.
+ * 2 - Ethernet.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, ptys, proto_mask, 0x00, 0, 3);
+
+#define MLXSW_REG_PTYS_ETH_SPEED_SGMII			BIT(0)
+#define MLXSW_REG_PTYS_ETH_SPEED_1000BASE_KX		BIT(1)
+#define MLXSW_REG_PTYS_ETH_SPEED_10GBASE_CX4		BIT(2)
+#define MLXSW_REG_PTYS_ETH_SPEED_10GBASE_KX4		BIT(3)
+#define MLXSW_REG_PTYS_ETH_SPEED_10GBASE_KR		BIT(4)
+#define MLXSW_REG_PTYS_ETH_SPEED_20GBASE_KR2		BIT(5)
+#define MLXSW_REG_PTYS_ETH_SPEED_40GBASE_CR4		BIT(6)
+#define MLXSW_REG_PTYS_ETH_SPEED_40GBASE_KR4		BIT(7)
+#define MLXSW_REG_PTYS_ETH_SPEED_56GBASE_R4		BIT(8)
+#define MLXSW_REG_PTYS_ETH_SPEED_10GBASE_CR		BIT(12)
+#define MLXSW_REG_PTYS_ETH_SPEED_10GBASE_SR		BIT(13)
+#define MLXSW_REG_PTYS_ETH_SPEED_10GBASE_ER_LR		BIT(14)
+#define MLXSW_REG_PTYS_ETH_SPEED_40GBASE_SR4		BIT(15)
+#define MLXSW_REG_PTYS_ETH_SPEED_40GBASE_LR4_ER4	BIT(16)
+#define MLXSW_REG_PTYS_ETH_SPEED_50GBASE_KR4		BIT(19)
+#define MLXSW_REG_PTYS_ETH_SPEED_100GBASE_CR4		BIT(20)
+#define MLXSW_REG_PTYS_ETH_SPEED_100GBASE_SR4		BIT(21)
+#define MLXSW_REG_PTYS_ETH_SPEED_100GBASE_KR4		BIT(22)
+#define MLXSW_REG_PTYS_ETH_SPEED_100GBASE_LR4_ER4	BIT(23)
+#define MLXSW_REG_PTYS_ETH_SPEED_100BASE_TX		BIT(24)
+#define MLXSW_REG_PTYS_ETH_SPEED_100BASE_T		BIT(25)
+#define MLXSW_REG_PTYS_ETH_SPEED_10GBASE_T		BIT(26)
+#define MLXSW_REG_PTYS_ETH_SPEED_25GBASE_CR		BIT(27)
+#define MLXSW_REG_PTYS_ETH_SPEED_25GBASE_KR		BIT(28)
+#define MLXSW_REG_PTYS_ETH_SPEED_25GBASE_SR		BIT(29)
+#define MLXSW_REG_PTYS_ETH_SPEED_50GBASE_CR2		BIT(30)
+#define MLXSW_REG_PTYS_ETH_SPEED_50GBASE_KR2		BIT(31)
+
+/* reg_ptys_eth_proto_cap
+ * Ethernet port supported speeds and protocols.
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, ptys, eth_proto_cap, 0x0C, 0, 32);
+
+/* reg_ptys_eth_proto_admin
+ * Speed and protocol to set port to.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, ptys, eth_proto_admin, 0x18, 0, 32);
+
+/* reg_ptys_eth_proto_oper
+ * The current speed and protocol configured for the port.
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, ptys, eth_proto_oper, 0x24, 0, 32);
+
+static inline void mlxsw_reg_ptys_pack(char *payload, u8 local_port,
+				       u32 proto_admin)
+{
+	MLXSW_REG_ZERO(ptys, payload);
+	mlxsw_reg_ptys_local_port_set(payload, local_port);
+	mlxsw_reg_ptys_proto_mask_set(payload, MLXSW_REG_PTYS_PROTO_MASK_ETH);
+	mlxsw_reg_ptys_eth_proto_admin_set(payload, proto_admin);
+}
+
+static inline void mlxsw_reg_ptys_unpack(char *payload, u32 *p_eth_proto_cap,
+					 u32 *p_eth_proto_adm,
+					 u32 *p_eth_proto_oper)
+{
+	if (p_eth_proto_cap)
+		*p_eth_proto_cap = mlxsw_reg_ptys_eth_proto_cap_get(payload);
+	if (p_eth_proto_adm)
+		*p_eth_proto_adm = mlxsw_reg_ptys_eth_proto_admin_get(payload);
+	if (p_eth_proto_oper)
+		*p_eth_proto_oper = mlxsw_reg_ptys_eth_proto_oper_get(payload);
+}
+
+/* PPAD - Port Physical Address Register
+ * -------------------------------------
+ * The PPAD register configures the per port physical MAC address.
+ */
+#define MLXSW_REG_PPAD_ID 0x5005
+#define MLXSW_REG_PPAD_LEN 0x10
+
+static const struct mlxsw_reg_info mlxsw_reg_ppad = {
+	.id = MLXSW_REG_PPAD_ID,
+	.len = MLXSW_REG_PPAD_LEN,
+};
+
+/* reg_ppad_single_base_mac
+ * 0: base_mac, local port should be 0 and mac[7:0] is
+ * reserved. HW will set incremental
+ * 1: single_mac - mac of the local_port
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, ppad, single_base_mac, 0x00, 28, 1);
+
+/* reg_ppad_local_port
+ * port number, if single_base_mac = 0 then local_port is reserved
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, ppad, local_port, 0x00, 16, 8);
+
+/* reg_ppad_mac
+ * If single_base_mac = 0 - base MAC address, mac[7:0] is reserved.
+ * If single_base_mac = 1 - the per port MAC address
+ * Access: RW
+ */
+MLXSW_ITEM_BUF(reg, ppad, mac, 0x02, 6);
+
+static inline void mlxsw_reg_ppad_pack(char *payload, bool single_base_mac,
+				       u8 local_port)
+{
+	MLXSW_REG_ZERO(ppad, payload);
+	mlxsw_reg_ppad_single_base_mac_set(payload, !!single_base_mac);
+	mlxsw_reg_ppad_local_port_set(payload, local_port);
+}
+
+/* PAOS - Ports Administrative and Operational Status Register
+ * -----------------------------------------------------------
+ * Configures and retrieves per port administrative and operational status.
+ */
+#define MLXSW_REG_PAOS_ID 0x5006
+#define MLXSW_REG_PAOS_LEN 0x10
+
+static const struct mlxsw_reg_info mlxsw_reg_paos = {
+	.id = MLXSW_REG_PAOS_ID,
+	.len = MLXSW_REG_PAOS_LEN,
+};
+
+/* reg_paos_swid
+ * Switch partition ID with which to associate the port.
+ * Note: while external ports uses unique local port numbers (and thus swid is
+ * redundant), router ports use the same local port number where swid is the
+ * only indication for the relevant port.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, paos, swid, 0x00, 24, 8);
+
+/* reg_paos_local_port
+ * Local port number.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, paos, local_port, 0x00, 16, 8);
+
+/* reg_paos_admin_status
+ * Port administrative state (the desired state of the port):
+ * 1 - Up.
+ * 2 - Down.
+ * 3 - Up once. This means that in case of link failure, the port won't go
+ *     into polling mode, but will wait to be re-enabled by software.
+ * 4 - Disabled by system. Can only be set by hardware.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, paos, admin_status, 0x00, 8, 4);
+
+/* reg_paos_oper_status
+ * Port operational state (the current state):
+ * 1 - Up.
+ * 2 - Down.
+ * 3 - Down by port failure. This means that the device will not let the
+ *     port up again until explicitly specified by software.
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, paos, oper_status, 0x00, 0, 4);
+
+/* reg_paos_ase
+ * Admin state update enabled.
+ * Access: WO
+ */
+MLXSW_ITEM32(reg, paos, ase, 0x04, 31, 1);
+
+/* reg_paos_ee
+ * Event update enable. If this bit is set, event generation will be
+ * updated based on the e field.
+ * Access: WO
+ */
+MLXSW_ITEM32(reg, paos, ee, 0x04, 30, 1);
+
+/* reg_paos_e
+ * Event generation on operational state change:
+ * 0 - Do not generate event.
+ * 1 - Generate Event.
+ * 2 - Generate Single Event.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, paos, e, 0x04, 0, 2);
+
+static inline void mlxsw_reg_paos_pack(char *payload, u8 local_port,
+				       enum mlxsw_port_admin_status status)
+{
+	MLXSW_REG_ZERO(paos, payload);
+	mlxsw_reg_paos_swid_set(payload, 0);
+	mlxsw_reg_paos_local_port_set(payload, local_port);
+	mlxsw_reg_paos_admin_status_set(payload, status);
+	mlxsw_reg_paos_oper_status_set(payload, 0);
+	mlxsw_reg_paos_ase_set(payload, 1);
+	mlxsw_reg_paos_ee_set(payload, 1);
+	mlxsw_reg_paos_e_set(payload, 1);
+}
+
+/* PPCNT - Ports Performance Counters Register
+ * -------------------------------------------
+ * The PPCNT register retrieves per port performance counters.
+ */
+#define MLXSW_REG_PPCNT_ID 0x5008
+#define MLXSW_REG_PPCNT_LEN 0x100
+
+static const struct mlxsw_reg_info mlxsw_reg_ppcnt = {
+	.id = MLXSW_REG_PPCNT_ID,
+	.len = MLXSW_REG_PPCNT_LEN,
+};
+
+/* reg_ppcnt_swid
+ * For HCA: must be always 0.
+ * Switch partition ID to associate port with.
+ * Switch partitions are numbered from 0 to 7 inclusively.
+ * Switch partition 254 indicates stacking ports.
+ * Switch partition 255 indicates all switch partitions.
+ * Only valid on Set() operation with local_port=255.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, ppcnt, swid, 0x00, 24, 8);
+
+/* reg_ppcnt_local_port
+ * Local port number.
+ * 255 indicates all ports on the device, and is only allowed
+ * for Set() operation.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, ppcnt, local_port, 0x00, 16, 8);
+
+/* reg_ppcnt_pnat
+ * Port number access type:
+ * 0 - Local port number
+ * 1 - IB port number
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, ppcnt, pnat, 0x00, 14, 2);
+
+/* reg_ppcnt_grp
+ * Performance counter group.
+ * Group 63 indicates all groups. Only valid on Set() operation with
+ * clr bit set.
+ * 0x0: IEEE 802.3 Counters
+ * 0x1: RFC 2863 Counters
+ * 0x2: RFC 2819 Counters
+ * 0x3: RFC 3635 Counters
+ * 0x5: Ethernet Extended Counters
+ * 0x8: Link Level Retransmission Counters
+ * 0x10: Per Priority Counters
+ * 0x11: Per Traffic Class Counters
+ * 0x12: Physical Layer Counters
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, ppcnt, grp, 0x00, 0, 6);
+
+/* reg_ppcnt_clr
+ * Clear counters. Setting the clr bit will reset the counter value
+ * for all counters in the counter group. This bit can be set
+ * for both Set() and Get() operation.
+ * Access: OP
+ */
+MLXSW_ITEM32(reg, ppcnt, clr, 0x04, 31, 1);
+
+/* reg_ppcnt_prio_tc
+ * Priority for counter set that support per priority, valid values: 0-7.
+ * Traffic class for counter set that support per traffic class,
+ * valid values: 0- cap_max_tclass-1 .
+ * For HCA: cap_max_tclass is always 8.
+ * Otherwise must be 0.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, ppcnt, prio_tc, 0x04, 0, 5);
+
+/* reg_ppcnt_a_frames_transmitted_ok
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, ppcnt, a_frames_transmitted_ok,
+	     0x08 + 0x00, 0, 64);
+
+/* reg_ppcnt_a_frames_received_ok
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, ppcnt, a_frames_received_ok,
+	     0x08 + 0x08, 0, 64);
+
+/* reg_ppcnt_a_frame_check_sequence_errors
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, ppcnt, a_frame_check_sequence_errors,
+	     0x08 + 0x10, 0, 64);
+
+/* reg_ppcnt_a_alignment_errors
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, ppcnt, a_alignment_errors,
+	     0x08 + 0x18, 0, 64);
+
+/* reg_ppcnt_a_octets_transmitted_ok
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, ppcnt, a_octets_transmitted_ok,
+	     0x08 + 0x20, 0, 64);
+
+/* reg_ppcnt_a_octets_received_ok
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, ppcnt, a_octets_received_ok,
+	     0x08 + 0x28, 0, 64);
+
+/* reg_ppcnt_a_multicast_frames_xmitted_ok
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, ppcnt, a_multicast_frames_xmitted_ok,
+	     0x08 + 0x30, 0, 64);
+
+/* reg_ppcnt_a_broadcast_frames_xmitted_ok
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, ppcnt, a_broadcast_frames_xmitted_ok,
+	     0x08 + 0x38, 0, 64);
+
+/* reg_ppcnt_a_multicast_frames_received_ok
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, ppcnt, a_multicast_frames_received_ok,
+	     0x08 + 0x40, 0, 64);
+
+/* reg_ppcnt_a_broadcast_frames_received_ok
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, ppcnt, a_broadcast_frames_received_ok,
+	     0x08 + 0x48, 0, 64);
+
+/* reg_ppcnt_a_in_range_length_errors
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, ppcnt, a_in_range_length_errors,
+	     0x08 + 0x50, 0, 64);
+
+/* reg_ppcnt_a_out_of_range_length_field
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, ppcnt, a_out_of_range_length_field,
+	     0x08 + 0x58, 0, 64);
+
+/* reg_ppcnt_a_frame_too_long_errors
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, ppcnt, a_frame_too_long_errors,
+	     0x08 + 0x60, 0, 64);
+
+/* reg_ppcnt_a_symbol_error_during_carrier
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, ppcnt, a_symbol_error_during_carrier,
+	     0x08 + 0x68, 0, 64);
+
+/* reg_ppcnt_a_mac_control_frames_transmitted
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, ppcnt, a_mac_control_frames_transmitted,
+	     0x08 + 0x70, 0, 64);
+
+/* reg_ppcnt_a_mac_control_frames_received
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, ppcnt, a_mac_control_frames_received,
+	     0x08 + 0x78, 0, 64);
+
+/* reg_ppcnt_a_unsupported_opcodes_received
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, ppcnt, a_unsupported_opcodes_received,
+	     0x08 + 0x80, 0, 64);
+
+/* reg_ppcnt_a_pause_mac_ctrl_frames_received
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, ppcnt, a_pause_mac_ctrl_frames_received,
+	     0x08 + 0x88, 0, 64);
+
+/* reg_ppcnt_a_pause_mac_ctrl_frames_transmitted
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, ppcnt, a_pause_mac_ctrl_frames_transmitted,
+	     0x08 + 0x90, 0, 64);
+
+static inline void mlxsw_reg_ppcnt_pack(char *payload, u8 local_port)
+{
+	MLXSW_REG_ZERO(ppcnt, payload);
+	mlxsw_reg_ppcnt_swid_set(payload, 0);
+	mlxsw_reg_ppcnt_local_port_set(payload, local_port);
+	mlxsw_reg_ppcnt_pnat_set(payload, 0);
+	mlxsw_reg_ppcnt_grp_set(payload, 0);
+	mlxsw_reg_ppcnt_clr_set(payload, 0);
+	mlxsw_reg_ppcnt_prio_tc_set(payload, 0);
+}
+
+/* PSPA - Port Switch Partition Allocation
+ * ---------------------------------------
+ * Controls the association of a port with a switch partition and enables
+ * configuring ports as stacking ports.
+ */
+#define MLXSW_REG_PSPA_ID 0x500d
+#define MLXSW_REG_PSPA_LEN 0x8
+
+static const struct mlxsw_reg_info mlxsw_reg_pspa = {
+	.id = MLXSW_REG_PSPA_ID,
+	.len = MLXSW_REG_PSPA_LEN,
+};
+
+/* reg_pspa_swid
+ * Switch partition ID.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, pspa, swid, 0x00, 24, 8);
+
+/* reg_pspa_local_port
+ * Local port number.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, pspa, local_port, 0x00, 16, 8);
+
+/* reg_pspa_sub_port
+ * Virtual port within the local port. Set to 0 when virtual ports are
+ * disabled on the local port.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, pspa, sub_port, 0x00, 8, 8);
+
+static inline void mlxsw_reg_pspa_pack(char *payload, u8 swid, u8 local_port)
+{
+	MLXSW_REG_ZERO(pspa, payload);
+	mlxsw_reg_pspa_swid_set(payload, swid);
+	mlxsw_reg_pspa_local_port_set(payload, local_port);
+	mlxsw_reg_pspa_sub_port_set(payload, 0);
+}
+
+/* HTGT - Host Trap Group Table
+ * ----------------------------
+ * Configures the properties for forwarding to CPU.
+ */
+#define MLXSW_REG_HTGT_ID 0x7002
+#define MLXSW_REG_HTGT_LEN 0x100
+
+static const struct mlxsw_reg_info mlxsw_reg_htgt = {
+	.id = MLXSW_REG_HTGT_ID,
+	.len = MLXSW_REG_HTGT_LEN,
+};
+
+/* reg_htgt_swid
+ * Switch partition ID.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, htgt, swid, 0x00, 24, 8);
+
+#define MLXSW_REG_HTGT_PATH_TYPE_LOCAL 0x0	/* For locally attached CPU */
+
+/* reg_htgt_type
+ * CPU path type.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, htgt, type, 0x00, 8, 4);
+
+#define MLXSW_REG_HTGT_TRAP_GROUP_EMAD	0x0
+#define MLXSW_REG_HTGT_TRAP_GROUP_RX	0x1
+
+/* reg_htgt_trap_group
+ * Trap group number. User defined number specifying which trap groups
+ * should be forwarded to the CPU. The mapping between trap IDs and trap
+ * groups is configured using HPKT register.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, htgt, trap_group, 0x00, 0, 8);
+
+enum {
+	MLXSW_REG_HTGT_POLICER_DISABLE,
+	MLXSW_REG_HTGT_POLICER_ENABLE,
+};
+
+/* reg_htgt_pide
+ * Enable policer ID specified using 'pid' field.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, htgt, pide, 0x04, 15, 1);
+
+/* reg_htgt_pid
+ * Policer ID for the trap group.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, htgt, pid, 0x04, 0, 8);
+
+#define MLXSW_REG_HTGT_TRAP_TO_CPU 0x0
+
+/* reg_htgt_mirror_action
+ * Mirror action to use.
+ * 0 - Trap to CPU.
+ * 1 - Trap to CPU and mirror to a mirroring agent.
+ * 2 - Mirror to a mirroring agent and do not trap to CPU.
+ * Access: RW
+ *
+ * Note: Mirroring to a mirroring agent is only supported in Spectrum.
+ */
+MLXSW_ITEM32(reg, htgt, mirror_action, 0x08, 8, 2);
+
+/* reg_htgt_mirroring_agent
+ * Mirroring agent.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, htgt, mirroring_agent, 0x08, 0, 3);
+
+/* reg_htgt_priority
+ * Trap group priority.
+ * In case a packet matches multiple classification rules, the packet will
+ * only be trapped once, based on the trap ID associated with the group (via
+ * register HPKT) with the highest priority.
+ * Supported values are 0-7, with 7 represnting the highest priority.
+ * Access: RW
+ *
+ * Note: In SwitchX-2 this field is ignored and the priority value is replaced
+ * by the 'trap_group' field.
+ */
+MLXSW_ITEM32(reg, htgt, priority, 0x0C, 0, 4);
+
+/* reg_htgt_local_path_cpu_tclass
+ * CPU ingress traffic class for the trap group.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, htgt, local_path_cpu_tclass, 0x10, 16, 6);
+
+#define MLXSW_REG_HTGT_LOCAL_PATH_RDQ_EMAD	0x15
+#define MLXSW_REG_HTGT_LOCAL_PATH_RDQ_RX	0x14
+
+/* reg_htgt_local_path_rdq
+ * Receive descriptor queue (RDQ) to use for the trap group.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, htgt, local_path_rdq, 0x10, 0, 6);
+
+static inline void mlxsw_reg_htgt_pack(char *payload, u8 trap_group)
+{
+	u8 swid, rdq;
+
+	MLXSW_REG_ZERO(htgt, payload);
+	if (MLXSW_REG_HTGT_TRAP_GROUP_EMAD == trap_group) {
+		swid = MLXSW_PORT_SWID_ALL_SWIDS;
+		rdq = MLXSW_REG_HTGT_LOCAL_PATH_RDQ_EMAD;
+	} else {
+		swid = 0;
+		rdq = MLXSW_REG_HTGT_LOCAL_PATH_RDQ_RX;
+	}
+	mlxsw_reg_htgt_swid_set(payload, swid);
+	mlxsw_reg_htgt_type_set(payload, MLXSW_REG_HTGT_PATH_TYPE_LOCAL);
+	mlxsw_reg_htgt_trap_group_set(payload, trap_group);
+	mlxsw_reg_htgt_pide_set(payload, MLXSW_REG_HTGT_POLICER_DISABLE);
+	mlxsw_reg_htgt_pid_set(payload, 0);
+	mlxsw_reg_htgt_mirror_action_set(payload, MLXSW_REG_HTGT_TRAP_TO_CPU);
+	mlxsw_reg_htgt_mirroring_agent_set(payload, 0);
+	mlxsw_reg_htgt_priority_set(payload, 0);
+	mlxsw_reg_htgt_local_path_cpu_tclass_set(payload, 7);
+	mlxsw_reg_htgt_local_path_rdq_set(payload, rdq);
+}
+
+/* HPKT - Host Packet Trap
+ * -----------------------
+ * Configures trap IDs inside trap groups.
+ */
+#define MLXSW_REG_HPKT_ID 0x7003
+#define MLXSW_REG_HPKT_LEN 0x10
+
+static const struct mlxsw_reg_info mlxsw_reg_hpkt = {
+	.id = MLXSW_REG_HPKT_ID,
+	.len = MLXSW_REG_HPKT_LEN,
+};
+
+enum {
+	MLXSW_REG_HPKT_ACK_NOT_REQUIRED,
+	MLXSW_REG_HPKT_ACK_REQUIRED,
+};
+
+/* reg_hpkt_ack
+ * Require acknowledgements from the host for events.
+ * If set, then the device will wait for the event it sent to be acknowledged
+ * by the host. This option is only relevant for event trap IDs.
+ * Access: RW
+ *
+ * Note: Currently not supported by firmware.
+ */
+MLXSW_ITEM32(reg, hpkt, ack, 0x00, 24, 1);
+
+enum mlxsw_reg_hpkt_action {
+	MLXSW_REG_HPKT_ACTION_FORWARD,
+	MLXSW_REG_HPKT_ACTION_TRAP_TO_CPU,
+	MLXSW_REG_HPKT_ACTION_MIRROR_TO_CPU,
+	MLXSW_REG_HPKT_ACTION_DISCARD,
+	MLXSW_REG_HPKT_ACTION_SOFT_DISCARD,
+	MLXSW_REG_HPKT_ACTION_TRAP_AND_SOFT_DISCARD,
+};
+
+/* reg_hpkt_action
+ * Action to perform on packet when trapped.
+ * 0 - No action. Forward to CPU based on switching rules.
+ * 1 - Trap to CPU (CPU receives sole copy).
+ * 2 - Mirror to CPU (CPU receives a replica of the packet).
+ * 3 - Discard.
+ * 4 - Soft discard (allow other traps to act on the packet).
+ * 5 - Trap and soft discard (allow other traps to overwrite this trap).
+ * Access: RW
+ *
+ * Note: Must be set to 0 (forward) for event trap IDs, as they are already
+ * addressed to the CPU.
+ */
+MLXSW_ITEM32(reg, hpkt, action, 0x00, 20, 3);
+
+/* reg_hpkt_trap_group
+ * Trap group to associate the trap with.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, hpkt, trap_group, 0x00, 12, 6);
+
+/* reg_hpkt_trap_id
+ * Trap ID.
+ * Access: Index
+ *
+ * Note: A trap ID can only be associated with a single trap group. The device
+ * will associate the trap ID with the last trap group configured.
+ */
+MLXSW_ITEM32(reg, hpkt, trap_id, 0x00, 0, 9);
+
+enum {
+	MLXSW_REG_HPKT_CTRL_PACKET_DEFAULT,
+	MLXSW_REG_HPKT_CTRL_PACKET_NO_BUFFER,
+	MLXSW_REG_HPKT_CTRL_PACKET_USE_BUFFER,
+};
+
+/* reg_hpkt_ctrl
+ * Configure dedicated buffer resources for control packets.
+ * 0 - Keep factory defaults.
+ * 1 - Do not use control buffer for this trap ID.
+ * 2 - Use control buffer for this trap ID.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, hpkt, ctrl, 0x04, 16, 2);
+
+static inline void mlxsw_reg_hpkt_pack(char *payload, u8 action,
+				       u8 trap_group, u16 trap_id)
+{
+	MLXSW_REG_ZERO(hpkt, payload);
+	mlxsw_reg_hpkt_ack_set(payload, MLXSW_REG_HPKT_ACK_NOT_REQUIRED);
+	mlxsw_reg_hpkt_action_set(payload, action);
+	mlxsw_reg_hpkt_trap_group_set(payload, trap_group);
+	mlxsw_reg_hpkt_trap_id_set(payload, trap_id);
+	mlxsw_reg_hpkt_ctrl_set(payload, MLXSW_REG_HPKT_CTRL_PACKET_DEFAULT);
+}
+
+static inline const char *mlxsw_reg_id_str(u16 reg_id)
+{
+	switch (reg_id) {
+	case MLXSW_REG_SGCR_ID:
+		return "SGCR";
+	case MLXSW_REG_SPAD_ID:
+		return "SPAD";
+	case MLXSW_REG_SMID_ID:
+		return "SMID";
+	case MLXSW_REG_SPMS_ID:
+		return "SPMS";
+	case MLXSW_REG_SFGC_ID:
+		return "SFGC";
+	case MLXSW_REG_SFTR_ID:
+		return "SFTR";
+	case MLXSW_REG_SPMLR_ID:
+		return "SPMLR";
+	case MLXSW_REG_PMLP_ID:
+		return "PMLP";
+	case MLXSW_REG_PMTU_ID:
+		return "PMTU";
+	case MLXSW_REG_PTYS_ID:
+		return "PTYS";
+	case MLXSW_REG_PPAD_ID:
+		return "PPAD";
+	case MLXSW_REG_PAOS_ID:
+		return "PAOS";
+	case MLXSW_REG_PPCNT_ID:
+		return "PPCNT";
+	case MLXSW_REG_PSPA_ID:
+		return "PSPA";
+	case MLXSW_REG_HTGT_ID:
+		return "HTGT";
+	case MLXSW_REG_HPKT_ID:
+		return "HPKT";
+	default:
+		return "*UNKNOWN*";
+	}
+}
+
+/* PUDE - Port Up / Down Event
+ * ---------------------------
+ * Reports the operational state change of a port.
+ */
+#define MLXSW_REG_PUDE_LEN 0x10
+
+/* reg_pude_swid
+ * Switch partition ID with which to associate the port.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, pude, swid, 0x00, 24, 8);
+
+/* reg_pude_local_port
+ * Local port number.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, pude, local_port, 0x00, 16, 8);
+
+/* reg_pude_admin_status
+ * Port administrative state (the desired state).
+ * 1 - Up.
+ * 2 - Down.
+ * 3 - Up once. This means that in case of link failure, the port won't go
+ *     into polling mode, but will wait to be re-enabled by software.
+ * 4 - Disabled by system. Can only be set by hardware.
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, pude, admin_status, 0x00, 8, 4);
+
+/* reg_pude_oper_status
+ * Port operatioanl state.
+ * 1 - Up.
+ * 2 - Down.
+ * 3 - Down by port failure. This means that the device will not let the
+ *     port up again until explicitly specified by software.
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, pude, oper_status, 0x00, 0, 4);
+
+#endif
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [patch net-next 4/4] mlxsw: Introduce Mellanox SwitchX-2 ASIC support
  2015-07-23 15:43 [patch net-next 0/4] Introduce Mellanox Technologies Switch ASICs switchdev drivers Jiri Pirko
                   ` (2 preceding siblings ...)
  2015-07-23 15:43 ` [patch net-next 3/4] mlxsw: Add interface to access registers and process events Jiri Pirko
@ 2015-07-23 15:43 ` Jiri Pirko
  2015-07-23 17:19   ` Alexander Duyck
  2015-07-26  2:45   ` Scott Feldman
  2015-07-24  0:03 ` [patch net-next 0/4] Introduce Mellanox Technologies Switch ASICs switchdev drivers Scott Feldman
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 29+ messages in thread
From: Jiri Pirko @ 2015-07-23 15:43 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, ogerlitz, sfeldma, roopa, f.fainelli,
	tgraf, ast, jhs, daniel, john.fastabend, simon.horman, linville,
	andy, shm, nhorman, Jiri Pirko

From: Jiri Pirko <jiri@mellanox.com>

Benefit from the previously introduced Mellanox Switch infrastructure and
add driver for SwitchX-2 ASIC. Note that this driver is very simple now.
It implements bare minimum for getting device to work on slow-path.
Fast-path offload functionality is going to be added soon.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Elad Raz <eladr@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/Kconfig    |   11 +
 drivers/net/ethernet/mellanox/mlxsw/Makefile   |    2 +
 drivers/net/ethernet/mellanox/mlxsw/core.h     |    2 +
 drivers/net/ethernet/mellanox/mlxsw/pci.c      |    3 +
 drivers/net/ethernet/mellanox/mlxsw/pci.h      |    1 +
 drivers/net/ethernet/mellanox/mlxsw/port.h     |    4 +
 drivers/net/ethernet/mellanox/mlxsw/switchx2.c | 1541 ++++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/txheader.h |   80 ++
 8 files changed, 1644 insertions(+)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/switchx2.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/txheader.h

diff --git a/drivers/net/ethernet/mellanox/mlxsw/Kconfig b/drivers/net/ethernet/mellanox/mlxsw/Kconfig
index 1385f2c..8d1080d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlxsw/Kconfig
@@ -19,3 +19,14 @@ config MLXSW_PCI
 
 	  To compile this driver as a module, choose M here: the
 	  module will be called mlxsw_pci.
+
+config MLXSW_SWITCHX2
+	tristate "Mellanox Technologies SwitchX-2 support"
+	depends on MLXSW_CORE && NET_SWITCHDEV
+	default m
+	---help---
+	  This driver supports Mellanox Technologies SwitchX-2 Ethernet
+	  Switch ASICs.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called mlxsw_switchx2.
diff --git a/drivers/net/ethernet/mellanox/mlxsw/Makefile b/drivers/net/ethernet/mellanox/mlxsw/Makefile
index 94841c3..0a05f65 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/Makefile
+++ b/drivers/net/ethernet/mellanox/mlxsw/Makefile
@@ -2,3 +2,5 @@ obj-$(CONFIG_MLXSW_CORE)	+= mlxsw_core.o
 mlxsw_core-objs			:= core.o
 obj-$(CONFIG_MLXSW_PCI)		+= mlxsw_pci.o
 mlxsw_pci-objs			:= pci.o
+obj-$(CONFIG_MLXSW_SWITCHX2)	+= mlxsw_switchx2.o
+mlxsw_switchx2-objs		:= switchx2.o
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index b6c11d2..fbc94da 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -53,6 +53,8 @@
 #define MODULE_MLXSW_DRIVER_ALIAS(kind)	\
 	MODULE_ALIAS(MLXSW_MODULE_ALIAS_PREFIX kind)
 
+#define MLXSW_DEVICE_KIND_SWITCHX2 "switchx2"
+
 struct mlxsw_core;
 struct mlxsw_driver;
 struct mlxsw_bus;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 8371e08..de8ff32 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -55,6 +55,7 @@
 static const char mlxsw_pci_driver_name[] = "mlxsw_pci";
 
 static const struct pci_device_id mlxsw_pci_id_table[] = {
+	{PCI_VDEVICE(MELLANOX, PCI_DEVICE_ID_MELLANOX_SWITCHX2), 0},
 	{0, }
 };
 
@@ -63,6 +64,8 @@ static struct dentry *mlxsw_pci_dbg_root;
 static const char *mlxsw_pci_device_kind_get(const struct pci_device_id *id)
 {
 	switch (id->device) {
+	case PCI_DEVICE_ID_MELLANOX_SWITCHX2:
+		return MLXSW_DEVICE_KIND_SWITCHX2;
 	default:
 		BUG();
 	}
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.h b/drivers/net/ethernet/mellanox/mlxsw/pci.h
index 6176a93..887af84 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.h
@@ -39,6 +39,7 @@
 
 #include "item.h"
 
+#define PCI_DEVICE_ID_MELLANOX_SWITCHX2	0xc738
 #define MLXSW_PCI_BAR0_SIZE		(1024 * 1024) /* 1MB */
 #define MLXSW_PCI_PAGE_SIZE		4096
 
diff --git a/drivers/net/ethernet/mellanox/mlxsw/port.h b/drivers/net/ethernet/mellanox/mlxsw/port.h
index e626231..795b21c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/port.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/port.h
@@ -51,6 +51,10 @@
 #define MLXSW_PORT_MAX_PHY_PORTS	0x40
 #define MLXSW_PORT_MAX_PORTS		MLXSW_PORT_MAX_PHY_PORTS
 
+#define MLXSW_PORT_DEVID_BITS_OFFSET	10
+#define MLXSW_PORT_PHY_BITS_OFFSET	4
+#define MLXSW_PORT_PHY_BITS_MASK	(MLXSW_PORT_MAX_PHY_PORTS - 1)
+
 #define MLXSW_PORT_CPU_PORT		0x0
 
 enum mlxsw_port_admin_status {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
new file mode 100644
index 0000000..15dc53b
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
@@ -0,0 +1,1541 @@
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/switchx2.c
+ * Copyright (c) 2015 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2015 Jiri Pirko <jiri@mellanox.com>
+ * Copyright (c) 2015 Ido Schimmel <idosch@mellanox.com>
+ * Copyright (c) 2015 Elad Raz <eladr@mellanox.com>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/skbuff.h>
+#include <linux/if_vlan.h>
+#include <net/switchdev.h>
+#include <generated/utsrelease.h>
+
+#include "core.h"
+#include "reg.h"
+#include "port.h"
+#include "trap.h"
+#include "txheader.h"
+
+static const char mlxsw_sx_driver_name[] = "mlxsw_switchx2";
+static const char mlxsw_sx_driver_version[] = "1.0";
+
+struct mlxsw_sx_port;
+
+#define MLXSW_SW_HW_ID_LEN 6
+
+struct mlxsw_sx {
+	struct mlxsw_sx_port **ports;
+	struct mlxsw_core *core;
+	const struct mlxsw_bus_info *bus_info;
+	u8 hw_id[MLXSW_SW_HW_ID_LEN];
+};
+
+struct mlxsw_sx_port_pcpu_stats {
+	u64			rx_packets;
+	u64			rx_bytes;
+	u64			tx_packets;
+	u64			tx_bytes;
+	struct u64_stats_sync	syncp;
+	u32			tx_dropped;
+};
+
+struct mlxsw_sx_port {
+	struct net_device *dev;
+	struct mlxsw_sx_port_pcpu_stats __percpu *pcpu_stats;
+	struct mlxsw_sx *mlxsw_sx;
+	u8 local_port;
+};
+
+/* tx_hdr_version
+ * Tx header version.
+ * Must be set to 0.
+ */
+MLXSW_ITEM32(tx, hdr, version, 0x00, 28, 4);
+
+/* tx_hdr_ctl
+ * Packet control type.
+ * 0 - Ethernet control (e.g. EMADs, LACP)
+ * 1 - Ethernet data
+ */
+MLXSW_ITEM32(tx, hdr, ctl, 0x00, 26, 2);
+
+/* tx_hdr_proto
+ * Packet protocol type. Must be set to 1 (Ethernet).
+ */
+MLXSW_ITEM32(tx, hdr, proto, 0x00, 21, 3);
+
+/* tx_hdr_etclass
+ * Egress TClass to be used on the egress device on the egress port.
+ * The MSB is specified in the 'ctclass3' field.
+ * Range is 0-15, where 15 is the highest priority.
+ */
+MLXSW_ITEM32(tx, hdr, etclass, 0x00, 18, 3);
+
+/* tx_hdr_swid
+ * Switch partition ID.
+ */
+MLXSW_ITEM32(tx, hdr, swid, 0x00, 12, 3);
+
+/* tx_hdr_port_mid
+ * Destination local port for unicast packets.
+ * Destination multicast ID for multicast packets.
+ *
+ * Control packets are directed to a specific egress port, while data
+ * packets are transmitted through the CPU port (0) into the switch partition,
+ * where forwarding rules are applied.
+ */
+MLXSW_ITEM32(tx, hdr, port_mid, 0x04, 16, 16);
+
+/* tx_hdr_ctclass3
+ * See field 'etclass'.
+ */
+MLXSW_ITEM32(tx, hdr, ctclass3, 0x04, 14, 1);
+
+/* tx_hdr_rdq
+ * RDQ for control packets sent to remote CPU.
+ * Must be set to 0x1F for EMADs, otherwise 0.
+ */
+MLXSW_ITEM32(tx, hdr, rdq, 0x04, 9, 5);
+
+/* tx_hdr_cpu_sig
+ * Signature control for packets going to CPU. Must be set to 0.
+ */
+MLXSW_ITEM32(tx, hdr, cpu_sig, 0x04, 0, 9);
+
+/* tx_hdr_sig
+ * Stacking protocl signature. Must be set to 0xE0E0.
+ */
+MLXSW_ITEM32(tx, hdr, sig, 0x0C, 16, 16);
+
+/* tx_hdr_stclass
+ * Stacking TClass.
+ */
+MLXSW_ITEM32(tx, hdr, stclass, 0x0C, 13, 3);
+
+/* tx_hdr_emad
+ * EMAD bit. Must be set for EMADs.
+ */
+MLXSW_ITEM32(tx, hdr, emad, 0x0C, 5, 1);
+
+/* tx_hdr_type
+ * 0 - Data packets
+ * 6 - Control packets
+ */
+MLXSW_ITEM32(tx, hdr, type, 0x0C, 0, 4);
+
+static void mlxsw_sx_txhdr_construct(struct sk_buff *skb,
+				     const struct mlxsw_tx_info *tx_info)
+{
+	char *txhdr = skb_push(skb, MLXSW_TXHDR_LEN);
+	bool is_emad = tx_info->is_emad;
+
+	memset(txhdr, 0, MLXSW_TXHDR_LEN);
+
+	/* We currently set default values for the egress tclass (QoS). */
+	mlxsw_tx_hdr_version_set(txhdr, MLXSW_TXHDR_VERSION_0);
+	mlxsw_tx_hdr_ctl_set(txhdr, MLXSW_TXHDR_ETH_CTL);
+	mlxsw_tx_hdr_proto_set(txhdr, MLXSW_TXHDR_PROTO_ETH);
+	mlxsw_tx_hdr_etclass_set(txhdr, is_emad ? MLXSW_TXHDR_ETCLASS_6 :
+						  MLXSW_TXHDR_ETCLASS_5);
+	mlxsw_tx_hdr_swid_set(txhdr, 0);
+	mlxsw_tx_hdr_port_mid_set(txhdr, tx_info->local_port);
+	mlxsw_tx_hdr_ctclass3_set(txhdr, MLXSW_TXHDR_CTCLASS3);
+	mlxsw_tx_hdr_rdq_set(txhdr, is_emad ? MLXSW_TXHDR_RDQ_EMAD :
+					      MLXSW_TXHDR_RDQ_OTHER);
+	mlxsw_tx_hdr_cpu_sig_set(txhdr, MLXSW_TXHDR_CPU_SIG);
+	mlxsw_tx_hdr_sig_set(txhdr, MLXSW_TXHDR_SIG);
+	mlxsw_tx_hdr_stclass_set(txhdr, MLXSW_TXHDR_STCLASS_NONE);
+	mlxsw_tx_hdr_emad_set(txhdr, is_emad ? MLXSW_TXHDR_EMAD :
+					       MLXSW_TXHDR_NOT_EMAD);
+	mlxsw_tx_hdr_type_set(txhdr, MLXSW_TXHDR_TYPE_CONTROL);
+}
+
+static int mlxsw_sx_port_admin_status_set(struct mlxsw_sx_port *mlxsw_sx_port,
+					  bool is_up)
+{
+	struct mlxsw_sx *mlxsw_sx = mlxsw_sx_port->mlxsw_sx;
+	char paos_pl[MLXSW_REG_PAOS_LEN];
+
+	mlxsw_reg_paos_pack(paos_pl, mlxsw_sx_port->local_port,
+			    is_up ? MLXSW_PORT_ADMIN_STATUS_UP :
+			    MLXSW_PORT_ADMIN_STATUS_DOWN);
+	return mlxsw_reg_write(mlxsw_sx->core, MLXSW_REG(paos), paos_pl);
+}
+
+static int mlxsw_sx_port_oper_status_get(struct mlxsw_sx_port *mlxsw_sx_port,
+					 bool *p_is_up)
+{
+	struct mlxsw_sx *mlxsw_sx = mlxsw_sx_port->mlxsw_sx;
+	char paos_pl[MLXSW_REG_PAOS_LEN];
+	u8 oper_status;
+	int err;
+
+	mlxsw_reg_paos_pack(paos_pl, mlxsw_sx_port->local_port, 0);
+	err = mlxsw_reg_query(mlxsw_sx->core, MLXSW_REG(paos), paos_pl);
+	if (err)
+		return err;
+	oper_status = mlxsw_reg_paos_oper_status_get(paos_pl);
+	*p_is_up = oper_status == MLXSW_PORT_ADMIN_STATUS_UP ? true : false;
+	return 0;
+}
+
+static int mlxsw_sx_port_mtu_set(struct mlxsw_sx_port *mlxsw_sx_port, u16 mtu)
+{
+	struct mlxsw_sx *mlxsw_sx = mlxsw_sx_port->mlxsw_sx;
+	char pmtu_pl[MLXSW_REG_PMTU_LEN];
+	int max_mtu;
+	int err;
+
+	mtu += MLXSW_TXHDR_LEN + ETH_HLEN;
+	mlxsw_reg_pmtu_pack(pmtu_pl, mlxsw_sx_port->local_port, 0);
+	err = mlxsw_reg_query(mlxsw_sx->core, MLXSW_REG(pmtu), pmtu_pl);
+	if (err)
+		return err;
+	max_mtu = mlxsw_reg_pmtu_max_mtu_get(pmtu_pl);
+
+	if (mtu > max_mtu)
+		return -EINVAL;
+
+	mlxsw_reg_pmtu_pack(pmtu_pl, mlxsw_sx_port->local_port, mtu);
+	return mlxsw_reg_write(mlxsw_sx->core, MLXSW_REG(pmtu), pmtu_pl);
+}
+
+static int mlxsw_sx_port_swid_set(struct mlxsw_sx_port *mlxsw_sx_port, u8 swid)
+{
+	struct mlxsw_sx *mlxsw_sx = mlxsw_sx_port->mlxsw_sx;
+	char pspa_pl[MLXSW_REG_PSPA_LEN];
+
+	mlxsw_reg_pspa_pack(pspa_pl, swid, mlxsw_sx_port->local_port);
+	return mlxsw_reg_write(mlxsw_sx->core, MLXSW_REG(pspa), pspa_pl);
+}
+
+static int mlxsw_sx_port_module_check(struct mlxsw_sx_port *mlxsw_sx_port,
+				      bool *p_usable)
+{
+	struct mlxsw_sx *mlxsw_sx = mlxsw_sx_port->mlxsw_sx;
+	char pmlp_pl[MLXSW_REG_PMLP_LEN];
+	int err;
+
+	mlxsw_reg_pmlp_pack(pmlp_pl, mlxsw_sx_port->local_port);
+	err = mlxsw_reg_query(mlxsw_sx->core, MLXSW_REG(pmlp), pmlp_pl);
+	if (err)
+		return err;
+	*p_usable = mlxsw_reg_pmlp_width_get(pmlp_pl) ? true : false;
+	return 0;
+}
+
+static int mlxsw_sx_port_open(struct net_device *dev)
+{
+	struct mlxsw_sx_port *mlxsw_sx_port = netdev_priv(dev);
+	int err;
+
+	err = mlxsw_sx_port_admin_status_set(mlxsw_sx_port, true);
+	if (err)
+		return err;
+	netif_start_queue(dev);
+	return 0;
+}
+
+static int mlxsw_sx_port_stop(struct net_device *dev)
+{
+	struct mlxsw_sx_port *mlxsw_sx_port = netdev_priv(dev);
+	int err;
+
+	netif_stop_queue(dev);
+	err = mlxsw_sx_port_admin_status_set(mlxsw_sx_port, false);
+	if (err)
+		return err;
+	return 0;
+}
+
+static netdev_tx_t mlxsw_sx_port_xmit(struct sk_buff *skb,
+				      struct net_device *dev)
+{
+	struct mlxsw_sx_port *mlxsw_sx_port = netdev_priv(dev);
+	struct mlxsw_sx *mlxsw_sx = mlxsw_sx_port->mlxsw_sx;
+	struct mlxsw_sx_port_pcpu_stats *pcpu_stats;
+	const struct mlxsw_tx_info tx_info = {
+		.local_port = mlxsw_sx_port->local_port,
+		.is_emad = false,
+	};
+	int err;
+
+	if (unlikely(skb_headroom(skb) < MLXSW_TXHDR_LEN)) {
+		struct sk_buff *skb_new;
+
+		skb_new = skb_realloc_headroom(skb, MLXSW_TXHDR_LEN);
+		dev_kfree_skb_any(skb);
+		if (!skb_new) {
+			this_cpu_inc(mlxsw_sx_port->pcpu_stats->tx_dropped);
+			return NETDEV_TX_OK;
+		}
+		skb = skb_new;
+	}
+	mlxsw_sx_txhdr_construct(skb, &tx_info);
+	err = mlxsw_core_skb_transmit(mlxsw_sx, skb, &tx_info);
+	if (err == -EAGAIN)
+		return NETDEV_TX_BUSY;
+
+	pcpu_stats = this_cpu_ptr(mlxsw_sx_port->pcpu_stats);
+	u64_stats_update_begin(&pcpu_stats->syncp);
+	pcpu_stats->tx_packets++;
+	pcpu_stats->tx_bytes += skb->len;
+	u64_stats_update_end(&pcpu_stats->syncp);
+
+	return NETDEV_TX_OK;
+}
+
+static int mlxsw_sx_port_change_mtu(struct net_device *dev, int mtu)
+{
+	struct mlxsw_sx_port *mlxsw_sx_port = netdev_priv(dev);
+	int err;
+
+	err = mlxsw_sx_port_mtu_set(mlxsw_sx_port, mtu);
+	if (err)
+		return err;
+	dev->mtu = mtu;
+	return 0;
+}
+
+static struct rtnl_link_stats64 *
+mlxsw_sx_port_get_stats64(struct net_device *dev,
+			  struct rtnl_link_stats64 *stats)
+{
+	struct mlxsw_sx_port *mlxsw_sx_port = netdev_priv(dev);
+	struct mlxsw_sx_port_pcpu_stats *p;
+	u64 rx_packets, rx_bytes, tx_packets, tx_bytes;
+	u32 tx_dropped = 0;
+	unsigned int start;
+	int i;
+
+	for_each_possible_cpu(i) {
+		p = per_cpu_ptr(mlxsw_sx_port->pcpu_stats, i);
+		do {
+			start = u64_stats_fetch_begin_irq(&p->syncp);
+			rx_packets	= p->rx_packets;
+			rx_bytes	= p->rx_bytes;
+			tx_packets	= p->tx_packets;
+			tx_bytes	= p->tx_bytes;
+		} while (u64_stats_fetch_retry_irq(&p->syncp, start));
+
+		stats->rx_packets	+= rx_packets;
+		stats->rx_bytes		+= rx_bytes;
+		stats->tx_packets	+= tx_packets;
+		stats->tx_bytes		+= tx_bytes;
+		/* tx_dropped is u32, updated without syncp protection. */
+		tx_dropped	+= p->tx_dropped;
+	}
+	stats->tx_dropped	= tx_dropped;
+	return stats;
+}
+
+static const struct net_device_ops mlxsw_sx_port_netdev_ops = {
+	.ndo_open		= mlxsw_sx_port_open,
+	.ndo_stop		= mlxsw_sx_port_stop,
+	.ndo_start_xmit		= mlxsw_sx_port_xmit,
+	.ndo_change_mtu		= mlxsw_sx_port_change_mtu,
+	.ndo_get_stats64	= mlxsw_sx_port_get_stats64,
+};
+
+static void mlxsw_sx_port_get_drvinfo(struct net_device *dev,
+				      struct ethtool_drvinfo *drvinfo)
+{
+	struct mlxsw_sx_port *mlxsw_sx_port = netdev_priv(dev);
+	struct mlxsw_sx *mlxsw_sx = mlxsw_sx_port->mlxsw_sx;
+
+	strlcpy(drvinfo->driver, mlxsw_sx_driver_name, sizeof(drvinfo->driver));
+	strlcpy(drvinfo->version, mlxsw_sx_driver_version,
+		sizeof(drvinfo->version));
+	snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
+		 "%d.%d.%d",
+		 mlxsw_sx->bus_info->fw_rev.major,
+		 mlxsw_sx->bus_info->fw_rev.minor,
+		 mlxsw_sx->bus_info->fw_rev.subminor);
+	strlcpy(drvinfo->bus_info, mlxsw_sx->bus_info->device_name,
+		sizeof(drvinfo->bus_info));
+}
+
+struct mlxsw_sx_port_hw_stats {
+	char str[ETH_GSTRING_LEN];
+	u64 (*getter)(char *payload);
+};
+
+static const struct mlxsw_sx_port_hw_stats mlxsw_sx_port_hw_stats[] = {
+	{
+		.str = "a_frames_transmitted_ok",
+		.getter = mlxsw_reg_ppcnt_a_frames_transmitted_ok_get,
+	},
+	{
+		.str = "a_frames_received_ok",
+		.getter = mlxsw_reg_ppcnt_a_frames_received_ok_get,
+	},
+	{
+		.str = "a_frame_check_sequence_errors",
+		.getter = mlxsw_reg_ppcnt_a_frame_check_sequence_errors_get,
+	},
+	{
+		.str = "a_alignment_errors",
+		.getter = mlxsw_reg_ppcnt_a_alignment_errors_get,
+	},
+	{
+		.str = "a_octets_transmitted_ok",
+		.getter = mlxsw_reg_ppcnt_a_octets_transmitted_ok_get,
+	},
+	{
+		.str = "a_octets_received_ok",
+		.getter = mlxsw_reg_ppcnt_a_octets_received_ok_get,
+	},
+	{
+		.str = "a_multicast_frames_xmitted_ok",
+		.getter = mlxsw_reg_ppcnt_a_multicast_frames_xmitted_ok_get,
+	},
+	{
+		.str = "a_broadcast_frames_xmitted_ok",
+		.getter = mlxsw_reg_ppcnt_a_broadcast_frames_xmitted_ok_get,
+	},
+	{
+		.str = "a_multicast_frames_received_ok",
+		.getter = mlxsw_reg_ppcnt_a_multicast_frames_received_ok_get,
+	},
+	{
+		.str = "a_broadcast_frames_received_ok",
+		.getter = mlxsw_reg_ppcnt_a_broadcast_frames_received_ok_get,
+	},
+	{
+		.str = "a_in_range_length_errors",
+		.getter = mlxsw_reg_ppcnt_a_in_range_length_errors_get,
+	},
+	{
+		.str = "a_out_of_range_length_field",
+		.getter = mlxsw_reg_ppcnt_a_out_of_range_length_field_get,
+	},
+	{
+		.str = "a_frame_too_long_errors",
+		.getter = mlxsw_reg_ppcnt_a_frame_too_long_errors_get,
+	},
+	{
+		.str = "a_symbol_error_during_carrier",
+		.getter = mlxsw_reg_ppcnt_a_symbol_error_during_carrier_get,
+	},
+	{
+		.str = "a_mac_control_frames_transmitted",
+		.getter = mlxsw_reg_ppcnt_a_mac_control_frames_transmitted_get,
+	},
+	{
+		.str = "a_mac_control_frames_received",
+		.getter = mlxsw_reg_ppcnt_a_mac_control_frames_received_get,
+	},
+	{
+		.str = "a_unsupported_opcodes_received",
+		.getter = mlxsw_reg_ppcnt_a_unsupported_opcodes_received_get,
+	},
+	{
+		.str = "a_pause_mac_ctrl_frames_received",
+		.getter = mlxsw_reg_ppcnt_a_pause_mac_ctrl_frames_received_get,
+	},
+	{
+		.str = "a_pause_mac_ctrl_frames_xmitted",
+		.getter = mlxsw_reg_ppcnt_a_pause_mac_ctrl_frames_transmitted_get,
+	},
+};
+
+#define MLXSW_SX_PORT_HW_STATS_LEN ARRAY_SIZE(mlxsw_sx_port_hw_stats)
+
+static void mlxsw_sx_port_get_strings(struct net_device *dev,
+				      u32 stringset, u8 *data)
+{
+	u8 *p = data;
+	int i;
+
+	switch (stringset) {
+	case ETH_SS_STATS:
+		for (i = 0; i < MLXSW_SX_PORT_HW_STATS_LEN; i++) {
+			memcpy(p, mlxsw_sx_port_hw_stats[i].str,
+			       ETH_GSTRING_LEN);
+			p += ETH_GSTRING_LEN;
+		}
+		break;
+	}
+}
+
+static void mlxsw_sx_port_get_stats(struct net_device *dev,
+				    struct ethtool_stats *stats, u64 *data)
+{
+	struct mlxsw_sx_port *mlxsw_sx_port = netdev_priv(dev);
+	struct mlxsw_sx *mlxsw_sx = mlxsw_sx_port->mlxsw_sx;
+	char ppcnt_pl[MLXSW_REG_PPCNT_LEN];
+	int i;
+	int err;
+
+	mlxsw_reg_ppcnt_pack(ppcnt_pl, mlxsw_sx_port->local_port);
+	err = mlxsw_reg_query(mlxsw_sx->core, MLXSW_REG(ppcnt), ppcnt_pl);
+	for (i = 0; i < MLXSW_SX_PORT_HW_STATS_LEN; i++)
+		data[i] = !err ? mlxsw_sx_port_hw_stats[i].getter(ppcnt_pl) : 0;
+}
+
+static int mlxsw_sx_port_get_sset_count(struct net_device *dev, int sset)
+{
+	switch (sset) {
+	case ETH_SS_STATS:
+		return MLXSW_SX_PORT_HW_STATS_LEN;
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+struct mlxsw_sx_port_link_mode {
+	u32 mask;
+	u32 supported;
+	u32 advertised;
+	u32 speed;
+};
+
+static const struct mlxsw_sx_port_link_mode mlxsw_sx_port_link_mode[] = {
+	{
+		.mask		= MLXSW_REG_PTYS_ETH_SPEED_100BASE_T,
+		.supported	= SUPPORTED_100baseT_Full,
+		.advertised	= ADVERTISED_100baseT_Full,
+		.speed		= 100,
+	},
+	{
+		.mask		= MLXSW_REG_PTYS_ETH_SPEED_100BASE_TX,
+		.speed		= 100,
+	},
+	{
+		.mask		= MLXSW_REG_PTYS_ETH_SPEED_SGMII |
+				  MLXSW_REG_PTYS_ETH_SPEED_1000BASE_KX,
+		.supported	= SUPPORTED_1000baseKX_Full,
+		.advertised	= ADVERTISED_1000baseKX_Full,
+		.speed		= 1000,
+	},
+	{
+		.mask		= MLXSW_REG_PTYS_ETH_SPEED_10GBASE_T,
+		.supported	= SUPPORTED_10000baseT_Full,
+		.advertised	= ADVERTISED_10000baseT_Full,
+		.speed		= 10000,
+	},
+	{
+		.mask		= MLXSW_REG_PTYS_ETH_SPEED_10GBASE_CX4 |
+				  MLXSW_REG_PTYS_ETH_SPEED_10GBASE_KX4,
+		.supported	= SUPPORTED_10000baseKX4_Full,
+		.advertised	= ADVERTISED_10000baseKX4_Full,
+		.speed		= 10000,
+	},
+	{
+		.mask		= MLXSW_REG_PTYS_ETH_SPEED_10GBASE_KR |
+				  MLXSW_REG_PTYS_ETH_SPEED_10GBASE_CR |
+				  MLXSW_REG_PTYS_ETH_SPEED_10GBASE_SR |
+				  MLXSW_REG_PTYS_ETH_SPEED_10GBASE_ER_LR,
+		.supported	= SUPPORTED_10000baseKR_Full,
+		.advertised	= ADVERTISED_10000baseKR_Full,
+		.speed		= 10000,
+	},
+	{
+		.mask		= MLXSW_REG_PTYS_ETH_SPEED_20GBASE_KR2,
+		.supported	= SUPPORTED_20000baseKR2_Full,
+		.advertised	= ADVERTISED_20000baseKR2_Full,
+		.speed		= 20000,
+	},
+	{
+		.mask		= MLXSW_REG_PTYS_ETH_SPEED_40GBASE_CR4,
+		.supported	= SUPPORTED_40000baseCR4_Full,
+		.advertised	= ADVERTISED_40000baseCR4_Full,
+		.speed		= 40000,
+	},
+	{
+		.mask		= MLXSW_REG_PTYS_ETH_SPEED_40GBASE_KR4,
+		.supported	= SUPPORTED_40000baseKR4_Full,
+		.advertised	= ADVERTISED_40000baseKR4_Full,
+		.speed		= 40000,
+	},
+	{
+		.mask		= MLXSW_REG_PTYS_ETH_SPEED_40GBASE_SR4,
+		.supported	= SUPPORTED_40000baseSR4_Full,
+		.advertised	= ADVERTISED_40000baseSR4_Full,
+		.speed		= 40000,
+	},
+	{
+		.mask		= MLXSW_REG_PTYS_ETH_SPEED_40GBASE_LR4_ER4,
+		.supported	= SUPPORTED_40000baseLR4_Full,
+		.advertised	= ADVERTISED_40000baseLR4_Full,
+		.speed		= 40000,
+	},
+	{
+		.mask		= MLXSW_REG_PTYS_ETH_SPEED_25GBASE_CR |
+				  MLXSW_REG_PTYS_ETH_SPEED_25GBASE_KR |
+				  MLXSW_REG_PTYS_ETH_SPEED_25GBASE_SR,
+		.speed		= 25000,
+	},
+	{
+		.mask		= MLXSW_REG_PTYS_ETH_SPEED_50GBASE_KR4 |
+				  MLXSW_REG_PTYS_ETH_SPEED_50GBASE_CR2 |
+				  MLXSW_REG_PTYS_ETH_SPEED_50GBASE_KR2,
+		.speed		= 50000,
+	},
+	{
+		.mask		= MLXSW_REG_PTYS_ETH_SPEED_56GBASE_R4,
+		.supported	= SUPPORTED_56000baseKR4_Full,
+		.advertised	= ADVERTISED_56000baseKR4_Full,
+		.speed		= 56000,
+	},
+	{
+		.mask		= MLXSW_REG_PTYS_ETH_SPEED_100GBASE_CR4 |
+				  MLXSW_REG_PTYS_ETH_SPEED_100GBASE_SR4 |
+				  MLXSW_REG_PTYS_ETH_SPEED_100GBASE_KR4 |
+				  MLXSW_REG_PTYS_ETH_SPEED_100GBASE_LR4_ER4,
+		.speed		= 100000,
+	},
+};
+
+#define MLXSW_SX_PORT_LINK_MODE_LEN ARRAY_SIZE(mlxsw_sx_port_link_mode)
+
+static u32 mlxsw_sx_from_ptys_supported_port(u32 ptys_eth_proto)
+{
+	if (ptys_eth_proto & (MLXSW_REG_PTYS_ETH_SPEED_10GBASE_CR |
+			      MLXSW_REG_PTYS_ETH_SPEED_10GBASE_SR |
+			      MLXSW_REG_PTYS_ETH_SPEED_40GBASE_CR4 |
+			      MLXSW_REG_PTYS_ETH_SPEED_40GBASE_SR4 |
+			      MLXSW_REG_PTYS_ETH_SPEED_100GBASE_SR4 |
+			      MLXSW_REG_PTYS_ETH_SPEED_SGMII))
+		return SUPPORTED_FIBRE;
+
+	if (ptys_eth_proto & (MLXSW_REG_PTYS_ETH_SPEED_10GBASE_KR |
+			      MLXSW_REG_PTYS_ETH_SPEED_10GBASE_KX4 |
+			      MLXSW_REG_PTYS_ETH_SPEED_40GBASE_KR4 |
+			      MLXSW_REG_PTYS_ETH_SPEED_100GBASE_KR4 |
+			      MLXSW_REG_PTYS_ETH_SPEED_1000BASE_KX))
+		return SUPPORTED_Backplane;
+	return 0;
+}
+
+static u32 mlxsw_sx_from_ptys_supported_link(u32 ptys_eth_proto)
+{
+	u32 modes = 0;
+	int i;
+
+	for (i = 0; i < MLXSW_SX_PORT_LINK_MODE_LEN; i++) {
+		if (ptys_eth_proto & mlxsw_sx_port_link_mode[i].mask)
+			modes |= mlxsw_sx_port_link_mode[i].supported;
+	}
+	return modes;
+}
+
+static u32 mlxsw_sx_from_ptys_advert_link(u32 ptys_eth_proto)
+{
+	u32 modes = 0;
+	int i;
+
+	for (i = 0; i < MLXSW_SX_PORT_LINK_MODE_LEN; i++) {
+		if (ptys_eth_proto & mlxsw_sx_port_link_mode[i].mask)
+			modes |= mlxsw_sx_port_link_mode[i].advertised;
+	}
+	return modes;
+}
+
+static void mlxsw_sx_from_ptys_speed_duplex(bool carrier_ok, u32 ptys_eth_proto,
+					    struct ethtool_cmd *cmd)
+{
+	u32 speed = SPEED_UNKNOWN;
+	u8 duplex = DUPLEX_UNKNOWN;
+	int i;
+
+	if (!carrier_ok)
+		goto out;
+
+	for (i = 0; i < MLXSW_SX_PORT_LINK_MODE_LEN; i++) {
+		if (ptys_eth_proto & mlxsw_sx_port_link_mode[i].mask) {
+			speed = mlxsw_sx_port_link_mode[i].speed;
+			duplex = DUPLEX_FULL;
+			break;
+		}
+	}
+out:
+	ethtool_cmd_speed_set(cmd, speed);
+	cmd->duplex = duplex;
+}
+
+static u8 mlxsw_sx_port_connector_port(u32 ptys_eth_proto)
+{
+	if (ptys_eth_proto & (MLXSW_REG_PTYS_ETH_SPEED_10GBASE_SR |
+			      MLXSW_REG_PTYS_ETH_SPEED_40GBASE_SR4 |
+			      MLXSW_REG_PTYS_ETH_SPEED_100GBASE_SR4 |
+			      MLXSW_REG_PTYS_ETH_SPEED_SGMII))
+		return PORT_FIBRE;
+
+	if (ptys_eth_proto & (MLXSW_REG_PTYS_ETH_SPEED_10GBASE_CR |
+			      MLXSW_REG_PTYS_ETH_SPEED_40GBASE_CR4 |
+			      MLXSW_REG_PTYS_ETH_SPEED_100GBASE_CR4))
+		return PORT_DA;
+
+	if (ptys_eth_proto & (MLXSW_REG_PTYS_ETH_SPEED_10GBASE_KR |
+			      MLXSW_REG_PTYS_ETH_SPEED_10GBASE_KX4 |
+			      MLXSW_REG_PTYS_ETH_SPEED_40GBASE_KR4 |
+			      MLXSW_REG_PTYS_ETH_SPEED_100GBASE_KR4))
+		return PORT_NONE;
+
+	return PORT_OTHER;
+}
+
+static int mlxsw_sx_port_get_settings(struct net_device *dev,
+				      struct ethtool_cmd *cmd)
+{
+	struct mlxsw_sx_port *mlxsw_sx_port = netdev_priv(dev);
+	struct mlxsw_sx *mlxsw_sx = mlxsw_sx_port->mlxsw_sx;
+	char ptys_pl[MLXSW_REG_PTYS_LEN];
+	u32 eth_proto_cap;
+	u32 eth_proto_admin;
+	u32 eth_proto_oper;
+	int err;
+
+	mlxsw_reg_ptys_pack(ptys_pl, mlxsw_sx_port->local_port, 0);
+	err = mlxsw_reg_query(mlxsw_sx->core, MLXSW_REG(ptys), ptys_pl);
+	if (err) {
+		netdev_err(dev, "Failed to get proto");
+		return err;
+	}
+	mlxsw_reg_ptys_unpack(ptys_pl, &eth_proto_cap,
+			      &eth_proto_admin, &eth_proto_oper);
+
+	cmd->supported = mlxsw_sx_from_ptys_supported_port(eth_proto_cap) |
+			 mlxsw_sx_from_ptys_supported_link(eth_proto_cap) |
+			 SUPPORTED_Pause | SUPPORTED_Asym_Pause;
+	cmd->advertising = mlxsw_sx_from_ptys_advert_link(eth_proto_admin);
+	mlxsw_sx_from_ptys_speed_duplex(netif_carrier_ok(dev),
+					eth_proto_oper, cmd);
+
+	eth_proto_oper = eth_proto_oper ? eth_proto_oper : eth_proto_cap;
+	cmd->port = mlxsw_sx_port_connector_port(eth_proto_oper);
+	cmd->lp_advertising = mlxsw_sx_from_ptys_advert_link(eth_proto_oper);
+
+	cmd->transceiver = XCVR_INTERNAL;
+	return 0;
+}
+
+static u32 mlxsw_sx_to_ptys_advert_link(u32 advertising)
+{
+	u32 ptys_proto = 0;
+	int i;
+
+	for (i = 0; i < MLXSW_SX_PORT_LINK_MODE_LEN; i++) {
+		if (advertising & mlxsw_sx_port_link_mode[i].advertised)
+			ptys_proto |= mlxsw_sx_port_link_mode[i].mask;
+	}
+	return ptys_proto;
+}
+
+static u32 mlxsw_sx_to_ptys_speed(u32 speed)
+{
+	u32 ptys_proto = 0;
+	int i;
+
+	for (i = 0; i < MLXSW_SX_PORT_LINK_MODE_LEN; i++) {
+		if (speed == mlxsw_sx_port_link_mode[i].speed)
+			ptys_proto |= mlxsw_sx_port_link_mode[i].mask;
+	}
+	return ptys_proto;
+}
+
+static int mlxsw_sx_port_set_settings(struct net_device *dev,
+				      struct ethtool_cmd *cmd)
+{
+	struct mlxsw_sx_port *mlxsw_sx_port = netdev_priv(dev);
+	struct mlxsw_sx *mlxsw_sx = mlxsw_sx_port->mlxsw_sx;
+	char ptys_pl[MLXSW_REG_PTYS_LEN];
+	u32 speed;
+	u32 eth_proto_new;
+	u32 eth_proto_cap;
+	u32 eth_proto_admin;
+	bool is_up;
+	int err;
+
+	speed = ethtool_cmd_speed(cmd);
+
+	eth_proto_new = cmd->autoneg == AUTONEG_ENABLE ?
+		mlxsw_sx_to_ptys_advert_link(cmd->advertising) :
+		mlxsw_sx_to_ptys_speed(speed);
+
+	mlxsw_reg_ptys_pack(ptys_pl, mlxsw_sx_port->local_port, 0);
+	err = mlxsw_reg_query(mlxsw_sx->core, MLXSW_REG(ptys), ptys_pl);
+	if (err) {
+		netdev_err(dev, "Failed to get proto");
+		return err;
+	}
+	mlxsw_reg_ptys_unpack(ptys_pl, &eth_proto_cap, &eth_proto_admin, NULL);
+
+	eth_proto_new = eth_proto_new & eth_proto_cap;
+	if (!eth_proto_new) {
+		netdev_err(dev, "Not supported proto admin requested");
+		return -EINVAL;
+	}
+	if (eth_proto_new == eth_proto_admin)
+		return 0;
+
+	mlxsw_reg_ptys_pack(ptys_pl, mlxsw_sx_port->local_port, eth_proto_new);
+	err = mlxsw_reg_write(mlxsw_sx->core, MLXSW_REG(ptys), ptys_pl);
+	if (err) {
+		netdev_err(dev, "Failed to set proto admin");
+		return err;
+	}
+
+	err = mlxsw_sx_port_oper_status_get(mlxsw_sx_port, &is_up);
+	if (err) {
+		netdev_err(dev, "Failed to get oper status");
+		return err;
+	}
+	if (!is_up)
+		return 0;
+
+	err = mlxsw_sx_port_admin_status_set(mlxsw_sx_port, false);
+	if (err) {
+		netdev_err(dev, "Failed to set admin status");
+		return err;
+	}
+
+	err = mlxsw_sx_port_admin_status_set(mlxsw_sx_port, true);
+	if (err) {
+		netdev_err(dev, "Failed to set admin status");
+		return err;
+	}
+
+	return 0;
+}
+
+static const struct ethtool_ops mlxsw_sx_port_ethtool_ops = {
+	.get_drvinfo		= mlxsw_sx_port_get_drvinfo,
+	.get_link		= ethtool_op_get_link,
+	.get_strings		= mlxsw_sx_port_get_strings,
+	.get_ethtool_stats	= mlxsw_sx_port_get_stats,
+	.get_sset_count		= mlxsw_sx_port_get_sset_count,
+	.get_settings		= mlxsw_sx_port_get_settings,
+	.set_settings		= mlxsw_sx_port_set_settings,
+};
+
+static int mlxsw_sx_port_attr_get(struct net_device *dev,
+				  struct switchdev_attr *attr)
+{
+	struct mlxsw_sx_port *mlxsw_sx_port = netdev_priv(dev);
+	struct mlxsw_sx *mlxsw_sx = mlxsw_sx_port->mlxsw_sx;
+
+	switch (attr->id) {
+	case SWITCHDEV_ATTR_PORT_PARENT_ID:
+		attr->u.ppid.id_len = sizeof(mlxsw_sx->hw_id);
+		memcpy(&attr->u.ppid.id, &mlxsw_sx->hw_id, attr->u.ppid.id_len);
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+static const struct switchdev_ops mlxsw_sx_port_switchdev_ops = {
+	.switchdev_port_attr_get	= mlxsw_sx_port_attr_get,
+};
+
+static int mlxsw_sx_hw_id_get(struct mlxsw_sx *mlxsw_sx)
+{
+	char spad_pl[MLXSW_REG_SPAD_LEN];
+	int err;
+
+	err = mlxsw_reg_query(mlxsw_sx->core, MLXSW_REG(spad), spad_pl);
+	if (err)
+		return err;
+	mlxsw_reg_spad_base_mac_memcpy_from(spad_pl, mlxsw_sx->hw_id);
+	return 0;
+}
+
+static int mlxsw_sx_port_dev_addr_get(struct mlxsw_sx_port *mlxsw_sx_port)
+{
+	struct mlxsw_sx *mlxsw_sx = mlxsw_sx_port->mlxsw_sx;
+	struct net_device *dev = mlxsw_sx_port->dev;
+	char ppad_pl[MLXSW_REG_PPAD_LEN];
+	int err;
+
+	mlxsw_reg_ppad_pack(ppad_pl, false, 0);
+	err = mlxsw_reg_query(mlxsw_sx->core, MLXSW_REG(ppad), ppad_pl);
+	if (err)
+		return err;
+	mlxsw_reg_ppad_mac_memcpy_from(ppad_pl, dev->dev_addr);
+	/* The last byte in base mac address is always 0 */
+	dev->dev_addr[ETH_ALEN - 1] += mlxsw_sx_port->local_port;
+	return 0;
+}
+
+static int mlxsw_sx_port_stp_state_set(struct mlxsw_sx_port *mlxsw_sx_port,
+				       u16 vid, enum mlxsw_reg_spms_state state)
+{
+	struct mlxsw_sx *mlxsw_sx = mlxsw_sx_port->mlxsw_sx;
+	char *spms_pl;
+	int err;
+
+	spms_pl = kmalloc(MLXSW_REG_SPMS_LEN, GFP_KERNEL);
+	if (!spms_pl)
+		return -ENOMEM;
+	mlxsw_reg_spms_pack(spms_pl, mlxsw_sx_port->local_port, vid, state);
+	err = mlxsw_reg_write(mlxsw_sx->core, MLXSW_REG(spms), spms_pl);
+	kfree(spms_pl);
+	return err;
+}
+
+static int mlxsw_sx_port_speed_set(struct mlxsw_sx_port *mlxsw_sx_port,
+				   u32 speed)
+{
+	struct mlxsw_sx *mlxsw_sx = mlxsw_sx_port->mlxsw_sx;
+	char ptys_pl[MLXSW_REG_PTYS_LEN];
+
+	mlxsw_reg_ptys_pack(ptys_pl, mlxsw_sx_port->local_port, speed);
+	return mlxsw_reg_write(mlxsw_sx->core, MLXSW_REG(ptys), ptys_pl);
+}
+
+static int
+mlxsw_sx_port_mac_learning_mode_set(struct mlxsw_sx_port *mlxsw_sx_port,
+				    enum mlxsw_reg_spmlr_learn_mode mode)
+{
+	struct mlxsw_sx *mlxsw_sx = mlxsw_sx_port->mlxsw_sx;
+	char spmlr_pl[MLXSW_REG_SPMLR_LEN];
+
+	mlxsw_reg_spmlr_pack(spmlr_pl, mlxsw_sx_port->local_port, mode);
+	return mlxsw_reg_write(mlxsw_sx->core, MLXSW_REG(spmlr), spmlr_pl);
+}
+
+static int mlxsw_sx_port_create(struct mlxsw_sx *mlxsw_sx, u8 local_port)
+{
+	struct mlxsw_sx_port *mlxsw_sx_port;
+	struct net_device *dev;
+	bool usable;
+	int err;
+
+	dev = alloc_etherdev(sizeof(struct mlxsw_sx_port));
+	if (!dev)
+		return -ENOMEM;
+	mlxsw_sx_port = netdev_priv(dev);
+	mlxsw_sx_port->dev = dev;
+	mlxsw_sx_port->mlxsw_sx = mlxsw_sx;
+	mlxsw_sx_port->local_port = local_port;
+
+	mlxsw_sx_port->pcpu_stats =
+		netdev_alloc_pcpu_stats(struct mlxsw_sx_port_pcpu_stats);
+	if (!mlxsw_sx_port->pcpu_stats) {
+		err = -ENOMEM;
+		goto err_alloc_stats;
+	}
+
+	dev->netdev_ops = &mlxsw_sx_port_netdev_ops;
+	dev->ethtool_ops = &mlxsw_sx_port_ethtool_ops;
+	dev->switchdev_ops = &mlxsw_sx_port_switchdev_ops;
+
+	err = mlxsw_sx_port_dev_addr_get(mlxsw_sx_port);
+	if (err) {
+		dev_err(mlxsw_sx->bus_info->dev, "Port %d: Unable to get port mac address\n",
+			mlxsw_sx_port->local_port);
+		goto err_dev_addr_get;
+	}
+
+	netif_carrier_off(dev);
+
+	dev->features |= NETIF_F_NETNS_LOCAL | NETIF_F_LLTX |
+			 NETIF_F_VLAN_CHALLENGED;
+
+	/* Each packet needs to have a Tx header (metadata) on top all other
+	 * headers.
+	 */
+	dev->hard_header_len += MLXSW_TXHDR_LEN;
+
+	err = mlxsw_sx_port_module_check(mlxsw_sx_port, &usable);
+	if (err) {
+		dev_err(mlxsw_sx->bus_info->dev, "Port %d: Failed to check module\n",
+			mlxsw_sx_port->local_port);
+		goto err_port_module_check;
+	}
+
+	if (!usable) {
+		dev_dbg(mlxsw_sx->bus_info->dev, "Port %d: Not usable, skipping initialization\n",
+			mlxsw_sx_port->local_port);
+		goto port_not_usable;
+	}
+
+	err = mlxsw_sx_port_swid_set(mlxsw_sx_port, 0);
+	if (err) {
+		dev_err(mlxsw_sx->bus_info->dev, "Port %d: Failed to set SWID\n",
+			mlxsw_sx_port->local_port);
+		goto err_port_swid_set;
+	}
+
+	err = mlxsw_sx_port_speed_set(mlxsw_sx_port,
+				      MLXSW_REG_PTYS_ETH_SPEED_40GBASE_CR4);
+	if (err) {
+		dev_err(mlxsw_sx->bus_info->dev, "Port %d: Failed to set speed\n",
+			mlxsw_sx_port->local_port);
+		goto err_port_speed_set;
+	}
+
+	err = mlxsw_sx_port_mtu_set(mlxsw_sx_port, ETH_DATA_LEN);
+	if (err) {
+		dev_err(mlxsw_sx->bus_info->dev, "Port %d: Failed to set MTU\n",
+			mlxsw_sx_port->local_port);
+		goto err_port_mtu_set;
+	}
+
+	err = mlxsw_sx_port_admin_status_set(mlxsw_sx_port, false);
+	if (err)
+		goto err_port_admin_status_set;
+
+	err = mlxsw_sx_port_stp_state_set(mlxsw_sx_port,
+					  MLXSW_PORT_DEFAULT_VID,
+					  MLXSW_REG_SPMS_STATE_FORWARDING);
+	if (err) {
+		dev_err(mlxsw_sx->bus_info->dev, "Port %d: Failed to set STP state\n",
+			mlxsw_sx_port->local_port);
+		goto err_port_stp_state_set;
+	}
+
+	err = mlxsw_sx_port_mac_learning_mode_set(mlxsw_sx_port,
+						  MLXSW_REG_SPMLR_LEARN_MODE_DISABLE);
+	if (err) {
+		dev_err(mlxsw_sx->bus_info->dev, "Port %d: Failed to set MAC learning mode\n",
+			mlxsw_sx_port->local_port);
+		goto err_port_mac_learning_mode_set;
+	}
+
+	err = register_netdev(dev);
+	if (err) {
+		dev_err(mlxsw_sx->bus_info->dev, "Port %d: Failed to register netdev\n",
+			mlxsw_sx_port->local_port);
+		goto err_register_netdev;
+	}
+
+	mlxsw_sx->ports[local_port] = mlxsw_sx_port;
+	return 0;
+
+err_register_netdev:
+err_port_admin_status_set:
+err_port_mac_learning_mode_set:
+err_port_stp_state_set:
+err_port_mtu_set:
+err_port_speed_set:
+err_port_swid_set:
+port_not_usable:
+err_port_module_check:
+err_dev_addr_get:
+	free_percpu(mlxsw_sx_port->pcpu_stats);
+err_alloc_stats:
+	free_netdev(dev);
+	return err;
+}
+
+static void mlxsw_sx_port_remove(struct mlxsw_sx *mlxsw_sx, u8 local_port)
+{
+	struct mlxsw_sx_port *mlxsw_sx_port = mlxsw_sx->ports[local_port];
+
+	if (!mlxsw_sx_port)
+		return;
+	unregister_netdev(mlxsw_sx_port->dev); /* This calls ndo_stop */
+	mlxsw_sx_port_swid_set(mlxsw_sx_port, MLXSW_PORT_SWID_DISABLED_PORT);
+	free_percpu(mlxsw_sx_port->pcpu_stats);
+}
+
+static void mlxsw_sx_ports_remove(struct mlxsw_sx *mlxsw_sx)
+{
+	int i;
+
+	for (i = 1; i < MLXSW_PORT_MAX_PORTS; i++)
+		mlxsw_sx_port_remove(mlxsw_sx, i);
+	kfree(mlxsw_sx->ports);
+}
+
+static int mlxsw_sx_ports_create(struct mlxsw_sx *mlxsw_sx)
+{
+	size_t alloc_size;
+	int i;
+	int err;
+
+	alloc_size = sizeof(struct mlxsw_sx_port *) * MLXSW_PORT_MAX_PORTS;
+	mlxsw_sx->ports = kzalloc(alloc_size, GFP_KERNEL);
+	if (!mlxsw_sx->ports)
+		return -ENOMEM;
+
+	for (i = 1; i < MLXSW_PORT_MAX_PORTS; i++) {
+		err = mlxsw_sx_port_create(mlxsw_sx, i);
+		if (err)
+			goto err_port_create;
+	}
+	return 0;
+
+err_port_create:
+	for (i--; i >= 1; i--)
+		mlxsw_sx_port_remove(mlxsw_sx, i);
+	kfree(mlxsw_sx->ports);
+	return err;
+}
+
+static void mlxsw_sx_pude_event_func(const struct mlxsw_reg_info *reg,
+				     char *pude_pl, void *priv)
+{
+	struct mlxsw_sx *mlxsw_sx = priv;
+	struct mlxsw_sx_port *mlxsw_sx_port;
+	enum mlxsw_reg_pude_oper_status status;
+	u8 local_port;
+
+	local_port = mlxsw_reg_pude_local_port_get(pude_pl);
+	mlxsw_sx_port = mlxsw_sx->ports[local_port];
+	if (!mlxsw_sx_port) {
+		dev_warn(mlxsw_sx->bus_info->dev, "Port %d: Link event received for non-existent port\n",
+			 local_port);
+		return;
+	}
+
+	status = mlxsw_reg_pude_oper_status_get(pude_pl);
+	if (MLXSW_PORT_OPER_STATUS_UP == status) {
+		netdev_info(mlxsw_sx_port->dev, "link up\n");
+		netif_carrier_on(mlxsw_sx_port->dev);
+	} else {
+		netdev_info(mlxsw_sx_port->dev, "link down\n");
+		netif_carrier_off(mlxsw_sx_port->dev);
+	}
+}
+
+static struct mlxsw_event_listener mlxsw_sx_pude_event = {
+	.func = mlxsw_sx_pude_event_func,
+	.trap_id = MLXSW_TRAP_ID_PUDE,
+};
+
+static int mlxsw_sx_event_register(struct mlxsw_sx *mlxsw_sx,
+				   enum mlxsw_event_trap_id trap_id)
+{
+	struct mlxsw_event_listener *el;
+	char hpkt_pl[MLXSW_REG_HPKT_LEN];
+	int err;
+
+	switch (trap_id) {
+	case MLXSW_TRAP_ID_PUDE:
+		el = &mlxsw_sx_pude_event;
+		break;
+	}
+	err = mlxsw_core_event_listener_register(mlxsw_sx->core, el, mlxsw_sx);
+	if (err)
+		return err;
+
+	mlxsw_reg_hpkt_pack(hpkt_pl, MLXSW_REG_HPKT_ACTION_FORWARD,
+			    MLXSW_REG_HTGT_TRAP_GROUP_EMAD, trap_id);
+	err = mlxsw_reg_write(mlxsw_sx->core, MLXSW_REG(hpkt), hpkt_pl);
+	if (err)
+		goto err_event_trap_set;
+
+	return 0;
+
+err_event_trap_set:
+	mlxsw_core_event_listener_unregister(mlxsw_sx->core, el, mlxsw_sx);
+	return err;
+}
+
+static void mlxsw_sx_event_unregister(struct mlxsw_sx *mlxsw_sx,
+				      enum mlxsw_event_trap_id trap_id)
+{
+	struct mlxsw_event_listener *el;
+
+	switch (trap_id) {
+	case MLXSW_TRAP_ID_PUDE:
+		el = &mlxsw_sx_pude_event;
+		break;
+	}
+	mlxsw_core_event_listener_unregister(mlxsw_sx->core, el, mlxsw_sx);
+}
+
+static void mlxsw_sx_rx_listener_func(struct sk_buff *skb, u8 local_port,
+				      u16 trap_id, void *priv)
+{
+	struct mlxsw_sx *mlxsw_sx = priv;
+	struct mlxsw_sx_port *mlxsw_sx_port = mlxsw_sx->ports[local_port];
+	struct mlxsw_sx_port_pcpu_stats *pcpu_stats;
+
+	if (unlikely(!mlxsw_sx_port)) {
+		if (net_ratelimit())
+			dev_warn(mlxsw_sx->bus_info->dev, "Port %d: skb received for non-existent port\n",
+				 local_port);
+		return;
+	}
+
+	skb->dev = mlxsw_sx_port->dev;
+
+	pcpu_stats = this_cpu_ptr(mlxsw_sx_port->pcpu_stats);
+	u64_stats_update_begin(&pcpu_stats->syncp);
+	pcpu_stats->rx_packets++;
+	pcpu_stats->rx_bytes += skb->len;
+	u64_stats_update_end(&pcpu_stats->syncp);
+
+	skb->protocol = eth_type_trans(skb, skb->dev);
+	netif_receive_skb(skb);
+}
+
+static const struct mlxsw_rx_listener mlxsw_sx_rx_listener[] = {
+	{
+		.func = mlxsw_sx_rx_listener_func,
+		.local_port = MLXSW_PORT_MAX_PORTS,
+		.trap_id = MLXSW_TRAP_ID_FDB_MC,
+	},
+	/* Traps for specific L2 packet types, not trapped as FDB MC */
+	{
+		.func = mlxsw_sx_rx_listener_func,
+		.local_port = MLXSW_PORT_MAX_PORTS,
+		.trap_id = MLXSW_TRAP_ID_STP,
+	},
+	{
+		.func = mlxsw_sx_rx_listener_func,
+		.local_port = MLXSW_PORT_MAX_PORTS,
+		.trap_id = MLXSW_TRAP_ID_LACP,
+	},
+	{
+		.func = mlxsw_sx_rx_listener_func,
+		.local_port = MLXSW_PORT_MAX_PORTS,
+		.trap_id = MLXSW_TRAP_ID_EAPOL,
+	},
+	{
+		.func = mlxsw_sx_rx_listener_func,
+		.local_port = MLXSW_PORT_MAX_PORTS,
+		.trap_id = MLXSW_TRAP_ID_LLDP,
+	},
+	{
+		.func = mlxsw_sx_rx_listener_func,
+		.local_port = MLXSW_PORT_MAX_PORTS,
+		.trap_id = MLXSW_TRAP_ID_MMRP,
+	},
+	{
+		.func = mlxsw_sx_rx_listener_func,
+		.local_port = MLXSW_PORT_MAX_PORTS,
+		.trap_id = MLXSW_TRAP_ID_MVRP,
+	},
+	{
+		.func = mlxsw_sx_rx_listener_func,
+		.local_port = MLXSW_PORT_MAX_PORTS,
+		.trap_id = MLXSW_TRAP_ID_RPVST,
+	},
+	{
+		.func = mlxsw_sx_rx_listener_func,
+		.local_port = MLXSW_PORT_MAX_PORTS,
+		.trap_id = MLXSW_TRAP_ID_DHCP,
+	},
+	{
+		.func = mlxsw_sx_rx_listener_func,
+		.local_port = MLXSW_PORT_MAX_PORTS,
+		.trap_id = MLXSW_TRAP_ID_IGMP_QUERY,
+	},
+	{
+		.func = mlxsw_sx_rx_listener_func,
+		.local_port = MLXSW_PORT_MAX_PORTS,
+		.trap_id = MLXSW_TRAP_ID_IGMP_V1_REPORT,
+	},
+	{
+		.func = mlxsw_sx_rx_listener_func,
+		.local_port = MLXSW_PORT_MAX_PORTS,
+		.trap_id = MLXSW_TRAP_ID_IGMP_V2_REPORT,
+	},
+	{
+		.func = mlxsw_sx_rx_listener_func,
+		.local_port = MLXSW_PORT_MAX_PORTS,
+		.trap_id = MLXSW_TRAP_ID_IGMP_V2_LEAVE,
+	},
+	{
+		.func = mlxsw_sx_rx_listener_func,
+		.local_port = MLXSW_PORT_MAX_PORTS,
+		.trap_id = MLXSW_TRAP_ID_IGMP_V3_REPORT,
+	},
+};
+
+static int mlxsw_sx_traps_init(struct mlxsw_sx *mlxsw_sx)
+{
+	char htgt_pl[MLXSW_REG_HTGT_LEN];
+	char hpkt_pl[MLXSW_REG_HPKT_LEN];
+	int i;
+	int err;
+
+	mlxsw_reg_htgt_pack(htgt_pl, MLXSW_REG_HTGT_TRAP_GROUP_RX);
+	err = mlxsw_reg_write(mlxsw_sx->core, MLXSW_REG(htgt), htgt_pl);
+	if (err)
+		return err;
+
+	for (i = 0; i < ARRAY_SIZE(mlxsw_sx_rx_listener); i++) {
+		err = mlxsw_core_rx_listener_register(mlxsw_sx->core,
+						      &mlxsw_sx_rx_listener[i],
+						      mlxsw_sx);
+		if (err)
+			goto err_rx_listener_register;
+
+		mlxsw_reg_hpkt_pack(hpkt_pl, MLXSW_REG_HPKT_ACTION_TRAP_TO_CPU,
+				    MLXSW_REG_HTGT_TRAP_GROUP_RX,
+				    mlxsw_sx_rx_listener[i].trap_id);
+		err = mlxsw_reg_write(mlxsw_sx->core, MLXSW_REG(hpkt), hpkt_pl);
+		if (err)
+			goto err_rx_trap_set;
+	}
+	return 0;
+
+err_rx_trap_set:
+	mlxsw_core_rx_listener_unregister(mlxsw_sx->core,
+					  &mlxsw_sx_rx_listener[i],
+					  mlxsw_sx);
+err_rx_listener_register:
+	for (i--; i >= 0; i--) {
+		mlxsw_reg_hpkt_pack(hpkt_pl, MLXSW_REG_HPKT_ACTION_FORWARD,
+				    MLXSW_REG_HTGT_TRAP_GROUP_RX,
+				    mlxsw_sx_rx_listener[i].trap_id);
+		mlxsw_reg_write(mlxsw_sx->core, MLXSW_REG(hpkt), hpkt_pl);
+
+		mlxsw_core_rx_listener_unregister(mlxsw_sx->core,
+						  &mlxsw_sx_rx_listener[i],
+						  mlxsw_sx);
+	}
+	return err;
+}
+
+static void mlxsw_sx_traps_fini(struct mlxsw_sx *mlxsw_sx)
+{
+	char hpkt_pl[MLXSW_REG_HPKT_LEN];
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(mlxsw_sx_rx_listener); i++) {
+		mlxsw_reg_hpkt_pack(hpkt_pl, MLXSW_REG_HPKT_ACTION_FORWARD,
+				    MLXSW_REG_HTGT_TRAP_GROUP_RX,
+				    mlxsw_sx_rx_listener[i].trap_id);
+		mlxsw_reg_write(mlxsw_sx->core, MLXSW_REG(hpkt), hpkt_pl);
+
+		mlxsw_core_rx_listener_unregister(mlxsw_sx->core,
+						  &mlxsw_sx_rx_listener[i],
+						  mlxsw_sx);
+	}
+}
+
+static int mlxsw_sx_flood_init(struct mlxsw_sx *mlxsw_sx)
+{
+	char sfgc_pl[MLXSW_REG_SFGC_LEN];
+	char sgcr_pl[MLXSW_REG_SGCR_LEN];
+	char *smid_pl;
+	char *sftr_pl;
+	int err;
+
+	/* Due to FW bug, we must configure SMID. */
+	smid_pl = kmalloc(MLXSW_REG_SMID_LEN, GFP_KERNEL);
+	if (!smid_pl)
+		return -ENOMEM;
+	mlxsw_reg_smid_pack(smid_pl, MLXSW_PORT_MID);
+	err = mlxsw_reg_write(mlxsw_sx->core, MLXSW_REG(smid), smid_pl);
+	kfree(smid_pl);
+	if (err)
+		return err;
+
+	/* Configure a flooding table, which includes only CPU port. */
+	sftr_pl = kmalloc(MLXSW_REG_SFTR_LEN, GFP_KERNEL);
+	if (!sftr_pl)
+		return -ENOMEM;
+	mlxsw_reg_sftr_pack(sftr_pl, 0, 0, MLXSW_REG_SFGC_TABLE_TYPE_SINGLE, 0);
+	err = mlxsw_reg_write(mlxsw_sx->core, MLXSW_REG(sftr), sftr_pl);
+	kfree(sftr_pl);
+	if (err)
+		return err;
+
+	/* Flood different packet types using the flooding table. */
+	mlxsw_reg_sfgc_pack(sfgc_pl,
+			    MLXSW_REG_SFGC_TYPE_UNKNOWN_UNICAST,
+			    MLXSW_REG_SFGC_BRIDGE_TYPE_1Q_FID,
+			    MLXSW_REG_SFGC_TABLE_TYPE_SINGLE,
+			    0);
+	err = mlxsw_reg_write(mlxsw_sx->core, MLXSW_REG(sfgc), sfgc_pl);
+	if (err)
+		return err;
+
+	mlxsw_reg_sfgc_pack(sfgc_pl,
+			    MLXSW_REG_SFGC_TYPE_BROADCAST,
+			    MLXSW_REG_SFGC_BRIDGE_TYPE_1Q_FID,
+			    MLXSW_REG_SFGC_TABLE_TYPE_SINGLE,
+			    0);
+	err = mlxsw_reg_write(mlxsw_sx->core, MLXSW_REG(sfgc), sfgc_pl);
+	if (err)
+		return err;
+
+	mlxsw_reg_sfgc_pack(sfgc_pl,
+			    MLXSW_REG_SFGC_TYPE_UNREGISTERED_MULTICAST_NON_IP,
+			    MLXSW_REG_SFGC_BRIDGE_TYPE_1Q_FID,
+			    MLXSW_REG_SFGC_TABLE_TYPE_SINGLE,
+			    0);
+	err = mlxsw_reg_write(mlxsw_sx->core, MLXSW_REG(sfgc), sfgc_pl);
+	if (err)
+		return err;
+
+	mlxsw_reg_sfgc_pack(sfgc_pl,
+			    MLXSW_REG_SFGC_TYPE_UNREGISTERED_MULTICAST_IPV6,
+			    MLXSW_REG_SFGC_BRIDGE_TYPE_1Q_FID,
+			    MLXSW_REG_SFGC_TABLE_TYPE_SINGLE,
+			    0);
+	err = mlxsw_reg_write(mlxsw_sx->core, MLXSW_REG(sfgc), sfgc_pl);
+	if (err)
+		return err;
+
+	mlxsw_reg_sfgc_pack(sfgc_pl,
+			    MLXSW_REG_SFGC_TYPE_UNREGISTERED_MULTICAST_IPV4,
+			    MLXSW_REG_SFGC_BRIDGE_TYPE_1Q_FID,
+			    MLXSW_REG_SFGC_TABLE_TYPE_SINGLE,
+			    0);
+	err = mlxsw_reg_write(mlxsw_sx->core, MLXSW_REG(sfgc), sfgc_pl);
+	if (err)
+		return err;
+
+	mlxsw_reg_sgcr_pack(sgcr_pl, true);
+	return mlxsw_reg_write(mlxsw_sx->core, MLXSW_REG(sgcr), sgcr_pl);
+}
+
+static int mlxsw_sx_init(void *priv, struct mlxsw_core *mlxsw_core,
+			 const struct mlxsw_bus_info *mlxsw_bus_info)
+{
+	struct mlxsw_sx *mlxsw_sx = priv;
+	int err;
+
+	mlxsw_sx->core = mlxsw_core;
+	mlxsw_sx->bus_info = mlxsw_bus_info;
+
+	err = mlxsw_sx_hw_id_get(mlxsw_sx);
+	if (err) {
+		dev_err(mlxsw_sx->bus_info->dev, "Failed to get switch HW ID\n");
+		return err;
+	}
+
+	err = mlxsw_sx_ports_create(mlxsw_sx);
+	if (err) {
+		dev_err(mlxsw_sx->bus_info->dev, "Failed to create ports\n");
+		return err;
+	}
+
+	err = mlxsw_sx_event_register(mlxsw_sx, MLXSW_TRAP_ID_PUDE);
+	if (err) {
+		dev_err(mlxsw_sx->bus_info->dev, "Failed to register for PUDE events\n");
+		goto err_event_register;
+	}
+
+	err = mlxsw_sx_traps_init(mlxsw_sx);
+	if (err) {
+		dev_err(mlxsw_sx->bus_info->dev, "Failed to set traps for RX\n");
+		goto err_rx_listener_register;
+	}
+
+	err = mlxsw_sx_flood_init(mlxsw_sx);
+	if (err) {
+		dev_err(mlxsw_sx->bus_info->dev, "Failed to initialize flood tables\n");
+		goto err_flood_init;
+	}
+
+	return 0;
+
+err_flood_init:
+	mlxsw_sx_traps_fini(mlxsw_sx);
+err_rx_listener_register:
+	mlxsw_sx_event_unregister(mlxsw_sx, MLXSW_TRAP_ID_PUDE);
+err_event_register:
+	mlxsw_sx_ports_remove(mlxsw_sx);
+	return err;
+}
+
+static void mlxsw_sx_fini(void *priv)
+{
+	struct mlxsw_sx *mlxsw_sx = priv;
+
+	mlxsw_sx_traps_fini(mlxsw_sx);
+	mlxsw_sx_event_unregister(mlxsw_sx, MLXSW_TRAP_ID_PUDE);
+	mlxsw_sx_ports_remove(mlxsw_sx);
+}
+
+static struct mlxsw_config_profile mlxsw_sx_config_profile = {
+	.used_max_vepa_channels		= 1,
+	.max_vepa_channels		= 0,
+	.used_max_lag			= 1,
+	.max_lag			= 64,
+	.used_max_port_per_lag		= 1,
+	.max_port_per_lag		= 16,
+	.used_max_mid			= 1,
+	.max_mid			= 7000,
+	.used_max_pgt			= 1,
+	.max_pgt			= 0,
+	.used_max_system_port		= 1,
+	.max_system_port		= 48000,
+	.used_max_vlan_groups		= 1,
+	.max_vlan_groups		= 127,
+	.used_max_regions		= 1,
+	.max_regions			= 400,
+	.used_flood_tables		= 1,
+	.max_flood_tables		= 2,
+	.max_vid_flood_tables		= 1,
+	.used_flood_mode		= 1,
+	.flood_mode			= 3,
+	.used_max_ib_mc			= 1,
+	.max_ib_mc			= 0,
+	.used_max_pkey			= 1,
+	.max_pkey			= 0,
+	.swid_config			= {
+		{
+			.used_type	= 1,
+			.type		= MLXSW_PORT_SWID_TYPE_ETH,
+		}
+	},
+};
+
+static struct mlxsw_driver mlxsw_sx_driver = {
+	.kind			= MLXSW_DEVICE_KIND_SWITCHX2,
+	.owner			= THIS_MODULE,
+	.priv_size		= sizeof(struct mlxsw_sx),
+	.init			= mlxsw_sx_init,
+	.fini			= mlxsw_sx_fini,
+	.txhdr_construct	= mlxsw_sx_txhdr_construct,
+	.txhdr_len		= MLXSW_TXHDR_LEN,
+	.profile		= &mlxsw_sx_config_profile,
+};
+
+static int __init mlxsw_sx_module_init(void)
+{
+	return mlxsw_core_driver_register(&mlxsw_sx_driver);
+}
+
+static void __exit mlxsw_sx_module_exit(void)
+{
+	mlxsw_core_driver_unregister(&mlxsw_sx_driver);
+}
+
+module_init(mlxsw_sx_module_init);
+module_exit(mlxsw_sx_module_exit);
+
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_AUTHOR("Jiri Pirko <jiri@mellanox.com>");
+MODULE_DESCRIPTION("Mellanox SwitchX-2 driver");
+MODULE_MLXSW_DRIVER_ALIAS(MLXSW_DEVICE_KIND_SWITCHX2);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/txheader.h b/drivers/net/ethernet/mellanox/mlxsw/txheader.h
new file mode 100644
index 0000000..06fc46c
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/txheader.h
@@ -0,0 +1,80 @@
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/txheader.h
+ * Copyright (c) 2015 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2015 Ido Schimmel <idosch@mellanox.com>
+ * Copyright (c) 2015 Jiri Pirko <jiri@mellanox.com>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _MLXSW_TXHEADER_H
+#define _MLXSW_TXHEADER_H
+
+#define MLXSW_TXHDR_LEN 0x10
+#define MLXSW_TXHDR_VERSION_0 0
+
+enum {
+	MLXSW_TXHDR_ETH_CTL,
+	MLXSW_TXHDR_ETH_DATA,
+};
+
+#define MLXSW_TXHDR_PROTO_ETH 1
+
+enum {
+	MLXSW_TXHDR_ETCLASS_0,
+	MLXSW_TXHDR_ETCLASS_1,
+	MLXSW_TXHDR_ETCLASS_2,
+	MLXSW_TXHDR_ETCLASS_3,
+	MLXSW_TXHDR_ETCLASS_4,
+	MLXSW_TXHDR_ETCLASS_5,
+	MLXSW_TXHDR_ETCLASS_6,
+	MLXSW_TXHDR_ETCLASS_7,
+};
+
+enum {
+	MLXSW_TXHDR_RDQ_OTHER,
+	MLXSW_TXHDR_RDQ_EMAD = 0x1f,
+};
+
+#define MLXSW_TXHDR_CTCLASS3 0
+#define MLXSW_TXHDR_CPU_SIG 0
+#define MLXSW_TXHDR_SIG 0xE0E0
+#define MLXSW_TXHDR_STCLASS_NONE 0
+
+enum {
+	MLXSW_TXHDR_NOT_EMAD,
+	MLXSW_TXHDR_EMAD,
+};
+
+enum {
+	MLXSW_TXHDR_TYPE_DATA,
+	MLXSW_TXHDR_TYPE_CONTROL = 6,
+};
+
+#endif
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [patch net-next 4/4] mlxsw: Introduce Mellanox SwitchX-2 ASIC support
  2015-07-23 15:43 ` [patch net-next 4/4] mlxsw: Introduce Mellanox SwitchX-2 ASIC support Jiri Pirko
@ 2015-07-23 17:19   ` Alexander Duyck
  2015-07-23 19:42     ` Jiri Pirko
  2015-07-26  2:45   ` Scott Feldman
  1 sibling, 1 reply; 29+ messages in thread
From: Alexander Duyck @ 2015-07-23 17:19 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, idosch, eladr, ogerlitz, sfeldma, roopa, f.fainelli,
	tgraf, ast, jhs, daniel, john.fastabend, simon.horman, linville,
	andy, shm, nhorman, Jiri Pirko

On 07/23/2015 08:43 AM, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
>
> Benefit from the previously introduced Mellanox Switch infrastructure and
> add driver for SwitchX-2 ASIC. Note that this driver is very simple now.
> It implements bare minimum for getting device to work on slow-path.
> Fast-path offload functionality is going to be added soon.
>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Signed-off-by: Elad Raz <eladr@mellanox.com>
> ---
>   drivers/net/ethernet/mellanox/mlxsw/Kconfig    |   11 +
>   drivers/net/ethernet/mellanox/mlxsw/Makefile   |    2 +
>   drivers/net/ethernet/mellanox/mlxsw/core.h     |    2 +
>   drivers/net/ethernet/mellanox/mlxsw/pci.c      |    3 +
>   drivers/net/ethernet/mellanox/mlxsw/pci.h      |    1 +
>   drivers/net/ethernet/mellanox/mlxsw/port.h     |    4 +
>   drivers/net/ethernet/mellanox/mlxsw/switchx2.c | 1541 ++++++++++++++++++++++++
>   drivers/net/ethernet/mellanox/mlxsw/txheader.h |   80 ++
>   8 files changed, 1644 insertions(+)
>   create mode 100644 drivers/net/ethernet/mellanox/mlxsw/switchx2.c
>   create mode 100644 drivers/net/ethernet/mellanox/mlxsw/txheader.h
>

<snip>

> diff --git a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
> new file mode 100644
> index 0000000..15dc53b
> --- /dev/null
> +++ b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
> @@ -0,0 +1,1541 @@
> +/*
> + * drivers/net/ethernet/mellanox/mlxsw/switchx2.c
> + * Copyright (c) 2015 Mellanox Technologies. All rights reserved.
> + * Copyright (c) 2015 Jiri Pirko <jiri@mellanox.com>
> + * Copyright (c) 2015 Ido Schimmel <idosch@mellanox.com>
> + * Copyright (c) 2015 Elad Raz <eladr@mellanox.com>
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions are met:
> + *
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + * 3. Neither the names of the copyright holders nor the names of its
> + *    contributors may be used to endorse or promote products derived from
> + *    this software without specific prior written permission.
> + *
> + * Alternatively, this software may be distributed under the terms of the
> + * GNU General Public License ("GPL") version 2 as published by the Free
> + * Software Foundation.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
> + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
> + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
> + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
> + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
> + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
> + * POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/netdevice.h>
> +#include <linux/etherdevice.h>
> +#include <linux/slab.h>
> +#include <linux/device.h>
> +#include <linux/skbuff.h>
> +#include <linux/if_vlan.h>
> +#include <net/switchdev.h>
> +#include <generated/utsrelease.h>
> +
> +#include "core.h"
> +#include "reg.h"
> +#include "port.h"
> +#include "trap.h"
> +#include "txheader.h"
> +
> +static const char mlxsw_sx_driver_name[] = "mlxsw_switchx2";
> +static const char mlxsw_sx_driver_version[] = "1.0";
> +
> +struct mlxsw_sx_port;
> +
> +#define MLXSW_SW_HW_ID_LEN 6
> +
> +struct mlxsw_sx {
> +	struct mlxsw_sx_port **ports;
> +	struct mlxsw_core *core;
> +	const struct mlxsw_bus_info *bus_info;
> +	u8 hw_id[MLXSW_SW_HW_ID_LEN];
> +};
> +
> +struct mlxsw_sx_port_pcpu_stats {
> +	u64			rx_packets;
> +	u64			rx_bytes;
> +	u64			tx_packets;
> +	u64			tx_bytes;
> +	struct u64_stats_sync	syncp;
> +	u32			tx_dropped;
> +};
> +

It seems like this is the second time I have seen someone have to invent 
a new pcpu stats to add tx_dropped.  Maybe you should look at adding 
tx_dropped to pcpu_sw_netstats and just use that.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch net-next 4/4] mlxsw: Introduce Mellanox SwitchX-2 ASIC support
  2015-07-23 17:19   ` Alexander Duyck
@ 2015-07-23 19:42     ` Jiri Pirko
  0 siblings, 0 replies; 29+ messages in thread
From: Jiri Pirko @ 2015-07-23 19:42 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: netdev, davem, idosch, eladr, ogerlitz, sfeldma, roopa,
	f.fainelli, tgraf, ast, jhs, daniel, john.fastabend,
	simon.horman, linville, andy, shm, nhorman, Jiri Pirko

Thu, Jul 23, 2015 at 07:19:46PM CEST, alexander.duyck@gmail.com wrote:
>On 07/23/2015 08:43 AM, Jiri Pirko wrote:
>>From: Jiri Pirko <jiri@mellanox.com>
>>
>>Benefit from the previously introduced Mellanox Switch infrastructure and
>>add driver for SwitchX-2 ASIC. Note that this driver is very simple now.
>>It implements bare minimum for getting device to work on slow-path.
>>Fast-path offload functionality is going to be added soon.
>>
>>Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>Signed-off-by: Ido Schimmel <idosch@mellanox.com>
>>Signed-off-by: Elad Raz <eladr@mellanox.com>
>>---
>>  drivers/net/ethernet/mellanox/mlxsw/Kconfig    |   11 +
>>  drivers/net/ethernet/mellanox/mlxsw/Makefile   |    2 +
>>  drivers/net/ethernet/mellanox/mlxsw/core.h     |    2 +
>>  drivers/net/ethernet/mellanox/mlxsw/pci.c      |    3 +
>>  drivers/net/ethernet/mellanox/mlxsw/pci.h      |    1 +
>>  drivers/net/ethernet/mellanox/mlxsw/port.h     |    4 +
>>  drivers/net/ethernet/mellanox/mlxsw/switchx2.c | 1541 ++++++++++++++++++++++++
>>  drivers/net/ethernet/mellanox/mlxsw/txheader.h |   80 ++
>>  8 files changed, 1644 insertions(+)
>>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/switchx2.c
>>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/txheader.h
>>
>
><snip>
>
>>diff --git a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
>>new file mode 100644
>>index 0000000..15dc53b
>>--- /dev/null
>>+++ b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
>>@@ -0,0 +1,1541 @@
>>+/*
>>+ * drivers/net/ethernet/mellanox/mlxsw/switchx2.c
>>+ * Copyright (c) 2015 Mellanox Technologies. All rights reserved.
>>+ * Copyright (c) 2015 Jiri Pirko <jiri@mellanox.com>
>>+ * Copyright (c) 2015 Ido Schimmel <idosch@mellanox.com>
>>+ * Copyright (c) 2015 Elad Raz <eladr@mellanox.com>
>>+ *
>>+ * Redistribution and use in source and binary forms, with or without
>>+ * modification, are permitted provided that the following conditions are met:
>>+ *
>>+ * 1. Redistributions of source code must retain the above copyright
>>+ *    notice, this list of conditions and the following disclaimer.
>>+ * 2. Redistributions in binary form must reproduce the above copyright
>>+ *    notice, this list of conditions and the following disclaimer in the
>>+ *    documentation and/or other materials provided with the distribution.
>>+ * 3. Neither the names of the copyright holders nor the names of its
>>+ *    contributors may be used to endorse or promote products derived from
>>+ *    this software without specific prior written permission.
>>+ *
>>+ * Alternatively, this software may be distributed under the terms of the
>>+ * GNU General Public License ("GPL") version 2 as published by the Free
>>+ * Software Foundation.
>>+ *
>>+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
>>+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
>>+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
>>+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
>>+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
>>+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
>>+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
>>+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
>>+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
>>+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
>>+ * POSSIBILITY OF SUCH DAMAGE.
>>+ */
>>+
>>+#include <linux/kernel.h>
>>+#include <linux/module.h>
>>+#include <linux/types.h>
>>+#include <linux/netdevice.h>
>>+#include <linux/etherdevice.h>
>>+#include <linux/slab.h>
>>+#include <linux/device.h>
>>+#include <linux/skbuff.h>
>>+#include <linux/if_vlan.h>
>>+#include <net/switchdev.h>
>>+#include <generated/utsrelease.h>
>>+
>>+#include "core.h"
>>+#include "reg.h"
>>+#include "port.h"
>>+#include "trap.h"
>>+#include "txheader.h"
>>+
>>+static const char mlxsw_sx_driver_name[] = "mlxsw_switchx2";
>>+static const char mlxsw_sx_driver_version[] = "1.0";
>>+
>>+struct mlxsw_sx_port;
>>+
>>+#define MLXSW_SW_HW_ID_LEN 6
>>+
>>+struct mlxsw_sx {
>>+	struct mlxsw_sx_port **ports;
>>+	struct mlxsw_core *core;
>>+	const struct mlxsw_bus_info *bus_info;
>>+	u8 hw_id[MLXSW_SW_HW_ID_LEN];
>>+};
>>+
>>+struct mlxsw_sx_port_pcpu_stats {
>>+	u64			rx_packets;
>>+	u64			rx_bytes;
>>+	u64			tx_packets;
>>+	u64			tx_bytes;
>>+	struct u64_stats_sync	syncp;
>>+	u32			tx_dropped;
>>+};
>>+
>
>It seems like this is the second time I have seen someone have to invent a
>new pcpu stats to add tx_dropped.  Maybe you should look at adding tx_dropped
>to pcpu_sw_netstats and just use that.

I don't mind doing that. Multiple drivers use custom stat structures.
We can easily change that into some common structure if needed.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch net-next 3/4] mlxsw: Add interface to access registers and process events
  2015-07-23 15:43 ` [patch net-next 3/4] mlxsw: Add interface to access registers and process events Jiri Pirko
@ 2015-07-23 21:12   ` Andy Gospodarek
  2015-07-24  5:24     ` Jiri Pirko
  2015-07-24  5:13   ` Scott Feldman
  1 sibling, 1 reply; 29+ messages in thread
From: Andy Gospodarek @ 2015-07-23 21:12 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, eladr, ogerlitz, sfeldma, roopa,
	f.fainelli, tgraf, ast, jhs, daniel, john.fastabend,
	simon.horman, linville, andy, shm, nhorman, Jiri Pirko

On Thu, Jul 23, 2015 at 05:43:35PM +0200, Jiri Pirko wrote:
> From: Ido Schimmel <idosch@mellanox.com>
> 
> Add the ability to construct mailbox-style register access messages
> called EMADs with provisions to construct and parse the registers payload.
> Implement EMAD transaction layer which is responsible for the reliable
> transmission of EMADs.
> Also, add an infrastructure used by the switch driver to register for
> particular events generated by the device.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> Signed-off-by: Elad Raz <eladr@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlxsw/core.c |  736 ++++++++++++++++
>  drivers/net/ethernet/mellanox/mlxsw/core.h |   21 +
>  drivers/net/ethernet/mellanox/mlxsw/emad.h |  127 +++
>  drivers/net/ethernet/mellanox/mlxsw/port.h |   19 +
>  drivers/net/ethernet/mellanox/mlxsw/reg.h  | 1289 ++++++++++++++++++++++++++++
>  5 files changed, 2192 insertions(+)
>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/emad.h
>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/reg.h
> 
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
> index 211ec9b..bd0f692 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/core.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
[...]
> +	struct list_head event_listener_list;
> +	struct {
> +		struct sk_buff *resp_skb;
> +		u64 tid;
> +		wait_queue_head_t wait;
> +		bool trans_active;
> +		struct mutex lock; /* One EMAD transaction at a time. */
> +		bool use_emad;
> +	} emad;
>  	struct mlxsw_core_pcpu_stats __percpu *pcpu_stats;
>  	struct dentry *dbg_dir;
>  	struct {
[...]
>  	}
>  
>  	INIT_LIST_HEAD(&mlxsw_core->rx_listener_list);
> +	INIT_LIST_HEAD(&mlxsw_core->event_listener_list);
>  	mlxsw_core->driver = mlxsw_driver;
>  	mlxsw_core->bus = mlxsw_bus;
>  	mlxsw_core->bus_priv = bus_priv;
[...]
> +	/* No reason to save item if we did not manage to register an RX
> +	 * listener for it.
> +	 */
> +	list_add_rcu(&el_item->list, &mlxsw_core->event_listener_list);
> +

I see where 'event_listener_list' is defined and where entries are
added/removed, but where is the code that would receive these events and
presumably search this list so all handlers registered (currently just
PUDE) can handle events?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch net-next 0/4] Introduce Mellanox Technologies Switch ASICs switchdev drivers
  2015-07-23 15:43 [patch net-next 0/4] Introduce Mellanox Technologies Switch ASICs switchdev drivers Jiri Pirko
                   ` (3 preceding siblings ...)
  2015-07-23 15:43 ` [patch net-next 4/4] mlxsw: Introduce Mellanox SwitchX-2 ASIC support Jiri Pirko
@ 2015-07-24  0:03 ` Scott Feldman
  2015-07-24  0:38   ` Rustad, Mark D
  2015-07-24  5:38   ` Jiri Pirko
  2015-07-27  6:51 ` David Miller
  2015-07-27 20:21 ` Scott Feldman
  6 siblings, 2 replies; 29+ messages in thread
From: Scott Feldman @ 2015-07-24  0:03 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Netdev, David S. Miller, idosch, eladr, ogerlitz, Roopa Prabhu,
	Florian Fainelli, Thomas Graf, ast, Jamal Hadi Salim,
	Daniel Borkmann, john fastabend, simon.horman, John Linville,
	Andy Gospodarek, Shrijeet Mukherjee, nhorman

On Thu, Jul 23, 2015 at 8:43 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> This patchset introduces Mellanox Technologies Switch driver infrastructure
> and support for SwitchX-2 ASIC.
>
> The driver is divided into 3 logical parts:
> 1) Bus - implements switch bus interface. Currently only PCI bus is
>    implemented, but more buses will be added in the future. Namely I2C
>    and SGMII.
>    (patch #2)
> 2) Driver - implemements of ASIC-specific functions.
>    Currently SwitchX-2 ASIC is supported, but a plan exists to introduce
>    support for Spectrum ASIC in the near future.
>    (patch #4)
> 3) Core - infrastructure that glues buses and drivers together.
>    It implements register access logic (EMADs) and takes care of RX traps
>    and events.
>    (patch #1 and #3)
>
> Ido Schimmel (1):
>   mlxsw: Add interface to access registers and process events
>
> Jiri Pirko (3):
>   mlxsw: Introduce Mellanox switch driver core
>   mlxsw: Add PCI bus implementation
>   mlxsw: Introduce Mellanox SwitchX-2 ASIC support

This is awesome!  Reviewing...

checkpatch.pl shows 1 ERROR, bunch of >80 chars WARNs, and bunch of
CHECKs on space after cast.

On the CHECKs on space after cast, should we modify checkpatch.pl to
not flag those for drivers/net?

-scott

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch net-next 0/4] Introduce Mellanox Technologies Switch ASICs switchdev drivers
  2015-07-24  0:03 ` [patch net-next 0/4] Introduce Mellanox Technologies Switch ASICs switchdev drivers Scott Feldman
@ 2015-07-24  0:38   ` Rustad, Mark D
  2015-07-24  5:38   ` Jiri Pirko
  1 sibling, 0 replies; 29+ messages in thread
From: Rustad, Mark D @ 2015-07-24  0:38 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Jiri Pirko, Netdev, David S. Miller, idosch, eladr, ogerlitz,
	Roopa Prabhu, Florian Fainelli, Thomas Graf, ast,
	Jamal Hadi Salim, Daniel Borkmann, john fastabend, simon.horman,
	John Linville, Andy Gospodarek, Shrijeet Mukherjee, nhorman

[-- Attachment #1: Type: text/plain, Size: 406 bytes --]

> On Jul 23, 2015, at 5:03 PM, Scott Feldman <sfeldma@gmail.com> wrote:
> 
> On the CHECKs on space after cast, should we modify checkpatch.pl to
> not flag those for drivers/net?

Please don't. My internal parser really wants to see the cast right up against whatever it is casting. Has this practice been changing and I haven't noticed?

--
Mark Rustad, Networking Division, Intel Corporation


[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 841 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch net-next 2/4] mlxsw: Add PCI bus implementation
  2015-07-23 15:43 ` [patch net-next 2/4] mlxsw: Add PCI bus implementation Jiri Pirko
@ 2015-07-24  4:52   ` Scott Feldman
  2015-07-24  5:30     ` Jiri Pirko
  2015-07-26  5:15   ` Scott Feldman
  1 sibling, 1 reply; 29+ messages in thread
From: Scott Feldman @ 2015-07-24  4:52 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Netdev, David S. Miller, idosch, eladr, ogerlitz, Roopa Prabhu,
	Florian Fainelli, Thomas Graf, ast, Jamal Hadi Salim,
	Daniel Borkmann, john fastabend, simon.horman, John Linville,
	Andy Gospodarek, Shrijeet Mukherjee, nhorman, Jiri Pirko

On Thu, Jul 23, 2015 at 8:43 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> From: Jiri Pirko <jiri@mellanox.com>
>
> Add PCI bus implementation for Mellanox Technologies Switch ASICs. This
> includes firmware initialization, async queues manipulation and command
> interface implementation.
>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Signed-off-by: Elad Raz <eladr@mellanox.com>

[cut]

> +static int mlxsw_pci_skb_transmit(void *bus_priv, struct sk_buff *skb,
> +                                 const struct mlxsw_tx_info *tx_info)
> +{
> +       struct mlxsw_pci *mlxsw_pci = bus_priv;
> +       struct mlxsw_pci_queue *q;
> +       struct mlxsw_pci_queue_elem_info *elem_info;
> +       char *wqe;
> +       int i;
> +       int err;
> +
> +       if (skb_shinfo(skb)->nr_frags > MLXSW_PCI_WQE_SG_ENTRIES - 1)
> +               return -EINVAL;

Can you skb_linearize() here to try to continue?

> +       q = mlxsw_pci_sdq_pick(mlxsw_pci, tx_info);
> +       spin_lock_bh(&q->lock);
> +       elem_info = mlxsw_pci_queue_elem_info_producer_get(q);
> +       if (!elem_info) {
> +               /* queue is full */
> +               err = -EAGAIN;
> +               goto unlock;
> +       }
> +       elem_info->u.sdq.skb = skb;
> +
> +       wqe = elem_info->elem;
> +       mlxsw_pci_wqe_c_set(wqe, 1); /* always report completion */
> +       mlxsw_pci_wqe_lp_set(wqe, !!tx_info->is_emad);
> +       mlxsw_pci_wqe_type_set(wqe, MLXSW_PCI_WQE_TYPE_ETHERNET);
> +
> +       err = mlxsw_pci_wqe_frag_map(mlxsw_pci, wqe, 0, skb->data,
> +                                    skb_headlen(skb), DMA_TO_DEVICE);
> +       if (err)
> +               goto unlock;
> +
> +       for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> +               const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
> +
> +               err = mlxsw_pci_wqe_frag_map(mlxsw_pci, wqe, i + 1,
> +                                            skb_frag_address(frag),
> +                                            skb_frag_size(frag),
> +                                            DMA_TO_DEVICE);
> +               if (err)
> +                       goto unmap_frags;
> +       }
> +
> +       /* Set unused sq entries byte count to zero. */
> +       for (i++; i < MLXSW_PCI_WQE_SG_ENTRIES; i++)
> +               mlxsw_pci_wqe_byte_count_set(wqe, i, 0);

Is hw OK with not clearing the unused sq entries dma_address?  Setting
byte_count to zero must be sufficient?

> +
> +       /* Everything is set up, ring producer doorbell to get HW going */
> +       q->producer_counter++;
> +       mlxsw_pci_queue_doorbell_producer_ring(mlxsw_pci, q);
> +
> +       goto unlock;
> +
> +unmap_frags:
> +       for (; i >= 0; i--)
> +               mlxsw_pci_wqe_frag_unmap(mlxsw_pci, wqe, i, DMA_TO_DEVICE);
> +unlock:
> +       spin_unlock_bh(&q->lock);
> +       return err;
> +}
> +
> +static int mlxsw_pci_cmd_exec(void *bus_priv, u16 opcode, u8 opcode_mod,
> +                             u32 in_mod, bool out_mbox_direct,
> +                             char *in_mbox, size_t in_mbox_size,
> +                             char *out_mbox, size_t out_mbox_size,
> +                             u8 *p_status)
> +{
> +       struct mlxsw_pci *mlxsw_pci = bus_priv;
> +       dma_addr_t in_mapaddr = 0;
> +       dma_addr_t out_mapaddr = 0;
> +       bool evreq = mlxsw_pci->cmd.nopoll;
> +       unsigned long timeout = msecs_to_jiffies(MLXSW_PCI_CIR_TIMEOUT_MSECS);
> +       bool *p_wait_done = &mlxsw_pci->cmd.wait_done;

Why is this initialized and then later set to false?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch net-next 3/4] mlxsw: Add interface to access registers and process events
  2015-07-23 15:43 ` [patch net-next 3/4] mlxsw: Add interface to access registers and process events Jiri Pirko
  2015-07-23 21:12   ` Andy Gospodarek
@ 2015-07-24  5:13   ` Scott Feldman
  2015-07-24  5:18     ` Jiri Pirko
  2015-07-24  6:48     ` Elad Raz
  1 sibling, 2 replies; 29+ messages in thread
From: Scott Feldman @ 2015-07-24  5:13 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Netdev, David S. Miller, idosch, eladr, ogerlitz, Roopa Prabhu,
	Florian Fainelli, Thomas Graf, Alexei Starovoitov,
	Jamal Hadi Salim, Daniel Borkmann, john fastabend, simon.horman,
	John Linville, Andy Gospodarek, Shrijeet Mukherjee, nhorman,
	Jiri Pirko

On Thu, Jul 23, 2015 at 8:43 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> From: Ido Schimmel <idosch@mellanox.com>
>
> Add the ability to construct mailbox-style register access messages
> called EMADs with provisions to construct and parse the registers payload.
> Implement EMAD transaction layer which is responsible for the reliable
> transmission of EMADs.
> Also, add an infrastructure used by the switch driver to register for
> particular events generated by the device.

What is this EMADs used for?  Is this for intra-switch or inter-switch
communications?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch net-next 3/4] mlxsw: Add interface to access registers and process events
  2015-07-24  5:13   ` Scott Feldman
@ 2015-07-24  5:18     ` Jiri Pirko
  2015-07-24  6:48     ` Elad Raz
  1 sibling, 0 replies; 29+ messages in thread
From: Jiri Pirko @ 2015-07-24  5:18 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Netdev, David S. Miller, idosch, eladr, ogerlitz, Roopa Prabhu,
	Florian Fainelli, Thomas Graf, Alexei Starovoitov,
	Jamal Hadi Salim, Daniel Borkmann, john fastabend, simon.horman,
	John Linville, Andy Gospodarek, Shrijeet Mukherjee, nhorman,
	Jiri Pirko

Fri, Jul 24, 2015 at 07:13:42AM CEST, sfeldma@gmail.com wrote:
>On Thu, Jul 23, 2015 at 8:43 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> From: Ido Schimmel <idosch@mellanox.com>
>>
>> Add the ability to construct mailbox-style register access messages
>> called EMADs with provisions to construct and parse the registers payload.
>> Implement EMAD transaction layer which is responsible for the reliable
>> transmission of EMADs.
>> Also, add an infrastructure used by the switch driver to register for
>> particular events generated by the device.
>
>What is this EMADs used for?  Is this for intra-switch or inter-switch
>communications?

It is used for passing configuration messages to/from switch device.
It is also used for events. EMADS are passed inside skb like ordinary
packets. They do not leave the switch.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch net-next 3/4] mlxsw: Add interface to access registers and process events
  2015-07-23 21:12   ` Andy Gospodarek
@ 2015-07-24  5:24     ` Jiri Pirko
  2015-07-24 12:09       ` Andy Gospodarek
  0 siblings, 1 reply; 29+ messages in thread
From: Jiri Pirko @ 2015-07-24  5:24 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: netdev, davem, idosch, eladr, ogerlitz, sfeldma, roopa,
	f.fainelli, tgraf, ast, jhs, daniel, john.fastabend,
	simon.horman, linville, andy, shm, nhorman, Jiri Pirko

Thu, Jul 23, 2015 at 11:12:20PM CEST, gospo@cumulusnetworks.com wrote:
>On Thu, Jul 23, 2015 at 05:43:35PM +0200, Jiri Pirko wrote:
>> From: Ido Schimmel <idosch@mellanox.com>
>> 
>> Add the ability to construct mailbox-style register access messages
>> called EMADs with provisions to construct and parse the registers payload.
>> Implement EMAD transaction layer which is responsible for the reliable
>> transmission of EMADs.
>> Also, add an infrastructure used by the switch driver to register for
>> particular events generated by the device.
>> 
>> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> Signed-off-by: Elad Raz <eladr@mellanox.com>
>> ---
>>  drivers/net/ethernet/mellanox/mlxsw/core.c |  736 ++++++++++++++++
>>  drivers/net/ethernet/mellanox/mlxsw/core.h |   21 +
>>  drivers/net/ethernet/mellanox/mlxsw/emad.h |  127 +++
>>  drivers/net/ethernet/mellanox/mlxsw/port.h |   19 +
>>  drivers/net/ethernet/mellanox/mlxsw/reg.h  | 1289 ++++++++++++++++++++++++++++
>>  5 files changed, 2192 insertions(+)
>>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/emad.h
>>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/reg.h
>> 
>> diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
>> index 211ec9b..bd0f692 100644
>> --- a/drivers/net/ethernet/mellanox/mlxsw/core.c
>> +++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
>[...]
>> +	struct list_head event_listener_list;
>> +	struct {
>> +		struct sk_buff *resp_skb;
>> +		u64 tid;
>> +		wait_queue_head_t wait;
>> +		bool trans_active;
>> +		struct mutex lock; /* One EMAD transaction at a time. */
>> +		bool use_emad;
>> +	} emad;
>>  	struct mlxsw_core_pcpu_stats __percpu *pcpu_stats;
>>  	struct dentry *dbg_dir;
>>  	struct {
>[...]
>>  	}
>>  
>>  	INIT_LIST_HEAD(&mlxsw_core->rx_listener_list);
>> +	INIT_LIST_HEAD(&mlxsw_core->event_listener_list);
>>  	mlxsw_core->driver = mlxsw_driver;
>>  	mlxsw_core->bus = mlxsw_bus;
>>  	mlxsw_core->bus_priv = bus_priv;
>[...]
>> +	/* No reason to save item if we did not manage to register an RX
>> +	 * listener for it.
>> +	 */
>> +	list_add_rcu(&el_item->list, &mlxsw_core->event_listener_list);
>> +
>
>I see where 'event_listener_list' is defined and where entries are
>added/removed, but where is the code that would receive these events and
>presumably search this list so all handlers registered (currently just
>PUDE) can handle events?

That is handled by calling mlxsw_core_rx_listener_register.
that will add each event handler as a item to &mlxsw_core->rx_listener_list
These rx_listeners are called from mlxsw_core_skb_receive.
So event is here a special case of rx_listener. The event list is used
just to contain struct mlxsw_event_listener_item instances.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch net-next 2/4] mlxsw: Add PCI bus implementation
  2015-07-24  4:52   ` Scott Feldman
@ 2015-07-24  5:30     ` Jiri Pirko
  0 siblings, 0 replies; 29+ messages in thread
From: Jiri Pirko @ 2015-07-24  5:30 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Netdev, David S. Miller, idosch, eladr, ogerlitz, Roopa Prabhu,
	Florian Fainelli, Thomas Graf, ast, Jamal Hadi Salim,
	Daniel Borkmann, john fastabend, simon.horman, John Linville,
	Andy Gospodarek, Shrijeet Mukherjee, nhorman, Jiri Pirko

Fri, Jul 24, 2015 at 06:52:16AM CEST, sfeldma@gmail.com wrote:
>On Thu, Jul 23, 2015 at 8:43 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>>
>> Add PCI bus implementation for Mellanox Technologies Switch ASICs. This
>> includes firmware initialization, async queues manipulation and command
>> interface implementation.
>>
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
>> Signed-off-by: Elad Raz <eladr@mellanox.com>
>
>[cut]
>
>> +static int mlxsw_pci_skb_transmit(void *bus_priv, struct sk_buff *skb,
>> +                                 const struct mlxsw_tx_info *tx_info)
>> +{
>> +       struct mlxsw_pci *mlxsw_pci = bus_priv;
>> +       struct mlxsw_pci_queue *q;
>> +       struct mlxsw_pci_queue_elem_info *elem_info;
>> +       char *wqe;
>> +       int i;
>> +       int err;
>> +
>> +       if (skb_shinfo(skb)->nr_frags > MLXSW_PCI_WQE_SG_ENTRIES - 1)
>> +               return -EINVAL;
>
>Can you skb_linearize() here to try to continue?

Sure, I think we can do it. This would be good to have in rocker too.


>
>> +       q = mlxsw_pci_sdq_pick(mlxsw_pci, tx_info);
>> +       spin_lock_bh(&q->lock);
>> +       elem_info = mlxsw_pci_queue_elem_info_producer_get(q);
>> +       if (!elem_info) {
>> +               /* queue is full */
>> +               err = -EAGAIN;
>> +               goto unlock;
>> +       }
>> +       elem_info->u.sdq.skb = skb;
>> +
>> +       wqe = elem_info->elem;
>> +       mlxsw_pci_wqe_c_set(wqe, 1); /* always report completion */
>> +       mlxsw_pci_wqe_lp_set(wqe, !!tx_info->is_emad);
>> +       mlxsw_pci_wqe_type_set(wqe, MLXSW_PCI_WQE_TYPE_ETHERNET);
>> +
>> +       err = mlxsw_pci_wqe_frag_map(mlxsw_pci, wqe, 0, skb->data,
>> +                                    skb_headlen(skb), DMA_TO_DEVICE);
>> +       if (err)
>> +               goto unlock;
>> +
>> +       for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
>> +               const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
>> +
>> +               err = mlxsw_pci_wqe_frag_map(mlxsw_pci, wqe, i + 1,
>> +                                            skb_frag_address(frag),
>> +                                            skb_frag_size(frag),
>> +                                            DMA_TO_DEVICE);
>> +               if (err)
>> +                       goto unmap_frags;
>> +       }
>> +
>> +       /* Set unused sq entries byte count to zero. */
>> +       for (i++; i < MLXSW_PCI_WQE_SG_ENTRIES; i++)
>> +               mlxsw_pci_wqe_byte_count_set(wqe, i, 0);
>
>Is hw OK with not clearing the unused sq entries dma_address?  Setting
>byte_count to zero must be sufficient?

Yes, byte_count == 0 is sufficient. No need to set address.


>
>> +
>> +       /* Everything is set up, ring producer doorbell to get HW going */
>> +       q->producer_counter++;
>> +       mlxsw_pci_queue_doorbell_producer_ring(mlxsw_pci, q);
>> +
>> +       goto unlock;
>> +
>> +unmap_frags:
>> +       for (; i >= 0; i--)
>> +               mlxsw_pci_wqe_frag_unmap(mlxsw_pci, wqe, i, DMA_TO_DEVICE);
>> +unlock:
>> +       spin_unlock_bh(&q->lock);
>> +       return err;
>> +}
>> +
>> +static int mlxsw_pci_cmd_exec(void *bus_priv, u16 opcode, u8 opcode_mod,
>> +                             u32 in_mod, bool out_mbox_direct,
>> +                             char *in_mbox, size_t in_mbox_size,
>> +                             char *out_mbox, size_t out_mbox_size,
>> +                             u8 *p_status)
>> +{
>> +       struct mlxsw_pci *mlxsw_pci = bus_priv;
>> +       dma_addr_t in_mapaddr = 0;
>> +       dma_addr_t out_mapaddr = 0;
>> +       bool evreq = mlxsw_pci->cmd.nopoll;
>> +       unsigned long timeout = msecs_to_jiffies(MLXSW_PCI_CIR_TIMEOUT_MSECS);
>> +       bool *p_wait_done = &mlxsw_pci->cmd.wait_done;
>
>Why is this initialized and then later set to false?

*p_wait_done
so we actually set not p_wait_done but mlxsw_pci->cmd.wait_done there.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch net-next 0/4] Introduce Mellanox Technologies Switch ASICs switchdev drivers
  2015-07-24  0:03 ` [patch net-next 0/4] Introduce Mellanox Technologies Switch ASICs switchdev drivers Scott Feldman
  2015-07-24  0:38   ` Rustad, Mark D
@ 2015-07-24  5:38   ` Jiri Pirko
  1 sibling, 0 replies; 29+ messages in thread
From: Jiri Pirko @ 2015-07-24  5:38 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Netdev, David S. Miller, idosch, eladr, ogerlitz, Roopa Prabhu,
	Florian Fainelli, Thomas Graf, ast, Jamal Hadi Salim,
	Daniel Borkmann, john fastabend, simon.horman, John Linville,
	Andy Gospodarek, Shrijeet Mukherjee, nhorman

Fri, Jul 24, 2015 at 02:03:20AM CEST, sfeldma@gmail.com wrote:
>On Thu, Jul 23, 2015 at 8:43 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> This patchset introduces Mellanox Technologies Switch driver infrastructure
>> and support for SwitchX-2 ASIC.
>>
>> The driver is divided into 3 logical parts:
>> 1) Bus - implements switch bus interface. Currently only PCI bus is
>>    implemented, but more buses will be added in the future. Namely I2C
>>    and SGMII.
>>    (patch #2)
>> 2) Driver - implemements of ASIC-specific functions.
>>    Currently SwitchX-2 ASIC is supported, but a plan exists to introduce
>>    support for Spectrum ASIC in the near future.
>>    (patch #4)
>> 3) Core - infrastructure that glues buses and drivers together.
>>    It implements register access logic (EMADs) and takes care of RX traps
>>    and events.
>>    (patch #1 and #3)
>>
>> Ido Schimmel (1):
>>   mlxsw: Add interface to access registers and process events
>>
>> Jiri Pirko (3):
>>   mlxsw: Introduce Mellanox switch driver core
>>   mlxsw: Add PCI bus implementation
>>   mlxsw: Introduce Mellanox SwitchX-2 ASIC support
>
>This is awesome!  Reviewing...

Thanks!


>
>checkpatch.pl shows 1 ERROR, bunch of >80 chars WARNs, and bunch of
>CHECKs on space after cast.

Those >80char WARNs are mostly macro "\" on col 81. That is afaik ok.
Maybe that should be changed in checkpatch as well.

The error is false possitive. It is again in macro, checkpatch thinks it
is a function and yells about "{" on the same line. However it is a
struct initialization so in that case it is ok.


>
>On the CHECKs on space after cast, should we modify checkpatch.pl to
>not flag those for drivers/net?

I agree.


>
>-scott

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch net-next 3/4] mlxsw: Add interface to access registers and process events
  2015-07-24  5:13   ` Scott Feldman
  2015-07-24  5:18     ` Jiri Pirko
@ 2015-07-24  6:48     ` Elad Raz
  1 sibling, 0 replies; 29+ messages in thread
From: Elad Raz @ 2015-07-24  6:48 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Jiri Pirko, Netdev, David S. Miller, Ido Schimmel, Or Gerlitz,
	Roopa Prabhu, Florian Fainelli, Thomas Graf, Alexei Starovoitov,
	Jamal Hadi Salim, Daniel Borkmann, john fastabend, simon.horman,
	John Linville, Andy Gospodarek, Shrijeet Mukherjee, nhorman,
	Jiri Pirko, Matty Kadosh



Sent from my iPhone

> On Jul 24, 2015, at 08:14, Scott Feldman <sfeldma@gmail.com> wrote:
> 
>> On Thu, Jul 23, 2015 at 8:43 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> From: Ido Schimmel <idosch@mellanox.com>
>> 
>> Add the ability to construct mailbox-style register access messages
>> called EMADs with provisions to construct and parse the registers payload.
>> Implement EMAD transaction layer which is responsible for the reliable
>> transmission of EMADs.
>> Also, add an infrastructure used by the switch driver to register for
>> particular events generated by the device.
> 
> What is this EMADs used for?  Is this for intra-switch or inter-switch
> communications?

Ethernet management datagram.
It's command encoding wrap as a packet. It used for host interface communication as well as multi-silicon support.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch net-next 3/4] mlxsw: Add interface to access registers and process events
  2015-07-24  5:24     ` Jiri Pirko
@ 2015-07-24 12:09       ` Andy Gospodarek
  0 siblings, 0 replies; 29+ messages in thread
From: Andy Gospodarek @ 2015-07-24 12:09 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, eladr, ogerlitz, sfeldma, roopa,
	f.fainelli, tgraf, ast, jhs, daniel, john.fastabend,
	simon.horman, linville, andy, shm, nhorman, Jiri Pirko

On Fri, Jul 24, 2015 at 07:24:53AM +0200, Jiri Pirko wrote:
> Thu, Jul 23, 2015 at 11:12:20PM CEST, gospo@cumulusnetworks.com wrote:
> >On Thu, Jul 23, 2015 at 05:43:35PM +0200, Jiri Pirko wrote:
> >> From: Ido Schimmel <idosch@mellanox.com>
> >> 
> >> Add the ability to construct mailbox-style register access messages
> >> called EMADs with provisions to construct and parse the registers payload.
> >> Implement EMAD transaction layer which is responsible for the reliable
> >> transmission of EMADs.
> >> Also, add an infrastructure used by the switch driver to register for
> >> particular events generated by the device.
> >> 
> >> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> >> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> >> Signed-off-by: Elad Raz <eladr@mellanox.com>
> >> ---
> >>  drivers/net/ethernet/mellanox/mlxsw/core.c |  736 ++++++++++++++++
> >>  drivers/net/ethernet/mellanox/mlxsw/core.h |   21 +
> >>  drivers/net/ethernet/mellanox/mlxsw/emad.h |  127 +++
> >>  drivers/net/ethernet/mellanox/mlxsw/port.h |   19 +
> >>  drivers/net/ethernet/mellanox/mlxsw/reg.h  | 1289 ++++++++++++++++++++++++++++
> >>  5 files changed, 2192 insertions(+)
> >>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/emad.h
> >>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/reg.h
> >> 
> >> diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
> >> index 211ec9b..bd0f692 100644
> >> --- a/drivers/net/ethernet/mellanox/mlxsw/core.c
> >> +++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
> >[...]
> >> +	struct list_head event_listener_list;
> >> +	struct {
> >> +		struct sk_buff *resp_skb;
> >> +		u64 tid;
> >> +		wait_queue_head_t wait;
> >> +		bool trans_active;
> >> +		struct mutex lock; /* One EMAD transaction at a time. */
> >> +		bool use_emad;
> >> +	} emad;
> >>  	struct mlxsw_core_pcpu_stats __percpu *pcpu_stats;
> >>  	struct dentry *dbg_dir;
> >>  	struct {
> >[...]
> >>  	}
> >>  
> >>  	INIT_LIST_HEAD(&mlxsw_core->rx_listener_list);
> >> +	INIT_LIST_HEAD(&mlxsw_core->event_listener_list);
> >>  	mlxsw_core->driver = mlxsw_driver;
> >>  	mlxsw_core->bus = mlxsw_bus;
> >>  	mlxsw_core->bus_priv = bus_priv;
> >[...]
> >> +	/* No reason to save item if we did not manage to register an RX
> >> +	 * listener for it.
> >> +	 */
> >> +	list_add_rcu(&el_item->list, &mlxsw_core->event_listener_list);
> >> +
> >
> >I see where 'event_listener_list' is defined and where entries are
> >added/removed, but where is the code that would receive these events and
> >presumably search this list so all handlers registered (currently just
> >PUDE) can handle events?
> 
> That is handled by calling mlxsw_core_rx_listener_register.
> that will add each event handler as a item to &mlxsw_core->rx_listener_list
> These rx_listeners are called from mlxsw_core_skb_receive.
> So event is here a special case of rx_listener. The event list is used
> just to contain struct mlxsw_event_listener_item instances.
> 

Thanks for the explanation!  I missed how that was called the first time
I went through this.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch net-next 4/4] mlxsw: Introduce Mellanox SwitchX-2 ASIC support
  2015-07-23 15:43 ` [patch net-next 4/4] mlxsw: Introduce Mellanox SwitchX-2 ASIC support Jiri Pirko
  2015-07-23 17:19   ` Alexander Duyck
@ 2015-07-26  2:45   ` Scott Feldman
  2015-07-26  7:10     ` Jiri Pirko
  1 sibling, 1 reply; 29+ messages in thread
From: Scott Feldman @ 2015-07-26  2:45 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Netdev, David S. Miller, idosch, eladr, ogerlitz, Roopa Prabhu,
	Florian Fainelli, Thomas Graf, Alexei Starovoitov,
	Jamal Hadi Salim, Daniel Borkmann, john fastabend, simon.horman,
	John Linville, Andy Gospodarek, Shrijeet Mukherjee, nhorman,
	Jiri Pirko

On Thu, Jul 23, 2015 at 8:43 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> From: Jiri Pirko <jiri@mellanox.com>
>
> Benefit from the previously introduced Mellanox Switch infrastructure and
> add driver for SwitchX-2 ASIC. Note that this driver is very simple now.
> It implements bare minimum for getting device to work on slow-path.
> Fast-path offload functionality is going to be added soon.
>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Signed-off-by: Elad Raz <eladr@mellanox.com>

[cut]

> +static netdev_tx_t mlxsw_sx_port_xmit(struct sk_buff *skb,
> +                                     struct net_device *dev)
> +{
> +       struct mlxsw_sx_port *mlxsw_sx_port = netdev_priv(dev);
> +       struct mlxsw_sx *mlxsw_sx = mlxsw_sx_port->mlxsw_sx;
> +       struct mlxsw_sx_port_pcpu_stats *pcpu_stats;
> +       const struct mlxsw_tx_info tx_info = {
> +               .local_port = mlxsw_sx_port->local_port,
> +               .is_emad = false,
> +       };
> +       int err;
> +
> +       if (unlikely(skb_headroom(skb) < MLXSW_TXHDR_LEN)) {

Does this happen at all since dev->hard_header_len was set in probe to
add MLXSW_TXHDR_LEN?

> +               struct sk_buff *skb_new;
> +
> +               skb_new = skb_realloc_headroom(skb, MLXSW_TXHDR_LEN);
> +               dev_kfree_skb_any(skb);
> +               if (!skb_new) {
> +                       this_cpu_inc(mlxsw_sx_port->pcpu_stats->tx_dropped);
> +                       return NETDEV_TX_OK;
> +               }
> +               skb = skb_new;
> +       }
> +       mlxsw_sx_txhdr_construct(skb, &tx_info);
> +       err = mlxsw_core_skb_transmit(mlxsw_sx, skb, &tx_info);
> +       if (err == -EAGAIN)
> +               return NETDEV_TX_BUSY;

I think there is a problem here when returning NETDEV_TX_BUSY when
original skb might have been freed above in the headroom check. (ref
Documentation/networking/driver.txt).

[cut]

> +static int mlxsw_sx_port_dev_addr_get(struct mlxsw_sx_port *mlxsw_sx_port)
> +{
> +       struct mlxsw_sx *mlxsw_sx = mlxsw_sx_port->mlxsw_sx;
> +       struct net_device *dev = mlxsw_sx_port->dev;
> +       char ppad_pl[MLXSW_REG_PPAD_LEN];
> +       int err;
> +
> +       mlxsw_reg_ppad_pack(ppad_pl, false, 0);
> +       err = mlxsw_reg_query(mlxsw_sx->core, MLXSW_REG(ppad), ppad_pl);
> +       if (err)
> +               return err;
> +       mlxsw_reg_ppad_mac_memcpy_from(ppad_pl, dev->dev_addr);
> +       /* The last byte in base mac address is always 0 */
> +       dev->dev_addr[ETH_ALEN - 1] += mlxsw_sx_port->local_port;

If MLXSW_PORT_MAX_PORTS > 256, you'll wrap this.  Is dev_addr[ETH_ALEN
- 2] available to carry into?

> +       return 0;
> +}
> +

[cut]

> +static int mlxsw_sx_port_create(struct mlxsw_sx *mlxsw_sx, u8 local_port)
> +{
> +       struct mlxsw_sx_port *mlxsw_sx_port;
> +       struct net_device *dev;
> +       bool usable;
> +       int err;
> +
> +       dev = alloc_etherdev(sizeof(struct mlxsw_sx_port));
> +       if (!dev)
> +               return -ENOMEM;
> +       mlxsw_sx_port = netdev_priv(dev);
> +       mlxsw_sx_port->dev = dev;
> +       mlxsw_sx_port->mlxsw_sx = mlxsw_sx;
> +       mlxsw_sx_port->local_port = local_port;
> +
> +       mlxsw_sx_port->pcpu_stats =
> +               netdev_alloc_pcpu_stats(struct mlxsw_sx_port_pcpu_stats);
> +       if (!mlxsw_sx_port->pcpu_stats) {
> +               err = -ENOMEM;
> +               goto err_alloc_stats;
> +       }
> +
> +       dev->netdev_ops = &mlxsw_sx_port_netdev_ops;
> +       dev->ethtool_ops = &mlxsw_sx_port_ethtool_ops;
> +       dev->switchdev_ops = &mlxsw_sx_port_switchdev_ops;
> +
> +       err = mlxsw_sx_port_dev_addr_get(mlxsw_sx_port);
> +       if (err) {
> +               dev_err(mlxsw_sx->bus_info->dev, "Port %d: Unable to get port mac address\n",
> +                       mlxsw_sx_port->local_port);
> +               goto err_dev_addr_get;
> +       }
> +
> +       netif_carrier_off(dev);
> +
> +       dev->features |= NETIF_F_NETNS_LOCAL | NETIF_F_LLTX |

Not supposed to use LLTX in new drivers, according to
include/linux/netdev_features.h.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch net-next 2/4] mlxsw: Add PCI bus implementation
  2015-07-23 15:43 ` [patch net-next 2/4] mlxsw: Add PCI bus implementation Jiri Pirko
  2015-07-24  4:52   ` Scott Feldman
@ 2015-07-26  5:15   ` Scott Feldman
  2015-07-26  6:49     ` Jiri Pirko
  1 sibling, 1 reply; 29+ messages in thread
From: Scott Feldman @ 2015-07-26  5:15 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Netdev, David S. Miller, idosch, eladr, ogerlitz, Roopa Prabhu,
	Florian Fainelli, Thomas Graf, Alexei Starovoitov,
	Jamal Hadi Salim, Daniel Borkmann, john fastabend, simon.horman,
	John Linville, Andy Gospodarek, Shrijeet Mukherjee, nhorman,
	Jiri Pirko

On Thu, Jul 23, 2015 at 8:43 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> From: Jiri Pirko <jiri@mellanox.com>
>
> Add PCI bus implementation for Mellanox Technologies Switch ASICs. This
> includes firmware initialization, async queues manipulation and command
> interface implementation.
>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Signed-off-by: Elad Raz <eladr@mellanox.com>
> ---

[cut]

> +static int mlxsw_pci_fw_area_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
> +                                 u16 num_pages)
> +{
> +       struct mlxsw_pci_mem_item *mem_item;
> +       int i;
> +       int err;

Need to init err to zero here

> +
> +       mlxsw_pci->fw_area.items = kcalloc(num_pages, sizeof(*mem_item),
> +                                          GFP_KERNEL);
> +       if (!mlxsw_pci->fw_area.items)
> +               return -ENOMEM;
> +       mlxsw_pci->fw_area.num_pages = num_pages;
> +
> +       mlxsw_cmd_mbox_zero(mbox);
> +       for (i = 0; i < num_pages; i++) {
> +               mem_item = &mlxsw_pci->fw_area.items[i];
> +
> +               mem_item->size = MLXSW_PCI_PAGE_SIZE;
> +               mem_item->buf = pci_alloc_consistent(mlxsw_pci->pdev,
> +                                                    mem_item->size,
> +                                                    &mem_item->mapaddr);
> +               if (!mem_item->buf)
> +                       goto err_alloc;
> +               mlxsw_cmd_mbox_map_fa_pa_set(mbox, i, mem_item->mapaddr);
> +               mlxsw_cmd_mbox_map_fa_log2size_set(mbox, i, 0); /* 1 page */
> +       }
> +
> +       err = mlxsw_cmd_map_fa(mlxsw_pci->core, mbox, num_pages);
> +       if (err)
> +               goto err_cmd_map_fa;
> +
> +       return 0;
> +
> +err_cmd_map_fa:
> +err_alloc:
> +       for (i--; i >= 0; i--) {
> +               mem_item = &mlxsw_pci->fw_area.items[i];
> +
> +               pci_free_consistent(mlxsw_pci->pdev, mem_item->size,
> +                                   mem_item->buf, mem_item->mapaddr);
> +       }
> +       kfree(mlxsw_pci->fw_area.items);
> +       return err;
> +}

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch net-next 2/4] mlxsw: Add PCI bus implementation
  2015-07-26  5:15   ` Scott Feldman
@ 2015-07-26  6:49     ` Jiri Pirko
  0 siblings, 0 replies; 29+ messages in thread
From: Jiri Pirko @ 2015-07-26  6:49 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Jiri Pirko, Netdev, David S. Miller, idosch, eladr, ogerlitz,
	Roopa Prabhu, Florian Fainelli, Thomas Graf, Alexei Starovoitov,
	Jamal Hadi Salim, Daniel Borkmann, john fastabend, simon.horman,
	John Linville, Andy Gospodarek, Shrijeet Mukherjee, nhorman

Sun, Jul 26, 2015 at 07:15:04AM CEST, sfeldma@gmail.com wrote:
>On Thu, Jul 23, 2015 at 8:43 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>>
>> Add PCI bus implementation for Mellanox Technologies Switch ASICs. This
>> includes firmware initialization, async queues manipulation and command
>> interface implementation.
>>
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
>> Signed-off-by: Elad Raz <eladr@mellanox.com>
>> ---
>
>[cut]
>
>> +static int mlxsw_pci_fw_area_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
>> +                                 u16 num_pages)
>> +{
>> +       struct mlxsw_pci_mem_item *mem_item;
>> +       int i;
>> +       int err;
>
>Need to init err to zero here

You are right. We need to set err before "goto err_alloc". Thanks.


>
>> +
>> +       mlxsw_pci->fw_area.items = kcalloc(num_pages, sizeof(*mem_item),
>> +                                          GFP_KERNEL);
>> +       if (!mlxsw_pci->fw_area.items)
>> +               return -ENOMEM;
>> +       mlxsw_pci->fw_area.num_pages = num_pages;
>> +
>> +       mlxsw_cmd_mbox_zero(mbox);
>> +       for (i = 0; i < num_pages; i++) {
>> +               mem_item = &mlxsw_pci->fw_area.items[i];
>> +
>> +               mem_item->size = MLXSW_PCI_PAGE_SIZE;
>> +               mem_item->buf = pci_alloc_consistent(mlxsw_pci->pdev,
>> +                                                    mem_item->size,
>> +                                                    &mem_item->mapaddr);
>> +               if (!mem_item->buf)
>> +                       goto err_alloc;
>> +               mlxsw_cmd_mbox_map_fa_pa_set(mbox, i, mem_item->mapaddr);
>> +               mlxsw_cmd_mbox_map_fa_log2size_set(mbox, i, 0); /* 1 page */
>> +       }
>> +
>> +       err = mlxsw_cmd_map_fa(mlxsw_pci->core, mbox, num_pages);
>> +       if (err)
>> +               goto err_cmd_map_fa;
>> +
>> +       return 0;
>> +
>> +err_cmd_map_fa:
>> +err_alloc:
>> +       for (i--; i >= 0; i--) {
>> +               mem_item = &mlxsw_pci->fw_area.items[i];
>> +
>> +               pci_free_consistent(mlxsw_pci->pdev, mem_item->size,
>> +                                   mem_item->buf, mem_item->mapaddr);
>> +       }
>> +       kfree(mlxsw_pci->fw_area.items);
>> +       return err;
>> +}

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch net-next 4/4] mlxsw: Introduce Mellanox SwitchX-2 ASIC support
  2015-07-26  2:45   ` Scott Feldman
@ 2015-07-26  7:10     ` Jiri Pirko
  2015-07-26 17:14       ` Jiri Pirko
  0 siblings, 1 reply; 29+ messages in thread
From: Jiri Pirko @ 2015-07-26  7:10 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Netdev, David S. Miller, idosch, eladr, ogerlitz, Roopa Prabhu,
	Florian Fainelli, Thomas Graf, Alexei Starovoitov,
	Jamal Hadi Salim, Daniel Borkmann, john fastabend, simon.horman,
	John Linville, Andy Gospodarek, Shrijeet Mukherjee, nhorman,
	Jiri Pirko

Sun, Jul 26, 2015 at 04:45:07AM CEST, sfeldma@gmail.com wrote:
>On Thu, Jul 23, 2015 at 8:43 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>>
>> Benefit from the previously introduced Mellanox Switch infrastructure and
>> add driver for SwitchX-2 ASIC. Note that this driver is very simple now.
>> It implements bare minimum for getting device to work on slow-path.
>> Fast-path offload functionality is going to be added soon.
>>
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
>> Signed-off-by: Elad Raz <eladr@mellanox.com>
>
>[cut]
>
>> +static netdev_tx_t mlxsw_sx_port_xmit(struct sk_buff *skb,
>> +                                     struct net_device *dev)
>> +{
>> +       struct mlxsw_sx_port *mlxsw_sx_port = netdev_priv(dev);
>> +       struct mlxsw_sx *mlxsw_sx = mlxsw_sx_port->mlxsw_sx;
>> +       struct mlxsw_sx_port_pcpu_stats *pcpu_stats;
>> +       const struct mlxsw_tx_info tx_info = {
>> +               .local_port = mlxsw_sx_port->local_port,
>> +               .is_emad = false,
>> +       };
>> +       int err;
>> +
>> +       if (unlikely(skb_headroom(skb) < MLXSW_TXHDR_LEN)) {
>
>Does this happen at all since dev->hard_header_len was set in probe to
>add MLXSW_TXHDR_LEN?

This needs to be done for example for bridge forwarding case and other
forwarding cases.


>
>> +               struct sk_buff *skb_new;
>> +
>> +               skb_new = skb_realloc_headroom(skb, MLXSW_TXHDR_LEN);
>> +               dev_kfree_skb_any(skb);
>> +               if (!skb_new) {
>> +                       this_cpu_inc(mlxsw_sx_port->pcpu_stats->tx_dropped);
>> +                       return NETDEV_TX_OK;
>> +               }
>> +               skb = skb_new;
>> +       }
>> +       mlxsw_sx_txhdr_construct(skb, &tx_info);
>> +       err = mlxsw_core_skb_transmit(mlxsw_sx, skb, &tx_info);
>> +       if (err == -EAGAIN)
>> +               return NETDEV_TX_BUSY;
>
>I think there is a problem here when returning NETDEV_TX_BUSY when
>original skb might have been freed above in the headroom check. (ref
>Documentation/networking/driver.txt).

I have to check this out a bit more. Thanks for pointing that out.


>
>[cut]
>
>> +static int mlxsw_sx_port_dev_addr_get(struct mlxsw_sx_port *mlxsw_sx_port)
>> +{
>> +       struct mlxsw_sx *mlxsw_sx = mlxsw_sx_port->mlxsw_sx;
>> +       struct net_device *dev = mlxsw_sx_port->dev;
>> +       char ppad_pl[MLXSW_REG_PPAD_LEN];
>> +       int err;
>> +
>> +       mlxsw_reg_ppad_pack(ppad_pl, false, 0);
>> +       err = mlxsw_reg_query(mlxsw_sx->core, MLXSW_REG(ppad), ppad_pl);
>> +       if (err)
>> +               return err;
>> +       mlxsw_reg_ppad_mac_memcpy_from(ppad_pl, dev->dev_addr);
>> +       /* The last byte in base mac address is always 0 */
>> +       dev->dev_addr[ETH_ALEN - 1] += mlxsw_sx_port->local_port;
>
>If MLXSW_PORT_MAX_PORTS > 256, you'll wrap this.  Is dev_addr[ETH_ALEN
>- 2] available to carry into?

That will never happen. MLXSW_PORT_MAX_PORTS is 0x40 and address got
from ppad register always ends with 0


>
>> +       return 0;
>> +}
>> +
>
>[cut]
>
>> +static int mlxsw_sx_port_create(struct mlxsw_sx *mlxsw_sx, u8 local_port)
>> +{
>> +       struct mlxsw_sx_port *mlxsw_sx_port;
>> +       struct net_device *dev;
>> +       bool usable;
>> +       int err;
>> +
>> +       dev = alloc_etherdev(sizeof(struct mlxsw_sx_port));
>> +       if (!dev)
>> +               return -ENOMEM;
>> +       mlxsw_sx_port = netdev_priv(dev);
>> +       mlxsw_sx_port->dev = dev;
>> +       mlxsw_sx_port->mlxsw_sx = mlxsw_sx;
>> +       mlxsw_sx_port->local_port = local_port;
>> +
>> +       mlxsw_sx_port->pcpu_stats =
>> +               netdev_alloc_pcpu_stats(struct mlxsw_sx_port_pcpu_stats);
>> +       if (!mlxsw_sx_port->pcpu_stats) {
>> +               err = -ENOMEM;
>> +               goto err_alloc_stats;
>> +       }
>> +
>> +       dev->netdev_ops = &mlxsw_sx_port_netdev_ops;
>> +       dev->ethtool_ops = &mlxsw_sx_port_ethtool_ops;
>> +       dev->switchdev_ops = &mlxsw_sx_port_switchdev_ops;
>> +
>> +       err = mlxsw_sx_port_dev_addr_get(mlxsw_sx_port);
>> +       if (err) {
>> +               dev_err(mlxsw_sx->bus_info->dev, "Port %d: Unable to get port mac address\n",
>> +                       mlxsw_sx_port->local_port);
>> +               goto err_dev_addr_get;
>> +       }
>> +
>> +       netif_carrier_off(dev);
>> +
>> +       dev->features |= NETIF_F_NETNS_LOCAL | NETIF_F_LLTX |
>
>Not supposed to use LLTX in new drivers, according to
>include/linux/netdev_features.h.

In our case, wee need to use this. Since multiple port netdevs may use
the same send dataqueue, we need to do locking ourselves.

Thanks for review!

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch net-next 4/4] mlxsw: Introduce Mellanox SwitchX-2 ASIC support
  2015-07-26  7:10     ` Jiri Pirko
@ 2015-07-26 17:14       ` Jiri Pirko
  2015-07-27  6:17         ` Rosen, Rami
  0 siblings, 1 reply; 29+ messages in thread
From: Jiri Pirko @ 2015-07-26 17:14 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Netdev, David S. Miller, idosch, eladr, ogerlitz, Roopa Prabhu,
	Florian Fainelli, Thomas Graf, Alexei Starovoitov,
	Jamal Hadi Salim, Daniel Borkmann, john fastabend, simon.horman,
	John Linville, Andy Gospodarek, Shrijeet Mukherjee, nhorman,
	Jiri Pirko

<snip>

>
>
>>
>>> +               struct sk_buff *skb_new;
>>> +
>>> +               skb_new = skb_realloc_headroom(skb, MLXSW_TXHDR_LEN);
>>> +               dev_kfree_skb_any(skb);
>>> +               if (!skb_new) {
>>> +                       this_cpu_inc(mlxsw_sx_port->pcpu_stats->tx_dropped);
>>> +                       return NETDEV_TX_OK;
>>> +               }
>>> +               skb = skb_new;
>>> +       }
>>> +       mlxsw_sx_txhdr_construct(skb, &tx_info);
>>> +       err = mlxsw_core_skb_transmit(mlxsw_sx, skb, &tx_info);
>>> +       if (err == -EAGAIN)
>>> +               return NETDEV_TX_BUSY;
>>
>>I think there is a problem here when returning NETDEV_TX_BUSY when
>>original skb might have been freed above in the headroom check. (ref
>>Documentation/networking/driver.txt).
>
>I have to check this out a bit more. Thanks for pointing that out.

You are right. In case queue is busy we obviously cannot free
the original skb. Will fix that in V2. Thanks!

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [patch net-next 4/4] mlxsw: Introduce Mellanox SwitchX-2 ASIC support
  2015-07-26 17:14       ` Jiri Pirko
@ 2015-07-27  6:17         ` Rosen, Rami
  2015-07-28 18:06           ` Jiri Pirko
  0 siblings, 1 reply; 29+ messages in thread
From: Rosen, Rami @ 2015-07-27  6:17 UTC (permalink / raw)
  To: Jiri Pirko, Scott Feldman
  Cc: Netdev, David S. Miller, idosch, eladr, ogerlitz, Roopa Prabhu,
	Florian Fainelli, Thomas Graf, Alexei Starovoitov,
	Jamal Hadi Salim, Daniel Borkmann, john fastabend, simon.horman,
	John Linville, Andy Gospodarek, Shrijeet Mukherjee, nhorman,
	Jiri Pirko

Hi, Jiri,

Keep on the good work!

The .func  member of the mlxsw_rx_listener object has this prototype:
void (*func)(struct sk_buff *skb, u8 local_port, u16 trap_id, void *priv);

Is the trap_id  parameter needed ?
In the three use cases of .func, which are either mlxsw_emad_rx_listener_func(), mlxsw_core_event_listener_func(), or mlxsw_sx_rx_listener_func(), this parameter is not used at all.


Regards,
Rami Rosen
Intel Corporation

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch net-next 0/4] Introduce Mellanox Technologies Switch ASICs switchdev drivers
  2015-07-23 15:43 [patch net-next 0/4] Introduce Mellanox Technologies Switch ASICs switchdev drivers Jiri Pirko
                   ` (4 preceding siblings ...)
  2015-07-24  0:03 ` [patch net-next 0/4] Introduce Mellanox Technologies Switch ASICs switchdev drivers Scott Feldman
@ 2015-07-27  6:51 ` David Miller
  2015-07-27 20:21 ` Scott Feldman
  6 siblings, 0 replies; 29+ messages in thread
From: David Miller @ 2015-07-27  6:51 UTC (permalink / raw)
  To: jiri
  Cc: netdev, idosch, eladr, ogerlitz, sfeldma, roopa, f.fainelli,
	tgraf, ast, jhs, daniel, john.fastabend, simon.horman, linville,
	andy, shm, nhorman

From: Jiri Pirko <jiri@resnulli.us>
Date: Thu, 23 Jul 2015 17:43:32 +0200

> This patchset introduces Mellanox Technologies Switch driver
> infrastructure and support for SwitchX-2 ASIC.

Very nice.  I look forward to your next iteration, taking into
consideration the feedback you've received.

Thanks!

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch net-next 0/4] Introduce Mellanox Technologies Switch ASICs switchdev drivers
  2015-07-23 15:43 [patch net-next 0/4] Introduce Mellanox Technologies Switch ASICs switchdev drivers Jiri Pirko
                   ` (5 preceding siblings ...)
  2015-07-27  6:51 ` David Miller
@ 2015-07-27 20:21 ` Scott Feldman
  2015-07-27 20:27   ` Jiri Pirko
  6 siblings, 1 reply; 29+ messages in thread
From: Scott Feldman @ 2015-07-27 20:21 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Netdev, David S. Miller, idosch, eladr, ogerlitz, Roopa Prabhu,
	Florian Fainelli, Thomas Graf, Alexei Starovoitov,
	Jamal Hadi Salim, Daniel Borkmann, john fastabend, simon.horman,
	John Linville, Andy Gospodarek, Shrijeet Mukherjee, nhorman

On Thu, Jul 23, 2015 at 8:43 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> This patchset introduces Mellanox Technologies Switch driver infrastructure
> and support for SwitchX-2 ASIC.

You guys did a great job on the driver; looking forward to seeing
L2/L3 hooked up.  Very nice, aesthetically pleasing code.

Is this a ground-up rewrite or a port of the SDK?  It's so clean and
tight, I'm guessing a ground-up rewrite.

Reviewed-by: Scott Feldman <sfeldma@gmail.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch net-next 0/4] Introduce Mellanox Technologies Switch ASICs switchdev drivers
  2015-07-27 20:21 ` Scott Feldman
@ 2015-07-27 20:27   ` Jiri Pirko
  2015-07-27 21:21     ` Florian Fainelli
  0 siblings, 1 reply; 29+ messages in thread
From: Jiri Pirko @ 2015-07-27 20:27 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Netdev, David S. Miller, idosch, eladr, ogerlitz, Roopa Prabhu,
	Florian Fainelli, Thomas Graf, Alexei Starovoitov,
	Jamal Hadi Salim, Daniel Borkmann, john fastabend, simon.horman,
	John Linville, Andy Gospodarek, Shrijeet Mukherjee, nhorman

Mon, Jul 27, 2015 at 10:21:54PM CEST, sfeldma@gmail.com wrote:
>On Thu, Jul 23, 2015 at 8:43 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> This patchset introduces Mellanox Technologies Switch driver infrastructure
>> and support for SwitchX-2 ASIC.
>
>You guys did a great job on the driver; looking forward to seeing
>L2/L3 hooked up.  Very nice, aesthetically pleasing code.

Thanks!

>
>Is this a ground-up rewrite or a port of the SDK?  It's so clean and
>tight, I'm guessing a ground-up rewrite.

It's rewritten from scratch.

>
>Reviewed-by: Scott Feldman <sfeldma@gmail.com>

Thanks for your review!

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch net-next 0/4] Introduce Mellanox Technologies Switch ASICs switchdev drivers
  2015-07-27 20:27   ` Jiri Pirko
@ 2015-07-27 21:21     ` Florian Fainelli
  0 siblings, 0 replies; 29+ messages in thread
From: Florian Fainelli @ 2015-07-27 21:21 UTC (permalink / raw)
  To: Jiri Pirko, Scott Feldman
  Cc: Netdev, David S. Miller, idosch, eladr, ogerlitz, Roopa Prabhu,
	Thomas Graf, Alexei Starovoitov, Jamal Hadi Salim,
	Daniel Borkmann, john fastabend, simon.horman, John Linville,
	Andy Gospodarek, Shrijeet Mukherjee, nhorman

On 27/07/15 13:27, Jiri Pirko wrote:
> Mon, Jul 27, 2015 at 10:21:54PM CEST, sfeldma@gmail.com wrote:
>> On Thu, Jul 23, 2015 at 8:43 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>> This patchset introduces Mellanox Technologies Switch driver infrastructure
>>> and support for SwitchX-2 ASIC.
>>
>> You guys did a great job on the driver; looking forward to seeing
>> L2/L3 hooked up.  Very nice, aesthetically pleasing code.
> 
> Thanks!
> 
>>
>> Is this a ground-up rewrite or a port of the SDK?  It's so clean and
>> tight, I'm guessing a ground-up rewrite.
> 
> It's rewritten from scratch.

Only glanced through the driver, but this looks like really really nice
and clean, very pleased to see such a driver being submitted. That is
very encouraging and should inspire other companies in doing so, so
thanks for doing this!
-- 
Florian

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch net-next 4/4] mlxsw: Introduce Mellanox SwitchX-2 ASIC support
  2015-07-27  6:17         ` Rosen, Rami
@ 2015-07-28 18:06           ` Jiri Pirko
  0 siblings, 0 replies; 29+ messages in thread
From: Jiri Pirko @ 2015-07-28 18:06 UTC (permalink / raw)
  To: Rosen, Rami
  Cc: Scott Feldman, Netdev, David S. Miller, idosch, eladr, ogerlitz,
	Roopa Prabhu, Florian Fainelli, Thomas Graf, Alexei Starovoitov,
	Jamal Hadi Salim, Daniel Borkmann, john fastabend, simon.horman,
	John Linville, Andy Gospodarek, Shrijeet Mukherjee, nhorman,
	Jiri Pirko

Mon, Jul 27, 2015 at 08:17:22AM CEST, rami.rosen@intel.com wrote:
>Hi, Jiri,
>
>Keep on the good work!
>
>The .func  member of the mlxsw_rx_listener object has this prototype:
>void (*func)(struct sk_buff *skb, u8 local_port, u16 trap_id, void *priv);
>
>Is the trap_id  parameter needed ?
>In the three use cases of .func, which are either mlxsw_emad_rx_listener_func(), mlxsw_core_event_listener_func(), or mlxsw_sx_rx_listener_func(), this parameter is not used at all.

You are right. We intended to use MLXSW_TRAP_ID_DONT_CARE and use
trap_id in the callback, but when user registers a callback for exact
trap_id, this is not needed. Will remove in v2.

Thanks.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2015-07-28 18:06 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-23 15:43 [patch net-next 0/4] Introduce Mellanox Technologies Switch ASICs switchdev drivers Jiri Pirko
2015-07-23 15:43 ` [patch net-next 1/4] mlxsw: Introduce Mellanox switch driver core Jiri Pirko
2015-07-23 15:43 ` [patch net-next 2/4] mlxsw: Add PCI bus implementation Jiri Pirko
2015-07-24  4:52   ` Scott Feldman
2015-07-24  5:30     ` Jiri Pirko
2015-07-26  5:15   ` Scott Feldman
2015-07-26  6:49     ` Jiri Pirko
2015-07-23 15:43 ` [patch net-next 3/4] mlxsw: Add interface to access registers and process events Jiri Pirko
2015-07-23 21:12   ` Andy Gospodarek
2015-07-24  5:24     ` Jiri Pirko
2015-07-24 12:09       ` Andy Gospodarek
2015-07-24  5:13   ` Scott Feldman
2015-07-24  5:18     ` Jiri Pirko
2015-07-24  6:48     ` Elad Raz
2015-07-23 15:43 ` [patch net-next 4/4] mlxsw: Introduce Mellanox SwitchX-2 ASIC support Jiri Pirko
2015-07-23 17:19   ` Alexander Duyck
2015-07-23 19:42     ` Jiri Pirko
2015-07-26  2:45   ` Scott Feldman
2015-07-26  7:10     ` Jiri Pirko
2015-07-26 17:14       ` Jiri Pirko
2015-07-27  6:17         ` Rosen, Rami
2015-07-28 18:06           ` Jiri Pirko
2015-07-24  0:03 ` [patch net-next 0/4] Introduce Mellanox Technologies Switch ASICs switchdev drivers Scott Feldman
2015-07-24  0:38   ` Rustad, Mark D
2015-07-24  5:38   ` Jiri Pirko
2015-07-27  6:51 ` David Miller
2015-07-27 20:21 ` Scott Feldman
2015-07-27 20:27   ` Jiri Pirko
2015-07-27 21:21     ` Florian Fainelli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).